
Linear Regression

Anubha Gupta, PhD.


Professor
SBILab, Dept. of ECE,
IIIT-Delhi, India
Contact: anubha@iiitd.ac.in; Lab: http://sbilab.iiitd.edu.in
Machine Learning in Hindi


Learning Objectives
In this module, we will learn to
• solve a regression problem
  o using the statistical approach
  o using the data-driven approach
• understand regression errors and performance metrics
• work through an example

Linear Regression
(using the statistical approach)

Multiple Linear Regression


• A linear regression technique that uses more than one explanatory variable to predict the outcome of a response variable.

• System of linear equations


$$y_1 = \theta_0 + \theta_1 x_{1,1} + \theta_2 x_{1,2} + \dots + \theta_{p-1} x_{1,p-1} + \varepsilon_1$$
$$y_2 = \theta_0 + \theta_1 x_{2,1} + \theta_2 x_{2,2} + \dots + \theta_{p-1} x_{2,p-1} + \varepsilon_2$$
$$\vdots$$
$$y_n = \theta_0 + \theta_1 x_{n,1} + \theta_2 x_{n,2} + \dots + \theta_{p-1} x_{n,p-1} + \varepsilon_n$$

• Matrix Notation

$$\underbrace{\mathbf{y}}_{n\times 1} = \underbrace{\begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p-1} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p-1} \end{bmatrix}}_{n\times p} \underbrace{\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{p-1} \end{bmatrix}}_{p\times 1} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_{n\times 1}$$

Maximum Likelihood Estimation (MLE)

In vector form, stacking the rows $\mathbf{x}_i^T = [1,\; x_{i,1},\; \dots,\; x_{i,p-1}]$:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_n^T \end{bmatrix} \boldsymbol{\theta} + \boldsymbol{\varepsilon}$$

$$\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$$
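To make the notation concrete, here is a minimal NumPy sketch (all names and values are illustrative, not from the slides) that builds a design matrix with a leading column of ones and generates y = Xθ + ε:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                                   # n observations, p parameters (incl. intercept)
features = rng.uniform(0, 10, size=(n, p - 1))  # x_{i,1}, ..., x_{i,p-1}

# Design matrix X: a leading column of ones multiplies the intercept theta_0
X = np.column_stack([np.ones(n), features])     # shape (n, p)

theta = np.array([2.0, 1.5, -0.7])              # illustrative parameter vector
eps = rng.normal(0.0, 0.5, size=n)              # zero-mean Gaussian noise

y = X @ theta + eps                             # y = X theta + eps, shape (n,)
```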

Maximum Likelihood Estimation (MLE)


Random variable: $y_i = \mathbf{x}_i^T \boldsymbol{\theta} + \varepsilon_i$

$$E[y_i] = E[\mathbf{x}_i^T \boldsymbol{\theta} + \varepsilon_i] = \mathbf{x}_i^T \boldsymbol{\theta} + E[\varepsilon_i] = \mathbf{x}_i^T \boldsymbol{\theta} + 0 = m_{y_i}$$

$$E\big[(y_i - m_{y_i})^2\big] = \sigma_{y_i}^2 = E\big[(y_i - \mathbf{x}_i^T \boldsymbol{\theta})^2\big] = E[\varepsilon_i^2] = \sigma^2$$

Maximum Likelihood estimator

$$L'(\boldsymbol{\theta}) = p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \mathbf{x}_i^T \boldsymbol{\theta})^2}{2\sigma^2}\right) \quad \text{[using the assumptions of linear regression]}$$

$$L'(\boldsymbol{\theta}) = \prod_{i=1}^{n} p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}) \quad \text{[maximize the likelihood function]}$$

MLE: $\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \, p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})$

$$\max_{\boldsymbol{\theta}} L'(\boldsymbol{\theta}) = \max_{\boldsymbol{\theta}} p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})$$

$$p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2\right)$$

Maximum Likelihood estimator


$$L(\boldsymbol{\theta}) = \log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2 \quad \text{[log-likelihood function]}$$

Maximum likelihood estimator:

Maximizing $L(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$ is equivalent to minimizing

$$\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2$$

Maximum Likelihood estimator


MLE leads to the least-squares solution:

$$\hat{\boldsymbol{\theta}}_{mle} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2$$

[Figure: the squared error is a convex function of $\boldsymbol{\theta}$, so it has a unique minimum.]

Linear Regression Solution


$$L(\boldsymbol{\theta}) = \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2$$

Let us consider a case with two features:

$$y_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \varepsilon_i, \qquad \boldsymbol{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$

Linear Regression proof


$$y_i = \theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \varepsilon_i$$

$$L(\boldsymbol{\theta}) = (y_1 - \theta_0 - x_{11}\theta_1 - x_{12}\theta_2)^2 + (y_2 - \theta_0 - x_{21}\theta_1 - x_{22}\theta_2)^2 + \dots + (y_n - \theta_0 - x_{n1}\theta_1 - x_{n2}\theta_2)^2$$

$$\frac{\partial L(\boldsymbol{\theta})}{\partial \theta_0} = 2(y_1 - \theta_0 - x_{11}\theta_1 - x_{12}\theta_2)(-1) + 2(y_2 - \theta_0 - x_{21}\theta_1 - x_{22}\theta_2)(-1) + \dots + 2(y_n - \theta_0 - x_{n1}\theta_1 - x_{n2}\theta_2)(-1)$$

$$\frac{\partial L(\boldsymbol{\theta})}{\partial \theta_0} = 2\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)(-1)$$

Linear Regression proof


$$\frac{\partial L(\boldsymbol{\theta})}{\partial \theta_1} = 2(y_1 - \theta_0 - x_{11}\theta_1 - x_{12}\theta_2)(-x_{11}) + 2(y_2 - \theta_0 - x_{21}\theta_1 - x_{22}\theta_2)(-x_{21}) + \dots + 2(y_n - \theta_0 - x_{n1}\theta_1 - x_{n2}\theta_2)(-x_{n1})$$

$$\frac{\partial L(\boldsymbol{\theta})}{\partial \theta_1} = 2\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)(-x_{i1}), \qquad \frac{\partial L(\boldsymbol{\theta})}{\partial \theta_2} = 2\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)(-x_{i2})$$

Setting the gradient to zero:

$$\nabla L(\boldsymbol{\theta}) = \begin{bmatrix} \partial L(\boldsymbol{\theta})/\partial \theta_0 \\ \partial L(\boldsymbol{\theta})/\partial \theta_1 \\ \partial L(\boldsymbol{\theta})/\partial \theta_2 \end{bmatrix} = 2\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big) \begin{bmatrix} -1 \\ -x_{i1} \\ -x_{i2} \end{bmatrix} = \mathbf{0}$$

$$-2\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)\,\mathbf{x}_i = \mathbf{0}$$

Closed form solution


$$-2\sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)\,\mathbf{x}_i = \mathbf{0}$$

$$\sum_{i=1}^{n} y_i \mathbf{x}_i = \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^T \boldsymbol{\theta}$$

$$y_1 \mathbf{x}_1 + y_2 \mathbf{x}_2 + \dots + y_n \mathbf{x}_n = \mathbf{x}_1 \mathbf{x}_1^T \boldsymbol{\theta} + \mathbf{x}_2 \mathbf{x}_2^T \boldsymbol{\theta} + \dots + \mathbf{x}_n \mathbf{x}_n^T \boldsymbol{\theta}$$

$$\mathbf{X}^T \mathbf{y} = \mathbf{X}^T \mathbf{X}\, \boldsymbol{\theta}$$

$$\hat{\boldsymbol{\theta}} = \big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T \mathbf{y}$$
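As a sanity check on the closed form, a short NumPy sketch (with illustrative synthetic data) that solves the normal equations X^T X θ = X^T y:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative synthetic data: y = 2 + 1.5*x1 - 0.7*x2 + noise
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
theta_true = np.array([2.0, 1.5, -0.7])
y = X @ theta_true + rng.normal(0.0, 0.5, n)

# Normal equations: solve (X^T X) theta = X^T y
# (numerically preferable to forming the inverse explicitly)
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)  # close to theta_true
```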

Closed form solution


$$\min_{\boldsymbol{\theta}} \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2 \;\Longrightarrow\; \min_{\boldsymbol{\theta}} (\mathbf{y} - \mathbf{X}\boldsymbol{\theta})^T (\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) \;\Longrightarrow\; \min_{\boldsymbol{\theta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\|_2^2$$

$[\mathbf{X}]_{n\times p}$ is a tall matrix ($n > p$), so we solve via the pseudo-inverse:

$$\mathbf{X}^{\dagger} = \big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T, \qquad \hat{\boldsymbol{\theta}} = \mathbf{X}^{\dagger}\mathbf{y}$$

If $\mathbf{X}$ is a square invertible matrix, the pseudo-inverse reduces to the ordinary inverse:

$$\mathbf{X}^{\dagger} = \big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T = \mathbf{X}^{-1}\big(\mathbf{X}^T\big)^{-1}\mathbf{X}^T = \mathbf{X}^{-1}, \qquad \hat{\boldsymbol{\theta}} = \mathbf{X}^{\dagger}\mathbf{y} = \mathbf{X}^{-1}\mathbf{y}$$
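In practice, forming (X^T X)^{-1} explicitly can be numerically unstable. A minimal sketch, assuming the same kind of illustrative synthetic data as above, showing that NumPy's pinv (pseudo-inverse) and lstsq (direct least-squares solver) return the same solution:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
y = X @ np.array([2.0, 1.5, -0.7]) + rng.normal(0.0, 0.5, n)

theta_pinv = np.linalg.pinv(X) @ y                   # theta = X^dagger y
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solver

print(np.allclose(theta_pinv, theta_lstsq))          # True: same solution
```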

Geometric Interpretation: MLE


$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_n^T \end{bmatrix} = \begin{bmatrix} 1 & \cdots & x_{1,p-1} \\ \vdots & \ddots & \vdots \\ 1 & \cdots & x_{n,p-1} \end{bmatrix} = \begin{bmatrix} \mathbf{z}_1 & \mathbf{z}_2 & \cdots & \mathbf{z}_p \end{bmatrix}$$

[Figure: $\mathbf{y}$, its orthogonal projection $\hat{\mathbf{y}}$ onto the column space of $\mathbf{X}$, and the residual $\boldsymbol{\varepsilon}$ perpendicular to that space.]

The residual at the optimum is orthogonal to the columns of $\mathbf{X}$:

$$\mathbf{X}^T \boldsymbol{\varepsilon} = \mathbf{0}$$
$$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) = \mathbf{0}$$
$$\mathbf{X}^T \mathbf{y} - \mathbf{X}^T \mathbf{X}\boldsymbol{\theta} = \mathbf{0}$$
$$\big(\mathbf{X}^T \mathbf{X}\big)\boldsymbol{\theta} = \mathbf{X}^T \mathbf{y}$$
$$\hat{\boldsymbol{\theta}} = \big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T \mathbf{y}, \qquad \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\theta}} = \underbrace{\mathbf{X}\big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T}_{\mathbf{P}}\,\mathbf{y} = \mathbf{P}\mathbf{y}$$

$\mathbf{P}$ is the projection matrix that orthogonally projects $\mathbf{y}$ onto the data space (the span of the columns $\mathbf{z}_1, \dots, \mathbf{z}_p$).
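A quick numerical check of this interpretation (illustrative data): P = X(X^T X)^{-1} X^T is symmetric and idempotent, P y reproduces the fitted values X θ̂, and the residual is orthogonal to the columns of X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, n)

P = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto the column space of X
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(P, P.T))                    # P is symmetric
print(np.allclose(P @ P, P))                  # P is idempotent: projecting twice changes nothing
print(np.allclose(P @ y, X @ theta_hat))      # P y equals the fitted values y_hat
print(np.allclose(X.T @ (y - P @ y), 0))      # residual is orthogonal to the columns of X
```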

Drawbacks of MLE
1. Sensitivity to outliers: MLE assumes a specific probability distribution for the data, which can produce biased estimates if the data contains outliers (see the sketch after this list).
2. Lack of robustness: Small changes in the data can lead to large changes in the estimates of
the parameters. This can be a problem if the data is noisy or contains errors.
3. Overfitting: MLE can suffer from overfitting if the model is too complex or if the data is
limited. Overfitting occurs when the model fits the noise in the data rather than the
underlying pattern, which can lead to poor generalization performance on new data.
4. Computational complexity: MLE can be computationally complex, especially for high-
dimensional data or complex models. This can make it difficult to estimate the parameters or
to perform inference on the model.
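To illustrate points 1 and 2, a small sketch with illustrative data showing how a single corrupted observation can noticeably shift the least-squares fit:

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = 2.0 + 1.5 * x                              # noise-free linear data

X = np.column_stack([np.ones_like(x), x])
clean_fit = np.linalg.lstsq(X, y, rcond=None)[0]

y_out = y.copy()
y_out[-1] += 30.0                              # corrupt a single observation
outlier_fit = np.linalg.lstsq(X, y_out, rcond=None)[0]

print(clean_fit)    # [2.0, 1.5] recovered exactly
print(outlier_fit)  # both intercept and slope pulled toward the outlier
```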
Linear Regression
(Data-Driven Approach)

Linear Regression Solution


$$L(\boldsymbol{\theta}) = \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2$$

$$\min_{\boldsymbol{\theta}} \sum_{i=1}^{n} \big(y_i - \mathbf{x}_i^T \boldsymbol{\theta}\big)^2 \;\Longrightarrow\; \min_{\boldsymbol{\theta}} (\mathbf{y} - \mathbf{X}\boldsymbol{\theta})^T (\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) \;\Longrightarrow\; \min_{\boldsymbol{\theta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\|_2^2$$

$$\hat{\boldsymbol{\theta}} = \mathbf{X}^{\dagger}\mathbf{y}, \quad \text{where} \quad \mathbf{X}^{\dagger} = \big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T$$
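In a data-driven workflow this solution is usually obtained from a library routine rather than coded by hand; a minimal scikit-learn sketch (illustrative data; the slides themselves do not prescribe a library):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
features = rng.uniform(0, 10, size=(100, 2))
y = 2.0 + 1.5 * features[:, 0] - 0.7 * features[:, 1] + rng.normal(0.0, 0.5, 100)

model = LinearRegression().fit(features, y)    # fits the intercept by default
print(model.intercept_, model.coef_)           # close to 2.0 and [1.5, -0.7]
```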
Linear Regression
(Types of Errors and Performance Metrics)

Types of Errors and Performance Metrics


• Regression errors:
  o Sum of Squares Total (SST) / Total Sum of Squares (TSS)
  o Sum of Squares due to Regression (SSR)
  o Sum of Squares Error (SSE)

• Performance metrics:
  o R-squared score
  o Mean Absolute Error
  o Mean Squared Error
  o Root Mean Squared Error

[Figure: scatter of Y vs. X with the regression line and the mean of the response variable (mean of the $y_i$'s), illustrating Sum of Squares Total (SST), Sum of Squares due to Regression (SSR), and Sum of Squares Error (SSE).]

Regression Errors
➢ Sum of Squares Total (SST) / Total Sum of Squares (TSS)
$$\text{SST} = \sum_{i=1}^{n} (y_i - m)^2$$

➢ Sum of Squares due to Regression (SSR) / Residual Sum of Squares (RSS)
$$\text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

➢ Sum of Squares Error (SSE) / Explained Sum of Squares (ESS)
$$\text{SSE} = \sum_{i=1}^{n} (m - \hat{y}_i)^2$$

where $m = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the mean of the actual values. (Note: this deck uses SSR for the residual term and SSE for the explained term; some textbooks swap these two names.)
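A short NumPy sketch computing these three quantities, using this deck's naming convention, for the small dataset that appears in the "Let us Try!" exercise later in this module (x = 1..5, y = 2, 3, 5, 6, 8):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 6.0, 8.0])

# Least-squares line y_hat = theta0 + theta1 * x
X = np.column_stack([np.ones_like(x), x])
theta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ theta

m = y.mean()
sst = np.sum((y - m) ** 2)       # total:     22.8
ssr = np.sum((y - y_hat) ** 2)   # residual:   0.3  (SSR in this deck's naming)
sse = np.sum((m - y_hat) ** 2)   # explained: 22.5  (SSE in this deck's naming)
print(sst, ssr, sse)             # and SST = SSR + SSE
```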


Performance Metric: R-Squared ($R^2$) Score

• Also known as the Coefficient of Determination, it is given by
$$R^2 = 1 - \frac{\text{SSR}}{\text{SST}} = \frac{\text{SSE}}{\text{SST}}$$

• It measures the proportion of the variance in the dependent variable that can be explained by the independent variables in the model.
  o $R^2 = 1 \Rightarrow$ the model explains all the variance in the dependent variable
  o $R^2 = 0 \Rightarrow$ the model explains none of the variance in the dependent variable

• Measures the goodness of fit of the best-fit line

Performance Metric: Mean Absolute Error


• The average absolute difference between the predicted and actual values over a set of observations:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
where $y_i$ is the actual value of the $i$th observation, $\hat{y}_i$ is the predicted value of the $i$th observation, and $n$ is the total number of observations.

• A lower MAE indicates better model performance, because the predictions are closer to the actual values.

• Advantages:
  o Gives an idea of how far off the predictions are from the actual values, in the same units as the original data
  o Less sensitive to outliers than some other metrics, such as Mean Squared Error (MSE)
  o Treats all errors (positive or negative) equally

• Disadvantages:
  o Gives equal weight to all errors, regardless of their magnitude

Performance Metric: Mean Squared Error


• The average squared difference between the predicted and actual values over a set of observations:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2$$
where $y_i$ is the actual value of the $i$th observation, $\hat{y}_i$ is the predicted value of the $i$th observation, and $n$ is the total number of observations.

• Advantages:
  o Gives an idea of how far off the predictions are from the actual values
  o Penalizes large errors more heavily than small errors

• Disadvantages:
  o Sensitive to outliers (a few large errors can have a disproportionate impact on the metric)

Performance Metric: Root Mean Squared Error


• The square root of the average squared difference between the predicted and actual values over a set of observations:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2}$$
where $y_i$ is the actual value of the $i$th observation, $\hat{y}_i$ is the predicted value of the $i$th observation, and $n$ is the total number of observations.

• Advantages:
  o Unlike MSE, it is expressed in the same units as the target variable, and the square root keeps its magnitude comparatively lower.

• Disadvantages:
  o Similar to MSE (sensitive to outliers)
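The four metrics side by side, in a minimal sketch using scikit-learn's standard metric helpers. The predicted values assume the least-squares line ŷ = 0.3 + 1.5x for the module's example dataset (computed here, not stated on the slides):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([2.0, 3.0, 5.0, 6.0, 8.0])
y_pred = np.array([1.8, 3.3, 4.8, 6.3, 7.8])  # from the fitted line y_hat = 0.3 + 1.5x

print("R^2 :", r2_score(y_true, y_pred))             # ~0.9868
print("MAE :", mean_absolute_error(y_true, y_pred))  # 0.24
mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)                                  # 0.06
print("RMSE:", np.sqrt(mse))                         # ~0.245
```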

Let us Try!

Data:

Explanatory variable (feature):  1   2   3   4   5
Response variable (output):      2   3   5   6   8

• Regression errors:
  o Sum of Squares Total (SST):
  o Sum of Squares due to Regression (SSR):
  o Sum of Squares Error (SSE):

• Performance metrics:
  o R-squared score:
  o Mean Absolute Error:
  o Mean Squared Error:
  o Root Mean Squared Error:

Let us Try! (Solution)

Data:

Explanatory variable (feature):  1   2   3   4   5
Response variable (output):      2   3   5   6   8

• Regression errors:
  o Sum of Squares Total (SST): 22.8
  o Sum of Squares due to Regression (SSR): 0.3
  o Sum of Squares Error (SSE): 22.5

• Performance metrics:
  o R-squared score: 0.9869
  o Mean Absolute Error: 0.240
  o Mean Squared Error: 0.060
  o Root Mean Squared Error: 0.245

Try it Yourself!
Question: Let there be a dataset of 50 observations, where each data point consists of a single independent
variable (𝑥) and a single dependent variable (𝑦). You want to fit a linear regression model to this data to
predict 𝑦 based on 𝑥. After performing the regression analysis, you obtain the following regression
equation: $\hat{y} = 2.5 + 1.8x$

Using the above information, what would be the predicted value of 𝑦 for a value of 𝑥 equal to 3.2?

The solution to this question will not be provided.



To Summarize
In this module, we discussed
● What is linear regression?
● What are the different types of errors?
● How to solve the problem?
● We also looked at one example.
