
Mathematics behind Machine Learning: Linear Regression Model

Dr Lotfi Ncib, Associate Professor of Applied Mathematics, Esprit School of Engineering

Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
What is the difference between AI, ML and DL?
• Artificial Intelligence (AI) tries to make computers intelligent in order to mimic the cognitive functions of humans. AI is a general field with a broad scope, including:
  • Computer Vision,
  • Language Processing,
  • Creativity…
• Machine Learning (ML) is the branch of AI that covers the statistical part of artificial intelligence. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations:
  • Regression,
  • Classification,
  • Clustering…
• Deep Learning (DL) is a special field of Machine Learning where computers can actually learn and make intelligent decisions on their own:
  • CNN,
  • RNN…
Types of Machine Learning

[Figure: types of machine learning]

Classical Machine Learning

[Figure: classical machine learning]
What is Regression?
Regression is supervised learning: the target is provided.
X: independent variable(s); Y: dependent variable.

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
852          | 2                  | 1                | 36                  | 178
1510         | 3                  | 2                | 30                  | ?

Price is the continuous target variable: regression is the process of predicting a continuous value.
Types of Regression

• Simple Regression (one independent variable)
  • Simple Linear Regression: e.g., predict Price ($1000) vs Size (feet²) of all houses
  • Simple Non-Linear Regression
• Multiple Regression (two or more independent variables)
  • Multiple Linear Regression: e.g., predict Price ($1000) vs Size (feet²) and number of bedrooms
  • Multiple Non-Linear Regression

Types of Regression
├─ Simple (one variable): Linear / Non-Linear
└─ Multiple (2+ variables): Linear / Non-Linear
Applications of Regression

• House price estimation:
  • size, number of bedrooms, and so on.
• Employment income:
  • hours of work, education, occupation, sex, age, years of experience, and so on.

Indeed, you can find many examples of the usefulness of regression analysis in these and many other fields or domains, such as finance, healthcare, retail, and more.
Examples of Regression Algorithms

We have many regression algorithms:

• Ordinal regression
• Poisson regression
• Fast forest quantile regression
• Linear, polynomial, Lasso, Stepwise, Ridge regression
• Bayesian linear regression
• Neural network regression
• Decision forest regression
• KNN regression
• Boosted decision tree regression
Simple Linear Regression: Model Representation
Simple Linear Regression
• Predict Price ($1000) vs Size (feet²) of all houses
• Independent variable (x): size of house
• Dependent variable (y): price of house

Size in feet² (x) | Price ($1000) (y)
2104              | 460
1416              | 232
1534              | 315
852               | 178
1245              | ?

Notation:
m = number of training examples
x = "input" variable / features
y = "output" variable / "target" variable
Model representation

Training Set → Learning Algorithm → hypothesis h
Size of house (x) → h → Estimated price (y)

Choice of h: $h_\theta(x) = \theta_0 + \theta_1 x$

This is linear regression with one variable, also called univariate linear regression.
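As a small illustration (my own addition, not from the slides), the univariate hypothesis in Python; the parameter values are made up, not fitted:

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Example: with illustrative parameters theta0 = 50 and theta1 = 0.2,
# a 1245 ft^2 house would be priced at h(50, 0.2, 1245) = 299, i.e. $299,000.
print(h(50, 0.2, 1245))
```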
Cost function

Training Set:
Size in feet² (x) | Price ($1000) (y)
2104              | 460
1416              | 232
1534              | 315
852               | 178

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$

Goal: find the regression line that makes the sum of squared residuals as small as possible.
Cost function

Idea: choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples.

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
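A minimal NumPy sketch of this cost function on the training set above (my own illustration; the parameter values are arbitrary, not the optimum):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * m)

x = np.array([2104, 1416, 1534, 852], dtype=float)  # size in ft^2
y = np.array([460, 232, 315, 178], dtype=float)     # price in $1000
print(cost(50.0, 0.2, x, y))
```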
Analytical Solution

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

The vectorized expression of the linear regression cost function can be written as:

$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$, where $X = \begin{bmatrix} 1 & x^{(1)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$, $y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

Since $\frac{1}{2m}$ is a constant that does not change the minimizer, we omit it. Our cost function becomes:

$J(\theta) = (X\theta - y)^T (X\theta - y)$

This can be further simplified as: $J(\theta) = ((X\theta)^T - y^T)(X\theta - y)$

We expand it to obtain: $J(\theta) = (X\theta)^T X\theta - (X\theta)^T y - y^T X\theta + y^T y$

Now $((X\theta)^T y)^T = y^T (X\theta)$, and both are scalars, so: $J(\theta) = (X\theta)^T X\theta - 2y^T X\theta + y^T y$
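Before continuing the derivation, a quick numerical sanity check (my own addition, not from the slides) that the vectorized cost matches the summation form; the $\theta$ values are arbitrary:

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
X = np.column_stack([np.ones_like(x), x])   # first column is all ones
theta = np.array([50.0, 0.2])               # arbitrary parameters

m = len(y)
J_sum = np.sum((X @ theta - y) ** 2) / (2 * m)  # summation form
r = X @ theta - y
J_vec = (r @ r) / (2 * m)                       # (1/2m)(X@theta - y)^T (X@theta - y)
print(np.isclose(J_sum, J_vec))                 # True
```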
Analytical Solution

Furthermore, we can write it as: $J(\theta) = \theta^T X^T X \theta - 2y^T X\theta + y^T y$

Now we need to take the derivative of the cost function. For convenience, the common matrix derivative formulas are listed for reference:

$\frac{\partial (AX)}{\partial X} = A, \quad \frac{\partial (X^T A)}{\partial X} = A, \quad \frac{\partial (X^T X)}{\partial X} = 2X, \quad \frac{\partial (X^T A X)}{\partial X} = AX + A^T X$

Using the above formulas, we can differentiate the cost function with respect to $\theta$:

$\frac{\partial J(\theta)}{\partial \theta} = 2X^T X\theta - 2X^T y$

To solve for the parameters, we set this derivative equal to zero: $2X^T X\theta - 2X^T y = 0$, hence $X^T X\theta = X^T y$ (the normal equation).

Thus we can compute $\theta$ as: $\theta = (X^T X)^{-1} X^T y$

What if $X^T X$ is non-invertible (singular/degenerate)? This can happen with redundant (linearly dependent) features or with more features than training examples; a pseudo-inverse can then be used instead.
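A minimal NumPy sketch of the normal equation on the univariate house data (my own illustration, not from the slides); `np.linalg.pinv` covers the singular case just discussed:

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)  # size in ft^2
y = np.array([460, 232, 315, 178], dtype=float)     # price in $1000

X = np.column_stack([np.ones_like(x), x])           # add the x0 = 1 column

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                                        # [theta0, theta1]

# If X^T X is singular, fall back to the pseudo-inverse:
theta_pinv = np.linalg.pinv(X) @ y

print(X @ theta)                                    # fitted prices h_theta(x)
```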
Multiple Linear Regression: Model Representation
Model representation

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
852          | 2                  | 1                | 36                  | 178
1510         | 3                  | 2                | 30                  | ?

Notation:
m = number of training examples
n = number of features (variables)
$x^{(i)}$ = "input" of the $i^{th}$ training example
$x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example
Model representation

Training Set → Learning Algorithm → hypothesis h
Size of house, number of bedrooms, number of floors, age of home → h → Estimated price

Choice of h? With one variable we had $h_\theta(x) = \theta_0 + \theta_1 x$; with four features this becomes:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$
Model representation

Hypothesis: $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$

For convenience of notation, define $x_0^{(i)} = 1$. Then

$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

and the hypothesis becomes

$h_\theta(x^{(i)}) = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)} = \theta^T x^{(i)}$

This is multivariate linear regression.
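A tiny NumPy sketch of the vectorized hypothesis $h_\theta(x^{(i)}) = \theta^T x^{(i)}$ (my own illustration; the parameter values are made up, not fitted):

```python
import numpy as np

theta = np.array([80.0, 0.1, 5.0, 2.0, -1.0])  # [theta0, ..., theta4], illustrative
x_i = np.array([1.0, 2104, 5, 1, 45])          # x0 = 1 prepended to the features
print(theta @ x_i)                             # h_theta(x_i) = theta^T x_i
```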
Cost function

Idea: choose $\theta_0, \theta_1, \ldots, \theta_n$ so that $h_\theta(x)$ is close to $y$ for our training examples.

Hypothesis: $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

Goal: $\min_{\theta_0, \theta_1, \ldots, \theta_n} J(\theta_0, \theta_1, \ldots, \theta_n)$

To evaluate the hypothesis for all the samples at once, we use:

$h_\theta(x) = X\theta = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$
Analytical Solution

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

The vectorized expression of the linear regression cost function can be written as:

$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$, where

$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

As in the univariate case, we can compute $\theta$ as: $\theta = (X^T X)^{-1} X^T y$

What if $X^T X$ is non-invertible (singular/degenerate)? See the sketch below.
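To tie the pieces together, a minimal NumPy sketch (my own, not from the slides) that fits the four-feature house data with the normal equation and predicts the missing price. Note that with $m = 4$ examples and $n + 1 = 5$ parameters, $X^T X$ is in fact singular here, so the pseudo-inverse is used:

```python
import numpy as np

# Training data: size, bedrooms, floors, age -> price ($1000)
features = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852,  2, 1, 36],
], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

X = np.column_stack([np.ones(len(features)), features])  # prepend x0 = 1

# With m = 4 examples and n + 1 = 5 parameters, X^T X is singular, so
# (X^T X)^{-1} does not exist; the pseudo-inverse gives the minimum-norm
# least-squares solution instead.
theta = np.linalg.pinv(X) @ y

x_new = np.array([1, 1510, 3, 2, 30], dtype=float)       # the "?" row
print(theta @ x_new)                                     # predicted price in $1000
```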
