
Mathematics behind Machine Learning: Linear Regression Model

Dr Lotfi Ncib, Associate Professor of Applied Mathematics, Esprit School of Engineering

Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
What is the difference between AI, ML and DL?
• Artificial Intelligence (AI) tries to make computers intelligent in order to mimic the cognitive functions of humans. AI is a general field with a broad scope, including:
  • Computer Vision,
  • Language Processing,
  • Creativity…
• Machine Learning (ML) is the branch of AI that covers the statistical part of artificial intelligence. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations:
  • Regression,
  • Classification,
  • Clustering…
• Deep Learning (DL) is a special field of Machine Learning where computers can actually learn and make intelligent decisions on their own:
  • CNN,
  • RNN…
Types of Machine Learning

[Figure: types of machine learning]

Classical Machine Learning

[Figure: classical machine learning]
What is Regression?
Regression is supervised learning: the target is provided.
X: independent variable(s); Y: dependent variable.

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
852          | 2                  | 1                | 36                  | 178
1510         | 3                  | 2                | 30                  | ?

Price is the continuous target variable: regression is the process of predicting a continuous value.
Types of Regression

• Simple Regression (one independent variable)
  • Simple Linear Regression: e.g., predict Price ($1000) vs Size (feet²) of all houses
  • Simple Non-Linear Regression
• Multiple Regression (two or more independent variables)
  • Multiple Linear Regression: e.g., predict Price ($1000) vs Size (feet²) and number of bedrooms
  • Multiple Non-Linear Regression

Types of Regression
├─ Simple (one variable): Linear / Non-Linear
└─ Multiple (2+ variables): Linear / Non-Linear
Applications of Regression

• House price estimation:
  • size, number of bedrooms, and so on.
• Employment income:
  • hours of work, education, occupation, sex, age, years of experience, and so on.

Indeed, you can find many examples of the usefulness of regression analysis in these and many other fields or domains, such as finance, healthcare, retail, and more.
Examples of Regression Algorithms

We have many regression algorithms:

• Ordinal regression
• Poisson regression
• Fast forest quantile regression
• Linear, polynomial, Lasso, Stepwise, Ridge regression
• Bayesian linear regression
• Neural network regression
• Decision forest regression
• KNN regression
• Boosted decision tree regression
Simple Linear Regression: Model Representation
Simple Linear Regression
• Predict Price ($1000) vs Size (feet²) of all houses
• Independent variable (x): size of house
• Dependent variable (y): price of house

Size in feet² (x) | Price ($1000) (y)
2104              | 460
1416              | 232
1534              | 315
852               | 178
1245              | ?

Notation:
m = number of training examples
x = "input" variable / features
y = "output" variable / "target" variable
Model representation

Training Set → Learning Algorithm → hypothesis h
Size of house (x) → h → Estimated price (y)

Choice of h: $h_\theta(x) = \theta_0 + \theta_1 x$

This is linear regression with one variable, also called univariate linear regression.
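As a small illustration (my own addition, not from the slides), the univariate hypothesis in Python; the parameter values are made up, not fitted:

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Example: with illustrative parameters theta0 = 50 and theta1 = 0.2,
# a 1245 ft^2 house would be priced at h(50, 0.2, 1245) = 299, i.e. $299,000.
print(h(50, 0.2, 1245))
```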
Cost function

Training Set:
Size in feet² (x) | Price ($1000) (y)
2104              | 460
1416              | 232
1534              | 315
852               | 178

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$

Goal: find the regression line that makes the sum of squared residuals as small as possible.
Cost function

Idea: choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples.

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
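A minimal NumPy sketch of this cost function on the training set above (my own illustration; the parameter values are arbitrary, not the optimum):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * m)

x = np.array([2104, 1416, 1534, 852], dtype=float)  # size in ft^2
y = np.array([460, 232, 315, 178], dtype=float)     # price in $1000
print(cost(50.0, 0.2, x, y))
```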
Analytical Solution

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

The vectorized expression of the linear regression cost function can be written as:

$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$, where $X = \begin{bmatrix} 1 & x^{(1)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}$, $y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

Since $\frac{1}{2m}$ is a constant that does not change the minimizer, we omit it. Our cost function becomes:

$J(\theta) = (X\theta - y)^T (X\theta - y)$

This can be further simplified as: $J(\theta) = ((X\theta)^T - y^T)(X\theta - y)$

We expand it to obtain: $J(\theta) = (X\theta)^T X\theta - (X\theta)^T y - y^T X\theta + y^T y$

Now $((X\theta)^T y)^T = y^T (X\theta)$, and both are scalars, so: $J(\theta) = (X\theta)^T X\theta - 2y^T X\theta + y^T y$
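Before continuing the derivation, a quick numerical sanity check (my own addition, not from the slides) that the vectorized cost matches the summation form; the $\theta$ values are arbitrary:

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
X = np.column_stack([np.ones_like(x), x])   # first column is all ones
theta = np.array([50.0, 0.2])               # arbitrary parameters

m = len(y)
J_sum = np.sum((X @ theta - y) ** 2) / (2 * m)  # summation form
r = X @ theta - y
J_vec = (r @ r) / (2 * m)                       # (1/2m)(X@theta - y)^T (X@theta - y)
print(np.isclose(J_sum, J_vec))                 # True
```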
Analytical Solution

Furthermore, we can write it as: $J(\theta) = \theta^T X^T X \theta - 2y^T X\theta + y^T y$

Now we need to take the derivative of the cost function. For convenience, the common matrix derivative formulas are listed for reference:

$\frac{\partial (AX)}{\partial X} = A, \quad \frac{\partial (X^T A)}{\partial X} = A, \quad \frac{\partial (X^T X)}{\partial X} = 2X, \quad \frac{\partial (X^T A X)}{\partial X} = AX + A^T X$

Using the above formulas, we can differentiate the cost function with respect to $\theta$:

$\frac{\partial J(\theta)}{\partial \theta} = 2X^T X\theta - 2X^T y$

To solve for the parameters, we set this derivative equal to zero: $2X^T X\theta - 2X^T y = 0$, hence $X^T X\theta = X^T y$ (the normal equation).

Thus we can compute $\theta$ as: $\theta = (X^T X)^{-1} X^T y$

What if $X^T X$ is non-invertible (singular/degenerate)? This can happen with redundant (linearly dependent) features or with more features than training examples; a pseudo-inverse can then be used instead.
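A minimal NumPy sketch of the normal equation on the univariate house data (my own illustration, not from the slides); `np.linalg.pinv` covers the singular case just discussed:

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)  # size in ft^2
y = np.array([460, 232, 315, 178], dtype=float)     # price in $1000

X = np.column_stack([np.ones_like(x), x])           # add the x0 = 1 column

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                                        # [theta0, theta1]

# If X^T X is singular, fall back to the pseudo-inverse:
theta_pinv = np.linalg.pinv(X) @ y

print(X @ theta)                                    # fitted prices h_theta(x)
```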
Multiple Linear Regression: Model Representation
Model representation

Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104         | 5                  | 1                | 45                  | 460
1416         | 3                  | 2                | 40                  | 232
1534         | 3                  | 2                | 30                  | 315
852          | 2                  | 1                | 36                  | 178
1510         | 3                  | 2                | 30                  | ?

Notation:
m = number of training examples
n = number of features (variables)
$x^{(i)}$ = "input" of the $i^{th}$ training example
$x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example
Model representation

Training Set → Learning Algorithm → hypothesis h
Size of house, number of bedrooms, number of floors, age of home → h → Estimated price

Choice of h? With one variable we had $h_\theta(x) = \theta_0 + \theta_1 x$; with four features this becomes:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$
Model representation

Hypothesis: $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$

For convenience of notation, define $x_0^{(i)} = 1$. Then

$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

and the hypothesis becomes

$h_\theta(x^{(i)}) = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)} = \theta^T x^{(i)}$

This is multivariate linear regression.
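A tiny NumPy sketch of the vectorized hypothesis $h_\theta(x^{(i)}) = \theta^T x^{(i)}$ (my own illustration; the parameter values are made up, not fitted):

```python
import numpy as np

theta = np.array([80.0, 0.1, 5.0, 2.0, -1.0])  # [theta0, ..., theta4], illustrative
x_i = np.array([1.0, 2104, 5, 1, 45])          # x0 = 1 prepended to the features
print(theta @ x_i)                             # h_theta(x_i) = theta^T x_i
```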
Cost function

Idea: choose $\theta_0, \theta_1, \ldots, \theta_n$ so that $h_\theta(x)$ is close to $y$ for our training examples.

Hypothesis: $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

Goal: $\min_{\theta_0, \theta_1, \ldots, \theta_n} J(\theta_0, \theta_1, \ldots, \theta_n)$

To evaluate the hypothesis for all the samples at once, we use:

$h_\theta(x) = X\theta = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$
Analytical Solution

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

The vectorized expression of the linear regression cost function can be written as:

$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$, where

$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

As in the univariate case, we can compute $\theta$ as: $\theta = (X^T X)^{-1} X^T y$

What if $X^T X$ is non-invertible (singular/degenerate)? See the sketch below.
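To tie the pieces together, a minimal NumPy sketch (my own, not from the slides) that fits the four-feature house data with the normal equation and predicts the missing price. Note that with $m = 4$ examples and $n + 1 = 5$ parameters, $X^T X$ is in fact singular here, so the pseudo-inverse is used:

```python
import numpy as np

# Training data: size, bedrooms, floors, age -> price ($1000)
features = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852,  2, 1, 36],
], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

X = np.column_stack([np.ones(len(features)), features])  # prepend x0 = 1

# With m = 4 examples and n + 1 = 5 parameters, X^T X is singular, so
# (X^T X)^{-1} does not exist; the pseudo-inverse gives the minimum-norm
# least-squares solution instead.
theta = np.linalg.pinv(X) @ y

x_new = np.array([1, 1510, 3, 2, 30], dtype=float)       # the "?" row
print(theta @ x_new)                                     # predicted price in $1000
```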
