Machine Learning: Linear Regression Model
Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
What is the difference between AI, ML, and DL?
• Artificial Intelligence (AI) tries to make computers intelligent in order to mimic the cognitive functions of humans. AI is a general field with a broad scope, including:
  • Computer vision,
  • Language processing,
  • Creativity…
• Machine Learning (ML) is the branch of AI that covers the statistical side of artificial intelligence. It teaches the computer to solve problems by looking at hundreds or thousands of examples, learning from them, and then using that experience to solve the same problem in new situations:
  • Regression,
  • Classification,
  • Clustering…
• Deep Learning (DL) is a special field of Machine Learning in which computers can learn and make intelligent decisions on their own:
  • CNN,
  • RNN…
Types of Machine Learning

Classical Machine Learning
What is Regression?
Regression is supervised learning: the target is provided, and it is a continuous variable.

Size (feet²)  Bedrooms  Floors  Age (years)  Price ($1000)
2104          5         1       45           460
1416          3         2       40           232
1534          3         2       30           315
852           2         1       36           178
1510          3         2       30           ?

Regression is the process of predicting a continuous value.
Types of Regression
• Simple Regression (one variable)
  • Simple Linear Regression — e.g., predict Price ($1000) vs. Size (feet²) of all houses
  • Simple Non-Linear Regression
• Multiple Regression (2+ variables)
  • Multiple Linear Regression — e.g., predict Price ($1000) vs. Size (feet²) and number of bedrooms
  • Multiple Non-Linear Regression
Applications of Regression
• Employment income: predicted from hours of work, education, occupation, sex, age, years of experience, and so on.
Examples of Regression algorithms
• Ordinal regression
• Poisson regression
• Fast forest quantile regression
• Linear, polynomial, Lasso, Stepwise, Ridge regression
• Bayesian linear regression
• Neural network regression
• Decision forest regression
• KNN
• Boosted decision tree regression
Simple Linear Regression: Model representation
Simple Linear Regression
• Predict Price ($1000) vs. Size (feet²) of all houses
• Independent variable (x): size of house
• Dependent variable (y): price of house

Size in feet² (x)   Price ($1000) (y)
2104                460
1416                232
1534                315
852                 178
1245                ?

Notation:
m = number of training examples
x = "input" variable / features
y = "output" variable / "target" variable
Model representation

Training Set → Learning Algorithm → hypothesis $h$
The hypothesis $h$ takes the size of a house as input and outputs the estimated price.

Choice of $h$: $h_\theta(x) = \theta_0 + \theta_1 x$

Linear regression with one variable (univariate linear regression).
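The hypothesis above is just a straight line. A minimal sketch in Python (the parameter values are hypothetical, chosen only for illustration):

```python
def h(theta0, theta1, x):
    """Univariate hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameters: a base price of 50 ($1000) plus 0.2 per square foot.
print(h(50.0, 0.2, 1416))  # estimated price ($1000) of a 1416 ft^2 house
```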
Cost function

Training set: the (size, price) table above.

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
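The cost can be evaluated directly on the four training examples. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0, 852.0])  # size in ft^2
y = np.array([460.0, 232.0, 315.0, 178.0])     # price in $1000

def cost(theta0, theta1):
    """J(theta0, theta1) = (1 / 2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(x)
    residuals = (theta0 + theta1 * x) - y      # h_theta(x) - y, vectorized
    return np.sum(residuals ** 2) / (2 * m)

# Parameters closer to the data yield a lower cost:
print(cost(0.0, 0.0), cost(0.0, 0.2))
```

Trying a few (θ0, θ1) pairs by hand like this is exactly the search that minimizing J automates.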
Analytical Solution

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

The vectorized expression of the linear regression cost function is:

$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$, where

$X = \begin{bmatrix} 1 & x^{(1)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

Since $\frac{1}{2m}$ is a constant factor that does not affect the minimizer, we omit it. The cost function becomes:

$J(\theta) = (X\theta - y)^T (X\theta - y)$

Expanding, and using $\left( (X\theta)^T y \right)^T = y^T (X\theta)$:

$J(\theta) = (X\theta)^T X\theta - 2 y^T X\theta + y^T y$
Analytical Solution

Furthermore, we can write it as: $J(\theta) = \theta^T X^T X \theta - 2 y^T X \theta + y^T y$

Now we take the derivative of the cost function. For convenience, the common matrix derivative formulas are listed for reference:

$\frac{\partial AX}{\partial X} = A, \quad \frac{\partial X^T A}{\partial X} = A, \quad \frac{\partial X^T X}{\partial X} = 2X, \quad \frac{\partial X^T A X}{\partial X} = AX + A^T X$

Using the above formulas, the derivative of the cost function with respect to $\theta$ is:

$\frac{\partial J(\theta)}{\partial \theta} = 2 X^T X \theta - 2 X^T y$

To solve for the parameters, we set this derivative equal to zero:

$2 X^T X \theta - 2 X^T y = 0 \quad \Rightarrow \quad X^T X \theta = X^T y$

Thus we can compute $\theta$ as: $\theta = (X^T X)^{-1} X^T y$
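The closed-form solution can be checked numerically. A minimal sketch, assuming NumPy, using the (size, price) training set from the earlier slide:

```python
import numpy as np

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])   # ft^2
prices = np.array([460.0, 232.0, 315.0, 178.0])     # $1000

# Design matrix: a column of ones (the intercept term for theta0) plus the feature.
X = np.column_stack([np.ones_like(sizes), sizes])

# Normal equation theta = (X^T X)^{-1} X^T y; np.linalg.solve is preferred
# over forming the inverse explicitly, for numerical stability.
theta = np.linalg.solve(X.T @ X, X.T @ prices)

print(theta)                          # [theta0, theta1]
print(theta[0] + theta[1] * 1245.0)   # predicted price for the 1245 ft^2 house
```

The solved theta satisfies the condition derived above: the gradient $2X^T X\theta - 2X^T y$ vanishes at the minimizer.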
Multiple Linear Regression: Model representation
Model representation

Size (feet²)  Bedrooms  Floors  Age (years)  Price ($1000)
2104          5         1       45           460
1416          3         2       40           232
1534          3         2       30           315
852           2         1       36           178
1510          3         2       30           ?

Notation:
m = number of training examples
n = number of features (variables)
$x^{(i)}$ = input of the $i^{th}$ training example
$x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example
Model representation

One variable: $h_\theta(x) = \theta_0 + \theta_1 x$

Four features: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

The hypothesis $h$ now takes the size of house, number of bedrooms, number of floors, and age of home as input, and outputs the estimated price.
Model representation

Hypothesis: $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)}$

For convenience of notation, define $x_0^{(i)} = 1$. Then

$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

$h_\theta(x^{(i)}) = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \theta_3 x_3^{(i)} + \theta_4 x_4^{(i)} = \theta^T x^{(i)}$

Multivariate linear regression.
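With $x_0^{(i)} = 1$ prepended, the hypothesis collapses to a single dot product. A minimal sketch with hypothetical parameter values:

```python
import numpy as np

theta = np.array([80.0, 0.1, 5.0, 3.0, -2.0])  # hypothetical [theta0, ..., theta4]

# One example from the table: size=2104, bedrooms=5, floors=1, age=45,
# with x0 = 1 prepended so that theta0 is picked up by the dot product.
x_i = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])

print(theta @ x_i)  # h_theta(x^(i)) = theta^T x^(i)
```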
Cost function

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1, \ldots, \theta_n} J(\theta_0, \theta_1, \ldots, \theta_n)$

To compute the hypothesis for all the samples at once, we use the following equation:

$h_\theta(x) = X\theta = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$
Analytical Solution

Cost function: $J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

The vectorized expression of the linear regression cost function is:

$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$

where

$X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$
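The same normal-equation idea applies with the full design matrix. A minimal sketch using the four-house table, then predicting the unlabeled house. Note one wrinkle with this tiny dataset: with only m = 4 examples and n + 1 = 5 parameters, $X^T X$ is singular, so the sketch uses the pseudo-inverse (the minimum-norm least-squares solution) instead of a plain inverse:

```python
import numpy as np

# Features per row: size (ft^2), bedrooms, floors, age (years).
features = np.array([
    [2104.0, 5.0, 1.0, 45.0],
    [1416.0, 3.0, 2.0, 40.0],
    [1534.0, 3.0, 2.0, 30.0],
    [ 852.0, 2.0, 1.0, 36.0],
])
y = np.array([460.0, 232.0, 315.0, 178.0])  # price in $1000

# Design matrix with the x0 = 1 column: shape (m, n+1) = (4, 5).
X = np.hstack([np.ones((len(y), 1)), features])

# (X^T X)^{-1} does not exist here, so use the pseudo-inverse:
theta = np.linalg.pinv(X) @ y

new_house = np.array([1.0, 1510.0, 3.0, 2.0, 30.0])
print(new_house @ theta)  # predicted price ($1000) for the unlabeled house
```

With m ≥ n + 1 independent examples, the plain formula $\theta = (X^T X)^{-1} X^T y$ (via `np.linalg.solve`) would apply directly.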