Prediction methods & Machine learning
Session I
Pierre Michel
pierre.michel@univ-amu.fr
M2 EBDS
2021
1. Introduction
Machine Learning lies at the intersection of several fields:
• Statistics
• Computer Science
• Artificial Intelligence
Example of application: automatic diagnosis.
[Figures: two scatter plots in the (x1, x2) plane illustrating classification; a density estimate of a variable x and a scatter plot of y against x illustrating regression]
Other types of learning:
• Semi-supervised learning
• Reinforcement learning
This course is based on the Python programming language and its scientific libraries, in particular scikit-learn.
Install Anaconda

First of all, you have to choose a directory in which you are going to work; then open Anaconda Prompt and move to that directory (the path below is illustrative):

    cd path\to\your\working\directory

From the prompt, launch Jupyter Notebook (it can also be launched from Anaconda Navigator):

    jupyter notebook
2. Linear regression
Context
[Figure: house price (y) against size (x) for the learning sample; the fitted line yields a prediction of 406509.5 for a size of 2500]
Learning sample
Size (x)    Price (y)
1600        329900
2400        369000
1416        232000
3000        539900
1985        299900
1534        314900
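As a minimal sketch in Python (assuming NumPy; variable names are illustrative), this learning sample can be stored as two arrays, reused in the sketches below:

    import numpy as np

    # Learning sample: sizes x(i) and prices y(i), i = 1, ..., n
    x = np.array([1600, 2400, 1416, 3000, 1985, 1534], dtype=float)
    y = np.array([329900, 369000, 232000, 539900, 299900, 314900], dtype=float)
    n = len(x)  # number of observations (here n = 6)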
Notations: n denotes the number of observations and (x(i), y(i)) the i-th observation of the learning sample (here, the size and price of the i-th house).
Learning
From the learning sample and a learning algorithm, we look for a function h that expresses y (the variable to be explained) as a function of x. We represent it as a linear function:

hθ(x) = θ0 + θ1 x
Parameters
[Figure: three plots of the line hθ(x) for different values of the parameters (θ0, θ1)]
Idea
Choose θ0 and θ1 so that hθ(x(i)) is close to y(i) for the observations of the learning sample.
[Figure: scatter plot of the learning sample]
Optimization problem
J(θ0, θ1) = (1/2) Σ_{i=1}^{n} (hθ(x(i)) − y(i))²

We look for the parameters that minimize this cost:

min_{θ0, θ1} J(θ0, θ1)
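A direct Python translation of J (a sketch assuming the NumPy arrays x and y defined earlier; function names are illustrative):

    import numpy as np

    def h(theta0, theta1, x):
        """Model h_theta(x) = theta0 + theta1 * x (works on scalars and arrays)."""
        return theta0 + theta1 * x

    def cost(theta0, theta1, x, y):
        """Cost J(theta0, theta1) = 1/2 * sum_i (h_theta(x(i)) - y(i))^2."""
        return 0.5 * np.sum((h(theta0, theta1, x) - y) ** 2)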
[Figure: left, the data and the line hθ(x) against x; right, the cost J(θ1) as a function of θ1]
[Figure: the cost J(θ0, θ1) shown as surface and contour plots]
Gradient descent
θj := θj − α ∂J(θ0, θ1)/∂θj   (for j = 0 and j = 1)

α is the learning rate (or learning step); it controls the step length of the gradient descent.
Reminders
• Model: hθ(x) = θ0 + θ1 x
• Cost function: J(θ0, θ1) = (1/2) Σ_{i=1}^{n} (hθ(x(i)) − y(i))²
• Update rule: θj := θj − α ∂J(θ0, θ1)/∂θj   (for j = 0 and j = 1)

Computing the partial derivatives:

∂J(θ0, θ1)/∂θj = ∂/∂θj [ (1/2) Σ_{i=1}^{n} (θ0 + θ1 x(i) − y(i))² ]

• for j = 0: ∂J(θ0, θ1)/∂θ0 = Σ_{i=1}^{n} (hθ(x(i)) − y(i))
• for j = 1: ∂J(θ0, θ1)/∂θ1 = Σ_{i=1}^{n} (hθ(x(i)) − y(i)) x(i)
Batch version: repeat until convergence: {

θ0 := θ0 − α Σ_{i=1}^{n} (hθ(x(i)) − y(i))
θ1 := θ1 − α Σ_{i=1}^{n} (hθ(x(i)) − y(i)) x(i)

}
Note: each iteration of the algorithm uses all observations.
Stochastic version: repeat until convergence: for i = 1, ..., n: {

θ0 := θ0 − α (hθ(x(i)) − y(i))
θ1 := θ1 − α (hθ(x(i)) − y(i)) x(i)

}
Note: each iteration of the algorithm uses only one observation.
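Both variants in Python, as a sketch (assuming the arrays x and y above; the learning rate is illustrative, and must be very small here because the features are not scaled):

    import numpy as np

    def batch_gd(x, y, alpha=1e-8, n_iter=1000):
        """Batch gradient descent: each iteration uses all n observations."""
        theta0, theta1 = 0.0, 0.0
        for _ in range(n_iter):
            res = theta0 + theta1 * x - y              # h_theta(x(i)) - y(i)
            theta0, theta1 = (theta0 - alpha * res.sum(),
                              theta1 - alpha * (res * x).sum())  # simultaneous update
        return theta0, theta1

    def stochastic_gd(x, y, alpha=1e-8, n_iter=1000):
        """Stochastic variant: each update uses a single observation."""
        theta0, theta1 = 0.0, 0.0
        for _ in range(n_iter):
            for xi, yi in zip(x, y):
                res = theta0 + theta1 * xi - yi
                theta0, theta1 = (theta0 - alpha * res,
                                  theta1 - alpha * res * xi)
        return theta0, theta1

With scaled features (section 3.3), a much larger α can be used.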
Multivariate data (p ≥ 2)
• p: number of variables
• x(i): values for the i-th observation
• xj(i): value of the j-th variable for the i-th observation
Learning
hθ (x) = θ0 + θ1 x1 + θ2 x2 + ... + θp xp
With the convention x0 = 1, we write:

x = (x0, x1, x2, ..., xp)ᵀ ∈ R^(p+1) and θ = (θ0, θ1, θ2, ..., θp)ᵀ ∈ R^(p+1)
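With this convention, hθ(x) is a simple dot product; a minimal sketch (values are illustrative):

    import numpy as np

    theta = np.array([5000.0, 150.0, 2000.0])  # (theta_0, theta_1, theta_2), illustrative
    x = np.array([1.0, 1600.0, 3.0])           # (x_0 = 1, x_1, x_2)
    h = theta @ x                              # h_theta(x) = theta^T x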
Reminders
• Function of the model: hθ(x) = θᵀx = θ0 + Σ_{j=1}^{p} θj xj
• Parameters: θ ∈ R^(p+1)
• Cost function: J(θ) = (1/2) Σ_{i=1}^{n} (hθ(x(i)) − y(i))²
• Gradient descent (batch): we repeat until convergence:

θj := θj − α ∂J(θ)/∂θj   (for j = 0, 1, ..., p)
The algorithm (p ≥ 2)
Repeat until convergence:
θj := θj − α Σ_{i=1}^{n} (hθ(x(i)) − y(i)) xj(i)   (simultaneously for j = 0, ..., p)

Written out component by component:

θ0 := θ0 − α Σ_{i=1}^{n} (hθ(x(i)) − y(i)) x0(i)
θ1 := θ1 − α Σ_{i=1}^{n} (hθ(x(i)) − y(i)) x1(i)
θ2 := θ2 − α Σ_{i=1}^{n} (hθ(x(i)) − y(i)) x2(i)
...
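In vectorized form, one iteration is two matrix operations. A sketch (assuming NumPy; X is the n × (p+1) design matrix with first column equal to 1; the value of α is illustrative and assumes scaled features):

    import numpy as np

    def batch_gd_multi(X, y, alpha=0.01, n_iter=1000):
        """Batch gradient descent for h_theta(x) = theta^T x.

        X: (n, p+1) design matrix with X[:, 0] == 1; y: (n,) target vector.
        """
        theta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            res = X @ theta - y                  # residuals h_theta(x(i)) - y(i)
            theta = theta - alpha * (X.T @ res)  # simultaneous update of all theta_j
        return theta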
3. Multivariate linear regression
3.3. Feature scaling
Motivation
1 https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
2 https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Min-max scaling
xj^minmax = (xj − min(xj)) / (max(xj) − min(xj))

Standardization

xj^std = (xj − µj) / σj

where µj is the empirical mean of the j-th variable and σj its empirical standard deviation.
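Both transformations are implemented in scikit-learn (MinMaxScaler and StandardScaler, footnotes 1 and 2); a minimal sketch with illustrative data:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Illustrative data: two variables on very different scales
    X = np.array([[1600.0, 3.0], [2400.0, 3.0], [1416.0, 2.0], [3000.0, 4.0]])

    X_minmax = MinMaxScaler().fit_transform(X)  # each column rescaled to [0, 1]
    X_std = StandardScaler().fit_transform(X)   # each column centered and reduced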
θj := θj − α ∂J(θ)/∂θj   (for j = 0, 1, ..., p)
3 https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor.score
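Stochastic gradient descent for linear regression is available in scikit-learn as SGDRegressor (footnote 3); a sketch combining it with standardization (the data and hyperparameter values are illustrative):

    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Illustrative data: two explanatory variables and the prices
    X = np.array([[1600.0, 3.0], [2400.0, 3.0], [1416.0, 2.0],
                  [3000.0, 4.0], [1985.0, 4.0], [1534.0, 3.0]])
    y = np.array([329900.0, 369000.0, 232000.0, 539900.0, 299900.0, 314900.0])

    model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000))
    model.fit(X, y)
    print(model.score(X, y))  # R^2 coefficient on the learning sample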
3. Multivariate linear regression
3.4. The learning rate α
Diagnosis
4 https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor.score
3. Multivariate linear regression
3.5. Polynomial regression
Introduction
5 https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor.score
Polynomial regression
hθ (x) = θ0 + θ1 x + θ2 x2 + ... + θd xd
where d ∈ N is the degree of the polynomial.
Examples:
• d = 1 (linear): hθ (x) = θ0 + θ1 x
• d = 2 (quadratic): hθ (x) = θ0 + θ1 x + θ2 x2
• d = 3 (cubic): hθ (x) = θ0 + θ1 x + θ2 x2 + θ3 x3
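Polynomial regression is a linear regression on expanded features; a sketch using scikit-learn's PolynomialFeatures and LinearRegression with the learning sample from section 2 (d = 2):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    x = np.array([1600.0, 2400.0, 1416.0, 3000.0, 1985.0, 1534.0]).reshape(-1, 1)
    y = np.array([329900.0, 369000.0, 232000.0, 539900.0, 299900.0, 314900.0])

    # Expand x into (x, x^2), then fit an ordinary linear regression on it
    X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
    model = LinearRegression().fit(X_poly, y)  # estimates theta_0, theta_1, theta_2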
Example (Portland)
[Figure: Price against Size for the Portland data]
3. Multivariate linear regression
3.6. Normal equation
Minimizing J(θ)
6 https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
7 http://cs229.stanford.edu/notes2020fall/notes2020fall/cs229-notes1.pdf
The value of θ minimizing J(θ) is given in closed form by the normal equation:

θ = (XᵀX)⁻¹ Xᵀ y

where θ ∈ R^(p+1), X ∈ R^(n×(p+1)) is the matrix of observations (one row per observation, with a first column of ones for x0), and y ∈ Rⁿ.
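A direct NumPy sketch, solving the linear system rather than forming the inverse explicitly (scikit-learn's LinearRegression, footnote 6, provides the same least-squares fit):

    import numpy as np

    x = np.array([1600.0, 2400.0, 1416.0, 3000.0, 1985.0, 1534.0])
    y = np.array([329900.0, 369000.0, 232000.0, 539900.0, 299900.0, 314900.0])

    X = np.column_stack([np.ones(len(x)), x])  # first column = 1 (for theta_0)
    theta = np.linalg.solve(X.T @ X, X.T @ y)  # solves (X^T X) theta = X^T y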
Conclusion
• Gradient descent
  – sensitive to the choice of the hyperparameter α
  – iterative method
  – works even for very large p
• Normal equation
  – no hyperparameters needed
  – no iterations
  – requires computing (XᵀX)⁻¹
  – slow for very large p (complexity O(p³))