Professional Documents
Culture Documents
Regression 2021 Partie2 Anglais Annote PDF
Regression 2021 Partie2 Anglais Annote PDF
Simon Malinowski
2 Non-linear regression
I 20 variables : 1 million
I 30 variables : 1 billion
2 How to compare models with different number of variables ?
The most used stopping criterion is : stop as soon as one iteration does
not improve the perforance (according to C ).
It is a bit strict, sometimes better models can be found by waiting a little
bit more
Other stopping criteria :
stop when no better model has been found since δ iterations
don’t stop (go until the end) and keep the best model found
9
8
7
6
5
4
5 10 15 20
Nombre de variables
Initialization :
Vs = [] : selected variables
Vnu = [X1 , . . . , Xp ] : variables not (yet) used
Cf = ∞ : performance of the best model found
stop = F : variable to handle the stopping criterion
WHILE stop = F
∀x ∈ Vnu , Compute the performance of the models with variables [Vs , x]
Let xb be the best x (above), and Cb its performance
IF Cb < Cf ,
Vs = [Vs , xb ]; Cf = Cb ; Vnu = Vnu \ {xb }
ELSE stop = T
END WHILE
1 Split the dataset into a training set and a test set (with about 20%
for the test set
2 Apply the variable selection algorithm using the training set
→ if the generalization error is the performance criterion, you will
need to split the training set again (into training and validation)
3 Estimate the generalization error of the selected model on the test set.
2 Non-linear regression
Polynomial regression
100
100
80
80
80
60
60
60
y
y
40
40
40
20
20
20
0
0
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
x x x