
ML course

ESP-UCAD

Dr. Mamadou Camara


mamadou.camara@ucad.edu.sn

2021-2022
Table of contents

1 Supervised Learning: Multiple linear regression
1.1 10-fold CV
1.2 Practical work Multiple linear regression: calculation of β
1.3 Practical work Cross-validation and multiple linear regression

Chapter 1

Supervised Learning
Multiple linear regression

1.1 10-fold CV
Description of the example
— The model to build has 7 candidate predictors (x1, x2, x3, x4, x5, x6 and x7) and a response
variable y.
— The sample contains two additional columns: s.id, which numbers the cases sequentially within the
sample, and p.id, which numbers them sequentially within the population from which the sample was drawn.
— The sample contains 100 observations (cases) drawn randomly from a defined population of 1,000,000
individuals.

— One R package that supports cross-validation is "DAAG" (Data Analysis and Graphics; Maindonald
and Braun, 2011).

— The "DAAG" package contains three functions for k-fold cross-validation:
1. the ‘cv.lm’ function, used for simple linear regression models,
2. the ‘CVlm’ function, used for multiple regression models,
3. and the ‘CVbinary’ function, used for logistic regression models.
— Below is a commonly accepted application of 10-fold cross-validation (Harrell, 1998).
— Each fold_i is the test sample used at iteration i of the cross-validation. The observations
in fold_i are selected by random draw without replacement.
— Predicted: the value predicted by the model fitted on all observations.
— cvpred: the cross-validation prediction, i.e. the value predicted by the model fitted at the iteration in which the observation belongs to the test fold.

— The last line of the output gives the overall mean square of the cross-validation residuals
(Overall ms): the sum of squared cross-validation errors over all folds divided by the number of observations.
— It is the appropriate measure of the average prediction error over all folds.
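The procedure described above can be sketched in a few lines. The course itself works in R with DAAG; the plain-Python version below is only an illustration of the arithmetic, not the DAAG implementation. The data are synthetic stand-ins (100 cases, 7 predictors), and every function name (solve, fit_ols, assign_folds, cross_validate) is a hypothetical helper introduced for this example.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return solve(XtX, Xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def assign_folds(n, k, rng):
    """Give each case a fold label 1..k by a random draw without replacement."""
    labels = [i % k + 1 for i in range(n)]
    rng.shuffle(labels)
    return labels

def cross_validate(X, y, k, rng):
    folds = assign_folds(len(y), k, rng)
    cvpred = [None] * len(y)
    for i in range(1, k + 1):
        train = [j for j, f in enumerate(folds) if f != i]
        test = [j for j, f in enumerate(folds) if f == i]
        # The model of iteration i never sees the cases of fold i.
        beta = fit_ols([X[j] for j in train], [y[j] for j in train])
        for j in test:
            cvpred[j] = predict(beta, X[j])
    # Overall mean square: squared CV errors summed over folds, divided by n.
    overall_ms = sum((p - t) ** 2 for p, t in zip(cvpred, y)) / len(y)
    return folds, cvpred, overall_ms

# Synthetic data (assumption): 100 cases, 7 predictors, noise variance 1.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(100)]
true_beta = [2.0, 1.0, -1.0, 0.5, 0.0, 0.0, 3.0, -2.0]
y = [predict(true_beta, row) + rng.gauss(0, 1) for row in X]

folds, cvpred, overall_ms = cross_validate(X, y, 10, rng)
print("Overall ms:", round(overall_ms, 3))
```

Because each cvpred value comes from a model that did not use the corresponding case, the overall mean square estimates the error on new data, whereas residuals from the model fitted on all observations tend to be optimistic.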

1.2 Practical work Multiple linear regression: calculation of β

Use two methods to find the coefficient vector β:
1. Using R's lm function.
2. Applying the least-squares formula β = (XᵀX)⁻¹Xᵀy with R commands, where X is the predictor matrix with a leading column of ones.
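As an illustration of the second method, the sketch below applies the usual least-squares formula β = (XᵀX)⁻¹Xᵀy step by step. It is written in plain Python rather than R, only to make each matrix operation visible. The five data points are made up so that y = 1 + 2·x1 + 3·x2 holds exactly, and transpose, matmul and solve are hypothetical helpers. The system XᵀX β = Xᵀy is solved directly instead of forming the inverse, which gives the same β.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Made-up data (assumption): y is an exact linear function of x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

# Design matrix: a column of ones for the intercept, then the predictors.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
Xt = transpose(X)
XtX = matmul(Xt, X)                                      # X'X, 3 x 3
Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]   # X'y, length 3

beta = solve(XtX, Xty)  # solves (X'X) beta = X'y
print([round(b, 6) for b in beta])
```

Since the data contain no noise, the recovered coefficients match the intercept 1 and the slopes 2 and 3 up to floating-point error. The same check works with real data: the result should agree with the coefficients reported by the fitted regression model.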

To display the coefficients of a model obtained with lm, just type the name of the variable containing the
model.

1.3 Practical work Cross-validation and multiple linear regression


Below we reproduce by hand part of the cross-validation process carried out by the functions of the
DAAG package. Carry out the following actions:

1. Train the multiple regression model by cross-validation.
2. Split the full dataset into training and test parts using the fold assignment made by CVlm,
for example at iteration 10.
3. Find the model of iteration 10 using the lm function.
4. Use the test part to evaluate the resulting model. At least 1 case of this sample must be predicted
manually (i.e. without the predict function; cf. line 26 of the code).
5. Check that the predictions obtained are identical to those found by CVlm (see note 1).
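The steps above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the course's R code: the data are synthetic (100 cases, 2 predictors), the fold labels are drawn here rather than taken from CVlm, and solve, fit_ols and predict are hypothetical helpers. The helper predict plays the role of the library prediction that the manual calculation is checked against.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return solve(XtX, Xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Synthetic data (assumption): 100 cases, 2 predictors.
rng = random.Random(1)
n, k = 100, 10
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [1.0 + 2.0 * a - 1.0 * b + rng.gauss(0, 0.5) for a, b in X]

# Step 2: fold labels 1..k, then the split for iteration 10.
folds = [j % k + 1 for j in range(n)]
rng.shuffle(folds)
iteration = 10
train = [j for j in range(n) if folds[j] != iteration]
test = [j for j in range(n) if folds[j] == iteration]

# Step 3: fit the model of iteration 10 on the training part only.
beta = fit_ols([X[j] for j in train], [y[j] for j in train])

# Step 4: predict one test case by hand from the coefficients.
case = test[0]
manual = beta[0] + beta[1] * X[case][0] + beta[2] * X[case][1]

# Step 5: compare with the helper's predictions for the whole test part.
helper = [predict(beta, X[j]) for j in test]
print("manual:", manual, "helper:", helper[0])
```

The manual value and the helper value agree up to floating-point rounding, because both evaluate the same fitted equation on the same case.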

The figure below shows the CVlm output for iteration 10.

The figure below shows the results of the predict function applied to the lm output for iteration
10.

1. In the CVlm output, cvpred designates the prediction of the model obtained at iteration i.
