
Introduction to Machine Learning


Course 10 - Tree-Based Methods : Boosting Methods 2
4th year Statistics and Data Science

Ayoub Asri

16 April 2025


1 Gradient Boosting

2 Gradient Boosting ideas

3 Gradient Boosting : Regression Example

4 Gradient Boosting : Logic and Algorithm

5 Extreme Gradient Boosting (Xgboost)


Section 1

Gradient Boosting


Section 2

Gradient Boosting ideas


Gradient Boosting : idea

Gradient Boosting is very similar in spirit to AdaBoost: weak learners are created in series in order to produce a strong ensemble model.
Gradient Boosting makes use of the residual errors for learning.


Gradient Boosting vs AdaBoost

Larger trees are allowed in Gradient Boosting (AdaBoost typically uses stumps).
The learning rate coefficient is the same for all weak learners, whereas AdaBoost gives each learner its own weight.
The gradual, sequential learning is based on training each new model on the residuals of the previous one (see the sketch below).
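
As a minimal scikit-learn sketch of this contrast (the hyperparameter values are illustrative, and X, y stand for your feature matrix and target vector):

    from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

    # AdaBoost: shallow learners by default, each receiving its own weight.
    ada = AdaBoostRegressor(n_estimators=100)

    # Gradient Boosting: larger trees allowed, one shared learning rate,
    # each tree fit to the residuals of the current ensemble.
    gbr = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                    learning_rate=0.1)
    # ada.fit(X, y); gbr.fit(X, y)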


Section 3

Gradient Boosting : Regression Example


Gradient Boosting : Regression Example 1

We will start by presenting an example. We will use 3 features to predict the weight (the target variable).


Gradient Boosting : Regression Example 2

As a first step, we predict the weight variable with the same value for every observation: the average of the target variable.


Gradient Boosting : Regression Example 3

Then, we will calculate the residuals (real value − prediction)
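
As a small NumPy sketch of these first two steps (the weight values below are illustrative placeholders, not the table from the slides):

    import numpy as np

    weights = np.array([88.0, 76.0, 56.0])   # hypothetical target values
    baseline = weights.mean()                 # constant initial prediction
    residuals = weights - baseline            # real value - prediction
    print(baseline, residuals)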


Gradient Boosting : Regression Example 4

Now, we will use the 3 features to create a tree that predicts the residuals (the residuals, not the real values).


Gradient Boosting : Regression Example 5

Using the squared-error splitting criterion (regression trees on residuals do not use the Gini index, which applies to classification), we will get this tree.

PS. in this example we allow a maximum of 4 leaves.


Gradient Boosting : Regression Example 6

If a leaf contains more than one observation (and this will be common in larger datasets), we have to replace its value by the mean of the observations.
For this leaf we will have : (−14.2 − 15.2) / 2 = −14.7


Gradient Boosting : Regression Example 7

Similarly, for this second leaf : (1.8 + 5.8) / 2 = 3.8


Gradient Boosting : Regression Example 8

The predictions from this tree will then become :


Gradient Boosting : Regression Example 9

Now, we can combine the original leaf with the new tree to get :


Gradient Boosting : Regression Example 10

To make a new prediction of an individual’s weight from the training data, we will use this scheme.


Gradient Boosting : Regression Example 11

The prediction will then be :

71.2 + 16.8 = 88


Gradient Boosting : Regression Example 12

We observe that the prediction is exactly equal to the real value.
This happens because the model fits the training data very well (low bias and probably very high variance).


Gradient Boosting : Regression Example 13

To deal with this problem, Gradient Boosting introduces the notion of a learning rate to scale the contribution from the new tree.


Gradient Boosting : Regression Example 14

In this example, we will use a learning rate η = 0.1.
We will get a prediction of :

71.2 + 0.1 × 16.8 ≈ 72.9


Gradient Boosting : Regression Example 15

Empirical evidence shows that taking lots of small steps in the right direction results in better predictions with a Testing Dataset, i.e. lower variance.
— Jerome Friedman


Gradient Boosting : Regression Example 16

If we use what we have built so far to calculate the new predictions, we can observe that we have taken a step in the right direction.


Gradient Boosting : Regression Example 17

Let’s now build a new tree to predict the residuals.


Gradient Boosting : Regression Example 18

Here is the new tree.


Gradient Boosting : Regression Example 19

As before, we will replace the leaves with their averages.


Gradient Boosting : Regression Example 20

We will then combine everything we have built so far to obtain the predictions (the original average and the two trees).


Gradient Boosting : Regression Example 21

We will get a prediction of :

71.2 + 0.1 × 16.8 + 0.1 × 15.1 ≈ 74.4


Gradient Boosting : Regression Example 22

We can see that each time we add a tree to the prediction, the
residuals get smaller.


Gradient Boosting : Regression Example 23

Now we build another tree to predict the new residuals.


Gradient Boosting : Regression Example 24

And add it to the chain of trees that we have created so far.


Gradient Boosting : Regression Example 25

We keep making trees until we reach the maximum number specified, or until adding additional trees no longer significantly reduces the size of the residuals. Both stopping criteria are sketched below.
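
A minimal scikit-learn sketch of these two stopping criteria (the parameter values are illustrative):

    from sklearn.ensemble import GradientBoostingRegressor

    # n_estimators caps the number of trees; n_iter_no_change stops early
    # when the score on a held-out validation fraction stops improving.
    model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                      n_iter_no_change=5,
                                      validation_fraction=0.1, tol=1e-4)
    # model.fit(X, y)  # X, y: training features and target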


Section 4

Gradient Boosting : Logic and Algorithm


Gradient Boosting : Algorithm 1

Gradient Boosting algorithm

Input : Data \{(x_i, y_i)\}_{i=1}^{n} and a differentiable loss function L(y_i, F(x))

step 1 Initialize the model with a constant value :
F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)

step 2 for m = 1 to M :
(A) Compute r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x)=F_{m-1}(x)} for i = 1, ..., n
(B) Fit a regression tree to the r_{im} values and create terminal regions R_{jm}, for j = 1, ..., J_m
(C) For j = 1, ..., J_m compute \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)
(D) Update F_m(x) = F_{m-1}(x) + \eta \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})

step 3 Output F_M(x)
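
A minimal from-scratch sketch of this algorithm for the squared-error loss, using scikit-learn's DecisionTreeRegressor as the weak learner (an illustration under these assumptions, not the course's reference implementation):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_fit(X, y, M=100, eta=0.1, max_leaf_nodes=4):
        F0 = y.mean()             # step 1: argmin of squared loss is the mean
        Fm = np.full(len(y), F0)
        trees = []
        for m in range(M):                   # step 2
            r = y - Fm                       # (A) negative gradient = residuals
            tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
            tree.fit(X, r)                   # (B)+(C): leaf values are leaf means
            Fm = Fm + eta * tree.predict(X)  # (D)
            trees.append(tree)
        return F0, trees

    def gradient_boost_predict(X, F0, trees, eta=0.1):
        # step 3: F_M(x) = F_0 + eta * sum of all tree predictions
        return F0 + eta * sum(tree.predict(X) for tree in trees)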

Gradient Boosting : Regression Algorithm 1

To illustrate this algorithm we will use a very simple example.


Gradient Boosting : Regression Algorithm 2

Input

Input : Data \{(x_i, y_i)\}_{i=1}^{n} and a differentiable loss function L(y_i, F(x))

Since we already have the data, we define the loss function :

L(y_i, F(x)) = \frac{1}{2}(\text{Observed} - \text{Predicted})^2

And its derivative with respect to the prediction :

\frac{\partial L(y_i, F(x))}{\partial \text{Predicted}} = -(\text{Observed} - \text{Predicted}) = -\text{error}

Gradient Boosting : Regression Algorithm 3

step 1

step 1 Initialize the model with a constant value :
F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)

In this step we initialize the value of the predictions.
Using the derivative, we can show that the minimum of the loss function is attained at the mean of the observations. This means we will predict the same weight (the mean, 73.3) for all observations.
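
A one-line derivation for the squared-error loss defined above :

    \frac{d}{d\gamma} \sum_{i=1}^{n} \frac{1}{2}(y_i - \gamma)^2
      = -\sum_{i=1}^{n} (y_i - \gamma) = 0
    \quad\Longrightarrow\quad
    \gamma = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}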


Gradient Boosting : Regression Algorithm 4

step 2

step 2 for m = 1 to M

Step 2 is a loop whose length is the number of trees we decide to use.
Typically we use M = 100.
We start by setting m = 1.


Gradient Boosting : Regression Algorithm 5

step 2.A

(A) Compute r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x)=F_{m-1}(x)} for i = 1, ..., n

We start by calculating

-\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}

which is equal to the residual, since the loss function is quadratic :

\text{Residual} = (\text{Observed} - \text{Predicted})
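
Explicitly, plugging the squared-error loss into the definition of r_{im} :

    r_{im} = -\left[ \frac{\partial}{\partial F(x_i)}
               \frac{1}{2}\big(y_i - F(x_i)\big)^2 \right]_{F = F_{m-1}}
           = y_i - F_{m-1}(x_i)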


Gradient Boosting : Regression Algorithm 6

step 2.A

(A) Compute r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x)=F_{m-1}(x)} for i = 1, ..., n

We just have to plug in the predicted value from the previous step (F(x) = F_{m-1}(x)).
For every observation in the data set we will use :


Gradient Boosting : Regression Algorithm 7

step 2.B

(B) Fit a regression tree to the r_{im} values and create terminal regions R_{jm}, for j = 1, ..., J_m

We will use the features to predict the residuals r_{i,1} calculated in step (A).


Gradient Boosting : Regression Algorithm 8

step 2.B

(B) Fit a regression tree to the r_{im} values and create terminal regions R_{jm}, for j = 1, ..., J_m

Using the squared-error splitting criterion we obtain this tree (a stump in this case, because we have only 3 observations; it is larger in the general case).


Gradient Boosting : Regression Algorithm 9

step 2.C

(C) For j = 1, ..., J_m compute \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)

The leaves are the terminal regions.


Gradient Boosting : Regression Algorithm 10

step 2.C

(C) For j = 1, ..., J_m compute \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)

Typically, for larger datasets, every leaf R_{jm} contains many observations.
We have to determine the prediction value \gamma_{jm} for each leaf.
Since the loss function is quadratic, we can easily prove that

\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)

is the mean of the residuals in that leaf.



Gradient Boosting : Regression Algorithm 11

step 2.C

(C) For j = 1, ..., J_m compute \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)

In our case, we obtain :


Gradient Boosting : Regression Algorithm 12

step 2.D

(D) Update F_m(x) = F_{m-1}(x) + \eta \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})

Finally, we have to calculate the prediction :


Gradient Boosting : Regression Algorithm 13

step 2

step 2 for m = 1 to M

After finishing all the steps for m = 1, we repeat steps (A) to (D) for m = 2, and so on up to M.


Gradient Boosting : Classification Algorithm 1

In the case of a classification problem, Gradient Boosting works the same way.
We only have to change the nature of the target variable and the loss function.
Changing the loss function will have some implications (that we will explore without proof !)


Gradient Boosting : Classification Algorithm 2

To illustrate the difference, we will use this dataset to predict whether a person who attends a cinema likes the movie “Troll 2”.


Gradient Boosting : Classification Algorithm 3

Input

Input : Data \{(x_i, y_i)\}_{i=1}^{n} and a differentiable loss function L(y_i, F(x))

The loss function in the case of a classification problem is the cross entropy :

L(y_i, F(x)) = -\sum_{i=1}^{N} \big[ y_i \log(p) + (1 - y_i) \log(1 - p) \big]

where p is the predicted probability (of loving Troll 2); in our case p = 2/3 (2 persons out of 3 love the movie).


Gradient Boosting : Classification Algorithm 4

step 1

step 1 Initialize the model with a constant value :
F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)

We initialize the model by using :

F_0(x) = \arg\min_{\gamma} \sum_i L(y_i, \gamma) = \log(\text{odds}) = \log\left(\frac{p}{1-p}\right)

In our case : F_0(x) = \log\left(\frac{2/3}{1/3}\right) = \log 2 \approx 0.69
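
A quick numerical check of this initialization (a NumPy sketch) :

    import numpy as np

    p = 2 / 3                   # initial probability of loving the movie
    F0 = np.log(p / (1 - p))    # log odds
    print(F0)                   # 0.6931... ≈ 0.69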


Gradient Boosting : Classification Algorithm 5

step 2.A

(A) Compute r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x)=F_{m-1}(x)} for i = 1, ..., n

For m = 1, we calculate the residuals

r_{i,m} = \text{Observed} - \frac{e^{\log(\text{odds})}}{1 + e^{\log(\text{odds})}}

In our case : r_{i,1} = \text{Observed} - \frac{e^{\log 2}}{1 + e^{\log 2}} = \text{Observed} - \frac{2}{3}


Gradient Boosting : Classification Algorithm 6

step 2.B

(B) Fit a regression tree to the r_{im} values and create terminal regions R_{jm}, for j = 1, ..., J_m

Now, we fit a tree to the residuals and determine the terminal regions.


Gradient Boosting : Classification Algorithm 7

step 2.C

(C) For j = 1, ..., J_m compute \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)

For the case of classification, we have (summing over the observations in the leaf) :

\gamma = \frac{\sum \text{residuals}}{\sum p \times (1 - p)}


Gradient Boosting : Classification Algorithm 8

step 2.C

(C) For j = 1, ..., J_m compute \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)

So, for the first leaf : \gamma_{1,1} = \frac{0.33}{2/3 \times 1/3} = 3/2
and for the second : \gamma_{2,1} = \frac{0.33 - 0.67}{2/3 \times 1/3 + 2/3 \times 1/3} \approx -0.77
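
A small NumPy sketch of this leaf computation (the 0/1 labels and the leaf assignment follow the slides' example, whose tree is not reproduced here; exact fractions give -0.75, which the slides' rounded residuals turn into -0.77) :

    import numpy as np

    p = 2 / 3                               # previous predicted probability
    obs = np.array([1.0, 1.0, 0.0])         # observed labels
    residuals = obs - p                     # 0.33, 0.33, -0.67

    def leaf_gamma(res, prev_p):
        # gamma = sum(residuals) / sum(p * (1 - p)) over the leaf
        return res.sum() / (prev_p * (1 - prev_p)).sum()

    print(leaf_gamma(residuals[[0]], np.array([p])))        # 1.5
    print(leaf_gamma(residuals[[1, 2]], np.array([p, p])))  # -0.75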


Gradient Boosting : Classification Algorithm 9

step 2.D

(D) Update F_m(x) = F_{m-1}(x) + \eta \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})

Finally, we calculate the prediction for m = 1 :


Gradient Boosting : Classification Algorithm 10

step 3

step 3 Output FM (x)

We repeat these steps until we finish the loop over the M trees.


Section 5

Extreme Gradient Boosting (Xgboost)


Idea

Xgboost (Extreme Gradient Boosting) is a popular, optimized variant of Gradient Boosting.
The algorithm is basically the same, with some modifications to the loss function.
We will introduce this new loss function and its implications for the other steps of the algorithm.


Xgboost : Loss function

The loss function used in any Xgboost problem is :

\text{loss} = \sum_{i=1}^{n} L(y_i, p_i) + \frac{1}{2} \lambda O_{value}^2

or, written in terms of the previous prediction p_i^0 and the new tree's output O_{value} :

\text{loss} = \sum_{i=1}^{n} L(y_i, p_i^0 + O_{value}) + \frac{1}{2} \lambda O_{value}^2


Xgboost for regression 1

The first prediction value in Xgboost is always

p_i^0 = 0.5

For the case of regression models, the specific loss function used is :

L(y_i, p_i) = \frac{1}{2}(y_i - p_i)^2


Xgboost for regression 2

Using this loss function, after creating any tree with its different regions, the output value of a leaf will be :

O_{value} = \frac{\text{Sum of residuals}}{\text{number of residuals} + \lambda}
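
This follows by minimizing the regularized loss over one leaf (a short derivation, using the quadratic loss above, with residuals r_i = y_i - p_i^0 and n_leaf observations in the leaf) :

    \frac{d}{dO} \left[ \sum_{i \in \text{leaf}} \frac{1}{2}\big(y_i - (p_i^0 + O)\big)^2
      + \frac{1}{2} \lambda O^2 \right]
      = -\sum_{i \in \text{leaf}} r_i + (n_{\text{leaf}} + \lambda) O = 0
    \;\Longrightarrow\;
    O_{value} = \frac{\sum_{i \in \text{leaf}} r_i}{n_{\text{leaf}} + \lambda}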


Xgboost for regression 3

For each node we have to calculate the similarity score :

\text{similarity score} = \frac{(\text{Sum of residuals})^2}{\text{number of residuals} + \lambda}

To decide the best split we have to calculate the Gain for each candidate split :

\text{Gain} = \text{Left}_{\text{similarity}} + \text{Right}_{\text{similarity}} - \text{Root}_{\text{similarity}}
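
A minimal sketch of these quantities, plus the leaf output value from the previous slide (lambda_ stands for the regularization parameter λ; residuals is a NumPy array of the residuals in one node) :

    import numpy as np

    def similarity(residuals, lambda_=1.0):
        return residuals.sum() ** 2 / (len(residuals) + lambda_)

    def gain(left, right, lambda_=1.0):
        root = np.concatenate([left, right])
        return (similarity(left, lambda_) + similarity(right, lambda_)
                - similarity(root, lambda_))

    def output_value(residuals, lambda_=1.0):
        return residuals.sum() / (len(residuals) + lambda_)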


Xgboost for regression : example 1

To illustrate Xgboost we will use this data set of the effect of drug dosage on drug effectiveness.


Xgboost for regression : example 2

We put all the residuals (residual = real value − 0.5) in the same root node.


Xgboost for regression : example 3

We calculate the similarity score for the node.


Xgboost for regression : example 4

Now we try different values for the splitting criterion (dosage above or below some threshold).
We start with dosage < 15.


Xgboost for regression : example 5

We calculate the similarity score for each node, as well as the Gain. The Gain is equal to 120.33 in this case.


Xgboost for regression : example 6

We repeat this process for different values of dosage.


Xgboost for regression : example 7

We repeat this process for different values of dosage.


Xgboost for regression : example 8

We repeat this step for the right node to decide whether we can split again.


Xgboost for classification 1

In the case of classification, the loss function used is the binary cross entropy :

L(y_i, p_i) = -\big( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \big)


Xgboost for classification 2

Using this loss function gives us this similarity score :

\text{similarity score} = \frac{\left(\sum \text{residuals}\right)^2}{\sum \big(\text{previous probability} \times (1 - \text{previous probability})\big) + \lambda}


Xgboost for classification 3

The output value is then given by :

O_{value} = \frac{\sum \text{residuals}}{\sum \big(\text{previous probability} \times (1 - \text{previous probability})\big) + \lambda}
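
As a final sketch, the classification versions of these two formulas (residuals and prev_p are NumPy arrays over one node; lambda_ stands for λ) :

    import numpy as np

    def similarity_clf(residuals, prev_p, lambda_=1.0):
        return residuals.sum() ** 2 / ((prev_p * (1 - prev_p)).sum() + lambda_)

    def output_value_clf(residuals, prev_p, lambda_=1.0):
        return residuals.sum() / ((prev_p * (1 - prev_p)).sum() + lambda_)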
