XGBoost
Dimensionality Reduction
This happens because the model is trying too hard to capture the noise in the training dataset.
By noise we mean data points that do not represent the true properties of your data,
but rather random chance.
Learning such data points makes models more flexible, at the risk of overfitting.
This minimum gain can be set to any value in the range (0, ∞).
The commonly used regularisation techniques are:
L1 regularisation
L2 regularisation
Dropout regularisation
Lasso Regression adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function (L).
Ridge Regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function (L).
NOTE that during regularisation the output function (y_hat) does not change. The change is only in the loss function.
Regularisation
The loss function before regularisation: prediction cost
The loss function after regularisation: prediction cost + regularization cost
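As a sketch in formulas (the symbols ℓ for the per-sample prediction cost, w for the model coefficients, and λ for the regularisation strength are notation assumptions, not taken from the slides):

L = \sum_i \ell(y_i, \hat{y}_i)                                  (prediction cost only)
L = \sum_i \ell(y_i, \hat{y}_i) + \lambda \sum_j |w_j|           (prediction cost + L1 / Lasso penalty)
L = \sum_i \ell(y_i, \hat{y}_i) + \lambda \sum_j w_j^2           (prediction cost + L2 / Ridge penalty)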
Output: y ∈ R^(768 × 1)
Whether he / she has diabetes (0 or 1).
Code
Around 30 lines (including some pre-processing).
It can be used as a basis when working with other models (such as deep learning, which you will learn later).
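A minimal sketch of what such code could look like, assuming the Pima Indians diabetes data is stored in a CSV file named diabetes.csv with the outcome in the last column (the file name and column layout are assumptions here):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load the data (file name and column layout are assumed).
data = pd.read_csv("diabetes.csv")
X = data.iloc[:, :-1].values   # clinical input features
y = data.iloc[:, -1].values    # 0 = no diabetes, 1 = diabetes

# Simple pre-processing: hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit an XGBoost classifier with default hyperparameters.
model = XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)

# Evaluate on the held-out samples.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))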
XGBoost: What Is It?
XGBoost stands for eXtreme Gradient Boosting.
There are, however, differences in the modeling details. Specifically, XGBoost uses a
more regularized model formalization to control over-fitting, which gives it better
performance.
Parallelization of tree construction using all of your CPU cores during training.
Distributed Computing for training very large models using a cluster of machines.
Out-of-Core Computing for very large datasets that don’t fit into memory.
A design goal was to make the best use of available resources to train the model.
Continued Training so that you can further boost an already fitted model on
new data.
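As a small illustration of the continued-training feature, here is a sketch using the native xgboost API (the data and the round counts are placeholders):

import numpy as np
import xgboost as xgb

# Placeholder data just to make the sketch self-contained.
X, y = np.random.rand(200, 5), np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "reg:squarederror", "max_depth": 3}

# First round of boosting.
booster = xgb.train(params, dtrain, num_boost_round=50)

# Later: continue boosting the already fitted model (e.g. on new data)
# by passing it in via xgb_model.
booster = xgb.train(params, dtrain, num_boost_round=50, xgb_model=booster)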
Let us look at an example comparing an untuned XGBoost model with a tuned
XGBoost model based on their RMSE scores (a sketch follows below).
Later, you will learn about the description of the XGBoost hyperparameters.
It is available at:
http://xgboost.readthedocs.io/en/latest/parameter.html#general-parameters
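A sketch of such an untuned-versus-tuned comparison (the dataset, the tuned parameter values, and the resulting scores are illustrative assumptions, not results from these slides):

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Untuned model: library defaults.
untuned = XGBRegressor(random_state=0)
untuned.fit(X_train, y_train)
rmse_untuned = np.sqrt(mean_squared_error(y_test, untuned.predict(X_test)))

# "Tuned" model: example hyperparameter values, not the slides' actual settings.
tuned = XGBRegressor(
    n_estimators=500, learning_rate=0.05, max_depth=4,
    subsample=0.8, colsample_bytree=0.8, random_state=0,
)
tuned.fit(X_train, y_train)
rmse_tuned = np.sqrt(mean_squared_error(y_test, tuned.predict(X_test)))

print(f"RMSE untuned: {rmse_untuned:.3f}")
print(f"RMSE tuned:   {rmse_tuned:.3f}")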
Increasing this value (max_depth, the maximum depth of a tree) will make the model more complex and more likely to
overfit.
Default = 6.
This comes into play every time a new level of depth is reached in a tree.
The trees learn how to optimise for the target variable using a different set of features.
So, if you have enough data you can try tuning colsample parameters!
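A minimal sketch of how these hyperparameters can be set on an XGBoost model (the specific values are illustrative, not recommendations):

from xgboost import XGBClassifier

model = XGBClassifier(
    max_depth=6,            # deeper trees = more complex model, more likely to overfit (default 6)
    gamma=0.1,              # minimum gain required to make a further split; range [0, inf)
    colsample_bytree=0.8,   # fraction of features sampled for each tree
    colsample_bylevel=0.8,  # fraction of features sampled at every new depth level in a tree
)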
The increasing value of eta gives a better model.
What Makes XGBoost so Popular?
Speed and performance.
Core algorithm is parallelizable.
Consistently outperforms single-algorithm methods.
State-of-the-art performance in many ML tasks.
Out-of-Core computing (large datasets that do not fit in memory).
Computer vision.
When the number of training samples is significantly smaller than the number
of features.
[Figure: feature values grouped into histogram bins (33 bins shown); k = number of bins per feature]
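A short sketch, assuming this refers to XGBoost's histogram-based split finding, where max_bin controls how many bins (k) each continuous feature is bucketed into (the value below is illustrative):

from xgboost import XGBClassifier

# Histogram-based tree construction: feature values are bucketed into
# at most max_bin bins before candidate splits are evaluated.
model = XGBClassifier(tree_method="hist", max_bin=33)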
Curse of Dimensionality
The curse of dimensionality is the phenomenon
whereby an increase in the dimensionality of a data
set results in exponentially more data being required
to produce a representative sample of that data set.
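As a small illustration of this exponential growth (covering the unit cube with 10 cells per axis is an arbitrary choice for the example):

# Cells needed to cover [0, 1]^d at a resolution of 10 cells per axis:
# the requirement grows as 10^d with the dimension d.
for d in (1, 2, 3, 10):
    print(f"d = {d:2d}: {10 ** d:,} cells")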
You might ask the question, “How do you take all of the variables you have
collected and focus on only a few of them?”
In technical terms, you want to “reduce the dimension of your feature space.”
Ways:
Feature Elimination
Feature Extraction
But — and here’s the kicker — because these new independent variables are
combinations of our old ones,
it is still possible to keep the most valuable parts of our old variables,
even when one or more of these “new” variables are dropped!
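A minimal feature-extraction sketch using PCA (the choice of PCA and of keeping 2 components are assumptions for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples with 5 original features, one of them redundant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)

# Each new variable (principal component) is a combination of the old ones.
pca = PCA(n_components=2)
X_new = pca.fit_transform(X)

print("Shape after extraction:", X_new.shape)
print("Share of variance kept:", pca.explained_variance_ratio_.sum())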