● XGBoost at a glance
● Flashback to:
    ● Boosting
    ● Ensemble Learning
        ● Types of Ensemble learning
        ● Working of Boosting Algorithm
    ● CART (Classification and Regression Trees)
    ● Gradient Boosting
        ● Gradient Boosting Process flow
● XGBoost in action
    ● Algorithmic enhancements
    ● System Enhancements
    ● Flexibility in XGBoost
    ● Cross Validation (with code)
    ● Model Tuning
        ● Grid Search (with code)
        ● Random Search (with code)
    ● Extendibility
        ● Classification (with code)
        ● Regression (with code)
            ● Tree as Base Learner (with code)
            ● Linear Base Learner (with code)
    ● Plot importance
● References
Every day we hear about the breakthroughs in
Artificial Intelligence. However, have you wondered
what challenges it faces?
Challenges arise in highly unstructured data such as DNA sequencing, credit
card transactions and even cybersecurity, the backbone of keeping our online
presence safe from fraudsters. Does this make you want to know more about the
science and reasoning behind these systems? Do not worry, we've got you covered.
In the cyber era, Machine Learning (ML) has provided solutions to these problems
through the implementation of Gradient Boosting Machines (GBM). We have plenty
of algorithms to choose from when gradient boosting our training data, yet we
still encounter issues such as poor accuracy, high loss and large variance in
the results. Here, we introduce you to a state-of-the-art machine learning
algorithm, XGBoost, built by Tianqi Chen, which not only overcomes these issues
but also performs exceptionally well on regression and classification problems.
This blog will help you discover the insights, techniques, and skills with
XGBoost that you can then bring to your machine learning projects.
XGBoost at a glance!
eXtreme Gradient Boosting (XGBoost) is a scalable, improved version of the
gradient boosting algorithm (terminology alert) designed for efficacy,
computational speed and model performance. It is an open-source library and a
part of the Distributed Machine Learning Community. XGBoost is a blend of
software and hardware optimization techniques designed to push existing
boosting methods to higher accuracy in the shortest amount of time.
Here's a quick look at an objective benchmark comparison of XGBoost with
other gradient boosting algorithms, trained on random forests with 500 trees,
performed by Szilard Pafka.
Benchmark Performance of XGBoost (source)
Too complex to comprehend? We get it; let's first understand the rudiments
of the machine learning methods that lead up to XGBoost and why its
performance matters.
The 3rd rule will classify it as a bot, but this will be a false prediction as a
person can know and tweet in multiple languages. Consequently, based on a
single rule, our prediction can be flawed. Since these individual rules are not
strong enough to make an accurate prediction, they are called weak
learners. Technically, a weak learner is a classifier whose predictions are
only slightly correlated with the true outcome, i.e. only marginally better
than random guessing. Therefore, to make our predictions more accurate, we
devise a model that combines the predictions of many weak learners into a
strong learner, and this is done using the technique of boosting.
Weak rules are generated at each iteration by base learning algorithms
which in our case can be of two types:
● Tree as base learner
● Linear base learner
Generally, decision trees are default base learners for boosting.
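In XGBoost this choice is exposed through the booster parameter; here is a quick, illustrative sketch (no training involved, just the two options discussed above):
import xgboost as xgb
# Tree as base learner (the default)
tree_model = xgb.XGBRegressor(booster="gbtree")
# Linear base learner
linear_model = xgb.XGBRegressor(booster="gblinear")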
Interested in knowing more about these learners? Rest assured, we will demonstrate
how to use both base learners to train XGBoost models in a later section.
First, let’s get to know about the ensemble learning technique mentioned
above.
Ensemble Learning:
Ensemble learning is a process in which decisions from multiple machine
learning models are combined to reduce errors and improve predictions compared
to a single ML model. A maximum-voting technique is then applied to the
aggregated decisions (predictions, in machine learning jargon) to deduce the
final prediction. Puzzled?
Think of it as planning efficient routes to your work, college, or the grocery
store. Since you can take several routes to reach your destination, you learn
the traffic and the delays each one causes at different times of the day, which
lets you devise a near-perfect route. Ensemble learning works the same way!
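As a minimal sketch of the maximum-voting idea (using scikit-learn's VotingClassifier purely for illustration; the choice of models here is ours, not something this post depends on):
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
# Three different models vote; the label predicted by the majority wins
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(max_depth=3)),
                ("rf", RandomForestClassifier(n_estimators=50))],
    voting="hard")  # hard voting = maximum voting on the predicted labels
# ensemble.fit(X_train, y_train); ensemble.predict(X_test)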
Though both of these techniques (bagging and boosting) have some fascinating
math under the cover, we do not need to know it to pick them up as tools. Our
focus will be on boosting, given its relevance to XGBoost.
F(i) = F(i-1) + f(i), where F(i) is the current model, F(i-1) is the previous model, and f(i) represents the weak model added at the i-th iteration.
Internal working of the boosting algorithm
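To make the additive update above concrete, here is a minimal gradient-boosting sketch for squared-error regression. The weak learner (a shallow DecisionTreeRegressor), the number of rounds, and the learning rate are all illustrative choices, not anything prescribed by this post:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, learning_rate=0.1):
    # F(0): start from a constant prediction (the mean of y)
    prediction = np.full(len(y), np.mean(y))
    weak_models = []
    for i in range(n_rounds):
        residuals = y - prediction  # negative gradient of squared error
        f_i = DecisionTreeRegressor(max_depth=3).fit(X, residuals)  # weak model f(i)
        prediction = prediction + learning_rate * f_i.predict(X)    # F(i) = F(i-1) + f(i)
        weak_models.append(f_i)
    return np.mean(y), weak_models
# Predictions for new data: the base value plus the (scaled) sum of all weak models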
Feeling up to speed? Take a look at two more algorithms (CART and Gradient
Boosting) to understand the mechanics of XGBoost before we delve deeper
into the topic.
XGBoost in action
What makes XGBoost a go-to algorithm for winning Machine Learning and
Kaggle competitions?
XGBoost Features
Isn't it interesting that a single tool can handle all our boosting problems?
Here are the features in detail and how they are incorporated into XGBoost
to make it robust.
Algorithm Enhancements:
1. Tree Pruning — Pruning is a machine learning technique that reduces
the size of regression trees by removing nodes that contribute little to
the final prediction; the goal is to prevent overfitting of the training
data. A widely used method is Cost Complexity (or Weakest Link) Pruning,
which internally uses mean squared error, k-fold cross-validation and a
learning rate. XGBoost grows nodes (also called splits) up to the
specified max_depth and then prunes the tree backwards, removing splits
whose loss reduction falls below a threshold (the gamma parameter).
Consider a split with a loss of -3 followed by a node with a loss of +7:
XGBoost will not remove the split just because one of the losses is
negative. It computes the total loss (-3 + 7 = +4), and since the result
is positive, it keeps both.
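Both knobs are ordinary parameters; a rough illustration (the values are placeholders, not recommendations):
import xgboost as xgb
# max_depth caps how deep each tree grows; gamma is the minimum loss reduction
# a split must achieve to survive the backward pruning pass
model = xgb.XGBRegressor(max_depth=6, gamma=1.0, objective="reg:squarederror")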
System Enhancements:
1. Parallelization — Tree learning needs data in sorted order. To cut
down the cost of sorting, the data is stored in compressed column blocks
(each column sorted by its feature values). XGBoost sorts each block in
parallel, using all available CPU cores/threads. This optimization is
valuable because a large number of nodes is created frequently while
growing a tree. In short, XGBoost parallelizes the otherwise sequential
process of generating trees.
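The degree of parallelism can also be set explicitly; a small illustration (the thread count of 4 is arbitrary):
import xgboost as xgb
# n_jobs (scikit-learn wrapper) / nthread (learning API params) controls how many
# CPU threads XGBoost uses when sorting blocks and evaluating candidate splits
clf = xgb.XGBClassifier(n_estimators=100, n_jobs=4)
params = {"objective": "binary:logistic", "nthread": 4}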
Cross Validation:
XGBoost lets you run cross-validation at each boosting round through its built-in xgb.cv() routine.
# Cross validation with xgb.cv() on the Boston housing data
import xgboost as xgb
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
# xgb.cv() works on DMatrix objects
boston_dmatrix = xgb.DMatrix(data=X, label=y)
# Ridge (L2, lambda) regularization strengths to compare
reg_params = [1, 10, 100]
params = {"objective": "reg:squarederror", "max_depth": 3}
# Empty list for storing RMSEs as a function of ridge regression complexity
rmses = []
for reg in reg_params:
    params["lambda"] = reg
    cv_results = xgb.cv(params=params, dtrain=boston_dmatrix, nfold=3,
                        num_boost_round=5, metrics="rmse", as_pandas=True, seed=123)
    rmses.append(cv_results["test-rmse-mean"].tail(1).values[0])
Extendibility:
1. Classification using XGBoost
# XGBoost classification
import numpy as np
from numpy import loadtxt
import xgboost as xgb
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data (assuming the last column of dataset.csv holds the binary label)
df = loadtxt('./dataset.csv', delimiter=",")
X, y = df[:, :-1], df[:, -1]
# Train/test split (illustrative 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
# XGB Classifier
xg_cl = xgb.XGBClassifier(objective='binary:logistic', n_estimators=100, seed=123)
eval_set = [(X_train, y_train), (X_test, y_test)]
xg_cl.fit(X_train, y_train, eval_set=eval_set, verbose=False)
# Making predictions and computing accuracy
preds = xg_cl.predict(X_test)
print("Accuracy: %.2f%%" % (accuracy_score(y_test, preds) * 100.0))
2. Regression using XGBoost
Tree as base learner:
# XGBoost regression with the default tree base learner
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import explained_variance_score
# KC House Data
df = pd.read_csv('./kc_house_data.csv')
df_train = df[['bedrooms', 'bathrooms', 'sqft_living', 'floors', 'waterfront',
               'view', 'grade', 'lat', 'yr_built', 'sqft_living15']]
X = df_train.values
y = df.price.values
# Train/test split (illustrative 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
# Fitting the XGB regressor; the default base learner is a decision tree (booster='gbtree')
xgb_reg = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=75, subsample=0.75,
                           max_depth=7)
xgb_reg.fit(X_train, y_train)
# Making predictions
predictions = xgb_reg.predict(X_test)
# Variance score (explained_variance_score expects y_true first)
print(explained_variance_score(y_test, predictions))
# Plot importance: visualize per-feature importance scores of the tree-based regressor trained above
xgb.plot_importance(xgb_reg)
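Linear base learner:
To use a linear model as the base learner, we switch to XGBoost's learning API and set booster='gblinear'. Below is a minimal setup sketch on the Boston housing data (the split ratio, random_state and num_boost_round values are illustrative assumptions); it produces the xg_reg model and boston_test DMatrix used in the prediction step that follows.
# Linear base learner (booster='gblinear') on the Boston housing data
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_boston   # note: removed in recent scikit-learn releases
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
# The learning API consumes DMatrix objects
boston_train = xgb.DMatrix(data=X_train, label=y_train)
boston_test = xgb.DMatrix(data=X_test, label=y_test)
params = {"booster": "gblinear", "objective": "reg:squarederror"}
xg_reg = xgb.train(params=params, dtrain=boston_train, num_boost_round=10)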
# Making predictions
predictions = xg_reg.predict(boston_test)
# Computing RMSE
print("RMSE: %f" % (np.sqrt(mean_squared_error(y_test, predictions))))
RMSE using XGBoost regression with a linear base learner