
LightGBM - An In-Depth Guide [Python]

coderzcolumn.com/tutorials/machine-learning/lightgbm-an-in-depth-guide-python


LightGBM - Gradient Boosted Decision Trees


LightGBM is a framework that provides an implementation of gradient boosted decision trees. It was created by a team of researchers and developers at Microsoft. LightGBM is known for its fast training speed, good accuracy with default parameters, support for parallel and GPU learning, low memory footprint, and ability to handle large datasets that might not fit in memory. LightGBM provides APIs for C, Python, and R, and even provides a CLI that lets us use the library from the command line. LightGBM estimators provide a large set of hyperparameters to tune the model, along with many already-implemented optimization/loss functions and evaluation metrics. As a part of this tutorial, we'll be covering the Python API of lightgbm and try to explain the majority of it. The main aim of this tutorial is to make readers aware of the majority of functionalities available through lightgbm and get them started with the framework. There are other libraries (xgboost, catboost, scikit-learn) that also provide an implementation of gradient boosted decision trees.
Please feel free to check the references section to know more about them.

We'll start by importing the necessary libraries.

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import warnings

warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 50)

import lightgbm as lgb


import sklearn

print("LightGBM Version : ", lgb.__version__)


print("Scikit-Learn Version : ", sklearn.__version__)

LightGBM Version : 3.1.0


Scikit-Learn Version : 0.21.2

Load Datasets
We'll be using the three datasets mentioned below, all available from sklearn, as a part of this tutorial for explanation purposes.

Boston Housing Dataset: It's a regression dataset which has information about various attributes of houses in Boston and
their prices in dollars. It'll be used for explaining regression tasks.
Breast Cancer Dataset: It's a classification dataset that has information about two different types of tumors. It'll be used for explaining
binary classification tasks.
Wine Dataset: It's a classification dataset that has information about ingredients used in three different types of wines. It'll be used for
explaining multi-class classification tasks.

We have loaded all three datasets one by one below. We have printed the description of each dataset, which gives us an overview of its
features and size. We have also loaded each dataset as a pandas dataframe and displayed the first few samples of data.

Boston Housing Dataset

from sklearn.datasets import load_boston

boston = load_boston()

for line in boston.DESCR.split("\n")[5:29]:
    print(line)

boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_df["Price"] = boston.target

boston_df.head()

**Data Set Characteristics:**

:Number of Instances: 506

:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

:Attribute Information (in order):


- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000's

:Missing Attribute Values: None

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price

0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0

1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6

2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7

3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4

4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2

Breast Cancer Dataset

from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

for line in breast_cancer.DESCR.split("\n")[5:31]:
    print(line)

breast_cancer_df = pd.DataFrame(data=breast_cancer.data, columns=breast_cancer.feature_names)
breast_cancer_df["TumorType"] = breast_cancer.target

breast_cancer_df.head()

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.

- class:
- WDBC-Malignant
- WDBC-Benign

mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  texture error  ...

0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 1.0950 0.9053 ...

1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 0.5435 0.7339 ...

2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 0.7456 0.7869 ...

3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 0.4956 1.1560 ...

4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 0.7572 0.7813 ...

Wine Dataset

from sklearn.datasets import load_wine

wine = load_wine()

for line in wine.DESCR.split("\n")[5:29]:
    print(line)

wine_df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
wine_df["WineType"] = wine.target

wine_df.head()

**Data Set Characteristics:**

:Number of Instances: 178 (50 in each of three classes)


:Number of Attributes: 13 numeric, predictive attributes and the class
:Attribute Information:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline

- class:
- class_0
- class_1
- class_2

alcohol  malic_acid  ash  alcalinity_of_ash  magnesium  total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  ...

0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29

1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28

2 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81

3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18

4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82

train()
The simplest way to create an estimator in lightgbm is by using the train() method. It takes estimator parameters as a dictionary and a
training dataset as input. It then trains the estimator and returns an object of type Booster, which is a trained estimator that can be used to make
future predictions.

Below are some of the important parameters of the train() method.

params - This parameter accepts a dictionary specifying the parameters of the gradient boosted decision trees algorithm. To get started, we just need to provide
an objective function based on the type of problem (classification/regression). We'll later explain a commonly used list
of parameters that can be passed in this dictionary.
train_set - This parameter accepts a lightgbm Dataset object which holds feature values and target values. It's an
internal data structure designed by lightgbm to wrap data.
num_boost_round - It specifies the number of boosting trees that will be used in the ensemble. The group of gradient boosted trees is
called an ensemble, and it's what we generally refer to as an estimator. The default value is 100.
valid_sets - It accepts a list of Dataset objects to be used as validation sets. These validation sets will be evaluated after each training
round.
valid_names - It accepts a list of strings of the same length as valid_sets specifying a name for each validation set. These
names will be used when printing evaluation metrics for these datasets as well as when plotting them.
categorical_feature - It accepts a list of strings/ints or the string auto . If we give a list of strings/ints then those columns from the
dataset will be treated as categorical columns.
verbose_eval - It accepts a bool or an int as value. If we set the value to False or 0 then it won't print the metric evaluation results calculated
on the validation sets that we passed. If we pass True then it'll print results for each round. If we pass an integer greater than 1 then it'll print
results every that many rounds.

Dataset

Dataset is a lightgbm internal data structure for holding data and labels. Below are the important parameters of the class, followed by a short illustrative sketch.

data - It accepts a numpy array, pandas dataframe, scipy sparse matrix, list of numpy arrays, or h2o datatable's Frame as input holding
feature values.
label - It accepts a numpy array, pandas series, or single-column pandas dataframe specifying target values. We can even set this parameter
to None if we don't have target values. The default is None.
feature_name - It accepts a list of strings specifying feature names.
categorical_feature - It has the same meaning as in the train() method parameter described above. We can handle
categorical features here or in that method.
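
Below is a minimal sketch of how these parameters fit together. The dataframe df, its "town" column, and its "price" target column are hypothetical names used only for illustration.

# A minimal sketch, assuming a hypothetical dataframe `df` with a
# categorical column named "town" and a target column named "price".
feature_cols = [col for col in df.columns if col != "price"]

dataset = lgb.Dataset(data=df[feature_cols],
                      label=df["price"],
                      feature_name=feature_cols,
                      categorical_feature=["town"])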

Regression
The first problem that we'll solve using lightgbm is a simple regression problem using the Boston housing dataset which we loaded earlier. We
have divided the dataset into train/test sets and created a Dataset instance out of each of them. We have then called the lightgbm.train() method,
giving it the train and validation sets. We have set the number of boosting rounds to 10, hence it'll create 10 boosted trees to solve the problem. After
training completes, it returns an instance of type Booster which we can later use to make predictions on new data. As we have given
a validation set as input, it'll print the validation l2 score after each iteration of training. Please make a note that by default lightgbm
minimizes the l2 loss for regression problems.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression"},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000076 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 975
[LightGBM] [Info] Number of data points in the train set: 379, number of used features: 13
[LightGBM] [Info] Start training from score 22.590501
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] valid_0's l2: 63.038
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2] valid_0's l2: 54.5739
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3] valid_0's l2: 47.6902
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4] valid_0's l2: 41.6301
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5] valid_0's l2: 36.776
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[6] valid_0's l2: 32.8883
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[7] valid_0's l2: 29.8897
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[8] valid_0's l2: 27.244
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[9] valid_0's l2: 24.9776
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[10] valid_0's l2: 22.8617

Below we have made predictions on train and test data using a trained booster. We have then calculated R2 metrics for both using the sklearn
metric method. Please make a note that the predict() method accepts numpy array, pandas dataframe, scipy sparse matrix, or h2o data
table’s frame as input for making predictions.

If you are interested in learning the list of available metrics in scikit-learn then please feel free to check our tutorial on the same.

Scikit-Learn : Model Evaluation and Scoring Metrics

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Test R2 Score : 0.67


Train R2 Score : 0.74

The predict() method has a few important parameters which can be used to make different kinds of predictions.

raw_score - It's a boolean parameter which, if set to True, will return raw predictions. For regression problems this won't make any
difference, but for classification problems it'll return raw function values (margins) instead of probabilities (see the short sketch after this list).
pred_leaf - This parameter accepts boolean values which, if set to True, will return the index of the leaf predicted in each tree for a
particular sample. The size of the output will be n_samples x n_trees .
pred_contrib - It returns an array of feature contributions for each sample. It'll return an array of size (n_features + 1) for each
sample of data, where the last value is the expected value and the first n_features values are the contributions of the features in making that
prediction. We can add the contributions of all features to the expected value to get the actual prediction. These are commonly
referred to as SHAP values.
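
As a quick, hedged illustration of raw_score: for the binary objective, raw predictions are log-odds, and applying a sigmoid recovers the probabilities that predict() returns by default. The booster binary_booster and features X below are hypothetical placeholders for a binary classification setup like the one in the later sections.

# A minimal sketch; `binary_booster` is a hypothetical Booster trained with
# {"objective": "binary"} and `X` holds the matching feature values.
raw_preds = binary_booster.predict(X, raw_score=True)    # raw margins (log-odds)
probs = 1.0 / (1.0 + np.exp(-raw_preds))                 # sigmoid maps margins to probabilities
# `probs` should match binary_booster.predict(X)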

If you are interested in learning about SHAP values, then check our tutorial on the awesome SHAP package, which lets us visualize these SHAP values
in different ways to understand the performance of the model.

SHAP - Explain Machine Learning Model Predictions using Game-Theoretic Approach

idxs = booster.predict(X_test, pred_leaf=True)

print("Shape : ", idxs.shape)

idxs

Shape : (127, 10)

array([[ 2, 2, 2, ..., 4, 4, 4],


[ 9, 12, 6, ..., 5, 8, 7],
[ 9, 6, 6, ..., 5, 10, 7],
...,
[ 2, 2, 2, ..., 4, 4, 4],
[13, 10, 12, ..., 13, 14, 10],
[11, 0, 8, ..., 8, 9, 13]], dtype=int32)

shap_vals = booster.predict(X_test, pred_contrib=True)

print("Shape : ", shap_vals.shape)

print("\nShap Values of 0th Sample : ", shap_vals[0])


print("\nPrediction of 0th using SHAP Values : ", shap_vals[0].sum())
print("Actual Prediction of 0th Sample : ", test_preds[0])

Shape : (127, 14)

Shap Values of 0th Sample : [ 2.83275837e-01 0.00000000e+00 1.18896249e-01 0.00000000e+00


7.28958665e-02 5.53802603e+00 -2.43603336e-02 7.18686350e-02
-2.33464487e-03 4.29395596e-02 7.31633672e-02 -1.87498941e-02
3.34226697e+00 2.24936676e+01]

Prediction of 0th using SHAP Values : 31.99155522517617


Actual Prediction of 0th Sample : 31.991555225176175

We can call the num_trees() method on the booster instance to get the number of trees in the ensemble. Please make a note that if we don't
stop training early then the number of trees will be the same as num_boost_round . But if we stop training early then the number of trees
will be different from num_boost_round . We have explained later in this tutorial how we can stop training if the ensemble's performance is
not improving when evaluated on the validation set.

booster.num_trees()

10

The booster instance has another important method named feature_importance() which returns the importance of features based on
either the gain or the split counts of the trees. We pair the importance values with feature names in a short sketch after the outputs below.

booster.feature_importance(importance_type="gain")

array([ 3814.74202061, 0. , 207.29499817, 0. ,


2729.91098022, 38139.31585693, 891.23509979, 529.9323864 ,
256.73030472, 198.98090363, 657.979702 , 150.48840141,
76114.31529617])

booster.feature_importance(importance_type="split")

array([20, 0, 1, 0, 9, 33, 8, 6, 2, 4, 7, 5, 40], dtype=int32)
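
To make these arrays easier to read, we can pair the importance values with feature names. Below is a small sketch using the booster's feature_name() method; the sorting and print format are just one way of presenting the result.

# Pair gain-based importances with feature names and print them sorted.
importances = booster.feature_importance(importance_type="gain")

for name, gain in sorted(zip(booster.feature_name(), importances), key=lambda pair: -pair[1]):
    print("%-10s : %12.2f" % (name, gain))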

Binary Classification

In this section, we have explained how we can use the train() method to create a booster for a binary classification problem. We are training
the model on the breast cancer dataset and later evaluating its accuracy using a metric from sklearn. We have set the objective to binary to
inform the train() method that we'll be giving it data for a binary classification problem. We have also set the verbosity parameter value
to -1 in order to suppress training messages. It'll still print validation set evaluation results, which can be turned off by setting the
verbose_eval parameter to False.

Please make a note that for classification problems the predict() method of the booster returns probabilities. We have included logic to convert
probabilities to the target class.

LightGBM evaluates the binary log loss function by default on the validation set for binary classification problems. We can add a metric
parameter to the dictionary given to the train() method with any metric name available in lightgbm and it'll evaluate that
metric instead (see the short sketch below). We'll later explain the list of metrics available in lightgbm.
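
As a quick, hedged sketch of that, the metric names simply go into the params dictionary. This assumes train_dataset and test_dataset are lightgbm Dataset objects like the ones created in the example that follows.

# A minimal sketch; `train_dataset`/`test_dataset` are Dataset objects as below.
params = {"objective": "binary",
          "metric": ["auc", "binary_logloss"],   # evaluate both metrics on the validation set
          "verbosity": -1}

booster = lgb.train(params, train_set=train_dataset,
                    valid_sets=(test_dataset,), num_boost_round=10)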

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=breast_cancer.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=breast_cancer.feature_names.tolist())

booster = lgb.train({"objective": "binary", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]


train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))


print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))

Train/Test Sizes : (426, 30) (143, 30) (426,) (143,)


[1] valid_0's binary_logloss: 0.593312
[2] valid_0's binary_logloss: 0.532185
[3] valid_0's binary_logloss: 0.484191
[4] valid_0's binary_logloss: 0.442367
[5] valid_0's binary_logloss: 0.406814
[6] valid_0's binary_logloss: 0.373153
[7] valid_0's binary_logloss: 0.344765
[8] valid_0's binary_logloss: 0.320929
[9] valid_0's binary_logloss: 0.296162
[10] valid_0's binary_logloss: 0.278894

Test Accuracy Score : 0.96


Train Accuracy Score : 0.98

MultiClass Classification
As a part of this section, we have explained how we can use the train() method for multi-class classification problems. We are using it on
the wine dataset which has three different types of wine as the target variable. We have set the objective function to multiclass . We need to
provide the num_class parameter with an integer specifying the number of classes whenever we are using the method for multi-class
classification problems.

The predict() method returns the probabilities of each class in the case of multi-class problems. We have included logic to select the class with
maximum probability as the prediction.

LightGBM evaluates the multi-class log loss function by default on the validation set for multi-class classification problems.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=wine.feature_names)
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=wine.feature_names)

booster = lgb.train({"objective": "multiclass", "num_class": 3, "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = np.argmax(test_preds, axis=1)


train_preds = np.argmax(train_preds, axis=1)

print("\nTest Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))


print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))

Train/Test Sizes : (133, 13) (45, 13) (133,) (45,)


[1] valid_0's multi_logloss: 1.00307
[2] valid_0's multi_logloss: 0.88303
[3] valid_0's multi_logloss: 0.787628
[4] valid_0's multi_logloss: 0.712626
[5] valid_0's multi_logloss: 0.645891
[6] valid_0's multi_logloss: 0.588107
[7] valid_0's multi_logloss: 0.540376
[8] valid_0's multi_logloss: 0.501455
[9] valid_0's multi_logloss: 0.458277
[10] valid_0's multi_logloss: 0.420889

Test Accuracy Score : 0.91


Train Accuracy Score : 0.99

List of Important Parameters of LightGBM Estimators


We'll now list important parameters of lightgbm which can be provided in a dictionary when calling the train() method. We can
provide the same parameters to the estimators (LGBMModel, LGBMRegressor, and LGBMClassifier) that are readily available in lightgbm, with the
only difference that we don't need to provide them as a dictionary but can pass them directly when creating an instance. We'll be
introducing those estimators from the next section onwards. A short sketch combining a few of these parameters follows the list.

objective - This parameter lets us define an objective function to use for the task. The default value of this parameter is regression .
Below is a list of commonly used values for this parameter.
regression
regression_l1
tweedie
binary
multiclass
multiclassova
cross_entropy
Other available objective functions

metric - This parameter accepts metrics to be evaluated on evaluation datasets if evaluation datasets are provided as
eval_set/validation_sets parameter value. We can provide more than one metric and all will be evaluated on validation sets. Below
is a list of the commonly used values of metrics.
rmse
l2
l1
tweedie
binary_logloss
multi_logloss
auc
cross_entropy
Other available metrics
boosting - This parameter accepts one of the below-mentioned strings specifying which algorithm to use.
gbdt - Default. Gradient Boosting Decision Tree
rf - Random Forest
dart - Dropouts meet Multiple Additive Regression Trees
goss - Gradient-based One-Side Sampling
num_iterations - This parameter is an alias of num_boost_round which lets us specify the number of trees in the ensemble.
The default is 100 .
learning_rate - This parameter accepts the learning rate to use for the training process. The default is 0.1 .
num_class - If we are working with multi-class classification problems then we need to provide the number of classes to this parameter.
num_leaves - This parameter accepts an integer specifying the maximum number of leaves allowed per tree. The default is 31 .
num_threads - It accepts an integer specifying the number of threads to use for training. We can set it to the number of cores of the
system.
seed - This lets us specify the seed for training, which lets us reproduce the same results.
max_depth - This parameter lets us specify the maximum depth allowed for trees in the ensemble. The default is -1 , which lets trees
grow as deep as possible. We can restrict this behavior by setting this parameter.
min_data_in_leaf - This parameter accepts an integer value specifying the minimum number of data points that can be kept in one leaf.
This parameter can be used to control overfitting. The default value is 20 .
bagging_fraction - This parameter accepts a float value between 0-1 specifying the fraction of data to randomly select when
training. This parameter can help prevent overfitting. The default is 1.0 .
feature_fraction - This parameter accepts a float value between 0-1 that informs the algorithm to select that fraction of features from
the total for training at each iteration. The default is 1.0 , hence selecting all features.
extra_trees - This parameter accepts boolean values specifying whether to use extremely randomized trees or not.
early_stopping_round - This parameter accepts an integer specifying that training should stop if the evaluation metric on the
last evaluation set has not improved for that many rounds.
monotone_constraints - This parameter lets us specify whether our model should enforce an increasing, decreasing, or no relationship of an
individual feature with the target value. We have explained the usage of this parameter in the section named monotonic constraints.
monotone_constraints_method - This parameter accepts one of the below-mentioned strings specifying the type of monotonic
constraints to impose.
basic - A basic monotone constraints method which can over-constrain the model.
intermediate - A slightly more advanced constraints method which is a little less constraining than the basic method but can take a
little more time.
advanced - An advanced constraints method that is less constraining than the basic and intermediate methods but can take more
time.
interaction_constraints - This parameter accepts a list of lists where each inner list specifies feature indices which are allowed to
interact with one another. We have explained feature interaction in detail in the section on feature interaction constraints.
verbosity - This parameter accepts an integer value controlling the logging messages printed when training.
<0 - Only fatal errors are displayed.
0 - Error/Warning messages are also displayed.
1 - Info messages are also displayed.
>1 - Debug messages are also displayed.
is_unbalance - This is a boolean parameter that should be set to True if data is imbalanced. It can be used with binary and multi-
class classification problems.
device_type - It accepts one of the below strings specifying the device type to use for training.
cpu
gpu
force_col_wise - This parameter accepts a boolean value specifying whether to force column-wise histogram building when training. If
data has too many columns then setting this parameter to True will improve training speed by reducing memory usage.
force_row_wise - This parameter accepts a boolean value specifying whether to force row-wise histogram building when training. If data
has too many rows then setting this parameter to True will improve training speed by reducing memory usage.
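
To tie a few of these parameters together, below is a minimal, hedged sketch of a params dictionary for a binary classification task. The values are illustrative only, not recommendations.

# Illustrative values only; tune them for your own dataset.
params = {
    "objective": "binary",
    "boosting": "gbdt",
    "learning_rate": 0.05,
    "num_leaves": 31,
    "max_depth": -1,
    "min_data_in_leaf": 20,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,        # bagging_fraction only takes effect when bagging_freq > 0
    "metric": ["auc", "binary_logloss"],
    "seed": 123,
    "verbosity": -1,
}

# booster = lgb.train(params, train_set=train_dataset, num_boost_round=100)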

Please make a note that this is not the full list of parameters available with lightgbm, but only a list of a few important ones. If you are
interested in learning about all parameters then please feel free to check the below link.

LightGBM Full Parameters List

LGBMModel
LGBMModel class is a wrapper around the Booster class that provides a scikit-learn-like API for training and prediction in lightgbm. It lets us create
an estimator object with a list of parameters as input. We can then call the fit() method with training data for training and the predict()
method for making predictions. The parameters which we gave as a dictionary to the params parameter of train() can now be
given directly to the constructor of LGBMModel to create a model. LGBMModel lets us perform both classification and regression tasks by specifying the
objective of the task.

Regression
Below we have explained with a simple example how we can use LGBMModel to perform a regression task with the Boston housing data. We
have first created an instance of LGBMModel with the objective set to regression and the number of trees set to 10. The n_estimators parameter is
an alias of the num_boost_round parameter of the train() method.

We have then called the fit() method to train the model, giving it the train data. Please make a note that it accepts numpy arrays as input
and not the lightgbm Dataset object. We have also given a dataset to be used as an evaluation set and metrics to be evaluated on the evaluation
dataset. The parameters of the fit() method are almost the same as those of the train() method.

At last, we have called the predict() method to make predictions.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric="rmse")

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[1] valid_0's rmse: 8.50151 valid_0's l2: 72.2756
[2] valid_0's rmse: 7.82463 valid_0's l2: 61.2248
[3] valid_0's rmse: 7.22264 valid_0's l2: 52.1665
[4] valid_0's rmse: 6.72909 valid_0's l2: 45.2806
[5] valid_0's rmse: 6.29399 valid_0's l2: 39.6144
[6] valid_0's rmse: 5.90399 valid_0's l2: 34.8571
[7] valid_0's rmse: 5.58942 valid_0's l2: 31.2417
[8] valid_0's rmse: 5.3252 valid_0's l2: 28.3577
[9] valid_0's rmse: 5.07205 valid_0's l2: 25.7257
[10] valid_0's rmse: 4.82126 valid_0's l2: 23.2445

Test R2 Score : 0.72


Train R2 Score : 0.74

Binary Classification

Below we have explained with a simple example how we can use LGBMModel for a classification task. We have trained a model on the breast
cancer dataset. Please make a note that the predict() method returns probabilities. We have included logic to calculate the class from
the probabilities.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]


train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))


print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))

Train/Test Sizes : (426, 30) (143, 30) (426,) (143,)


[1] valid_0's binary_logloss: 0.569994
[2] valid_0's binary_logloss: 0.511938
[3] valid_0's binary_logloss: 0.463662
[4] valid_0's binary_logloss: 0.423662
[5] valid_0's binary_logloss: 0.391412
[6] valid_0's binary_logloss: 0.361046
[7] valid_0's binary_logloss: 0.332719
[8] valid_0's binary_logloss: 0.311722
[9] valid_0's binary_logloss: 0.292474
[10] valid_0's binary_logloss: 0.270656

Test Accuracy Score : 0.92


Train Accuracy Score : 0.97

LGBMRegressor
LGBMRegressor is another wrapper estimator around the Booster class provided by lightgbm which has the same API as sklearn
estimators. As its name suggests, it's designed for regression tasks. LGBMRegressor is almost the same as LGBMModel, with the only
difference that it's designed solely for regression tasks. Below we have explained the usage of LGBMRegressor with a simple example using the
Boston housing dataset. Please make a note that LGBMRegressor provides a score() method which computes the R2 score for us, which
until now we evaluated using the sklearn metrics module.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMRegressor(objective="regression_l2", n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric=["rmse", "l2", "l1"])

print("\nTest R2 Score : %.2f"%booster.score(X_train, Y_train))


print("Train R2 Score : %.2f"%booster.score(X_test, Y_test))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[1] valid_0's rmse: 8.32795 valid_0's l2: 69.3548 valid_0's l1: 6.29438
[2] valid_0's rmse: 7.7053 valid_0's l2: 59.3716 valid_0's l1: 5.82674
[3] valid_0's rmse: 7.13747 valid_0's l2: 50.9434 valid_0's l1: 5.41024
[4] valid_0's rmse: 6.64829 valid_0's l2: 44.1998 valid_0's l1: 5.021
[5] valid_0's rmse: 6.17531 valid_0's l2: 38.1344 valid_0's l1: 4.68112
[6] valid_0's rmse: 5.77563 valid_0's l2: 33.3579 valid_0's l1: 4.40061
[7] valid_0's rmse: 5.44279 valid_0's l2: 29.624 valid_0's l1: 4.14437
[8] valid_0's rmse: 5.13386 valid_0's l2: 26.3566 valid_0's l1: 3.89693
[9] valid_0's rmse: 4.87077 valid_0's l2: 23.7244 valid_0's l1: 3.68527
[10] valid_0's rmse: 4.61592 valid_0's l2: 21.3067 valid_0's l1: 3.50584

Test R2 Score : 0.74

Train R2 Score : 0.73

LGBMClassifier
LGBMClassifier is one more wrapper estimator around the Booster class that provides a sklearn-like API for classification tasks. It works
exactly like LGBMModel but only for classification tasks. It also provides a score() method which evaluates the accuracy on the data passed to it.

Please make a note that LGBMClassifier predicts actual class labels for classification tasks with the predict() method. It provides the
predict_proba() method if we want probabilities of the target classes; we demonstrate it right after the binary classification example below.

Binary Classification
Below we have explained with a simple example how we can use LGBMClassifier for binary classification tasks. We have explained its usage
with the breast cancer dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMClassifier(objective="binary", n_estimators=10)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

print("\nTest Accuracy Score : %.2f"%booster.score(X_test, Y_test))


print("Train Accuracy Score : %.2f"%booster.score(X_train, Y_train))

Train/Test Sizes : (426, 30) (143, 30) (426,) (143,)


[1] valid_0's binary_logloss: 0.599368
[2] valid_0's binary_logloss: 0.536792
[3] valid_0's binary_logloss: 0.487134
[4] valid_0's binary_logloss: 0.444999
[5] valid_0's binary_logloss: 0.409009
[6] valid_0's binary_logloss: 0.377066
[7] valid_0's binary_logloss: 0.349213
[8] valid_0's binary_logloss: 0.324688
[9] valid_0's binary_logloss: 0.303217
[10] valid_0's binary_logloss: 0.284869

Test Accuracy Score : 0.92


Train Accuracy Score : 0.97
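
As mentioned above, predict() on LGBMClassifier returns class labels directly. The short sketch below uses predict_proba() on the classifier we just fit to retrieve the underlying class probabilities.

# `booster` is the LGBMClassifier fitted above on the breast cancer dataset.
test_probs = booster.predict_proba(X_test)

print("Probabilities Shape : ", test_probs.shape)      # (n_samples, 2) - one column per class
print("First Sample Probabilities : ", test_probs[0])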


Multi-Class Classification
Below we have explained the usage of LGBMClassifier for multi-class classification tasks using the Wine classification dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMClassifier(objective="multiclassova", n_estimators=10, num_class=3)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

print("\nTest Accuracy Score : %.2f"%booster.score(X_test, Y_test))


print("Train Accuracy Score : %.2f"%booster.score(X_train, Y_train))

Train/Test Sizes : (133, 13) (45, 13) (133,) (45,)


[1] valid_0's multi_logloss: 0.923875
[2] valid_0's multi_logloss: 0.81018
[3] valid_0's multi_logloss: 0.726106
[4] valid_0's multi_logloss: 0.660671
[5] valid_0's multi_logloss: 0.594604
[6] valid_0's multi_logloss: 0.546413
[7] valid_0's multi_logloss: 0.498342
[8] valid_0's multi_logloss: 0.460875
[9] valid_0's multi_logloss: 0.421938
[10] valid_0's multi_logloss: 0.37877

Test Accuracy Score : 0.98


Train Accuracy Score : 0.99

Please make a note that LGBMModel , LGBMRegressor and LGBMClassifier provide an attribute named booster_ which returns an
instance of the Booster class, which we can save to disk after training and later load for prediction.

booster.booster_

<lightgbm.basic.Booster at 0x7f21e9f69eb8>

Saving and Loading Model


We'll now explain how we can save a trained model to disk so it can be used later for predictions. Lightgbm provides the below-mentioned methods
for saving and loading models.

save_model() - This method takes as input a file name to which to save the model.
model_to_string() - This method returns a string representation of the model which we can then save to a text file.
lightgbm.Booster() - This constructor lets us create an instance of the Booster class. It has two important parameters that can help us
load a model from a file or from a string.
model_file - This parameter accepts the file name from which to load the trained model.
model_str - This parameter accepts a string that holds information about the trained model. We need to give this parameter a string that was
generated using model_to_string() , after loading it from the file.

Below we have explained with simple examples how we can use the above-mentioned methods to save models to disk and then load them.

Please make a note that in order to save a model trained using LGBMModel, LGBMRegressor, or LGBMClassifier, we first need to get its
Booster instance using the booster_ attribute of the estimator and then save it. LGBMModel, LGBMRegressor, and LGBMClassifier do not
provide saving and loading functionality themselves; it's only available on the Booster instance.
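
For example, a minimal, hedged sketch of saving a fitted sklearn-like estimator through its booster_ attribute could look like the following. The file name "lgb_sklearn.model" and the X_train/Y_train regression split are assumptions for illustration.

# A minimal sketch; `X_train`/`Y_train` hold regression data like the Boston split
# created in the example below, and "lgb_sklearn.model" is just an illustrative name.
model = lgb.LGBMRegressor(objective="regression", n_estimators=10)
model.fit(X_train, Y_train)

model.booster_.save_model("lgb_sklearn.model")           # save the underlying Booster
loaded = lgb.Booster(model_file="lgb_sklearn.model")     # reload as a Booster for predictions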

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    verbose_eval=False,
                    feature_name=boston.feature_names.tolist(),
                    num_boost_round=10)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)

Test R2 Score : 0.70


Train R2 Score : 0.74

booster.save_model("lgb.model")

<lightgbm.basic.Booster at 0x7f08e8967c50>

loaded_booster = lgb.Booster(model_file="lgb.model")

loaded_booster

<lightgbm.basic.Booster at 0x7f08e8e744a8>

from sklearn.metrics import r2_score

test_preds = loaded_booster.predict(X_test)
train_preds = loaded_booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Test R2 Score : 0.70


Train R2 Score : 0.74

model_as_str = booster.model_to_string()

with open("booster2.model", "w") as f:


f.write(model_as_str)

model_str = open("booster2.model").read()

booster_frm_str = lgb.Booster(model_str = model_str)


booster_frm_str

Finished loading model, total used 10 iterations

<lightgbm.basic.Booster at 0x7f08e8938940>

from sklearn.metrics import r2_score

test_preds = booster_frm_str.predict(X_test)
train_preds = booster_frm_str.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Test R2 Score : 0.70


Train R2 Score : 0.74

Cross Validation
Lightgbm lets us perform cross-validation using the cv() method. It accepts model parameters as a dictionary like the train() method. We can
then give it a dataset on which to perform cross-validation. It performs 5-fold cross-validation by default. We can change the number of folds by
setting the nfold parameter. It also accepts sklearn's data splitters like KFold , StratifiedKFold , ShuffleSplit , and
StratifiedShuffleSplit . We can provide these data splitters through the folds parameter of the method.

The cv() method returns a dictionary that has information about the mean and standard deviation of the loss for each round of training. We can
even ask the method to return an instance of CVBooster by setting the return_cvbooster parameter to True. The CVBooster object holds
the boosters trained during cross-validation; we show a small use of these boosters after the examples below.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=breast_cancer.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=breast_cancer.feature_names.tolist())

lgb.cv({"objective": "binary", "verbosity": -1},
       train_set=test_dataset, num_boost_round=10,
       nfold=5, stratified=True, shuffle=True,
       verbose_eval=True)

[1] cv_agg's binary_logloss: 0.586297 + 0.00814598


[2] cv_agg's binary_logloss: 0.536385 + 0.0139104
[3] cv_agg's binary_logloss: 0.494618 + 0.021394
[4] cv_agg's binary_logloss: 0.457766 + 0.0266986
[5] cv_agg's binary_logloss: 0.427578 + 0.0317981
[6] cv_agg's binary_logloss: 0.400594 + 0.0347366
[7] cv_agg's binary_logloss: 0.378743 + 0.0393459
[8] cv_agg's binary_logloss: 0.355944 + 0.0406613
[9] cv_agg's binary_logloss: 0.341757 + 0.0431176
[10] cv_agg's binary_logloss: 0.324393 + 0.0439941

{'binary_logloss-mean': [0.5862971048268162,
0.536385329057131,
0.4946178001035051,
0.4577660981720048,
0.42757828019512817,
0.40059432541714546,
0.3787432348470402,
0.355943799374708,
0.3417565456639551,
0.3243928378974005],
'binary_logloss-stdv': [0.008145979941642538,
0.013910430256742287,
0.02139399288171927,
0.026698647074055896,
0.0317980957740354,
0.03473655291456087,
0.039345850387526374,
0.04066125361064387,
0.04311758960643671,
0.04399410008603076]}

from sklearn.model_selection import StratifiedShuffleSplit

cv_output = lgb.cv({"objective": "binary", "verbosity": -1},


train_set=test_dataset, num_boost_round=10,
metrics=["auc", "average_precision"],
folds=StratifiedShuffleSplit(n_splits=3),
verbose_eval=True,
return_cvbooster=True)

for key, val in cv_output.items():


print("\n" + key, " : ", val)

[1] cv_agg's auc: 0.891975 + 0.0243025 cv_agg's average_precision: 0.903601 + 0.0403935
[2] cv_agg's auc: 0.947531 + 0.0218243 cv_agg's average_precision: 0.966003 + 0.0157877
[3] cv_agg's auc: 0.959877 + 0.0340906 cv_agg's average_precision: 0.97341 + 0.0230962
[4] cv_agg's auc: 0.962963 + 0.0302406 cv_agg's average_precision: 0.976702 + 0.018958
[5] cv_agg's auc: 0.969136 + 0.0314754 cv_agg's average_precision: 0.980817 + 0.0197982
[6] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[7] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[8] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[9] cv_agg's auc: 0.975309 + 0.0230967 cv_agg's average_precision: 0.985447 + 0.0135086
[10] cv_agg's auc: 0.969136 + 0.0314754 cv_agg's average_precision: 0.980817 + 0.0197982

auc-mean : [0.8919753086419754, 0.9475308641975309, 0.9598765432098766, 0.9629629629629629, 0.9691358024691358,


0.9753086419753086, 0.9753086419753086, 0.9753086419753086, 0.9753086419753086, 0.9691358024691358]

auc-stdv : [0.02430249343830806, 0.02182428336995516, 0.034090620423417484, 0.0302406141084343, 0.031475429096251756,


0.023096650535641628, 0.023096650535641628, 0.023096650535641628, 0.023096650535641628, 0.031475429096251756]

average_precision-mean : [0.9036008230452675, 0.9660026187803966, 0.9734100261878039, 0.9767022072577629, 0.9808174335952113,


0.9854470632248411, 0.9854470632248411, 0.9854470632248411, 0.9854470632248411, 0.9808174335952113]

average_precision-stdv : [0.04039346734018979, 0.0157876653573454, 0.02309624528782449, 0.01895799108192039, 0.01979815600144299,


0.013508585022458419, 0.013508585022458419, 0.013508585022458419, 0.013508585022458419, 0.01979815600144299]

cvbooster : <lightgbm.engine.CVBooster object at 0x7f21e9693518>

cvbooster = cv_output['cvbooster']

cvbooster.boosters

[<lightgbm.basic.Booster at 0x7f21e96937b8>,
<lightgbm.basic.Booster at 0x7f21e90dfc88>,
<lightgbm.basic.Booster at 0x7f21e9693240>]

Plotting Functionality
Lightgbm provides a list of the below-mentioned plotting functions.

plot_importance()

This method accepts a booster instance and plots feature importance using it. Below we have created a feature importance plot using the
booster trained earlier for the regression task. The method has a parameter named importance_type which, when set to the string split
(the default), plots the number of times each feature was used for a split, and when set to the string gain , plots the total gain of the splits
using that feature. The plot_importance() method has another important parameter, max_num_features , which accepts an integer specifying how
many features to include in the plot. We can limit the number of features using this parameter, as it'll include only that many top features in the
plot. A short sketch combining these parameters follows the plot call below.

lgb.plot_importance(booster, figsize=(8,6));
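
Below is a small follow-up sketch combining the importance_type and max_num_features parameters described above.

# Plot only the top 5 features, ranked by total gain.
lgb.plot_importance(booster, importance_type="gain", max_num_features=5, figsize=(8,6));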

plot_metric()

This method plots the results of an evaluation metric. We need to give a booster instance to the method in order to plot an evaluation metric
evaluated on the evaluation dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,)

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse", eval_names = ["Validation Set"],
feature_name=boston.feature_names.tolist()
)

lgb.plot_metric(booster, figsize=(8,6));

lgb.plot_metric(booster, metric="rmse", figsize=(8,6));

plot_split_value_histogram()

This method takes as input booster instance and feature name/index. It then plots a split value histogram for the feature.

lgb.plot_split_value_histogram(booster, feature="LSTAT", figsize=(8,6));

plot_tree()

This method lets us plot an individual tree of the ensemble. We need to give it a booster instance and the index of the tree which we want to
plot.

lgb.plot_tree(booster, tree_index = 1, figsize=(20,12));

Early Stopping Training


Early stopping training is a process where we stop training if the evaluation metric evaluated on the evaluation dataset is not improving for a
specified number of rounds. Lightgbm provides a parameter named early_stopping_rounds as a part of the train() method as well as the fit()
method of lightgbm's sklearn-like estimators. This parameter accepts an integer specifying that the training process should stop if the evaluation
metric result has not improved for that many rounds.

Please make a note that we need an evaluation dataset in order for this to work as it’s based on evaluation metric results evaluated on the
evaluation dataset.

Below we have explained the usage of the parameter early_stopping_rounds for regression and classification tasks with simple examples.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    early_stopping_rounds=5,
                    num_boost_round=100)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)
[1] valid_0's rmse: 8.82485
Training until validation scores don't improve for 5 rounds
[2] valid_0's rmse: 8.09497
[3] valid_0's rmse: 7.46686
[4] valid_0's rmse: 6.90991
[5] valid_0's rmse: 6.4172
[6] valid_0's rmse: 5.99212
[7] valid_0's rmse: 5.62928
[8] valid_0's rmse: 5.30155
[9] valid_0's rmse: 5.05191
[10] valid_0's rmse: 4.84863
[11] valid_0's rmse: 4.63474
[12] valid_0's rmse: 4.44933
[13] valid_0's rmse: 4.28644
[14] valid_0's rmse: 4.15939
[15] valid_0's rmse: 4.01791
[16] valid_0's rmse: 3.92719
[17] valid_0's rmse: 3.82892
[18] valid_0's rmse: 3.77695
[19] valid_0's rmse: 3.69585
[20] valid_0's rmse: 3.64548
[21] valid_0's rmse: 3.58403
[22] valid_0's rmse: 3.54853
[23] valid_0's rmse: 3.51134
[24] valid_0's rmse: 3.4976
[25] valid_0's rmse: 3.45016
[26] valid_0's rmse: 3.42836
[27] valid_0's rmse: 3.41483
[28] valid_0's rmse: 3.40661
[29] valid_0's rmse: 3.39959
[30] valid_0's rmse: 3.38903
[31] valid_0's rmse: 3.37894
[32] valid_0's rmse: 3.35784
[33] valid_0's rmse: 3.37572
[34] valid_0's rmse: 3.3732
[35] valid_0's rmse: 3.35426
[36] valid_0's rmse: 3.35484
[37] valid_0's rmse: 3.34265
[38] valid_0's rmse: 3.33666
[39] valid_0's rmse: 3.33256
[40] valid_0's rmse: 3.33374
[41] valid_0's rmse: 3.32778
[42] valid_0's rmse: 3.33335
[43] valid_0's rmse: 3.33888
[44] valid_0's rmse: 3.34715
[45] valid_0's rmse: 3.32557
[46] valid_0's rmse: 3.34178
[47] valid_0's rmse: 3.3474
[48] valid_0's rmse: 3.33983
[49] valid_0's rmse: 3.33105
[50] valid_0's rmse: 3.3198
[51] valid_0's rmse: 3.31533
[52] valid_0's rmse: 3.31672
[53] valid_0's rmse: 3.32232
[54] valid_0's rmse: 3.3158
[55] valid_0's rmse: 3.31626
[56] valid_0's rmse: 3.32085
Early stopping, best iteration is:
[51] valid_0's rmse: 3.31533

Test R2 Score : 0.88


Train R2 Score : 0.95

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),],
early_stopping_rounds=3)

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]


train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))


print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))

Train/Test Sizes : (426, 30) (143, 30) (426,) (143,)


[1] valid_0's auc: 0.986129
Training until validation scores don't improve for 3 rounds
[2] valid_0's auc: 0.989355
[3] valid_0's auc: 0.988925
[4] valid_0's auc: 0.987097
[5] valid_0's auc: 0.990108
[6] valid_0's auc: 0.993011
[7] valid_0's auc: 0.993011
[8] valid_0's auc: 0.993441
[9] valid_0's auc: 0.993441
[10] valid_0's auc: 0.994194
[11] valid_0's auc: 0.994194
[12] valid_0's auc: 0.994194
[13] valid_0's auc: 0.994409
[14] valid_0's auc: 0.995914
[15] valid_0's auc: 0.996129
[16] valid_0's auc: 0.996989
[17] valid_0's auc: 0.996989
[18] valid_0's auc: 0.996344
[19] valid_0's auc: 0.997204
[20] valid_0's auc: 0.997419
[21] valid_0's auc: 0.997849
[22] valid_0's auc: 0.998065
[23] valid_0's auc: 0.997849
[24] valid_0's auc: 0.998065
[25] valid_0's auc: 0.997634
Early stopping, best iteration is:
[22] valid_0's auc: 0.998065

Test Accuracy Score : 0.97


Train Accuracy Score : 0.98

Lightgbm also provides early stopping functionality through the early_stopping() callback function. We can give the number of
rounds to the early_stopping() function and pass that callback to the callbacks parameter of the train()/fit() method. We have explained
callbacks in an upcoming section.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),],
callbacks=[lgb.early_stopping(3)]
)

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]


train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))


print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))

Train/Test Sizes : (426, 30) (143, 30) (426,) (143,)


[1] valid_0's auc: 0.954328
Training until validation scores don't improve for 3 rounds
[2] valid_0's auc: 0.959322
[3] valid_0's auc: 0.982938
[4] valid_0's auc: 0.988244
[5] valid_0's auc: 0.987203
[6] valid_0's auc: 0.98762
[7] valid_0's auc: 0.98814
Early stopping, best iteration is:
[4] valid_0's auc: 0.988244

Test Accuracy Score : 0.94


Train Accuracy Score : 0.95

Feature Interaction Constraints


When lightgbm has completed training the trees of the ensemble on a dataset, each individual node of a tree represents a condition based on
some value of a feature. When we make predictions using an individual tree, we start from the root node of the tree, checking the
feature condition specified in the node against the feature values of our sample. We make decisions based on the feature values in our sample and the
condition present in the node, following a particular path until we reach a leaf of the tree, which gives the final prediction. By default, there is
no restriction on which node can use which feature as a condition. This process of making a final decision by going through the nodes of a tree
checking feature conditions is called feature interaction, because the predictor has come to a particular node after evaluating the condition of the
previous node. Lightgbm lets us define restrictions on which features are allowed to interact with which other features. We can give lists of feature indices, and
only the features within a list will be allowed to interact with one another. Those features won't be allowed to interact with features outside their list, and this restriction will
be enforced when creating trees during the training process.

Below we have explained with a simple example how we can force a feature interaction constraint on an estimator in lightgbm. Lightgbm
estimators provide a parameter named interaction_constraints which accepts a list of lists, where each inner list holds indices of
features that are allowed to interact with one another.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse",
                     "interaction_constraints": [[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (455, 13) (51, 13) (455,) (51,)

[1] valid_0's rmse: 7.50225


[2] valid_0's rmse: 7.01989
[3] valid_0's rmse: 6.58246
[4] valid_0's rmse: 6.18581
[5] valid_0's rmse: 5.83873
[6] valid_0's rmse: 5.47166
[7] valid_0's rmse: 5.19667
[8] valid_0's rmse: 4.96259
[9] valid_0's rmse: 4.69168
[10] valid_0's rmse: 4.51653

Test R2 Score : 0.67


Train R2 Score : 0.69

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,


interaction_constraints = [[0,1,2,11,12], [3, 4],[6,10], [5,9], [7,8]])

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse",
)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[1] valid_0's rmse: 8.97871 valid_0's l2: 80.6173
[2] valid_0's rmse: 8.35545 valid_0's l2: 69.8135
[3] valid_0's rmse: 7.93432 valid_0's l2: 62.9535
[4] valid_0's rmse: 7.61104 valid_0's l2: 57.9279
[5] valid_0's rmse: 7.16832 valid_0's l2: 51.3849
[6] valid_0's rmse: 6.93182 valid_0's l2: 48.0501
[7] valid_0's rmse: 6.57728 valid_0's l2: 43.2606
[8] valid_0's rmse: 6.41497 valid_0's l2: 41.1518
[9] valid_0's rmse: 6.13983 valid_0's l2: 37.6976
[10] valid_0's rmse: 5.9864 valid_0's l2: 35.837

Test R2 Score : 0.60


Train R2 Score : 0.69

Monotonic Constraints
LightGBM lets us specify monotonic constraints on a model, declaring whether an individual feature has an increasing, decreasing, or unconstrained relationship with the target value. The monotone values -1, 0, and 1 force the model to learn a decreasing, unconstrained, or increasing relationship, respectively, between a feature and the target. We provide a list with the same length as the number of features, holding 1, 0, or -1 per feature, through the monotone_constraints parameter. We have explained below with a simple example how to enforce monotonic constraints in LightGBM.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse",
                     'monotone_constraints': (1, 0, 1, -1, 1, 0, 1, 0, -1, 1, 1, -1, 1)},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (455, 13) (51, 13) (455,) (51,)

[1] valid_0's rmse: 7.50077


[2] valid_0's rmse: 7.01013
[3] valid_0's rmse: 6.57254
[4] valid_0's rmse: 6.19802
[5] valid_0's rmse: 5.8771
[6] valid_0's rmse: 5.59538
[7] valid_0's rmse: 5.35168
[8] valid_0's rmse: 5.15228
[9] valid_0's rmse: 4.95664
[10] valid_0's rmse: 4.81777

Test R2 Score : 0.63


Train R2 Score : 0.63

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,
                        monotone_constraints=(1, 0, 1, -1, 1, 0, 1, 0, -1, 1, 1, -1, 1))

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse",
)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[1] valid_0's rmse: 8.87332 valid_0's l2: 78.7359
[2] valid_0's rmse: 8.37389 valid_0's l2: 70.122
[3] valid_0's rmse: 7.89759 valid_0's l2: 62.3719
[4] valid_0's rmse: 7.51069 valid_0's l2: 56.4105
[5] valid_0's rmse: 7.18851 valid_0's l2: 51.6747
[6] valid_0's rmse: 6.90391 valid_0's l2: 47.664
[7] valid_0's rmse: 6.66775 valid_0's l2: 44.4589
[8] valid_0's rmse: 6.46139 valid_0's l2: 41.7495
[9] valid_0's rmse: 6.27545 valid_0's l2: 39.3813
[10] valid_0's rmse: 6.12082 valid_0's l2: 37.4644

Test R2 Score : 0.58


Train R2 Score : 0.62
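
As a quick sanity check, a minimal sketch like the one below varies a single constrained feature over a grid while keeping the remaining features fixed and confirms that predictions move in the declared direction; here we use feature index 12 (LSTAT), which was assigned a monotone value of 1 (increasing) above:

# Take one training sample and create copies that only differ in feature 12 (LSTAT).
sample = X_train[0].copy()
grid = np.linspace(boston.data[:, 12].min(), boston.data[:, 12].max(), 50)

samples = np.tile(sample, (grid.shape[0], 1))
samples[:, 12] = grid

preds = booster.predict(samples)

# With a monotone value of 1 for feature 12, predictions should never decrease
# as that feature increases (small tolerance for floating point noise).
print("Monotonically non-decreasing : ", bool(np.all(np.diff(preds) >= -1e-9)))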

Custom Objective/Loss Function


LightGBM also lets us define a custom objective function. We need to define a function that accepts the predictions and the actual labels and returns two arrays: the first derivative (gradient) and the second derivative (hessian) of the loss function, evaluated at the predictions. We can give such a custom objective/loss function to the objective parameter of a scikit-learn-like estimator; if we are using the train() method, we need to give it to the fobj parameter instead. Note that in this LightGBM version the two interfaces call the function with different arguments: fobj receives (predictions, train Dataset), whereas the scikit-learn estimators call the objective with (actual labels, predictions).

Below we have designed a mean squared error objective function. We have then given this function to the objective parameter of LGBMModel for explanation purposes.

def first_grad(predt, dmat):
    '''Compute the first derivative (gradient) of squared error.
    With the scikit-learn API used below, the objective is called as (y_true, y_pred),
    so predt holds the labels, y holds the predictions, and 2*(y - predt) is the
    derivative of the squared error with respect to the prediction.'''
    y = dmat.get_label() if isinstance(dmat, lgb.Dataset) else dmat
    return 2 * (y - predt)

def second_grad(predt, dmat):
    '''Compute the second derivative (hessian) of squared error.
    A constant hessian is returned; the exact second derivative is 2, but a constant
    scale only changes the effective step size.'''
    y = dmat.get_label() if isinstance(dmat, lgb.Dataset) else dmat
    return [1] * len(predt)

def mean_sqaured_error(predt, dmat):
    '''Mean squared error objective: returns the per-sample gradient and hessian.'''
    grad = first_grad(predt, dmat)
    hess = second_grad(predt, dmat)
    return grad, hess
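
As a quick illustration, calling the function on plain numpy arrays the way the scikit-learn estimators of this LightGBM version do (i.e. as actual labels followed by predictions) shows that the returned gradient is 2 * (prediction - target); a minimal sketch:

y_true = np.array([3.0, 5.0])
y_pred = np.array([2.5, 6.0])

grad, hess = mean_sqaured_error(y_true, y_pred)

print("Gradient : ", grad)   # 2 * (y_pred - y_true) -> [-1.  2.]
print("Hessian  : ", hess)   # constant hessian -> [1, 1]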

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric="rmse")

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[1] valid_0's rmse: 19.3349
[2] valid_0's rmse: 15.5417
[3] valid_0's rmse: 12.5873
[4] valid_0's rmse: 10.2379
[5] valid_0's rmse: 8.43293
[6] valid_0's rmse: 7.08919
[7] valid_0's rmse: 6.09021
[8] valid_0's rmse: 5.39551
[9] valid_0's rmse: 4.88447
[10] valid_0's rmse: 4.59251

Test R2 Score : 0.75


Train R2 Score : 0.83
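
As a sanity check on the custom objective, a minimal sketch like the one below fits the same model with the built-in "regression" objective on the same split; the scores should land in the same ballpark:

builtin_booster = lgb.LGBMModel(objective="regression", n_estimators=10)
builtin_booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False)

print("Built-in Objective Test R2 Score : %.2f" % r2_score(Y_test, builtin_booster.predict(X_test)))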

Custom Evaluation Function


LightGBM lets us define our own evaluation metric if we don't want to use the evaluation metrics available in lightgbm. We need to define a function that takes the predictions and the actual target values as input and returns three things: a string with the metric name, the metric value, and a boolean indicating whether a higher value is better. The boolean should be True if we want the metric value to be maximized and False if we want it to be minimized.

We need to pass this function as the feval parameter if we are using the train() method to design our estimator. If we are using a scikit-learn-like estimator, we need to give this function to the eval_metric parameter of the fit() method.

Below we have explained with simple examples how we can use a custom evaluation metric with lightgbm.

def mean_absolute_error(preds, dmat):
    '''Custom evaluation metric: mean absolute error (lower is better).'''
    actuals = dmat.get_label() if isinstance(dmat, lgb.Dataset) else dmat
    err = np.abs(actuals - preds).mean()
    is_higher_better = False
    return "MAE", err, is_higher_better

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    feval=mean_absolute_error,
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (455, 13) (51, 13) (455,) (51,)

[1] valid_0's rmse: 7.40798 valid_0's MAE: -74.3941


[2] valid_0's rmse: 6.83504 valid_0's MAE: -68.5244
[3] valid_0's rmse: 6.32897 valid_0's MAE: -63.4968
[4] valid_0's rmse: 5.90259 valid_0's MAE: -59.304
[5] valid_0's rmse: 5.53393 valid_0's MAE: -55.712
[6] valid_0's rmse: 5.17631 valid_0's MAE: -52.3329
[7] valid_0's rmse: 4.87576 valid_0's MAE: -48.2586
[8] valid_0's rmse: 4.62314 valid_0's MAE: -46.1631
[9] valid_0's rmse: 4.38363 valid_0's MAE: -41.8425
[10] valid_0's rmse: 4.2398 valid_0's MAE: -39.1324

Test R2 Score : 0.71


Train R2 Score : 0.76

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric=mean_absolute_error)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[1] valid_0's MAE: -2230.03
[2] valid_0's MAE: -1775.07
[3] valid_0's MAE: -1413.3
[4] valid_0's MAE: -1127.13
[5] valid_0's MAE: -900.256
[6] valid_0's MAE: -719.848
[7] valid_0's MAE: -572.454
[8] valid_0's MAE: -449.162
[9] valid_0's MAE: -357.412
[10] valid_0's MAE: -281.703

Test R2 Score : 0.66


Train R2 Score : 0.82

Callbacks
LightGBM provides a set of callback functions for different purposes that are executed after each iteration of training. Below is a list of the callback functions available in lightgbm:

early_stopping(stopping_rounds) - This callback function accepts an integer specifying the number of rounds: training is stopped if the evaluation metric on the last evaluation set does not improve for that many consecutive iterations.
print_evaluation(period, show_stdv) - This callback function accepts an integer specifying how often to print evaluation results. Evaluation metric results are printed every that many iterations.
record_evaluation(eval_result) - This callback function accepts a dictionary in which evaluation results will be recorded.
reset_parameter() - This callback function lets us reset parameters such as the learning rate after each iteration of training. It accepts either an array with as many values as there are iterations or a callable returning the new value for each iteration.

The callbacks parameter, which is available with both the train() method and the fit() method of the estimators, accepts a list of callback functions.

Below we have explained with simple examples how we can use the different callback functions. The early_stopping() callback function has already been covered in the early stopping training section of this tutorial.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
callbacks=[lgb.callback.print_evaluation(period=3)])

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[3] valid_0's rmse: 12.1433
[6] valid_0's rmse: 6.86157
[9] valid_0's rmse: 4.37858

Test R2 Score : 0.79


Train R2 Score : 0.80

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

evals_results = {}

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
callbacks=[lgb.print_evaluation(period=3), lgb.record_evaluation(evals_results)])

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
print("Evaluation Results : ", evals_results)

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)


[3] valid_0's rmse: 12.8003
[6] valid_0's rmse: 7.40552
[9] valid_0's rmse: 5.11615

Test R2 Score : 0.67


Train R2 Score : 0.82
Evaluation Results : {'valid_0': OrderedDict([('rmse', [19.235743778402917, 15.611391428644854, 12.800304472773783,
10.469162299753663, 8.715414846943654, 7.405524963318977, 6.417956121607763, 5.66002020770034, 5.116147011782366,
4.786323495504935])])}
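
Since record_evaluation() stores the per-iteration metric values, we can, for example, plot the recorded RMSE to visualize convergence; a minimal sketch (matplotlib is imported here so the snippet is self-contained):

import matplotlib.pyplot as plt

plt.plot(evals_results["valid_0"]["rmse"], marker="o")
plt.xlabel("Boosting Round")
plt.ylabel("Validation RMSE")
plt.title("RMSE Recorded by record_evaluation()")
plt.show()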

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train,
eval_set=[(X_test, Y_test),], eval_metric="rmse",
callbacks=[lgb.reset_parameter(learning_rate=np.linspace(0.1,1,10).tolist())])

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest R2 Score : %.2f"%r2_score(Y_test, test_preds))


print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))

Train/Test Sizes : (379, 13) (127, 13) (379,) (127,)
[1] valid_0's rmse: 19.224
[2] valid_0's rmse: 12.167
[3] valid_0's rmse: 6.42527
[4] valid_0's rmse: 4.44198
[5] valid_0's rmse: 4.22668
[6] valid_0's rmse: 4.43308
[7] valid_0's rmse: 4.29187
[8] valid_0's rmse: 4.47696
[9] valid_0's rmse: 4.5301
[10] valid_0's rmse: 4.64636

Test R2 Score : 0.73


Train R2 Score : 0.95
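
The reset_parameter() callback also accepts a callable instead of a list. Below is a minimal sketch with a hypothetical decay schedule, learning_rate_decay(), which shrinks the learning rate a bit on every boosting round:

def learning_rate_decay(current_round):
    '''Hypothetical schedule: start at 0.5 and shrink by 5% every boosting round.'''
    return 0.5 * (0.95 ** current_round)

decayed_booster = lgb.LGBMModel(objective="regression", n_estimators=10)

decayed_booster.fit(X_train, Y_train,
                    eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
                    callbacks=[lgb.reset_parameter(learning_rate=learning_rate_decay)])

print("Test R2 Score : %.2f" % r2_score(Y_test, decayed_booster.predict(X_test)))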

This ends our small tutorial explaining the API of LightGBM. Please feel free to let us know your views in the comments section.
