
Journal Pre-proof

Explainable Machine Learning in Identifying Credit Card Defaulters

Tanmay Srinath, Gururaja H S

PII: S2666-285X(22)00061-9
DOI: https://doi.org/10.1016/j.gltp.2022.04.025
Reference: GLTP 155

To appear in: Global Transitions Proceedings

Please cite this article as: Tanmay Srinath, Gururaja H S, Explainable Machine Learning in Identifying Credit Card Defaulters, Global Transitions Proceedings (2022), doi: https://doi.org/10.1016/j.gltp.2022.04.025

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2022 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co.
Ltd.
This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/)
Global Transitions Proceedings
ScienceDirect
http://www.keaipublishing.com/en/journals/global-transitions-proceedings/

Explainable Machine Learning in Identifying Credit Card Defaulters

Tanmay Srinath a,*, Gururaja H S a

a Department of Information Science and Engineering, B.M.S. College of Engineering, Bangalore

* Corresponding author. E-mail address: tanmaysrinath00@gmail.com

Abstract

Machine learning is fast becoming one of the central solutions to various real-world problems. Thanks to powerful hardware and large datasets,
training a machine learning model has become easier and more rewarding. However, an inherent problem in various machine learning models
is a lack of understanding of what goes on 'under the hood'. A lack of explainability and interpretability leads to lower levels of trust in the
model's predictions, which means it cannot be used in sensitive applications like diagnosing medical ailments and detecting terrorism. This has
led to various advances in making machine learning explainable. In this paper, various black-box models are used to classify credit card
defaulters. These models are compared using different performance metrics, and explanations of the models are provided using a
model-agnostic explainer. Finally, the best model-explainer combination is proposed, along with potential areas of future exploration.

Keywords: Machine learning; ensemble learning; explainability; interpretability; DALEX

1. Introduction

Machine learning is one of the hottest research areas in the world right now. Lying at the intersection of computer science
and statistics, it is benefiting from a rapid improvement in hardware and algorithms [1]. However, while machine learning
models are powerful, they don’t give out any information about how they came to their conclusions [2]. This has led to various
algorithms being developed that attempt to explain the inner workings of a black-box model, and the application of those
algorithms in various fields [3, 4, 5].

One of the biggest challenges faced by the banking sector is the assessment of credit risk [6,7]. Recent studies mostly
focus on enhancing classifier performance for credit card default prediction rather than on an interpretable model [8]. Here the
target is to balance both accuracy and explainability using an array of black-box models and a corresponding model-agnostic
explainer [9].

This paper is organized as follows: in Section 2 some of the related work in the field is explained. Section 3 provides
some background on the methods used, including their mathematical underpinnings. Section 4 compares various models in terms
of performance metrics and explanations. Finally, in Section 5 conclusions are stated with some potential areas of future work.

2. Related Work

While there have been a lot of advances in machine learning and artificial intelligence algorithms since the turn of the century,
attempting to make those models understandable is a relatively newer field, which explains the many divergent approaches found in the literature [10]. Deep
Neural Networks have received the most attention, as they have a wide range of applications, and there have been survey papers
dedicated to reviewing the most common methods to explain these complex models [11,12].
The applications of explainable ML stretch out in many directions. Be it dissecting the chess moves of the super-strong AlphaZero
[13], detection of money laundering [14], cyber security [15], vibration signal analysis to detect faults [16], solar power
generation [17], traffic classification [18], operation of unmanned aerial vehicles [19] or various sub-domains of the medical sciences
[20,21,22], understanding how a black-box model converges to an optimal solution is receiving increased attention.

3. Methods

This section provides succinct explanations on each of the models used, along with their mathematical underpinnings.

3.1. Support Vector Machine


The support vector machine (SVM) [23] is a generalization of the maximal margin classifier [24,25]. SVM works by
using kernels to enlarge the feature space in a specific way. Equation 1 provides the formula for a linear SVM kernel.

K(x_i, x_{i'}) = \sum_{j=1}^{p} x_{ij} x_{i'j}   (1)

Most SVM models trained today use non-linear kernels. Equation 2 provides the formula for the radial kernel, one of the most
popular kernel choices.

K(x_i, x_{i'}) = \exp\left( -\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \right)   (2)

Figure 1 is an example of a radial SVM applied on the iris dataset [26].

Figure 1: SVM classification plot on the iris dataset
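As an illustrative sketch (not the paper's original code), a radial-kernel SVM of the kind shown in Figure 1 can be fitted in a few lines, assuming scikit-learn is available; the gamma and C values here are arbitrary:

```python
# Sketch: radial-kernel SVM on the iris dataset (cf. Figure 1),
# assuming scikit-learn; gamma/C values are illustrative only.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# kernel='rbf' corresponds to the radial kernel of Equation 2.
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy, close to 1
```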

3.2. Random Forest

Random forest (RF) [27,28] is a powerful ensemble learning algorithm that can be used for both classification and regression.
Random forests provide an improvement over bagged trees by decorrelating the trees [29]. Equation 3 defines the margin function
for an ensemble of classifiers h_1(x), h_2(x), ..., h_K(x), with the training set drawn at random from the distribution of the random vector (X, Y):

mg(X, Y) = \mathrm{av}_k\, I(h_k(X) = Y) - \max_{j \neq Y} \mathrm{av}_k\, I(h_k(X) = j)   (3)

Here I(.) is the indicator function and av_k denotes the average over the K classifiers. Equation 4 provides the generalisation error for the model:

PE^* = P_{X,Y}(mg(X, Y) < 0)   (4)
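To make Equation 3 concrete, the margin of a small voting ensemble on a single example can be computed directly; the votes below are made up purely for illustration:

```python
import numpy as np

def margin(votes, y_true, classes):
    """Margin function mg(X, Y) of Equation 3: average vote for the
    true class minus the largest average vote for any wrong class."""
    votes = np.asarray(votes)
    avg = {c: np.mean(votes == c) for c in classes}
    wrong = max(avg[c] for c in classes if c != y_true)
    return avg[y_true] - wrong

# Three classifiers vote on one example whose true class is 1:
# two vote correctly, one does not, so mg = 2/3 - 1/3.
print(margin([1, 1, 0], y_true=1, classes=[0, 1]))
```

A positive margin means the ensemble classifies the example correctly; Equation 4 is the probability of this margin being negative.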
Author name / Procedia Manufacturing 00 (2019) 000–000 3

3.3. XGBoost

XGBoost (XGB) is a scalable machine learning system for tree boosting. It is one of the most widely used tree-boosting
algorithms and is a part of many winning solutions in different Kaggle contests [30]. The most important reason for XGBoost's
success is its scalability, obtained due to various system and algorithmic optimisations [31].
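As a dependency-light sketch of the tree-boosting idea (the paper itself uses the separate xgboost package and its XGBClassifier; scikit-learn's GradientBoostingClassifier is shown here as a stand-in, on synthetic data):

```python
# Dependency-light sketch of tree boosting on synthetic data.
# The paper uses the separate xgboost package (XGBClassifier);
# scikit-learn's GradientBoostingClassifier stands in for it here.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic labels

# 80:20 split, mirroring the split used later in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3)
gbm.fit(X_tr, y_tr)
print(gbm.score(X_te, y_te))  # held-out accuracy
```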

3.4. Neural Networks

A neural network (NN) is a black-box model that attempts to mimic the human brain's thought process. It takes a p-variable
input vector X = (X_1, X_2, ..., X_p) and constructs a non-linear function f(X) to predict the dependent variable [32].
Compared to other non-linear models, what distinguishes a neural network is its specific structure: a simple feed-forward neural
network, for example, can consist of an input layer, one or many hidden layers and an output layer [33].

Equation 5 represents the basic form of the neural network.

f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k h_k(X)   (5)

Equation 6 provides the expanded form of the basic equation given in Equation 5.

f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k\, g\left( w_{k0} + \sum_{j=1}^{p} w_{kj} X_j \right)   (6)

The computations are performed in two steps. First, as shown in Equation 7, the K activations are computed as functions of
the input features.

A_k = h_k(X) = g\left( w_{k0} + \sum_{j=1}^{p} w_{kj} X_j \right)   (7)

These K activations from the hidden layer are then fed into the output layer as shown in Equation 8.

f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k A_k   (8)
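Equations 5-8 can be checked with a direct NumPy implementation of a single-hidden-layer feed-forward pass; the weights below are random placeholders, not trained values:

```python
import numpy as np

def feed_forward(x, W, w0, beta, beta0, g=np.tanh):
    """Single-hidden-layer network of Equations 5-8.
    W: (K, p) hidden-layer weights, w0: (K,) hidden biases,
    beta: (K,) output weights, beta0: output bias, g: activation."""
    A = g(w0 + W @ x)        # Equation 7: the K activations
    return beta0 + beta @ A  # Equation 8: linear output layer

rng = np.random.default_rng(0)
p, K = 4, 3                  # 4 input features, 3 hidden units
x = rng.normal(size=p)
W = rng.normal(size=(K, p))
w0 = rng.normal(size=K)
beta = rng.normal(size=K)
print(feed_forward(x, W, w0, beta, beta0=0.1))
```

With the identity activation g(z) = z the model collapses to a linear function of the inputs, which is why the non-linear g is essential.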

3.5. DALEX

DALEX is a consistent collection of explainers for predictive models. The approaches it implements are model-agnostic, meaning
that they can extract useful information from any predictive model regardless of its internal structure [34]. After trying out
numerous explainer frameworks, DALEX was chosen for its high speed of computation and ease of use.

The proposed model/explainer combination is 'XGBoost with DALEX'. It aims to combine the best performance metrics with
understandable logic behind its decisions. Experimentally it was determined that XGBoost was the most suitable model for this
task. This result supersedes previous results that showed that Random Forest models performed best on this dataset [35]. The
flowchart for determining the final model/explainer combination is provided in Figure 2.

Figure 2: Flowchart for the Proposed Model



4. Results and Discussion

4.1. Data Analysis

The dataset used is sourced from the UCI Machine Learning Repository. It contains information about the credit card
defaults of various customers [34]. The dataset was cleaned, preprocessed and split into training and testing components (80:20
train-test split) before further analysis. After training and testing the four models, the metrics displayed in Table 1 were
obtained.

Table 1: Performance Metrics [33]

Model                  Accuracy   Sensitivity   Specificity
XGB (proposed model)   0.7978     0.8429        0.7057
SVM                    0.7916     0.8410        0.6906
RF                     0.7928     0.8521        0.6717
NN                     0.7531     0.8281        0.6000

The results show that XGBoost outperforms all the other models in both accuracy and specificity. SVM is perhaps better suited
to this task than Random Forest, because it accurately detects more defaulters than Random Forest (higher specificity), even if it
has a slightly lower accuracy and sensitivity. The neural net [35] is the worst performer, which seems to indicate that it is not
suitable for this specific task.
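For reference, the three metrics reported in Table 1 can be computed from a binary confusion matrix as follows; this sketch assumes the positive class is coded as 1, which may differ from the paper's encoding:

```python
import numpy as np

def metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity
    (true-negative rate), matching the columns of Table 1. Assumes
    the positive class is coded as 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return acc, sens, spec

# Toy labels: accuracy 3/5, sensitivity 2/3, specificity 1/2.
print(metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```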
4.2. Model Explanations – Variable Importance
This is a measure of how much a particular variable contributed to the prediction of the dependent variable. If the shaded blue
region lies largely on the left side of the graph, the variable is better off being removed from the dataset.
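The variable importance plots in this section are based on the permutation idea: shuffle one column and measure how much the model's performance drops. A self-contained sketch on synthetic data, where only column 0 carries signal:

```python
# Sketch of permutation variable importance on synthetic data:
# shuffle one column, measure the drop in accuracy. Only column 0
# carries signal, so its drop should dominate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
base = model.score(X, y)

drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # destroy column j
    drops.append(base - model.score(Xp, y))
print(drops)  # drops[0] is by far the largest
```

A variable whose permutation barely changes the score (or even improves it) is exactly the kind the plots suggest removing.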
XGBoost: The variables sex and marriage, along with the payment made in the 6th month, seem to play no role in the
prediction. The model performs better when the delay in the 1st month's payment, a very insignificant variable, is removed. It is
also interesting to note that a person's age is less important than his outstanding bill amounts when predicting whether he will
default on his payment. This goes to show that the model considers high outstanding amounts to be a better indication of a
payment default than old age and lower income. The details of all the variables are shown in Figure 3.

Figure 3: Variable importance plot for XGBoost

SVM: The variables age, sex, education and marriage seem to be important according to the SVM model. Payments in the
later months (specifically the 5th and 6th months) are also considered important variables for prediction. The SVM model also
seems to value the limit_bal variable more than the other models do. The conclusion that can be drawn is that later in the year,
when payments tend to pile up, the ability of a person to pay off all or most of his outstanding amounts essentially determines
whether he will default on his payments or not. The details of all the variables are shown in Figure 4.

Figure 4: Variable importance plot for SVM

Random Forest: Quite interestingly, the Random Forest learners consider the payment delays in the last 3 months to be the
best indicators of credit card defaults. The model also considers age to be one of the least important variables. It can thus be
inferred that a delay in payment in the later months, when unpaid bills can accumulate, strongly indicates that a person will
default on his payments, irrespective of his age. The details of all the variables are shown in Figure 5.

Figure 5: Variable importance plot for RF

Neural Network: As the worst performing model of the four, it is easy to understand why the neural network produces more
inaccurate predictions: it is unable to clearly identify the variables with the most discriminative power. The details of all the
variables are shown in Figure 6.

Figure 6: Variable importance plot for NN

4.3. Model Explanations – Partial Dependence

This computes the dependence of the final prediction on a particular variable. Four sets of variables are examined: personal
information, bill amounts, paid amounts and payment delays.
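A partial dependence profile for a feature can be computed by clamping that feature to a grid of values and averaging the model's predictions over the data; a sketch on synthetic data, independent of the paper's dataset:

```python
# Sketch of a partial dependence profile: clamp feature j to each
# grid value and average the model's predictions over the data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=300)  # depends on x0 only

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def partial_dependence(model, X, j, grid):
    """Average prediction with feature j clamped to each grid value."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        out.append(model.predict(Xv).mean())
    return np.array(out)

grid = np.linspace(-2, 2, 5)
print(partial_dependence(model, X, 0, grid))  # roughly U-shaped in x0
```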

Personal information: This concerns parameters unique to a person, like age, educational background, marital status and
sex. The partial dependence plots for these 4 variables are provided in Figure 7.

Figure 7: Partial dependence profile for personal information

It can be observed that the ensemble learning models XGBoost and Random Forest predict a larger number of defaults for
people over the age of 40. This can be attributed to the common 'mid-life crisis' phenomenon that many middle-aged adults
undergo, or to the higher health-care costs that come with a stressful work-life balance. The SVM model, on the other hand, has
a very slightly increasing trend up to the age of 50, and then seems to predict a lower number of defaults as a person crosses the
age of 60. A potential explanation is the lower number of expensive payments that retired individuals need to make.
With regards to education, it can be noticed that both XGBoost and Random Forest assess that a graduate level of
education leads to a higher chance of default on payments, a trend that is approximated by the neural network as well. This may
be because graduate degrees are expensive and lead to long-term education loans that put an additional burden on a person's
income. The SVM model also agrees with the trend that a person who hasn't studied at higher levels tends to have fewer
financial burdens and hence a lower chance of defaulting on payments.
Marriage and sex seem to have no real impact on the final prediction according to the ensemble learning models. The SVM
model very slightly lowers the predicted risk of defaulting when the customer is female, while the neural network agrees with
the trend but increases the drop in prediction.

Bill Amounts: This concerns the amounts due on a monthly basis during a 6-month time frame. The partial dependence
plots for these 6 variables are provided in Figure 8.

(a)

(b)

(c)

Figure 8: Partial dependence profile for bill amounts, (a) – Months 1 and 2, (b) – Months 3 and 4, (c) – Months 5 and 6

Three of the models seem to have similar dependencies on all 6 variables, with a particular emphasis on bill amounts during
months 4, 5 and 6, which corresponds to the variable importance plots provided previously. It can be observed that the neural
network has very surprising trends for all 6 bill amounts. It seems to increase the chance of default with increasing amounts in
bills 1, 2, 4 and 5 while decreasing the risk for bills 3 and 6. The increase in 1 and 2 coupled with the decrease in 6 seems to
indicate one of two things: either the neural network has understood the data at a much higher level than the other models, or it
has failed to grasp the most essential features of the dataset. Considering its poorer results, the latter possibility seems more
likely.

Paid Amounts: This concerns the actual amounts that were paid during the 6-month time frame. The partial dependence
plots for these 6 variables are provided in Figure 9.

(a)

(b)

(c)

Figure 9: Partial dependence profile for paid amounts, (a) – Months 1 and 2, (b) – Months 3 and 4, (c) – Months 5 and 6.

Once again, three of the models more or less agree with each other on the trends of the final prediction given increasing
amounts of money paid. The neural network's trend for the paid amounts during the 6th month seems to provide an explanation
for its poor results, as a sharp increase in defaults corresponding with increasing amounts of money paid doesn't make logical
sense.

Payment Delays: These variables concern the specific delay durations for the 6 payments covered in the previous two
plots. The partial dependence plots for these 6 variables are provided in Figure 10.

This plot seems to show that the tree-based algorithms share similar trends for months 4, 5 and 6. The dips in prediction
after a longer time frame in each of the models during the first 3 months seem to indicate that customers who delayed for a
longer time tended to pay back the money in bulk. In contrast, in the later months the predictions remained steady after an
initial rise for the tree-based algorithms, while decreasing ever so slightly for the SVM model in months 4 and 5. The neural
network did not place much significance on this feature set.

Figure 10: Partial dependence profile for payment delays.

5. Conclusion and Future Work

In this paper, various machine learning models to classify credit card defaulters were investigated. Attempts were made to
explain their results with the help of the model-agnostic tool DALEX. Experimental results seem to indicate that the tree-based
'XGBoost with DALEX' combination provides the best results, with the Random Forest and SVM models not far behind. Where
XGBoost distinguishes itself, however, is in its surprisingly human-like decision making. The model uses an understandable
chain of logic in making its decisions, which can help complement human thought processes. Hence, the XGBoost tree-boosting
algorithm along with DALEX is proposed as the solution to identify credit card defaulters.

In the future, it might make more sense for the model itself to provide explanations by keeping track of various
calculations that it performs. It will also be useful to generalize libraries like DALEX to a greater range of models and further
fine-tune the attempts to look ’under the hood’. The field of explainable machine learning promises to remain an exciting ground
for new breakthroughs in the coming years.

References

[1] Michael I Jordan and Tom M Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260, 2015.
[2] Zachary C Lipton. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue,
16(3):31–57, 2018.
[3] Subramani, P., & BD, P. (2021). Prediction of muscular paralysis disease based on hybrid feature extraction with machine learning technique for COVID-19 and
post-COVID-19 patients. Personal and ubiquitous computing, 1-14.
[4] Luca Brunese, Francesco Mercaldo, Alfonso Reginelli, and Antonella Santone. Explainable deep learning for pulmonary disease and coronavirus covid-19
detection from x-rays. Computer Methods and Programs in Biomedicine, 196:105608, 2020.
[5] Le, N. T., Wang, J. W., Le, D. H., Wang, C. C., & Nguyen, T. N. (2020). Fingerprint enhancement based on tensor of wavelet subbands for classification. IEEE Access, 8,
6602-6615.
[6] Ismini Psychoula, Andreas Gutmann, Pradip Mainali, S. H. Lee, Paul Dunphy, and Fabien A. P. Petitcolas. Explainable machine learning for fraud detection.
arXiv preprint arXiv:2105.06314, 2021.
[7] Yu, K., Tan, L., Lin, L., Cheng, X., Yi, Z., & Sato, T. (2021). Deep-learning-empowered breast cancer auxiliary diagnosis for 5GB remote E-health. IEEE Wireless
Communications, 28(3), 54-61.
[8] Shubham Rathi. Generating counterfactual and contrastive explanations using shap, 2019.
[9] Talha Mahboob Alam, Kamran Shaukat, Ibrahim A. Hameed, Suhuai Luo, Muhammad Umer Sarwar, Shakir Shabbir, Jiaming Li, and Matloob Khushi.
An investigation of credit card default prediction in the imbalanced datasets. IEEE Access, 8:201173–201198, 2020.
[10] Ning Xie, Gabrielle Ras, Marcel van Gerven, and Derek Doran. Explainable deep learning: A field guide for the uninitiated. arXiv preprint
arXiv:2004.14545, 2020.
[11] Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J Anders, and Klaus-Robert Müller. Explaining deep neural networks and
beyond: A review of methods and applications. Proceedings of the IEEE, 109(3):247–278, 2021.
[12] Tjoa, E. and Guan, C., 2020. A survey on explainable artificial intelligence (xai): Toward medical xai. IEEE Transactions on Neural Networks and
Learning Systems.
[13] Thomas McGrath, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, and Vladimir Kramnik. Acquisition of
chess knowledge in alphazero. arXiv preprint arXiv:2111.09259, 2021.

[14] Dattatray V Kute, Biswajeet Pradhan, Nagesh Shukla, and Abdullah Alamri. Deep learning and explainable artificial intelligence techniques applied for
detecting money laundering–a critical review. IEEE Access, 2021.
[15] Kuppa, A. and Le-Khac, N.A., 2021. Adversarial xai methods in cybersecurity. IEEE Transactions on Information Forensics and Security, 16,
pp.4924-4938.
[16] Kuppa, A. and Le-Khac, N.A., 2021. Adversarial xai methods in cybersecurity. IEEE Transactions on Information Forensics and Security, 16,
pp.4924-4938.
[17] Chen, H.Y. and Lee, C.H., 2020. Vibration signals analysis by explainable artificial intelligence (xai) approach: Application on bearing faults diagnosis.
IEEE Access, 8, pp.134246-134256.
[18] Kuzlu, M., Cali, U., Sharma, V. and Güler, Ö., 2020. Gaining insight into solar photovoltaic power generation forecasting utilizing explainable artificial
intelligence tools. IEEE Access, 8, pp.187814-187823.
[19] Ahn, S., Kim, J., Park, S.Y. and Cho, S., 2020. Explaining Deep Learning-Based Traffic Classification Using a Genetic Algorithm. IEEE Access, 9,
pp.4738-4751.
[20] Keneni, B.M., Kaur, D., Al Bataineh, A., Devabhaktuni, V.K., Javaid, A.Y., Zaientz, J.D. and Marinier, R.P., 2019. Evolving rule-based explainable
artificial intelligence for unmanned aerial vehicles. IEEE Access, 7, p.17001-17016.
[21] Shivappriya, S. N., Priyadarsini, M. J. P., Stateczny, A., Puttamadappa, C., & Parameshachari, B. D. (2021). Cascade object detection and remote sensing object detection
method based on trainable activation function. Remote Sensing, 13(2), 200.
[22] Khishigsuren Davagdorj, Jang-Whan Bae, Van-Huy Pham, Nipon Theera-Umpon, and Keun Ho Ryu. Explainable artificial intelligence-based framework
for non-communicable diseases prediction. IEEE Access, 9:123672–123688, 2021.
[23] Vu, D. L., Nguyen, T. K., Nguyen, T. V., Nguyen, T. N., Massacci, F., & Phung, P. H. (2020). HIT4Mal: Hybrid image transformation for malware classification. Transactions on
Emerging Telecommunications Technologies, 31(11), e3789.
[24] Calderon-Ramirez, S., Yang, S., Moemeni, A., Colreavy-Donnelly, S., Elizondo, D.A., Oala, L., Rodríguez-Capitán, J., Jiménez-Navarro, M.,
López-Rubio, E. and Molina-Cabello, M.A., 2021. Improving Uncertainty Estimation with Semi-supervised Deep Learning for COVID-19 Detection
Using Chest X-ray Images. IEEE Access.
[25] Guo, Z., Shen, Y., Bashir, A. K., Imran, M., Kumar, N., Zhang, D., & Yu, K. (2020). Robust spammer detection using collaborative neural network in Internet-of-Things
applications. IEEE Internet of Things Journal, 8(12), 9549-9558.
[26] Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 1999.
[27] Kiran, P., Parameshachari, B. D., Yashwanth, J., & Bharath, K. N. (2021). Offline signature recognition using image processing techniques and back propagation neuron
network system. SN Computer Science, 2(3), 1-8.
[28] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning with Applications in R (Second Edition), 2021.
[29] Dash, R. K., Nguyen, T. N., Cengiz, K., & Sharma, A. (2021). Fine-tuned support vector regression model for stock predictions. Neural Computing and Applications, 1-15.
[30] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[31] Zhang, Q., Yu, K., Guo, Z., Garg, S., Rodrigues, J., Hassan, M. M., & Guizani, M. (2021). Graph Neural Networks-driven Traffic Forecasting for Connected Internet of
Vehicles. IEEE Transactions on Network Science and Engineering.
[32] Przemysław Biecek. DALEX: explainers for complex predictive models in R. The Journal of Machine Learning Research, 19(1):3245–3249, 2018.
[33] Yu, Y., 2020, August. The Application of Machine Learning Algorithms in Credit Card Default Prediction. In 2020 International Conference on
Computing and Data Science (CDS) (pp. 212-218). IEEE.
[34] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 785–794, 2016.
[35] Frauke Günther and Stefan Fritsch. neuralnet: Training of neural networks. The R Journal, 2(1):30, 2010.
