
Applied Soft Computing 111 (2021) 107668


A predictive intelligence system of credit scoring based on deep multiple kernel learning✩

Cheng-Feng Wu a,b,c, Shian-Chang Huang d, Chei-Chang Chiou e,∗, Yu-Min Wang f
a School of Business Administration, Hubei University of Economics, Wuhan, Hubei Province, China
b School of Business, Wuchang University of Technology, Wuhan, Hubei Province, China
c Research Center of Hubei Logistics Development, Hubei University of Economics, Wuhan, Hubei Province, China
d Department of Business Administration, National Changhua University of Education, Changhua, Taiwan
e Department of Accounting, National Changhua University of Education, Changhua, Taiwan
f Department of Information Management, National Chi Nan University, Puli, Taiwan

✩ Funding: Hubei University of Economics, grant numbers XJ201901, XJ201902, and 11024225; Research Center of Hubei Logistics Development.
∗ Corresponding author. E-mail address: ccchiou@cc.ncue.edu.tw (C.-C. Chiou).

Article info

Article history: Received 7 December 2020; Received in revised form 17 June 2021; Accepted 23 June 2021; Available online 7 July 2021

Keywords: Predictive intelligence; Deep learning; Deep multiple kernel classifier; Machine learning; Credit scoring

Abstract

Banks face the task of improving the accuracy in predicting the behavior of individuals who utilize credit cards, as issuing cards to an appropriate applicant is considered an important matter. Credit card debt that is overdue by at least six months has seen a rapid increase in the past decade in the Chinese credit card industry. The problem of delinquency in credit cards not only affects the development of credit cards but it also influences the sustainability of banks. The use of credit risk assessment is critical in providing a solid foundation upon which the credit card issuer can appropriately approve an applicant for a credit card. In previous studies, a variety of machine learning methods have been proposed to assess credit risk. However, conventional methods are viewed as shallow models and are not good at representing compositional features. Thus, this study applies a deep multiple kernel classifier as a state-of-the-art technique, which is proficient in coping with deep structure and complex data in credit risk assessment. It will support decision-makers issuing credit cards in China appropriately. The results indicate that the deep multiple kernel classifier outperforms conventional and ensemble models. Credit card departments with better risk management can avoid possible bad debt, hence benefiting banks' operations. The applications of predictive intelligence enhance the prediction of human behavior in the credit card industry.

© 2021 Elsevier B.V. All rights reserved. https://doi.org/10.1016/j.asoc.2021.107668

1. Introduction

Credit cards have become an increasingly important method of electronic payment for customers throughout the world. This is because they offer buyers and vendors several advantages. These include the use of cash or checks, the provision of reliable transaction documents, and documentation of a history of creditworthiness [1]. In business, providing transaction services through various payment modes can increase revenue, boost operational efficiency, and minimize operating costs [2]. In previous studies, forms of electronic payment such as credit card rewards have been found to increase transparency and accountability and promote economic growth [3–5].

Although the traditional attitude of "Save first, Spend later" towards money drives Chinese end users [6], the use of credit cards in China has been increasingly frequent in a vast number of transactions. To enhance economic growth after the 2008 recession, the Chinese government established a long-term strategy for improving domestic demand by diversifying the emphasis from exports and investment to domestic consumption [7]. The adoption of credit cards in consumer credit is playing an increasingly important role in China's future economic growth. Therefore, an assessment of credit before credit card approval is vital for the Chinese credit card industry.

In China, credit cards have developed rapidly with increasing residential income levels and growing domestic demand. The number of credit cards issued increased rapidly from 0.175 billion to 0.734 billion between the third quarter of 2009 and the third quarter of 2019, multiplying more than four times [8].


Fig. 1. Credit card debt and balance trends in China over the past decade.

However, as the number of credit card holders has increased, defaults on credit card payments have grown exponentially. Fig. 1 shows credit card debt and balance trends in China over the past decade. According to the statistics published by the Payment and Settlement Department of the People's Bank of China [8], the amount of credit card debt that is overdue by at least six months rose from 8.03 billion RMB at the end of the second quarter of 2009 to 91.92 billion RMB by the third quarter of 2019. This trend has been increasing sharply for several decades. In particular, the unpaid credit card balance rapidly increased by over 33 times, from 0.22 trillion RMB in the third quarter of 2009 to 7.42 trillion RMB in the third quarter of 2019. The problem of delinquency in credit cards only becomes apparent after credit cards are issued. This not only affects the development of credit cards but it also influences the sustainability of banks.

Credit risk assessment is currently critical in providing a solid foundation upon which the credit card issuer can approve an applicant for a credit card. Failure to evaluate risk effectively could result in debt collection, non-performing loans, and an increasing loss from bad debt, thus threatening the credit card industry [9–11]. In terms of regulation, banks have been encouraged to strengthen their internal model frameworks to achieve advanced internal accreditation under the Basel II Agreement [12,13]. Consequently, banks and financial institutions have begun to thoroughly consider their customers' credit risk and assess the efficiency of existing credit risk assessment models.

Because of the importance of credit risk (especially the risk of delinquency), previous studies have proposed a variety of data mining methods such as decision trees, logistic regression, discriminant analysis, and support vector machines (SVMs) [14]. The existing literature has also considered using artificial neural networks to assess credit risk [15,16]. Over the last decade, deep learning (DL) (also known as a deep neural network [DNN]) has emerged as a popular artificial intelligence technique that has been employed in a wide range of areas. It has performed well in healthcare and computer science, where decision-makers are unable to make accurate decisions based on complex but significant data [17,18]. However, in China, few studies have applied this novel approach to predict credit card delinquency. Moreover, what remains unknown is whether this novel approach is superior to other machine learning approaches in assessing credit card datasets in China. As business development in the current era becomes more complex, the complexity and diversity of data lead to an increase in the challenges that decision-makers in financial institutions face while conducting analyses that enable them to make robust decisions [19]. Therefore, applying artificial intelligence techniques, such as DL methods, will assist banks in assessing applicants' credit risks and approving credit cards appropriately.

Taking into account the previous background, the contribution of this paper is two-fold. First, cardholders' behavior in eastern and western countries differs, and only a limited number of research studies have examined Chinese datasets in the classification of "good credit" and "bad credit" applicants. This paper also focuses on assessing credit card delinquency in the second-largest economy in the world, where the unpaid credit card balance and the number of applicants have been increasing in recent years. Under the implementation of Basel II, decisions relating to card approval are crucial for banks. Second, this study applies a relatively novel method that is expected to provide a more accurate classification of card applicants, which will help decrease credit risks in the issuance of credit cards. This is because traditional statistical methods might show ineffective performance under complex data, where the relationship between features and the outcome is non-linear or the features show only non-monotonic characteristics in credit risk modeling. The weakness of traditional kernel methods is that they use only a single kernel to represent data. Recent research has shown that using multiple kernels can effectively enhance the richness of feature representation and improve performance. Hence, this study applies a deep multiple kernel classifier (DMKC) as a state-of-the-art technique that is proficient in coping with deep structure and complex data in credit risk assessment to support decision-makers in issuing credit cards in China appropriately.

The remainder of the paper is as follows: Section 2 reviews and classifies recent literature on credit risk models using conventional and DL methods. Section 3 presents information on kernel methods and SVMs. Section 4 introduces the deep multiple kernel classifier. Section 5 presents the experimental results. Conclusions are then drawn in Section 6.

2. Literature

Because there is a large amount of existing literature based on credit risk models, this section focuses on reviewing credit risk assessment studies utilizing conventional statistical and machine learning techniques such as SVMs, neural networks, logistic regression, discriminant analysis, and DL.

In the literature on SVM-based methods, Xu et al. [20] utilized a real-world credit dataset to test three link analysis algorithms based on the SVM's preprocessing. They found that methods for ranking genetic links perform best with regard to classification accuracy. Huang et al. [21] employed three methods to build hybrid SVM-based credit scoring models to determine each applicant's input credit score. Their results showed that SVMs work better than existing methods of data mining. Harris [22] compared clustered SVMs with other non-linear SVM-based techniques and found that the clustered SVM yields comparable results while remaining relatively cheap in computational terms. Ling and Zhang [23] proposed a new method based on optimization of credit scores: the SVM and chaos particle swarm. Four distinct multi-kernel functions were contrasted with one radial function kernel. The experimental results showed that the multi-kernel model is adequate for such a classification problem. Maldonado et al. [24] proposed two simultaneous classification and feature selection formulations based on SVMs that explicitly incorporate acquisition attributes. They also developed mixed-integer linear programming models for the construction of classifiers that reduce acquisition costs. The results of an experiment utilizing datasets for credit scoring demonstrated the efficient predictive reliability of the methods at a low cost, relative to well-known feature selection techniques. Luo and Wu [25] applied a new approach to the issue of credit scoring, involving DL algorithms. In their research, a deep belief network was implemented with restricted Boltzmann machines, and the classification performance was compared with multinomial logistic regression, multilayer perceptron, and SVM. The results showed that the deep belief network with restricted Boltzmann machines exceeded other algorithms. He et al. [26] then developed a three-phase ensemble model with particle swarm optimization-optimized parameters that adapted various imbalances to achieve excellent predictive performance. The results indicated that the average model performance was superior to comparative algorithms for different datasets on most evaluation measures. Luo et al. [27] proposed a new, uninitialized two-stage classification process. A new kernel-free quadratic surface SVM model was introduced to prevent selecting kernels and associated kernel parameters; a golden-sector algorithm was then developed to produce the required classification for balanced and imbalanced data.

With regard to neural networks, Hsieh and Hung [28] developed a model that classifies individual neural networks – the naïve Bayes and the SVM – using class bagging as a data optimization technique for better performance generalization. They found that the ensemble method significantly improved efficiency in comparison to traditional ensemble classifications. In another study, by Zekic-Susac et al. [29], logistic regression, neural network, and decision tree model methods were compared, which showed that the neural network performs best in prediction accuracy, followed by the decision tree and the logistic model. Chen and Huang [30] also developed a scoring model using neural networks. Subsequently, they used genetic algorithms to provide advanced knowledge of the class of applicants disqualified by conditional reclassification. Bekhet and Eletter [31] proposed two credit scoring models using data mining technologies to support Jordanian commercial banks' loan decisions. The results showed that the logistic regression model was superior to the radial basis function (RBF) model in terms of overall accuracy. By integrating mathematical and machine learning models, Tsai and Hung [32] analyzed four approaches and found that hybrid neural networks and neural network ensembles outperform single neural networks. Sohn and Kim [33] proposed a decision-tree-based credit model that identifies significant predictors of loan default for start-up businesses. They compared the performance of this model with various sets of predictors. The analysis showed that the proposed model was superior to the logistic regression model. Xia et al. [34] employed three real-life loan datasets and two peer-to-peer lending datasets to determine the efficiency of a newly proposed method that utilized extreme gradient boosting and Bayesian hyper-parameter optimization. The empirical results showed that Bayesian optimization performed better than the methods of random search, matrix search, and manual search. Siswantoro et al. [35] developed a neural network approach in a Kalman-filter-based linear model. They concluded that the proposed model boosted the efficiency of the neural network. Feng et al. [36] evaluated a new soft probability, a dynamic ensemble classification method based on ten real-world datasets, and found that the ability and efficiency of the proposed method can increase the performance of benchmark predictions. Xia et al. [37] examined a recent ensemble credit model's performance that involved six credit scoring datasets along with a bagging algorithm stacking process. They found that the performance of the stacking model exceeded that of human calculations and homogeneous ensemble benchmark models. Zhang et al. [38] evaluated seven classification methods with five different ensemble methods using three credit datasets in an ensemble model with a genetic algorithm. They concluded that the accuracy of the ensemble classifiers showed improvement following the implementation of genetic algorithms. Trinkle and Baldwin [39] reviewed previous studies and concluded that artificial neural networks help companies develop their credit assessment models, hence increasing their profitability.

Regarding the application of the logit model, Abid et al. [40] utilized 14 churn datasets to test the logit leaf model. They found that the current logit leaf model performs better than traditional predictive approaches. Zhu et al. [41] employed quarterly financial and non-financial data from listed companies to test three forms of a two-stage hybrid model composed of logistic regression and artificial neural network approaches. They concluded that, in terms of predicting the credit risk of small and medium-sized enterprises (SMEs), the two-stage hybrid model integrating a logistic regression and an artificial neural network performs better than the pure neural network method. Ghodselahi [42] tested an SVM-based hybrid model on a real-world German dataset using ten clustering and classification techniques. The results showed that the proposed ensemble model increases classification efficiency in the evaluation of credit risk. Khemakhem and Boujelbene [43] employed real-world data from 86 Tunisian companies to evaluate discriminant analysis and artificial neural network methods. They concluded that artificial neural networks perform better and help firms assess credit risks. Wang et al. [44] proposed a two-phase hybrid approach based on the filter methodology and a multiple population genetic algorithm to test two UCI database credit-scoring datasets. The results showed that the proposed model performs better than the multiple population genetic algorithm. Utilizing a Brazilian dataset, de Castro et al. [45] assessed various machine learning algorithms and found that the traditional ensemble models of bagging, boosting, and random forest were more effective than other individual classifiers.

DL [46], as the name suggests, involves stacking processing layers one atop the other. The deeper the architecture, the more layers it has. The intuition behind DL is derived from the compositional nature of natural stimuli such as speech and vision. Natural signals are highly compositional such that simple primitive features combine to form mid-level features and mid-level features combine to form high-level features. From biochemistry to genetics, life itself is based on such a compositional structure.

Thus, DL is concerned with learning increasingly more abstract representations in a layer-wise manner. Each layer feeds from the layer below and then sends its output to the layer above. In the case of vision applications, this process leads to neurons higher up the hierarchy becoming sensitive to particular complete objects or scenes.

DL is also useful for unsupervised feature learning. The underlying purpose of DL is to learn a hierarchy of features one level at a time, which is referred to as greedy layer-wise unsupervised pre-training. This process can be unsupervised, which means taking advantage of massive unlabeled data for learning purposes. The learning process begins by training each layer with unsupervised algorithms (e.g., K-means or restricted Boltzmann machines [RBMs]) and continues by taking the features produced at the previous level as input for the next layer. Finally, the extracted features can be used as input for a supervised classifier, as sketched below.

Previous methods generally suffer from two drawbacks: they rely on a small number of input variables and on a limited construction of features. The construction tends to involve a simple mathematical operation (not composition); that is, they are types of shallow models or shallow learning. The training of shallow models is simple, and they are applicable for only a few features. Thanks to several recently developed feature-generating methods (such as DNNs), the hierarchical representation of data is available for a large number of applications. Shallow learning techniques are, however, poor or inefficient in modeling complex and compositional relationships between inputs and outputs. Therefore, deep architecture is more popular and becomes essential when high-dimensional data is being processed.
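The paper does not include an implementation of this layer-wise procedure, so the following is only a minimal sketch of the idea, assuming Python with scikit-learn: each "layer" is a K-means model whose cluster distances re-encode the previous layer's features, and the final representation feeds a supervised classifier. The layer sizes, the function name, and the synthetic data are illustrative stand-ins, not details taken from the study.

```python
# Minimal sketch of greedy layer-wise unsupervised feature learning:
# each layer clusters the previous layer's features with K-means and
# re-encodes the data as distances to the cluster centres; the final
# representation feeds a supervised classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def greedy_layerwise_features(X, layer_sizes=(16, 8), random_state=0):
    """Stack K-means 'layers'; each maps data into cluster-distance space."""
    Z, layers = X, []
    for k in layer_sizes:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(Z)
        Z = km.transform(Z)          # distances to the k centres become new features
        layers.append(km)
    return Z, layers

# toy data standing in for the credit attributes (age, income, debts, ...)
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 8))
y = (X[:, 4] - 0.5 * X[:, 6] + rng.normal(scale=0.5, size=700) > 0).astype(int)

Z, _ = greedy_layerwise_features(X)
clf = LogisticRegression(max_iter=1000).fit(Z, y)   # supervised classifier on learned features
print("training accuracy:", clf.score(Z, y))
```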
With respect to DL techniques, Pandey [47] utilized a transaction dataset to evaluate DL using the H2O algorithm system. The results showed that the DL method accurately classifies fraudulent transactions. Galeshchuk and Mukherjee [48] employed foreign exchange evidence to test a deep convolutional neural network. They concluded that the proposed method is ideal for detecting the change in the direction of foreign exchange. Li et al. [49] evaluated approaches based on artificial intelligence, including neural networks, machine learning, and DL methods. They found that these methods perform differently in various datasets; thus, it is essential to find an appropriate approach for adapting specific data. Sun and Vasarhelyi [50] employed DNNs to forecast credit card delinquencies. They concluded that it is worthwhile utilizing artificial intelligence as a method for evaluating credit risk in companies. Wang et al. [51] applied a DL algorithm to online operation behavior data to assess default risk. They concluded that, compared with the traditional artificial feature extraction method, the proposed algorithm significantly boosted predictive precision. Yu et al. [52] proposed a method of forecasting credit risk evaluation using an integrated learning method composed of an extreme learning machine based on a multilevel deep trust network. They concluded that the proposed ensemble learning machine is more efficient than other conventional ensemble approaches. A novel approach, deep genetic cascade ensembles of classifiers, consisting of two types of SVM classifiers, is proposed in a study by Pławiak et al. [53]. The novel model is used in the banking system to evaluate the UCI database for the bank credits of applicants. The system performs better in classification accuracy than other traditional ensembles. To gain a more comprehensive level of knowledge, Bastani et al. [54] evaluated the profitability of a credit-scoring issue as opposed to only evaluating the risk of default. The internal rate of return is used as a proxy variable to evaluate the profitability of the peer-to-peer market. Their study conducted a two-stage approach to assess the probability of default together with the profitability, applying a wide and deep learning strategy. The proposed method outperforms the existing approaches.

Conversely, kernel methods [55] are effective in learning a complex decision boundary with only a few parameters. Kernel methods have, thus, become popular and successful in numerous applications. Such methods project the data onto a high-dimensional reproducing kernel Hilbert space, in which simple linear classifiers can be applied to separate data. However, traditional kernel methods' weakness is that they use only a single kernel to represent data. Recent research has shown that using multiple kernels can effectively enhance the richness of feature representation and improve performance [56]. The objective of multiple kernel learning (MKL) [57] is to learn the kernel from training data. However, MKL remains a shallow model and is not good at representing compositional features. Several researchers have attempted to combine the advantages of both kernel methods and DL. In so doing, the modified kernel method has become a new version called deep kernel learning. Cho et al. [58] attempted to optimize an arc-cosine kernel and successfully integrated this into a deep architecture. However, in this method, it is difficult to tune parameters beyond the first layer. In a study conducted by Zhuang et al. [59], an attempt was made to tune a combination of kernels. However, optimizing the network beyond two layers proved difficult. The most important weakness was that the second layer consisted only of a single Gaussian RBF kernel. To address these problems, Strobl and Visweswaran [60] developed a new method capable of optimizing multiple complete layers of kernels. Their method increased generalization performance on several types of datasets.

Machine learning models concentrate on predictive precision and need limited assumptions regarding the characteristics of the data generation process. This feature allows researchers to observe the data-driven interactions and non-linear relationships between the explanatory variables and the outcome variable. Having reviewed a wealth of studies that have applied statistical and machine learning techniques, we conclude that it is difficult to determine the best method for assessing prediction across different datasets. Indeed, Abdou and Pointon [61] have indicated that no overall best method for evaluating credit scoring is available. However, prior studies have indicated that machine learning methods perform better than conventional statistical methods when the relationships between indicators and the outcome are non-linear or non-monotonic in credit risk modeling [62]. DL, as an innovative machine learning method, is rarely used in China to predict credit card delinquency, which is a major issue in the current banking industry, and the future credit card industry in China offers fruitful business opportunities.

3. Previous research

3.1. Kernel methods and support vector machines

The basic principle underlying kernel methods is to map data from an original input space to a high-dimensional transformed space, where simple linear classifiers are sufficient to separate data. SVMs, a type of kernel method, were first proposed by Vapnik [55]. SVM classifiers construct a hyperplane in the high-dimensional transformed space to separate two classes (labeled y ∈ {−1, 1}) so that the margin (the distance between the decision hyperplane and the nearest point) is maximal. This is based on the structural risk minimization (SRM) principle. Instead of minimizing the empirical error, the advantage of SVMs is that they minimize an upper bound of the generalization (or SRM) error. Minimizing the empirical error is usually the case in other neural networks. The SVM classification function is formulated as follows:

y = sign(w^T φ(x) + b),  (1)

where φ(x) is the feature function, which denotes the nonlinear mapping used to map data from the input space x to the feature space.

The coefficients w and b are estimated based on the generalization error using the following optimization problem:

min_{w,b} R(w, ξ) = (1/2)‖w‖^2 + C ∑_{i=1}^{l} ξ_i,  (2)

with

y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  i = 1, …, l,  (3)
ξ_i ≥ 0,  i = 1, …, l,  (4)

where C is a prescribed parameter used to determine the trade-off between the empirical risk and model smoothness (a smooth and simple function is the best choice).

Based on convex analysis, we can take the Lagrangian function and pose an optimality condition for the above problem. The dual of this problem can be formulated as follows:

max_α D(α) = ∑_{i=1}^{l} α_i − (1/2) ∑_{i,j=1}^{l} y_i y_j α_i α_j K(x_i, x_j),  (5)

with constraints

0 ≤ α_i ≤ C,  i = 1, …, l,  (6)
∑_{i=1}^{l} α_i y_i = 0,  (7)

where the α_i are Lagrangian multipliers and K(x_i, x_j) is the kernel function. The solution of b follows from the complementarity Karush–Kuhn–Tucker (KKT) conditions. Finally, the classification function is given by

f(x) = sign(∑_{i=1}^{l} α_i y_i K(x, x_i) + b).  (8)

The most important element is the kernel function, which is equal to the inner product of two vectors x and x_i in the feature space, namely, K(x, x_i) = φ(x) · φ(x_i). To be a kernel function, it must satisfy Mercer's condition. For details, please refer to Vapnik [55].
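As a concrete illustration of Eqs. (1)–(8), the short sketch below trains a kernel SVM with scikit-learn, whose SVC solves the dual problem (5)–(7) internally; C is the trade-off parameter of Eq. (2) and the RBF kernel plays the role of K(x, x_i). The synthetic data and attribute count are hypothetical stand-ins for the scaled credit attributes, not the study's dataset.

```python
# Minimal sketch of the kernel SVM of Eqs. (1)-(8) using scikit-learn's SVC,
# which solves the dual problem (5)-(7) internally.  C is the trade-off
# parameter of Eq. (2); gamma parameterizes the RBF kernel K(x, x_i).
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(700, 8))                          # 8 applicant attributes (synthetic)
y = np.where(X[:, 0] * X[:, 1] + X[:, 2] > 0, 1, -1)   # labels y in {-1, +1}

X = StandardScaler().fit_transform(X)                  # kernels are scale sensitive
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# decision_function returns sum_i alpha_i y_i K(x, x_i) + b; its sign is Eq. (8)
scores = svm.decision_function(X[:5])
print(np.sign(scores), svm.predict(X[:5]))
```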
4. Deep Multiple Kernel Classifier (DMKC)

The most critical issue in kernel methods is data representation, which is implicitly chosen through the so-called kernel. Recent applications have shown that using multiple kernels rather than one can enhance the richness of feature representation and improve performance [56]. Recently, Strobl and Visweswaran [60] proposed a deep version of the multiple kernel classifier. They combined kernels at each layer and successfully optimized the DL structure. Their experiments on many datasets have shown that each layer can increase performance with just a few base kernels.

The role of a kernel within kernel methods is to compute the similarity between two inputs. Traditionally, the dot product of its two basic functions can be used to describe a kernel,

K^(1)(x, y) = Φ^(1)(x) · Φ^(1)(y),  (9)

where K^(1)(x, y) represents a first-layer kernel. In a DL (or compositional) structure, a kernel within a kernel can be viewed as basic functions composed within basic functions; for example, for an l-layer composition,

K^(l)(x, y) = (Φ^(l)(⋯ Φ^(1)(x))) · (Φ^(l)(⋯ Φ^(1)(y))).  (10)

For example, as for the polynomial kernel, a polynomial of higher order can be formed by the following operation:

K^(1)(x, y) = (α (x · y) + β)^γ,  (11)
K^(2)(x, y) = (α K^(1)(x, y) + β)^γ,  (12)

where α, β, γ stand for the free parameters of the polynomial kernel. Similarly, the K^(2) of a Gaussian RBF kernel is

K^(2) = Φ^(2)(Φ^(1)(x)) · Φ^(2)(Φ^(1)(y))  (13)
      = e^{−2γ(1 − K^(1)(x, y))}.  (14)

Using basic functions, kernels can be designed to create different representations. To increase representation power (or richness) in a DL structure, we can stack two kernels of different types; this develops a representation different from either alone. The most important aspect to consider is that a single kernel does not easily approximate the richer representations of multiple kernels.

However, how can we measure the richness of a data representation? Strobl and Visweswaran [60] proposed a method to analyze the richness/complexity of a kernel based on its pseudo-dimension. More specifically, they measured richness/complexity by the upper bound of the second-order Rademacher chaos complexity [63]. Using their measure, multiple layers (or a deep architecture) can increase the representation richness of kernels. However, the increased richness of the kernels can also increase the risk of over-fitting. Nevertheless, compared with traditional DNNs, Strobl and Visweswaran [60] demonstrated that the upper bound of the generalization error for the DMKC is significantly less than that of deep feedforward networks.

The architecture of the DMKC used in this study is an l-level multiple kernel architecture with h sets of m kernels at each layer:

K^(l) = W^(l)_{1,1} K^(l)_{1,1}(K^(l−1)) + ⋯ + W^(l)_{h,m} K^(l)_{h,m}(K^(l−1)),  (15)

where K^(l)_{h,m} represents the mth kernel in set h at layer l with an associated weight parameter W^(l)_{h,m}, and K^(l) represents the single combined kernel at layer l.

5. Experimental results

5.1. Data sources and sampling

The sample comprised a Chinese credit dataset obtained from the big data platform of machine learning databases (www.cfdsj.cn). It contained financial information on a large number of customers of an anonymized bank. Of these, 517 were creditworthy cardholders and 183 exhibited credit card delinquency. This dataset included one categorical attribute, education, and seven numerical attributes: age, seniority, duration of residence, income, debt-to-income ratio, credit card debt, and other debt.

5.2. Experimental results

Customer credit ratings provide important information on credit risk for banks in financial markets. DL was, therefore, applied to create a novel rating prediction system. The performance of the system was then examined in relation to the sample dataset.

Descriptive statistics for the data are listed in Table 1 for reference. In the first stage, we tested several traditional models used to forecast credit ratings, namely, multilayer perceptron (MP), Bayesian network (BayesNet), SVM, nearest neighbors classifier (IBK), logistic regression, and decision tree (J48). In the second stage, we compared several advanced ensemble models, namely, AdaBoost M1, bagging, stacking, random committee, and random subspace. The base classifiers of these ensemble methods were set up as a decision tree (J48), except for the random committee, which used a random tree as its base classifier. The dataset was randomly divided into ten parts, and ten-fold cross-validation was applied to evaluate the models' performance.

Fig. 2. The performance comparison of traditional models.

Fig. 3. The performance comparison of advanced models.

Table 1
Descriptive statistics of input variables.

|          | Age      | Education | Seniority | Duration of residence | Income   | Debt-to-income ratio | Credit card debt | Other debt |
| Min.     | 20       | 1         | 0         | 0                     | 14       | 0.4                  | 0.011696         | 0.045584   |
| Max.     | 56       | 5         | 31        | 34                    | 446      | 41.3                 | 20.56131         | 27.0336    |
| Median   | 34       | 1         | 7         | 7                     | 34       | 8.6                  | 0.85487          | 1.987568   |
| Mode     | 29       | 1         | 0         | 2                     | 25       | 4.5                  | 0.085785         | 7.8234     |
| Range    | 36       | 4         | 31        | 34                    | 432      | 40.9                 | 20.54961         | 26.98802   |
| Std.     | 7.997342 | 0.928206  | 6.658039  | 6.824877              | 36.81423 | 6.827234             | 2.117197         | 3.287555   |
| Skewness | 0.360703 | 1.198744  | 0.829371  | 0.936086              | 3.850475 | 1.093713             | 3.890258         | 2.722314   |
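The paper reports WEKA-style model names (BayesNet, IBK, J48) but does not name the toolkit, so the sketch below only reproduces the ten-fold cross-validation protocol described above with rough scikit-learn equivalents. The model mapping and the synthetic data are assumptions for illustration; in practice the real attributes and delinquency labels from the www.cfdsj.cn dataset would replace them.

```python
# Minimal sketch of the ten-fold cross-validation comparison of the
# traditional classifiers.  The WEKA-style names used in the paper
# (MP, BayesNet, IBK, J48) are mapped to rough scikit-learn equivalents;
# the synthetic data stands in for the 700-applicant credit dataset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(700, 8))                      # 8 applicant attributes (synthetic)
y = (X[:, 4] - X[:, 6] + rng.normal(scale=0.8, size=700) > 0).astype(int)

models = {
    "MP (multilayer perceptron)": MLPClassifier(max_iter=2000),
    "BayesNet (naive Bayes stand-in)": GaussianNB(),
    "Logistic": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "IBK (3 neighbors)": KNeighborsClassifier(n_neighbors=3),
    "J48 (decision tree)": DecisionTreeClassifier(),
}

for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    acc = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")  # ten-fold CV
    print(f"{name}: {acc.mean():.2%} (+/- {acc.std():.2%})")
```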

Traditionally, multiple kernel learning algorithms have used RBF and polynomial kernels. Since our objective is to maximize the upper bound of the pseudo-dimension of the final kernel (which will increase its richness with each successive layer), this study chooses not to use these traditional kernels. This study used four unique base kernels: a linear kernel, an RBF kernel, a polynomial kernel of degree 2, and a polynomial kernel of degree 3. We used one set of kernels for each layer and built a 5-layer architecture for the DMKC. This study used a gradient descent algorithm to train the DMKC.
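The paper does not publish code for this construction, so the following is only a minimal sketch, under stated assumptions, of how the layered combination of Eq. (15) could be assembled: the four base kernels listed above are applied to the previous layer's combined Gram matrix and summed, and the resulting deep kernel is passed to a precomputed-kernel SVM. Uniform weights stand in for the weights that the study trains by gradient descent, and the normalization step is added only for numerical stability.

```python
# Minimal sketch (not the authors' implementation) of the layered kernel
# combination in Eq. (15): at every layer the four base kernels (linear,
# RBF, polynomial of degree 2 and 3) are applied to the previous layer's
# combined Gram matrix and summed with weights W.
import numpy as np
from sklearn.svm import SVC

def base_kernels(G, gamma=0.5):
    """Apply the four base kernels to a Gram-like matrix G."""
    d2 = np.clip(np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G, 0.0, None)
    return [G,                        # linear
            np.exp(-gamma * d2),      # RBF on the induced distances
            (G + 1.0) ** 2,           # polynomial, degree 2
            (G + 1.0) ** 3]           # polynomial, degree 3

def dmkc_kernel(X, n_layers=5, weights=None):
    """Combined kernel K^(l) after n_layers of weighted multiple-kernel stacking."""
    K = X @ X.T                                    # layer-0 representation (linear Gram)
    for _ in range(n_layers):
        Ks = base_kernels(K)
        w = weights if weights is not None else np.full(len(Ks), 1.0 / len(Ks))
        K = sum(wi * Ki for wi, Ki in zip(w, Ks))  # weighted sum as in Eq. (15), fixed weights
        K = K / np.max(np.abs(K))                  # keep values numerically stable
    return K

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 8))
y = np.where(X[:, 0] + X[:, 3] ** 2 > 1, 1, -1)

K = dmkc_kernel(X)
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)   # classify with the deep kernel
print("training accuracy:", clf.score(K, y))
```

In the actual DMKC the weights W would be updated by gradient descent on a classification objective rather than held fixed; the sketch only shows how the kernel stack is composed and consumed by an SVM.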
Table 2
Performance comparison of traditional models.

| BayesNet | MP | Logistic | SVM | IBK (with 3 neighbors) | J48 |
| 75.14% | 77.29% | 80.43% | 79.29% | 77.14% | 75.71% |

Note: BayesNet denotes Bayesian network; MP denotes multilayer perceptron; Logistic denotes logistic regression; SVM denotes support vector machine; IBK denotes nearest neighbors classifier; J48 denotes decision tree.

The accuracies (in %) of the traditional models are presented in Table 2 and Fig. 2. The best was logistic regression, followed by SVM. These were better than the other classifiers. The poorest in terms of performance was the BayesNet. We then compared several advanced ensemble models with our DMKC. The comparative performance of these advanced models is presented in Table 3 and Fig. 3, which show that our DMKC outperforms these ensemble models. Although ensemble models usually perform well in many applications and are robust, they do not appear to be very effective in this case. This may be attributed to the fact that although ensemble methods are powerful, their base classifiers are shallow models and cannot take full advantage of ensemble methods. If we utilize deep models as the base classifiers for an ensemble model, the computation required will be extreme, which is not affordable on a general computation platform.

Table 3
Performance comparison of advanced models.

| DMKC | AdaBoostM1 | Bagging | Stacking | RandomCommittee | RandomSubspace |
| 82.86% | 75% | 77.5714% | 73.8571% | 74.71% | 77.43% |

Note: DMKC denotes deep multiple kernel classifier.

Nonlinear kernel methods are prevalent in financial data mining. The kernel is effective at discovering nonlinear patterns in data. Customer credit data is usually complex and has a deep structure. Traditional kernel methods cannot fully capture or represent complex, compositional, and hierarchical data features. Consequently, this study applied the DMKC to handle customer credit data. Compared with the ensemble and traditional methods, the DMKC exhibited the best performance. The computational loading was not as heavy as that of an ensemble model with deep base classifiers. Therefore, the framework proposed by this study is both effective and efficient for this type of customer credit data.

6. Conclusion

Credit risk assessment is a widely employed method that helps credit card departments decide whether to approve credit card applicants. Machine learning techniques are utilized in credit risk assessment as customer credit data is complex and has a deep structure. A DL method such as the DMKC employed in this study performs better than other traditional statistical methods when certain essential assumptions cannot be satisfied for complex data.

This experiment shows that the deep kernel-based method in machine learning is relatively effective on this Chinese dataset compared to traditional statistical approaches such as BayesNet, Logistic, IBK, and J48. This finding contributes knowledge on Chinese applicants to the literature on credit risk assessment for credit card approval, a field in which datasets from Western countries, such as the UCI Machine Learning Repository, have been applied in the vast majority of studies. The results in this paper are similar to those of Li et al. [64], who focused on Australian and German credit card datasets and found that the deep kernel-based method performs better than traditional statistical approaches. Regarding deep multiple kernel learning, Huang and Wu [65] indicated that the deep kernel-based method is more effective in forecasting energy prices. Additionally, testing on 12 real-world databases, Rebai et al. [66] reported that the deep kernel-based method achieves better classification accuracy. Finally, this study provides evidence to show that the DMKC performs better than traditional shallow learning methods, as customer credit data are complex with a deep structure. The effectiveness of the DMKC is derived from the fact that it combines multiple kernels within each layer to increase the richness of representations and stacks various layers to process a signal in an increasingly abstract manner.

The better the accuracy of classification, the lower the risk of operations in the credit card business. Banks using DL methods can adjust the number of credit lines for customers holding credit cards who are probably going into delinquency. Credit card departments with better risk management can avoid possible bad debt, which will benefit banks' operations [67], especially those in the world's second-largest economy, where the unpaid credit card balance has rapidly increased by over 33 times in recent decades. The strength of DL shows that it can contribute more to financial institutions as the available data become large and complicated. In addition, an efficient model is expected to use powerful algorithms to assess credit scoring automatically [50]. Similarly, Blumenstock et al. [68] examined machine learning techniques with DL methods for survival analysis in a credit risk modeling framework. The findings offer clear proof that DL models generate more reliable performance on default and prepayment risk than conventional models. Based on this observation, analysts assessing the sustainability of a business could regard DL methods as standards for validating other frameworks such as the International Financial Reporting Standard 9 model.

Thus, constructing a better credit risk assessment model to classify good and bad credit card holders is an important and essential task for bank managers and policy-makers in China. This research's future work is to build a hybrid classification process focused on further features to assess a credit risk model. In addition, large-scale applications to other credit risk datasets are expected in the future.

CRediT authorship contribution statement

Cheng-Feng Wu: Research topic, Data collection, Developed implications, Writing. Shian-Chang Huang: Research model and statistical analysis, Developed implications, Writing. Chei-Chang Chiou: Statistical analysis, Edited this manuscript, Developed implications, Responsible for correspondence. Yu-Min Wang: Statistical analysis, Developed implications.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] J.B. Soll, R.L. Keeney, R.P. Larrick, Consumer misunderstanding of credit card use, payments, and debt: Causes and solutions, J. Publ. Policy Mark. 32 (1) (2013) 66–81.
[2] S.C. Alliance, Contactless Payment and the Retail Point of Sale: Applications, Technologies and Transaction Models, Smart Card Alliance, Princeton Junction, NJ, 2003.
[3] O.S. Oyewole, J.G. El-Maude, M. Abba, M.E. Onuh, Electronic payment system and economic growth: a review of transition to cashless economy in Nigeria, Int. J. Sci. Eng. Technol. 2 (2013) 913–918.
[4] M. Zandi, V. Singh, J. Irving, The impact of electronic payments on economic growth, Moody's Analytics: Economic and Consumer Credit Analytics 217 (2) (2013), https://www.visa.co.in/content/dam/vcom/download/corporate/media/moodys-economy-white-paper-feb-2013.pdf.
[5] E.G. Mieseigha, U.K. Ogbodo, An empirical analysis of the benefits of cashless economy on Nigeria's economic development, J. Finance Account. 4 (2013) 11–16.
[6] L. Wang, W. Lu, N.K. Malhotra, Demographics, attitude, personality and credit card features correlate with credit card debt: a view from China, J. Econ. Psychol. 32 (1) (2011) 179–193.
[7] N. Ding, Consumer credits and economic growth in China, Chin. Econ. 48 (4) (2015) 269–278.
[8] Payment and Settlement Department of the People's Bank of China, China payment system, 2020, http://www.pbc.gov.cn/zhifujiesuansi/128525/128545/128643/17694/index1.html.
[9] B. Twala, Multiple classifier application to credit risk assessment, Expert Syst. Appl. 37 (4) (2010) 3326–3336.
[10] S.C. Chen, M.Y. Huang, Constructing credit auditing and control & management model with data mining technique, Expert Syst. Appl. 38 (5) (2011) 5359–5365.
[11] S.C. Huang, C.F. Wu, Customer credit quality assessments using data mining methods for banking industries, Afr. J. Bus. Manage. 5 (11) (2011) 4438–4445.
[12] BCBS, International Convergence of Capital Measurement and Capital Standards: A Revised Framework - Comprehensive Version, Bank for International Settlements, 2006.

[13] BIS, Implementation of Basel II: Practical Considerations, Bank for International Settlements, 2004.
[14] A.I. Marques, V. Garcia, J.S. Sanchez, Exploring the behaviour of base classifiers in credit scoring ensembles, Expert Syst. Appl. 39 (11) (2012) 10244–10250.
[15] H.C. Koh, K.L.G. Chan, Data mining and customer relationship marketing in the banking industry, Singapore Manage. Rev. 24 (2) (2002) 1–27.
[16] L.C. Thomas, A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers, Int. J. Forecast. 16 (2) (2000) 149–172.
[17] P. Hamet, J. Tremblay, Artificial intelligence in medicine, Metabolism 69S (2017) S36–S40.
[18] C. Ohlsson, Exploring the Potential of Machine Learning: How Machine Learning Can Support Financial Risk Management (Master's thesis), Uppsala University, 2017.
[19] E. Turban, J.E. Aronson, T.-P. Liang, R. McCarthy, in: A.K. Ghosh (Ed.), Decision Support Systems and Intelligent Systems, seventh ed., Prentice-Hall of India, New Delhi, India, 2005.
[20] X. Xu, C. Zhou, Z. Wang, Credit scoring algorithm based on link analysis ranking with support vector machine, Expert Syst. Appl. 36 (2) (2009) 2625–2632.
[21] C.L. Huang, M.C. Chen, C.J. Wang, Credit scoring with a data mining approach based on support vector machines, Expert Syst. Appl. 33 (4) (2007) 847–856.
[22] T. Harris, Credit scoring using the clustered support vector machine, Expert Syst. Appl. 42 (2) (2015) 741–750.
[23] Y. Ling, Q. Cao, H. Zhang, Credit scoring using multi-kernel support vector machine and chaos particle swarm optimization, Int. J. Comput. Intell. Appl. 11 (03) (2012) 1250019.
[24] S. Maldonado, J. Pérez, C. Bravo, Cost-based feature selection for Support Vector Machines: An application in credit scoring, European J. Oper. Res. 261 (2) (2017) 656–665.
[25] C. Luo, D. Wu, D. Wu, A deep learning approach for credit scoring using credit default swaps, Eng. Appl. Artif. Intell. 65 (2017) 465–470.
[26] H. He, W. Zhang, S. Zhang, A novel ensemble method for credit scoring: Adaption of different imbalance ratios, Expert Syst. Appl. 98 (2018) 105–117.
[27] J. Luo, X. Yan, Y. Tian, Unsupervised quadratic surface support vector machine with application to credit risk assessment, European J. Oper. Res. 280 (3) (2020) 1008–1017.
[28] N.C. Hsieh, L.P. Hung, A data driven ensemble classifier for credit scoring analysis, Expert Syst. Appl. 37 (1) (2010) 534–545.
[29] M. Zekic-Susac, N. Sarlija, M. Bensic, Small business credit scoring: a comparison of logistic regression, neural network, and decision tree models, in: 26th International Conference on Information Technology Interfaces, IEEE, 2004, pp. 265–270.
[30] M.C. Chen, S.H. Huang, Credit scoring and rejected instances reassigning through evolutionary computation techniques, Expert Syst. Appl. 24 (4) (2003) 433–441.
[31] H.A. Bekhet, S.F.K. Eletter, Credit risk assessment model for Jordanian commercial banks: neural scoring approach, Rev. Dev. Finance 4 (1) (2014) 20–28.
[32] C.F. Tsai, C. Hung, Modeling credit scoring using neural network ensembles, Kybernetes 43 (7) (2014) 1114–1123.
[33] S.Y. Sohn, J.W. Kim, Decision tree-based technology credit scoring for start-up firms: Korean case, Expert Syst. Appl. 39 (4) (2012) 4007–4012.
[34] Y. Xia, C. Liu, Y. Li, N. Liu, A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring, Expert Syst. Appl. 78 (2017) 225–241.
[35] J. Siswantoro, A.S. Prabuwono, A. Abdullah, B. Idrus, A linear model based on Kalman filter for improving neural network classification performance, Expert Syst. Appl. 49 (2016) 112–122.
[36] X. Feng, Z. Xiao, B. Zhong, J. Qiu, Y. Dong, Dynamic ensemble classification for credit scoring using soft probability, Appl. Soft Comput. 65 (2018) 139–151.
[37] Y. Xia, C. Liu, B. Da, F. Xie, A novel heterogeneous ensemble credit scoring model based on bstacking approach, Expert Syst. Appl. 93 (2018) 182–199.
[38] H. Zhang, H. He, W. Zhang, Classifier selection and clustering with fuzzy assignment in ensemble model for credit scoring, Neurocomputing 316 (2018) 210–221.
[39] B.S. Trinkle, A.A. Baldwin, Research opportunities for neural networks: the case for credit, Intell. Syst. Account. Finance Manage. 23 (3) (2016) 240–254.
[40] L. Abid, A. Masmoudi, S. Zouari-Ghorbel, The consumer loan's payment default predictive model: an application of the logistic regression and the discriminant analysis in a Tunisian commercial bank, J. Knowl. Econ. 9 (3) (2018) 948–962.
[41] Y. Zhu, C. Xie, B. Sun, G.J. Wang, X.G. Yan, Predicting China's SME credit risk in supply chain financing by logistic regression, artificial neural network and hybrid models, Sustainability 8 (5) (2016) 433.
[42] A. Ghodselahi, A hybrid support vector machine ensemble model for credit scoring, Int. J. Comput. Appl. 17 (5) (2011) 1–5.
[43] S. Khemakhem, Y. Boujelbene, Credit risk prediction: A comparative study between discriminant analysis and the neural network approach, Account. Manage. Inf. Syst. 14 (1) (2015) 60.
[44] D. Wang, Z. Zhang, R. Bai, Y. Mao, A hybrid system with filter approach and multiple population genetic algorithm for feature selection in credit scoring, J. Comput. Appl. Math. 329 (2018) 307–321.
[45] J.R. de Castro Vieira, F. Barboza, V.A. Sobreiro, H. Kimura, Machine learning models for credit analysis improvements: Predicting low-income families' default, Appl. Soft Comput. 83 (2019) 105640.
[46] Y. Bengio, Y. LeCun, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[47] Y. Pandey, Credit card fraud detection using deep learning, Int. J. Adv. Res. Comput. Sci. 8 (5) (2017).
[48] S. Galeshchuk, S. Mukherjee, Deep networks for predicting direction of change in foreign exchange rates, Intell. Syst. Account. Finance Manage. 24 (4) (2017) 100–110.
[49] Y. Li, W. Jiang, L. Yang, T. Wu, On neural networks and learning systems for business computing, Neurocomputing 275 (2018) 1150–1159.
[50] T. Sun, M.A. Vasarhelyi, Predicting credit card delinquencies: An application of deep neural networks, Intell. Syst. Account. Finance Manage. 25 (4) (2018) 174–189.
[51] C. Wang, D. Han, Q. Liu, S. Luo, A deep learning approach for credit scoring of peer-to-peer lending using attention mechanism LSTM, IEEE Access 7 (2018) 2161–2168.
[52] L. Yu, Z. Yang, L. Tang, A novel multistage deep belief network based extreme learning machine ensemble learning paradigm for credit risk assessment, Flexible Serv. Manuf. J. 28 (4) (2016) 576–592.
[53] P. Pławiak, M. Abdar, U.R. Acharya, Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring, Appl. Soft Comput. 84 (2019) 105740.
[54] K. Bastani, E. Asgari, H. Namavari, Wide and deep learning for peer-to-peer lending, Expert Syst. Appl. 134 (2019) 209–224.
[55] V.N. Vapnik, The Nature of Statistical Learning Theory, second ed., Springer, New York, 1999.
[56] M. Varma, B.R. Babu, More generality in efficient multiple kernel learning, in: Proceedings of the International Conference on Machine Learning, Montreal, Canada, 2009, pp. 1065–1072.
[57] G.R.G. Lanckriet, N. Cristianini, L.E. Ghaoui, P. Bartlett, M.I. Jordan, Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res. 5 (2004) 27–72.
[58] Y. Cho, S.K. Saul, Kernel methods for deep learning, Adv. Neural Inf. Process. Syst. 22 (2009) 342–350.
[59] J. Zhuang, I.W. Tsang, S.C.H. Choi, Two-layer multiple kernel learning, in: Proceedings of International Conference on Artificial Intelligence and Statistics, 2011.
[60] E.V. Strobl, S. Visweswaran, Deep multiple kernel learning, in: 2013 12th International Conference on Machine Learning and Applications, Vol. 1, IEEE, 2013, pp. 414–417.
[61] H.A. Abdou, J. Pointon, Credit scoring, statistical techniques and evaluation criteria: a review of the literature, Intell. Syst. Account. Finance Manage. 18 (2–3) (2011) 59–88.
[62] M. Moscatelli, F. Parlapiano, S. Narizzano, G. Viggiano, Corporate default forecasting with machine learning, Expert Syst. Appl. 161 (2020) 113567.
[63] Y. Ying, C. Campbell, Rademacher chaos complexities for learning the kernel, Neural Comput. 22 (2010) 2858–2886.
[64] J. Li, L. Wei, G. Li, W. Xu, An evolution strategy-based multiple kernels multi-criteria programming approach: The case of credit decision making, Decis. Support Syst. 51 (2) (2011) 292–298.
[65] S.C. Huang, C.F. Wu, Energy commodity price forecasting with deep multiple kernel learning, Energies 11 (11) (2018) 3029.
[66] I. Rebai, Y. BenAyed, W. Mahdi, Deep multilayer multiple kernel learning, Neural Comput. Appl. 27 (8) (2016) 2305–2314.
[67] F. Butaru, Q. Chen, B. Clark, S. Das, A.W. Lo, A. Siddique, Risk and risk management in the credit card industry, J. Bank. Financ. 72 (2016) 218–239.
[68] G. Blumenstock, S. Lessmann, H.V. Seow, Deep learning for survival and competing risk modelling, J. Oper. Res. Soc. (2020) 1–13.
