

Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Sihem Khemakhem
Faculty of Economics and Management of Sfax, Sfax University, Tunisia
Fatma Ben Said
National Engineering School of Sfax (ENIS), Sfax University, Tunisia, and
Younes Boujelbene
Faculty of Economics and Management of Sfax, Sfax University, Tunisia

Received 5 January 2017
Revised 6 April 2017, 20 July 2017
Accepted 22 July 2017

Abstract
Purpose – Credit scoring datasets are generally unbalanced. The number of repaid loans is higher than that
of defaulted ones. Therefore, the classification of these data is biased toward the majority class, which
practically means that it tends to attribute a mistaken “good borrower” status even to “very risky borrowers”.
In addition to the use of statistics and machine learning classifiers, this paper aims to explore the relevance
and performance of sampling models combined with statistical prediction and artificial intelligence
techniques to predict and quantify the default probability based on real-world credit data.
Design/methodology/approach – A real database from a Tunisian commercial bank was used and
unbalanced data issues were addressed by the random over-sampling (ROS) and synthetic minority over-
sampling technique (SMOTE). Performance was evaluated in terms of the confusion matrix and the receiver
operating characteristic curve.
Findings – The results indicated that the combination of intelligent and statistical techniques and re-
sampling approaches are promising for the default rate management and provide accurate credit risk
estimates.
Originality/value – This paper empirically investigates the effectiveness of ROS and SMOTE in
combination with logistic regression, artificial neural networks and support vector machines. The authors
address the role of sampling strategies in the Tunisian credit market and its impact on credit risk. These
sampling strategies may help financial institutions to reduce the erroneous classification costs in comparison
with the unbalanced original data and may serve as a means for improving the bank’s performance and
competitiveness.
Keywords Artificial intelligence, Data mining, Credit scoring, Credit risk, Unbalanced data
Paper type Research paper

1. Introduction
Credit risk, also called counterparty risk, is the first risk that a bank has to bear. The credit
risk for banks is the probability that the borrower fails to repay debt, in part or in full, within
the deadline and under the terms specified in the credit agreement (Apostolik et al., 2009).
Journal of Modelling in Management
Vol. 13 No. 4, 2018
pp. 932-951
© Emerald Publishing Limited
1746-5664
DOI 10.1108/JM2-01-2017-0002

The authors are grateful to Kamel MAALOUL, translator and English professor, for having proofread the manuscript.
Therefore, effective management seems essential for the long-term survival of banks and the stability of the whole financial system.
Tunisia is one of the countries highly affected by the problem of credit risk. Indeed, according to the statistics of the World Bank (2015), the non-performing loan rate among Tunisian banks rose from 13.2 per cent in 2009 to 16.2 per cent in 2014, a high rate by international standards. Therefore, given that bank credit is the backbone of the Tunisian economy, the Tunisian banking sector is called upon to apply good risk management practices. In this perspective, the study of credit risk in the Tunisian context is of particular interest.
Following the increase in non-performing debt ratio recorded in the Tunisian credit
market and therefore the credit risk, the role of bad credit management by Tunisian banking
institutions is highlighted as an obstacle to the development of the banking sector.
Therefore, it seems necessary to study the Tunisian credit market notably to find new
solutions and strategies to minimize credit risk.
To measure credit risk, credit scoring is an important decision-making process
used in many business areas (Huang and Wang, 2017). In recent years, credit scoring
has become a primary method used by financial institutions to assess the credit risk
(Huang et al., 2007). It is a set of decision-making models and techniques that advise
lenders on whether to grant credit. The objective of these models is to assign a score
to a potential borrower to estimate the future performance of the loan (Thomas et al.,
2002).
Thanks to the vigorous development of data mining techniques in the field of data
processing and computer science, many researchers have tried to apply them to other areas.
Nowadays, financial services are becoming more and more complex. In addition, there is an
enormous amount of data available to financial institutions. As a result, data mining and
statistical techniques could be used to access these data and support financial managers in
their credit risk decisions (Abedini et al., 2016).
Thus, data mining techniques can also be adopted to build credit scoring models.
Researchers have developed two categories of credit scoring techniques, i.e. statistical
models and optimization techniques on the one hand and artificial intelligence methods
on the other (Yu et al., 2008). Many statistical models and optimization techniques have
widely been reported in the literature. They are used to assess the credit risk including
discriminant analysis (Karels and Prakash, 1987), logistic regression (LR) (Henley,
1995), multivariate adaptive regression splines (Friedman, 1991), K-nearest neighbors
(Henley and Hand, 1996), linear programming (Glover, 1990) and the Bayesian classifier
(Yeh and Lien, 2009).
Although these methods can be used to assess credit risk, their ability to distinguish good from bad clients still needs further improvement. In recent years,
with the development of artificial intelligence and machine learning, many studies have
shown that artificial neural networks (ANN) (Desai et al., 1996; Malhotra and Malhotra,
2002; West, 2000), decision trees (Davis et al., 1992) and support vector machines
(SVMs) (Huang et al., 2007; Schebesch and Stecking, 2005) can be used to predict credit
risk. These techniques were proved to be advantageous over statistical models and
optimization techniques for the assessment of credit risk (Yu et al., 2008). Among these
techniques, we focus on the LR, ANN and SVM to determine the credit worthiness of
clients. However, the implementation of these techniques to deal with unbalanced
datasets raises the supervised learning issue. In fact, learning from unbalanced
datasets requires appropriate strategies to achieve a correct classification of the
minority class.
In the classical credit risk assessment process, Tunisian banks do not take into account the class imbalance in calculating their scores. In this paper, we focus on the class imbalance problem in the calculation of these scores.
The unbalanced distribution of classes occurs naturally in credit scoring since, in general, the class of solvent borrowers is much larger than the class of defaulting ones (Kiefer, 2009; Phua et al., 2004). Therefore, the classifier tends to favor creditworthy clients as the majority class. In other words, the model learns these clients well and thus identifies them with high accuracy. However, insolvent clients within the minority class cannot be properly identified. In real business, identifying insolvent clients is the more important task for minimizing credit risk. This work aims at improving the classification accuracy of insolvent clients in the minority class.
Most current research on unbalanced classification has focused on solutions that improve prediction within the model. The most common solution to the imbalance problem is to alter the class distribution into a more balanced sample. Re-sampling of the data is frequently used to overcome the impact of unbalanced datasets on learning. Two techniques are commonly considered to restore balance between classes: under-sampling the majority class or over-sampling the minority class until the two are almost equally represented (Provost, 2000; Witten and Frank, 2005). In our study, we chose two over-sampling methods to address this problem, namely, the synthetic minority over-sampling technique (SMOTE) and random over-sampling (ROS). These re-sampling techniques are the most popular and most studied because they are independent of the underlying classifier and can be easily implemented (Oreški and Oreški, 2014; Brown and Mues, 2012).
In this paper, we address the problem of building scoring models on unbalanced
datasets. The dataset was collected from a Tunisian bank where the number of default
borrowers is lower than solvent ones. The objectives of this study are to:
 investigate the contribution of SMOTE and ROS to the performance of the classifier
in reducing the impact of class imbalance in the classification process; and
 study the role of these sampling methods in improving the performance of the
classification algorithm and obtaining an efficient tool for predicting credit risk.

The remainder of this paper is organized as follows. Section 2 explores the literature review
on the subject of the classification of unbalanced datasets. Section 3 describes the main
methodological steps used in our research. Section 4 presents a case study and findings.
Section 5 discusses the experimental design and results of our investigation. Section 6
concludes this paper and gives some guidelines for future work.

2. Literature review
The contribution of data mining techniques to the field of credit risk assessment has long been studied. Recent research has focused on integrating various artificial intelligence and data mining methods to increase precision and flexibility (Hens and Tiwari, 2012). However, these methods did not take unbalanced datasets into consideration, and the extent to which comparisons are affected by the class imbalance issue was not addressed. For this reason, contradictory results have been obtained by researchers in the domain as far as the
effectiveness of data mining techniques is concerned. For example, Yobas et al. (2000)
showed that the linear discriminant analysis outperforms the ANN in the prediction of
payment default, while Desai et al. (1996) conversely reported that ANN were rather more
efficient than discriminant analysis.
In fact, many comparative studies (Huang et al., 2004; Xiao et al., 2006; Wang et al., 2011)
have reported the inability to claim the superiority of one method over other competing
algorithms because of the discrepancy of data characteristics (noise, missing values, biased class distribution [. . .]). These may significantly affect the reliability of most classification techniques.
The present work highlights the influence of biased class distribution, which is considered the most influential factor on the performance of classification techniques (Japkowicz and Stephen, 2002; Chawla et al., 2004). For example, Oreški and Oreški (2014) studied the influence of unbalanced data classes on artificial intelligence methods, i.e. ANN and SVM, and on classical classification methods represented by the RIPPER and Naïve Bayes classifiers.
That research considered classification problems in which classification quality was measured by the accuracy rate and the area under the ROC curve (AUC), and the SMOTE sampling technique was used to reduce the negative influence of unbalanced data. The experiment was performed on 30 original and corresponding balanced datasets generated by means of SMOTE. The results indicated that unbalanced data had a negative influence on the AUC when applying ANN and SVM; the same methods showed an improvement in AUC when applied to balanced data, although their classification accuracy deteriorated. RIPPER results were similar but the changes were smaller in scale, while the Naïve Bayes classifier showed an overall deterioration of results on the balanced distributions.
García et al. (2014) reported that a very common problem related to credit risk prediction
occurs when the dataset is biased, i.e. the class of non-default exceeds the class of default. In
case of incorrect classification, the minority class carries the higher cost. This is a very important issue that should be addressed when choosing a performance measure, because many metrics are biased toward the majority class and may therefore be inappropriate for this type of financial application. They noted that the accuracy rate is highly biased yet remains the only measure reported in various studies, despite voices arguing that other criteria should be used instead. In this sense, for example, AUC appears to be a more appropriate performance measure than accuracy for unbalanced datasets, as it does not implicitly assume equal costs of classification error.
According to Lopez et al. (2013), standard classifiers such as LR, SVM and the decision
tree are adapted for balanced learning data. When faced with unbalanced scenarios, these
models often provide suboptimal classification results, i.e. good representation of majority
examples, while minority examples are discarded.
Crone and Finlay (2012) conducted an empirical study of case sampling to predict a borrower's behavior in repaying debts by evaluating the relative precision of LR, discriminant analysis, decision trees and ANN. They concluded that balanced datasets created through random over-sampling outperform those created using random under-sampling. Their study is a practical contribution to the construction of models on credit scoring datasets and provides evidence that using larger samples than those recommended in credit scoring practice yields a significant increase in the accuracy of the algorithms.
Batista (2004) identified ten alternative techniques to deal with class imbalance and
tested them on 13 datasets. The selected techniques included a variety of ROS and random
under sampling (RUS) methods. His results suggest that the ROS generally provides more
accurate results than RUS.
Brown and Mues (2012) carried out a comparative study of various classification techniques applied to imbalanced credit datasets. They gradually increased the class imbalance levels in each of five real-world datasets to identify how the predictive power of each technique is affected. The results showed that traditional models such as LR and linear discriminant analysis are reasonably robust to the degree of class imbalance.
Marqués et al. (2013) studied the relevance and performance of several re-sampling techniques applied jointly with statistical prediction models (LR) and SVM on five real credit datasets, which were artificially modified to derive different imbalance ratios. All techniques were evaluated in terms of their AUC and then compared for statistical differences using Friedman's average rank test and a post hoc test. Their results showed that, in general, over-sampling techniques perform better than any under-sampling approach.

3. Research methodology
The main methodological steps used in this research are presented in Figure 1 and detailed
and discussed in the following paragraphs.

3.1 Re-sampling data


Various techniques have been compared in the literature to determine the most effective way to overcome a large class imbalance. The most obvious and simplest method is random under-sampling (RUS), where individuals belonging to the majority class are randomly removed from the training set so as to rebalance the dataset. This method has the advantage of being very simple to implement. However, it risks losing valuable information: it may delete individuals that are important to the concept of the majority class and thus degrade the performance of the classifier (Weiss, 2004).

Figure 1. Overview of the methodology used in our study

Another way to rebalance the dataset is the random replication of individuals belonging to the minority class (ROS). This simplistic approach risks slowing the algorithms by adding individuals while producing models unable to generalize (risk of overfitting) (Chawla et al., 2002). Previous research (Japkowicz, 2000) discussed ROS and noted that it does not significantly improve the recognition of the minority class. To avoid the
disadvantages of ROS, Chawla et al. (2002) suggested SMOTE, an advanced sampling
technique that generates synthetic individuals to expand the limits of the minority class.
SMOTE is an intelligent oversampling method. It proceeds as follows: for each individual in
the minority class, its k-nearest neighbors in the same class are calculated and a number of
them (according to the rate of desired-sampling) are selected. Artificial individuals are then
randomly scattered along the line between the individual of the minority class and its
selected neighbors. Thus, the problem of over-learning is avoided and the minority class
border tends to approach the area of the majority class. SMOTE was proved to be a
powerful method for treating unbalanced classification problems. When it comes to
unbalanced data classifications, synthetic oversampling is more powerful than the re-
sampling of existing data. The power of synthetic oversampling seems to lie in the simple
fact that additional data are synthesized.
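The SMOTE procedure described above can be sketched as follows. This is a minimal illustration of the idea from Chawla et al. (2002), not the R implementation used in the paper; the function name and parameters are our own. Setting n_new_per_point=2 mirrors an over-sampling rate of 200 per cent:

```python
import math
import random

def smote(minority, k=5, n_new_per_point=2, rng=None):
    """Generate synthetic samples along the segments joining each minority
    point to its k-nearest minority-class neighbours (SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for x in minority:
        # k-nearest neighbours of x within the minority class only
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        for _ in range(n_new_per_point):
            nb = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment x -> nb
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

Each synthetic individual is a convex combination of a minority point and one of its neighbours, so the new points scatter along the class boundary rather than duplicating existing data.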
In our work, we opted for ROS and SMOTE to address the problem of unbalanced datasets. These methods were implemented with the "unbalanced" library of the R software, which is designed to provide standard and more sophisticated tools to improve binary classification in an unbalanced framework. The perc.over and perc.under parameters of the SMOTE algorithm control the over-sampled quantity of the minority class and the under-sampled quantity of the majority class, respectively. Our implementation of the SMOTE algorithm uses five nearest neighbors, and the perc.over and perc.under parameters are both equal to 200 per cent. Therefore, only two of the five nearest neighbors are selected and one sample is generated in the direction of each.

3.2 Construction of the training and test samples


A training sample whose classification is known was used to fit the various classification methods and to learn classification rules for a borrower based on its characteristics. To compare and apply these rules, their reliability must be assessed. For this reason, we used a second, independent sample called the test sample.
A random sampling was performed on the dependent variable “Credit”. We chose about
70 per cent of data for training and 30 per cent to test our risk assessment model. Table I
summarizes databases produced by ROS and SMOTE and divides them into sub-samples
which underwent the various experiments.

Table I. Characteristics of credit scoring datasets

Database          Total (creditworthy/not)   Training (creditworthy/not)   Test (creditworthy/not)
Unbalanced data        300 / 108                   210 / 76                     90 / 32
ROS                    300 / 300                   210 / 210                    90 / 90
SMOTE                  432 / 324                   302 / 226                   130 / 98
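The 70/30 split can be sketched as a stratified random draw on the "Credit" variable. This is an illustrative reconstruction (the paper does not publish its sampling code; the function and keys are our own); applied to the 300/108 unbalanced dataset it reproduces the training and test counts in Table I (210/76 and 90/32):

```python
import random

def stratified_split(rows, label_key, train_frac=0.7, seed=1):
    """Split rows into train/test samples, preserving the class
    proportions of `label_key` in both subsets."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_frac)  # 70% of each class to training
        train += members[:cut]
        test += members[cut:]
    return train, test
```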
3.3 Data mining techniques
A brief explanation of each of the forecasting techniques applied in this work is presented below.
3.3.1 Logistic regression. LR is a widely used statistical modeling technique in which the probability of a dichotomous outcome (Y = 0 or Y = 1) is linked to a set of explanatory variables. The endogenous variable Y corresponds to the coding of borrowers: 0 for a bad borrower, 1 otherwise. X is the matrix of exogenous variables used to explain the variable Y.
The LR model is expressed as follows:
" #
P ðY ¼ 1Þ
Log ¼ / þ b 1 X1 þ b 2 X2 þ    þ b K XK (1)
1  P ðY ¼ 1Þ

where: P (Y = 1) is the probability that the event occurs in different subpopulations, called
prior probability; ! is a constant and b is the vector of coefficients of exogenous variables.
LR assumes that the dependent variable is linear in the coefficients of the predictor
variables. Its advantage lies in its application as a simple probabilistic formula for
classification (Wanke and Barros, 2016).
Wiginton (1980) was one of the first researchers to use LR for credit scoring. Although the results were not very impressive, the simplicity of LR made it a popular approach in many practical scoring applications. Thus, the objective of an LR model in credit
scoring is to determine the conditional probability of a credit applicant belonging to a class
(default or not default) given the independent variables of the credit applicant (Yap et al.,
2011).
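Once the coefficients in equation (1) have been estimated, scoring a new applicant amounts to evaluating the inverse logit. A minimal Python sketch (the coefficients α and β below are illustrative, not estimates from the paper's dataset):

```python
import math

def logistic_score(x, alpha, beta):
    """P(Y = 1 | x) from equation (1): logit(p) = alpha + sum(beta_k * x_k)."""
    z = alpha + sum(b * xk for b, xk in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical applicant with two explanatory variables:
p = logistic_score([1.2, -0.5], alpha=0.3, beta=[0.8, 1.1])
# Classify as creditworthy (Y = 1) if p >= 0.5, otherwise as a bad borrower.
```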
3.3.2 Artificial neural networks. ANNs are widely used because of their modeling power.
ANN is a flexible and non-parametric tool inspired from biological neural systems. The
multilayer perceptron is especially suitable for classification and is widely used in practice.
The network consists of an input layer, one or more hidden layers and an output layer. Each
of them is composed of several neurons. Each neuron performs the processing of the inputs
and generates an output value that is transmitted to the neurons in the subsequent layer. It
has been shown that a multi-layer perceptron with a single hidden layer provided with a
sufficient number of neurons can approximate any function with the desired precision
(Hornik, 1991). ANN learning is used to find all wi weights that minimize the error function.
The most popular algorithm for learning the multilayer perceptron is the backpropagation
gradient algorithm. As the name indicates, the error calculated from the output layer is back
propagated through the network, and the weights are modified according to their
contribution to the error function.
The model implemented is a multilayer perceptron with gradient backpropagation
including an input layer, a hidden layer, determining the number of appropriate neurons to
be optimized, and an output layer.
To the best of our knowledge, there is no theoretical basis for predicting the number of hidden neurons needed to obtain a specific model performance. It is therefore necessary to implement an empirical model design procedure. Each neuron in the hidden layer is a processing element that receives the n inputs weighted by the weights wi. The weighted sum of the inputs is
transformed with a nonlinear (sigmoid) activation function f(.):

f(x) = 1 / (1 + e^(−x))   (2)
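A forward pass through such a single-hidden-layer perceptron, using the sigmoid of equation (2), can be sketched as follows. The weights here are illustrative placeholders; in the paper they are learned by backpropagation of the gradient:

```python
import math

def sigmoid(x):
    # activation function from equation (2)
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron.

    w_hidden holds one weight list per hidden neuron; the single output
    neuron combines the hidden activations into a score in (0, 1).
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```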
ANNs were examined by Arminger et al. (1997), Desai et al. (1996), Lee et al. (2002), West (2000), Khashman (2010) and Oreski et al. (2012) in connection with credit scoring problems. Most studies have shown that ANNs are more precise, adaptable and robust than traditional statistical methods for the evaluation of credit risk (Oreski et al., 2012).
3.3.3 Support vector machines. The SVM technique is a machine learning method that has witnessed great development in theory and application over the past decade (Cristianini and Taylor, 2000). It rests on a solid theoretical foundation based on the principle of margin maximization: a hyperplane separating the data into two classes is constructed so as to give high generalization ability. The performance of SVM is very sensitive to the standardization of the data. Furthermore, SVM requires fundamental choices, such as the type of kernel to use and the appropriate parameters of this kernel. These choices can sometimes be time-consuming because of the nature of the techniques involved at this level.

The hyperplane w^T x + b = 0 separates the two classes, and the distance between the hyperplane and the closest example is called the margin. Points on the boundaries of the margin are called support vectors. The middle of the margin is referred to as the optimal separating hyperplane. The region between the two hyperplanes w^T x + b = −1 and w^T x + b = +1 is called the generalization region of the learning machine. The generalization capacity of the machine depends on the size of this region.
However, in most real problems, it is usually difficult to separate a dataset by a single hyperplane; the optimal boundary is generally nonlinear. Under SVM, non-linearity is included in the model by introducing non-linear kernels. The main idea behind kernel functions is to project the input data x into a feature space of dimension D greater than the dimension d of the original space, and to build a linear classifier in this space.

A kernel function K can be defined as K(xi, xj) = ⟨Φ(xi), Φ(xj)⟩. It computes the scalar product of the projections of the input data in a high-dimensional space.
The kernel function used in SVM must respect the mathematical principle known as Mercer's theorem (Mercer, 1909). This principle ensures that the kernel function can be expressed as a scalar product between the two input vectors in a high-dimensional space. Symmetric, positive functions K satisfying Mercer's theorem are called kernel functions. Some widely used kernel functions include:

Linear: K(xi, xj) = xi^T xj
Polynomial: K(xi, xj) = (γ xi · xj)^d
Gaussian radial basis: K(xi, xj) = exp(−σ ||xi − xj||²)
Sigmoid: K(xi, xj) = tanh(γ xi · xj), γ > 0
The optimal linear separation is then given by the solution of:

max_α  L(α) = Σ_{i=1..n} α_i − (1/2) Σ_{i,j=1..n} y_i y_j α_i α_j K(x_i, x_j)

subject to:

Σ_{i=1..n} α_i y_i = 0
0 ≤ α_i ≤ C,   i = 1, …, n   (3)
The decision function is:

h(x) = Σ_{i=1..n} α_i y_i K(x_i, x) + b   (4)

The parameter C in the above formulation is the regularization parameter, used to penalize misclassification errors. Its value sets the compromise between misclassification and the complexity of the function, i.e. between empirical error and generalization error. To determine C, it is possible to try different parameter values and measure the SVM generalization error by cross-validation, keeping the parameters for which the generalization error is lowest. The cross-validation method is used to test the effect of sampling variation on the performance of the model.
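As an illustration of equations (3) and (4) with the Gaussian radial basis kernel, the decision function can be evaluated as follows. The support vectors and multipliers below are made up for illustration; in practice the α_i come from solving the dual problem (3), e.g. via the e1071 package used in this study:

```python
import math

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian radial basis kernel: exp(-sigma * ||xi - xj||^2)
    return math.exp(-sigma * sum((a - b) ** 2 for a, b in zip(xi, xj)))

def svm_decision(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Equation (4): h(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
    The class of x is given by the sign of h(x)."""
    s = sum(a * y * rbf_kernel(sv, x, sigma)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return s + b
```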

3.4 Performance measures


Numerous measurement tools have been proposed in the literature to evaluate the generalization capacity of classification models. The confusion matrix (Table II) is one of the most widely used criteria in the field of accounting and finance (in loan scoring applications).
The accuracy rate measures the proportion of borrowers classified correctly (true
positives and true negatives) compared to the whole list of studied borrowers. This indicator
is an important criterion in the evaluation of the classification capacity of the suggested
scoring models (Abdou and Pointon, 2011). The sensitivity is the ratio of correctly predicted
positive borrowers and the overall number of positive borrowers. The specificity is the ratio
between correctly predicted negative borrowers and the total number of negative borrowers.
G-mean was proposed by Kubat and Matwin (1997) and has been used by several
researchers to evaluate classifiers on unbalanced datasets (Ertekin et al., 2007; Su and Hsiao,
2007). G-mean indicates the classification performance balance between majority
and minority classes. This measure takes into account both the sensitivity and
specificity. Ideally, a system would be described as having 100 per cent sensitivity and 100
per cent specificity. However, two types of errors, Type I and Type II, often occur (Wang
et al., 2012):

Accuracy = (TP + TN) / (TP + TN + FP + FN)

True positive rate (Sensitivity) = TP / (TP + FN)

True negative rate (Specificity) = TN / (TN + FP)

G-mean = √(Sensitivity × Specificity)

False negative rate (Error I) = FN / (TP + FN)

False positive rate (Error II) = FP / (TN + FP)

Table II. Confusion matrix

                           Predicted Ŷ
Actual Y               Good payer (Ŷ = 1)               Poor payer (Ŷ = 0)
Not risky (Y = 1)      True Positive (TP)               False Negative (FN) (Error I)
Risky (Y = 0)          False Positive (FP) (Error II)   True Negative (TN)
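The measures above can be computed directly from the confusion-matrix counts. A small illustrative sketch (the counts in the usage example are hypothetical, chosen only to mimic a 90/32 test set; they are not the paper's results):

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Performance measures derived from the confusion matrix (Table II)."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "error_I": fn / (tp + fn),   # false negative rate
        "error_II": fp / (tn + fp),  # false positive rate
    }

# Hypothetical test-set counts: 90 good payers, 32 poor payers
metrics = confusion_metrics(tp=81, fn=9, fp=15, tn=17)
```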
We also consider AUC, which is often regarded as a better quantification of overall performance. AUC measures the discrimination quality of the model as the probability that a healthy business will receive a higher score than a defaulted company. AUC = 1 means that the classifier is perfectly able to discriminate between classes, with 100 per cent sensitivity and specificity; in this case, the two classes are well separated. An area equal to 0.5 indicates that the classifier has no discrimination power (Harris, 2013).
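This probabilistic reading of AUC — the chance that a solvent borrower scores higher than a defaulted one — can be computed directly as a pairwise (Mann–Whitney) rank statistic. A minimal sketch, not tied to any particular library implementation:

```python
def auc(scores_pos, scores_neg):
    """AUC as P(score of a solvent borrower > score of a defaulted one).
    Ties between a positive and a negative score count one half."""
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in scores_pos
        for sn in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```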

3.5 Experimental approach


A series of experiments was made on a credit dataset that had been artificially modified by
means of two re-sampling methods and three prediction models. The purpose of the
experiments was to evaluate the performance of re-sampling strategies (ROS and SMOTE)
as part of credit scoring.
Classification was carried out according to three models widely applied for credit risk
prediction: LR, ANN and SVM. In fact, the unbalanced training dataset was first used as the
input of the three training algorithms. The second and third procedures concerned ROS and SMOTE: here, the input dataset was balanced by re-sampling in a preprocessing step, and the obtained balanced dataset was used as the input for the three learning algorithms. All the
classification and validation results were recorded as a confusion matrix. From these results,
several performance measurements were calculated.
The aim of our research was to check the usefulness of ROS and SMOTE in balancing
datasets and analyze the impact of this procedure on the performance of the classification
algorithms. The ultimate goal of this work is to use data mining techniques by applying
data re-sampling measures jointly with the classification algorithms to obtain an accurate
prediction of credit risk that overcomes the class imbalance problem. Nine classification
models were constructed as indicated in Table III.
To create, manipulate and visualize the results of multilayer neural networks, we used MATLAB R2011b. For SVM, we used the radial basis kernel function: the credit risk prediction literature has most often applied this kernel because of its versatility, good overall performance and small number of parameters (C and γ) (Bhattacharyya et al., 2011). The implementation of this algorithm was provided by the e1071 package within the R software. LR was estimated with the MASS package within R.

Table III. The various models of the study

No.   Description
1     LR with original data
2     LR with ROS
3     LR with SMOTE
4     ANN with original data
5     ANN with ROS
6     ANN with SMOTE
7     SVM with original data
8     SVM with ROS
9     SVM with SMOTE
4. Case study and findings
4.1 Case study
The case study of this research consists of a set of management loans granted to Tunisian companies in various sectors. Management loans represent the majority of credits granted by Tunisian commercial banks. All loan records for this study were collected from a Tunisian bank over the period from 2011 to 2012. From the available list, 408 companies were selected. Each company is represented by the dependent variable called "Credit" that
informs about its credibility. “Credit” is a binary variable of two values: 1 for creditworthy
borrowers and 0 for non-creditworthy borrowers. Out of the 408 companies, 300 have
successfully accomplished their credit obligations and are classified as creditworthy
borrowers (Class 1), and 108 companies did not assume the execution of their duties and are
classified in the group of non-creditworthy borrowers (Class 0). In this analysis, a borrower is considered good if they pay (or have always paid) the loan properly and have never been in default for 90 days or more. By contrast, a bad borrower has defaulted for 90 days or more at some time during the life of the loan, according to the New Basel Framework (Oreski et al., 2012). In total, 25 variables are supposed to influence the loan repayment capacity, including
3 qualitative binary variables and 22 quantitative variables (Table IV).

4.2 Findings
To accurately compare the different methods and obtain pertinent results, the performance indicators were compared on the test sample rather than the training sample, as the latter may give overoptimistic results.
The performance of LR, ANN and SVM with the different indicators is shown in
Tables V, VI and VII, respectively. For each technique, the results are given for the three test
datasets.
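All of the indicators reported in Tables V–VII derive from the confusion matrix. A minimal Python sketch of their definitions, assuming class 1 (creditworthy) is the positive class as in this study; the function name is illustrative:

```python
import math

def scoring_metrics(y_true, y_pred):
    """Indicators used in Tables V-VII. Positive class = 1 (creditworthy),
    negative class = 0 (non-creditworthy), as in this study."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)  # share of good borrowers recognised
    specificity = tn / (tn + fp)  # share of bad borrowers recognised
    return {
        "accuracy_pct": 100.0 * (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "error_I": 1.0 - sensitivity,   # good borrower rejected as risky
        "error_II": 1.0 - specificity,  # bad borrower accepted as healthy
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

Note that the G-Mean, as the geometric mean of sensitivity and specificity, collapses toward zero whenever either class is poorly predicted, which is why it exposes the imbalance problem that plain accuracy hides.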

5. Discussion
The accuracy rates are generally high on the unbalanced dataset for the different classifiers. These high rates stem from the fact that this dataset is heavily dominated by good loans (sensitivity is on the order of 90 per cent), while bad loans are under-represented (specificity varies between 53 and 59 per cent). This reveals a serious weakness in correctly classifying minority cases. These results confirm the work of Loyola-González et al. (2016), who pointed out that a learning process guided by global performance indicators such as the accuracy rate is biased toward the majority class, while the minority class remains poorly learned even if the prediction model produces a high overall accuracy. The SVM method has the highest accuracy with the ROS (92.222 per cent) and SMOTE (92.105 per cent) sampling strategies, followed by ANN (91.667 per cent with ROS and 89.035 per cent with SMOTE). Artificial intelligence methods significantly outperform the LR model on the balanced dataset (83.90 per cent with ROS and 83.30 per cent with SMOTE). Sensitivity and Error I led to similar results for LR, ANN and SVM across the various datasets (sensitivity of about 90 per cent). Specificity increased after balancing the data for all techniques. By introducing the sampling methods (ROS and SMOTE) alongside the intelligent techniques, we achieved better results in terms of specificity: about 90 per cent with ROS and 80 per cent with SMOTE. The costs associated with Error II were very high for the unbalanced dataset with all three prediction methods (0.469 for LR and 0.406 for ANN and SVM). These results imply that all methods are biased toward learning the majority class, and their performance degrades badly on the minority class. Error II showed better performance with sampling
Table IV. The variables of the study

Variables  Measurement of variables

Profitability ratios
V1   Financial profitability                   Net income/Net equity
V2   Operating profitability                   Gross operating surplus/Turnover
V3   Economic profitability                    Operating income/Economic assets
V4   Net profitability                         Net income/Sales

Structure ratios
V5   Financial autonomy                        Shareholders' equity/Permanent capital
V6   Structural balance                        Permanent capital/Fixed assets
V7   WC coverage                               Working capital/Working capital requirement
V8   Solvency                                  Net capital/Total assets
V9   Asset coverage                            Net capital/Fixed assets
V10  Debt to fixed assets                      Long-/Medium-term debt/Fixed assets

Debt ratios
V11  Financial dependence                      Long- and medium-term debt/Permanent equity
V12  Repayment capacity                        Long- and medium-term net debt/Cash flow
V13  Debt ratio                                Financial charges/Turnover
V14  Financial burden                          Financial expenses/Gross operating surplus

Rotation ratios
V15  Working capital ratio                     Turnover/Total fixed assets
V16  Inventory turnover ratio                  Turnover/Net stocks

Other variables
V17  Devoted turnover                          Movement/Sales
V18  Share of funding                          Bank commitment/Banking system commitment
V19  Study duration of a credit report         Log (study period)
V20  Corporate banking relationship duration   1 if the relationship length 15 months; 0 otherwise
V21  Guarantees                                Log (guarantees)
V22  Size of the company                       Log (turnover)
V23  Score: credit line number                 Log (score)
V24  Ownership structure                       1 if the officer holds more than 50% of the capital; 0 otherwise
V25  Legal form                                1 = SARL; 0 otherwise

Table V. LR performance before and after balancing data

LR                Accuracy (%)  Sensitivity  Specificity  Error I  Error II  G-Mean  AUC
Unbalanced data   81.15         0.911        0.531        0.089    0.469     0.696   0.812
ROS               83.90         0.922        0.756        0.078    0.244     0.835   0.900
SMOTE             83.30         0.915        0.725        0.085    0.276     0.814   0.900

Table VI. ANN performance before and after balancing data

ANN               Accuracy (%)  Sensitivity  Specificity  Error I  Error II  G-Mean  AUC
Unbalanced data   81.967        0.900        0.594        0.100    0.406     0.731   0.747
ROS               91.667        0.911        0.922        0.089    0.078     0.917   0.917
SMOTE             89.035        0.954        0.806        0.046    0.194     0.877   0.880

strategies and for all classification methods. It was in the range of 0.244 (0.276) for LR, 0.078 (0.194) for ANN and 0.067 (0.112) for SVM with the ROS data (SMOTE data).
As for the G-Mean and AUC measures, ANN and SVM with balanced databases provide almost similar results, significantly better than LR.
Tables V, VI and VII show that the models developed using the unbalanced dataset had low G-Mean values (LR = 0.696, ANN = 0.731 and SVM = 0.736). After applying the balancing techniques, the G-Mean increased up to 0.814 for LR, 0.877 for ANN and 0.916 for SVM with the SMOTE data. With the ROS data, the improvements were even greater (0.835 for LR, 0.917 for ANN and 0.922 for SVM).
Typically, the higher the AUC, the better the classifier. We note from Table VII that SVM with ROS is the best performer, with an AUC equal to 0.974. This value shows that the model has an excellent discriminating power. SVM with SMOTE (AUC = 0.961), ANN with ROS (AUC = 0.917), LR with ROS and SMOTE (AUC = 0.900) and ANN with SMOTE (AUC = 0.880) are less efficient than SVM with ROS. The methods that predict classes least well are those trained on the unbalanced dataset: ANN with an AUC equal to 0.747, LR with an AUC equal to 0.812 and SVM with an AUC equal to 0.879. We can conclude that the predictive power of these methods is not satisfactory and that the resulting models are not sufficiently discriminant. SVM generally showed better performance than the other classification techniques.
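The AUC values discussed above can be computed without plotting the ROC curve, via the rank-based (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. A stdlib sketch (illustrative, not the authors' implementation):

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs where the positive instance scores
    higher; ties count one half. Equals the area under the ROC curve."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise reading also explains why AUC is robust to class imbalance: it weighs each positive-negative pair equally, regardless of how rare the minority class is.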
Given the improvements obtained in the prediction methods through re-sampling techniques, these findings bring about novel and significant advancements for credit scoring practice. Figure 2 illustrates the comparison between the prediction techniques through the different datasets on the various performance measures.
The accuracy graph shows that all classifiers had good accuracy on both the unbalanced and the balanced data. According to Picinini et al. (2003), credit scoring models with classification rates higher than 65 per cent are considered good. However, if the dataset is unbalanced, the accuracy is high even when the classifier correctly classifies all the majority examples and misclassifies all the minority examples, because the majority examples largely outnumber the minority ones. We conclude that accuracy generally favors the majority class and misbehaves on the minority class when used to evaluate the performance of a learner on an unbalanced dataset. In this situation, the rate of correct classification cannot reliably evaluate the prediction of the minority class. It is not an appropriate measure, as the minority class has less influence on the accuracy than the majority class.
Sensitivity also shows consistent performance across all datasets. This result means that the number of creditworthy clients is large and their characteristics can be learned well enough by the models. The specificity obtained by LR using the SMOTE and ROS approaches gives better results than the unbalanced data. In the field of credit rating, research focused on analyzing the behavior of conventional forecasting models has shown that the performance on the minority class drops significantly when the imbalance ratio increases (Brown and Mues, 2012; Kennedy et al., 2010).
The performance in classifying borrowers of the minority class using the SVM method was significantly improved by the re-sampling approaches. Wu and Chang (2003) reported that the performance of SVM drops significantly when confronted with unbalanced datasets where the number of positive cases far exceeds the negative cases. ANN also tends to perform better on balanced class distributions (Oreski and Oreski, 2014). The most

Table VII. SVM performance before and after balancing data

SVM               Accuracy (%)  Sensitivity  Specificity  Error I  Error II  G-Mean  AUC
Unbalanced data   82.787        0.911        0.594        0.089    0.406     0.736   0.879
ROS               92.222        0.911        0.933        0.089    0.067     0.922   0.974
SMOTE             92.105        0.946        0.888        0.054    0.112     0.916   0.961
Figure 2. Performance of different classification techniques before and after balancing data

interesting result is that, for all classifiers, the re-sampling approaches produced significant gains in performance compared to the use of unbalanced datasets. Working with skewed datasets thus remains a major obstacle for learning classifiers (Weiss and Provost, 2003).
In credit scoring applications, a small increase in performance can result in significant future savings and have important commercial implications. Consequently, the performance improvement achieved by the re-sampling strategies can be of great importance for banks and financial institutions (Garcia et al., 2012). In credit analysis, a Type I error leads to a loss of business with a healthy company that was mistakenly classified as risky. A Type II error involves the loss of the loan capital and the associated interest with a company that went bankrupt while it was estimated healthy. In practice, it is very important to achieve an appropriate balance between Type I and Type II errors in order not to lose potentially healthy customers.
Type I and Type II errors are very important criteria to assess the overall performance of the forecasting models developed in our study and to find the minimal misclassification cost for the suggested scoring models. The different classification techniques with the three databases (unbalanced, ROS and SMOTE) showed a low Type I error in identifying healthy customers of the majority class. The class imbalance, however, seems to significantly increase the credit risk for Tunisian banks. This result can be explained as follows: the Type II error shows the rate at which a model incorrectly classifies insolvent customers among the healthy ones. When this happens, banks are exposed to high credit risk. The proposed re-sampling methods make it possible to control Type I and Type II errors simultaneously. In other words, they could not only correctly identify healthy borrowers of the majority class but also classify borrowers of the minority class with a very low error rate. Our study clearly indicates that it is not advisable to use skewed samples, as the Type II error increases dramatically while the Type I error witnesses only a slight improvement. In fact, the implementation of sampling strategies (SMOTE and ROS) could help financial institutions reduce the costs of erroneous classification in comparison with the original data. Our model would have great practical implications for banks and can provide a way to ensure a competitive advantage over banks that fail to implement such a methodology.
The AUC graph shows that the classifiers LR, ANN and SVM have better AUC scores on the balanced datasets. The class imbalance significantly affects the performance of the three classification methods, which deteriorates dramatically as the imbalance grows.
The G-Mean indicator shows the balance between the classification performance on the majority and the minority class. LR, SVM and ANN have poor performance in terms of G-Mean on the unbalanced dataset. This result confirms the work of Hido et al. (2009), who mentioned that poor performance in predicting negative examples leads to a low value of G-Mean, even though positive examples are correctly classified by the model. After re-balancing the dataset, the classification algorithms gave the best results in terms of G-Mean. These results demonstrate that the balancing strategies have a superior ability to identify both healthy borrowers of the majority class and insolvent borrowers of the minority class compared with the original database.

6. Conclusions
This work attempts to solve the credit assessment problem by suggesting a new approach based on the treatment of class imbalance. Three different methods were developed in this study to assess the credit risk: LR, SVM and ANN. The proposed methods were applied to solve a credit assessment problem in a Tunisian bank. Our approach is based not only on the application of classification techniques developed in the literature but also on re-sampling methods that we adapted to the issue of class imbalance.
For the experiment, we used a sample of 408 Tunisian firms over the period from 2011 to 2012, among which 108 firms were non-creditworthy borrowers. We used a battery of 3 qualitative binary variables and 22 quantitative ones that showed their influence on the loan
repayment capacity. Empirically, we used SMOTE and ROS to balance the datasets. To evaluate the different techniques, we used the following performance indicators to compare the results: accuracy rate, sensitivity, specificity, Type I and Type II errors, G-Mean and AUC. Comparing these performance indicators of the different prediction methods before and after data balancing, we concluded that the historical data used for the construction of credit risk prediction models cannot reliably produce scores when the number of examples of the non-creditworthy class is not sufficient. The implementation of sampling strategies (SMOTE and ROS) may help financial institutions to reduce the misclassification costs (Errors I and II) in comparison with the unbalanced data. These strategies improved the performance of the prediction models with regard to unbalanced data independently of the classifier used. They can also identify insolvent clients better than the unbalanced data. After a detailed analysis, we concluded that, for a classifier, the ideal class distribution is certainly not the original one. The accuracy rate is not sufficient to make predictions, as the minority class has less effect on accuracy than the majority class. Finally, the LR, ANN and SVM methods have a high sensitivity to unbalanced data when used for modeling the bank's credit risk.
Our results may be useful in the credit risk assessment process. Indeed, Tunisian banks do not take the class imbalance problem into account when calculating their scores. Therefore, it is very important to solve this problem when training credit scoring models. Another contribution of this study is to explore the relevance and performance of re-sampling models when applied together with statistical and artificial intelligence techniques to predict the probability of default using a real-world credit dataset. Specifically, the results demonstrated performance gains when introducing re-sampling strategies into artificial intelligence methods, leading to significant benefits in the accuracy of borrowers' solvency prediction. This can provide decision-makers with efficient and effective tools to assess credit risk more accurately.
The main implication of this work is that, on the basis of these conclusions, Tunisian banks are advised to adopt this approach to assess credit risk. Creditors must properly assess the company's financial position and be vigilant for signs of imminent credit risk to avoid capital losses and the costs associated with counterparty default.
Our results provide an interesting perspective on the role of sampling strategies in the Tunisian credit market and their impact on credit risk. To improve the credit risk management process in Tunisian banks, it would be wise to set up models and sampling strategies in addition to classification techniques to monitor and control the credit granted.
Meanwhile, this work has a few limitations that should be addressed in future
investigations:
 Very often, the main problem in credit risk research is to determine which variables
have a significant influence on the probability of default. As a result, the accuracy of
credit risk prediction is improved by selecting the most significant financial and
non-financial variables that can be better used to construct the credit scoring model
and determine the solvency of firms. To distinguish between creditworthy
companies and non-creditworthy ones, Huang and Wang (2017) identified the key
financial ratios and their corresponding financial factors that provide the best
discrimination power. They concluded that cash flow, profitability, solvency and
leverage financial factors are among the most useful financial ratios that can
effectively differentiate between creditworthy and non-creditworthy companies.
 Class imbalance is not the only problem responsible for credit risk. The problem of credit risk assessment also stems from the information asymmetry that makes it difficult to assess borrowers. The minimization of credit risk depends mainly on the ability of the bank to collect and process information when selecting credit applications. At the selection level, banks are faced with the heterogeneity of clients and are unable to obtain sufficient information on all borrowers. Thus, the lender is the victim of information asymmetry because of the opportunism of borrowers who conceal the reality of their level of risk. In an imperfect information framework, the credit decision is very difficult, and the bank may grant credit to a bad borrower and refuse to finance a good one.

This work paves the way for building and developing new perspectives. The development of other dataset balancing methods and their adaptation to the credit scoring model would be of great interest. In addition to the sampling methods, new classification trends have emerged, such as parameter selection and the optimization of the model structure. Parameter selection has been tested on credit datasets and led to good results (Kou et al., 2014). The technique of cost-sensitive learning is also an alternative for the modeling and forecasting of credit risk in the case of unbalanced data. This approach deserves thorough study because it deals with the prediction of real-world credit risk characteristics.

References
Abdou, H. and Pointon, J. (2011), “Credit scoring, statistical techniques and evaluation criteria: a review
of the literature”, Intelligent Systems in Accounting, Finance and Management, Vol. 18 Nos 2/3,
pp. 59-88.
Abedini, M., Ahmadzadeh, F. and Noorossana, R. (2016), “Customer credit scoring using a hybrid data
mining approach”, Kybernetes, Vol. 45 No. 10, pp. 1576-1588.
Apostolik, R., Donohue, C. and Went, P. (2009), Foundations of Banking Risk: An Overview of Banking, Banking Risks, and Risk-Based Banking Regulation, John Wiley & Sons, NJ.
Arminger, G., Enache, D. and Bonne, T. (1997), “Analyzing credit risk data: a comparison of logistic
discriminant classification tree analysis and feedforward networks”, Computational Statistics,
Vol. 12 No. 2, pp. 93-310.
Batista, G. (2004), “A study of the behavior of several methods for balancing machine learning training
data”, ACM SIGKDD Explorations Newsletter, Vol. 6 No. 1, pp. 20-29.
Bhattacharyya, S., Jha, S., Tharakunnel, K. and Westland, J.C. (2011), “Data mining for credit card
fraud: a comparative study”, Decision Support Systems, Vol. 50 No. 3, pp. 602-613.
Brown, I. and Mues, C. (2012), “An experimental comparison of classification algorithms for imbalanced
credit scoring data sets”, Expert Systems with Applications, Vol. 39 No. 3, pp. 3446-3453.
Chawla, N.V., Japkowicz, N. and Kotcz, A. (2004), “Editorial: special issue on learning from imbalanced
datasets”, SIGKDD Explorations Newsletter, Vol. 6 No. 1, pp. 1-6.
Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002), “SMOTE: Synthetic minority
over-sampling technique”, Journal of Artificial Intelligence Research, Vol. 16, pp. 321-357.
Cristianini, N. and Taylor, J.S. (2000), An Introduction to Support Vector Machines and Other Kernel-
Based Learning Methods, University Press, Cambridge.
Crone, S.F. and Finlay, S. (2012), “Instance sampling in credit scoring: an empirical study of sample size
and balancing”, International Journal of Forecasting, Vol. 28 No. 1, pp. 224-238.
Davis, R.H., Edelman, G.B. and Gammerman, A.J. (1992), “Machine learning algorithms for credit card
applications”, IMA Journal of Management Mathematics, Vol. 4 No. 1, pp. 43-51.
Desai, V.S., Crook, J.N. and Overstreet, G.A. Jr (1996), “A comparison of neural networks and linear
scoring models in the credit union environment”, European Journal Operational, Vol. 95 No. 1,
pp. 24-37.
Ertekin, S., Huang, J., Bottou, L. and Giles, C.L. (2007), “Learning on the border: active learning in imbalanced data classification”, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 127-136.
Friedman, J.H. (1991), “Multivariate adaptive regression splines”, The Annals of Statistics, Vol. 19 No. 1,
pp. 1-141.
García, V., Marqués, A.I. and Sánchez, J.S. (2014), “An insight into the experimental design for credit risk and corporate bankruptcy prediction systems”, Journal of Intelligent Information Systems, Vol. 44 No. 1, pp. 159-189.
Garcia, V., Marques, A.I. and Sanchez, J.S. (2012), “Improving risk predictions by preprocessing
imbalanced credit data”, Neural Information Processing, 19th International Conference (ICONIP
2012), Proceedings, pp. 68-75.
Glover, F. (1990), “Improved linear programming models for discriminant analysis”, Decision Science,
Vol. 21 No. 4, pp. 771-785.
Harris, T. (2013), “Quantitative credit risk assessment using support vector machines: broad versus
narrow default definitions”, Expert Systems with Applications, Vol. 40 No. 11, pp. 4404-4413.
Henley, W.E. (1995), “Statistical aspects of credit scoring”, PhD dissertation, The Open University,
Milton Keynes.
Henley, W.E. and Hand, D.J. (1996), “A k-nearest neighbour classifier for assessing consumer risk”,
Statistician, Vol. 45 No. 1, pp. 77-95.
Hens, A.B. and Tiwari, M.K. (2012), “Computational time reduction for credit scoring: an integrated
approach based on support vector machine and stratified sampling method”, Expert Systems
with Applications, Vol. 39 No. 8, pp. 6774-6781.
Hido, S., Kashima, H. and Takahashi, Y. (2009), “Roughly balanced bagging for imbalanced data”,
Statistical Analysis and Data Mining, Vol. 2 Nos 5/6, pp. 5-6.
Hornik, K. (1991), “Approximation capabilities of multilayer feedforward networks”, Neural Networks,
Vol. 4 No. 2, pp. 251-257.
Huang, C.L., Chen, M.C. and Wang, C.J. (2007), “Credit scoring with a data mining approach based on
support vector machines”, Expert System with Applications, Vol. 33 No. 4, pp. 847-856.
Huang, J. and Wang, H. (2017), “A data analytics framework for key financial factors”, Journal of
Modelling in Management, Vol. 12 No. 2, available at: http://dx.doi.org/10.1108/JM2-08-2015-
0056.
Huang, Z., Chen, H., Hsu, C.J., Chen, W.H. and Wu, S. (2004), “Credit rating analysis with support vector
machines and neural networks: a market comparative study”, Decision Support Systems, Vol. 37
No. 4, pp. 543-558.
Japkowicz, N. (2000), “The class imbalance problem: significance and strategies”, in Proceedings of the
2000 International Conference on Artificial Intelligence (ICAI).
Japkowicz, N. and Stephen, S. (2002), “The class imbalance problem: a systematic study”, Intelligent
Data Analysis, Vol. 6 No. 5, pp. 429-449.
Karels, G. and Prakash, A. (1987), “Multivariate normality and forecasting of business bankruptcy”,
Journal of Business Finance and Accounting, Vol. 14 No. 4, pp. 573-593.
Kennedy, K., Mac Namee, B. and Delany, S.J. (2010), “Learning without default: a study of one-class
classification and the low-default portfolio problem”, Proceedings of the 20th Irish Conference on
Artificial Intelligence and Cognitive Science, Dublin, pp. 174-187.
Khashman, A. (2010), “Neural networks for credit risk evaluation: investigation of different neural
models and learning schemes”, Expert Systems with Applications, Vol. 37 No. 9,
pp. 6233-6239.
Kiefer, N.M. (2009), “Default estimation for low-default portfolios”, Journal of Empirical Finance, Vol. 16
No. 1, pp. 164-173.
Kou, G., Peng, Y. and Lu, C. (2014), “MCDM approach to evaluating bank loan default models”, Technological and Economic Development of Economy, Vol. 20 No. 2, pp. 292-311.
Kubat, M. and Matwin, S. (1997), “Addressing the curse of imbalanced training sets: one-sided
selection”, Proceedings of the Fourteenth International Conference on Machine Learning,
pp. 179-186.
Lee, T., Chiu, C., Lu, C. and Chen, I. (2002), “Credit scoring using the hybrid neural discriminant
technique”, Expert Systems with Applications, Vol. 23 No. 3, pp. 245-254.
Lopez, V., Fernández, A., García, S., Palade, V. and Herrera, F. (2013), “An insight into classification
with imbalanced data: empirical results and current trends on using data intrinsic
characteristics”, Information Sciences, Vol. 250, pp. 113-141.
Loyola-González, O., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A. and García-Borroto, M. (2016),
“Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced
databases”, Neurocomputing, Vol. 175, pp. 935-947.
Malhotra, R. and Malhotra, K. (2002), “Differentiating between good credits and bad credits using
Neuro-Fuzzy systems”, Computer Journal of Operational Research, Vol. 136 No. 2, pp. 190-201.
Marqués, A.I., García, V. and Sánchez, J.S. (2013), “On the suitability of resampling techniques for the
class imbalance problem in credit scoring”, Journal of the Operational Research Society, Vol. 64
No. 7, pp. 1060-1070.
Mercer, J. (1909), “Functions of positive and negative type and their connection with the theory of
integral equations”, Philosophical Transactions of the Royal Society, Vol. 83 No. 559, pp. 441-458.
Oreški, G. and Oreški, S. (2014), “An experimental comparison of classification algorithm
performances for highly imbalanced datasets”, available at: https://bib.irb.hr/datoteka/717193.
CECIIS_2014_Oreski_Oreski.pdf (accessed 6 June 2016).
Oreski, S., Oreski, D. and Oreski, G. (2012), “Hybrid system with genetic algorithm and artificial neural
networks and its application to retail credit risk assessment”, Expert Systems with Applications,
Vol. 39 No. 16, pp. 12605-12617.
Phua, C., Alahakoon, D. and Lee, V. (2004), “Minority report in fraud detection: classification of skewed
data”, SIGKDD Explorations Newsletter, Vol. 6 No. 1, pp. 50-59.
Picinini, R., Oliveira, G.M.B. and Monteiro, L.H.A. (2003), “Mineração de critério de credit scoring
utilizando algoritmos genéticos”, VI Brazilian Symposium of Intelligent Automation, pp. 463-466.
Provost, F. (2000), “Learning with imbalanced datasets”, Invited paper for the AAAI’2000 Workshop on
Imbalanced Datasets.
Schebesch, K.B. and Stecking, R. (2005), “Support vector machines for classifying and describing credit
applicants: detecting typical and critical regions”, Journal of the Operational Research Society,
Vol. 56 No. 9, pp. 1082-1088.
Su, C.T. and Hsiao, Y.H. (2007), “An evaluation of the robustness of MTS for imbalanced data”, IEEE
Transactions on Knowledge and Data Engineering, Vol. 19 No. 10, pp. 1321-1332.
Thomas, L.C., Edelman, D.B. and Crook, J.N. (2002), Credit Scoring and Its Applications, 1st ed., SIAM.
Wang, G., Hao, J., Ma, J. and Jiang, H. (2011), “A comparative assessment of ensemble learning for credit scoring”, Expert Systems with Applications, Vol. 38 No. 1, pp. 223-230.
Wang, G., Ma, J., Huang, L. and Xu, K. (2012), “Two credit scoring models based on dual strategy
ensemble trees”, Knowledge-Based Systems, Vol. 26, pp. 61-68.
Wanke, P. and Barros, C.P. (2016), “Efficiency drivers in Brazilian insurance: a two-stage DEA meta
frontier-data mining approach”, Economic Modelling, Vol. 53, pp. 8-22, ISSN 0264-9993, available
at: http://dx.doi.org/10.1016/j.econmod.2015.11.005
Weiss, G. and Provost, F.J. (2003), “Learning when training data are costly: the effect of class
distribution on tree induction”, Journal of Artificial Intelligence Research, Vol. 19, pp. 315-354.
Weiss, G.M. (2004), “Mining with rarity: a unifying framework”, SIGKDD Explorations, Vol. 6, pp. 7-9.
West, D. (2000), “Neural network credit scoring models”, Computers and Operations Research, Vol. 27 Nos 11/12, pp. 1131-1152.
Wiginton, J.C. (1980), “A note on the comparison of logit and discriminant models of consumer credit
behavior”, The Journal of Finance and Quantitative Analysis, Vol. 15 No. 3, pp. 757-770.
Witten, I.H. and Frank, E. (2005), Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, Amsterdam.
Wu, G. and Chang, E.Y. (2003), “Class boundary alignment for imbalanced dataset learning”, ICML
2003 workshop on learning from imbalanced datasets II, Washington, DC, pp. 49-56.
Xiao, W., Zhao, Q. and Fei, Q. (2006), “A comparative study of data mining methods in consumer loans
credit scoring management”, Journal of Systems Science and Systems Engineering, Vol. 15 No. 4,
pp. 419-435.
Yap, B.W., Ong, S.H. and Husaina, N.H.M. (2011), “Using data mining to improve assessment of credit
worthiness via credit scoring models”, Expert System with Applications, Vol. 38 No. 10,
pp. 13274-13283.
Yeh, I.C. and Lien, C.H. (2009), “The comparisons of data mining techniques for the predictive accuracy
of probability of default of credit card clients”, Expert Systems with Applications, Vol. 36 No. 2,
pp. 2473-2480.
Yobas, M.B., Crook, J.N. and Ross, P. (2000), “Credit scoring using neural and evolutionary techniques”,
IMA Journal of Management Mathematics, Vol. 11 No. 2, pp. 111-125.
Yu, L., Wang, S. and Lai, K.K. (2008), “Credit risk assessment with a multistage neural network
ensemble learning approach”, Expert Systems with Applications, Vol. 34 No. 2, pp. 1434-1444.

Further reading
Bellotti, T. and Crook, J. (2009), “Support vector machines for credit scoring and discovery of significant
features”, Expert System with Applications, Vol. 36 No. 2, pp. 3302-3308.

Corresponding author
Sihem Khemakhem can be contacted at: sihemkhemakhem@yahoo.fr

