
International Journal of Computational Intelligence Systems, Vol. 11 (2018) 925-935
___________________________________________________________________________________________________________

Credit risk prediction in an imbalanced social lending environment

Anahita Namvar1,2, Mohammad Siami2, Fethi Rabhi1, Mohsen Naderpour2

1
FinanceIT Research Group, University of New South Wales,
Sydney, NSW, Australia
E-mail: Anahita.namvar@gmail.com
f.rabhi@unsw.edu.au
2
Centre for Artificial Intelligence, University of Technology Sydney,
Sydney, NSW, Australia
E-mail: Mohammad.siaminamini@uts.edu.au
Mohsen.Naderpour@uts.edu.au

Received January 2, 2018

Accepted March 9, 2018

Abstract

Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly
in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for
social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still
controversial. In an attempt to address these problems, this paper presents an empirical comparison of various
combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates
imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias
towards the majority class, which has not been considered in similar studies. The results reveal that combining random
forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan
applicants in social lending markets.
Keywords: Risk prediction, peer-to-peer lending, imbalance classification, resampling.

1. Introduction

Social lending, also known as peer-to-peer (P2P) lending, uses online trading platforms as a channel for lending money without the interference of traditional financial intermediaries, such as banks. Conducting business on peer platforms has recently become popular because it not only reduces financing costs but also has the potential for higher profitability for both lenders and borrowers [1]. Borrowers benefit from lower interest rates; lenders receive a higher return than they would from a bank [2].

However, evaluating the creditworthiness of loan applicants is a common challenge in micro-financing, where loans are typically unsecured [2]. Further, P2P lending usually occurs in settings with a high level of information asymmetry, that is, settings where the lenders do not have complete information about the borrowers’ credit history. Even when that information is available, lenders may not know how to extract useful knowledge from the data [1], and manually assessing a borrower’s credit risk is rarely a practical alternative, given the high level of expertise that it requires.

However, supporting collateral, certified accounts, and regular reports are available through traditional banks, which could be used to supplement credit risk prediction in P2P lending markets if doing so did not increase transaction costs [3]. Therefore, predicting a borrower’s creditworthiness to support decision

Copyright © 2018, the Authors. Published by Atlantis Press.


This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

making on whether or not to fund particular loans has emerged as a critical problem for P2P lending platforms. More effective loan evaluation tools are needed for these lending platforms to increase their market share in the financial industry [2, 4].

Traditional loan evaluation techniques assume a balanced distribution of misclassifications; however, an imbalanced dataset is far more typical of social lending platforms. To the best of our knowledge, none of the contemporary studies into P2P lending have explored resampling approaches for class imbalance issues in credit risk prediction.

Class imbalance problems arise when one class contains far more or far fewer objects than another. Effectively predicting credit risk from an imbalanced dataset is difficult because imbalanced data affects the ability of the model to discriminate between good borrowers and potential defaulters [4], and data mining algorithms tend to ignore the minority classes and focus on the majority class [5].

Therefore, to increase the reliability of credit risk prediction in social lending, we aimed to study the advantages and disadvantages of various strategies for processing imbalanced data using machine learning techniques. The research in this paper makes several contributions to the literature in the new and fast-growing field of P2P lending. First, we develop a new credit risk prediction process based on computational intelligence methods, and apply it to the most recent dataset of the Lending Club, one of the biggest online P2P lending platforms. To the best of our knowledge, no study has used the most recent dataset of this platform. Second, this paper introduces a new attribute we developed that helps to capture a borrower’s creditworthiness. Third, we address the imbalance problem by comparing various resampling approaches to determine which ones improve creditworthiness evaluations and how. Further, we explore various machine learning classifiers and resampling techniques to determine which combinations handle imbalanced data most efficiently.

The rest of the paper is organized as follows. Section 2 provides a literature review of the loan evaluation techniques in P2P lending markets and research related to class imbalance problems. Section 3 describes our research methodology, Section 4 presents the experimental results, and Section 5 provides our conclusions and future research directions.

2. Literature Review

2.1. Loan Evaluation in P2P lending

P2P lending has emerged as a new e-commerce platform in the financial marketplace. As with many crowd-sourced services, P2P lending is bringing new economic efficiencies to financing [1]. Currently, scholars are undertaking three streams of research into this new business model: the reasons behind the development of P2P lending (which is outside the scope of this paper); the factors and methods that affect the likelihood of defaults or funding success; and the performance of a range of P2P platforms and credit risk prediction tools for evaluating loans [3].

Some of the studies that have examined the factors affecting funding success and the risk of defaults find that strong social networking relationships, personal characteristics, and variables such as the amount of the loan, the interest rate, and the term of the loan are important factors in determining a borrower’s credit risk and influence funding success [3].

However, because P2P lending is a new trend in finance, few studies have explored credit risk prediction in this setting, although the number is growing. Emekter, Tu, Jirasakuldech and Lu [3] applied logistic regression to investigate the probability of defaults and found some associations between loan defaults and credit scores, debt-to-income ratios, FICO scores, and revolving credit lines. Malekipirbazari and Aksakalli [2] used a range of machine learning methods to classify good and bad loans, such as random forest, logistic regression, k-nearest neighbor, and support vector machines. They found that using a machine learning approach is much more effective than relying on the existing financial metrics, like FICO and LC grades, which the Lending Club provides to help lenders make loan investment decisions.

Several decision support systems based on credit scoring models have been developed to help banks decide whether or not to extend credit to loan applicants. There are two types of predictive models in credit scoring: statistical approaches and artificial intelligence methods. Statistical methods include linear discriminant analysis and logistic regression. The artificial intelligence methods include decision trees, random forest, support vector machine, neural networks, and naïve Bayes. Despite the development of more advanced AI methods for evaluating the credit


risk of borrowers, simple statistical approaches such as linear discriminant analysis and logistic regression still remain popular because of their high accuracy and ease of implementation [6-9].

2.2. Class Imbalance Problem

Class imbalance is common in classification problems. It occurs when the number of instances in one class is vastly different from the number of instances in another. In such cases, classifiers tend to be biased towards the majority class, while the minority class is ignored [10]. Many algorithms designed to address imbalanced data classification problems have emerged over the past decade, and resampling is one of the most important strategies for solving this issue [11]. Resampling generates a balanced training dataset prior to building the classification model.

The three types of resampling techniques are over-sampling, under-sampling, and a hybrid of the two [11].

Over-sampling creates new samples in the minority class. Three prominent over-sampling methods are random over-sampling, synthetic minority over-sampling technique (SMOTE), and adaptive synthetic sampling. Random over-sampling randomly duplicates the minority samples to balance the distribution of data [12]. SMOTE uses k-nearest neighbors to produce new instances based on the distance between the minority data and some randomly selected nearest neighbors [13]. Adaptive synthetic sampling uses the density distribution to generate synthetic samples for each minority instance [14].

Under-sampling discards samples from the majority class [15]. Random under-sampling and instance hardness threshold are the state-of-the-art under-sampling techniques. Random under-sampling randomly eliminates examples from the majority class to balance the class distribution [12]. The instance hardness threshold method measures the probability that an instance will be misclassified and uses a constant threshold to filter instances in all iterations [16].

The third resampling approach is a hybrid method that combines over- and under-sampling. SMOTE + Tomek links (SMOTE-TOMEK) and SMOTE + edited nearest neighbor (SMOTE-ENN) are the two hybrid techniques compared in this study. SMOTE-TOMEK corrects SMOTE data by finding pairs of minimally distanced nearest neighbors of opposite classes. It then identifies and removes Tomek links to produce a balanced dataset with well-defined classes. SMOTE-ENN follows the same procedure as SMOTE-TOMEK but uses edited nearest neighbors to balance the dataset.

Class imbalance is a common problem in loan default prediction. Abeysinghe, Li and He [17], Brown and Mues [18], and Sanz, Bernardo, Herrera, Bustince and Hagras [19] have all provided solutions to class imbalance problems for credit risk prediction. However, to the best of our knowledge, only a few studies have addressed the class imbalance problem in detail in the social lending context.

3. Research Methodology

Fig. 1. Research methodology

To help lenders evaluate the creditworthiness of borrowers in social lending platforms, we developed a decision support system that includes a novel prediction model to reduce the risk of loan defaults. Figure 1 illustrates the model. The model first takes the raw data and passes it through a feature engineering step


to clean data and select the appropriate number of features. After that, the prepared data is divided into a training set and a testing set. The challenge is that imbalanced data is a common problem in credit risk evaluation, and it can cause misclassification. Resampling approaches are therefore applied to resolve the imbalance in the training set, and the balanced data is then fed to state-of-the-art prediction models for training. Once the classification results are validated, we may feed them back and revise our approach to reach better classification results.

Since this study intends to investigate the most efficient combination of resampling approaches and state-of-the-art machine learning algorithms, the model implementation is repeated for different combinations.

In the following subsections, each model component is explained in detail.

3.1. Feature Engineering

The first component in the model is a feature engineering module. Its main purpose is to enhance data reliability by cleaning data and selecting the subset of data features with the most discriminatory power. In credit risk prediction, ignoring irrelevant features can increase classification accuracy [20] and decrease the computational costs associated with running several machine learning models [21]. Feature selection also reduces the dimensionality of the data, which helps mitigate the risk of overfitting. Our model comprises four important steps in feature engineering: data cleaning, leaky data removal, data transformation and correlation analysis, and deriving new attributes.

The data is cleaned by first removing missing and null values from the dataset [22]; then all outliers are removed according to the acceptable range defined in [23]. Eq. (1) shows the upper and lower bounds of this range.

$$\text{Lower bound} < \text{Acceptable range} < \text{Upper bound} \qquad (1)$$
$$\text{Lower bound} = Q_1 - 1.5 \times (Q_3 - Q_1)$$
$$\text{Upper bound} = Q_3 + 1.5 \times (Q_3 - Q_1)$$

where Q3 is the third quartile and Q1 is the first quartile.

Once the data has been cleaned, the features that may cause data leakage in the model are identified. These features typically do not have available values at the time the prediction model is used, so training the model with these features may produce unrealistically good predictions. Surprisingly, the Lending Club dataset contains leakage data, but few studies relying on this dataset have taken that into consideration.

Data transformation includes converting categorical features to numeric, standardization, and log transformation. Some classification algorithms, such as logistic regression, cannot handle categorical features, so they are transformed into new forms of data. For standardization, the min-max normalization in Eq. (2) ensures that all parameters use the same scale.

$$X_n = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (2)$$

A log transformation reduces skewness in the data distribution (it cannot be applied to zero or negative values).

Correlations for numeric and binarized nominal attributes are then computed with respect to the loan status to provide a better understanding of the data and its attributes. Finally, following Malekipirbazari and Aksakalli [2], we define variables that are simple ratios of other features. Moreover, this study introduces a new non-standard financial feature. These ratios help to capture certain borrower characteristics to make the most of the available data.

3.2. Imbalanced Learning Approaches

This study employs a resampling approach to deal with the imbalance problem. Figure 2 shows the three categories of resampling approach: under-sampling, over-sampling, and hybrid methods. We address the state-of-the-art algorithms in each of these categories. The under-sampling approach includes the random under-sampling (RUS) and instance hardness threshold (IHT) algorithms. For the over-sampling approach, random over-sampling (ROS), synthetic minority over-sampling technique (SMOTE), and adaptive synthetic sampling (ADASYN) are studied. Finally, SMOTE + Tomek links (SMOTE-TOMEK) and SMOTE + edited nearest neighbor (SMOTE-ENN) are the two prominent hybrid approaches considered in this research. Section 2.2 reviews the literature on these algorithms.
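As an illustration of the outlier bounds in Eq. (1) and the min-max normalization in Eq. (2) from Section 3.1, the following is a minimal sketch using only the Python standard library. The function names, the toy `incomes` list, and the choice of the "inclusive" quartile method are assumptions made for this example; the paper does not state which quartile convention or implementation was used.

```python
import statistics


def iqr_bounds(values):
    """Return (lower, upper) bounds as in Eq. (1)."""
    # Assumption: quartiles computed with the 'inclusive' method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(values):
    """Keep only values strictly inside the acceptable range."""
    lower, upper = iqr_bounds(values)
    return [v for v in values if lower < v < upper]


def min_max(values):
    """Rescale values to [0, 1] as in Eq. (2); assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy feature with one extreme value.
incomes = [10, 12, 13, 14, 15, 16, 18, 95]
kept = remove_outliers(incomes)
scaled = min_max(kept)
print(kept)
print(scaled)
```

In this toy input the value 95 falls above the upper bound and is dropped before scaling, so the extreme value does not compress the remaining values toward zero.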

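Random under-sampling (RUS), one of the approaches listed in Section 3.2, can be sketched in a few lines. This is a simplified standard-library illustration, not the implementation used in the experiments; the function name, the `seed` parameter, and the toy 18:82 label split (chosen to resemble the class proportions of the final dataset) are assumptions.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly discard rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        # Sampling without replacement; the minority class is kept whole.
        for row in rng.sample(members, target):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Hypothetical training set: 18 defaults and 82 fully paid loans.
labels = ["default"] * 18 + ["fully_paid"] * 82
rows = list(range(len(labels)))
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

Only the training portion of the data would be passed through such a function; the test set keeps its original class distribution.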

Fig. 2. Imbalanced learning approach

3.3. Classification Models

To the best of our knowledge, logistic regression, linear discriminant analysis, and random forest have demonstrated the best performance in the area of classification. Therefore, these three algorithms were selected for loan evaluation in this research.

3.3.1. Logistic Regression

Logistic regression is a standard industry algorithm that is commonly used in practice because of its simplicity and balanced error distribution [24]. It is a binary classification technique that generates one of two values as its result, e.g., good or bad borrowers. The logistic regression formula is shown in Eq. (3).

$$\ln\left(\frac{F(x)}{1 - F(x)}\right) = \beta_0 + \sum_{i=1}^{n} \beta_i X_i \qquad (3)$$

where F(x) refers to the probability prediction, β0 is the constant coefficient, and βi is the coefficient for the feature xi, which is calculated using maximum likelihood. Therefore, for a set of features xi, i = 1,…,n, the logistic regression algorithm predicts the probability that a sample belongs to a specific class [2, 24, 25].

3.3.2. Linear Discriminant Analysis

Linear discriminant analysis is a statistical algorithm that determines the relationship between a target variable and a set of independent variables [26]. Many studies into credit scoring have used linear discriminant analysis because it tends to achieve better performance than other classifiers when linear patterns are involved [27].

3.3.3. Random Forest

Random forest algorithms are based on ensemble trees. This method, which can be seen as an enhanced bagging technique, is a powerful way to construct a forest of random decision trees. A random forest algorithm builds multiple decision trees, each trained on a bootstrap sample from the training data. Rather than considering all available features, the algorithm randomly chooses a subset of attributes when building the trees or splitting the nodes. Once all the trees have been generated, the most popular class is decided with a voting function [2, 6, 24].

3.4. Validation

The dataset was divided into a training set and a test set at a ratio of 70:30. Only the training set was balanced through resampling; the resulting models were then validated on the still imbalanced test set.

4. Experimental Results

All experiments were conducted in Python (version 2.7.13) on a MacBook Pro with a 2GHz Intel Core i7 CPU and 8GB of memory. The average values of 20 repetitions of the learning procedure with random numbers are reported [15].

4.1. Dataset Description

Advances in P2P lending markets have generated large amounts of data on real-world P2P lending transactions. This analysis is based on 2016 and 2017 data from the publicly available datasets released by the Lending Club, a well-known P2P lending platform (lendingclub.com).

The dataset contains approximately 636K borrower records and 145 features in total. Redundant information, noise, and leakage features were removed from the dataset. The features detected as leaks were: LC grade, interest rate, issue date, outstanding principal, total payment, total received principal, total received interest, total late fees received, recoveries,


post charge off collection fee, last payment date, last payment amount, and fund amount.

The variables with skewed distributions were annual income, income to payment ratio, and revolving to income ratio. Log transformations were applied to bring these variables closer to a normal distribution.

Four categorical attributes were selected as nominal attributes and transformed into binarized data: term (2 categories), home ownership (3 categories), verification status (3 categories), and purpose (12 categories). Ultimately, 2+3+3+12 = 20 numerical attributes replaced these four categorical attributes.

The new attributes, defined based on the monthly income, installment amount, and revolving amount features, were the income-to-payment ratio and the revolving-to-income ratio [2]. Further, we defined a new ratio, called new debt to income (New DTI), that, to the best of our knowledge, has not been previously used in any study. New DTI considers the impact of the loan repayments, should the loan be granted, on the borrower’s solvency. It is the ratio of the repayment amount to the borrower’s monthly income if the loan is approved and is defined as:

$$\text{New DTI} = \frac{\text{New Monthly Repayment Amount (NMRA)}}{\text{Monthly income}} \qquad (4)$$
$$\text{NMRA} = \text{DTI} \times \frac{\text{Annual income}}{12} + \text{installment}$$

where DTI, annual income, and installment amount are Lending Club features.

The features used in the credit risk prediction process are presented in Table 1, grouped into three categories: loan characteristics, creditworthiness, and solvency [4].

For a better understanding of the variables, we also performed a correlation analysis of all the variables with respect to loan status. Table 2 shows the correlations for the top-20 attributes.

LC grade and interest rate sit at the top of the list with the highest correlations to loan status. The Lending Club assigns an LC grade of between A and G, where A represents safer loans and G represents riskier loans. These grades were transformed into numbers between 1 and 7 (1=A and 7=G). The Lending Club also assigns an interest rate to each loan. G-grade loans are given the highest interest rate; A-grade loans have the lowest rate. However, LC grades and interest rates are leaky data, so despite their high correlation with loan status, they should not be included in the modeling process or overly optimistic predictions may result.

The new DTI ratio sits in third place with an absolute correlation of 0.171, which shows that it can have a positive effect on the classification results.

Table 1. Lending Club attributes

Category | Attribute | Description
Target variable | Loan status | Current status of the loan
Loan characteristic | Term | The number of monthly payments on the loan, either 36 or 60.
Loan characteristic | Loan amount | The total amount of the loan
Loan characteristic | purpose | A category provided by the borrower for the loan request. 12 purposes included.
Borrowers' solvency | New debt to income (New DTI) | The ratio of new monthly repayment amount (if the loan is approved) to monthly income. This considers the impact of the new loan repayments.
Borrowers' solvency | Income to Payment Ratio | (annual income / 12) / installment
Borrowers' solvency | Debt to income (DTI) | Ratio of the borrower’s total monthly debt payments to the borrower's monthly income.
Borrowers' solvency | home ownership | The home ownership status provided by the borrower during registration. Values are RENT, OWN, and MORTGAGE
Borrowers' solvency | verification status | Verified income (whether or not pay slips or a bank statement have been verified by the Lending Club). Values are verified, not verified, and source verified
Borrowers' creditworthiness | Annual income | Self-reported annual income provided by the borrower during registration.
Borrowers' creditworthiness | Employ Length | Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years
Borrowers' creditworthiness | Revolving to income Ratio | Ratio of revolving credit balance to the borrower's monthly income
Borrowers' creditworthiness | Revolving utilization rate | The amount of credit the borrower is using relative to all available revolving credit. (Drawn amount over the total limit)
Borrowers' creditworthiness | percentBcGt75 | Percentage of all bankcard accounts > 75% of limit
Borrowers' creditworthiness | Average Current Balance | Average current balance of all accounts
Borrowers' creditworthiness | Total Current Balance | Total current balance of all accounts
Borrowers' creditworthiness | installment | The monthly payment owed by the borrower if the loan originates.
Borrowers' creditworthiness | inquiries last 6 months | The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
Borrowers' creditworthiness | Total Revenue High Limit | Total revolving high credit limit
Borrowers' creditworthiness | Total account | The total number of credit lines currently in the borrower's credit file
Borrowers' creditworthiness | Finance inquiries | Number of personal finance inquiries
Borrowers' creditworthiness | credit age | How long has the earliest account been opened by the borrower
Borrowers' creditworthiness | Delinquencies | The number of delinquencies in the borrower's credit file for the past 2 years
Borrowers' creditworthiness | Public record | Number of derogatory public records
Borrowers' creditworthiness | Open account | The number of open credit lines in the borrower's credit file.
Borrowers' creditworthiness | Revolving balance | Total credit revolving balance
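The New DTI definition in Eq. (4) and the income-to-payment ratio from Table 1 can be worked through with a small example. This is a sketch under stated assumptions: the function names and the sample borrower are hypothetical, and DTI is assumed to be expressed as a fraction of monthly income (a raw field stored as a percentage would first need dividing by 100).

```python
def new_dti(dti, annual_income, installment):
    """New DTI per Eq. (4): NMRA divided by monthly income."""
    monthly_income = annual_income / 12
    # NMRA = existing monthly debt payments + the new loan installment.
    nmra = dti * monthly_income + installment
    return nmra / monthly_income


def income_to_payment(annual_income, installment):
    """Income to Payment Ratio per Table 1: (annual income / 12) / installment."""
    return (annual_income / 12) / installment


# Hypothetical borrower: DTI of 0.20, 60,000 annual income, 500 installment.
borrower_new_dti = new_dti(dti=0.20, annual_income=60000, installment=500)
borrower_itp = income_to_payment(annual_income=60000, installment=500)
print(round(borrower_new_dti, 3))
print(borrower_itp)
```

For this borrower, monthly income is 5,000, existing debt payments are 1,000, and the new installment raises the monthly repayment amount to 1,500, so New DTI is 0.3 compared with the original DTI of 0.2.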


Table 2. Correlation with respect to loan status

No.  Attribute                            Correlation
1    LC Grade                             -0.242
2    Interest Rate                        -0.224
3    New DTI                              -0.171
4    Income to Payment Ratio               0.141
5    DTI                                  -0.132
6    Home ownership RENT                  -0.101
7    Revolving utilization rate           -0.100
8    percent_bc_gt_75                     -0.096
9    Average Current Balance               0.095
10   Term                                  0.094
11   Home ownership MORTGAGE               0.092
12   Total Current Balance                 0.089
13   Verification status Verified         -0.077
14   installment                          -0.076
15   Verification status - not verified    0.074
16   inquiries last 6 months              -0.074
17   Loan amount                          -0.067
18   Annual Income                         0.066
19   Total Revenue High Limit              0.064
20   Employment Length                     0.043

A brief description of the final dataset appears in Table 3.

Table 3. Final dataset description

N       Features   % Default   % Fully paid   Imbalance Ratio
66376   43         18.3        81.7           4.46

“N” represents the number of instances in the dataset, and “features” is the number of variables we used in the prediction analysis. “% default”, “% fully paid” and “imbalance ratio” are reflective of “loan status” in the Lending Club dataset. Loans with a status of “current” have not yet been issued or have not reached maturity and, therefore, do not contain information about a borrower’s creditworthiness. These records were removed. “% default” reflects the proportion of loans with a status of “defaulted” once the current loans had been removed; likewise, “% fully paid” reflects the proportion of loans that have reached maturity.

4.2. Performance Measurement

Accuracy is traditionally the most popular performance metric in a binary classification problem. However, when assessing imbalanced datasets, accuracy tends to emphasize the majority class, making it difficult for the classifier to perform well on the minority class. Moreover, in credit risk prediction, measuring accuracy does not consider that false positives are worse than false negatives and, therefore, accuracy can be a misleading criterion that causes erroneous results [28]. As such, other performance measures are more appropriate when working in domains with class imbalance issues [29]. Receiver operating characteristics (ROC), G-mean, and F-measure (FM) are preferred because these measures are unlikely to be affected by imbalanced class distributions [11].

In a binary classification problem with good and bad class labels, the classifier’s result is considered successful if both the false positive rate and false negative rate are small [15]. Sensitivity measures the accuracy on positive samples and specificity measures the accuracy on negative samples. Additionally, an effective performance measure in imbalanced settings will indicate the balance between classification performance in both the minority and majority class [30]. The G-mean measure combines sensitivity and specificity, so it reflects performance on both classes and is therefore an effective measure for imbalanced datasets [15]. The G-mean equation is shown in Eq. (5).

$$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \qquad (5)$$

where

$$\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}, \qquad \text{Specificity} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$$

The area under curve (AUC) measure determines the area under the receiver operating characteristic (ROC) curve. This measure is another effective metric for measuring classification performance with imbalanced datasets [15].

4.3. Classification Results

In this empirical evaluation, we evaluated the performance of selected classifiers in combination with various resampling methods. The combinations are shown in Table 4.

The credit risk prediction results of the three selected classifiers (random forest, logistic regression, and linear discriminant analysis) were tested with each type of resampling method in groups. The best pair from each group was then compared with each other


and with a non-sampling approach, i.e., with an


imbalanced dataset that had not been resampled. The performance of the over-sampling methods
with different classifiers appears in Table 6. In terms of
Table 4. Classifier / resampling method combinations
accuracy, adaptive synthetic sampling behaved very
Resampling Performance differently when combined with a random forest than
Classifier
methodology measure the other two classifiers. RF-ADASYN achieved an
overall accuracy of 0.8 compared to 0.64 for LR-
sampling
Under-

RUS ADASYN and 0.61 for LDA-ADASYN. In addition,


IHT G-mean RF-ADASYN achieved the highest sensitivity (0.94);
Logistic
regression AUC however, it fell to the bottom of the list for specificity.
This demonstrates RF-ADASYN’s poor performance
sampling

ROS
Over -

Linear Sensitivity
discriminant
SMOTE in accurately predicting defaulters, and its bias toward
ADASYN the majority class despite an over-sampling approach
analysis Specificity

SMOTE-
designed to solve class imbalance issues. Random
Random forest False positive
TOMEK rate forest showed the lowest specificity and G-mean, but
Hybrid

had the highest false positive rate of all the over-


SMOTE-
ENN sampling techniques. These results indicate that
For this and all subsequent tables: LR = logistic regression, LDA = linear discriminant
analysis, RF = random forest, ROS = random over-sampling, RUS = random under-sampling,
IHT = instance hardness threshold, SMOTE = synthetic minority over-sampling technique,
ADASYN = adaptive synthetic sampling, ENN = edited nearest neighbor.

As shown in Table 5, the effectiveness of the different under-sampling methods depends on
the measure used to gauge performance. As discussed in Section 4.2, G-mean is the most
effective measure for assessing the classification results when class imbalance problems
exist in the dataset. From a G-mean perspective, all classifiers in combination with RUS
significantly outperformed the combinations with the instance hardness threshold method.
The RF-RUS method had the highest G-mean (0.65). Therefore, RF-RUS is the best
classification approach in the under-sampling group.

Table 5. Classification Results (Under-sampling approach)

Classifier    Accuracy  AUC     Sensitivity  Specificity  FP-Rate  G-mean
RF-RUS        0.692     0.69    0.717        0.582        0.42     0.65
LR-RUS        0.693     0.71    0.723        0.558        0.442    0.635
LDA-RUS       0.676     0.7034  0.695        0.589        0.42     0.64
LR-IHT        0.71      0.7     0.76         0.51         0.49     0.62
LDA-IHT       0.713     0.7     0.759        0.505        0.494    0.619
RF-IHT        0.75      0.688   0.83         0.4          0.61     0.57

random forest is an inappropriate classifier to hybridize with an over-sampling approach.
When logistic regression and linear discriminant analysis were hybridized with
over-sampling techniques, their AUC sat at around 0.7, whereas random forest fell to the
bottom of the list in terms of AUC at between 0.65 and 0.68. Among all the over-sampling
methods, LDA-SMOTE emerged as the best classification method according to the G-mean
measure.

Table 6. Classification Results (Over-sampling approach)

Classifier    Accuracy  AUC     Sensitivity  Specificity  FP-Rate  G-mean
LDA-SMOTE     0.64      0.7     0.63         0.65         0.35     0.643
LR-SMOTE      0.6479    0.702   0.641        0.644        0.356    0.642
LDA-ADASYN    0.61      0.7     0.59         0.7          0.3      0.642
LDA-ROS       0.648     0.702   0.65         0.64         0.359    0.64
LR-ADASYN     0.64      0.7     0.64         0.64         0.36     0.64
LR-ROS        0.7       0.703   0.735        0.542        0.458    0.63
RF-ROS        0.699     0.689   0.74         0.513        0.487    0.616
RF-SMOTE      0.6814    0.658   0.725        0.486        0.513    0.594
RF-ADASYN     0.8       0.66    0.94         0.16         0.84     0.39
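The G-mean ranking of the under-sampling group can be checked with a few lines of Python. The sketch below assumes the usual definition of G-mean as the square root of sensitivity times specificity, and a false positive rate of one minus specificity. It also assumes that the third and fourth numeric values in each Table 5 row are sensitivity and specificity, an assignment inferred from the reported values rather than stated in the table. The function names are illustrative only.

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of the recall obtained on each of the two classes."""
    return math.sqrt(sensitivity * specificity)


def fp_rate(specificity: float) -> float:
    """False positive rate, assumed here to be 1 - specificity."""
    return 1.0 - specificity


# (sensitivity, specificity) pairs copied from the Table 5 rows
# (column assignment is an assumption, see the text above).
table5 = {
    "RF-RUS": (0.717, 0.582),
    "LR-RUS": (0.723, 0.558),
    "LDA-RUS": (0.695, 0.589),
    "LR-IHT": (0.76, 0.51),
    "LDA-IHT": (0.759, 0.505),
    "RF-IHT": (0.83, 0.4),
}

scores = {name: g_mean(se, sp) for name, (se, sp) in table5.items()}
best = max(scores, key=scores.get)

# Print the combinations from highest to lowest G-mean.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} G-mean = {score:.3f}  FP-rate = {fp_rate(table5[name][1]):.3f}")
print("best:", best)
```

Under these assumptions the recomputed values agree with the reported G-mean column to within rounding, and RF-RUS comes out on top. A classifier that labels every customer as good would have sensitivity 1 and specificity 0, hence a G-mean of 0 despite high accuracy, which is why the measure is not biased towards the majority class.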

Given the performance measures in Table 7, the hybrid re-sampling methods did not produce
good results. All methods had low accuracy. The models using SMOTE-ENN resulted in lower
false positive rates and higher specificity, but their sensitivity scores were very low.
This indicates that these models label most customers as defaulters: they perform well on
the minority class (the defaulting customers) but poorly on the other class. For example,
RF-SMOTE-ENN correctly identified only 33.7% of the good customers (class 1). Despite the
poor performance of the models presented in Table 7, LR-SMOTE-Tomek, with a G-mean of
0.64, represents the best classification of the hybrid re-sampling methods.

Table 7. Classification Results (Hybrid approach)

Classifier       Accuracy  AUC    Sensitivity  Specificity  FP-Rate  G-mean
LR-SMOTETomek    0.64      0.7    0.638        0.648        0.352    0.643
LDA-SMOTETomek   0.64      0.701  0.637        0.646        0.354    0.642
RF-SMOTETomek    0.68      0.66   0.705        0.516        0.483    0.603
LR-SMOTEENN      0.47      0.699  0.377        0.862        0.138    0.57
LDA-SMOTEENN     0.46      0.698  0.37         0.86         0.137    0.566
RF-SMOTEENN      0.43      0.664  0.337        0.84         0.15     0.53

Next, we compared the best combinations from each group and listed the results in Table 8.
Table 8 also includes the results of the two non-sampling strategies, logistic regression
and random forest, to illustrate the significant difference between prediction results on
an imbalanced dataset versus a balanced one.

The non-sampling strategies were the most accurate; however, they performed worst in terms
of G-mean, specificity, and false positive rates. This indicates that these classification
models, by considering all the samples in the majority class (i.e., the good customers),
are biased towards the majority class and tend to ignore the minority.

Table 8. Classification Results (Final Comparison)

Classifier           Accuracy  AUC    Sensitivity  Specificity  FP-Rate  G-mean
RF-RUS               0.692     0.69   0.717        0.582        0.42     0.65
LDA-SMOTE            0.64      0.7    0.63         0.65         0.35     0.643
LR-SMOTETomek        0.64      0.7    0.638        0.648        0.352    0.643
Logistic Regression  0.8173    0.703  0.988        0.048        0.95     0.218
Random forest        0.8176    0.696  0.996        0.015        0.98     0.12

In terms of G-mean, random forest had the lowest rating (0.12) on an imbalanced dataset,
whereas RF-RUS (i.e., random forest on a balanced dataset) had the highest G-mean (0.65).
However, random forest and logistic regression returned the highest false positive rates
on their own at 0.98 and 0.95, respectively, and 0.48 and 0.35, respectively, when
combined with a resampling approach. This notable difference between non-sampling and
resampling demonstrates the effectiveness of resampling techniques on the performance of
prediction modeling.

RF-RUS emerged as the best method for predicting a borrower’s status in a social lending
marketplace. Our computational results indicate that applying a random under-sampling
technique to address class imbalance and then training a random forest classifier on the
resulting balanced training set outperforms the other methods examined in the experiments.

5. Conclusions and Future Works

Identifying the risk score for a potential borrower is crucial for the healthy functioning
of social lending markets, where class imbalance problems are prevalent. However, few
studies into social lending platforms have considered the characteristics of imbalanced
data. Moreover, the efficiency of resampling techniques in evaluating P2P loans is a
controversial issue.

To calculate the creditworthiness of borrowers in P2P lending platforms, we used the most
recent data published by the Lending Club. Appropriate features were selected through a
comprehensive feature
engineering process, and we introduced a non-standard financial feature to increase the
reliability of the computed risk scores. Additionally, given that the Lending Club dataset
contains imbalanced classes, we also compared different resampling methods to determine
the best overall technique. Accordingly, the state-of-the-art classifiers (random forest,
logistic regression and linear discriminant analysis) were combined with different
resampling techniques and tested on Lending Club’s data. Our experiments show that random
forest and random under-sampling may be an efficient combination of classifier and
resampling strategy to compute risk scores for loan applicants in social lending markets.

P2P lenders can take advantage of the credit risk prediction modeling discussed in this
study to make smarter decisions when evaluating loan applications. Moreover, lenders might
apply the attributes identified in this study to compute the creditworthiness of
borrowers. Identifying defaulting borrowers in advance can prevent financial loss.
Furthermore, more accurate assessments of the probability of default may also help when
developing strategies to compensate for risk, such as increased interest rates.

One area for future research may consider the support vector machine as a classifier. A
support vector machine algorithm may show better performance than the algorithms assessed
in this paper, even though fine-tuning the parameters is time-consuming. Additionally, the
Lending Club regularly publishes its historical data. Given that the underlying
distribution of incoming data may change unpredictably over time, i.e., the data may
contain concept drift, these changes could affect the accuracy of prediction models in the
future. Therefore, another area for future work could focus on concept drift in imbalanced
data streams on social lending platforms.
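
The random under-sampling step behind the RF-RUS combination discussed in this paper can be sketched in plain Python. This is a minimal illustration, not the implementation used in the experiments: the function name `random_under_sample`, the toy rows, and the 82/18 class split are all hypothetical, and the classifier training that would follow is omitted.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly discard samples from the larger classes until every class
    has as many samples as the smallest one (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is cut down to the size of the smallest class.
    target = min(len(members) for members in by_class.values())

    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):  # sampling without replacement
            balanced.append((row, label))

    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    new_rows = [row for row, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_rows, new_labels


# Toy data, purely illustrative: 1 = good customer (majority), 0 = defaulter.
rows = [[i, i % 7] for i in range(100)]
labels = [1] * 82 + [0] * 18

X_bal, y_bal = random_under_sample(rows, labels, seed=42)
print(Counter(y_bal))  # both classes now have the same number of samples
```

In a setting like the one described here, such a step would be applied to the training set only, and a classifier such as random forest would then be fitted on the balanced result, with the test set left at its original class distribution.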
14. He, H., Bai, Y., Garcia, E.A., and Li, S.: ‘ADASYN:
References Adaptive synthetic sampling approach for imbalanced
learning’, in Editor (Ed.)^(Eds.): ‘Book ADASYN:
1. Guo, Y., Zhou, W., Luo, C., Liu, C., and Xiong, H.: Adaptive synthetic sampling approach for imbalanced
‘Instance-based credit risk assessment for investment learning’ (IEEE, 2008, edn.), pp. 1322-1328
decisions in P2P lending’, European Journal of 15. Gong, J., and Kim, H.: ‘RHSBoost: Improving
Operational Research, 2016, 249, (2), pp. 417-426 classification performance in imbalance data’,
2. Malekipirbazari, M., and Aksakalli, V.: ‘Risk Computational Statistics & Data Analysis, 2017, 111,
assessment in social lending via random forests’, Expert pp. 1-13
Systems with Applications, 2015, 42, (10), pp. 4621- 16. Smith, M.R., Martinez, T., and Giraud-Carrier, C.: ‘An
4631 instance level analysis of data complexity’, Machine
3. Emekter, R., Tu, Y., Jirasakuldech, B., and Lu, M.: learning, 2014, 95, (2), pp. 225-256
‘Evaluating credit risk and loan performance in online

934
International Journal of Computational Intelligence Systems, Vol. 11 (2018) 925-935
___________________________________________________________________________________________________________

17. Abeysinghe, C., Li, J., and He, J.: ‘A Classifier Hub for based approaches’, IEEE Transactions on Systems,
Imbalanced Financial Data’, in Editor (Ed.)^(Eds.): Man, and Cybernetics, Part C (Applications and
‘Book A Classifier Hub for Imbalanced Financial Data’ Reviews), 2012, 42, (4), pp. 463-484
(Springer, 2016, edn.), pp. 476-479 30. Wang, S., Minku, L.L., and Yao, X.: ‘Online class
18. Brown, I., and Mues, C.: ‘An experimental comparison imbalance learning and its applications in fault
of classification algorithms for imbalanced credit detection’, International Journal of Computational
scoring data sets’, Expert Systems with Applications, Intelligence and Applications, 2013, 12, (04), pp.1-19
2012, 39, (3), pp. 3446-3453
19. Sanz, J.A., Bernardo, D., Herrera, F., Bustince, H., and
Hagras, H.: ‘A compact evolutionary interval-valued
fuzzy rule-based classification system for the modeling
and prediction of real-world financial applications with
imbalanced data’, IEEE Transactions on Fuzzy Systems,
2015, 23, (4), pp. 973-990
20. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and
Alsaadi, F.E.: ‘A survey of deep neural network
architectures and their applications’, Neurocomputing,
2017, 234, pp. 11-26
21. Koutanaei, F.N., Sajedi, H., and Khanbabaei, M.: ‘A
hybrid data mining model of feature selection
algorithms and ensemble learning classifiers for credit
scoring’, Journal of Retailing and Consumer Services,
2015, 27, pp. 11-23
22. Umarani, J., and Manikandan, S.: ‘Generating Enhanced
Web Log File using Advanced Data Cleansing
Algorithm in Pre-Processing Phase’, Indian Journal of
Science and Technology, 2016, 9, (48), pp. 1-7
23. de Oliveira, E.C., de Faro Orlando, A., dos Santos
Ferreira, A.L., and de Oliveira Chaves, C.E.:
‘Comparison of different approaches for detection and
treatment of outliers in meter proving factors
determination’, Flow Measurement and
Instrumentation, 2016, 48, pp. 29-35
24. Xia, Y., Liu, C., Li, Y., and Liu, N.: ‘A boosted decision
tree approach using Bayesian hyper-parameter
optimization for credit scoring’, Expert Systems with
Applications, 2017, 78, pp. 225-241
25. James, G., Witten, D., Hastie, T., and Tibshirani, R.: ‘An
introduction to statistical learning’ (Springer, 2013.
2013)
26. Kumar, K., and Bhattacharya, S.: ‘Artificial neural
network vs linear discriminant analysis in credit ratings
forecast: A comparative study of prediction
performances’, Review of Accounting and Finance,
2006, 5, (3), pp. 216-227
27. Khemakhem, S., and Boujelbene, Y.: ‘Credit risk
prediction: A comparative study between discriminant
analysis and the neural network approach’, Accounting
and Management Information Systems, 2015, 14, (1),
pp. 60
28. Abellán, J., and Castellano, J.G.: ‘A comparative study
on base classifiers in ensemble methods for credit
scoring’, Expert Systems with Applications, 2017, 73,
pp. 1-10
29. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H.,
and Herrera, F.: ‘A review on ensembles for the class
imbalance problem: bagging-, boosting-, and hybrid-

935

You might also like