
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2927602, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number

A Novel Reject Inference Model Using Outlier Detection and Gradient Boosting Technique in Peer-to-Peer Lending
Yufei Xia¹
¹Business School, Jiangsu Normal University, Xuzhou, Jiangsu 221116, PR China

Corresponding author: Yufei Xia (e-mail: 6020180093@jsnu.edu.cn).


This work was supported in part by Research Support Project for Doctoral Degree Teachers of Jiangsu Normal University under Grant 18XWRX021.

ABSTRACT Credit scoring is an efficient tool for handling the information asymmetry of P2P lending. Typically, credit scoring models are built only with the accepted applicants, which may cause sample bias and further hinder predictive performance. Reject inference methods exploit the information contained in the rejected samples by inferring their potential status and incorporating them with the accepted samples. In this paper, we propose a novel reject inference model (i.e., OD-LightGBM) that combines an outlier detection technique (i.e., isolation forest) and a state-of-the-art gradient boosting decision tree algorithm (i.e., LightGBM). The model is evaluated on two real-world P2P lending datasets, and the results demonstrate that our proposal significantly outperforms the benchmarks in terms of discriminative capability. The analysis of computational cost shows that our proposal has great potential for handling large-scale problems. The proposed framework remains robust under different parameter settings and provides stable results given various combinations of outlier detection algorithms and classifiers.

INDEX TERMS credit scoring, gradient boosting decision tree, outlier detection, P2P lending, reject
inference

I. INTRODUCTION
FinTech, the close integration of the IT and financial sectors, is emerging rapidly across the world. According to KPMG, global FinTech investment doubled in 2018, reaching USD 111.8 billion¹. Among the areas of FinTech, peer-to-peer (P2P) lending (also known as social lending) has received much attention in China, partially due to its critical role in inclusive finance [1]. In P2P lending, borrowers and lenders are matched directly via online platforms. The platforms function as information intermediaries that relay information between borrowers and lenders. P2P lending is typically operated online and bypasses banks. Thus, P2P lending is usually convenient and leads to lower transaction costs. Despite these considerable benefits, P2P lending is characterized by inherently high risk due to the lack of collateral and information asymmetry [2, 3]. Specifically, lenders may suffer huge losses from non-performing P2P loans when collateral is limited. Moreover, borrowers have an information advantage over lenders. In other words, lenders can hardly know the borrowers' willingness and capability to repay. Thus, P2P lending is criticized for high information asymmetry [4].

To avoid the market failure resulting from information asymmetry [5], various methods have been employed in the field of P2P lending. In the early period of P2P lending, group lending, a concept derived from microfinance, was a popular way to deal with information asymmetry. Group members monitor each other before credit is extended and punish defaulters via informal enforcement mechanisms after the loans are assigned. By these means, group lending mitigates adverse selection and moral hazard [6]. Although Prosper, a popular P2P lending platform, established lending groups at its very beginning, this mechanism has gradually been abandoned in recent P2P lending, partially because group membership neither carries any collateral responsibility nor involves any interaction when credit is extended. Internal credit scoring is another important tool for dealing with information asymmetry in P2P lending.

¹ https://home.kpmg/xx/en/home/insights/2019/01/pulse-of-fintech-h2-2018.html

VOLUME XX, 2019

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/.
Yufei Xia: A Novel Reject Inference Model Using Outlier Detection and Gradient Boosting Technique in Peer-to-Peer Lending (June
2019)

Unskilled lenders may suffer from marked biases and may be unable to convert the information into profitable decisions. Therefore, platforms usually build credit scoring systems to screen potential borrowers. Abundant research has demonstrated that credit scores affect the decision-making of borrowers [7-9]. The rationale behind credit scoring is to apply a classification method to the application characteristics in order to predict the probability of default (PD) of loan requests. If the PD exceeds a preset threshold, the loan application is rejected. The economic benefits of credit scoring are also significant: even a minor improvement in a credit scoring system can prevent dramatic losses for lenders [10].

Current studies mainly optimize credit scoring techniques from two aspects, namely methodology and data. Regarding methodology, there is a growing trend for AI-based techniques, rather than traditional statistical methods, to become the mainstream of credit scoring models. As the award-winning study by Lessmann, et al. [11] showed, several AI-based methods outperform the industry benchmark (i.e., logistic regression (LR)). Thus, ensemble learning, deep learning, and other state-of-the-art classifiers have been applied to credit scoring. In terms of data, both academia and industry explore predictive variables to further enhance the performance of credit scoring. Soft information has recently become a valuable data source for credit scores. In a thorough discussion, Liberti and Petersen [12] highlighted the role of soft information in credit risk assessment, especially for P2P lending. Besides the hard and soft information in accepted loans, rejected loan applications also contain crucial information that can potentially benefit credit scoring. It is worth mentioning that most credit scoring models are built on historical performance, which means that only accepted requests have been used in estimating PDs. However, the number of rejected loan requests is much larger than that of accepted ones, especially in P2P lending: only about 9% of loan applications are accepted, while more than 90% are rejected [13]. Reject inference, a process that makes educated guesses about how rejected loans would have performed had they been accepted, is therefore an efficient way to enlarge the training data in credit scoring.

The main contributions of this study are twofold. First, we develop a novel reject inference framework (OD-LightGBM) that combines a recent outlier detection algorithm (i.e., isolation forest) and a state-of-the-art GBDT classifier (LightGBM). Several studies have applied unsupervised outlier detection in credit scoring as a preprocessing step to mitigate the negative effect of outliers on training classifiers [14-16]. To the best of our knowledge, no existing study employs an outlier detection algorithm to infer the potential status of rejected requests. Moreover, the superiority of the isolation forest algorithm has been empirically demonstrated across various tasks [17]. LightGBM is an efficient GBDT algorithm that achieves better accuracy in many machine learning tasks [18]. Compared with other GBDT algorithms such as GBM and XGBoost, LightGBM makes some modifications in training base learners. Moreover, LightGBM supports GPU computing, which allows it to handle large-scale problems. Consequently, the proposed reject inference framework is expected to be high-performing, flexible, and scalable. Second, we conduct a comprehensive sensitivity analysis of reject inference. In most related studies, the proportion of accepted and rejected loans (i.e., rejection rate) is fixed. Using such a fixed dataset may limit generalization capability or waste computing resources, since the rejection rate of P2P lending is extremely high. This paper addresses these issues via sensitivity analysis. Concretely, we sample the rejected dataset to generate multiple datasets with different rejection rates. The proposed framework is validated on two real-world datasets to examine whether the rejection rate affects model performance. Moreover, the use of an outlier detection algorithm provides an opportunity to determine the contamination rate (i.e., the proportion of outliers or good-bad ratio, abbreviated as CR) in the inference results. Thus, we aim to verify the performance of our proposal under different scenarios and further present some recommendations on data preprocessing in practical applications.

The rest of this paper is structured as follows. In Section II, we conduct a comprehensive literature review concerning reject inference in P2P lending. Section III introduces the methodology, describing the mechanisms of GBDT, LightGBM, and the proposed OD-LightGBM. Section IV describes the data used in this paper. The experiment setup and results are explained in Section V, and finally, Section VI presents the main conclusions and future research.

II. LITERATURE REVIEW

A. REJECT INFERENCE IN P2P LENDING
Most credit scoring models are built on the information of accepted loans, whose borrowers were previously considered trustworthy. However, in theory, the models should be built on the information of all requests; otherwise, sample bias may arise [19]. Sample bias is inherently a missing data problem, in which the missing mechanism can be further divided into missing at random (MAR) and missing not at random (MNAR) [20]. The former indicates that the missing data mechanism is independent of the loan characteristics (i.e., features) and the status (i.e., labels). In such a situation, the parameters estimated from a credit scoring model using only accepted requests do not exhibit sample bias. As indicated by Feelders [21], expectation-maximization is an efficient method for reject inference when the missing mechanism is MAR. However, if the data are MNAR, which means that the rejected outcomes are related to the features and labels, sample bias occurs. Sample bias may result in biased parameters and poor predictability [13].

Reject inference is regarded as a remedy for sample bias. A variety of techniques have been applied in reject inference, which can be roughly categorized into two groups, the

statistical methods and machine learning techniques. Early reject inference research mainly considered statistical methods, including augmentation, re-weighting, extrapolation, and survival analysis [19, 20, 22]. In contrast, recent machine learning methods usually use semi-supervised SVM for reject inference [13, 23]. Interestingly, Chen and Åstebro [24] proposed a Bayesian analysis model that solved the missing value issue in reject inference with a bound-and-collapse imputation technique.

Many scholars have also demonstrated the effectiveness of internal credit scores in affecting lenders' behavior and identifying potentially risky borrowers [8, 25]. Thus, the missing mechanism in credit scoring for P2P lending is also likely to be MNAR, since accepted loans are selected according to their expected credit quality [24]. Such a missing mechanism, along with the high rejection rate, highlights the necessity of reject inference in P2P lending. Some studies have developed SVM-based reject inference models for P2P lending. Among them, Li, et al. [13] is a pioneering study, which uses semi-supervised SVM (S3VM) for reject inference. Validated on the Lending Club dataset, the proposed S3VM outperforms the industry benchmark LR. Tian, et al. [16] further develop a kernel-free fuzzy quadratic surface SVM reject inference model, which addresses the issues of hyper-parameter tuning and scalability. A recent study by Kim and Cho [26] combines label propagation and semi-supervised SVM to infer the potential status of rejected applicants. Besides SVM, GBDT has also been applied as a classifier in reject inference. Xia, et al. [27] propose a reject inference framework integrating contrastive pessimistic likelihood estimation and LightGBM. All the above studies have shown the superiority of reject inference over supervised credit scoring models. Table 1 summarizes the research on reject inference in P2P lending.

Insert Table 1 here

As indicated in Table 1, the rejection rate is usually fixed. Semi-supervised SVM typically serves as both the reject inference approach and the classifier, and few studies apply ensemble classifiers in reject inference. In contrast, in this paper, multiple datasets with different rejection rates are employed to validate the proposed ensemble reject inference model. The ensemble architecture is expected to improve the performance of reject inference models, and the various rejection rates should enhance the robustness of the proposal.

B. OUTLIER DETECTION IN CREDIT SCORING
An outlier is a data point that differs so greatly from the other samples as to arouse suspicion that it was generated by a different mechanism [28]. An early application of outlier detection in credit risk assessment is fraud detection [29]: fraudulent applications and risky customers may have patterns different from those of normal ones, which makes this task well suited to outlier detection. Some representative studies include Srivastava, et al. [30] and Panigrahi, et al. [31].

Credit scoring is inherently a classification problem. However, some classifiers, such as SVM and decision trees, are very sensitive to outliers and noisy data [32]. Outlier detection may benefit supervised learning in two main respects. First, outlier detection reduces data noise and retains clean data [33]. Second, removing outliers can decrease the data size and save computing resources [34]. Thus, some scholars have attempted to apply outlier detection in the preprocessing stage of credit scoring. García, et al. [14] claim that datasets from which outliers have been removed significantly outperform the original datasets when training credit scoring models. The empirical results of Setiono, et al. [35] and Tian, et al. [16] also demonstrate the superiority of outlier detection as a preprocessing tool. However, few studies have applied outlier detection in reject inference.

C. CREDIT SCORING IN P2P LENDING
Due to the critical role of credit scoring in P2P lending, a variety of studies have focused on the specific P2P lending domain. These models are roughly categorized into statistical models and AI-based models. LR is one of the most popular statistical models in P2P lending, mainly due to its acceptable performance and interpretability [36]. AI-based methods, such as support vector machines [37], neural networks [38], and deep neural networks [39], have been applied to credit scoring in P2P lending due to their superior predictability [11, 40]. In pursuit of better predictability, ensemble learning has become a popular research field in credit scoring methodology. Random forest is a representative ensemble method [7, 41] and achieves a balance between performance and complexity. Recently, because of its sound performance, the gradient boosting decision tree (GBDT) has been widely applied to credit risk assessment in P2P lending [42]. Specifically, Xia, et al. [36] developed a credit scoring model based on an advanced GBDT algorithm (i.e., XGBoost) to measure the PD of P2P lending applications. Ma, et al. [43] combined data cleaning techniques and GBDT techniques to predict loan default. Furthermore, heterogeneous ensemble models, which combine the predictions of different classifiers, have also been proposed by some scholars [42, 44].

III. METHODOLOGY
As indicated by Lessmann, et al. [11], ensemble credit scoring models perform well, and resistance to them in practice is more psychological than business-related. The basic idea of ensemble learning is to integrate multiple "weak" models into a "strong" one. Bagging and boosting are two representative types of ensemble learning. Boosting here means iteratively training weak models and then adding them to a final model. In gradient boosting specifically, the base model is usually a decision tree; hence, such a model is usually called GBDT. GBDT is a family of learning algorithms, including recent variants such as XGBoost and LightGBM. Thus, it is


necessary to provide an overview of GBDT before introducing LightGBM.

A. AN OVERVIEW OF GBDT
GBDT builds models in a stage-wise manner that optimizes an arbitrary differentiable objective function [45]. Formally, given a training set $X = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, the goal of GBDT is to find an approximation $\hat{F}(\mathbf{x})$ to the function $F^{*}(\mathbf{x})$ that minimizes the expected value of a given objective function $L(y, F(\mathbf{x}))$, namely,

$$F^{*} = \arg\min_{F} \mathbb{E}_{y,\mathbf{x}}\left[L\left(y, F(\mathbf{x})\right)\right]. \tag{1}$$

Since GBDT can handle regression, classification, and ranking problems, the objective function may vary across learning tasks. For classification tasks, the commonly used objective functions are log loss and hinge loss; for details of the two objective functions, see Bishop [46]. As mentioned above, GBDT adds the base models together to form a new model, that is,

$$\hat{F}(\mathbf{x}) = \sum_{t=1}^{T} f_t(\mathbf{x}), \tag{2}$$

where $T$ is the number of base models. The subsequent issue is to determine the optimal base model. The optimization at the $t$-th iteration can be described as

$$f_t(\mathbf{x}) = \arg\min_{f \in \mathcal{F}} \mathcal{L}_t = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} L\left(y_i, F_{t-1}(\mathbf{x}_i) + f(\mathbf{x}_i)\right), \tag{3}$$

where $\mathcal{F}$ is the space of candidate base learners and $F_{t-1}$ is the model obtained after $t-1$ iterations. The optimal base learner can be searched for in different ways. Classical GBDT applies the steepest descent method to determine the optimal base model; this method uses only the first-order gradient of the objective function at the training samples. In contrast, in state-of-the-art GBDT techniques (e.g., XGBoost and LightGBM), the objective function is approximated via the Newton–Raphson method, which also exploits second-order derivatives. Note that for classification tasks, the base model in GBDT is normally a classification and regression tree (CART). A CART is a tree structure that splits the data samples into different regions (leaves) following a series of "if-then-else" rules; thus, CART is interpretable and easy to understand. Building a CART mainly consists of three successive steps, namely selecting the splitting variable, determining the cut-off value, and pruning. In the original GBDT, the base learners are grown level-wise: all variables and possible cut-off values are evaluated with the objective function and selected in a greedy manner, and a leaf is not partitioned further until all leaves at the same level have been determined (see Fig. 1 (a)). The base learner at the $t$-th iteration is written as

$$f_t(\mathbf{x}) = \sum_{j=1}^{J} b_{tj}\, \mathbf{1}_{R_{tj}}(\mathbf{x}), \tag{4}$$

where $J$ is the number of leaves. CART partitions the samples into $J$ disjoint regions, denoted $R_{tj}$; $b_{tj}$ is the prediction of region $R_{tj}$, and $\mathbf{1}_{R_{tj}}(\mathbf{x})$ equals 1 if $\mathbf{x} \in R_{tj}$ and 0 otherwise. The coefficients $b_{tj}$ are usually multiplied by a parameter $\gamma_t$, which is determined by line search, namely

$$\gamma_t = \arg\min_{\gamma} \sum_{i=1}^{N} L\left(y_i, F_{t-1}(\mathbf{x}_i) + \gamma f_t(\mathbf{x}_i)\right). \tag{5}$$

The GBDT model is therefore updated as follows:

$$F_t(\mathbf{x}) = F_{t-1}(\mathbf{x}) + \gamma_t f_t(\mathbf{x}). \tag{6}$$

B. LIGHTGBM ALGORITHM
LightGBM is a powerful, distributed GBDT algorithm proposed by Ke, et al. [18]. This high-performance algorithm has been part of the winning solutions of many machine learning competitions. LightGBM can handle various tasks, such as classification, regression, and ranking. Several studies have demonstrated the superiority of LightGBM in many applications, such as tumor classification [47] and loan default prediction [43]. The promising performance of LightGBM can be partially explained by its use of leaf-wise learning. Unlike conventional GBDT, LightGBM builds best-first trees [48], which grow the leaf with the best expected improvement in the objective function (see Fig. 1 (b)). It is worth mentioning that there is a trade-off between level-wise and leaf-wise trees. The advantage of level-wise trees is that they are easy to parallelize. However, given the same stopping criterion (e.g., number of leaves), a best-first tree is expected to achieve a lower loss than a level-wise tree. The major drawback of best-first trees, however, is that they are prone to overfitting. Thus, LightGBM controls the maximum tree depth.

Besides best-first trees, LightGBM makes some technical modifications that speed up training and save computing resources. For example, a histogram-based algorithm is employed to reduce the computing cost of searching for the optimal tree structure. Moreover, LightGBM makes considerable improvements in parallel learning to process large amounts of data. Specifically, feature parallelism, data parallelism, and GPU support are used to build best-first trees efficiently. In addition, LightGBM employs advanced collective communication algorithms that optimize network communication in parallel learning.

Insert Fig. 1 here

C. OD-LIGHTGBM FOR REJECT INFERENCE
In this subsection, we introduce the proposed reject inference framework that combines outlier detection and LightGBM. This approach consists of the following three steps.

1) DATA PREPROCESSING
Let $A$ and $R$ denote the index sets of accepted and rejected applications, respectively. The basic idea of reject inference is to infer the potential status of rejected loans ($y_R$) and then

build a credit scoring model combining the information on both accepted $(\mathbf{x}_A, y_A)$ and rejected applicants $(\mathbf{x}_R, y_R)$. Thus, we first preprocess the data and separate them into the accepted and rejected sets.

Moreover, the features of accepted and rejected data may be inconsistent, and missing values must be handled. There are several measures to address this problem, among which the most commonly used are deletion, imputation, and dummy variables [49]. Deletion and dummy variable approaches are considered unsuitable, since list-wise deletion or dummy variables may hinder the process of reject inference, and column-wise deletion harms the predictability of credit scoring models. Imputation, therefore, seems to be a rational choice. Mean imputation and median imputation are two representative imputation methods. However, we consider these two methods infeasible, since some of the missing features (i.e., term, interest rate, debt-to-income ratio, and total amount of credit lines) may be quite different between accepted and rejected loans. Replacing the missing data of rejected applicants with the mean or median of the accepted applicants may lead to a confusing classification boundary and thus poor results. Following the work of Xia, et al. [27], we replace the missing values in rejected samples with 0 to maintain some dissimilarity between accepted and rejected samples. Although GBDT-based techniques are seldom affected by the scale of the data, the performance of the outlier detection algorithm used in the next subsection may be highly dependent on data normalization. Thus, all features are normalized into [0, 1] after imputation.

2) INFERRING REJECTED LOANS VIA OUTLIER DETECTION
The core issue of this subsection is to infer the status of rejected loans using the available information. Some techniques have been applied to this task. However, these popular approaches have several drawbacks. First, none of them can flexibly modify the good/bad (GB) ratio of the rejected group, even though the effectiveness of reject inference is highly dependent on an accurate estimation of the GB ratio for all applicants [19]. Second, some approaches rest on key assumptions that may be far from reality. For example, augmentation assumes that the potential performance of the rejects can be directly imputed from that of the accepts; however, such an assumption may be violated due to sample bias. Finally, current reject inference techniques are mainly built on statistical approaches, although AI-based methods are regarded as an interesting direction for further studies [16, 27]. Thus, we aim to employ an outlier detection algorithm to overcome these drawbacks.

Unlike the definitions in prior studies that apply outlier detection in data preprocessing (e.g., García, et al. [14] and Tian, et al. [16]), an outlier here refers to a good applicant who was accidentally rejected due to some incidental factors. Under this circumstance, the features of outliers are distinctly different from those of inliers, and outliers should be regarded as good applicants. In this way, the potential status of rejects can be determined by whether or not a sample is an outlier. We use a simple example in Fig. 2 to illustrate the mechanism of outlier detection in reject inference. The triangles and circles denote good and bad applicants, respectively, and the x points denote rejected samples. The solid black plane is the optimal classification boundary considering only the accepts. In this case, a classifier provides sub-optimal performance, since a few samples are misclassified. We then apply the outlier detection algorithm to the rejected applicants. Specifically, the outliers are labeled as good applicants and plotted in blue, whereas the inliers are labeled as bad applicants and plotted in green. In this case, an optimal classification boundary can be depicted in the figure (dashed red plane), and the misclassification rate is reduced after employing outlier detection in reject inference. Moreover, outlier detection is typically an unsupervised learning approach and is independent of existing credit scoring models. Consequently, such a method relaxes the constraints of current reject inference approaches.

Insert Fig. 2 here

In this paper, we use isolation forest [17] to infer the status of the rejects. The basic idea of isolation forest, which differs greatly from popular outlier detection algorithms such as local outlier factor or stochastic outlier selection, is to explicitly isolate outliers instead of profiling normal samples. Isolation forest is inherently an ensemble tree approach. Starting from the root node of each binary tree, a feature and a cut-off value within the range of that feature are randomly selected, and the data points are split into the child nodes accordingly. The splitting continues recursively until each node contains only one data point or the maximum tree height is reached. The subsequent issue is how to quantify the degree of anomaly of a given data point. In principle, outliers are far less frequent than inliers and have patterns different from those of normal samples. Thus, outliers tend to be isolated closer to the root node (i.e., with shorter path lengths) than inliers. Accordingly, the anomaly score $s$ of isolation forest is defined as follows:

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, \tag{7}$$

where $h(x)$ is the path length of sample $x$; a detailed definition of path length can be found in Liu, et al. [17]. $c(n)$ denotes the average path length of an unsuccessful search in a binary search tree built on $n$ samples, $c(n) = 2H(n-1) - 2(n-1)/n$, where the harmonic number $H(i)$ can be estimated by $\ln(i) + 0.5772156649$ (Euler's constant). $E(h(x))$ is the average of $h(x)$ over an ensemble of isolation trees. Each sample can be given an anomaly score following Eq. (7), and decisions can be made based on the score. If a sample has an anomaly score very close to 1, it is most likely an outlier, whereas a sample with an anomaly score below 0.5 is regarded as an inlier. In particular, if the anomaly scores of all samples are approximately 0.5, the entire sample does not contain distinct anomalies. Using the anomaly scores, we can also adjust the proportion of outliers (i.e., CR) based on their rankings.
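The preprocessing and inference steps described above (zero imputation of the rejects, [0, 1] scaling, isolation-forest scoring by Eq. (7), and labeling the top-CR rejects as good) can be sketched in plain Python as follows. This is an illustrative, standard-library re-implementation written under stated assumptions, not the author's code (the paper relies on scikit-learn and LightGBM). The function names, the `None` encoding of missing values, the 0/1 label encoding, and the defaults of 100 trees with a subsample size of 256 (the settings recommended by Liu, et al. [17]) are all assumptions.

```python
import math
import random

# Illustrative sketch (assumption): not the paper's implementation.
EULER_GAMMA = 0.5772156649
GOOD, BAD = 0, 1  # hypothetical label encoding for inferred rejects


def c(n):
    """Average path length of an unsuccessful search in a BST built on n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def impute_and_scale(accepted, rejected):
    """Fill missing rejected values (None) with 0, then min-max scale all rows to [0, 1].

    Assumes the accepted rows are complete and both sets share the same columns.
    """
    rejected = [[0.0 if v is None else v for v in row] for row in rejected]
    columns = list(zip(*(accepted + rejected)))
    lo = [min(col) for col in columns]
    hi = [max(col) for col in columns]

    def scale(row):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return [scale(r) for r in accepted], [scale(r) for r in rejected]


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random features and random cut-off values."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    # Only features that still vary inside this node can be split.
    ranges = [(j, min(p[j] for p in points), max(p[j] for p in points))
              for j in range(len(points[0]))]
    ranges = [(j, lo, hi) for j, lo, hi in ranges if hi > lo]
    if not ranges:
        return ("leaf", len(points))
    q, lo, hi = rng.choice(ranges)
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[q] < cut]
    right = [p for p in points if p[q] >= cut]
    return ("node", q, cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree):
    """h(x): edges traversed, plus c(size) for an unresolved external node."""
    depth = 0
    while tree[0] == "node":
        _, q, cut, left, right = tree
        tree = left if x[q] < cut else right
        depth += 1
    return depth + c(tree[1])


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s = 2 ** (-E[h(x)] / c(psi)) for every row, as in Eq. (7)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c(psi)
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def label_rejects(scores, contamination):
    """Label the top `contamination` share of rejects (outliers) GOOD, the rest BAD."""
    k = int(round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [BAD] * len(scores)
    for i in ranked[:k]:
        labels[i] = GOOD
    return labels
```

For instance, after `acc, rej = impute_and_scale(accepted, rejected)`, a call such as `label_rejects(isolation_forest_scores(rej), contamination=0.1)` would mark the 10% most easily isolated rejects as good, which is how the CR is varied by ranking rather than by a fixed score threshold. In this sketch the forest is fitted on the rejected set only, matching the description of detecting outliers in the rejected dataset.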


Given an outlier ratio, we execute the isolation forest


algorithm to detect the outliers in the rejected dataset. Insert Tables 2 and 3 here
Subsequently, we label the status of outliers as good applicants,
whereas denote the inliers as risky applicants.
3) CLASSIFICATION USING LIGHTGBM
With the potential status of all samples now available, we train a LightGBM model on the whole dataset, including both the accepts and the rejects. Given a new data point, the LightGBM model returns the PD and the class to which the applicant should be assigned. Finally, model performance is evaluated in terms of predictability and computational complexity.

IV. DATA
In this paper, we use two real-world datasets (i.e., the Lending Club dataset and the We.com dataset) to perform the experiment. The Lending Club dataset is public and can be freely downloaded from the official website of Lending Club. The We.com dataset is private and was collected with a web crawler.
Lending Club is the world's largest online marketplace for social lending. To maintain business transparency, Lending Club has released its real transaction records after removing private information. In this paper, we extract the data from January 2009 to December 2012 so that even 60-month loans have reached a definitive status. After removing samples with obvious errors or omissions, the whole dataset contains 808,330 samples, of which 91,825 are accepted applicants and 716,505 are rejected. Two critical problems must be addressed before the Lending Club dataset can be used. First, the rejected samples contain only five features, namely loan amount, FICO score, DTI ratio, region, and employment length. However, as shown in Serrano-Cinca, et al. [8], some powerful predictors are not included in the rejected dataset, which may further hinder the predictability of credit scoring models. Thus, we add some extra variables, namely revolving line utilization rate, number of open credit lines, term, and interest rate. Second, the region information is recorded as text and requires processing. Following the suggestion of Xia, et al. [27], we convert the address state into an ordinal number according to its prior default rate.
We.com is a mainstream P2P lending platform in China. The We.com dataset is comparatively small, including 1,489 accepted applicants and 2,968 rejected ones after removing observations with obvious mistakes. A defining characteristic of the We.com dataset is that the accepts and the rejects share the same features. Thus, imputation is not required for this dataset.
The summary statistics and correlation matrices of the two datasets are shown in Tables 2 and 3, respectively. The two tables show that the features of the accepts and the rejects have very different distributions, which indirectly suggests that the missing-data mechanism is probably MNAR (missing not at random).

V. EXPERIMENT SETUP AND RESULTS
A. EXPERIMENT DESIGN
In this subsection, we compare the proposed reject inference model with benchmark models from both credit scoring and reject inference, such as LR, RF, SVM, and S3VM. Furthermore, we conduct a sensitivity analysis of the proposed model under different settings of the rejection rate and the contamination rate. In addition, we compare the computational costs of the proposal and the benchmark models. As suggested by Li, et al. [13], the experiment is carried out as follows:
Step 1: Randomly draw accepted and rejected samples, whose numbers are denoted as $N_A$ and $N_R$, respectively. To mitigate the effect of sample bias, a different random seed is used for each draw.
Step 2: Split the accepted samples into a training set and a test set in a 70%/30% ratio.
Step 3: Build the supervised credit scoring models, namely LR, RF, and SVM, using the training set of the accepted dataset, and build S3VM and OD-LightGBM using both the training set of the accepted dataset and the rejected dataset with potential labels.
Step 4: Use the classifiers built in Step 3 to predict the PD and label of the samples in the test set, and then compare the performances of the different approaches.
To make a fair comparison, we repeat the experiment 50 times and report the average results. We set $N_A = 2000$ following the work of Xia, et al. [27]. To test the influence of the rejection rate on model performance, we vary $N_R$ from 2000 to 20000. The whole experiment is coded in Python 2.7 using the scikit-learn [50] and LightGBM [18] packages.

B. EVALUATION METRICS
Several evaluation metrics are employed to measure model performance. Accuracy, the proportion of correctly classified samples among all samples, is used to evaluate how well the models predict labels. Following the work of Bequé and Lessmann [51], we set the cut-off used for accuracy according to the good/bad (GB) ratio in the training set. Moreover, the Area Under the Receiver Operating Characteristic Curve (AUC) and the H measure are used to measure the discriminative capability of the models. In particular, the H measure overcomes some drawbacks of the traditional AUC and has been applied in several recent studies. Finally, we consider computational complexity, since it is critical for large-size problems, and use computational time as its indicator. To ensure that running times are comparable, all the experiments are executed on the same desktop PC with an AMD Ryzen 1800X
VOLUME XX, 2019

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/.

8-core 3.6GHz CPU, 24 GB RAM, Nvidia GeForce GTX 1700 GPU, and a 64-bit Windows 10 operating system.

C. HYPER-PARAMETER OPTIMIZATION
AI-based classifiers, especially the SVM and LightGBM used in our experiment, usually have several hyper-parameters that largely influence model performance, so these hyper-parameters must be tuned. For SVM and S3VM, which employ a radial basis function kernel, the hyper-parameters and their functions are as follows:
Soft margin hyper-parameter 𝐶: controls the trade-off between training error and margin maximization.
Kernel hyper-parameter 𝛾: determines the flexibility of the decision boundary.
The hyper-parameters of LightGBM are very similar to those of GBDT. Moreover, LightGBM provides some additional hyper-parameters that control overfitting and lead to promising performance. The tuned hyper-parameters of LightGBM are summarized as follows:
Learning rate: shrinks the contribution of each base model.
Number of iterations: the number of boosting iterations.
Maximum leaves: limits the maximum number of leaves in a single base model.
Bagging fraction: the proportion of training data used to build each base model.
Maximum number of bins: limits the maximum number of bins into which feature values are bucketed.
All these hyper-parameters are tuned via a state-of-the-art Bayesian hyper-parameter optimization algorithm [52], which is empirically more effective than the popular grid search. The 5-fold cross-validation AUC is used as the criterion for selecting the optimal hyper-parameter setting of the aforementioned models.

D. RESULTS
1) ANALYSIS OF PREDICTIVE PERFORMANCE
The numerical results of the proposal and the benchmark models are shown in Tables 4 and 5 for the Lending Club and We.com datasets, respectively. In addition to classification accuracy, we report type I and type II error rates, which reflect the predictability for good and bad applicants, respectively. Note that a type II error (misclassifying a truly bad applicant as good) is more expensive than a type I error (treating a good applicant as a risky one) [51]. Moreover, we set $N_A = N_R = 2000$ for the Lending Club dataset, whereas we keep the original $N_A$ and $N_R$ for the We.com dataset due to its relatively small size. For both datasets, CR is set to 0.2. The best-performing model for each evaluation metric is highlighted in bold, and the second-best model is denoted in italics. We also employ a parametric statistical hypothesis test (i.e., the paired t-test) to compare the best-performing and second-best models. Although Lessmann, et al. [11] argued that the assumptions of parametric tests are typically violated in multiple comparisons of credit scoring models, we execute additional tests to show that one of the strictest assumptions, namely that the differences between the two models' performances are approximately normally distributed, holds in our cases. For each evaluation metric, we report the Shapiro-Wilk test statistic for the difference between the best and second-best models.
We first focus on the Lending Club dataset. Table 4 shows that the proposed OD-LightGBM provides a promising result. Concretely, it outperforms all other models regarding AUC and H measure over the four datasets, with supervised LightGBM being the second-best model on both measures. OD-LightGBM also achieves the best accuracy and the lowest type I and type II error rates on the 2009 dataset. The normality tests show that, for every evaluation measure, the difference between the best and second-best models is consistent with a normal distribution, so a paired t-test can be employed for the two models. The parametric tests further indicate that OD-LightGBM significantly outperforms the benchmarks for AUC and H measure on the 2009, 2011, and 2012 datasets. Although our proposal is the second-best model for the accuracy-related measures, the paired t-tests indicate that the gap is negligible.
Insert Tables 4 and 5 here
We then focus on the We.com dataset. Table 5 shows that the gaps between the models are comparatively minor for this dataset. Even so, the proposed OD-LightGBM performs significantly better than the second-best model (i.e., supervised LightGBM) regarding AUC and H measure. In terms of accuracy and type I error rate, OD-LightGBM performs slightly better than supervised LightGBM; however, the paired t-test was not performed because the corresponding difference did not pass the normality test. For the type II error rate, supervised LightGBM slightly outperforms OD-LightGBM, but the non-normal distribution of the differences between the two models precludes a further significance test.
To summarize, the proposed OD-LightGBM performs best among the compared models on average. Concretely, it achieves surprisingly good results in terms of AUC and H measure, whereas it is the second-best model for accuracy and the type I and type II error rates in most cases. It is worth mentioning that accuracy and the type I and type II error rates will change if the threshold described in subsection B of Section V is modified. Since AUC does not depend on the threshold, the strong AUC results imply that the performance of OD-LightGBM on these threshold-dependent measures could be further improved by adjusting the threshold.
2) ANALYSIS OF COMPUTATIONAL COMPLEXITY
Apart from predictive ability, we analyze the computational complexity of the models in this subsection. Computational cost refers to the computational resources consumed in building credit scoring models, and it is drawing much attention in practical credit scoring, especially for large-sized problems. For example, P2P lending platforms are

Yufei Xia: A Novel Reject Inference Model Using Outlier Detection and Gradient Boosting Technique in Peer-to-Peer Lending (June
2019)

expected to provide the credit score of a new applicant with low latency. Similarly, low computational complexity allows credit scores to be updated frequently. Thus, an ideal credit scoring model should achieve sound predictive performance while remaining relatively efficient.
Since a reject inference model has an additional step, namely inferring the potential status of the rejects, its computational complexity may increase. However, multi-thread and GPU computing help to tackle this issue. Specifically, tree-based ensemble algorithms, such as random forest and isolation forest, can be executed quickly using multiple CPU threads. Moreover, LightGBM supports GPU computing, which has great potential in handling large-sized data. To the best of our knowledge, no prior study has applied GPU acceleration in building GBDT-based credit scoring models.
We consider five models (i.e., LR, RF, S3VM, OD-LightGBM (CPU version), and OD-LightGBM (GPU version)) regarding computational complexity. The average computational times (in seconds) of the different approaches are summarized in Table 6. Recall that in the original setting, $N_R = 2000$. Table 6 clearly shows that LR requires the least computation in training and predicting, whereas its predictive performance is unsatisfactory, as indicated in Tables 4 and 5. RF is slightly slower than LR but achieves much better prediction accuracy. This result, again, supports the claim of Lessmann, et al. [11], who advocated replacing the industry benchmark LR with RF. Although OD-LightGBM consumes more time than LR and RF, GPU computing reduces its running time by nearly 50%.
Insert Table 6 here
Although S3VM has been reported as an effective tool for reject inference [13], SVM, as a kernel method, is easily affected by the curse of dimensionality and is not suitable for large-scale applications. To confirm this claim, we test the efficiency of the reject inference models under different settings of $N_R$. The running times are summarized in Table 6, with the maximum running time capped at 10,000 s. The results show that the proposed OD-LightGBM not only provides the best AUC but also consumes limited computational resources. It is worth mentioning that S3VM is unable to handle large-sized data within a reasonable time, whereas OD-LightGBM handles them quite efficiently.
In summary, compared with the benchmark LR and RF methods, our proposal provides better performance at the cost of extra computational resources, a drawback that GPU computing can partially remedy. Compared with the state-of-the-art S3VM, our approach is more accurate and easier to implement. Thus, the proposed approach has great potential in large-sized reject inference applications.
3) SENSITIVITY ANALYSIS
In the previous subsections, the experiments were conducted under fixed parameter settings. For example, $N_A$ and $N_R$ are determined beforehand, and the contamination rate is set to 0.2. Furthermore, the proposed “outlier detection + classifier” reject inference framework has so far been limited to one specific combination (i.e., isolation forest + LightGBM). A sensitivity analysis is necessary for the following three reasons. First, accepted data are very precious, whereas rejected data are comparatively low in value, since most applicants are rejected in the P2P lending domain. Reject inference is therefore especially attractive because it provides a way to convert otherwise unused data into knowledge, and exploring the optimal $N_R$ will benefit practical reject inference in P2P lending. Second, CR potentially indicates the good/bad proportion of the rejected dataset, so a change in CR will, in theory, affect the performance of the reject inference model. Consequently, it is important to investigate how CR affects model performance and, ideally, to provide a rule of thumb to guide practical implementation. Finally, the effectiveness of the proposed reject inference framework may be questioned if only one combination of outlier detection algorithm and classifier is applied. To check the robustness of our proposal, we integrate local outlier factor (LOF) and histogram-based outlier score (HBOS) with RF and LightGBM.
Insert Table 7 here
To reach a reliable conclusion, we first examine the effect of $N_R$ on model performance. Table 7 reports the performance of OD-LightGBM under various combinations of $N_R$ and CR. The results clearly show that the performance of OD-LightGBM is quite stable, outperforming the benchmarks under various parameter settings. For each dataset, the average AUC and H measure are calculated for different $N_R$. From these average values, we can draw two main conclusions. First, the differences between models under various $N_R$ are minor. Specifically, the gap between the best and the worst model does not exceed 0.001, which implies that $N_R$ hardly affects model performance. Second, despite the negligible differences, an inverse U-shape can be observed for the AUC and H measure of each dataset. In other words, model performance increases until a certain $N_R$ is reached and declines thereafter. Although the best value varies across datasets, an $N_R$ between 4000 and 8000 is recommended, since it achieves a good trade-off between model performance and computational cost. Although $N_R = 20000$ is quite close to the real rejection rate in Lending Club, our proposal achieves sub-optimal performance under this setting. This implies that sampling the rejected data benefits both model performance and computational cost.


We then analyze the influence of CR. As shown in Table 7, the optimal CR varies across datasets. Concretely, the differences in model performance across CRs are minor for the Lending Club 2009 dataset, where CR = 0.3 achieves the best results. However, for the Lending Club 2010, 2011, and 2012 datasets, the optimal CR is 0.05. This result indicates that the optimal CR should be approximately 0.05 in most cases. The low optimal CR also suggests that an overwhelming majority of the rejects are potentially risky applicants.
Insert Table 8 here
The results of LOF-RF, LOF-LightGBM, HBOS-RF, and HBOS-LightGBM are summarized in Table 8. Here we use the original parameter setting for these reject inference methods, namely $N_A = N_R = 2000$. The table shows that the proposed reject inference framework provides robust performance under different combinations of outlier detection algorithm and classifier. Specifically, applying LOF and HBOS improves the AUC and H measure on all datasets relative to the supervised RF or LightGBM. Our proposal also increases accuracy and decreases type I and type II error rates on the Lending Club 2011 and 2012 datasets and the We.com dataset. These results imply that the proposed “outlier detection + classifier” reject inference framework can potentially be further improved by employing more powerful outlier detection algorithms or classification techniques.

VI. CONCLUSIONS AND FUTURE RESEARCH
P2P lending is growing rapidly across the world, and information asymmetry is a core issue in this newly established industry. Credit scoring, as an efficient solution to information problems, is drawing much attention from both academia and industry. Credit scoring models are typically built with the accepted applicants only, which may cause sample bias and hinder model performance. Recently, reject inference methods that employ the rejected data to mitigate sample bias have become a research hotspot in credit scoring. This paper develops a novel reject inference framework based on an outlier detection technique and LightGBM (i.e., OD-LightGBM). Our proposal has four potential advantages. First, it is a comparatively flexible framework, since the outlier detection algorithm and the classifier can be freely changed. Second, unlike the popular S3VM technique, it employs a state-of-the-art classification algorithm (i.e., LightGBM) to discriminate between good and risky applicants. Third, the proposed model alleviates the heavy computational costs of the commonly used S3VM. Finally, the GPU support of LightGBM enables efficient handling of large-sized problems.
The proposed model is validated on two real-world P2P lending credit datasets. The numerical results strongly demonstrate the superiority of our proposal in terms of discriminative capability. The analysis of computational cost shows the feasibility of our proposal in handling large-sized problems using GPU computing relative to classic reject inference methods. Thus, OD-LightGBM shows great potential as a powerful reject inference method in practice. The sensitivity analysis further illustrates that the proposed reject inference framework achieves robust performance under different parameter settings and may be improved by employing advanced outlier detection algorithms and classifiers.
For future studies, one can explore the following three directions. First, although the sensitivity analysis demonstrates that random sampling of the rejects improves the performance of the reject inference model, one may explore whether the model could be further optimized using special sampling techniques. Second, owing to the flexibility of our proposal, powerful outlier detection algorithms and classification techniques can be incorporated into this reject inference framework. Finally, it would be interesting to discuss the efficiency of reject inference from a cost-benefit perspective.

REFERENCES
[1] L. Chen, "From fintech to finlife: the case of fintech development in China," China Economic Journal, vol. 9, no. 3, pp. 225-239, 2016.
[2] R. Emekter, Y. Tu, B. Jirasakuldech, and M. Lu, "Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending," Applied Economics, vol. 47, no. 1, pp. 54-70, 2015.
[3] A. Mild, M. Waitz, and J. Wöckl, "How low can you go?—Overcoming the inability of lenders to set proper interest rates on unsecured peer-to-peer lending markets," Journal of Business Research, vol. 68, no. 6, pp. 1291-1305, 2015.
[4] S. Freedman and G. Z. Jin, "The information value of online social networks: lessons from peer-to-peer lending," International Journal of Industrial Organization, vol. 51, pp. 185-222, 2017.
[5] J. E. Stiglitz and A. Weiss, "Credit rationing in markets with imperfect information," The American Economic Review, vol. 71, no. 3, pp. 393-410, 1981.
[6] C. R. Everett, "Group Membership, Relationship Banking and Loan Default Risk: The Case of Online Social Lending," Banking and Finance Review, vol. 7, no. 2, pp. 15-54, 2015.
[7] M. Malekipirbazari and V. Aksakalli, "Risk assessment in social lending via random forests," Expert Systems with Applications, vol. 42, no. 10, pp. 4621-4631, 2015.
[8] C. Serrano-Cinca, B. Gutierrez-Nieto, and L. López-Palacios, "Determinants of default in P2P lending," PLoS ONE, vol. 10, no. 10, p. e0139427, 2015.
[9] Y. Guo, W. Zhou, C. Luo, C. Liu, and H. Xiong, "Instance-Based Credit Risk Assessment for Investment Decisions in P2P Lending," European Journal of Operational Research, vol. 249, no. 2, pp. 417-426, 2016.
[10] T. Verbraken, C. Bravo, R. Weber, and B. Baesens, "Development and application of consumer credit scoring models using profit-based classification measures," European Journal of Operational Research, vol. 238, no. 2, pp. 505-513, 2014.
[11] S. Lessmann, B. Baesens, H.-V. Seow, and L. C. Thomas, "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, vol. 247, no. 1, pp. 124-136, 2015.
[12] J. M. Liberti and M. A. Petersen, "Information: Hard and soft," Review of Corporate Finance Studies, vol. 8, no. 1, pp. 1-41, 2018.
[13] Z. Li, Y. Tian, K. Li, F. Zhou, and W. Yang, "Reject inference in credit scoring using Semi-supervised Support Vector


Machines," Expert Systems with Applications, vol. 74, pp. 105-114, 2017.
[14] V. García, A. Marqués, and J. S. Sánchez, "On the use of data filtering techniques for credit risk prediction with instance-based models," Expert Systems with Applications, vol. 39, no. 18, pp. 13267-13276, 2012.
[15] T. Harris, "Credit scoring using the clustered support vector machine," Expert Systems with Applications, vol. 42, no. 2, pp. 741-750, 2015.
[16] Y. Tian, Z. Yong, and J. Luo, "A new approach for reject inference in credit scoring using kernel-free fuzzy quadratic surface support vector machines," Applied Soft Computing, vol. 73, pp. 96-105, 2018.
[17] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation forest," in 2008 Eighth IEEE International Conference on Data Mining, 2008, pp. 413-422: IEEE.
[18] G. Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," in Advances in Neural Information Processing Systems, 2017, pp. 3149-3157.
[19] J. Crook and J. Banasik, "Does reject inference really improve the performance of application scoring models?," Journal of Banking & Finance, vol. 28, no. 4, pp. 857-874, 2004.
[20] J. Banasik and J. Crook, "Reject inference, augmentation, and sample selection," European Journal of Operational Research, vol. 183, no. 3, pp. 1582-1594, 2007.
[21] A. Feelders, "Credit scoring and reject inference with mixture models," International Journal of Intelligent Systems in Accounting, Finance & Management, vol. 9, no. 1, pp. 1-8, 2000.
[22] J. Banasik and J. Crook, "Reject inference in survival analysis by augmentation," Journal of the Operational Research Society, vol. 61, no. 3, pp. 473-485, 2010.
[23] S. Maldonado and G. Paredes, "A Semi-supervised Approach for Reject Inference in Credit Scoring Using SVMs," in Industrial Conference on Data Mining, 2010, pp. 558-571: Springer.
[24] G. G. Chen and T. Åstebro, "Bound and collapse Bayesian reject inference for credit scoring," Journal of the Operational Research Society, vol. 63, no. 10, pp. 1374-1387, 2012.
[25] Y. Xia, C. Liu, and N. Liu, "Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending," Electronic Commerce Research and Applications, vol. 24, pp. 30-49, 2017.
[26] A. Kim and S.-B. Cho, "An ensemble semi-supervised learning method for predicting defaults in social lending," Engineering Applications of Artificial Intelligence, vol. 81, pp. 193-199, 2019.
[27] Y. Xia, X. Yang, and Y. Zhang, "A rejection inference technique based on contrastive pessimistic likelihood estimation for P2P lending," Electronic Commerce Research and Applications, vol. 30, pp. 111-124, 2018.
[28] D. M. Hawkins, Identification of outliers. Springer, 1980.
[29] V. Hodge and J. Austin, "A survey of outlier detection methodologies," Artificial Intelligence Review, vol. 22, no. 2, pp. 85-126, 2004.
[30] A. Srivastava, A. Kundu, S. Sural, and A. Majumdar, "Credit card fraud detection using hidden Markov model," IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 1, pp. 37-48, 2008.
[31] S. Panigrahi, A. Kundu, S. Sural, and A. K. Majumdar, "Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning," Information Fusion, vol. 10, no. 4, pp. 354-363, 2009.
[32] Z. Zhang, G. Gao, and Y. Shi, "Credit risk evaluation using multi-criteria optimization classifier with kernel, fuzzification and penalty factors," European Journal of Operational Research, vol. 237, no. 1, pp. 335-348, 2014.
[33] H. Liu, S. Shah, and W. Jiang, "On-line outlier detection and data cleaning," Computers & Chemical Engineering, vol. 28, no. 9, pp. 1635-1647, 2004.
[34] S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, "Data preprocessing for supervised leaning," International Journal of Computer Science, vol. 1, no. 2, pp. 111-117, 2006.
[35] R. Setiono, A. Azcarraga, and Y. Hayashi, "Using sample selection to improve accuracy and simplicity of rules extracted from neural networks for credit scoring applications," International Journal of Computational Intelligence and Applications, vol. 14, no. 04, p. 1550021, 2015.
[36] Y. Xia, C. Liu, Y. Li, and N. Liu, "A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring," Expert Systems with Applications, vol. 78, pp. 225-241, 2017.
[37] M. Wang, X. Zheng, M. Zhu, and Z. Hu, "P2P lending platforms bankruptcy prediction using fuzzy SVM with region information," in 2016 IEEE 13th International Conference on e-Business Engineering (ICEBE), 2016, pp. 115-122: IEEE.
[38] A. Byanjankar, M. Heikkilä, and J. Mezei, "Predicting credit risk in peer-to-peer lending: A neural network approach," in Computational Intelligence, 2015 IEEE Symposium Series on, 2015, pp. 719-725: IEEE.
[39] F. Tan, X. Hou, J. Zhang, Z. Wei, and Z. Yan, "A deep learning approach to competing risks representation in peer-to-peer lending," IEEE Transactions on Neural Networks and Learning Systems, 2018.
[40] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, and J. Vanthienen, "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, vol. 54, no. 6, pp. 627-635, 2003.
[41] X. Ye, L.-a. Dong, and D. Ma, "Loan Evaluation in P2P Lending based on Random Forest Optimized by Genetic Algorithm with Profit Score," Electronic Commerce Research and Applications, 2018.
[42] W. Li, S. Ding, Y. Chen, and S. Yang, "Heterogeneous ensemble for default prediction of peer-to-peer lending in China," IEEE Access, vol. 6, pp. 54396-54406, 2018.
[43] X. Ma, J. Sha, D. Wang, Y. Yu, Q. Yang, and X. Niu, "Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning," Electronic Commerce Research and Applications, vol. 31, pp. 24-39, 2018.
[44] Y. Xia, C. Liu, B. Da, and F. Xie, "A novel heterogeneous ensemble credit scoring model based on bstacking approach," Expert Systems with Applications, vol. 93, pp. 182-199, 2018.
[45] J. H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics, vol. 29, pp. 1189-1232, 2000.
[46] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.
[47] D. Wang, Y. Zhang, and Y. Zhao, "LightGBM: an effective miRNA classification method in breast cancer patients," in Proceedings of the 2017 International Conference on Computational Biology and Bioinformatics, 2017, pp. 7-11: ACM.
[48] H. Shi, "Best-first decision tree learning," The University of Waikato, 2007.
[49] R. Anderson, The credit scoring toolkit: theory and practice for retail credit risk management and decision automation. Oxford University Press, 2007.
[50] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[51] A. Bequé and S. Lessmann, "Extreme Learning Machines for Credit Scoring: An Empirical Evaluation," Expert Systems with Applications, vol. 86, pp. 42-53, 2017.
[52] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, "Algorithms for hyper-parameter optimization," in Advances in Neural Information Processing Systems, 2011, pp. 2546-2554.