
Credit Scoring Model Based on Kernel Density Estimation and Support Vector Machine for Group Feature Selection
Xingzhi Zhang
College of Computer and Information Science
Southwest University
Chongqing, China
zxz1508zhang@163.com
Zhurong Zhou
College of Computer and Information Science
Southwest University
Chongqing, China
zhouzr@swu.edu.cn

Abstract—A credit scoring model (CSM) is a tool typically used in the decision-making process of accepting or rejecting a loan. The selection of an appropriate feature subset is crucial for the credit scoring model. In this paper, we propose a novel framework to improve the performance of this task. First, kernel density estimation (KDE) is used to construct feature groups in order to combine similar features and reduce wasteful computational workload. Second, the correlation among features is not only simple similarity but also includes other meaningful relations, such as part-of and has-a. Therefore, we calculate a group score for each feature group and then obtain the corresponding radar map from the group scores. The purpose is to improve the quality of the final selected feature subset and to obtain the specific semantics of each feature group. Finally, each feature group is treated as a separate entity for feature selection in order to obtain the optimal feature subset, whose features are then treated as a one-dimensional vector. The support vector machine (SVM) algorithm is used for training and prediction, and the corresponding calculations are performed to obtain a total credit score. Extensive experiments on the UCI benchmark database show the advantages and effectiveness of our proposed algorithm.

Keywords—Credit scoring, Feature group, Feature similarity, Feature group semantics, Kernel density estimation, Support Vector Machine

I. INTRODUCTION
Payments using credit cards are becoming more prevalent throughout the world, and credit scoring institutions often use intuitive experience to assess an applicant's credit, resulting in many misjudgments and, eventually, in huge losses to credit institutions each year. Therefore, the credit scoring model (CSM) has been widely used by credit institutions to determine whether credit applicants belong to the good or the bad applicant group and to give an estimate of the probability of default. A credit scoring model can reduce the cost of credit analysis and improve the credit decision-making ability of institutions in order to reduce the losses of credit institutions.

Construction of a credit scoring model requires data mining techniques. General credit score data is mainly composed of variables such as the credit applicant's payment history, credit purpose, number of previous loans, personal circumstances, etc. Therefore, credit scoring models use data mining techniques to help identify important variables related to credit risk. However, there are many redundant and irrelevant features in credit data. In order to improve the prediction accuracy of the scoring model, feature selection methods and feature grouping have been widely used. Feature selection in classification tasks is defined as the process that seeks the minimal set of relevant features such that the classification error is optimized [1]. It has been demonstrated that it can solve the delicate problem of feature redundancy by minimizing redundancy.

Previous studies have focused on the use of feature selection methods to improve the model's prediction accuracy, while ignoring the influence of the correlation between features on the accuracy of model prediction. Although feature selection methods and feature groupings can find the smallest and best feature subset in the original feature data, the final prediction performance of many models is not very stable. Therefore, for very similar credit data, many different subsets of features may result in similar or identical predictive accuracy; this reveals the instability that exists between different predictive feature subsets and makes it harder for credit experts to study a single subset of predictive features. At the same time, this issue is particularly important for finding knowledge in credit data, because in credit evaluation applications the main purpose of credit personnel is to find the credit features that act as relevant markers in a given credit industry, not just to predict the credit applicant's default rate.



The causes of instability between features are mainly the following two. One is ignoring the interaction between the feature selection method and the classifier, which makes the model prone to over-fitting. The other is that many feature grouping and feature selection methods ignore the correlation among features, that is, they treat all features as a one-dimensional vector for training and prediction.

Based on the above description, the research motivation of this paper is to change the selection strategy and treat the groups as new features. We propose a credit scoring model based on kernel density estimation and SVM, namely KDSVM. This means that the feature selection algorithm runs at two levels. At the lower level, we remove irrelevant data with little relevance to the category labels, group the features, and search for the best combination of feature subsets from the different feature groups. At the higher level, we treat each group as a new feature, which is used for training and prediction. With this selection of the best feature subset, the selected subset of features has higher stability. At the same time, this paper chose kernel density estimation (KDE) and the support vector machine (SVM) from machine learning for two reasons: 1) kernel density estimation has the advantage of being easy to implement; 2) the support vector machine method has strong robustness, a solid theoretical foundation and good sparsity.

This paper is structured as follows. In Section II, we review feature selection and group features in past credit scoring studies. In Section III, we introduce the relevant algorithms involved in this paper. In Section IV, we describe the basic framework of the proposed model, namely the KDSVM framework. In Section V, we introduce the method used to verify the stability of this model. In Section VI, we describe the credit data and the data preprocessing methods used in this paper. In Section VII, the validity of the model is verified on the credit data. Section VIII provides a summary of the study and its outlook for the future.

II. RELATED WORK
In this section, we review the existing feature selection methods and feature grouping studies. In [2] the authors use a chi-squared feature selection method based on different base learners and ensemble methods that combine learning from seven base learners. In [3], in order to obtain better generalization error, reduce estimation variance and improve the stability of feature selection, the authors constructed an efficient sparse modeling and automatic feature grouping strategy, namely the octagonal shrinkage and clustering algorithm for regression (OSCAR). In [4] the authors introduce the idea of Dense Feature Groups (DFG) based on Kernel Density Estimation (KDE): kernel density estimation is used to determine dense feature groups, and each dense group is treated as a single entity for feature selection. In [1] the authors propose high-dimensional feature selection via feature grouping, a Variable Neighborhood Search (VNS) approach, which uses the concept of Markov blankets to group data features. In a study [5] using a feature selection method based on rough sets and scatter search, the computation time on the real data sets was 44%, 23.5%, and 16.7% of the original time, respectively, and the prediction accuracy reached 90.5%, 83.4% and 87.9%, respectively. In a study [6], particle swarm optimization (PSO) was applied to the selection of the optimal linear SVM classifier in the credit risk domain. In [7], the particle swarm optimization algorithm (PSO) is applied to select the optimal linear SVM classifier in the credit risk field; the experimental results show that the method yields a more effective credit scoring model. Literature [8] proposed a kernel elastic network, using the kernel matrix to capture the correlation between features. Literature [9] proposed feature grouping using a weighted L1 norm for high-dimensional data, but this method did not consider the influence of feature stability on model accuracy. The alternating direction method of multipliers (ADMM) [10] has been used to speed up graph-based OSCAR regression [11], but this modified method requires that the feature graph be provided a priori, which is not available for most data sets. In [12] the authors propose a class-conditional regularization of the multinomial logistic model (CCSOGL) to find the feature groups in a particular class. Classification with the sparse group lasso [13] introduces the concept of feature grouping to explain the sparsity in the feature structure. Literature [14] proposed the use of clustered support vector machines (CSVM) for credit scoring to reduce computing cost while ensuring classification performance; experiments showed that CSVM had a better effect compared with traditional non-linear support vector machines. Literature [15] presents a profit-driven approach for classifier construction and simultaneous variable selection based on linear Support Vector Machines. In this method, the researchers added a group penalty function to the SVM formulation and processed the variables belonging to the same group through this penalty function, but model stability was ignored.

III. RELATED ALGORITHMS
A. Kernel density estimation
Kernel density estimation is the most popular non-parametric method for estimating probability density functions [16]. Given a data set D = {x_i}_{i=1}^{n}, x_i ∈ R^m, the kernel density estimate is given by:

\hat{g}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)    (1)

In the above formula, h > 0 is a smoothing parameter called the bandwidth; n is the sample size; K(·) is the kernel function, which has to satisfy ∫K(t)dt = 1, K(−t) = K(t), and K(t) ≥ 0. If K_h(t) = K(t/h)/h, then equation (1) can be rewritten as:

\hat{g}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i)    (2)

Let {f_j}_{j=1,2,...} represent a sequence of successive locations of the kernel K, where

f_{j+1} = \frac{\sum_{i=1}^{n} x_i K\left(\frac{f_j - x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{f_j - x_i}{h}\right)}, \quad j = 1, 2, \dots    (3)

is the weighted mean at f_j computed with kernel K, and f_1 is the center of the initial position of the kernel.
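To make the estimator of Eqs. (1)-(3) concrete, the short sketch below implements the kernel density estimate and the mean-shift style update of Eq. (3) for a one-dimensional data vector. The Gaussian kernel, the bandwidth value and the function names are our own illustrative choices rather than settings taken from the paper.

import numpy as np

def gaussian_kernel(t):
    # K(t) satisfies the constraints in the text: symmetric, non-negative, integrates to 1.
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    # Eq. (2): average of kernels K_h(x - x_i) with K_h(t) = K(t / h) / h.
    return np.mean(gaussian_kernel((x - samples) / h) / h)

def mean_shift(x0, samples, h, tol=1e-6, max_iter=200):
    # Eq. (3): repeatedly move f_j to the kernel-weighted mean until convergence.
    f = x0
    for _ in range(max_iter):
        w = gaussian_kernel((f - samples) / h)
        f_next = np.sum(w * samples) / np.sum(w)
        if abs(f_next - f) < tol:
            break
        f = f_next
    return f  # stationary point, i.e., a density peak

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
    print("density at 0:", kde(0.0, data, h=0.5))
    print("peak found from 4.2:", mean_shift(4.2, data, h=0.5))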
B. Support Vector Machine

Given a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each point x_i ∈ R^m and y_i ∈ {−1, 1}, the purpose of SVM is to find the best separating hyperplane in the sample space based on the training set D. The equation of a hyperplane is:

\omega^T x + b = 0    (4)

For the maximum-margin hyperplane, the SVM is obtained by solving the following optimization problem:

\min \left( \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\eta_i \right)    (5)

subject to:

y_i(\omega \cdot x_i + b) \ge 1 - \eta_i, \quad \eta_i \ge 0, \quad i = 1, 2, \dots, N    (6)

where C is a constant and \eta_i is the smallest non-negative slack value. In order to solve this quadratic optimization problem, the Lagrangian can be used, and the problem is transformed into its dual problem, namely:

\max \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j k(x_i, x_j)    (7)

subject to:

\sum_{i=1}^{N}\alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \dots, N    (8)

where \alpha_i is the Lagrange multiplier for each training example and k(x_i, x_j) is a kernel function.

In this paper, the kernel function we use is the Gaussian kernel function, which is defined as:

k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)    (9)

According to the Karush-Kuhn-Tucker (KKT) complementarity condition, the KKT complementarity conditions for the SVM are as follows:

\alpha_i\left(y_i(\omega \cdot x_i + b) - (1 - \eta_i)\right) = 0    (10)

(C - \alpha_i)\eta_i = 0, \quad i = 1, 2, \dots, N    (11)

where the set of nonzero \alpha_i corresponds to the set of support vectors. So, for any input variable, the output classification prediction of the SVM is obtained by:

f(x) = \mathrm{sign}\left(\sum_{i=1}^{N}\alpha_i y_i k(x_i, x) + b\right)    (12)
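As an illustration of Eqs. (5)-(12), the following sketch trains a soft-margin SVM with the Gaussian (RBF) kernel and evaluates its decision function. It relies on scikit-learn's SVC rather than the exact solver used by the authors, and the data and parameter values are placeholders.

import numpy as np
from sklearn.svm import SVC

# Toy two-class data standing in for credit applicants (+1 good, -1 bad).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([1] * 50 + [-1] * 50)

# C is the penalty constant of Eq. (5); gamma = 1 / (2 * sigma^2) matches Eq. (9).
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X, y)

# clf.dual_coef_ holds alpha_i * y_i for the support vectors (the nonzero alphas),
# so the decision function below is Eq. (12) before taking the sign.
print("number of support vectors:", clf.support_.size)
print("decision values:", clf.decision_function(X[:3]))
print("predicted labels:", clf.predict(X[:3]))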
IV. THE PROPOSED METHODOLOGY
A. Overview
In order to establish a more accurate credit scoring model based on the feature selection and feature grouping work in the above literature, we propose to change the selection strategy and treat the groups as new features in a credit scoring model based on kernel density estimation and support vector machine algorithms. First, we estimate the dense core areas measured by probability density and consider the features within a core area to be highly correlated with each other, which yields the feature groups. Second, the group score of each feature group is calculated, and the corresponding radar map is obtained from the group scores. Finally, each feature group is treated as a separate entity in order to obtain the optimal feature subset. The optimal feature subset is used as a one-dimensional vector to train and predict with the SVM algorithm, and a total credit score is finally obtained.

In the proposed model, our method more effectively handles the influence of the correlation between features on the prediction accuracy of the model. Fig 1 shows the overall representation of changing the selection strategy and treating the group as a new feature. This framework consists of: 1) determining the appropriate feature groups from the given data set; 2) feature group transformation (selection); 3) feature ranking. The small black dots in Fig 1 represent credit data (including all credit features of the customer and the credit data under each credit feature), and the other colored dots represent different credit features. Next, we give a detailed description of the method proposed in this paper.

[Fig 1 diagram: Datasets (features 1...N) → Feature Grouping → Feature Group Transformation and Selection → Feature Ranking]
Fig 1. Group feature selection

B. Feature group based on kernel density estimation
In this section, we describe how to construct feature groups from the credit features in the credit data. We use the concept of kernel density estimation to construct the feature groups for the applicant's credit features; see Algorithm 1 for a detailed description. Since kernel density estimation operates on a set of data vectors, in this paper we transpose the credit data and use the resulting data matrix to represent the original data.

Algorithm 1 Feature group based on kernel density (FGKD)
Input: Data set D = {x_i}_{i=1}^{n}, x_i ∈ R^m; bandwidth h; relevance measure ϑ(·)
Output: Feature groups C_1, C_2, ..., C_n
for i = 1 to n do
    Initialize j = 1, f_{i,j} = x_i
    Repeat
        Compute f_{i,j+1} according to (3)
    Until convergence
    Set stationary point f_{i,c} = f_{i,j+1}
    Merge f_{i,c} with its closest peak if their distance < h
end for
for i = 1 to M do
    Calculate relevance ϑ(C_i) based on the average relevance of the features in C_i
end for

According to the description of Algorithm 1, similar features among the original credit features are divided into corresponding feature groups; some feature groups may contain only one feature. Here, for a credit data set with n features, the value of K in Algorithm 1 ranges from 1 to n. The larger the value of K, namely the greater the bandwidth h, the lower the correlation between the features within the obtained feature groups. Therefore, in order to ensure highly similar features within a feature group, the value of K should be small enough. After obtaining the feature groups according to Algorithm 1, we consider each feature group as a new dimension and analyze the semantics it contains. In the new dimension, we calculate the corresponding score for each feature group, and then use the score of each feature dimension to make a preliminary estimate of the applicant's credit status.
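A minimal sketch of Algorithm 1 follows, assuming a Gaussian kernel and treating each feature (a column of the credit matrix, transposed as the text describes) as a vector that is shifted to its density peak via Eq. (3); features whose peaks lie within the bandwidth h are merged into one group. The function names and the peak-merging details are our own simplifications.

import numpy as np

def shift_to_peak(f, samples, h, tol=1e-6, max_iter=200):
    # Mean-shift update of Eq. (3) applied to a feature vector.
    for _ in range(max_iter):
        w = np.exp(-0.5 * np.sum(((f - samples) / h) ** 2, axis=1))
        f_next = (w[:, None] * samples).sum(axis=0) / w.sum()
        if np.linalg.norm(f_next - f) < tol:
            break
        f = f_next
    return f

def fgkd(X, h):
    # X: n_samples x n_features credit matrix; the rows of X.T are the feature vectors.
    features = X.T
    peaks, groups = [], []
    for x_i in features:
        f_c = shift_to_peak(x_i.copy(), features, h)
        # Merge with the closest existing peak if it lies within the bandwidth h.
        for g, p in enumerate(peaks):
            if np.linalg.norm(f_c - p) < h:
                groups[g].append(x_i)
                break
        else:
            peaks.append(f_c)
            groups.append([x_i])
    return groups  # feature groups C_1, ..., C_K

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 6))
    X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=100)  # a feature similar to feature 0
    print("number of groups:", len(fgkd(X, h=2.0)))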

C. Feature group semantics
In the credit scoring process, the traditional scoring process treats all credit features as a one-dimensional vector for training and prediction, ignoring the specific semantics of each credit feature group. Therefore, before performing feature group transposition (selection) and selecting the optimal subset, we introduce a new concept, feature group semantic analysis: the feature groups obtained above are used as a new dimension in which to discuss the specific semantics represented by each feature group. As shown in Fig 2, first, consider the feature groups divided in Section B as a new feature dimension. Second, for each feature group, perform the calculations within the group to obtain the group score; for example, for feature group f1, the similar features in the group are aggregated and the group score S1 is obtained. Then, based on the obtained group scores S1, S2, ..., we can obtain the corresponding credit radar map in this dimension. From this radar map, we can read the specific semantics represented by each feature group, as shown in Fig 2. In the radar map, the red feature group represents the applicant's identity. For this group, we construct the feature group from credit features that are similar to one another; within the identity feature group, the specific semantics represented are the information corresponding to the applicant's work, gender, and age. By analyzing the semantics of feature groups, we allow credit officers to make a preliminary estimate of the applicant's credit status. Finally, in the new dimension, each feature group is treated as a single entity for feature group transposition (selection). The most representative one or several features are selected from each feature group, and then a total score is obtained by totaling the components we have obtained.

Fig 2. Feature group semantic analysis process
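The paper does not pin down the exact group-score formula, so the sketch below simply averages the standardized values of the features in each group to produce S1, S2, ... for one applicant, which is the kind of per-dimension score a radar map could display. The averaging rule and the example group indices are assumptions.

import numpy as np

def group_scores(x_row, groups):
    # x_row: one applicant's standardized feature vector (z-scores).
    # groups: list of lists of feature indices, e.g. [[0, 3], [1], [2, 4, 5]].
    # Assumed scoring rule: the score of a group is the mean of its member features.
    return np.array([x_row[idx].mean() for idx in groups])

applicant = np.array([0.8, -1.2, 0.1, 0.9, -0.3, 0.4])   # hypothetical z-scored features
groups = [[0, 3], [1], [2, 4, 5]]                          # hypothetical feature groups
print("radar-map axes (S1, S2, S3):", group_scores(applicant, groups))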
D. Feature group transformation
In the previous section, we used the kernel density estimation method to group the original credit features and introduced the concept of feature group semantic analysis. In this section, we treat each feature group as a separate entity for feature group transposition (selection). Here, feature group transposition (selection) means that feature selection is performed within each feature group, and the most representative one or several features are selected from each group in order to improve the prediction accuracy of the credit scoring model. The commonly used methods of feature group transformation are feature value averaging [17], the center-point strategy [18] and the more complex principal component analysis [19]. In this paper, we adopt the simple method of feature value averaging. The specific description is given in Algorithm 2.

Algorithm 2 Feature group transformation
Input: Data set D = {x_i}_{i=1}^{n}, x_i ∈ R^m
Output: Best features A_1, A_2, ..., A_i
for i = 1 to m do
    Train sample D_i ∈ D  # Reusable sampling
    Get the optimal feature groups C_1, C_2, ..., C_n  # Feature grouping based on kernel density estimation
end for
for i = 1, 2, ..., n do  # Feature group transformation
    Calculate the average of feature group C_i
    Get the representative feature A_i from C_i*
end for
Rank A_1, A_2, ..., A_i
Select the best features

According to the above method, we obtain the optimal feature subset by transforming the feature groups, and we use the optimal feature subset for training and prediction with SVM. At the same time, through further calculations, we obtain a total credit score; in future credit assessments, credit assessors can use the total score to perform a final credit assessment of the applicant. When using the SVM algorithm in the training and prediction process, we use the grid search method to find the optimal parameters (c, g) for the parameter optimization problem of the SVM. Grid search is used to optimize the parameters mainly because using the optimal parameters improves the model's prediction accuracy and the overall operating efficiency of the model; moreover, compared with cross-validation alone, grid search parallelizes well and tunes the parameters efficiently.
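To illustrate the feature value averaging step of Algorithm 2, the sketch below collapses each feature group into a single representative column (the group mean) so that the transformed matrix can be fed to the SVM. This is only the averaging variant described in the text, with hypothetical group indices.

import numpy as np

def transform_groups(X, groups):
    # X: n_samples x n_features matrix; groups: list of lists of column indices.
    # Feature value averaging: each group C_i is replaced by one representative
    # column A_i equal to the mean of its member features.
    return np.column_stack([X[:, idx].mean(axis=1) for idx in groups])

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 6))
groups = [[0, 3], [1], [2, 4, 5]]           # hypothetical groups from Algorithm 1
X_new = transform_groups(X, groups)
print(X_new.shape)                          # (8, 3): one representative feature per group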
V. STABILITY MEASURE
In order to verify the stability between features, we mainly measure the similarity between any two feature selection results. Common similarity measures include the Jaccard similarity coefficient, feature overlap probability, the Spearman correlation coefficient, and the Pearson correlation coefficient. The method used in this study is the Jaccard similarity coefficient.

To measure the stability between features, we use the feature selection results generated in the preceding group feature selection process for the similarity measure. Measuring feature selection stability requires a similarity measure for two sets of feature selection results. Let T_1 = {c_i}_{i=1}^{|T_1|} and T_2 = {c_j}_{j=1}^{|T_2|} represent two sets of feature selection results, where each c_i and c_j represents a group of features. In the group feature selection process, a feature group may contain only one feature; in that case, we still treat the feature as an independent feature group. Next, we use the Jaccard similarity coefficient to measure the similarity between the feature sets. The similarity measure between T_1 and T_2 is given by the following formula:

J(T_1, T_2) = \frac{|T_1 \cap T_2|}{|T_1 \cup T_2|}    (13)
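A small sketch of Eq. (13) follows: it computes the Jaccard coefficient between two feature selection results (shown here as sets of selected feature names) and averages it over repeated runs, which is one straightforward way to turn the pairwise measure into a stability score. The averaging over run pairs and the feature names are our assumptions.

from itertools import combinations

def jaccard(t1, t2):
    # Eq. (13): |T1 ∩ T2| / |T1 ∪ T2|
    t1, t2 = set(t1), set(t2)
    return len(t1 & t2) / len(t1 | t2)

# Hypothetical selection results from three repeated runs of the selector.
runs = [
    {"checking_status", "duration", "credit_amount", "credit_history"},
    {"checking_status", "duration", "credit_amount", "savings"},
    {"checking_status", "duration", "credit_history", "age"},
]
pairs = list(combinations(runs, 2))
stability = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
print("pairwise Jaccard stability:", round(stability, 3))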

VI. DATA SETS AND DATA PREPROCESSING
A. Data sets
In this study, we use two credit scoring data sets to validate our proposed credit model. The data sets come from the UCI machine learning repository, namely the German credit data set and the Australian credit data set (the credit data sets adopted in this paper are among the most complete credit data available for the financial industry so far). TABLE I summarizes their main characteristics, and TABLE II details the composition of the German credit data set. For the Australian data set, to ensure the privacy of users, the data has been encrypted and the feature names and values have been changed to meaningless symbols, so we will not describe it in detail.

TABLE I. FEATURES OF THE TWO DATA SETS USED IN THE EVALUATION EXPERIMENT.
Dataset     Features    Instances    Default rate    Classes
German      20          1000         0.30            2
Australia   14          690          0.44            2

TABLE II. DESCRIPTION OF ALL VARIABLES IN THE GERMAN DATA SET.
Variable (Feature)                        Description
Status of existing checking account       Nominal (1: ...<0 DM; 2: 0 DM<=...<200 DM; 3: ...>=200 DM; 4: no checking account)
Duration in month                         Numerical
Credit history                            Nominal (0: no credits taken; 1: timely payment; 2: repay existing credit; 3: delayed repayment; 4: not at this bank)
Purpose                                   Nominal (0: car (new); 1: car (used); 2: furniture/equipment; 3: radio/television; 4: domestic appliances; 5: repairs; 6: education; 7: vacation - does not exist?; 8: retraining; 9: business; 10: others)
Credit amount                             Numerical
Savings account/bonds                     Nominal (1: ...<100 DM; 2: 100 DM<=...<500 DM; 3: 500 DM<=...<1000 DM; 4: ...>=1000 DM; 5: unknown/no savings account)
Present employment since                  Nominal (1: unemployed; 2: ...<1 year; 3: 1<=...<4 years; 4: 4<=...<7 years; 5: ...>=7 years)
Installment rate in percentage of disposable income    Numerical
Personal status and sex                   Nominal (1: male: divorced; 2: female: divorced; 3: male: single; 4: male: married; 5: female: single)
Other debtors/guarantors                  Nominal (1: none; 2: co-applicant; 3: guarantor)
Present residence since                   Numerical
Property                                  Nominal (1: real estate; 2: if not 1, life insurance; 3: if not 1/2, car or other; 4: unknown/no property)
Age in years                              Numerical
Other installment plans                   Nominal (1: bank; 2: stores; 3: none)
Housing                                   Nominal (1: rent; 2: own; 3: for free)
Number of existing credits at this bank   Numerical
Job                                       Nominal (1: unemployed; 2: unskilled-resident; 3: official; 4: management)
Number of people being liable to provide maintenance for    Numerical
Telephone                                 Nominal (1: none; 2: yes)
Foreign worker                            Nominal (1: yes; 2: no)

B. Data Preprocessing
In the original credit score data sets, there are redundant and irrelevant features as well as missing values. In order to ensure efficient computation of the credit scoring model and, ultimately, high prediction accuracy, the original credit data must be preprocessed before use. We therefore replace missing values with the mean or the mode of the feature, depending on its type, i.e., numerical or categorical.

In this paper, we also have to convert the nominal features into numeric data before feeding them into the classifiers. The data preprocessing we use is standardization, i.e., the z-score. The purpose is to convert the data into a Gaussian distribution with a mean value of 0 and a variance of 1. Given a data set D = (x_i, y_i), i = 1, 2, ..., n, where x_i ∈ R^m and y_i ∈ {1, −1} represents the label, 1 indicates that the credit applicant has a good reputation and, on the contrary, −1 indicates that the credit applicant has a high risk of default. Next, we process the data according to the z-score formula:

Z(\text{new\_data}) = \frac{X(\text{original\_data}) - \mu(\text{data})}{\sigma(\text{data})}    (14)

where \mu(\text{data}) is the mean value of each column of data, \sigma(\text{data}) is the standard deviation, and Z(\text{new\_data}) is the data that has been processed by the z-score. According to the above method, we obtain the normalized values of the original data, that is, the new experimental data.
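A brief sketch of the preprocessing just described: missing numerical values are filled with the column mean, categorical ones with the mode, nominal features are encoded as numeric codes, and Eq. (14) is applied column-wise. The pandas/scikit-learn calls are one possible realization, not the authors' exact pipeline.

import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df, numeric_cols, nominal_cols):
    df = df.copy()
    # Fill missing values: mean for numerical features, mode for categorical ones.
    for col in numeric_cols:
        df[col] = df[col].fillna(df[col].mean())
    for col in nominal_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])
        # Convert nominal features to numeric codes before feeding the classifier.
        df[col] = df[col].astype("category").cat.codes
    # Eq. (14): z-score standardization, column-wise mean 0 and variance 1.
    df[df.columns] = StandardScaler().fit_transform(df)
    return df

raw = pd.DataFrame({"duration": [6, 48, None, 24], "purpose": ["car", "tv", "car", None]})
print(preprocess(raw, numeric_cols=["duration"], nominal_cols=["purpose"]))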

VII. EXPERIMENTS AND ANALYSIS
In this section, in order to verify the effectiveness of the proposed credit scoring model and the performance of the proposed method and algorithm, the credit data sets described above are used in experiments that verify the stability and accuracy of the model. The experimental platform for this study is PyCharm, on a PC with a 3.2 GHz Intel(R) Core(TM) i5-3470 and 4 GB RAM, running the Microsoft Windows 10 operating system.

A. Model stability verification
In this section, we mainly verify the stability of the model, that is, the stability of the features after the group features are selected. In order to reflect the validity and reliability of the experimental results in this paper, we compare them with other feature selection algorithms, mainly Tree-based feature selection (TFS) and Recursive feature elimination (RFE). Fig 3 and Fig 4 show the experimental results on the German and Australian credit data, respectively.

The experimental results in Fig 3 show that, for the German credit data set, after the proposed method has changed the feature selection strategy, the stability between feature subsets is clearly better than that of the other algorithms, and as the number of feature subsets increases, the stability of the algorithm tends to become even and stable. Similarly, for the Australian credit set, the experimental results in Fig 4 show that the algorithm in this paper also has better stability than the other algorithms.

[Fig 3: stability versus number of feature subsets on the German data for RFE, TFS and KDSVM (our method).]
Fig 3. German data set.

[Fig 4: stability versus number of feature subsets on the Australian data for RFE, TFS and KDSVM (our method).]
Fig 4. Australian data set.

For the German credit data set, through the analysis of the features in the selected subset, we found that in the evaluation of consumer credit conditions, Status of existing checking account, Duration in month, Credit amount and Credit history are the most important indicators in the whole evaluation of credit status; the proportions of these four features are 0.142, 0.132, 0.162, and 0.140, respectively. However, because the Australian credit data, in order to protect consumer credit security, contains no clear description of the credit features, the analysis in this paper mainly concerns the German credit set, and the Australian data is mainly used to verify the effectiveness and reliability of the proposed method.

B. Model performance validation
In order to reduce the impact of data dependence and improve the reliability of the results, this study uses 10-fold cross validation to create a random partition of the data set: the data set is divided into ten parts, nine of which are taken as training data to train the credit scoring model and adjust its parameters, and the remaining one is used as test data. The advantage of cross-validation is that the credit scoring model is developed using most of the data, and all the data is used to test the resulting model [20].

In order to obtain effective results for the prediction accuracy of the proposed method, we use the following two performance indicators: a) accuracy, and b) area under the curve (AUC). We chose these two indicators because they are very popular in credit scoring and give a comprehensive view of the validity of the assessment model. Accuracy is the proportion of good and bad loans that are correctly classified; it mainly judges the predictive power of the model and, as such, is a criterion that measures the discriminating ability of the model [21]. AUC indicates which model predicts best: the larger the AUC value of a model, the better its prediction performance. Next, we analyze and discuss the experimental results.
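The evaluation protocol just described can be sketched as follows: a 10-fold cross validation that reports accuracy and AUC, with a grid search over the SVM parameters (C, gamma) as mentioned in Section IV-D. The parameter grid, the synthetic data and the use of scikit-learn are illustrative assumptions rather than the authors' exact settings.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.datasets import make_classification

# Placeholder data standing in for the preprocessed German/Australian credit sets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Grid search for the optimal (C, gamma) of the Gaussian-kernel SVM.
grid = GridSearchCV(
    SVC(kernel="rbf", probability=True),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5,
)

# 10-fold cross validation reporting both performance indicators.
scores = cross_validate(grid, X, y, cv=10, scoring=["accuracy", "roc_auc"])
print("accuracy: %.3f" % np.mean(scores["test_accuracy"]))
print("AUC:      %.3f" % np.mean(scores["test_roc_auc"]))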
We compare the results of the model proposed in this paper with the basic classifiers and the hybrid classifiers of the traditional combination methods. The two real credit data sets above are evaluated using the two measurement standards. The results of the basic classifiers and the hybrid classifiers are mainly taken from the literature [22]; in the comparison, we only select the better classifiers in [22] and compare them with the hybrid classifier proposed in this paper, the purpose being to verify that the proposed method is more effective. Fig 5 and Fig 6 show the experimental results on the German and Australian credit data sets, respectively.

From Fig 5, it can be clearly seen that, for the German credit data, the credit scoring accuracy of the hybrid classifiers AVG, MajVot, ConsA and the proposed KDSVM is higher than that of the basic classifiers. The accuracy of the KDSVM proposed in this paper is the best, up to 81%. Although the final accuracy of KDSVM is less than 2% higher than that of ConsA, this still shows that our method is feasible, because in the financial industry an accuracy increase of 2%, or even of 1%, can reduce enormous losses for financial institutions.

Fig 5 also shows that the final AUC of the hybrid classifiers is generally better than that of the basic classifiers. In this figure, the AUC of the KDSVM is the largest, indicating that the hybrid classifier in this paper works better.

[Fig 5: accuracy and AUC on the German credit data for the classifiers RF, NN, SVM, AVG, MajVot, ConsA and KDSVM.]
Fig 5. The results of different performance measurements of the German credit data by classifier.

For the Australian credit data, Fig 6 shows the performance measurement results for each classifier. Analyzing Fig 6, the AUC performance of the basic RF classifier is better than that of the hybrid classifiers, but its prediction accuracy is not as good as that of a hybrid classifier, and the accuracy of the proposed method is better than that of the best overall method, ConsA, by 4%; however, in this experiment, the proposed method was not effective on the AUC measure for the Australian credit data.

[Fig 6: accuracy and AUC on the Australian credit data for the classifiers RF, NN, SVM, AVG, MajVot, ConsA and KDSVM.]
Fig 6. The results of different performance measurements of the Australian credit data by classifier.

By analyzing and summarizing the experimental results, we can draw the following conclusions. In the credit scoring model, the performance of the hybrid classifiers is generally better than that of the basic classifiers. Moreover, in the performance comparison among the hybrid classifiers, the classifier proposed in this paper performs the best. Therefore, the hybrid credit scoring model proposed in this paper can be used to replace other credit scoring models.

VIII. CONCLUSION
In financial institutions, the credit scoring system can be used as a risk management tool to prevent losses from bad debts. Therefore, the credit scoring system is very important for ensuring the stability of the financial order. With the rapid development of artificial intelligence and the Internet, machine learning methods have been widely used to establish credit scoring models that assess consumer credit risk ratings.

Because credit data contains a large amount of noise, researchers have continuously proposed hybrid credit scoring models on top of the existing ones and have continuously researched and improved the treatment of consumers' credit features, from different feature selection algorithms to feature grouping methods, to improve model performance; nevertheless, the existing models still have some deficiencies. Therefore, in order to build a more effective credit scoring model, in this paper we propose to change the selection strategy and treat the groups as new features in a credit scoring model based on kernel density estimation and support vector machine algorithms. In the establishment of the model, we fully consider the connections between similar features, so we introduce the concept of feature dimension scoring by changing the feature selection strategy. Experiments show that the proposed method can largely eliminate the influence of redundant features on the model's prediction accuracy, and the finally selected feature subset has higher stability.

For future work, we will study the following three aspects:
1) further study the performance of KDSVM and find out whether it can be improved significantly;
2) use different data processing methods, such as data filtering, according to the credit data set;
3) apply the method of this paper to other fields to study whether its performance remains effective.

REFERENCES
[1] M. García-Torres, F. Gómez-Vela, B. Melián-Batista, J. M. Moreno-Vega. High-dimensional feature selection via feature grouping: A Variable Neighborhood Search approach. Information Sciences 326 (2016) 102-118.
[2] S. Dahiya, S. S. Handa and N. P. Singh. Credit scoring ensemble of various classifiers on reduced feature set. Industrija, vol. 43, no. 4, 2015.
[3] L. W. Zhong and J. T. Kwok. Efficient sparse modeling with automatic feature grouping. IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, September 2012.
[4] L. Yu, C. Ding, S. Loscalzo. Stable feature selection via dense feature groups, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 803-811.
[5] J. Wang, A.-R. Hedar, S. Wang, J. Ma. Rough set and scatter search metaheuristic based feature selection for credit scoring. Expert Systems with Applications 39 (2012) 6123-6128.
[6] P. Danenas, G. Garsva. Selection of support vector machines based classifiers for credit risk domain. Expert Systems with Applications 42 (2015) 3194-3204.
[7] S. Loscalzo, L. Yu, C. Ding. Consensus group stable feature selection, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 567-576.
[8] B. Vinzamuri and C. K. Reddy. Cox regression with correlation based regularization for electronic health records. Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 757-767, 2013.
[9] B. Vinzamuri, K. K. Padthe, and C. K. Reddy. Feature grouping using weighted L1 norm for high-dimensional data. 2016 IEEE 16th International Conference on Data Mining, pages 1233-1238, 2016.
[10] S. Yang, L. Yuan, Y. C. Lai, X. Shen, P. Wonka, and J. Ye. Feature grouping and selection over an undirected graph. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 922-930. ACM, 2012.

[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), pages 1-122, 2011.
[12] X. Li, D. Zhu, M. Dong. Multinomial classification with class-conditional overlapping sparse feature groups. Pattern Recognition Letters 101 (2018) 37-43.
[13] N. Rao, R. Nowak, C. Cox, and T. Rogers. Classification with the sparse group lasso. IEEE Transactions on Signal Processing, vol. 64, no. 2, January 15, 2016.
[14] T. Harris. Credit scoring using the clustered support vector machine. Expert Systems with Applications 42 (2015) 741-750.
[15] S. Maldonado, C. Bravo, J. López, J. Pérez. Integrated framework for profit-based feature selection and SVM classification in credit scoring. Decision Support Systems 104 (2017) 113-121.
[16] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman and Hall, 1995.
[17] Z. Guo, T. Zhang, X. Li, et al. Towards precise classification of cancers based on robust gene functional expression profiles. BMC Bioinformatics, 2005, 6(1):58.
[18] H. Sha. Group feature selection based on feature clustering ensemble. Microcomputer & Its Applications, 2014, 33(11).
[19] F. Rapaport, A. Zinovyev, M. Dutreix, et al. Classification of microarray data using gene networks. BMC Bioinformatics, 2007, 8(1):35.
[20] D. West. Neural network credit scoring models. Computers & Operations Research 27 (2000) 1131-1152.
[21] S. Lessmann, B. Baesens, H. Seow, and L. C. Thomas (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247, 124-136.
[22] M. Ala'raj, M. F. Abbod. A new hybrid ensemble credit scoring model based on classifiers consensus system approach. Expert Systems With Applications 64 (2016) 36-55.

