
Knowledge-Based Systems 39 (2013) 45–56


Differential evolution trained kernel principal component WNN and kernel binary quantile regression: Application to banking

Kalam Narendar Reddy, Vadlamani Ravi *

Institute for Development and Research in Banking Technology (IDRBT), Castle Hills Road #1, Masab Tank, Hyderabad 500 057, AP, India

* Corresponding author. Tel.: +91 40 23534981x2042; fax: +91 40 23535157. E-mail addresses: narendarcse07@gmail.com (K.N. Reddy), rav_padma@yahoo.com (V. Ravi).

http://dx.doi.org/10.1016/j.knosys.2012.10.003

Article history: Received 20 March 2012; Received in revised form 1 October 2012; Accepted 8 October 2012; Available online 2 November 2012.

Keywords: Kernel methods; Kernel binary quantile regression; Kernel principal component analysis; Wavelet neural network; Differential evolution

Abstract

In this paper, two novel kernel based soft computing techniques, viz. the Differential Evolution trained Kernel Principal Component Wavelet Neural Network (DE-KPCWNN) and the DE trained Kernel Binary Quantile Regression (DE-KBQR), are proposed for classification. While the former can solve multi-class classification problems, the latter can solve binary classification problems only. In the proposed DE-KPCWNN technique, Kernel Principal Component Analysis (KPCA) is applied to the input data to obtain kernel principal components, on which a WNN is employed; DE is then used to train the resulting KPCWNN. In DE-KBQR, the kernel technique is applied on the input data to obtain the kernel matrix, on which BQR is employed; DE is then used to train the resulting KBQR. Several experiments are conducted on four bankruptcy datasets, three benchmark datasets and two credit scoring datasets to assess the effectiveness of the proposed classification techniques. The results indicate that the proposed soft computing hybrids for classification are more efficient than the existing classification techniques. Of the two, DE-KBQR performed relatively better than DE-KPCWNN on a majority of binary classification problems, which is the significant outcome of this study.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Many data mining algorithms are intrinsically so powerful that they can efficiently mine datasets which are nonlinearly separable. However, another approach to mining nonlinearly separable datasets is to project the original dataset, with fewer dimensions, into a higher dimensional feature space. The Support Vector Machine (SVM) follows this approach to prodigious effect. The mapping transforms the nonlinearly separable data into linearly separable data that can be easily mined thereafter. The kernel trick is employed to avoid explicit mapping from the original input space to the higher dimensional feature space, as the process of explicit mapping is cumbersome. The kernel trick is a way of mapping observations from an original input space S into an inner product space V of the feature space, without ever having to compute the higher dimensional feature space mapping explicitly, so that the observations acquire linear structure in V. The use of Mercer's theorem for interpreting kernels as inner products in a feature space was introduced into machine learning by Aizerman et al. [3]. Boser et al. [12] suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes, which is used to solve the soft margin problem of SVM. Scholkopf et al. [67] used kernel functions to perform principal component analysis: the kernel trick is used in Kernel Principal Component Analysis (KPCA) to find the principal components of the higher dimensional feature space. Principal Component Analysis (PCA) is a multivariate statistical technique used for dimensionality reduction, but it fails to work well if the variables of a dataset are related in a nonlinear manner. In that case, one may think of nonlinear statistical techniques like KPCA, a nonlinear version of PCA in which the kernel trick takes care of the nonlinearity in the datasets. Mika et al. [49] used kernels to perform Fisher discriminant analysis. Rosipal et al. [65] employed expectation maximization to perform kernel principal component regression. Li et al. [42] proposed Kernel Quantile Regression (KQR), in which Quantile Regression (QR) is carried out in reproducing kernel Hilbert spaces. Ravisankar and Ravi [64] proposed a new architecture called the Kernel Principal Component Neural Network (KPCNN), trained by the threshold accepting algorithm.

This paper proposes two classification techniques, viz. (i) the DE trained Kernel Principal Component Wavelet Neural Network (DE-KPCWNN) and (ii) the DE trained Kernel Binary Quantile Regression (DE-KBQR). The DE-KPCWNN employs KPCA and the Differential Evolution trained Wavelet Neural Network (DEWNN) in tandem, where KPCA is used to find the kernel principal components, which are fed as input to a Wavelet Neural Network (WNN) trained using Differential Evolution (DE). On the other hand, DE-KBQR employs differential evolution to train KBQR, where the kernel trick is applied on the original dataset and the data obtained thereafter is
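To make the kernel trick concrete, the following minimal NumPy sketch (an editorial illustration; the paper's own experiments were implemented in Java, see Section 6) verifies that a degree-2 polynomial kernel computes the inner product of an explicit feature map without ever constructing that map:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D input x."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

def poly_kernel(x, z):
    """k(x, z) = (x . z + 1)^2, evaluated without mapping to feature space."""
    return (np.dot(x, z) + 1.0) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
# Both sides equal 25: the kernel is the inner product in the feature space V.
assert np.isclose(np.dot(phi(x), phi(z)), poly_kernel(x, z))
```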

fed as input to BQR. The effectiveness of the proposed techniques is tested on several benchmark datasets and bank datasets.

The rest of the paper is organized as follows. Section 2 presents a review of techniques used for bankruptcy prediction in banks and for credit scoring. Section 3 presents a brief explanation of the techniques employed in the paper. The architectures of the proposed classification techniques are presented in Section 4. The experimental setup is explained in Section 5. Results and discussions are presented in Section 6, followed by conclusions in Section 7.

2. Literature review

The prediction of bankruptcy for financial firms, especially banks, has been an extensively researched area since the late 1960s. Bankruptcy prediction research has attracted both statisticians and computer scientists, with the result that a number of statistical techniques and more sophisticated nonparametric methods like neural networks have been applied to solve this problem. Beaver [8] pioneered bankruptcy prediction research in 1966 with univariate (single factor/ratio) analysis. Altman [4] was the first to propose a multivariate method to predict the bankruptcy of a firm using financial ratios. Everyone within a firm, from top management to auditors, is very much interested in bankruptcy prediction because it reflects the financial health of the organization. Internationally, regulators conduct on-site examinations on banks' premises very frequently. Afterwards, regulators indicate the safety and soundness of the institution by using a six-part rating system called the CAMELS rating, which evaluates banks according to their basic functional areas: capital adequacy, asset quality, management expertise, earnings strength, liquidity, and sensitivity to market risk. While CAMELS ratings clearly provide regulators with important information, Cole and Gunther [24] reported that these CAMELS ratings decay rapidly. Hence, there arose a need for developing robust, powerful and accurate offline prediction systems based on non-statistical approaches.

Many statistical techniques such as regression analysis and logistic regression have been used to solve the problem of bankruptcy prediction. These techniques make use of the company's financial data to predict its financial state. The bankruptcy prediction problem can also be solved using various other types of classifiers, such as case based reasoning [36], rough sets [46], support vector machines [47], case based reasoning combined with neural networks and discriminant analysis [36] and data envelopment analysis [23], to mention a few. Then, Ravi Kumar and Ravi [61] proposed a Fuzzy Rule Based Classifier (FRBC) for bankruptcy prediction. They reported that the fuzzy rule based classifier outperformed the well-known technique, BPNN, in the case of the US Banks data.

While the above mentioned works refer to the application of stand-alone statistical or intelligent techniques, the literature abounds with papers pertaining to the application of hybrid intelligent techniques (soft computing) to the bankruptcy prediction problem. The following paragraphs present some of such works; the hybrid techniques proposed for bankruptcy prediction so far are listed in Table 11. Cheng et al. [21] combined a Radial Basis Function (RBF) network with logit analysis learning to predict financial distress. They compared their technique with logit analysis and a back propagation neural network and found that their method outperformed both. Ravikumar and Ravi [62] proposed an ensemble classifier using a simple majority voting scheme for the bankruptcy prediction problem based on intelligent techniques such as Adaptive Neuro Fuzzy Inference Systems (ANFISs), SVM, RBF, Semi-Online RBF1, Semi-Online RBF2, Orthogonal RBF and the Back Propagation Neural Network (BPNN). They reported that ANFIS, SORBF2 and BPNN are the most prominent, as they appeared in the best ensemble classifier combinations. Ravi et al. [60] proposed a Semi-Online training algorithm for the Radial Basis Function Neural Network (SORBFN) and applied it to bankruptcy prediction in banks. SORBFN without linear terms performed better than techniques such as ANFIS, SVM, BPNN, RBF and Orthogonal RBF. In another work, Ravikumar and Ravi [63] conducted a comprehensive review of all the works reported using statistical and intelligent techniques to solve the problem of bankruptcy prediction in banks and firms during 1968–2005; it compares the techniques in terms of prediction accuracy, data sources and the time line of each study, wherever available. Then, Pramodh and Ravi [55] employed a modified great deluge algorithm to train an auto associative neural network and applied it to bankruptcy prediction. Further, Ravi et al. [58] developed a novel soft computing system for bank performance prediction based on BPNN, RBF, Classification and Regression Tree (CART), Probabilistic Neural Network (PNN), FRBC and PCA based hybrid techniques.

Later, Ravi and Pramodh [59] proposed a threshold accepting based training algorithm for a novel Principal Component Neural Network (PCNN), without a formal hidden layer. They employed PCNN for bankruptcy prediction problems and reported that PCNN outperformed BPNN, the Threshold Accepting trained Neural Network (TANN), PCA-BPNN and PCA-TANN in terms of the area under the receiver operating characteristic curve (AUC) criterion. Later, to solve bankruptcy prediction problems, Ravisankar and Ravi [64] employed KPCNN trained by a threshold accepting based training algorithm. KPCNN is a nonlinear version of the PCNN; it outperformed PCNN with and without feature selection. Later, to solve bankruptcy prediction problems, Gestel et al. [32] employed nonlinear kernel based classifiers like kernel Quadratic Discriminant Analysis (QDA). Later, Li et al. [40] employed the Random Subspace Binary Logit (RSBL) model, taking the random subspace approach and using the classical logit model to solve the bankruptcy prediction problem. Later, Chaudhuri and De [16] employed fuzzy SVM to solve bankruptcy prediction problems. Chauhan et al. [17] proposed the Differential Evolution trained Wavelet Neural Network (DEWNN) for bankruptcy prediction in banks; the results indicate the superior performance of DEWNN as compared to the Threshold Accepting trained Wavelet Neural Network (TAWNN) and the original WNN. Most recently, to solve bankruptcy prediction problems, Vasu and Ravi [76] proposed a new hybrid Principal Component Analysis + Threshold Accepting trained Wavelet Neural Network (PCA + TAWNN); it outperformed the other techniques KPCNN, DEWNN and TAWNN in terms of accuracy. Chen et al. [20] applied a GA based cost-sensitive approach to predict bankruptcy. Most recently, Olson et al. [52] compared the performance of various data mining methods to predict bankruptcy.

On the other hand, prediction of consumer creditworthiness is a very important issue in the credit industry. With the rapid growth in this field, credit scoring models have been widely used for the credit approval process. Credit scoring models are developed to distinguish which customers belong to the good class (creditworthy) or the bad class (non-creditworthy) with the help of their related attributes, such as income, marital status and age, or based on past records. Many credit scoring models have been developed to achieve improved accuracy during the past few years. Huang et al. [33] employed SVM based hybrids to build a credit scoring model. Later, Yu et al. [82] used a multistage neural network ensemble learning model to evaluate credit risk. Later, Zhou et al. [86] employed ensemble models based on Least Squares Support Vector Machines (LSSVMs) for building a credit scoring model. Later, Chen and Li [18] employed an SVM classifier combined with conventional statistical LDA, Decision Tree (DT) and rough sets; they used the F-score as the feature selection model for building credit scoring models. Recently, Wang et al. [78] employed two ensembles based on DT, called Random Subspace (RS)-Bagging DT and Bagging-RS DT, to build credit scoring models.

3. Overview of used techniques

3.1. Wavelet neural networks

The word wavelet is due to Grossmann and Morlet [31]. Wavelets are a class of functions used to localize a given function in both space and scaling (http://mathworld.wolfram.com/wavelet.html). They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. Based on locally supported basis functions, such as the Radial Basis Function Network (RBFN), a class of neural networks called WNN, which originates from wavelet decomposition in signal processing, has become more popular recently [84,85]. A family of wavelets can be constructed from a function $\psi(x)$, sometimes known as a "mother wavelet", which is confined to a finite interval. "Daughter wavelets" $\psi_{a,b}(x)$ are then formed by using translation ($b$) and dilation ($a$) parameters. An individual wavelet is defined by:

$\psi_{a,b}(x) = |a|^{-1/2}\,\psi\!\left(\frac{x-b}{a}\right)$    (1)

Based on wavelet theory, the WNN was proposed as a universal tool for functional approximation, which shows surprising effectiveness in solving the conventional problem of poor convergence or even divergence encountered in other kinds of neural networks. Compared to other networks it converges very quickly [83]. WNN uses a gradient descent technique for training. The popularity of WNN can be seen from its applications [6,26,27,53,57,77].

The traditional gradient descent method suffers from well known drawbacks such as entrapment in local minima, long convergence times and the need for differentiability of the objective function. Hence, to avoid these problems, Pan et al. [53] recently used a genetic algorithm to optimize the WNN. Then, Vinay Kumar et al. [77] proposed the threshold accepting trained WNN (TAWNN) for estimating software development cost. They compared the effectiveness of TAWNN with that of WNN, the Multi Layer Perceptron (MLP), RBFN, Multiple Linear Regression (MLR), the Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS) and SVM in terms of the Mean Magnitude of Relative Error (MMRE) obtained on the Canadian Financial (CF) dataset and the IBM Data Processing Services (IBMDPS) dataset. They found that TAWNN outperformed all other techniques except WNN.
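As an illustration of Eq. (1) (an editorial sketch, not code from the paper), the following builds daughter wavelets from the Morlet and Gaussian mother wavelets that reappear as WNN activations in Section 4:

```python
import numpy as np

def gaussian_wavelet(t):
    """Gaussian mother wavelet used later in this paper: f(t) = exp(-t^2)."""
    return np.exp(-t ** 2)

def morlet_wavelet(t):
    """Morlet mother wavelet: f(t) = cos(1.75 t) exp(-t^2 / 2)."""
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2.0)

def daughter(psi, a, b):
    """Daughter wavelet from Eq. (1): psi_{a,b}(x) = |a|^{-1/2} psi((x - b) / a)."""
    return lambda x: np.abs(a) ** -0.5 * psi((x - b) / a)

x = np.linspace(-5, 5, 11)
w = daughter(morlet_wavelet, a=2.0, b=1.0)(x)  # dilated by 2, translated by 1
```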
3.2. Binary quantile regression

QR, introduced by Koenker and Bassett [37], offers a more complete picture of how covariates affect the response variable. Kordas [38] introduced Binary Quantile Regression (BQR) for a binary response variable, which is an extension of the maximum score estimation method introduced by Manski [44,45]. The literature abounds with applications of QR in both economics and finance: for example, Bassett and Chen [7] on return based attribution; Buchinsky [13,14] on wage effects; Conley and Galenson [25] and Gosling et al. [30] on earnings inequality and mobility; and Taylor [72] and Chernozhukov and Umantsev [22] on value at risk. Kordas [38] extends the maximum score estimation method introduced by Manski [44,45] to establish the BQR model for binary data. Benoit and Van den Poel [9] proposed a Bayesian approach for fitting BQR based on the asymmetric Laplace distribution. Atella et al. [5] used QR for examining the relationship between obesity and wages in a cross-national perspective. Fattouh et al. [29] used QR to investigate the evolution and determinants of Korean firms' capital structure.

The Ordinary Least Squares (OLS) technique estimates the relation between the mean of the response variable and the independent variables. It thus focuses on modeling the conditional mean of the response variable without accounting for the full conditional distributional properties of the response variable. BQR, however, models the relation between a set of predictor variables and specific percentiles (or quantiles) of the response variable. Let the equation for the output variable $Y$ be given by:

$Y = X\beta_\theta + \varepsilon$

Here $\theta$ is the quantile value, which takes any value between 0 and 1, $\beta_\theta$ is the vector of regression coefficients associated with the quantile $\theta$, and $\varepsilon$ is the error term. The model for quantile regression is given by:

$\mathrm{Quant}_\theta(y_i \mid x_i) \equiv \inf\{y : F_i(y \mid x) \ge \theta\} = x_i'\beta_\theta, \qquad \mathrm{Quant}_\theta(u_{\theta i} \mid x_i) = 0$

Here $\mathrm{Quant}_\theta(y_i \mid x_i)$ denotes the $\theta$th conditional quantile of $y_i$ on the regressor vector $x_i$; $\beta_\theta$ is the unknown vector of parameters to be estimated for different values of $\theta$ in (0, 1); $u_{\theta i}$ is the error term, which follows a continuously differentiable cumulative density function $F_{u_\theta}(\cdot \mid x)$ and a density function $f_{u_\theta}(\cdot \mid x)$. The value $F_i(\cdot \mid x)$ denotes the conditional distribution of $y$ given $x$. Varying the value of $\theta$ from 0 to 1 reveals the entire distribution of $y$ conditional on $x$. The estimator for the parameter vector $\beta$ in BQR is given by:

$\operatorname*{argmin}_{\beta_\theta} \; \frac{1}{n}\sum_{i=1}^{n} \rho_\theta\!\left(y_i - I\{x_i'\beta_\theta \ge 0\}\right)$

Here $I\{\cdot\}$ is the indicator function and $\rho_\theta(v) = [\theta - I\{v < 0\}]\cdot v$.

Traditionally, quantile regression is formulated as a linear programming problem [37]. However, in Li and Miu [41], simulated annealing (SA) was applied to fit quantile regression. In this paper, we apply differential evolution trained quantile regression on the kernel matrix instead of the original data matrix; the resulting method is called DE-KBQR. Since differential evolution performs a population based search, it promises to yield a near-global optimal solution compared to SA. One related work in this direction is that of Moriguchi et al. [50], where kernel quantile regression is developed for anomaly detection from time series data.
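The check function and the estimator above translate directly into code. A minimal sketch (ours, for illustration only; X, y and beta are hypothetical stand-ins for a design matrix, binary labels in {0, 1} and a coefficient vector):

```python
import numpy as np

def rho(theta, v):
    """Check function rho_theta(v) = [theta - I{v < 0}] * v."""
    return (theta - (v < 0).astype(float)) * v

def bqr_loss(beta, X, y, theta):
    """Empirical BQR objective: (1/n) * sum_i rho_theta(y_i - I{x_i' beta >= 0})."""
    pred = (X @ beta >= 0).astype(float)  # maximum-score-style binary prediction
    return rho(theta, y - pred).mean()
```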
3.3. Differential evolution

Differential Evolution (DE) [69] is a novel approach in evolutionary algorithms. The DE algorithm consists mainly of four steps: initialization, mutation, recombination and selection. DE differs from other population-based techniques in that it employs differential mutation. In a population of solutions within an n-dimensional search space, a fixed number of vectors are randomly initialized, then evolved over time to explore the search space and to locate the minima of the objective function; the objective here is to minimize the error value. Inside a generation, new vectors are generated by the combination of vectors randomly chosen from the current population (mutation). The vectors so generated are then mixed with a predetermined target vector. This operation, called recombination, produces the trial vector. Finally, the trial vector is accepted for the next generation if and only if it yields a reduction in the value of the objective function. This last operation is referred to as selection. These four steps are repeated until the specified error criterion is met or the maximum number of iterations is reached, whichever happens first. Among many variants, Bhat et al. [11] developed an improved version of DE by hybridizing it with a simplex-like heuristic and applied it to solve parameter estimation problems in biofilter modeling.

4. Proposed classification techniques

4.1. Architecture of DE-KPCWNN

Ravisankar and Ravi [64] proposed the KPCNN classifier trained by threshold accepting with different kernels such as polynomial,

sigmoid and Gaussian. They successfully applied this architecture to predict bankruptcy in banks. Since it forms the motivation for the current architecture, it is described in more detail. KPCNN is a nonlinear version of the PCNN. The architecture mainly involves three layers, namely the input layer, the principal component layer and the output layer. The input layer consists of the total number of input variables of the given dataset. The K-matrix is computed from the input data matrix using different kernels, such as polynomial, sigmoid and Gaussian, which transform the input data into a matrix with higher dimensions. Then PCA is applied on this kernel matrix in the principal component layer. The combination of the K-matrix and the PC layer can be considered as nonlinear PCA on the original data. Normal PCA is unable to extract the principal components from the data if its variables are nonlinearly related; in KPCA, kernels take care of the nonlinearity in the datasets. Thus, the first two layers together perform KPCA. Finally, the last layer is the output layer, which represents the class label of each sample.

In order to amplify the advantages of both KPCA and DEWNN, in this paper we propose a hybrid of KPCA and DEWNN, in that order, resulting in DE-KPCWNN.

In the architecture of DE-KPCWNN (see Fig. 1), there are four layers: input layer, principal component layer, hidden layer and output layer. The principal component layer plays the role of the input layer in a normal WNN. The number of nodes in the input layer is equal to the total number of input variables of the given dataset. The K-matrix is computed from the input data matrix using different kernels like polynomial, sigmoid and Gaussian, which transform the input data matrix into a matrix with higher dimensions. In this architecture also, there are no weights between the input layer and the PC layer, similar to PCNN and KPCNN. In the principal component layer, we perform PCA on the kernel matrix. The number of nodes of the principal component layer is equal to the number of principal components selected, which depends on how much variance we want to explain in the original data. In Fig. 1, we assume that only two principal components explain the required variance. We must remember that the PC layer is equivalent to the input layer in WNN. From the principal component layer onwards, all the nodes in each layer are fully connected to those in the subsequent layer. The hidden layer performs the wavelet operations on the data in the form of the activation function; in this study, we used the Gaussian wavelet function as the activation function. The output layer contains a single unit. Taking a cue from Chauhan et al. [17] and Ilonen et al. [35], we employed DE to train the KPCWNN.

[Fig. 1. Architecture of DE-KPCWNN: input layer, principal component layer (KPC(1), KPC(2)), hidden layer and output layer.]

4.1.1. Algorithm for the DE-KPCWNN architecture

A dataset with n samples and k features is represented by a data matrix X, and an element $X_{ij}$ represents the value of the jth feature for the ith sample. Let $X_j' = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{nj})$ denote the jth column of X, with the prime representing the transpose operation. To perform KPCWNN, one should first perform a nonlinear transformation of the input dataset. It can be done by using a nonlinear function $\Phi(X)$, which transforms the original data X into a higher dimensional feature space. Then a kernel matrix K is formed from the inner products of the new feature vectors using different kernels. Then, we centralize the kernel matrix. Later, we apply PCA on the centralized kernel matrix. Subsequently, the selected principal component matrix of the centralized K is fed as input to the DEWNN. We employed the polynomial kernel in this study. The flow chart for DE-KPCWNN is depicted in Fig. 3.

The steps involved in the DE-KPCWNN algorithm are given below. Let $x_i$, $i = 1, 2, \ldots, np$ be the training data, n be the total number of samples, and y be the vector representing the corresponding output variable class label.

Step 1: Applying kernel PCA

(i) For the training data, compute the kernel matrix $K = [K_{ij}]_{np \times np}$, where $K_{ij} = K(x_i, x_j)$.
(ii) For the testing data, compute the kernel matrix $[K_{te}]_{(n-np) \times np}$, where $K_{te} = K(x_t, x_j)$. Here $K_{te}$ projects the test data $x_t$ onto the training data $x_i$ in the high dimensional feature space in terms of the inner product.
(iii) Centralize K and $K_{te}$ using

$K = \left(I_{np} - \frac{1_{np}1_{np}'}{np}\right) K \left(I_{np} - \frac{1_{np}1_{np}'}{np}\right)$    (2)

$K_{te} = \left(K_{te} - \frac{1_{np}1_{np}'}{np} K\right)\left(I_{np} - \frac{1_{np}1_{np}'}{np}\right)$    (3)

(iv) Combine K and $K_{te}$ to form the total centralized kernel matrix $K_{tot}$ of order $n \times np$, where $n = np + nt$.
(v) Perform principal component analysis on $K_{tot}$. The principal components are computed using the matrix equation $P = K_{tot}E$, where $K_{tot}$ is a matrix of order $n \times np$, E is a matrix of order $np \times np$ whose columns are the eigenvectors of the centralized kernel matrix, and P is the matrix of order $n \times np$ of principal components $P_1, P_2, \ldots, P_{np}$.
(vi) The ratio of each eigenvalue to the total sum of all the eigenvalues indicates the proportion of variation explained by the corresponding principal component. Since the eigenvalues are sorted in descending order, the first few principal components together explain the maximum variance. Hence we pre-specify the percentage of total variance we would like to explain, which is then used for the selection of the corresponding first few principal components, resulting in dimensionality reduction. The number of nodes (nin) in the principal component layer is equal to the number of principal components selected.
(vii) Having selected the first nin principal components according to the required variance, we form a new matrix of chosen PCs $[Y]_{n \times nin}$ by ignoring the last $(np - nin)$ columns of the P matrix.
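As an editorial illustration of Step 1 (not the authors' Java implementation), here is a NumPy sketch with the polynomial kernel; because the $n \times np$ matrix $K_{tot}$ is not square, the eigenvectors below are taken from the centralized training kernel matrix, which is our reading of step (v):

```python
import numpy as np

def poly_kernel(A, B, p1=1.0, p2=2.0):
    """Polynomial kernel matrix; p1, p2 play the roles varied in Section 6."""
    return (A @ B.T + p1) ** p2

def kpca_components(Xtr, Xte, var_explained=0.95):
    """Step 1: kernel matrices, centralization (Eqs. (2)-(3)), then PCA."""
    ntr = Xtr.shape[0]                                  # np in the paper
    K, Kte = poly_kernel(Xtr, Xtr), poly_kernel(Xte, Xtr)
    J = np.eye(ntr) - np.ones((ntr, ntr)) / ntr         # I - (1 1')/np
    Kc = J @ K @ J                                      # Eq. (2)
    Ktec = (Kte - np.ones((Xte.shape[0], ntr)) / ntr @ K) @ J  # Eq. (3)
    Ktot = np.vstack([Kc, Ktec])                        # (n, np), n = np + nt
    evals, E = np.linalg.eigh(Kc)                       # eigenvectors of centralized K
    order = np.argsort(evals)[::-1]                     # descending eigenvalues
    evals, E = evals[order], E[:, order]
    ratio = np.cumsum(evals) / evals.sum()              # variance explained, step (vi)
    nin = int(np.searchsorted(ratio, var_explained)) + 1
    return Ktot @ E[:, :nin]                            # [Y]_{n x nin}, step (vii)
```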


Step 2: Training the WNN using differential evolution

After applying KPCA on the original data matrix, we have the Y matrix of order $n \times nin$. In this step, we train the WNN using the DE algorithm.

(1) Select the number of hidden nodes (nhn) required. Initialize the dilation and translation parameters for these nodes, the weights for the connections between the input and hidden layers, and also those for the connections between the hidden and output layers, using random numbers generated from the U(0, 1) distribution.
(2) The output for sample k, where $k = 1, 2, 3, \ldots, n$ and n is the number of samples, is computed with the following formula:

$V_k = \sum_{j=1}^{nhn} w_j\, f\!\left(\frac{\sum_{i=1}^{nin} W_{ij}\, y_{ki} - t_j}{d_j}\right)$

where nin = number of nodes in the principal component layer and nhn = number of nodes in the hidden layer. When f(t) is taken as the Morlet mother wavelet it has the form $f(t) = \cos(1.75t)\exp(-t^2/2)$, and when the Gaussian wavelet is considered it becomes $f(t) = \exp(-t^2)$.

Thus, the output of a WNN is a function of the weights (W) from the input layer to the hidden layer, the weights (w) from the hidden layer to the output layer, the dilation parameters D, the translation parameters T and the input values, as shown above.

Hence, in order to train the WNN, we initially take a population of vectors R whose size equals the population size M. Each vector R consists of:

(i) Weight values from input nodes to hidden nodes, $W = \{W_{ij}\}$, where $i = 1, 2, \ldots, nin$ (nin = number of PCs) and $j = 1, 2, 3, \ldots, nhn$ (nhn = number of hidden nodes).
(ii) Weight values from hidden nodes to output nodes, $w = \{w_{jk}\}$, where $j = 1, 2, \ldots, nhn$ and $k = 1, 2, \ldots, non$ (non = number of output nodes; here it is 1).
(iii) Dilation parameters $D = (d_1, d_2, d_3, \ldots, d_{nhn})$.
(iv) Translation parameters $T = (t_1, t_2, t_3, \ldots, t_{nhn})$.

A population PoP in each generation consists of M such vectors:

$PoP = (R_1, R_2, R_3, \ldots, R_M)$

The vector size m is given by:

$m = (nin + non + 2) \times nhn$
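Putting the pieces of Step 2 together, here is a sketch of the forward pass encoded by one candidate vector R (our illustration with arbitrary random values; the Gaussian wavelet is the activation, as stated above):

```python
import numpy as np

def wnn_output(Y, W, w, d, t):
    """V_k = sum_j w_j * f((sum_i W_ij * y_ki - t_j) / d_j), with f(t) = exp(-t^2)."""
    Z = (Y @ W - t) / d           # (n, nhn) hidden-node arguments
    return np.exp(-Z ** 2) @ w    # (n,) network outputs

rng = np.random.default_rng(0)
n, nin, nhn, non = 5, 2, 3, 1                 # so m = (nin + non + 2) * nhn = 15
Y = rng.random((n, nin))                      # selected kernel principal components
W = rng.random((nin, nhn))                    # PC layer -> hidden layer weights
w = rng.random(nhn)                           # hidden layer -> output weights
d = rng.random(nhn) + 0.5                     # dilation parameters D
t = rng.random(nhn)                           # translation parameters T
V = wnn_output(Y, W, w, d, t)
```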

[Fig. 3. Flow chart for DE-KPCWNN: divide the data into training and test sets; apply the kernel technique to both matrices; centralize both kernel matrices using Eqs. (2) and (3); apply PCA on both centralized matrices; select the number of PCs based on the variance explained; form the principal components matrix; train the WNN using differential evolution; compute the test set sensitivity, specificity and accuracy.]

4.2. Architecture of DE-KBQR

In DE-KBQR, we apply kernels like polynomial, sigmoid and Gaussian to compute the kernel matrix from the input data, which transforms the input data into higher dimensions. Then, we apply the BQR technique on the kernel matrix to compute how the covariates affect the binary response variable for the quantiles 0.05, 0.15, 0.25, 0.35, 0.45, 0.5, 0.55, 0.65, 0.75, 0.85 and 0.95.

We employ DE to train the KBQR model, in order to estimate the regression coefficients ($\beta$ values) corresponding to all the independent variables for each quantile of the response variable. We then have 11 predicted values of the response variable corresponding to the different quantiles from 0.05 to 0.95, and we apply the majority voting technique to find the output value of the response variable. The architecture is depicted in Fig. 2.

[Fig. 2. Architecture of DE-KBQR: input data > apply various kernel techniques > kernel matrix > train BQR using DE > beta vector values > apply BQR > output value.]
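A sketch of the quantile ensemble just described (illustrative only; betas is a hypothetical mapping from each quantile to coefficients already fitted by DE):

```python
import numpy as np

THETAS = [0.05, 0.15, 0.25, 0.35, 0.45, 0.50, 0.55, 0.65, 0.75, 0.85, 0.95]

def ensemble_predict(K, betas):
    """Majority vote over the 11 quantile-specific KBQR predictions.
    K: (n, np) centralized kernel matrix; betas[theta]: (np,) coefficients."""
    votes = np.stack([(K @ betas[th] >= 0).astype(int) for th in THETAS])
    return (votes.sum(axis=0) > len(THETAS) // 2).astype(int)  # majority of 11
```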
4.2.1. Algorithm for the DE-KBQR architecture

A dataset with m samples and n features is represented by a data matrix X, and an element $X_{ij}$ represents the value of the jth feature for the ith sample. Let $X_j = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{mj})$ denote the jth column of X, with the prime representing the transpose operation. To implement KBQR, one should first perform a nonlinear transformation of the input dataset. It can be done by using a nonlinear function $\Phi(X)$, which transforms the original data X into a higher dimensional feature space. Then a kernel matrix K is formed from the inner products of the feature vectors using different kernels. Then, we perform the centralization of the kernel matrix. Then BQR is applied on the centralized K matrix, which performs binary classification on the K matrix. Here we train the BQR using DE. The property called the "kernel trick" in the literature helps in analyzing nonlinear data using linear classifiers. The flow chart for DE-KBQR is depicted in Fig. 4.

[Fig. 4. Flow chart for DE-KBQR: divide the data into training and test sets; apply the kernel technique to both matrices; centralize both kernel matrices using Eqs. (2) and (3); train BQR using differential evolution; compute the test set sensitivity, specificity and accuracy.]

The steps involved in DE-KBQR are given below.

Step 1: Applying the kernel technique

(I) For the training data, compute the kernel matrix $K = [K_{ij}]_{np \times np}$, where $K_{ij} = K(x_i, x_j)$.
(II) For the testing data, compute the kernel matrix $[K_{te}]_{(n-np) \times np}$, where $K_{te} = K(x_t, x_j)$. Here, $K_{te}$ projects the test data $x_t$ onto the training data $x_i$ in the high dimensional feature space in terms of the inner product.
(III) Centralize K and $K_{te}$ using Eqs. (2) and (3).
(IV) Combine K and $K_{te}$ to form the total centralized kernel matrix $K_{tot}$ of order $n \times np$.

Step 2: Training KBQR using differential evolution

After applying the kernel technique on the input matrix, the resultant data is in the Z matrix of order $n \times np$. In this step, we train the BQR using the DE algorithm. The model for KBQR is given by:

$\operatorname*{argmin}_{\beta_\theta} \; \frac{1}{np}\sum_{i=1}^{np} \rho_\theta\!\left(y_i - I\{\Phi(x_i)'\beta_\theta \ge 0\}\right)$

where $\Phi(x)$ is the kernel matrix, $\beta_\theta$ is the parameter vector, $I\{\cdot\}$ is the indicator function and $\rho_\theta(v) = [\theta - I\{v < 0\}]\cdot v$.

(I) Initialize the regression coefficients $\beta = (\beta_1, \beta_2, \beta_3, \ldots, \beta_{np})$, corresponding to each input variable (i.e. the columns of Z), to random values generated from the U(0, 1) distribution.
(II) The output for sample k, where $k = 1, 2, 3, \ldots, n$ and n is the number of samples, is computed with the formula $Y = \beta_\theta X$.

In order to train the KBQR using DE, we initially consider a population of vectors R whose size equals the population size M. Each vector R consists of the regression coefficients $\beta$:

$R = (\beta_1, \beta_2, \beta_3, \ldots, \beta_{np})$

We have M such vectors $(R_1, R_2, R_3, \ldots, R_M)$. The steps in training KBQR using DE are explained in Section 4.3.
4.3. Steps of DE common to both architectures

(i) Initialization
The initial population is randomly initialized following a uniform distribution, using the user specified lower and upper bounds for all the parameters, as follows:

$R_i = R_i^{\min} + \mathrm{rand}(0,1) \times (R_i^{\max} - R_i^{\min})$    (4)

(ii) Mutation
Mutation is basically a search mechanism which, together with recombination and selection, directs the search towards potential areas of optimal solutions. In this step, three distinct target vectors $R_a$, $R_b$ and $R_c$ are randomly chosen from the M vectors of the parent population on the basis of three random numbers a, b and c. One of the vectors, $R_c$, is the base of the mutated vector; to this is added the weighted difference of the remaining two vectors, i.e. $(R_a - R_b)$, to generate a noisy random vector $N_i$ as follows:

$N_i = R_c + F \times (R_a - R_b)$    (5)

where $i = 1, 2, \ldots, M$ and a, b, c are random numbers between 1 and M. F is termed the scaling factor and is user-specified. This mutation process is repeated to create a mate for each member of the parent population. Mutation ensures an efficient search of the solution space in each dimension.

(iii) Recombination (crossover)
In the recombination (crossover) operation, each target vector of the parent population is allowed to mate with a mutated vector. Thus, vector $R_i$ is recombined with the noisy random vector $N_i$ to generate a trial vector $T_i$. Each element of the trial vector ($t_{ij}$, where $i = 1, \ldots, M$ and $j = 1, \ldots, n$) is determined by a binomial experiment whose success or failure is determined by the user supplied crossover factor CR; the parameter CR controls the rate at which the crossover takes place:

$t_{ij} = N_{ij}$, if $\mathrm{Rand}(0,1) < CR$ or $j = \mathrm{rand}(1, \ldots, n)$; $\quad t_{ij} = R_{ij}$ otherwise, where $i = 1, 2, \ldots, M$ and $j = 1, 2, 3, \ldots, n$.

(iv) Selection
In the selection stage, we select whichever of the target vector and the trial vector better fits the objective function. As our problem is a classification problem, the objective is to select whichever of the trial vector and the target vector gives less error. The error is calculated for both the target and trial vectors, and the vector giving the minimum error goes into the new generation.

For KPCWNN the error value considered is the NRMSE value. For KBQR the error value is given by:

$\frac{1}{np}\sum_{i=1}^{np} \rho_\theta\!\left(y_i - I\{\Phi(x_i)'\beta_\theta \ge 0\}\right)$

M such competitions, one for each vector, give the new generation with M vectors, which form a better population than the one with which we started. The above four steps are repeated until a certain convergence criterion is met. The convergence criterion used is as follows: first, we sort all the M vectors in a new generation in ascending order of the error value; we then stop the training procedure when the difference between the error values of the first vector and the last vector is less than a predefined error tolerance ($10^{-6}$), or when the maximum number of generations is completed, whichever happens earlier.

Eventually, in the case of KPCWNN, we get a final population consisting of vectors of weights from the PC layer to the hidden layer, weights from the hidden to the output layer, and dilation and translation parameters. In the case of KBQR, we get a final population consisting of vectors of coefficients corresponding to each input variable. Out of these, we choose the best vector as the optimal set of decision variables, using which we make predictions on the test set. The flow chart for training using DE is depicted in Fig. 5.

[Fig. 5. Flowchart for differential evolution: generate M random vectors using Eq. (4) and evaluate the objective function for each; generate noisy vectors using Eq. (5); form trial vectors via crossover (Rand(0,1) < CR or j = rand(1, n)), keeping them within bounds; replace the target by the trial vector if f(trial) < f(target); repeat until the convergence criterion is met or the maximum number of iterations is reached, then report the results.]
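The four steps of Section 4.3 and the flow of Fig. 5 can be condensed into a compact sketch (ours, not the authors' Java implementation; a standard DE/rand/1/bin scheme with the convergence test described above is assumed):

```python
import numpy as np

def differential_evolution(f, m, M=30, F=0.7, CR=0.6, lo=-1.0, hi=1.0,
                           max_gen=1000, tol=1e-6, seed=0):
    """Minimize f over m decision variables; f is the error value
    (NRMSE for KPCWNN, the quantile loss above for KBQR)."""
    rng = np.random.default_rng(seed)
    pop = lo + rng.random((M, m)) * (hi - lo)           # Eq. (4): initialization
    err = np.array([f(r) for r in pop])
    for _ in range(max_gen):
        for i in range(M):
            a, b, c = rng.choice(M, size=3, replace=False)
            noisy = pop[c] + F * (pop[a] - pop[b])      # Eq. (5): mutation
            mask = rng.random(m) < CR                   # binomial crossover...
            mask[rng.integers(m)] = True                # ...with one forced gene
            trial = np.clip(np.where(mask, noisy, pop[i]), lo, hi)
            e = f(trial)
            if e < err[i]:                              # selection
                pop[i], err[i] = trial, e
        if err.max() - err.min() < tol:                 # convergence criterion
            break
    return pop[np.argmin(err)]                          # best decision vector
```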

5. Datasets description and experimental setup

The datasets analyzed by us in this work are three different bankruptcy prediction datasets, viz. the Turkish Banks, Spanish Banks



and US Banks datasets; the German and UK Credit datasets; and three other benchmark datasets, viz. the Iris data, Wine data and Wisconsin Breast Cancer data. The Turkish Banks dataset, obtained from Canbas et al. [15], is available at http://www.tbb.org.tr/english/bulten/yillik/2000/ratios.xls. The Banks Association of Turkey published 49 financial ratios. Canbas et al. [15] applied the univariate analysis of variance (ANOVA) test on these 49 ratios of the previous year for predicting the health of a bank in the present year. However, Canbas et al. [15] chose only 12 ratios as the early warning indicators that have discriminating ability (i.e. significance level < 5%) for healthy and failed banks 1 year in advance. Among these variables, the 12th variable has some missing values, meaning that the data for some of the banks are not given; we filled those missing values with the mean value of the variable, following the general approach in data mining. The financial ratios considered as predictor variables are presented at the end of the paper in Table 10. This dataset contains 40 banks, of which 22 went bankrupt and 18 are healthy.

The Spanish Banks data is obtained from Olmeda and Fernandez [51]. The Spanish banking industry suffered its worst crisis during 1977–1985, resulting in a total cost of 12 billion dollars. The ratios used for the failed banks were taken from the last financial statements before bankruptcy was declared, and the data for the non-failed banks was taken from 1982 statements. This dataset contains 66 banks, with 37 bankrupt and 29 healthy.

The US Banks data is obtained from Rahimain et al. [56]; the financial ratios used by them are presented in Table 10. They obtained the data of 129 banks from the Moody's Industrial Manual, covering banks that went bankrupt during 1975–1982. This US Banks dataset contains 65 bankrupt and 64 healthy banks. The UK Banks dataset, taken from Ref. [10], consists of 60 banks with 10 features, of which 30 are healthy and 30 are bankrupt.

The UK Credit dataset is obtained from a financial services company of England [73]. The dataset has information on 1225 applicants, of which 323 are bad and the remaining 903 are good. The German Credit dataset has information on 1000 customers, 700 good and 300 bad, and is taken from the UCI repository (http://archive.ics.uci.edu/ml). The benchmark datasets are also taken from the UCI repository.

Throughout this study, we performed the 10-fold cross validation method of testing. The results presented in the tables reflect the average results over the 10 folds.
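For clarity, the evaluation protocol can be sketched as follows (our illustration; the paper does not state whether the folds were stratified, so StratifiedKFold is an assumption, and fit_predict stands in for training and applying any of the classifiers):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_average_scores(X, y, fit_predict, n_splits=10, seed=0):
    """Average sensitivity, specificity and accuracy over 10 folds,
    as reported in Tables 1-7; fit_predict(Xtr, ytr, Xte) -> 0/1 predictions."""
    sens, spec, acc = [], [], []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr, te in skf.split(X, y):
        p, yt = fit_predict(X[tr], y[tr], X[te]), y[te]
        tp = np.sum((p == 1) & (yt == 1)); tn = np.sum((p == 0) & (yt == 0))
        fp = np.sum((p == 1) & (yt == 0)); fn = np.sum((p == 0) & (yt == 1))
        sens.append(tp / (tp + fn)); spec.append(tn / (tn + fp))
        acc.append((tp + tn) / len(yt))
    return np.mean(sens), np.mean(spec), np.mean(acc)
```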
6. Results and discussion

We implemented both DE-KPCWNN and DE-KBQR using Java (JDK 1.5) on the Windows 7 platform, on a desktop with 2 GB of RAM. The parameters used for DE-KPCWNN are the number of hidden nodes, the minimum weight, the maximum weight, the population size, the scaling factor F, the crossover rate CR, the maximum number of allowed generations and the convergence criteria. We have taken the F value in the range 0.5–0.9 and CR in the range 0.4–0.9. The minimum weight is -1 and the maximum weight is +1. The convergence tolerance limit is $10^{-6}$. As regards the parameters of KPCNN, in the case of the polynomial kernel, the values of p1 and p2 are varied in the range 1–5.

The parameters used for DE-KBQR were the minimum weight, the maximum weight, the population size, the scaling factor F, the crossover rate CR, the maximum number of allowed generations and the convergence criteria. In DE-KBQR, we employed DE to train the KBQR model in order to estimate the regression coefficients ($\beta$ values) corresponding to all the independent variables for each quantile of the response variable. Thus, we have 11 predictions of the response variable corresponding to the different quantiles, viz. 0.05, 0.15, 0.25, 0.35, 0.45, 0.5, 0.55, 0.65, 0.75, 0.85 and 0.95.

Table 1
Average results for 10-fold cross validation for the Turkish Banks dataset.

Classifier                      Sensitivity   Specificity   Accuracy
DEWNN                           100           80            90
KPCNN                           88.33         95.5          89.99
DE-BQR (proposed method)        100           85            95
PCA-TAWNN                       100           100           100
DE-KPCWNN (proposed method)     95            100           97.5
DE-KBQR (proposed method)       100           100           100

Table 2
Average results for 10-fold cross validation for the Spanish Banks dataset.

Classifier                      Sensitivity   Specificity   Accuracy
DEWNN                           94.16         86            88.33
KPCNN                           94.17         92.17         91.17
DE-BQR (proposed method)        100           88.99         94.99
PCA-TAWNN                       100           100           100
DE-KPCWNN (proposed method)     100           100           100
DE-KBQR (proposed method)       100           100           100

Table 3
Average results for 10-fold cross validation for the US Banks dataset.

Classifier                      Sensitivity   Specificity   Accuracy
DEWNN                           97.32         89.78         93.33
DE-BQR (proposed method)        91.66         96.66         94.2
DE-KPCWNN (proposed method)     96.66         93.33         95
DE-KBQR (proposed method)       96.66         100           98.33

Table 4
Average results for 10-fold cross validation for the UK Banks dataset.

Classifier                      Sensitivity   Specificity   Accuracy
KPCNN                           80.25         75.83         80
DE-BQR (proposed method)        83.33         79.99         81.66
PCA-TAWNN                       95.5          78.32         89.98
DE-KPCWNN (proposed method)     96.66         83.333        89.99
DE-KBQR (proposed method)       89.33         86.24         87.78

Table 5
Average results for 10-fold cross validation of the UK Credit dataset.

Classifier                      Sensitivity   Specificity   Accuracy
PCA + SVM                       73.02         43.45         51.18
DE-BQR (proposed method)        21.0          96.0          76.5
DE-KBQR (proposed method)       76.56         43.9          53.526

Table 6
Average results for 10-fold cross validation of the German Credit dataset.

Classifier                      Sensitivity   Specificity   Accuracy
DE-KBQR (proposed method)       61.1          71.7          69.03
DE-BQR                          53.7          85.36         75.91
DE-KPCWNN (proposed method)     54.47         90.61         80.17
Voting based PWBTS              43.41         92.59         78.13
RBF-SVM                         –             –             76.60
LDA + SVM                       –             –             76.10
Bagging-RS DT                   –             –             78.52
SVM + GA                        –             –             77.92

Table 7
Average results for 10-fold cross validation for other benchmark datasets.

Dataset   DEWNN (%)   DE-KPCWNN (%)   DE-KBQR (%)   DE-BQR (%)
Iris      97.99       98.5            –             –
Wine      97.6        98.1            –             –
WBC       97.05       97.4            97.52         95.02

Then, we ensemble the predictions of the response variable corresponding to the 11 quantiles by resorting to the majority voting technique, to find the final predicted value of the response variable.

Table 8 presents the details of each dataset and the number of kernel principal components selected after employing KPCA on the input data for the DE-KPCWNN architecture.

For the Turkish and Spanish datasets, the results of the proposed architectures DE-KPCWNN and DE-KBQR are compared with those of KPCNN [64], PCA-TAWNN [76] and DEWNN [17]. For the US Bankruptcy dataset, we compare the results with those of DEWNN only. For the UK Banks dataset, we compare the results with those of KPCNN only. For the UK Credit dataset, we compare the results with those of the PCA + SVM [28] hybrid only. For the benchmark datasets Wine, WBC and Iris, we applied DE-KPCWNN; DE-KBQR is applied only to the WBC dataset, because it can solve only binary classification problems. We compare the results of the proposed architectures with those of DEWNN. The results are presented in Tables 1–7.

Table 8
Datasets details.

Dataset name            Number of samples   Number of attributes   Number of kernel PCs selected in DE-KPCWNN
Turkish Banks dataset   40                  12                     12
Spanish Banks dataset   66                  9                      9
US Banks dataset        128                 5                      3
UK Banks dataset        60                  11                     13
UK Credit dataset       1225                12                     –
German Credit dataset   1000                24                     20
WBC                     673                 9                      14
Wine                    160                 13                     19
Iris                    150                 4                      7

Table 9
t-Test on accuracy between DE-KPCWNN and DE-KBQR.

Dataset name            t-Statistic value
Turkish Bankruptcy      0.474
Spanish Bankruptcy      0
US Banks dataset        0.562
UK Banks dataset        0.316
German Credit dataset   4.35
WBC                     0.02

Table 10
Financial ratios of the datasets.

Turkish Banks' data
1  Interest expenses/average profitable assets
2  Interest expenses/average non-profitable assets
3  (Shareholders' equity + total income)/(deposits + non-deposit funds)
4  Interest income/interest expenses
5  (Shareholders' equity + total income)/total assets
6  (Shareholders' equity + total income)/(total assets + contingencies and commitments)
7  Net working capital/total assets
8  (Salary and employee benefits + reserve for retirement)/number of personnel
9  Liquid assets/(deposits + non-deposit funds)
10 Interest expenses/total expenses
11 Liquid assets/total assets
12 Standard capital ratio

Spanish Banks' data
1  Current assets/total assets
2  (Current assets − cash)/total assets
3  Current assets/loans
4  Reserves/loans
5  Net income/total assets
6  Net income/total equity capital
7  Net income/loans
8  Cost of sales/sales
9  Cash flow/loans

UK Banks' data
1  Sales
2  Profit before tax/capital employed (%)
3  Funds flow/total liabilities
4  (Current liabilities + long-term debt)/total assets
5  Current liabilities/total assets
6  Current assets/current liabilities
7  (Current assets − stock)/current liabilities
8  (Current assets − current liabilities)/total assets
9  LAG (number of days between account year end and the date of the annual report)
10 Age

US Banks' data
1  Working capital/total assets
2  Retained earnings/total assets
3  Earnings before interest and taxes/total assets
4  Market value of equity/total assets
5  Sales/total assets

Table 11
A review of hybrids proposed for bankruptcy prediction.

Based on training standalone techniques using optimization algorithms
GA + SVM: GA is used for training SVM parameters [48,2,79]
GP + NLN: Genetic programming is used for finding the neural logic network topology and parameters [75]
GA + MLP: GA is used for finding MLP weights [54,66]
DE + WNN: DE is used for optimizing the WNN parameters [17]
PSO + FKNN: PSO is used to find the optimal neighborhood (k) value of the k-nearest neighbor and the fuzzy strength parameter (m) [19]

Based on feature selection and standalone technique hybrids
LDA + MLP: Features are selected by LDA and the reduced features are then used in training MLP [39]
MLP + LDA + LR + MARS + C4.5: Ensembles based on aggregation by voting and weighted sum are explored; GA is used to find the aggregation weights [51]
GA + MLP: GA is used to select the subset of features which are fed to MLP for training [1]
RS + SVM: RS is used to select features which are fed to SVM [87]
PCA + MLP (PCNN): To reduce dimensionality, input features are transformed by applying PCA and an MLP is then trained on this reduced feature set [59]
KPCA + MLP (KPCNN): KPCA, a nonlinear version of PCA, is applied on the input data to get reduced features in the kernel space, and these reduced features are used to train MLP [64]
PCA + TAWNN: PCA and the threshold accepting trained WNN are applied in tandem [76]
PLS + SVM: Partial Least Squares (PLS) is used for feature selection and SVM for bankruptcy prediction using the reduced features [80]
PCA/FA/t-test/correlation matrix/stepwise regression + MLP: PCA, FA, t-test, correlation matrix or stepwise regression is used for feature selection and MLP is used for building the model with the reduced features; t-test + MLP outperformed the other hybrids [74]

Based on hybrids using ensembles of different techniques
PCA + MLP: To reduce dimensionality, input features are transformed by applying PCA and an ensemble of MLPs is then trained on this reduced feature set [68]
SVM: Bagged ensemble of SVMs, where different ensemble members use different hyperparameters [81]
MLP + LR + LDA + C5.0: The members are aggregated by weighted voting [43]
MLP + SVM + LDA + LR + CBR: The members are aggregated by the weighted majority voting rule [70]
MLP + RBF + PNN + SVM + DT + FRB + PCA-MLP + PCA-RBF + PCA-PNN: Nine different architectures are aggregated by the majority voting or weighted averaging rules [58]
ANFIS + SVM + MLP + four types of RBF: Experiments with a varying ensemble size; the optimal size and structure were data dependent [61]
SVM: A financial distress prediction model based on an SVM ensemble; MDA/PCA/logit is used for feature selection and SVMs with different kernels are used as ensemble candidates [71]
DT + BPNN + SVM: This ensemble is based on the expected probability values of bankruptcy and non-bankruptcy [34]

For the Spanish dataset, sensitivity improved from 94.17% to 100%, while accuracy increased from 95% to 100%, by employing DE-KBQR. By employing DE-KPCWNN, accuracy increased from 92.17% to 100%. Thus, it is clear that both DE-KPCWNN and DE-KBQR outperformed the other techniques.

For the Turkish dataset, a sensitivity of 100% is achieved, while accuracy increased from 94.17% to 100%, by employing DE-KBQR. Thus, in this case also, DE-KBQR is superior to the other techniques and equal to DEWNN, PCA-TAWNN and DE-BQR.

In the case of the US Bankruptcy dataset, accuracy increased from 94.2% to 98.33% by employing DE-KBQR, while by employing DE-KPCWNN accuracy increased from 92.17% to 95%.

Finally, in the case of the UK Bankruptcy dataset, sensitivity increased from 83.33% to 96.66% and accuracy increased from 81.66% to 89.99% by employing DE-KPCWNN, whereas by employing DE-KBQR the accuracy increased from 81.66% to 84.99%. Hence, for the bankruptcy prediction datasets, DE-KPCWNN outperformed the other classification techniques studied here.

Thus, the proposed architectures, viz. DE-KPCWNN and DE-KBQR, outperformed both DEWNN and KPCNN in terms of accuracy, sensitivity and specificity for all the datasets except the Turkish and US Bankruptcy cases, where DEWNN achieved marginally higher sensitivity than DE-KPCWNN. In some binary classification problems, DE-KBQR outperformed all the other techniques, including the proposed DE-KPCWNN.

For the UK Credit dataset, which is a tougher dataset, DE-KBQR outperformed the PCA + SVM hybrid in terms of all measures: DE-KBQR increased sensitivity to 76.56% from 73.02%, specificity to 43.9% from 43.45% and accuracy to 53.52% from 51.18%. Thus, DE-KBQR produced the best result compared to all other techniques applied so far.

For the German Credit dataset, we compare the present results with those of the PCA + SVM [28], LDA + SVM [18], SVM + GA [33] and Bagging-RS DT [78] hybrids. DE-KBQR achieved a higher sensitivity of 61.1%, but with a significant drop of nearly 20% in specificity, while DE-KPCWNN outperformed all the techniques in terms of all measures, with accuracy increasing to 80.14 from 78.52.

Moreover, the t-test is applied at the 1% level of significance on the 10 folds of all the datasets, to see whether the difference between the average accuracies obtained by DE-KPCWNN and DE-KBQR is statistically significant or not. The t-test values presented in Table 9 are compared to 2.83, the t-table value at 18 degrees of freedom (10 + 10 − 2 = 18) and the 1% level of significance. The t-test values for five datasets demonstrate that there is no statistically significant difference between them. For the German Credit dataset, the t-test indicates that DE-KPCWNN outperformed DE-KBQR in terms of accuracy, even though the latter scored marginally better in terms of sensitivity. So, we can conclude that the difference between DE-KPCWNN and DE-KBQR is statistically insignificant in the majority of datasets. Thus, it can be concluded that, besides being robust, both DE-KPCWNN and DE-KBQR are effective algorithms for solving classification problems occurring in finance.
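The statistic in Table 9 is the pooled two-sample t-statistic computed from the two sets of 10 fold accuracies (a sketch added for clarity; equal variances are assumed, matching the 18 degrees of freedom quoted above):

```python
import numpy as np

def t_statistic(acc_a, acc_b):
    """Two-sample pooled-variance t-statistic on per-fold accuracies;
    with 10 folds each, df = 10 + 10 - 2 = 18 and the 1% critical value is 2.83."""
    a, b = np.asarray(acc_a, float), np.asarray(acc_b, float)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))
```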
7. Conclusions

In this study, we proposed two novel soft computing architectures, DE-KPCWNN and DE-KBQR. We compared them with both KPCNN and DEWNN on benchmark datasets, viz. the Iris, Wine and Wisconsin Breast Cancer datasets, as well as bankruptcy datasets, viz. the US Banks, Turkish Banks, Spanish Banks and UK Banks datasets, and credit scoring datasets, viz. the German Credit and UK Credit datasets. The results demonstrated the superior performance of DE-KPCWNN and
DE-KBQR over DEWNN and KPCNN. Based on the results, we conclude that DE-KPCWNN and DE-KBQR can be very effective soft computing tools for solving classification problems in finance, like bankruptcy prediction and credit scoring applications. The reason could be the employment of the kernel trick in conjunction with some proven intelligent techniques. Future directions include constructing more kernel techniques in this direction and also developing online training algorithms for some of the kernel techniques. Obviously, online training algorithms consume less time, which is important in some applications.

References

[1] T. Abdelwahed, E.M. Amir, New evolutionary bankruptcy forecasting model based on genetic algorithms and neural networks, in: Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI05), IEEE Computer Society, 2005, pp. 1–5.
[2] H. Ahn, K. Lee, K.J. Kim, Global optimization of support vector machines using genetic algorithms for bankruptcy prediction, Lecture Notes in Computer Science, vol. 4234, Springer, Heidelberg, 2006, pp. 420–429.
[3] M.A. Aizerman, E.M. Braverman, L.I. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control 25 (6) (1964) 821–837.
[4] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23 (1968) 589–609.
[5] V. Atella, N. Pace, D. Vuri, Are employers discriminating with respect to weight? European evidence using quantile regression, Economics and Human Biology 6 (3) (2008) 305–329.
[6] E. Avci, An expert system based on wavelet neural network-adaptive norm entropy for scale invariant texture classification, Expert Systems with Applications 32 (3) (2007) 919–926.
[7] G. Bassett, H.L. Chen, Quantile style: return-based attribution using regression quantiles, Empirical Economics 26 (2001) 293–305.
[8] W. Beaver, Financial ratios as predictors of failure, Journal of Accounting Research 5 (1966) 71–111.
[9] D.F. Benoit, D. Van den Poel, Binary quantile regression: a Bayesian approach based on the asymmetric Laplace distribution, Journal of Applied Econometrics (2010), doi:10.1002/jae.1216.
[10] M.J. Beyon, M.J. Peel, Variable precision rough set theory and data discretization: an application to corporate failure prediction, Omega 29 (6) (2001) 561–576.
[11] T.R. Bhat, D. Venkataramani, V. Ravi, C.V.S. Murty, Improved differential evolution method for efficient parameter estimation in biofilter modeling, Biochemical Engineering Journal 28 (2) (2006) 167–176.
[12] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: COLT 92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM Press, New York, 1992, pp. 144–152.
[13] M. Buchinsky, Changes in U.S. wage structure 1963–1987: an application of
[26] … to long-term, unsupervised, gastrointestinal motility monitoring, Expert Systems with Applications 34 (1) (2008) 26–41.
[27] L. Dong, D. Xiao, Y. Liang, Y. Liu, Rough set and fuzzy wavelet neural network integrated with least square weighted fusion algorithm based fault diagnosis research for power transformers, Electric Power Systems Research 78 (1) (2008) 129–136.
[28] M.A.H. Farquad, V. Ravi, Sreeramji, G. Praveen, Credit scoring using PCA-SVM hybrid model, in: Second International Conference on Recent Trends in Information, Telecommunication and Computing (ITC 2011), March 10–11, Bangalore, India.
[29] B. Fattouh, P. Scaramozzino, L. Harris, Capital structure in South Korea: a quantile regression approach, Development Economics 76 (1) (2005) 231–250.
[30] A. Gosling, S. Machin, C. Meghir, The changing distribution of male wages in the UK, The Review of Economic Studies 67 (4) (2000) 635–666.
[31] A. Grossmann, J. Morlet, Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM Journal of Mathematical Analysis 15 (1984) 725–736.
[32] T.V. Gestel, B. Baesens, D. Martens, From linear to non-linear kernel based classifiers for bankruptcy prediction, Neurocomputing 73 (2010) 2955–2970.
[33] C.L. Huang, M.C. Chen, C.J. Wang, Credit scoring with a data mining approach based on support vector machines, Expert Systems with Applications 33 (4) (2007) 847–856.
[34] C. Hung, J.H. Chen, A selective ensemble based on expected probabilities for bankruptcy prediction, Expert Systems with Applications 36 (3) (2009) 5297–5303.
[35] J. Ilonen, J.K. Kamarainen, J. Lampinen, Differential evolution training algorithm for feed-forward neural networks, Neural Processing Letters 17 (1) (2003) 93–105.
[36] H. Jo, I. Han, H. Lee, Bankruptcy prediction using case-based reasoning, neural networks and discriminant analysis, Expert Systems with Applications 13 (2) (1997) 97–108.
[37] R. Koenker, G.B. Bassett, Regression quantiles, Econometrica 46 (1) (1978) 33–50.
[38] G. Kordas, Smoothed binary regression quantiles, Applied Econometrics 21 (3) (2006) 387–407.
[39] K.C. Lee, I. Han, Y. Kwon, Hybrid neural network models for bankruptcy predictions, Decision Support Systems 18 (1996) 63–72.
[40] H. Li, Y.C. Lee, Y.C. Zhou, J. Sun, The random subspace binary logit (RSBL) model for bankruptcy prediction, Knowledge-Based Systems 24 (8) (2011) 1380–1388.
[41] M.-Y.L. Li, P. Miu, A hybrid bankruptcy prediction model with dynamic loadings on accounting-ratio-based and market-based information: a binary quantile regression approach, Empirical Finance 17 (4) (2010) 818–833.
[42] Y. Li, Y. Liu, J. Zhu, Quantile regression in reproducing kernel Hilbert spaces, American Statistical Association 102 (477) (2007).
[43] F.Y. Lin, S. McClean, A data mining approach to the prediction of corporate failure, Knowledge Based Systems 14 (2001) 189–195.
[44] C. Manski, Maximum score estimation of the stochastic utility model of choice, Econometrics 3 (3) (1975) 205–228.
[45] C. Manski, Semi parametric analysis of discrete response: asymptotic properties of the maximum score estimator, Journal of Econometrics 27 (3) (1985) 313–333.
[46] T.E. McKee, Developing a bankruptcy prediction model via rough set theory,
Intelligent Systems in Accounting, Finance and Management 9 (3) (2000) 159–
quantile regression, Econometrica 62 (2) (1994) 405–458.
173.
[14] M. Buchinsky, The dynamics of changes in the female wage distribution in the
[47] J.H. Min, Y.C. Lee, Bankruptcy prediction using support vector machine (SVM)
USA: a quantile regression approach, Journal of Applied Econometrics 13 (1)
with optimal choice of kernel function parameters, Expert Systems with
(1998) 1–30.
Applications 28 (4) (2005) 603–614.
[15] S. Canbas, B. Caubak, S.B. Kilic, Prediction of commercial bank failure via
[48] S.H. Min, J. Lee, I. Han, Hybrid genetic algorithms and support vector machines for
multivariate statistical analysis of financial structures: the Turkish case,
bankruptcy prediction, Expert Systems with Applications 31 (2006) 652–660.
European Journal of Operational Research 166 (2005) 528–546.
[49] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, K.R. Muller, Fisher dicriminant
[16] A. Chaudhuri, K. De, Fuzzy support vector machine for bankruptcy prediction,
analysis, neural networks for signal processing, in: Proceedings of the 1999
Applied Soft Computing 11 (2) (2011) 2472–2486.
IEEE Signal Processing Society Workshop, vol. IX, 1999, pp. 41–48.
[17] N.J. Chauhan, V. Ravi, D.K. Chandra, Differential evolution trained wavelet
[50] H. Moriguchi, I. Takeuchi, M. Karasuyama, S. Horikawa, Y. Ohta, T. Kodama, H.
neural network: application to bankruptcy prediction in banks, Expert Systems
Naruse, Adaptive kernel quantile regression for anomaly detection, Advanced
with Applications 36 (4) (2009) 7659–7665.
Computational Intelligence and Intelligent Informatics 139 (3) (2009) 30–236.
[18] F.L. Chen, F.C. Li, Combination of feature selection approaches with SVM in
[51] I. Olmeda, E. Fernandez, Hybrid classifiers for financial multi criteria decision
credit scoring, Expert Systems with Applications 37 (7) (2010) 4902–4909.
making: the case of bankruptcy prediction, Computational Economics 10
[19] H.L. Chen, B. Yang, G. wang, J. Liu, X. Xu, S.J. Wang, D.Y. Liu, A novel bankruptcy
(1997) 317–335.
prediction model based on an adaptive fuzzy k-nearest neighbor method,
[52] D.L. Olson, D. Delen, Y. Meng, Comparative analysis of data mining methods for
Knowledge Based Systems 24 (8) (2011) 1348–1359.
bankruptcy prediction, Decision Support Systems 52 (2) (2012) 464–473.
[20] N. Chen, B. Ribeiro, A.S. Vieira, J. Duarte, J.C. Neves, A genetic algorithm-based
[53] C. Pan, W. Chen, Y. Yun, Fault diagnostic method of power transformers based
approach to cost-sensitive bankruptcy prediction, Expert Systems with
on hybrid genetic algorithm evolving wavelet neural network, IET Electric
Applications 38 (10) (2011) 12939–12945.
Power Applications 2 (1) (2008) 71–76.
[21] C.B. Cheng, C.L. Chen, C.J. Fu, Financial distress prediction by a radial basis
[54] P.C. Pendharkar, J.A. Rodger, An empirical study of impact of crossover
function network with logit analysis learning, Computers and Mathematics
operators on the performance of non-binary genetic algorithm based neural
with Applications 51 (3–4) (2006) 579–588.
approaches for classification, Computers and Operations Research 31 (2004)
[22] V. Chernozhukov, L. Umantsev, Conditional value-at-risk: aspects of modeling
481–498.
and estimation, Empirical Economics 26 (1) (2001) 271–292.
[55] C. Pramodh, V. Ravi, Modified great deluge algorithm based auto associative
[23] A. Cielen, L. Peeters, K. Vanhoof, Bankruptcy prediction using a data
neural network for bankruptcy prediction in banks, Computational
envelopment analysis, European Journal of Operational Research 154 (2)
Intelligence Research 3 (4) (2007) 363–370.
(2004) 526–532.
[56] E. Rahimain, S. Singh, T. Thammachote, R. Viramani, Bankruptcy prediction by
[24] R. Cole, J. Gunther, A CAMEL rating’s shelf life, Federal Reserve Bank of Dallas
neural network, in: R.R. Trippi, E. Turban (Eds.), Neural Networks in Finance
Review (December) (1995) 13–20.
and Investing, Irwin Professional Publishing, Burr Ridge, USA, 1996.
[25] T. Conley, D. Galenson, Nativity and wealth in mid-nineteenth-century cities,
[57] N. Raj Kiran, V. Ravi, Software reliability prediction using wavelet neural
Journal of Economic History 58 (2) (1998) 468–493.
networks, in: International Conference on Computational Intelligence and
[26] C. Dimoulas, G. Kalliris, G. Papanikolaou, V. Petridis, A. Kalampakas, Bowel-
Multimedia Applications, Sivakasi, Tamilnadu, India, 2007.
sound pattern analysis using wavelets and neural networks with application
Author's personal copy

56 K.N. Reddy, V. Ravi / Knowledge-Based Systems 39 (2013) 45–56

[58] V. Ravi, H. Kurniawan, Peter Nwee Kok Thai, P. Ravi Kumar, Soft computing [72] J. Taylor, A quantile regression approach to estimating the distribution of multi
system for bank performance prediction, Applied Soft Computing 8 (1) (2008) period returns, Journal of Derivatives 7 (1) (1999) 64–78.
305–315. [73] L.C. Thomas, D.B. Edelman, J.N. Crook, Credit Scoring and Its Applications,
[59] V. Ravi, C. Pramodh, Threshold accepting trained principal component neural SIAM, Philadelphia, 2002.
network and feature subset selection: application to bankruptcy prediction in [74] C.F. Tsai, Feature selection in bankruptcy prediction, Knowledge Based
banks, Applied Soft Computing 8 (4) (2008) 1539–1548. Systems 22 (2) (2009) 120–127.
[60] V. Ravi, P. Ravi Kumar, E. Ravi Srinivas, N.K. Kasabov, A semi-online training [75] A. Tsakonas, G. Dounias, M. Doumpos, C. Zopounidis, Bankruptcy prediction
algorithm for the radial basis function neural networks: applications to with neural logic networks by means of grammar-guided genetic
bankruptcy prediction in banks, in: V. Ravi (Ed.), Advances in Banking programming, Expert Systems with Applications 30 (2006) 449–461.
Technology and Management: Impact of ICT and CRM, Idea Group Inc., USA, [76] M. Vasu, V. Ravi, Bankruptcy Prediction in Banks by Threshold Accepting
2007. Trained Principal Component Analysis and Wavelet Neural Network Hybrid,
[61] P. Ravi Kumar, V. Ravi, Bankruptcy prediction in banks by fuzzy rule based DMIN, Las Vegas, USA, 2011.
classifier, in: Proceedings of First IEEE International Conference on Digital and [77] K. Vinay Kumar, V. Ravi, M. Carr, N. Raj Kiran, Software cost estimation
Information Management, Bangalore, 2006a, pp. 222–227. using wavelet neural networks, Systems and Software 81 (11) (2008)
[62] P. Ravi Kumar, V. Ravi, Bankruptcy prediction in banks by an ensemble 1853–1867.
classifier, in: Proceedings of the IEEE International Conference on Industrial [78] G. Wang, J. Ma, L. Huang, K. Xu, Two credit scoring models based on dual
Technology, Mumbai, 2006b, pp. 2032–2036. strategy ensemble trees, Knowledge Based Systems 26 (2012) 61–68.
[63] P. Ravi Kumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical [79] C.H. Wu, G.H. Tzeng, Y.J. Goo, W.C. Fang, A real-valued genetic algorithm to
and intelligent techniques – a review, European Journal of Operational optimize the parameters of support vector machine for predicting bankruptcy,
Research 180 (1) (2007) 1–28. Expert Systems with Applications 32 (2007) 397–408.
[64] P. Ravisankar, V. Ravi, Failure prediction of banks using threshold accepting [80] Z. Yang, W. You, G. Ji, Using partial least squares and support vector machines
trained kernel principal component neural network, in: World Congress on for bankruptcy prediction, Expert Systems with Applications 38 (7) (2011)
Nature and Biologically Inspired Computing, NABIC, 2009, pp. 7–12. 8336–8342.
[65] R. Rosipal, L.J. Trejo, A. Cichicki, Kernel Principal Component Regression with [81] L. Yu, K.K. Lai, S.Y. Wang, An evolutionary programming based SVM ensemble
EM Approach to Nonlinear Principal Components Extraction, Technical Report, model for corporate failure prediction, Lecture Notes in Computer Science,
University of Paisley, Scotland, UK, 2000. 4432, Springer, Heidelberg, 2007, pp. 262–270.
[66] Y. Sai, C.J. Zhong, L.H. Qu, A hybrid GA-BP model for bankruptcy prediction, in: [82] L. Yu, S. Wang, K.K. Lai, Credit risk assessment with a multistage neural
Proceedings of the International Symposium on Autonomous Decentralized network ensemble learning approach, Expert Systems with Applications 34 (2)
Systems, Sedona, 2007, pp. 473–477. (2008) 1434–1444.
[67] B. Scholkopf, A. Smola, K.R. Muller, Nonlinear component analysis as a kernel [83] Q. Zhang, A. Benvniste, Wavelet networks, IEEE Transactions on Neural
eigenvalue problem, Neural Computation 10 (5) (1998) 1299–1319. Networks 3 (6) (1992) 889–898.
[68] S.W. Shin, S.B. Kilic, Using PCA-based neural network committee model for [84] Q. Zhang, Using wavelet network in nonparametric estimation, IEEE
early warning of bank failure, Lecture Notes in Computer Science, vol. 4221, Transactions on Neural Networks 8 (2) (1997) 227–236.
Springer, Heidelberg, 2006, pp. 289–292. [85] X. Zhang, J. Qi, R. Zhang, M. Liu, Z. Hu, H. Xue, Prediction of programmed-
[69] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for temperature retention values of naphtha’s by wavelet neural networks,
global optimization over continuous spaces, Global Optimization 11 (4) (1997) Computers and Chemistry 25 (2) (2001) 25–133.
341–359. [86] L. Zhou, K.K. Lai, L. Yu, Least squares support vector machines ensemble
[70] J. Sun, H. Li, Listed companies financial distress prediction based on weighted models for credit scoring, Expert Systems with Applications 37 (1) (2010) 127–
majority voting combination of multiple classifiers, Expert Systems with 133.
Applications 35 (2008) 818–827. [87] J.G. Zhou, J.M. Tian, Predicting corporate financial distress based on rough sets
[71] J. Sun, H. Lim, Financial distress prediction using support vector and wavelet support vector machine, in: 2007 International Conference on
machines: ensemble vs. individual, Applied Soft Computing 12 (8) Wavelet Analysis and Pattern Recognition, Beijing, vol. 1–4, 2007, pp. 602–
(2012) 2254–2265. 607.

You might also like