
Knowledge-Based Systems 57 (2014) 41–56

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Predicting financial distress and corporate failure: A review from the state-of-the-art definitions, modeling, sampling, and featuring approaches
Jie Sun ⇑, Hui Li, Qing-Hua Huang, Kai-Yu He
School of Economics and Management, Zhejiang Normal University, Jinhua, Zhejiang 321004, PR China

Article info

Article history:
Received 13 April 2012
Received in revised form 6 December 2013
Accepted 6 December 2013
Available online 13 December 2013

Keywords:
Definition of financial distress
Sampling methods
Featuring methods
Review
Financial distress prediction
Corporate failure prediction
Case-based reasoning
Ensemble
Group decision-making
Support vector machine
Hybrid modeling
Neural network
Decision tree
Logistic regression
Multiple discriminant analysis

Abstract

Financial distress prediction (FDP), also called corporate failure prediction or bankruptcy prediction, plays an important role in decision-making in various areas, including accounting, finance, business, and engineering. Because academic research on FDP has gone on for nearly eighty years, the literature on this topic is abundant and may appear chaotic and confusing to researchers in the field. This paper contributes to the existing reviews by providing a full summary, analysis, and evaluation of the current FDP literature. The literature is reviewed from four aspects: the definition of financial distress in the new century, FDP modeling, sampling approaches for FDP, and featuring approaches for FDP. Taking into account the state-of-the-art techniques in this area, FDP modeling is classified and reviewed in the following groups: modeling with a pure single classifier, modeling with a hybrid single classifier, modeling by ensemble techniques, dynamic FDP modeling, and modeling with group decision-making techniques. Sampling methods for FDP are classified and reviewed in the following paired groups: training sampling and testing sampling, single-industry sampling and cross-industry sampling, and balanced sampling and imbalanced sampling. Featuring methods for FDP are categorized and reviewed as qualitative selection and the combination of qualitative and quantitative selection. We comment on the current research in each category and propose further research topics. This review is a valuable guide for research and application in the area.
© 2013 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +86 579 8229 8567. E-mail address: sunjiehit@gmail.com (J. Sun).
0950-7051/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.knosys.2013.12.006

1. Introduction

The term “Early Warning” originated in the military domain. Nowadays it is widely used in areas such as macroeconomics, business administration, and environmental monitoring, among others. Early warning of financial distress, corporate failure, or bankruptcy is an important research field in corporate finance, and its core is financial distress prediction (FDP), an extensive and ongoing research topic. Generally, FDP predicts whether or not a company will fall into financial distress based on its current financial data, through mathematical, statistical, or intelligent models. It is also called financial failure discrimination, bankruptcy prediction, business failure prediction, or corporate failure prediction, among others. It plays an important role in managerial decision-making for firms, investment decision-making for investors, credit decision-making for creditors, customer credit rating for banks, and so on.

During the last fifty years, numerous studies have focused on this topic and attempted to provide a useful solution to the problem. The literature on this topic may appear chaotic and confusing to researchers in the field. A thoughtful review of related articles helps people understand the research and development in FDP. Thus, this paper attempts to review the state-of-the-art research from a new perspective, one that differs from the several review articles on FDP published in recent years. Balcaen and Ooghe [11] extensively elaborated on the application of classical statistical techniques in FDP under the following categories: single variable analysis, risk index models, multivariate discriminant analysis (MDA), and conditional probability models. Their review contributed to the literature by clarifying the characteristics of the classical statistical techniques in FDP and their related problems. However, intelligent techniques have been used more frequently in FDP than statistical techniques in recent years. Thus, Kumar and Ravi [58] considered both statistical

and intelligent techniques in FDP, and classified FDP articles into the following groups: statistical techniques, neural networks (NN), case-based reasoning (CBR), decision trees (DT), operational research, evolutionary algorithms (EA), rough set (RS) based techniques, other techniques, and soft computing techniques from the hybridization of all the above-mentioned techniques. They also reviewed each article in terms of the source of data, financial variables, country, time line of the data, and performance of techniques. Their review showed that hybrid modeling is a hot topic in FDP. Considering research in this category, Bahrammirzaee [10] conducted a comparative review of three well-known intelligent techniques for financial applications, i.e., artificial neural networks, expert systems, and hybrid intelligent systems. Credit evaluation and financial prediction are two chief problems for the financial market. The review concluded that these intelligent methods were superior to traditional statistical ones in dealing with financial problems, although this superiority is not absolute. Verikas et al. [125] presented a comprehensive review of hybrid and ensemble-based soft computing techniques for FDP. Their focus was on how different techniques were combined. A technique was termed hybrid if several soft computing approaches were applied and only one predictor was used to make the final prediction. A technique was termed an ensemble if the outputs of several predictors were combined to obtain a prediction. Lin et al. [74] reviewed the FDP literature published between 1995 and 2010. They carried out statistical analysis on the FDP literature in terms of single and soft classifiers, baseline classifiers, and datasets, and considered soft classification techniques as the direction for future research on FDP. Marques et al. [78] summarized the application of evolutionary computing to credit scoring, which included classification, variable selection, and parameter optimization. They also reviewed performance evaluation criteria, statistical significance tests, credit databases, and some data preprocessing issues.

Although these six reviews have been published, this review is valuable and significant for the following reasons:

(1) The above six review articles show that hybrid intelligent techniques are potentially useful models for FDP with high predictive performance. Thus, how to construct hybrid models for FDP needs to be analyzed by grouping the current work.
(2) With the development of computing and modeling techniques, some new techniques may also be useful for solving the problem of FDP, e.g., dynamic prediction of financial distress, class imbalance classification techniques, feature selection approaches, and group decision-making techniques, among others. How these newly emerged techniques can be used in FDP needs to be analyzed by summarizing existing research.
(3) None of the existing review articles provides a clear indication on FDP with different sampling techniques and feature selection approaches. This topic needs to be investigated.
(4) Some review articles in the last century discussed the definition of financial distress, which has seldom been a focus of recent reviews. However, the definition of financial distress should be reconsidered, since new concepts and phenomena have entered the area of FDP in the new century. The current reviews need to be supplemented to help researchers understand the area of FDP more thoroughly.

This review aims to fill the gap in the current reviews. We review the current FDP literature from the following four aspects: the definition of financial distress in the new century, FDP modeling, sampling methods for FDP, and featuring approaches for FDP. In detail, FDP modeling articles are classified and reviewed in the following groups: modeling with a pure single classifier, modeling with a hybrid single classifier, modeling by ensemble techniques, dynamic FDP modeling, and modeling with decision implementations. Sampling methods for FDP are classified and reviewed in the following paired groups: training sampling and testing sampling, cross-industry sampling and single-industry sampling, and balanced sampling and imbalanced sampling. Feature selection methods for FDP are categorized and reviewed as qualitative selection and quantitative selection.

This paper is organized as follows. Section 2 overviews and comments on the definition of financial distress in the new century. Section 3 reviews and comments on FDP modeling methods in the categories of pure single classifier methods, hybrid single classifier methods, ensemble methods, dynamic modeling for FDP, and FDP-based decision implementations. Section 4 reviews and comments on sampling methods for FDP. Section 5 presents an overview of and comments on featuring methods for FDP. Section 6 concludes and analyzes some further research topics in FDP.

2. Development of definition of financial distress in the new century

2.1. Development of definition of financial distress

Financial distress is the situation in which an enterprise has certain kinds of financial difficulties. In some classical studies, such financial difficulties include the inability to pay debts or preferred dividends and the corresponding consequences, such as overdraft of bank deposits, liquidation for the interests of creditors, and even entering statutory bankruptcy proceedings [13,3,29]. This definition of financial distress is based on the theoretical framework of the “cash flow” or “liquid assets” model. As Beaver [13] mentioned, an enterprise is like a reservoir formed by cash flow, composed of cash inflows and outflows. An enterprise in financial distress is like a reservoir whose water has been drained.

Carminchael [18] believed that financial difficulty is a situation in which an enterprise encounters frustration in fulfilling its obligations. These frustrations include insufficiency of liquidity, insufficiency of equity, default of debt, and insufficiency of liquid capital. Foster [35] defined financial distress as a serious liquidity problem that cannot be resolved without large-scale restructuring of the operation or structure of the economic entity. In Doumpos and Zopounidis [32], financial distress includes not only the inability to repay important obligatory payments and the corresponding consequences mentioned above, but also the situation of negative net asset value, which means that an enterprise’s total liabilities exceed its total assets from an accounting point of view.

Ross et al. [100] summarized previous studies and concluded that financial difficulties consist of the following four conditions: (1) business failure, that is, a company cannot pay its outstanding debt after liquidation; (2) legal bankruptcy, namely, a company or its creditors apply to the court for a declaration of bankruptcy; (3) technical bankruptcy, namely, a company cannot fulfill the contract on schedule to repay principal and interest; and (4) accounting bankruptcy, namely, a company’s book net assets are negative.

In the 21st century, most FDP studies that collect data from developed countries or areas concentrate on the prediction of bankruptcy, which is the ultimate and most serious form of financial distress [76,106,89,22]. Bose [14] defined financial distress as the condition that a company’s stock price is less than 10 cents, a definition followed by Ravisankar et al. [96]. When studying FDP for Taiwan companies, Lin [72] defined financial distress as the inability of a firm to pay its financial obligations as they mature.

Operationally, a firm is said to have failed when any of the following events has occurred: bankruptcy, bond default, an overdraft of a bank account, events signifying an inability to pay debts as they come due, entrance into a bankruptcy proceeding, an explicit agreement with creditors to reduce debts, or being classified as “full delivery stock” by the Taiwan Stock Exchange or the GreTai Securities Market.

In developing countries such as mainland China and Iran, financial distress is usually defined as a certain degree of financial deterioration specified by the national securities regulatory institution. For example, a Chinese listed company is defined as financially distressed when it receives special treatment (ST) from the Chinese stock exchange because its profits have been negative for two consecutive years or its per-share net assets are lower than the per-share face value of its stock [109,31,110]. Iranian companies whose retained losses exceed 50% of their capital are labeled as financially distressed according to Act 141 of the commercial law of the Tehran Stock Exchange [93]. In addition, Sun et al. [108] proposed the concept of relative financial distress, which is defined as the relative deterioration of the financial situation of an enterprise over the course of its life cycle.

2.2. Comments on definition of financial distress

In summary, there are many different points of view on the definition of financial distress. Different scholars may give different explanations according to their own study purposes. Broadly speaking, there are two main ideas.

From the perspective of theoretical analysis, financial distress has different degrees. Mild financial distress may be just a temporary cash flow difficulty, while serious financial distress is business failure or bankruptcy. An enterprise in financial distress may pass through a dynamic process of various states between these two extreme forms. In fact, financial distress is a dynamic ongoing process, and is the result of continuous abnormality of business operation over a period of time (from months to years or even longer).

From the perspective of empirical research, in order to make the sampling criteria clear, or because of restrictions on data availability, financial distress is often defined as a situation that clearly shows an enterprise’s financial difficulty, such as statutory bankruptcy, or ST for Chinese listed companies. Current studies have considered a single criterion of financial distress rather than its intensity, and future studies need to explore a metric that can classify distressed companies into different degrees such as mild, intermediate, and bankrupt.

3. Review on FDP modeling methods

FDP is the core process of financial distress early warning. Its research progress can be briefly summarized as follows: from single variable analysis to multivariate prediction; from traditional statistical methods to machine learning methods based on artificial intelligence; from pure single classifier methods to hybrid single classifier methods and classifier ensemble methods; from stationary modeling to dynamic modeling that considers the time process; and from quantitative prediction methods to decision implementations.

3.1. Pure single classifier methods for FDP

3.1.1. Statistical single classifier methods

Statistical single classifier methods, including single variable analysis, MDA, and the Logit model, are based on statistical theory. Such statistical methods of FDP are usually simple, easy to use, and time-saving.

Single variable analysis is the earliest method for FDP. In the best-known study of this method, Beaver [13] proposed two single variable analysis methods, namely profile analysis and the univariate discriminant model. Using profile analysis for the five years before failure, he found that the means of financial ratios in the two groups were significantly different, and that the gap became more obvious as the failure time approached. He then used each of five financial ratios as the independent variable to build univariate discriminant models and concluded that the closer the date to the failure, the lower the misjudgment rate and the stronger the predictive power. FDP methods based on single variable analysis use only one financial ratio at a time. When different ratios are used to make FDP for the same company, they may lead to conflicting conclusions.

Altman [3] was the first to use MDA, a multivariate prediction method, for FDP. He constructed the famous Z-score model, a multivariate linear discriminant function with five financial ratios, and found that its predictive power in the year before bankruptcy was significantly better than that of the single variable discriminant model. MDA belongs to linear regression analysis, which assumes that the independent variables follow a multivariate normal distribution with equal covariance matrices. When sample data do not meet these two assumptions, the results of an MDA model may be questionable. In addition, the dependent variable for MDA is assumed to be continuous, which is inconsistent with the fact that the probability of an enterprise falling into financial distress only takes values in [0, 1].

The Logit model uses the logistic function to transform the dependent variable of financial distress probability into a fully continuous one that is then suitable for linear regression analysis. Ohlson [85] was the first to apply the Logit model to describe the relationship between financial situation and financial ratios, and suggested that the Logit model was more reasonable than the MDA model for FDP. Zavgren [134] used factor analysis to obtain the independent variables for the Logit model of FDP. Recently, Tseng and Lin [123] proposed a quadratic interval Logit model for FDP, using a quadratic programming approach to deal with binary response variables in logistic regression. The results indicated that the new Logit model improved the discriminant performance of FDP and provided more information to researchers. On the one hand, the Logit model is applicable to FDP because its non-continuous dependent variable is expressed as a financial distress probability. On the other hand, the Logit model does not rely on the assumptions that the independent variables follow a normal distribution with equal covariance. However, it still requires that the independent variables have no linear functional relationship with each other, that is, no multicollinearity problem.

Recently, Serrano-Cinca and Gutiérrez-Nieto [105] applied partial least squares discriminant analysis (PLS-DA) to predict the 2008–2011 USA banking crisis, and found that the results resembled those of linear MDA and support vector machine (SVM). PLS-DA has the advantage that it is not affected by multicollinearity. Some studies attempted to integrate MDA and the Logit model with other techniques in hybrid or ensemble form. These studies are reviewed in Sections 3.2 and 3.3. For a detailed understanding of FDP with statistical techniques, please refer to Balcaen and Ooghe [11].

3.1.2. Artificial intelligence single classifier methods

Artificial intelligence single classifier methods for FDP include NN, EA, RS, CBR, SVM, etc. They are applied to FDP on the basis of the fruitful research results of computer and artificial intelligence technology. They are not subject to the stringent assumptions required for statistical methods.
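As a minimal sketch of the Logit idea discussed in Section 3.1.1, the following Python example fits a logistic model by plain gradient descent, so that the log-odds ln(p/(1-p)) is a linear function of the financial ratios while the predicted distress probability p stays inside (0, 1). The two ratios, the toy data, the learning rate, and all function names are hypothetical choices made for illustration; they are not taken from Ohlson [85] or any other reviewed study.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score into the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error of the current model on one firm.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def distress_probability(x, w, b):
    """Predicted probability that a firm with ratios x is distressed."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: [return on assets, debt ratio]; label 1 = distressed.
X = [[0.12, 0.30], [0.08, 0.40], [0.10, 0.35], [0.06, 0.45],
     [-0.05, 0.85], [-0.10, 0.90], [-0.02, 0.80], [-0.08, 0.95]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logit(X, y)
p_healthy = distress_probability([0.09, 0.38], w, b)
p_distressed = distress_probability([-0.06, 0.88], w, b)
```

A cut-off such as 0.5 on the returned probability would then turn it into a distressed or non-distressed prediction; the choice of cut-off is itself an assumption of this sketch.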

3.1.2.1. Neural networks. In the early 1990s, NN began to be introduced into research on FDP, e.g., Tam [118] and Tam and Kiang [119]. An NN model consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. A wealth of research has studied NN methods for FDP. The most widely used is the three-layer feed-forward back-propagation NN (BPNN), in which the hidden layer determines the mapping relationships between the input and output layers, and the relationships between neurons are stored as weights of the connecting links [84,23,72]. The probabilistic NN has also been widely used for FDP; it employs Bayesian decision-making theory based on an estimate of the probability density in data space [132,129]. The FDP performance of NN models has often been compared with that of the MDA and Logit models, and most studies provided evidence that NN performed better than the statistical methods [34,17,60,135,89,71,129,93]. The advantages of NN over the statistical methods are often attributed to its strong mapping ability based on the network structure. In addition, the statistical relations among variables need not be considered in the process of NN model construction. This advantage is supported by the conclusion of Lin [72] that NN can achieve higher prediction accuracy if the data do not satisfy the assumptions of the statistical approaches. Chen and Du [23] also found that BPNN performs better than data mining clustering technology for the purpose of FDP.

However, compared to statistical methods, far more sample data are needed to train a relatively stable NN model, and excessive repetition of training easily leads to over-fitting, which decreases the stability of cross-sample prediction. In addition, NN is often criticized by practitioners as difficult to understand, because the complex network structure appears to decision makers as a black box. To overcome this drawback, Baesens et al. [9] captured the learned knowledge embedded in the networks and expressed it as explanatory rules and a decision table that facilitates easy consultation. Setiono et al. [104] proposed a novel approach to train minimal NNs from which concise and comprehensible classification rules can easily be generated for the user. This research direction for NN-based FDP is very important because it makes NN-based models easier to apply to real-world FDP problems.

3.1.2.2. Support vector machines. The support vector machine (SVM) is a relatively new artificial intelligence method based on the structural risk minimization principle rather than the empirical risk minimization principle. SVM is a powerful and promising data classification and function estimation tool [130], which does not easily over-fit even for relatively small samples. Shin et al. [107] and Min and Lee [80] both used SVM to predict bankruptcy for South Korean companies, and concluded that this method outperformed MDA, Logit, and NN. Hui and Sun [49] and Ding et al. [31] each adopted the SVM model in empirical studies on FDP for Chinese listed companies, and reached a similar conclusion. Li and Sun [68] used a straightforward wrapper approach to help the SVM model produce more accurate predictions of business failure, which effectively improved the efficiency of SVM in FDP. Gesel et al. [39] demonstrated that least squares SVM was the most preferred predictive model for FDP compared with Logit, Fisher MDA, and quadratic MDA. Their results also indicated that bankruptcy problems were weakly non-linear. However, Bose and Pal [15] compared the performance of SVM in FDP with that of MDA and NN, and concluded that SVM performed worse than NN. More studies used SVM together with other techniques in hybrid or ensemble form rather than using SVM independently. These studies are reviewed in Sections 3.2 and 3.3.

3.1.2.3. Evolutionary algorithms. EAs are generic population-based meta-heuristic optimization algorithms that use mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Varetto [124] applied the Genetic Algorithm (GA) to extract linear functions without statistical restrictions, together with their corresponding discriminant rules. However, its FDP performance was not as good as that of MDA. Shin and Lee [106] adopted GA to search for split points (thresholds) of financial ratios, and then extracted quantitative discriminant rules for bankruptcy prediction. Kim and Han [55] used GA to mine experts’ qualitative bankruptcy prediction rules, which had the advantage of being easy to understand. However, such rule-based methods can generate FDP results for a company only when at least one rule is activated, and the coverage rate of the rules was relatively low. In addition, Rafiei et al. [93] found that GA had lower FDP accuracy than NN. Etemadi et al. [33] investigated the application of Genetic Programming (GP) to bankruptcy prediction for Iranian listed firms, and the McNemar test showed that GP outperformed MDA. Some other EAs, e.g., Particle Swarm Optimization and Ant Colony Optimization, are potential alternatives for FDP. For example, Martens et al. [75] developed AntMiner+ based on Ant Colony Optimization to produce reasoning rules for FDP, but this tool was inferior to SVM in terms of predictive accuracy. Hence, FDP methods based on EAs can generate rules that are more interpretable than NN or SVM models, but the performance of EAs alone is limited, and they are better suited to being combined with other classification algorithms for FDP.

3.1.2.4. Case-based reasoning. FDP based on CBR generally uses the K-nearest neighbor algorithm, which determines or predicts the target cases’ labels according to the labels of similar cases. The evident advantage of CBR is that it is easy to understand, and its prediction accuracy has also become relatively high with the development of CBR techniques. By comparing CBR, NN, and MDA, Jo and Han [52,53] found that there was no real difference between CBR and MDA, and that CBR performed well when data were insufficient. Park and Han [87] applied an AHP-weighted K-nearest neighbor algorithm to bankruptcy prediction, and significantly improved prediction accuracy by integrating qualitative and quantitative indicators. Sun and Hui [109] proposed a similarity weighted voting CBR method based on gray correlation distance, which performed effective short-term FDP one or two years in advance. In recent years, Li and Sun [62], Li and Sun [63], Li and Sun [64], and Li et al. [61] respectively incorporated principles of the ELECTRE, PROMETHEE, ORESTE, and TOPSIS methods into CBR to generate various CBR methods with reasonable FDP performance. Borrajo et al. [16] constructed a multi-agent system for business control process and failure prediction, whose core agent incorporates a CBR system.

3.1.2.5. Rough set. Dimitras et al. [30] and McKee [76] applied RS theory to establish bankruptcy prediction models on Greek data and US data, respectively. They found that this method had many advantages, e.g., easily understandable decision rules in natural language, case support for decision rules, combination of qualitative and quantitative variables, and no statistical evaluation of probability and fuzzy membership. However, different samples and decision-makers’ knowledge would generate different sets of decision rules. Bose [14] also applied RS to assess the financial health of dot-coms with an acceptable validation accuracy of 72.08%, but RS often resulted in many rules associated with each class, and most of them (90%) were redundant. Therefore, RS has the shortcomings of unfixed structure and poor universality.
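The K-nearest neighbor retrieval and voting that underlie the CBR methods of Section 3.1.2.4 can be sketched as follows. This is an illustrative example only: the Euclidean distance, the similarity weight 1/(1 + distance), the toy case base, and the function name are assumptions, and the sketch does not reproduce the gray correlation measure of Sun and Hui [109] or any other reviewed method.

```python
import math

def predict_by_cbr(case_base, target, k=3):
    """Retrieve the k stored cases nearest to the target and let them vote.

    case_base: list of (feature_vector, label) pairs, label 1 = distressed.
    Each retrieved case votes for its label with weight 1 / (1 + distance),
    so more similar cases count more.
    """
    ranked = sorted(
        ((math.dist(features, target), label) for features, label in case_base),
        key=lambda pair: pair[0],
    )
    votes = {}
    for distance, label in ranked[:k]:
        votes[label] = votes.get(label, 0.0) + 1.0 / (1.0 + distance)
    return max(votes, key=votes.get)

# Hypothetical case base: [return on assets, debt ratio] -> label.
case_base = [
    ([0.12, 0.30], 0), ([0.08, 0.40], 0), ([0.10, 0.35], 0),
    ([-0.05, 0.85], 1), ([-0.10, 0.90], 1), ([-0.02, 0.80], 1),
]

label_a = predict_by_cbr(case_base, [-0.06, 0.88])  # close to the distressed cases
label_b = predict_by_cbr(case_base, [0.09, 0.33])   # close to the healthy cases
```

Because the stored cases themselves act as the model, nothing is trained beforehand in this sketch; appending a new pair to `case_base` immediately affects later predictions.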

3.1.2.6. Decision tree including RSP, CART and See 5.0. Frydman et al. [36] and McKee and Greenstein [77] adopted the recursive partitioning (RSP) method, which creates DTs for FDP. Sun and Li [111,112] proposed a data mining method based on attribute induction, information gain, and DT, and applied it to FDP of Chinese listed companies. Gepp et al. [38] and Li et al. [65] also showed that DT, classification and regression tree (CART), or See 5.0 had better FDP performance than MDA. Chen [24] empirically compared DT with Logit for Taiwan firms, and found that DT achieved higher accuracy than Logit in the short run (less than one year), while Logit performed better in the long run (more than one and a half years). Olson et al. [86] found that C5 and CART DTs obtained better FDP performance than NN and SVM for a particular US bankruptcy data set, but the DT models tended to generate too many rules. They controlled this by adjusting the minimum support parameter, which helped to make a trade-off between average accuracy and DT size.

3.1.2.7. Other techniques. Besides the above major FDP methods, there are some others. For example, Sarkar and Sriram [103], Gesel et al. [40], Sun and Shenoy [116], and Wu [127] applied Bayesian (kernel) classifiers for early warning of business failures. Pendharkar [88], Cielen et al. [27], and Premachandra et al. [92] used data envelopment analysis as a tool for corporate failure prediction. Kwak et al. [59] proposed a multiple criteria linear programming (MCLP) method for FDP with Korean data, and indicated that MCLP performed as well as MDA and Logit and was comparable to DT and SVM. Ryu and Yue [102] developed a new mathematical programming model named Isotonic Separation (IS) for FDP, which outperformed MDA, NN, DT, and RS in a bankruptcy prediction experiment with North American data. IS differs from the well-known statistical survival analysis methods in that it is able to predict a sample’s survival time frame [101]. Chandrasekaran et al. [20] further revised IS to make it applicable to regression problems.

3.1.3. Comments on pure single classifier methods for FDP

Pure single classifier methods are the basis for FDP modeling, and each has its own strengths and weaknesses. Statistical methods are restricted by statistical assumptions, but they yield a fixed model structure across repeated training runs on a given data set. In contrast, artificial intelligence methods are not subject to statistical assumptions and can construct FDP models for more complex data contexts, but their training processes are also much more complex. Table 1 compares the training characteristics of six typical artificial intelligence classification algorithms. Firstly, CBR does not need a training process before implementation of FDP, which greatly differs from the other five algorithms, which need to train models or rules before FDP. Secondly, DT has the advantage that it does not need any parameter to be set before training, but the other five artificial intelligence

sult in different FDP models for certain degree of randomness in the training process.

3.2. Hybrid single classifier methods for FDP

Many researchers have studied hybrid FDP methods based on two or three algorithms, and the most popular forms integrate NN, CBR, or SVM with other techniques. They are categorized into three types, as shown in Fig. 1.

3.2.1. One algorithm is applied to choose features for another classification algorithm

Using the first technique for feature selection and the second for classification is a very common hybrid paradigm; it is a two-stage modeling process. As shown in Fig. 1(1), if the first stage applies a certain evolutionary algorithm to optimize the feature portfolio and uses a classification performance measure as the objective function, the classification modeling is embedded into the feature selection process. Otherwise, feature selection and classification modeling are usually two independent sequential stages.

3.2.1.1. Hybrid NN for FDP. Back et al. [8] and Anandarajan et al. [4] combined NN with GA. They used GA to select input variables for NN and established a GA-NN model to further improve FDP performance. Ravisankar et al. [96] and Ravisankar and Ravi [97] also constructed several hybrids in this paradigm. In the former, each hybrid comprised two of the multilayer feed-forward NN, probabilistic NN, RS, and genetic programming (GP), and the GP-GP hybrid showed the best performance at the 10% significance level. In the latter, features were first selected by the t-statistic, the f-statistic, or the Group Method of Data Handling (GMDH, a relatively unexplored NN), and classifiers were trained by GMDH, counter propagation NN, and fuzzy adaptive resonance theory map. As a result, GMDH-GMDH and t-statistic-GMDH outperformed the others.

3.2.1.2. Hybrid SVM for FDP. Yeh et al. [133] built a two-stage hybrid model that integrates RS theory and SVM, called RST-SVM. Redundant attributes were reduced by RS and then business failure was predicted by SVM, and the model was shown to outperform RST-BPN. Lin et al. [73] trained an SVM FDP model after feature dimensionality reduction through the isometric feature mapping (ISOMAP) algorithm, one of the most developed dimensionality reduction techniques. It produced better performance than the PCA-SVM hybrid.

3.2.1.3. Hybrid CBR for FDP. Ahn and Kim [1] put forward a GA-CBR hybrid bankruptcy prediction method, in which GA was used to optimize feature weighting and instance selection simultaneously. Such a hybrid obtained higher prediction accuracy than conven-
methods need parameter setting before training, which signifi- tional CBR, but it needed much more modeling time and computer
cantly affects the performance of FDP models trained. Third, for resource. Li et al. [66] solely used GA to select feature for CBR and
NN and EA, different times of training on a certain data set may re- applied the hybrid algorithm in FDP. Cho et al. [26] used DT to

Table 1
The comparison of training characteristics among six typical artificial intelligence classification algorithms.

Algorithm   Training models/rules   Needing parameter setting   Fixed model structure for different times of
            before FDP              before training             training based on the same data and parameters
NN          Yes                     Yes                         No
SVM         Yes                     Yes                         Yes
EA          Yes                     Yes                         No
CBR         No                      Yes                         Yes
RS          Yes                     Yes                         Yes
DT          Yes                     No                          Yes
46 J. Sun et al. / Knowledge-Based Systems 57 (2014) 41–56

[Figure 1: three flow diagrams.
(1) Data set -> Feature selection method -> Classification algorithm -> FDP model. One algorithm is applied to choose features for another classification algorithm.
(2) Data set -> Optimization method -> Classification algorithm -> FDP model. One algorithm is applied to optimize parameters for another classification algorithm.
(3) Data set -> New classification algorithm (combining Algorithm 1 and Algorithm 2) -> FDP model. A new classification algorithm is produced by integrating principles of two or more algorithms.]
Fig. 1. Principles of three types of hybrid single classifier methods for FDP.
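
As an illustration of paradigm (1) in Fig. 1, the following minimal sketch selects a feature subset by using a classifier's hold-out accuracy as the objective function. It is not the method of any cited study: an exhaustive subset search stands in for the GA, a nearest-centroid rule stands in for the NN, SVM or CBR classifier, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def holdout_accuracy(train, valid, features):
    """Accuracy on `valid` of a nearest-centroid rule that sees only `features`."""
    def project(x):
        return [x[j] for j in features]

    centroids = {
        label: centroid([project(x) for x, y in train if y == label])
        for label in {y for _, y in train}
    }

    def predict(x):
        px = project(x)
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(px, centroids[c])),
        )

    return sum(predict(x) == y for x, y in valid) / len(valid)


def select_features(train, valid, n_features):
    """Search all non-empty feature subsets; the classifier's accuracy is the objective."""
    best_subset, best_score = None, -1.0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            score = holdout_accuracy(train, valid, subset)
            if score > best_score:  # ties keep the smaller, earlier subset
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
# Label 1 = distressed, label 0 = non-distressed.
train = [([0.1, 5.0], 1), ([0.2, -4.0], 1), ([0.9, 4.0], 0), ([0.8, -5.0], 0)]
valid = [([0.15, -3.0], 1), ([0.3, 4.0], 1), ([0.7, -4.0], 0), ([0.95, 5.0], 0)]

best_subset, best_score = select_features(train, valid, n_features=2)
```

Because the classifier is called inside the search loop, the classification modeling is embedded in the feature selection stage, which is the situation the text describes for evolutionary feature selection. Exhaustive search is only feasible for a handful of features; the cited studies use GA precisely because the number of subsets grows exponentially.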

draw explanatory variables for bankruptcy prediction based on CBR using the Mahalanobis distance, and it outperformed the CBR method using Euclidean distance, Logit, and NN.

3.2.2. One algorithm is applied to optimize parameters for the other classification algorithm
This hybrid paradigm is usually designed for classification algorithms that need parameter setting before model training, especially algorithms such as NN and SVM whose parameter values are critical to the final FDP models. As shown in Fig. 1(2), although parameter optimization and classification modeling are two sequential stages, the classification algorithm must also be used in the parameter optimization stage, in which the evolution methods use classification performance measures as the objective function for optimizing parameters.

3.2.2.1. Hybrid SVM for FDP. Wu et al. [128] integrated real-valued GA (RGA) with SVM, so that the parameters of SVM can be automatically optimized by RGA considering predictive accuracy and generalization ability simultaneously. RGA-SVM showed better experimental results than the pure SVM as well as DA, Logit, Probit, and BPN. Min et al. [81] also proposed a GA-SVM hybrid model for bankruptcy prediction, but they used GA to optimize both feature selection and SVM parameters, which significantly improved the predictive ability of SVM.

3.2.2.2. Hybrid NN for FDP. Chaudhan et al. [21] used differential evolution (DE) to optimize the weights of a wavelet neural network (WNN), with FDP results indicating that DEWNN outperformed the initial WNN and the threshold accepting WNN (TAWNN). Hu [44] developed an ELECTRE-based single-layer perceptron (SLP) approach, and a GA was then designed to determine its connection weights. Its application to FDP demonstrated that it performed better than the traditional SLP, the multi-layer perceptron, and some other single classifier methods. Pendharkar [90] focused on FDP by integrating cost-sensitive learning with GA-NN, using posterior probability for minimizing total misclassification cost as the fitness function of GA, which effectively helped the model improve FDP performance with imbalanced samples.

3.2.3. A new classification algorithm is produced by integrating principles of two or more methods
Unlike the former two hybrid types, which are two-stage integrations, this type of hybrid paradigm is an inner fusion of two algorithms that produces a new classification algorithm (see Fig. 1(3)), which is expected to be more effective for FDP modeling than either original algorithm.
Cheng et al. [25] proposed an FDP method that embeds Logit into a radial basis function NN, in which the logistic function was used as the activation function between the hidden nodes and the output node, and it outperformed both the Logit model and the BPNN. Hua et al. [48] developed a method called the binary discriminant rule for corporate FDP, which modified the outputs of the SVM classifiers according to the result of logistic regression analysis, and its prediction performance was better than that of the pure SVM. Chaudhuri and De [22] integrated SVM with fuzzy membership functions to propose a new classification tool called FSVM, and it was effective in their bankruptcy prediction experiment for large organizations.

3.2.4. Comments on hybrid single classifier methods for FDP
Studies on hybrid single classifier methods for FDP have appeared almost entirely in the 21st century, and began to boom especially in the recent five years. This indicates that hybrid methods for FDP form a new subfield of FDP research that appeals to current researchers. Among the three hybridization forms, the first two paradigms, which are two-stage integrations of two algorithms, are more common than the third paradigm, which needs an inner embedding of two algorithms and much more innovation. Therefore, it is worth exploring how to produce new hybrid classification algorithms that are effective for FDP, and more than one form of hybridization can be further integrated to obtain more effective FDP tools.

3.3. Ensemble methods for FDP

In recent years, ensemble methods have become another fruitful topic in the FDP field. An ensemble system can exploit each base classifier's unique information for classification, and it is expected to produce smaller error variance than each base classifier in stand-alone mode. Though the pioneering work on ensembles was done early by Bates and Granger [12], its application to FDP first emerged in Jo and Han [52] as far as we know and began to rise

only recently after 2005. There are mainly two forms of classifier ensemble: parallel and serial.

3.3.1. Parallel ensemble methods for FDP
In a parallel ensemble system, the classification results of multiple classifiers are combined according to schemes such as majority voting, weighted majority voting, product rule, minimum or maximum, Borda count, simple average, etc. [43]. There are mainly three approaches to generate base-level classifiers [120]: (1) generating classifiers by applying different learning algorithms (with heterogeneous model representations) to a single data set, (2) applying a single learning algorithm to different versions of a given data set with different samples and features, and (3) applying a single learning algorithm with different parameter settings to a single data set.

3.3.1.1. Ensemble with different algorithms. This ensemble approach was used early in FDP. Its overall framework is shown in Fig. 2. It uses several classification algorithms to generate several FDP models with different representations, which are believed to contain complementary classification information. By combining the outputs of these FDP models, the ensemble system obtains a final FDP result that is more accurate and stable than those of the member models.
For example, Jo and Han [52] utilized MDA, CBR and NN to generate base classifiers and integrated their predicted values by a weighted sum scheme to reduce the bankruptcy prediction error. Ravikumar and Ravi [98] developed a set of ensemble classifiers by a simple majority voting scheme based on seven original algorithms. Each time, the two, three, four, five or six classifiers with the highest classification rates and the least Type I error were selected to construct the best ensemble classifier, which outperformed the constituent stand-alone models. Sun and Li [111,112] investigated weighted majority voting combination of multiple diversified classifiers generated by MDA, Logit, NN, DT, SVM and CBR, and obtained higher average accuracy and a lower variation coefficient than any base classifier. Ravi et al. [94] combined a multi-layered feed forward BPNN, a probabilistic NN, a radial basis function NN, SVM, DT, and a fuzzy rule classifier into an ensemble system for FDP, and demonstrated that the ensemble yielded lower Type I and II errors.

3.3.1.2. Ensemble with one algorithm under different samples or features. The framework of an ensemble FDP system with one algorithm under different samples or features is shown in Fig. 3. This ensemble approach tries to produce multiple diverse training datasets from the original one by a sampling or featuring mechanism, so that a classification algorithm can train on them to construct multiple FDP models that have a certain degree of diversity among each other. A combination mechanism is then used to combine the outputs of the different FDP models into the final FDP result. This kind of ensemble method tries to make full use of the different fragments of information contained in the original data set and thus improves the FDP performance for a given classification algorithm.
Bagging and Boosting are the two most popular ensemble methods that apply a single learning algorithm to different versions of a given data set. Bagging and Boosting (including AdaBoost) ensembles based on NN were widely applied in FDP studies such as West et al. [126], Alfaro et al. [2], and Kim and Kang [56]. They had generalization ability superior to the single NN model. Sun et al. [110] constructed AdaBoost ensembles with single attribute test (SAT) and with DT for FDP, and found that AdaBoost-SAT outperformed AdaBoost-DT, single DT and single SVM, indicating that AdaBoost is more suitable for weak learners. Kim and Upneja [57] compared the predictive and discriminatory performances of AdaBoosted DT models with single DT models for publicly traded US restaurants for the period from 1988 to 2010, and the AdaBoosted DT model demonstrated the best prediction performance with the smallest overall and Type I error rates. Besides, Random Subspace is another ensemble method that uses different versions of a given data set to produce base classifiers. Namely, each individual classifier uses only a subset of

[Figure 2: Data set -> six classification algorithms applied in parallel (MDA, Logit, NN, CBR, SVM, DT) -> FDP models 1 to 6 -> Combination mechanism -> FDP result.]

Fig. 2. Framework of ensemble FDP system with different algorithms.
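
The combination mechanism in Fig. 2 can be sketched, for the voting schemes named in Section 3.3.1, as follows. This is a minimal illustration rather than the scheme of any cited study; the function name, the example predictions and the weights (which could be, for instance, validation accuracies) are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine the class labels that several FDP models give for one company.

    With weights=None every model counts equally (simple majority voting);
    otherwise each model's vote is scaled by its weight (weighted majority voting).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs of six base models (e.g. MDA, Logit, NN, CBR, SVM, DT).
base_outputs = [
    "distressed", "non-distressed", "non-distressed",
    "distressed", "non-distressed", "non-distressed",
]

majority = weighted_vote(base_outputs)
# Hypothetical weights that trust the first and fourth models much more.
weighted = weighted_vote(base_outputs, weights=[0.95, 0.4, 0.4, 0.9, 0.4, 0.4])
```

With equal weights the four "non-distressed" votes win; with the illustrative weights the two heavily weighted models outvote the other four, which shows how weighting can change the ensemble's final FDP result.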

[Figure 3: Data set -> Sampling or featuring -> Training sets 1 to n -> Classification algorithm -> FDP models 1 to n -> Combination mechanism -> FDP result.]

Fig. 3. Framework of ensemble FDP system with one algorithm under different samples or features.
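
The "sampling or featuring" step in Fig. 3 can be sketched as below: bootstrap resampling (the sampling step used by Bagging) produces different sample versions of the data set, and random feature subsets (as in Random Subspace) produce different feature versions. This is a simplified sketch; the function names, seeds and toy data are hypothetical, and training a base classifier on each version and combining the outputs is left out.

```python
import random


def bootstrap_sets(data, n_sets, seed=0):
    """Draw n_sets training sets, each sampled with replacement from `data`."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_sets)]


def random_subspaces(n_features, n_sets, subset_size, seed=0):
    """Draw n_sets feature-index subsets, each of size subset_size."""
    rng = random.Random(seed)
    return [
        sorted(rng.sample(range(n_features), subset_size))
        for _ in range(n_sets)
    ]


# Hypothetical toy data: (financial ratios, label), label 1 = distressed.
data = [
    ((0.10, 0.52, 1.8), 1),
    ((0.42, 0.13, 0.9), 0),
    ((0.05, 0.61, 2.2), 1),
    ((0.37, 0.20, 1.1), 0),
]

training_sets = bootstrap_sets(data, n_sets=3, seed=1)
feature_subsets = random_subspaces(n_features=3, n_sets=4, subset_size=2, seed=1)
```

Each element of `training_sets` or `feature_subsets` would feed the same classification algorithm, giving FDP models 1 to n whose outputs are then merged by the combination mechanism.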

all features for training and testing. Nanni and Lumini [82] found that Random Subspace outperformed Bagging, Class Switching, and Rotation Forest. Li et al. [67] integrated the Random Subspace method with Logit for FDP, and improved the FDP performance of Logit models significantly. Li and Sun [69] proposed a principal component CBR ensemble method that produced diverse base models by different features and different algorithms simultaneously to effectively improve the performance of CBR in FDP. The different feature sets were the initial one and those selected by stepwise MDA, stepwise Logit and t-test, respectively. Two CBR algorithms, with Euclidean and Manhattan metrics respectively, were used to construct the CBR classifiers.

3.3.1.3. Ensemble with one algorithm under different parameters. This ensemble method is applicable to classification algorithms that can train different FDP models with different parameter values, e.g. NN and SVM. Because improper parameter settings may lead to very poor model performance, and the number of FDP models trained with optimized parameter values is usually limited, an ensemble system built from one algorithm under different parameters alone usually does not have enough diversity. As a result, it is usually combined with the ensemble under different samples or features, and the framework is shown in Fig. 4.
Tsai and Wu [122] constructed an NN ensemble mainly by using different parameters and also by using different versions of the same training set. However, it did not outperform the single best NN classifier in many cases, which was attributed to a training dataset too small for diversified single classifiers and to the binary classification nature of FDP. Sun and Li [137] constructed an SVM ensemble whose candidate single classifiers were trained by SVM algorithms with different kernel functions and parameters on different feature subsets of one initial dataset, and indicated that it significantly outperformed the individual SVM classifier when the number of base classifiers in the SVM ensemble was properly set.

3.3.2. Serial ensemble methods for FDP
Unlike parallel combination, a serial ensemble system arranges several base classifiers in sequence and selects the result of one base classifier as the final output according to certain principles. However, the methods for producing candidate FDP models in parallel ensemble systems are still suitable for serial ensemble systems. Fig. 5 shows the framework of a serial ensemble FDP system, in which a base model other than the last one has an effective output only when it is triggered by the base model in front of it and its output class has enough credibility. The last base model n in the serial FDP system can be either a single classifier or a parallel ensemble classifier, and is usually one considered to have good overall FDP performance. For example, Hung and Chen [50] proposed a selective ensemble for bankruptcy prediction, which selects some suitable classifiers on the basis of the expected probability of each individual classifier and uses a parallel voting ensemble as the last base model. Sun and Li [113] applied the serial ensemble method to FDP by designing two selection operators for each class's best classifier and the overall best single classifier. However, serial ensemble methods did not show much superiority over the best base classifier for the FDP problem with two categories.

3.3.3. Comments on ensemble methods for FDP
Classifier ensemble, which is also called combination of multiple classifiers, has shown much superiority for FDP. However, current studies have focused almost entirely on the effectiveness and feasibility of different forms of ensemble classifiers for FDP through comprehensive comparisons, and some problems still need to be further explored. For example, how should a subset of base classifiers be selected from the candidates to construct more effective classifier ensembles for FDP? What kind of measure is more suitable for evaluating candidate classifiers considering both predictive ability and diversity? Do combination mechanisms other than voting work better for FDP?

3.4. Dynamic FDP modeling for disposal of financial distress concept drift

How to update the FDP model dynamically as new batches of sample data gradually emerge over time is a real-world problem. The sample data that flow in at intervals may be col-

[Figure 4: Data set -> Sampling or featuring -> Training sets 1 to n. Each training set is used by the classification algorithm with Parameters 1 to m, giving FDP models 1 to m×n -> Combination mechanism -> FDP result.]

Fig. 4. Framework of ensemble FDP system with one algorithm under different parameters and different samples or features simultaneously.

[Figure 5: FDP model 1 -> FDP model 2 -> FDP model 3 -> ... -> FDP model n arranged in sequence; the FDP result is either Distress or Non-distress.]

Fig. 5. Framework of serial ensemble FDP system.
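
One possible reading of the serial framework in Fig. 5 is sketched below: each base model except the last is consulted in turn and its class is accepted only if its credibility reaches a threshold, while the last model always answers. The threshold value, the `debt_ratio` input and the three rule-based models are hypothetical stand-ins, not the selection operators of the cited studies.

```python
def serial_predict(models, x, threshold=0.8):
    """Return the class of the first sufficiently credible base model.

    `models` is an ordered list of callables returning (label, credibility).
    If no earlier model is credible enough, the last model decides.
    """
    for model in models[:-1]:
        label, credibility = model(x)
        if credibility >= threshold:
            return label
    label, _ = models[-1](x)
    return label


# Hypothetical base models: simple rules on a single illustrative ratio.
def model_1(x):
    if x["debt_ratio"] > 0.9:
        return "distress", 0.95
    return "non-distress", 0.6


def model_2(x):
    if x["debt_ratio"] < 0.3:
        return "non-distress", 0.9
    return "distress", 0.55


def model_n(x):
    label = "distress" if x["debt_ratio"] > 0.6 else "non-distress"
    return label, 0.5


models = [model_1, model_2, model_n]
results = [
    serial_predict(models, {"debt_ratio": r}) for r in (0.95, 0.2, 0.7, 0.5)
]
```

In this sketch the first two cases are settled by model 1 and model 2 respectively, and the last two fall through to model n; in practice the last position could hold a parallel ensemble, as the text notes.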



lected in two ways: one from different companies and the other from different time points of a certain company. Dynamic FDP modeling in the former case is considered lateral and that in the latter case is considered longitudinal. Both lateral and longitudinal dynamic FDP modeling try to construct a model updating mechanism that keeps the FDP model effective over time. The core idea of dynamic FDP modeling is shown in Fig. 6, and the key problem is how to design the FDP updating mechanism.

3.4.1. Lateral dynamic modeling for FDP
Sun and Li [115] first explored financial distress concept drift (FDCD), which may behave in two ways, real and virtual, and attempted to treat the virtual one by dynamic FDP modeling using five instance selection methods, namely full memory windowing, no memory windowing, windowing with fixed size, windowing with adaptable size, and batch selection. They indicated that virtual FDCD existed and that the dynamic FDP models (particularly windowing with fixed size and batch selection) outperformed static ones. Sun et al. [138] further proposed a new FDP method for FDCD called adaptive and dynamic ensemble (ADE) of SVM (ADE-SVM). It was divided into three steps: incremental construction of candidate SVMs, adaptive selection of base SVMs for the current financial distress concept, and dynamic combination of multiple SVMs. ADE-SVM was shown to have better FDP performance than the above instance selection methods.

3.4.2. Longitudinal dynamic modeling for FDP
Sun et al. [108] proposed a dynamic FDP method named SFFS-PC-NN, optimized by GA, with an individual enterprise's longitudinal data streams, so that a company's ex-post financial situation evaluation was combined with its ex-ante FDP. In the process of longitudinal dynamic FDP modeling, the label for a company's financial condition at a certain time point may vary with the drift of relative FDCD, which makes it more effective and suitable for a specific company's financial risk management.

3.4.3. Comments on dynamic FDP modeling methods
Both lateral and longitudinal dynamic FDP modeling methods use incremental sample data to update FDP models over time, trying to solve the FDCD problem. Since the external and internal environment of an enterprise changes constantly, the study of dynamic FDP modeling is particularly important for keeping FDP models adaptable. However, this research topic has only just begun to attract attention in the FDP area, and how to handle the problem of FDCD by proposing more creative and effective dynamic FDP modeling approaches that consider the factor of time remains to be further studied.

3.5. FDP based on decision implementations and comments

3.5.1. FDP based on decision implementations
Existing quantitative FDP methods took almost only financial ratios into consideration and paid little attention to the important role of experts' experiential knowledge and non-financial information. Sun and Li [114] brought forward a group decision making method for financial distress early warning by designing a qualitative attribute system and a mechanism for integrating multiple experts' opinions for financial distress diagnosis.
Some studies applied social network analysis to study the links and effects of one company's failure on other related companies. When the customers of a sound firm are financially distressed, this firm gets into financial difficulties with a probability ranging from 3.4% to 11.3%, depending on the level of collection costs and the degree of diversification [6]. Such bankruptcy propagation among suppliers and customers in supply chains connected by trade credit can be upward, downward, or both, which is determined by whether payments are delayed and whether firms can instantly adapt their resources to the input received [7]. In a credit network model that includes firms and banks, the bankruptcy of a borrower may lead to the lender's bankruptcy or to an increase of the interest rate offered by the surviving lender, which further causes more borrowers already on the verge of bankruptcy to reach a tipping point and default [41]. Thus an avalanche of bankruptcies may ensue. Xu et al. [131] verified that supply chain coordination such as information sharing and vendor-managed inventory is effective in reducing the risk of bankruptcy. Furthermore, Hua et al. [47] studied bankruptcy propagation in a supply chain, and found that operational interactions between firms and operational decisions made by firms in a supply chain were important causes of such bankruptcy propagation. Meanwhile, the impacts of these operational issues and decisions depended on financial decisions.

3.5.2. Comments
The problem of FDP can be regarded as a group decision-making problem, and the process of FDP should integrate the knowledge of various experts related to the firm, e.g., managers, stockholders, and governmental officials. Therefore, group decision-making methods for FDP should gain more attention and become a necessary supplement to quantitative FDP techniques. Meanwhile, investigating FDP from the view of credit chain and supply chain management is valuable, because one firm's failure may result in the failure of an entire chain. In other words, identifying the key actors whose distress can affect the others will further help the analysis of predictors of FDP, and if some factors related to credit chain and supply chain operations are considered as input variables for FDP, the effectiveness of FDP may be further improved.

[Figure 6: Data batches 1, 2, ..., n, n+1 arrive over time. At time t, FDP modeling based on candidate data batches from 1 to n gives FDP model t, which produces the FDP result for time t+1. At time t+1, the FDP model updating mechanism applies FDP modeling based on candidate data batches from 1 to n+1, giving FDP model t+1, which produces the FDP result for time t+2.]

Fig. 6. The core thought of dynamic FDP modeling.
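
Three of the instance selection methods named in Section 3.4.1 can be sketched as one way to realize the updating mechanism in Fig. 6. The definitions used here follow the usual meaning of these terms in the concept drift literature (all batches, the latest batch only, the latest few batches) and are an assumption about, not a reproduction of, the cited procedure; windowing with adaptable size and batch selection are omitted, and the function name and toy batches are hypothetical.

```python
def select_training_samples(batches, method, window_size=2):
    """Choose which past data batches are used to retrain the FDP model.

    batches     : list of batches in arrival order, each a list of samples
    method      : "full_memory", "no_memory" or "fixed_window"
    window_size : number of most recent batches kept by "fixed_window"
    """
    if method == "full_memory":
        selected = batches                    # keep every batch seen so far
    elif method == "no_memory":
        selected = batches[-1:]               # keep only the newest batch
    elif method == "fixed_window":
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        selected = batches[-window_size:]     # keep the newest window_size batches
    else:
        raise ValueError(f"unknown method: {method}")
    return [sample for batch in selected for sample in batch]


# Hypothetical batches of company identifiers arriving at successive times.
batches = [["a1", "a2"], ["b1"], ["c1", "c2"], ["d1"]]

full = select_training_samples(batches, "full_memory")
latest = select_training_samples(batches, "no_memory")
window = select_training_samples(batches, "fixed_window", window_size=2)
```

Each time a new batch arrives, the selected samples would be used to retrain the model, so that FDP model t+1 replaces FDP model t.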



4. Sampling and featuring methods for FDP

4.1. Sampling methods for FDP

4.1.1. Training and testing sampling
Experimental samples of each class are often divided randomly into training and testing samples. Training data are used to establish the FDP model based on statistical or artificial intelligence techniques. Testing data are used to test the model's generalization ability and robustness. Many studies on FDP ([3,77,42,103,54,51], etc.) used accuracies on testing data to measure the efficiency of FDP models, because hold-out testing ensures the independence of training data and testing data. However, the randomness of sample division in hold-out testing may result in stochastic FDP performance, making the conclusion less persuasive. Therefore, some studies repeated hold-out testing thirty or more times to obtain thirty or more testing accuracies for a given FDP approach, so that statistical mean comparison and deviation analysis can be carried out for more persuasive conclusions [112,110,70].
If the whole sample is randomly divided into K folds, and each time one fold is used for testing and the remaining (K - 1) folds are used for training, the K-fold cross-validation accuracy can be computed after K iterations. The K-fold cross-validation procedure is preferred to the hold-out procedure, especially for small sample sizes, because it is a more objective performance measure [36]. When K equals the total number of samples, it becomes unbiased leave-one-out validation, which needs much more computation and is only suitable for less complicated approaches with a relatively small data set [109]. The K-fold cross-validation can also be repeated more than thirty times for statistical analysis.

4.1.2. Cross-industry sampling and single industry sampling
To ensure sample size, most FDP studies used cross-industry samples, in which each distressed company was matched with a healthy one from the same industry but different pairs may belong to different industries. Some studies also considered the impact of industry on FDP modeling. Platt and Platt [91] showed that industry adjustment of financial ratios improved the effectiveness of FDP modeling, and that a model trained on one industry was often ineffective for FDP in other industries. Grice and Ingram [42] studied the sensitivity of Altman's model, and concluded that the model trained with only manufacturing samples had significantly higher overall accuracy than the model trained with both manufacturing and non-manufacturing samples.
With the development of research databases, some studies began to focus on FDP modeling for a specific industry such as manufacturing [135], retail [45,46], hotel [70], restaurant [57], and dotcom [19,96]. Since different industries face different levels of competition, the likelihood of financial distress can differ for firms in different industries [57]. FDP modeling based on single industry sampling can help construct more efficient FDP models for a specific industry.

4.1.3. Balanced sampling and imbalanced sampling
Most FDP studies used balanced samples [3,60,52,53,87,107,112,115]. However, Zmijewski [136] showed that if the ratio of distressed to non-distressed samples deviated evidently from the real-world population, it would distort the model's prediction capability. Specifically, the proportion of distressed samples is negatively related to the type I error rate (recognizing the distressed as non-distressed), while it is positively related to the type II error rate (recognizing the non-distressed as distressed). That is, if the proportion of distressed samples is higher than that in the real-world population, it will lead to underestimation of the type I error and overestimation of the type II error, and vice versa.
Later on, some FDP studies applied imbalanced samples, also called proportional samples, in which the proportion of distressed samples should be closer to that in the real-world population. Namely, the FDP experiments should use fewer distressed and more non-distressed companies. Table 2 shows the sample information of some representative FDP studies based on imbalanced datasets [77,42,103,54,51]. For example, McKee and Greenstein [77] used five groups of imbalanced training and testing datasets, and the proportion of distress to non-distress in the testing samples reflects the real-world proportion. The proportions in some studies, however, are much lower than 1:10 [51,31,45,46]. Besides, Gepp et al. [38] used an imbalanced dataset consisting of 58 failed and 142 non-failed samples and carried out fivefold cross-validation for the DT FDP method.
The above studies did not take treatment measures for the imbalanced training data sets, but tuned the models' cut-offs to ensure the prediction performance for skewed testing data sets. Such an approach has the limitation that the cut-off must be determined before prediction, which is a difficult task in the real world, and the prediction performance will be low if the cut-off is not set properly. Li and Sun [70] used an over-sampling method to generate new minority samples to make the training dataset balanced, and found that the FDP model constructed on the corrected balanced training data set significantly outperformed the FDP model trained on the original imbalanced data set. In fact, there are several sampling methods that can be used to treat the problem of imbalanced datasets. Crone and Finlay [28] applied both over-sampling and under-sampling methods to balance the

Table 2
Representative FDP literatures based on imbalanced datasets.

Authors (Year)               Training sample               Testing sample
                             Distressed   Non-distressed   Distressed   Non-distressed
McKee and Greenstein [77]    54           288              12           1154
                             66           336              16           2683
                             59           240              19           2468
                             72           292              19           2643
                             76           309              18           3396
Grice and Ingram [42]        148          824              148          854
Sarkar and Sriram [103]      118          793              25           203
Jones and Hensher [54]       194 (a)      2838             229 (b)      4980
Hwang et al. [51]            50% of 79    50% of 156       50% of 79    50% of 156
Ding et al. [31]             28           97               28           97
(a) 78 solvency difficulty and 116 failure samples.
(b) 119 solvency difficulty and 110 failure samples.
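
The type I and type II error rates defined in Section 4.1.3 can be computed as in the sketch below. The class sizes reuse the testing sample of Sarkar and Sriram [103] from Table 2 (25 distressed, 203 non-distressed), but the predictions are hypothetical: a degenerate model that labels every company non-distressed. The function name and label strings are illustrative.

```python
def error_rates(actual, predicted, positive="distressed"):
    """Accuracy plus type I and type II error rates for a two-class FDP result.

    Type I error : a distressed company recognized as non-distressed.
    Type II error: a non-distressed company recognized as distressed.
    """
    pairs = list(zip(actual, predicted))
    n_pos = sum(a == positive for a, _ in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in `actual`")
    missed = sum(a == positive and p != positive for a, p in pairs)
    false_alarms = sum(a != positive and p == positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "type_I": missed / n_pos,
        "type_II": false_alarms / n_neg,
    }


# Class sizes from Table 2; the all-negative predictions are hypothetical.
actual = ["distressed"] * 25 + ["non-distressed"] * 203
predicted = ["non-distressed"] * len(actual)

rates = error_rates(actual, predicted)
```

Here the accuracy is 203/228, roughly 89%, although every distressed company is missed (type I error rate of 1.0), which illustrates why accuracy alone can mislead on imbalanced datasets.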

original imbalanced credit scoring datasets. They provided evidence that over-sampling significantly increases the accuracy relative to under-sampling across all algorithms, and that Logit is less sensitive to the balancing preprocessing than MDA and CART DT.

4.2. Comments on sampling methods for FDP

Training and testing sampling concerns how to train and assess FDP models with the available samples. It provides a way to simulate the real-world FDP process and produces statistical results for estimating the performance of FDP models.

A cross-industry data set is usually applied for FDP modeling when a single-industry data set is too small, especially when the number of distressed samples is too small. However, the knowledge of FDP differs among industries, which have different financial characteristics. Therefore, a single-industry dataset is preferred when the number of samples of interest is not too small.

Real-world FDP involves imbalanced datasets. Solutions for imbalanced data sets at the data level include over-sampling, under-sampling, and a hybrid of the two. Traditional FDP research used a pairing strategy to balance datasets, which essentially belongs to over-sampling. Imbalanced problems can also be addressed at the model level, by adjusting the predictive model through a threshold parameter or cost-sensitive learning. For FDP with imbalanced datasets, assessing models only by predictive accuracy may be misleading. For example, a model reaches an accuracy of 99% by predicting all samples as non-distressed if the dataset consists of 99% non-distressed samples and 1% distressed samples. Hence, FDP research based on imbalanced datasets should consider the Type I and Type II error rates and other measures such as sensitivity, specificity, and F-measure.

5. Featuring methods for FDP

Financial experts have found that some financial or non-financial indicators change abnormally before financial distress occurs. Therefore, financial distress can be predicted by observing the characteristics of certain feature indicators. Theorists and practitioners are struggling to establish an effective feature set for FDP.

5.1. Feature selection in classic statistical-technique-based literatures of FDP

Beaver [13] selected the initial financial indicators by four criteria:

(1) Popularity, namely frequent appearance in the previous literature.
(2) Apriority, namely having performed well in one of the previous studies.
(3) Cash-flow concept, namely including some cash-flow ratios.
(4) Independence, namely excluding any indicator that can be transformed from the selected indicators.

By requiring one of the first three criteria plus the fourth, he chose 30 financial ratios from six areas for single-variable analysis: cash-flow ratios, net-income ratios, debt to total asset ratios, liquid asset to total asset ratios, liquid asset to current debt ratios, and turnover ratios. He found that the cash-flow to total-debt ratio had the strongest ability to predict failure, followed by the net-income to total-assets ratio, total debt to total assets, working capital to total assets, current ratio, and no-credit interval.

Altman [3] built the initial feature set by choosing twenty-two potentially helpful financial ratios according to two principles:

(1) Popularity in the literature.
(2) Potential relevancy to the study by his own professional judgment.

To establish the final variable profile, the following four procedures were used:

(1) Observation of the statistical significance of various alternative functions, including determination of the relative contributions of each independent variable.
(2) Evaluation of inter-correlations between the relevant variables.
(3) Observation of the predictive accuracy of the various profiles.
(4) Judgment of the analyst.

Five variables were finally selected: working capital/total assets, retained earnings/total assets, earnings before interest and taxes/total assets, market value equity/book value of total debt, and sales/total assets.

Following the principle of simplicity, Ohlson [85] selected nine financial ratios by including the five financial ratios frequently present in the previous literature and adding asset size, two dummy variables representing whether net income is negative or positive and whether total liabilities exceed total assets, and a variable measuring the trend of net income. He found that company size was an important predictor of financial distress, followed by the financial structure.

Rose and Giroux [99] compared the means of one hundred and thirty indicators that had not appeared previously between forty-six pairs of failed and non-failed companies. They found that thirty-four indicators differed significantly. After combining these indicators with twenty-seven other indicators commonly used in previous studies, they carried out stepwise MDA to establish a final discriminant equation with eighteen variables, in order to improve the prediction accuracy. Hence, innovative use of new indicators for new models may improve the prediction ability.

5.2. Qualitative selection of FDP features

On one hand, qualitative selection means that the features for FDP modeling are selected subjectively by researchers according to their analysis of different indicators. For instance, Kiviluoto [139] subjectively selected the following four financial indicators: operating margin, net income before depreciation and extraordinary items, net income before depreciation and extraordinary items of the previous year, and equity ratio. Kim and Han [55] extracted expert qualitative decision rules from six qualitative attributes: industry risk, management risk, financial flexibility, credibility, competitiveness, and operating risk. Sun and Li [114] put forward seven first-level qualitative attributes for early warning of financial distress based on group decision making.

On the other hand, qualitative selection also means directly using the indicators that were used in the previous literature [132,76,77], or subjectively and selectively adding or deleting some indicators based on them. For example, Cielen et al. [27] directly selected eleven financial indicators from Foster [35] and other previous research. Pendharkar [89] directly chose five financial indicators from Altman’s Z-score model and Zeta score model. Zhang et al. [135] combined the current ratio with the five variables in Altman’s Z-score model. More recently, Wu et al. [128] selected nineteen financial ratios organized into four groups, and Chen [24] selected thirty-seven variables covering six major types, both based on prior FDP research. Bose [14], in contrast, first identified fifteen popular financial ratios used in the literature and then added nine others to capture the novelty of dot-com companies.

5.3. Quantitative selection of FDP features

We consider a feature selection method quantitative as long as quantitative approaches are employed. Thus, this section also includes the combined use of qualitative and quantitative feature selection.

5.3.1. Quantitative feature selection

Starting from the initial features subjectively or qualitatively selected by researchers, quantitative statistical methods, artificial intelligence methods, or other methods are used to streamline the features. For instance, Jo and Han [52], Jo and Han [53], Park and Han [87], Shin and Lee [106], and Ding et al. [31] selected variables by the statistical methods of stepwise regression or t-test. Back et al. [8] selected variables for MDA and Logit models by stepwise regression, and selected variables for NN by integrating GA. Galvao et al. [37] used GA to optimize the feature set for FDP with the criteria of greatest information content and minimum multi-collinearity.

Atiya [5] developed two feature systems: one based on financial ratios alone and another based on financial ratios and stock-price-based indicators. The best five or six inputs were obtained from a pool of about 120 candidate inputs by an initial prescreening procedure based on the prediction accuracy of individual indicators and the correlation matrix, followed by a cross-validation procedure. The five indicators were: book value/total assets, cash flow/total assets, rate of change of cash flow per share, gross operating income/total assets, and return on assets. The six indicators were: book value/total assets, cash flow/total assets, price/cash flow ratio, rate of change of stock price, rate of change of cash flow per share, and stock price volatility. Li and Sun [68] integrated a straightforward wrapper with SVM for FDP. A forward feature selection method was used to implement the wrapper. The performance of SVM on each feature was used to rank indicators, which is similar to the operation of Atiya [5]. They also investigated the feasibility of using linear SVMs as the assessment function in feature selection. The result indicated that the RBF non-linear SVM with features selected by linear SVM performed significantly better than all the other SVMs. Sung et al. [117] employed a data mining approach to FDP and revealed that the major features in predicting bankruptcy under normal conditions were cash flow to total assets and productivity of capital. Meanwhile, cash flow to liabilities, productivity of capital, and fixed assets to stockholders equity and long-term liabilities were effective in predicting financial distress under crisis conditions. Their findings also indicated that the effective FDP models under crisis and normal economic conditions were different, which was also supported by the finding of Li et al. [61].

In addition, Leshno and Spector [60] streamlined seventy initial financial parameters and ratios by a multi-step exclusion procedure with the following four steps:

(1) Include all variables in Altman’s Z-score model.
(2) Retain only one variable from each pair of variables with a correlation coefficient of 0.7 or over.
(3) From each pair of highly correlated variables, exclude the variable that has many missing values.
(4) From each pair of highly correlated variables with the same number of missing values, exclude the variable that is intuitively identified as less relevant to FDP.

5.3.2. Quantitative feature extraction

Factor analysis (FA) and principal component analysis (PCA) are also applied to feature extraction for FDP [134,24,110,69]. Unlike the t-test or stepwise regression, which select efficient variables from the initial ones, FA and PCA reduce dimensions by extracting factors or principal components, which are new variables computed from the initial data. Tsai [121] compared five well-known feature selection methods for FDP, namely t-test, correlation matrix, stepwise regression, PCA, and FA, and found that on average the t-test was superior to the others and stepwise regression ranked second. This indicates that the t-test is not only a simple feature selection method but also an efficient one. Ravi and Pramodh [95] improved the performance of TAWNN in FDP by using features extracted by PCA. The result indicated that the PCTANN outperformed the compared techniques.

Other examples of quantitative featuring methods for FDP include the references [79,1,96,97,133,73] discussed in Section 3.2.1, where one algorithm is applied to choose features for another classification algorithm to generate a hybrid FDP method.

5.4. Comments on featuring methods for FDP

5.4.1. Advantage and disadvantage of qualitative and quantitative featuring methods

Both qualitative and quantitative featuring methods have their own advantages and disadvantages. Purely subjective construction of a feature system is usually appropriate for the exploratory stage of a certain direction of FDP research, when no similar literature can be referred to. A feature system constructed by this type of qualitative method is strongly subjective and sometimes lacks persuasiveness. Qualitative selection based on previous research chooses representative indicators used by previous studies, so the soundness of the feature set is supported by their results. It still has a certain degree of subjectivity when indicators are selectively added or deleted to adapt to different backgrounds, and its applicability to other research backgrounds is limited.

Quantitative feature selection is less affected by subjectivity, and it can often produce a feature set for FDP with greater information content and improve modeling efficiency. It is the mainstream method in current FDP research. However, it may also produce an efficient final feature set that is hard to explain or understand, and it is often unsuitable for constructing the purely qualitative attribute system that specific FDP methods such as group decision making require.

5.4.2. Classifying featuring methods of FDP into filter and wrapper

Generally, there are two types of featuring methods: filter and wrapper. Filter approaches use human experiential expertise or statistical information of the dataset to carry out feature selection and are independent of the base classifier. Wrapper approaches, on the other hand, use classification accuracies or criteria derived from the classifier to rank the discriminative power of all (or part) of the possible feature subsets and select the subset that produces the best performance [83].

Therefore, qualitative selection of FDP features belongs to the filter method, because features are chosen without quantitative analysis before a certain classification/prediction algorithm is used to construct the FDP model. However, quantitative featuring methods may belong to either filter or wrapper. For example, feature selection/extraction methods like t-test, correlation matrix, PCA
and FA are all filter approaches. Stepwise regression, such as stepwise MDA or stepwise Logit, is a wrapper when MDA or Logit is also used as the classification tool. However, it is a filter method when the variables it selects are used as input to other classification algorithms. For a hybrid FDP method in which one algorithm is applied to choose features for another classification algorithm, the feature selection is a filter if feature selection and model construction are two independent stages, and a wrapper when feature selection and model construction are integrated and a performance measure derived from the model is used to determine the result of feature selection.

6. Conclusions and future research topics in FDP

FDP is a hot exploratory issue in accounting, finance, business, and engineering. It is gradually forming its own theoretical system as an independent subject. It has important practical significance for improving awareness of financial risk, preventing corporate financial distress, and avoiding bankruptcy liquidation. This review classified research on this subject into four categories: definition of the financial distress concept, FDP modeling methods, sampling methods, and featuring methods. Although abundant articles have been published in this field, some valuable topics still need further exploration and study. The main issues that need to be further investigated are summarized as follows:

6.1. Developing simple but accurate predictive models

The current research objective of FDP is to find more accurate predictive models, since even a small reduction in the predictive error rate can help people reduce business losses significantly. To achieve this objective, researchers have tried to integrate several state-of-the-art techniques into one accurate model. However, reviewing the results of this pursuit of more accurate FDP models, we find that many very complex models have been developed. In searching for more accurate predictive FDP models, we may neglect the fact that FDP is a practice-oriented subject. A reduction in the predictive error rate is not significant or valuable unless the model is applied to forecasting financial distress in the real world. However, the complexity of models prevents them from being adopted by industrial organizations and users. Since the use of FDP as a managerial tool relates to the financial income of an organization, users commonly will not apply a technique in their work unless they clearly understand its inner mechanism. Thus, they need simple FDP models that are easy to interpret, explain, and understand. Simple models are not necessarily inaccurate. Gepp et al. [38] provided some evidence that simpler models are able to produce better predictive power.

6.2. Developing dynamic modeling methods for FDP

In real-world FDP practice, a critical problem is how to account for the existence of FDCD and update FDP models effectively when they become outdated over time. The current literature on FDP focuses almost entirely on algorithmic exploration under the assumption that the data set for FDP modeling is already constructed, and the dynamic variation of modeling data with the gradual emergence of new financial distress samples is almost neglected. Although some pilot work on this topic has already been done, more effective methods for treating concept drift and for dynamic FDP modeling over time need further attention and in-depth research.

6.3. Developing FDP models based on decision implementations

Most current research on FDP has been carried out from the perspective of quantitative prediction, and far less attention has been paid to the importance of expertise and non-financial information. Generally speaking, before the outbreak of enterprise financial distress, not only do quantitative financial indicators show abnormal signs, but there also exist many hidden management perils that cannot be quantified and that are often the source of the financial distress. Therefore, it is necessary to break the traditional concentration on quantitative FDP models based on financial indicators and to study further how to use expertise and non-financial information to predict financial distress. As an important branch of operational research and management science, group decision-making methods have been applied to solve many managerial problems and help improve management efficiency. However, their application to FDP has not received enough attention. In fact, financial distress early warning based on group decision-making methods can provide an important supplement to quantitative FDP models and help managers find the reasons for enterprise financial distress, which is the only basis on which effective managerial measures can be taken to prevent further deterioration of the financial condition. Besides, more non-financial indicators should be considered in group decision-making FDP methods as well as in traditional quantitative FDP models. The mechanism of avoiding financial distress through enterprise operation also needs further investigation. Competition between firms takes place between entire supply chains. The failure of one firm in a supply chain may force the other firms in the chain to fail. Thus, how to avoid financial distress from the view of supply chain management needs to be further investigated.

6.4. Developing imbalanced predictive models for FDP

Real-world FDP involves imbalanced datasets. The traditional use of a pairing strategy to produce balanced data sets is one method for handling imbalanced samples, namely the use of human expertise to balance the real-world imbalanced data sets. With the development of imbalanced data processing, more techniques have proved useful in FDP modeling. These techniques chiefly include data-level solutions and model-level solutions for modeling with imbalanced samples. The cost-sensitive learning technique is also a potential alternative for FDP imbalanced modeling. This topic needs further investigation, since it fits the real-world characteristics of FDP.

6.5. Developing financial distress early warning system

Previous research has not formed an overall framework for a financial distress early warning system, resulting in a situation that is full of concrete algorithmic research but lacks a framework design for a financial distress early warning information system. This is not only unfavorable to the formation and improvement of financial distress early warning systems, but also limits the application of existing specific FDP methods to enterprise practice. Therefore, it is necessary to study the logic system of financial distress early warning.

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (Nos. 71371171 and 71171179), the Humanity and Social Science Foundation of Ministry of Education of China (No. 13YJC630140), the Zhejiang Provincial Natural Science Foundation (LY13G010001 and LR13G010001), the Zhejiang
Provincial Philosophy and Social Science Foundation – Zhijiang Young Talent of Social Science (11ZJQN082YB), and the Zhejiang Provincial Qianjiang Talent Foundation of China. The authors gratefully thank anonymous referees for their useful comments and editors for their work.

References

[1] H. Ahn, K. Kim, Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithms approach, Appl. Soft Comput. 9 (2009) 599–607.
[2] E. Alfaro, N. García, M. Gámez, D. Elizondo, Bankruptcy forecasting: an empirical comparison of AdaBoost and neural networks, Decis. Support Syst. 45 (2008) 110–122.
[3] E.I. Altman, Financial ratios discriminant analysis and the prediction of corporate bankruptcy, J. Finance 23 (1968) 589–609.
[4] M. Anandarajan, P. Lee, A. Anandarajan, Bankruptcy prediction of financially stressed firms: An examination of the predictive accuracy of artificial neural networks, Intell. Syst. Account. Finan. Manage. 10 (2001) 69–81.
[5] A. Atiya, Bankruptcy prediction for credit risk using neural networks: a survey and new results, IEEE Trans. Neural Networks 12 (4) (2001) 929–935.
[6] F. Boissay, Credit Chains and the Propagation of Financial Distress, European Central Bank Working Paper. 573, 2006. <http://papers.ssrn.com/sol3/papers.cfm?abstract_id=872543>.
[7] S. Battiston, D.D. Gatti, M. Gallegati, B. Greenwald, J.E. Stiglitz, Credit chains and bankruptcy propagation in production networks, J. Econ. Dyn. Control 31 (2007) 2061–2084.
[8] B. Back, T. Laitinen, K. Sere, Neural networks and genetic algorithms for bankruptcy predictions, Expert Syst. Appl. 11 (1996) 407–413.
[9] B. Baesens, R. Setiono, C. Mues, J. Vanthienen, Using neural network rule extraction and decision tables for credit-risk evaluation, Manage. Sci. 49 (3) (2003) 312–329.
[10] A. Bahrammirzaee, A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert systems and hybrid intelligent systems, Neural Comput. Appl. 19 (8) (2010) 1165–1195.
[11] S. Balcaen, H. Ooghe, 35 Years of studies on business failure: an overview of the classical statistical methodologies and their related problems, British Account. Rev. 38 (2006) 63–93.
[12] J.M. Bates, C.W.J. Granger, The combination of forecasts, Oper. Res. Quart. 20 (1969) 451–68.
[13] W. Beaver, Financial ratios as predictors of failure, J. Account. Res. 4 (1966) 71–111.
[14] I. Bose, Deciding the financial health of dot-coms using rough sets, Inform. Manage. 43 (2006) 835–846.
[15] I. Bose, R. Pal, Predicting the survival or failure of click-and-mortar corporations: a knowledge discovery approach, Eur. J. Oper. Res. 174 (2006) 959–982.
[16] M. Borrajo, B. Baruque, E. Corchado, J. Bajo, J. Corchado, Hybrid neural intelligent system to predict business failure in small-to-medium-size enterprises, Int. J. Neural Syst. 21 (4) (2011) 277–296.
[17] S.-C. Carlos, Self organizing neural networks for financial diagnosis, Decis. Support Syst. 17 (1996) 227–238.
[18] D.R. Carminchael, The auditor’s reporting obligation. Auditing Res. Monogr. (1) (New York: AICPA) (1972) 94–94.
[19] D.K. Chandra, V. Ravi, I. Bose, Failure prediction of dotcom companies using hybrid intelligent techniques, Expert Syst. Appl. 36 (2009) 4830–4837.
[20] R. Chandrasekaran, Y. Ryu, V. Jacob, S. Hong, Isotonic separation, INFORMS J. Comput. 17 (4) (2005) 462–474.
[21] N. Chaudhan, V. Ravi, D. Chandra, Differential evolution trained wavelet neural networks: application to bankruptcy prediction in banks, Expert Syst. Appl. 36 (4) (2009) 7659–7665.
[22] A. Chaudhuri, K. De, Fuzzy support vector machine for bankruptcy prediction, Appl. Soft Comput. 11 (2011) 2472–2486.
[23] W.-S. Chen, Y.-K. Du, Using neural networks and data mining techniques for the financial distress prediction model, Expert Syst. Appl. 36 (2009) 4075–4086.
[24] M.-Y. Chen, Predicting corporate financial distress based on integration of decision tree classification and logistic regression, Expert Syst. Appl. 38 (2011) 11261-1127.
[25] C.-B. Cheng, C.-L. Chen, C.-J. Fu, Financial distress prediction by a radial basis function network with logit analysis learning, Comput. Math. Appl. 51 (2006) 579–588.
[26] S. Cho, H. Hong, B.-C. Ha, A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: for bankruptcy prediction, Expert Syst. Appl. 37 (2010) 3482–3488.
[27] A. Cielen, P. Ludo, K. Vanhoof, Bankruptcy prediction using a data envelopment analysis, Eur. J. Oper. Res. 154 (2004) 526–532.
[28] S. Crone, S. Finlay, Instance sampling in credit scoring: an empirical study of sample size and balancing, Int. J. Forecast. 28 (1) (2012) 224–238.
[29] E.B. Deakin, A discriminant analysis of prediction of business failure, J. Account. Res. 3 (spring) (1972) 167–169.
[30] A.I. Dimitras, R. Slowinski, R. Susmaga, Business failure prediction using rough sets, Eur. J. Oper. Res. 114 (1999) 263–280.
[31] Y. Ding, X. Song, Y. Zeng, Forecasting financial condition of Chinese listed companies based on support vector machine, Expert Syst. Appl. 34 (2008) 3081–3089.
[32] M. Doumpos, C. Zopounidis, A multinational discrimination method for the prediction of financial distress: the case of Greece, Multinatl. Finan. J. 3 (1999) 71–101.
[33] H. Etemadi, A. Rostamy, H. Dehkordi, A genetic programming model for bankruptcy prediction: empirical evidence from Iran, Expert Syst. Appl. 36 (2) (2009) 3199–3207.
[34] D. Fletcher, E. Goss, Forecasting with neural networks: an application using bankruptcy data, Inform. Manage. 24 (1993) 159–167.
[35] G. Foster, Financial Statement Analysis, 2nd ed., Prentice Hall, NJ, 1986.
[36] H. Frydman, E.I. Altman, D. Kao, Introducing recursive partitioning for financial classification: the case of financial distress, J. Finance 40 (1985) 269–291.
[37] R.K.H. Galvao, V.M. Becerra, M. Abou-Seada, Variable selection for financial distress classification using a genetic algorithm, in: Proceedings of the 2002 Congress on Evolutionary Computation, 2000–2005, 2002.
[38] A. Gepp, K. Kumar, S. Bhattacharya, Business failure prediction using decision trees, J. Forecasting 29 (6) (2010) 536–555.
[39] T. Gesel, B. Baesens, D. Martens, From linear to non-linear kernel based classifiers for bankruptcy prediction, Neurocomputing 73 (16–18) (2010) 2955–2970.
[40] T. Gesel, B. Baesens, J. Suykens, D. Poel, D. Baestaens, M. Willekens, Bayesian kernel based classification for financial distress detection, Eur. J. Oper. Res. 172 (2006) 979–1003.
[41] D.D. Gatti, M. Gallegati, B. Greenwald, A. Russo, J.E. Stiglitz, The financial accelerator in an evolving credit network, J. Econ. Dyn. Control 34 (2010) 1627–1650.
[42] J.S. Grice, R.W. Ingram, Tests of the generalizability of Altman’s bankruptcy prediction model, J. Bus. Res. 54 (2001) 53–61.
[43] S. Hou, K. Ramani, Classifier combination for sketch-based 3D part retrieval, Comput. Graphic 31 (2007) 598–609.
[44] Y. Hu, Bankruptcy prediction using ELECTRE-based single-layer perceptron, Neurocomputing 72 (2009) 3150–3157.
[45] Y. Hu, J. Ansell, Measuring retail company performance using credit scoring techniques, Eur. J. Oper. Res. 183 (3) (2007) 1595–1606.
[46] Y. Hu, J. Ansell, Retail default prediction by using sequential minimal optimization technique, J. Forecasting 28 (8) (2009) 651–666.
[47] Z. Hua, Y. Sun, X. Xu, Operational causes of bankruptcy propagation in supply chain, Decis. Support Syst. 51 (3) (2011) 671–681.
[48] Z. Hua, Y. Wang, X. Xu, B. Zhang, L. Liang, Predicting corporate financial distress based on integration of support vector machine and logistic regression, Expert Syst. Appl. 33 (2007) 434–440.
[49] X.-F. Hui, J. Sun, An application of support vector machine to companies’ financial distress prediction, Lect. Notes Artif. Int. 3885 (2006) 274–282.
[50] C. Hung, J. Chen, A selective ensemble based on expected probabilities for bankruptcy prediction, Expert Syst. Appl. 36 (2009) 5297–5303.
[51] R. Hwang, K. Cheng, J. Lee, A semi-parametric method for predicting bankruptcy, J. Forecasting 26 (5) (2007) 317–342.
[52] H. Jo, I. Han, Integration of case-based forecasting, neural network, and discriminant analysis for bankruptcy prediction, Expert Syst. Appl. 11 (1996) 415–422.
[53] H. Jo, I. Han, Bankruptcy prediction using case-based reasoning, neural networks, and discriminant analysis, Expert Syst. Appl. 13 (1997) 97–108.
[54] S. Jones, D. Hensher, Predicting firm financial distress: a mixed logit model, Account. Rev. 79 (4) (2004) 1011–1038.
[55] M.-J. Kim, I. Han, The discovery of experts’ decision rules from qualitative bankruptcy data using genetic algorithms, Expert Syst. Appl. 25 (2003) 637–646.
[56] M. Kim, D. Kang, Ensemble with neural networks for bankruptcy prediction, Expert Syst. Appl. 37 (2010) 3373–3379.
[57] S.Y. Kim, A. Upneja, Predicting restaurant financial distress using decision tree and AdaBoosted decision tree models, Econ. Model. 36 (2014) 354–362.
[58] P. Kumar, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques – a review, Eur. J. Oper. Res. 180 (1) (2007) 1–28.
[59] W. Kwak, Y. Shi, G. Kou, Bankruptcy prediction for Korean firms after the 1997 financial crisis: using a multiple criteria linear programming data mining approach, Rev. Quant. Finance Account. (2011).
[60] M. Leshno, Y. Spector, Neural network prediction analysis: the bankruptcy case, Neurocomputing 10 (1996) 125–147.
[61] H. Li, H. Adeli, J. Sun, J. Han, Hybridizing principles of TOPSIS with case-based reasoning for business failure prediction, Comput. Oper. Res. 38 (2) (2011) 409–419.
[62] H. Li, J. Sun, Ranking-order case-based reasoning for financial distress prediction, Knowl.-Based Syst. 21 (8) (2008) 868–878.
[63] H. Li, J. Sun, Hybridizing principles of the Electre method with case-based reasoning for data mining: Electre-CBR-I and Electre-CBR-II, Eur. J. Oper. Res. 197 (1) (2009) 214–224.
[64] H. Li, J. Sun, Business failure prediction using hybrid2 case-based reasoning, Comput. Oper. Res. 37 (1) (2010) 137–151.
[65] H. Li, J. Sun, J. Wu, Predicting business failure using classification and regression tree: an empirical comparison with popular classical statistical methods and top classification mining methods, Expert Syst. Appl. 37 (8) (2010) 5895–5904.
[66] H. Li, H. Huang, J. Sun, C. Lin, On sensitivity of case-based reasoning to optimal feature subsets in business failure prediction, Expert Syst. Appl. 37 (7) (2010) 4811–4821.
[67] H. Li, Y. Lee, Y. Zhou, J. Sun, The random subspace binary logit (RSBL) model for bankruptcy prediction, Knowl.-Based Syst. 24 (8) (2011) 1380–1388.
[68] H. Li, J. Sun, Predicting business failure using support vector machines with straightforward wrapper: a re-sampling study, Expert Syst. Appl. 38 (10) (2011) 12747–12756.
[69] H. Li, J. Sun, Principal component case-based reasoning ensemble for business failure prediction, Inform. Manage. 48 (6) (2011) 220–227.
[70] H. Li, J. Sun, Forecasting business failure: the use of nearest-neighbour support vectors and correcting imbalanced samples: evidence from the Chinese hotel industry, Tourism Manage. (2012).
[71] L. Liang, D. Wu, An application of pattern recognition on scoring Chinese corporations financial conditions based on backpropagation neural network, Comput. Oper. Res. 32 (5) (2005) 1115–1129.
[72] T.-H. Lin, A cross model study of corporate financial distress prediction in Taiwan: multiple discriminant analysis, logit, probit and neural networks models, Neurocomputing 72 (2009) 3507–3516.
[73] F. Lin, C.-C. Yeh, M.-Y. Lee, The use of hybrid manifold learning and support vector machines in the prediction of business failure, Knowl.-Based Syst. 24 (2011) 95–101.
[74] W.-Y. Lin, Y.-H. Hu, C.-F. Tsai, Machine learning in financial crisis prediction: a survey, IEEE Trans. Syst. Man Cybern.–Part C: Appl. Rev. 42 (4) (2012) 421–436.
[75] D. Martens, T. Gestel, M. Backer, R. Haesen, J. Vanthienen, B. Baesens, Credit rating prediction using ant colony optimization, J. Oper. Res. Soc. 61 (2010) 561–573.
[76] T.E. McKee, Developing a bankruptcy prediction model via rough sets theory, Intell. Syst. Account. Finan. Manage. 9 (2000) 159–173.
[77] T.E. McKee, M. Greenstein, Predicting bankruptcy using recursive partitioning and a realistically proportioned data set, J. Forecasting 19 (2000) 219–230.
[78] A.I. Marques, V. Garcia, J.S. Sanchez, A literature review on the application of evolutionary computing to credit scoring, J. Oper. Res. Soc. 64 (12) (2013) 1384–1399.
[79] T.E. McKee, T. Lensberg, Genetic programming and rough sets: a hybrid approach to bankruptcy classification, Eur. J. Oper. Res. 138 (2002) 436–451.
[80] J.H. Min, Y.-C. Lee, Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters, Expert Syst. Appl. 28 (2005) 128–134.
[81] S.-H. Min, J. Lee, I. Han, Hybrid genetic algorithms and support vector machines for bankruptcy prediction, Expert Syst. Appl. 31 (2006) 652–660.
[82] L. Nanni, A. Lumini, An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring, Expert Syst. Appl. 36 (2009) 3028–3033.
[83] W. Ng, D. Yeung, M. Firth, et al., Feature selection using localized generalization error for supervised classification problems using RBFNN, Pattern Recogn. 41 (2008) 3706–3719.
[84] M. Odom, R. Sharda, A neural networks model for bankruptcy prediction, Proc. IEEE Int. Conf. Neural Network 2 (1990) 163–168.
[85] J.A. Ohlson, Financial ratios and probabilistic prediction of bankruptcy, J. Account. Res. 18 (1980) 109–131.
[86] D. Olson, D. Delen, Y. Meng, Comparative analysis of data mining methods for bankruptcy prediction, Decis. Support Syst. 52 (2) (2012) 464–473.
[87] C.-S. Park, I. Han, A case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction, Expert Syst. Appl. 23 (2002) 255–264.
[88] P.C. Pendharkar, A potential use of data envelopment analysis for the inverse classification problem, Omega 30 (2002) 243–248.
[89] P.C. Pendharkar, A threshold varying artificial neural network approach for classification and its application to bankruptcy prediction problem, Comput. Oper. Res. 32 (2005) 2561–2582.
[90] P.C. Pendharkar, Misclassification cost minimizing fitness functions for genetic algorithm-based artificial neural network classifiers, J. Oper. Res. Soc. 60 (2009) 1123–1134.
[91] H.D. Platt, M.B. Platt, A note on the use of industry-relative ratios in bankruptcy prediction, Banking Finance 15 (1991) 1183–1194.
[92] I.M. Premachandra, Y. Chen, J. Watson, DEA as a tool for predicting corporate failure and success: a case of bankruptcy assessment, Omega 39 (6) (2011) 620–626.
[93] F.M. Rafiei, S.M. Manzari, S. Bostanian, Financial health prediction models using artificial neural networks, genetic algorithm and multivariate discriminant analysis: Iranian evidence, Expert Syst. Appl. 38 (2011) 10210–10217.
[94] V. Ravi, H. Kurniawan, P. Thai, P. Kumar, Soft computing system for bank performance prediction, Appl. Soft Comput. 8 (1) (2008) 305–315.
[95] V. Ravi, C. Pramodh, Threshold accepting trained principal component neural network and feature subset selection: application to bankruptcy prediction in banks, Appl. Soft Comput. 8 (4) (2008) 1539–1548.
[96] P. Ravisankar, V. Ravi, I. Bose, Failure prediction of dotcom companies using neural network-genetic programming hybrids, Inf. Sci. 180 (2010) 1257–1267.
[97] P. Ravisankar, V. Ravi, Financial distress prediction in banks using Group Method of Data Handling neural network, counter propagation neural network and fuzzy ARTMAP, Knowl.-Based Syst. 23 (2010) 823–831.
[98] P. Ravikumar, V. Ravi, Bankruptcy prediction in banks by an ensemble classifier, IEEE Int. Conf. Ind. Technol. 12 (2006) 2032–2036.
[99] P.S. Rose, G.A. Giroux, Predicting corporate bankruptcy: an analytical and empirical evaluation, Rev. Bus. Econ. Res. 19 (1984) 1–12.
[100] S.A. Ross, R.W. Westerfield, J.F. Jaffe, Corporate finance, second ed., Homewood IL, 1999.
[101] Y. Ryu, R. Chandrasekaran, V. Jacob, Prognosis using an isotonic prediction technique, Manage. Sci. 50 (6) (2004) 777–785.
[102] Y. Ryu, W. Yue, Firm bankruptcy prediction: experimental comparison of isotonic separation and other classification approaches, IEEE Trans. Syst. Man Cybern.–Part A: Syst. Humans 21 (4) (2005) 265–276.
[103] S. Sarkar, R.S. Sriram, Bayesian models for early warning of bank failures, Manage. Sci. 47 (11) (2001) 1457–1475.
[104] R. Setiono, B. Baesens, C. Mues, Rule extraction from minimal neural networks for credit card screening, Int. J. Neural Syst. 21 (4) (2011) 265–276.
[105] C. Serrano-Cinca, B. Gutiérrez-Nieto, Partial least square discriminant analysis for bankruptcy prediction, Decis. Support Syst. 54 (3) (2013) 1245–1255.
[106] K.-S. Shin, Y.-J. Lee, A genetic algorithm application in bankruptcy prediction modeling, Expert Syst. Appl. 23 (2002) 321–328.
[107] K.-S. Shin, T.S. Lee, H.-J. Kim, An application of support vector machines in bankruptcy prediction model, Expert Syst. Appl. 28 (2005) 127–135.
[108] J. Sun, K. He, H. Li, SFFS-PC-NN optimized by genetic algorithm for dynamic prediction of financial distress with longitudinal data streams, Knowl.-Based Syst. 24 (2011) 1013–1023.
[109] J. Sun, X.-F. Hui, Financial distress prediction based on similarity weighted voting CBR, Lect. Notes Artif. Int. 4093 (2006) 947–958.
[110] J. Sun, M. Jia, H. Li, AdaBoost ensemble for financial distress prediction: An empirical comparison with data from Chinese listed companies, Expert Syst. Appl. 38 (2011) 9305–9312.
[111] J. Sun, H. Li, Data mining method for listed companies’ financial distress prediction, Knowl.-Based Syst. 21 (2008) 1–5.
[112] J. Sun, H. Li, Listed companies’ financial distress prediction based on weighted majority voting combination of multiple classifiers, Expert Syst. Appl. 35 (2008) 818–827.
[113] J. Sun, H. Li, Financial distress prediction based on serial combination of multiple classifiers, Expert Syst. Appl. 36 (2009) 8659–8666.
[114] J. Sun, H. Li, Financial distress early warning based on group decision making, Comput. Oper. Res. 36 (2009) 885–906.
[115] J. Sun, H. Li, Dynamic financial distress prediction using instance selection for the disposal of concept drift, Expert Syst. Appl. 38 (2011) 2566–2576.
[116] L. Sun, P. Shenoy, Using Bayesian networks for bankruptcy prediction: Some methodological issues, Eur. J. Oper. Res. 180 (2007) 738–753.
[117] T. Sung, N. Chang, G. Lee, Dynamics of modeling in data mining: interpretive approach to bankruptcy prediction, J. Manage. Inform. Syst. 16 (1) (1999) 63–86.
[118] K. Tam, Neural network models and the prediction of bank bankruptcy, Omega 19 (5) (1991) 429–445.
[119] K. Tam, M. Kiang, Managerial applications of neural networks: the case of bank failure prediction, Manage. Sci. 38 (7) (1992) 926–947.
[120] L. Todorovski, S. Dzeroski, Combining classifiers with meta decision trees, Mach. Learn. 50 (2003) 223–249.
[121] C.-F. Tsai, Feature selection in bankruptcy prediction, Knowl.-Based Syst. 22 (2009) 120–127.
[122] C.-F. Tsai, J. Wu, Using neural network ensembles for bankruptcy prediction and credit scoring, Expert Syst. Appl. 34 (2008) 2639–2649.
[123] F. Tseng, L. Lin, A quadratic interval logit model for forecasting bankruptcy, Omega 33 (1) (2005) 85–91.
[124] F. Varetto, Genetic algorithms applications in the analysis of insolvency risk, J. Bank. Finance 22 (1998) 1421–1439.
[125] A. Verikas, Z. Kalsyte, M. Bacauskiene, A. Gelzinis, Hybrid and ensemble-based soft computing techniques in bankruptcy prediction: a survey, Soft. Comput. 14 (9) (2010) 995–1010.
[126] D. West, S. Dellana, J. Qian, Neural network ensemble strategies for financial decision applications, Comput. Oper. Res. 32 (2005) 2543–2559.
[127] W. Wu, Improving classification accuracy and causal knowledge for better credit decisions, Int. J. Neural Syst. 21 (4) (2011) 297–309.
[128] C.-H. Wu, G.-H. Tzeng, Y.-J. Goo, W.-C. Fang, A real-valued genetic algorithm to optimize the parameters of support vector machine for predicting bankruptcy, Expert Syst. Appl. 32 (2007) 397–408.
[129] D. Wu, L. Liang, Z. Yang, Analyzing the financial distress of Chinese public companies using probabilistic neural networks and multivariate discriminate analysis, Socio-Econ. Plan. Sci. 42 (3) (2008) 206–220.
[130] Y. Wang, S. Wang, K.K. Lai, A new fuzzy support vector machine to evaluate credit risk, IEEE Trans. Fuzzy Syst. 13 (6) (2005) 820–831.
[131] X. Xu, Y. Sun, Z. Hua, Reducing the probability of bankruptcy through supply chain coordination, IEEE Trans. Syst. Man Cybern.–Part C: Appl. Rev. 44 (2010) 67–74.
[132] Z.R. Yang, M.B. Platt, H.D. Platt, Probabilistic neural networks in bankruptcy prediction, J. Bus. Res. 44 (1999) 67–74.
[133] C.-C. Yeh, D.-J. Chi, M.-F. Hsu, A hybrid approach of DEA, rough set and support vector machines for business failure prediction, Expert Syst. Appl. 37 (2010) 1535–1541.
[134] C.V. Zavgren, Assessing the vulnerability to failure of American industrial firm: a logistic analysis, J. Bus. Finance Account. 12 (1985) 19–45.

[135] G. Zhang, M.Y. Hu, B.E. Patuwo, D.C. Indro, Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis, Eur. J. Oper. Res. 116 (1999) 16–32.
[136] M.E. Zmijewski, Methodological issues related to the estimation of financial distress prediction models, J. Account. Res. 22 (1984) 59–82.
[137] J. Sun, H. Li, Financial distress prediction using support vector machines: Ensemble vs. individual, Appl. Soft Comput. 12 (8) (2012) 2254–2265.
[138] J. Sun, H. Li, H. Adeli, Concept drift-oriented adaptive and dynamic support vector machine ensemble with time window in corporate financial risk prediction, IEEE Trans. Syst. Man Cybern: Syst. 43 (4) (2013) 801–813.
[139] K. Kiviluoto, Predicting Bankruptcies with the Self-Organizing Map, Neurocomput. 21 (1998) 191–201.
