Keywords: Financial distress prediction; AdaBoost ensemble; Single attribute test; Decision tree; Support vector machine

Abstract

Due to the important role of financial distress prediction (FDP) for enterprises, it is crucial to improve the accuracy of FDP models. In recent years, classifier ensembles have shown a promising advantage over single classifiers, but the study of classifier ensemble methods for FDP is still not comprehensive and remains to be further explored. This paper constructs AdaBoost ensembles respectively with single attribute test (SAT) and decision tree (DT) for FDP, and empirically compares them with single DT and support vector machine (SVM) classifiers. After designing the framework of the AdaBoost ensemble method for FDP, the article describes the AdaBoost algorithm as well as the SAT and DT algorithms in detail, followed by the combination mechanism of multiple classifiers. On an initial sample of 692 Chinese listed companies and 41 financial ratios, 30 rounds of holdout experiments are carried out for FDP one year, two years, and three years in advance. In terms of experimental results, the AdaBoost ensemble with SAT outperforms the AdaBoost ensemble with DT, the single DT classifier and the single SVM classifier. In conclusion, the choice of weak learner is crucial to the performance of an AdaBoost ensemble, and the AdaBoost ensemble with SAT is more suitable for FDP of Chinese listed companies.
© 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.01.042
used machine learning methods for FDP, and many studies (Carlos, 1996; Fletcher & Goss, 1993; Odom & Sharda, 1990; Pendharkar, 2005; Zhang, Hu, Patuwo, & Indro, 1999) concluded that it outperforms traditional statistical methods. NN has the advantage of strong nonlinear mapping ability, but its black-box property makes the learned knowledge difficult for corporate managers to understand.

Developed on the basis of statistical learning theory, support vector machine (SVM) is a relatively new machine learning technique (Kim, Kim, & Lee, 2002). SVM was applied to bankruptcy prediction by Shin, Lee, and Kim (2005) and Min and Lee (2005) with Korean data and by Hui and Sun (2006) with data of Chinese listed companies. They all used the radial basis function (RBF) as SVM's kernel function, and supported the conclusion that SVM outperforms MDA, Logit and NN in FDP. Instead of empirical risk minimization, SVM uses the principle of structural risk minimization, which largely prevents SVM from overfitting. Besides, the problem of local optima is also avoided by the SVM algorithm, because its training is a convex optimization problem whose local optimal solution is also the global optimal solution. Finally, SVM can provide good generalization ability and stable classification performance on relatively small samples. From this point of view, SVM is superior to NN, because NN easily overfits when the sample is not large enough. Recently, Ding, Song, and Zen (2008) and Boyacioglu, Kara, and Baykan (2009) further investigated SVM-based FDP, and affirmed that SVM can serve as a promising FDP model.

Considering the possible limitations of a single classifier, more and more researchers began to pay attention to FDP based on multiple classifier combination, or classifier ensemble. It is expected to reduce the variance of the estimated error and improve the whole recognition performance (Kim, Min, & Han, 2006; Kim et al., 2002; Ruta & Gabrys, 2005). To construct an effective multiple classifier system, diversity is essential, which means the base classifiers to be combined should be different. By now, several methods to produce such diversity have been proposed.

Firstly, diverse base classifiers can be generated by applying different learning algorithms (with heterogeneous model representations) to a single data set. Jo and Han (1996) integrated CBR, NN, and MDA to predict bankruptcy, and concluded that the combined model is superior to each independent one. Sun and Li (2008) proposed an FDP method by weighted majority voting combination of MDA, Logit, DT, NN, SVM, and CBR, and Cho, Kim, and Bae (2009) introduced an integration strategy with subject weights based on NN to combine MDA, Logit, NN and DT for bankruptcy prediction. They concluded that FDP based on the combination of multiple classifiers is superior to a single classifier in terms of accuracy rate or stability to some extent. Li and Sun (2009) put forward a multiple CBR system by majority voting, which inherits the ability of producing maximum accuracy generated by its components, improves the ability of producing minimum accuracy, and achieves more stability. Sun and Li (2009) and Hung and Chen (2009) respectively studied FDP ensemble methods based on classifier selection with different inner structures. The former called it serial combination of multiple classifiers and compared it with candidate single classifiers; it was concluded that, for the two-category FDP problem, serial combination does not show much superiority over the best base classifier. The latter called it selective ensemble and compared it with stacking ensemble by voting and weighting, concluding that the selective ensemble performs better than the stacking ensemble.

Besides, Tsai and Wu (2008) used an NN ensemble for bankruptcy prediction, whose diversified base classifiers were constructed on different data sets from three countries. Their experimental results showed that the NN ensemble did not outperform a single best NN classifier, based on which they considered that the proposed multiple classifier system may not be suitable for a binary classification problem such as bankruptcy prediction.

Alternatively, different training data sets can be produced by selecting from the same initial training set according to a certain mechanism, and then used to generate diverse classifiers through the same learning algorithm. Two popular methods for creating such ensembles are bagging and boosting. Alfaro, Gámez, and García (2007) have shown that AdaBoost, one of the popularly used boosting algorithms, decreases the generalization error and improves the accuracy in its application to FDP. Alfaro, García, Gámez, and Elizondo (2008) carried out an empirical comparison for FDP and showed that AdaBoost with DT outperforms NN in both the cross-validation and the test-set estimates of the classification error. Kim and Kang (2010) established bagging and AdaBoost ensembles with NN and compared them with a single NN classifier. It is indicated that bagged and boosted NN ensembles consistently improve predictive accuracy.

3. Contribution of this paper

Former researches on AdaBoost ensemble for FDP used DT or NN as the weak learner, and were both compared with a single NN classifier. As the simplest classification method for FDP, univariate analysis, also called single attribute test (SAT) in this paper, needs lower computational cost than almost all other classification methods. In addition, and more importantly, SAT itself has real weak learning ability, which produces moderately accurate but not so strong classifiers. As a universal principle, of two things with the same function, the simpler is the better. However, to the best of our knowledge, no literature has provided evidence on whether AdaBoost ensemble with SAT is more suitable for FDP than AdaBoost ensemble with DT. For this reason, this paper constructs AdaBoost ensembles respectively with SAT and DT for FDP, and empirically compares them with single DT and SVM classifiers. The reason why SVM, instead of NN, is chosen for the purpose of comparison is that SVM has been proved to be a prominent single classifier for FDP by many studies, and it is superior to NN in terms of generalization ability, especially when the sample is not so large. Therefore, this study contributes further insight into FDP methods based on AdaBoost ensemble, particularly for the situation where FDP needs to be made with limited samples. It is also believed that such empirical results can provide a useful guideline for the practice of FDP.

4. Methodology

4.1. Framework of AdaBoost ensemble method for FDP

As a commonly used technique for constructing ensemble classifiers, boosting tries to construct a classifier ensemble by developing one classifier at a time incrementally. This means each classifier that joins the ensemble is trained on a data set selectively sampled from a training data set by gradually increasing the likelihood of "difficult" data points at each step. AdaBoost, proposed by Freund and Schapire (1997), is the most well-known boosting method. This paper studies the AdaBoost ensemble method for FDP, whose framework is designed as in Fig. 1. It firstly samples a training set from the initial data set according to a uniform distribution (W1), and then adaptively adjusts each example's weight in terms of whether it is difficult or easy to classify. Namely, the initial training examples which are misclassified by the weak learner trained in the nearest last step are regarded as the difficult ones, and their weights are increased. The updated weight distribution is then used to sample another training set from the initial data set, on which another weak learner can be trained. After T such iterations, the AdaBoost ensemble for FDP is composed of T weak learners, whose individual outputs are combined
to produce the final prediction result. The details of the AdaBoost algorithm, the weak learning algorithms and the combination mechanism are stated respectively in the following subsections.

[Fig. 1 shows the flow from the initial training data set Sn through the weight distributions W1, W2, …, WT, the T training data sets sampled from Sn according to those distributions, the T weak learners trained on them, and the combination of the weak learners into the AdaBoost ensemble that produces the prediction result on the testing data set.]

Fig. 1. Framework of AdaBoost ensemble method for FDP.

4.2. AdaBoost

Suppose Sn = {(x1, y1), (x2, y2), …, (xn, yn)} is a set of training samples, and yi ∈ {−1, 1} (i = 1, 2, …, n), which represents only two classes for simplicity. The weight distribution over these samples at the tth boosting iteration is denoted as W_t = {w_t^1, w_t^2, …, w_t^n} (t = 1, 2, …, T), which is initially set uniformly. That is, the weight w_t^i (i = 1, 2, …, n) is given a value of 1/n at the first iteration when t = 1, and will be updated adaptively at later iterations. At iteration t, AdaBoost builds a new training data set by sampling from the initial training data set with the weight distribution W_t, and calls the weak learner to construct a base classifier, represented as f_t, on this new training data set. f_t is then applied to classifying the samples in the initial data set, and the error of f_t, denoted as E_t, is calculated as follows:

E_t = \sum_{i: f_t(x_i) \ne y_i} w_t^i    (1)

According to the idea that easy samples correctly classified by f_t get lower weights and difficult samples misclassified by f_t get higher weights, the samples' weight distribution is updated as follows:

w'^i_{t+1} = w_t^i \exp(\alpha_t l_t^i)  (i = 1, 2, …, n)    (2)

In formula (2), \alpha_t and l_t^i are calculated as follows:

\alpha_t = 0.5 \ln \frac{1 - E_t}{E_t}    (3)

l_t^i = \begin{cases} -1 & \text{if } f_t(x_i) = y_i \\ +1 & \text{if } f_t(x_i) \ne y_i \end{cases}    (4)

The above calculated weights are then normalized so that they add up to one:

w_{t+1}^i = \frac{w'^i_{t+1}}{\sum_{i=1}^{n} w'^i_{t+1}}  (i = 1, 2, …, n)    (5)

When T iterations are processed, the ensemble is composed of T weak classifiers. The final AdaBoost classification result is made through the combination of their classification results weighted by \alpha_t. In detail, the AdaBoost algorithm is listed in Table 1.

Table 1
AdaBoost algorithm.

Input: Initial training set composed of n examples, denoted as Sn = {(x1, y1), (x2, y2), …, (xn, yn)};
       weak learning algorithm, denoted as WeakLearner;
       integer T specifying the total number of iterations.
Initialize: w_1^i = 1/n, i.e. W1 = {w_1^1, w_1^2, …, w_1^n} = {1/n, 1/n, …, 1/n};
       the ensemble F = ∅.
For t = 1, 2, …, T:
  1. Take a sample R_t from Sn using distribution W_t.
  2. Build a classifier f_t using R_t as the training set.
  3. Compute E_t = \sum_{i: f_t(x_i) \ne y_i} w_t^i and \alpha_t = 0.5 \ln \frac{1 - E_t}{E_t}.
  4. Update the weights: w_{t+1}^i = normalize(w_t^i \exp(\alpha_t l_t^i)).
Output: The ensemble F = {f1, f2, …, fT} and A = {\alpha_1, \alpha_2, …, \alpha_T}.

4.3. Weak learners

AdaBoost is a method used to significantly reduce the error of a weak learning algorithm. In theory, the weak learning algorithm can be any algorithm, as long as it can generate classifiers which need only be a little better than random guessing (Freund & Schapire, 1996). That also means the weak learners should not result in overfitting. In this paper, SAT and DT are respectively used as weak learning algorithms for the following two reasons. Firstly, SAT and DT are both non-parametric learning algorithms which need not search for optimal parameters in the training stage, and thus they learn relatively faster than parametric algorithms. This property makes them especially suitable for AdaBoost ensemble, which requires many time-consuming iterations. Secondly, AdaBoost prefers weak learning algorithms over strong ones; namely, AdaBoost can only provide very limited improvement in accuracy for a strong learning algorithm. Though Alfaro et al. (2008) used DT with deep pruning as the weak learner for an FDP AdaBoost ensemble, this paper also attempts the FDP AdaBoost ensemble with an SAT weak learner, because SAT is surpassed by all other FDP methods in former researches and usually yields a moderately but not highly accurate learner, which just meets the requirement of AdaBoost.

4.3.1. Single attribute test

SAT is the first algorithm proposed for diagnosing corporate financial distress, and it was named univariate discriminant analysis by Beaver (1966). Suppose X is a matrix composed of m rows and n columns, where m is the number of attributes and n is the number of training samples. The SAT algorithm applied in this study is listed in Table 2.

4.3.2. Decision tree

DT was first applied to financial distress prediction by Frydman et al. (1985). It is a kind of tree-shaped decision structure learned inductively, by recursively partitioning attribute values, from sample data whose classes are already known. In a DT, each non-leaf node represents a test of an attribute value, and each leaf node represents a class. Thus, DT can provide easily understandable knowledge and aid decision making for less-experienced users. The basic algorithm of DT is stated in Table 3 (Sun & Li, 2008).

In the basic DT algorithm described in Table 3, IG represents information gain, which is the most widely used criterion for choosing an attribute split. Its calculation is as follows.

Sn is a data set consisting of n samples. The label of a sample has two different values, namely C1 = 1 and C2 = −1. If s_l (l = 1, 2) is the number of samples of class C_l, then the total information entropy needed to classify the given data set is I(s1, s2).
I(s_{1q}, s_{2q}) = -\sum_{l=1}^{2} p_{lq} \log_2(p_{lq})    (8)

p_{lq} = \frac{s_{lq}}{s_{1q} + s_{2q}}  (l = 1, 2)    (9)

² Commonly, Chinese listed companies will be specially treated (ST) if (1) a company has had negative net profit in two consecutive years or (2) a firm's net capital per share is lower than its face value. This study chooses samples according to the above ST criteria. If a company is specially treated because (1) the firm purposely publishes financial statements with serious falsehoods and misstatements or (2) other abnormal incidents described in the Chinese Stock Listing Exchange Rule appear, it is excluded.
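As a small illustration of formulas (8) and (9), and of the information gain criterion IG they support, the entropy and gain of a candidate split can be computed as follows (the function names are ours, not the paper's):

```python
import math

def entropy(s1: int, s2: int) -> float:
    """I(s1, s2) = -sum_l p_l * log2(p_l) with p_l = s_l / (s1 + s2),
    per formulas (8) and (9); an empty class contributes zero."""
    total = s1 + s2
    h = 0.0
    for s in (s1, s2):
        if s > 0:
            p = s / total
            h -= p * math.log2(p)
    return h

def information_gain(parent, children):
    """IG of a split: parent entropy minus the size-weighted entropy of
    the subsets {(s1q, s2q)} induced by one attribute's values."""
    n = sum(parent)
    return entropy(*parent) - sum(
        (s1q + s2q) / n * entropy(s1q, s2q) for s1q, s2q in children)

print(entropy(10, 10))                                 # 1.0 (balanced node)
print(information_gain((10, 10), [(10, 0), (0, 10)]))  # 1.0 (perfect split)
```

A node holding equal numbers of distressed and healthy firms has maximal entropy (one bit), while a pure node has entropy zero, so the attribute split with the largest IG most sharply separates the two classes.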
predict financial distress of year (t − 0) according to the financial ratio information of year (t − 1), (t − 2) and (t − 3), respectively. That is, financial distress is predicted one year in advance, two years in advance and three years in advance.

Forty-one financial ratios are utilized as input variables, which cover profitability, activity, solvency, growth, risk level, per-share ratios, and cash flow ratios, as listed in Table 4 (Sun & Li, 2011). Hence, these explanatory variables can provide a comprehensive indication of a firm's financial and operational state. Since SAT and DT are applied as weak learners in the study, no other feature selection methods are needed, because these weak learner algorithms have the ability of feature selection by themselves.

5.2. Experimental design

To obtain comparable experimental results, the same FDP problem is solved by four different classification methods, i.e. AdaBoost ensemble with SAT (denoted SA), AdaBoost ensemble with DT (denoted DTA), a single DT classifier and a single SVM classifier. Thirty rounds of holdout tests are carried out to estimate the prediction accuracy more objectively. Each time, the total initial sample composed of 692 Chinese listed companies is divided into two subsets, namely a training set and a testing set. The former has the proportion of two thirds (2/3) and the latter occupies the remaining one third (1/3). For each classification method, the above-described process of dividing training and testing sets is repeated 30 times, so that 30 estimated errors can ultimately be obtained for the purpose of statistical analysis. This makes the comparison among different classification methods more scientific.

The individual DT classifier is pruned using the depth level at which the pruned tree has the lowest 10-fold cross-validation error; the size of the individual tree is thereby limited to avoid overfitting. The SVM algorithm with RBF kernel function is applied to build the single SVM classifier, since RBF SVM has proved to be an effective one for FDP in former researches (Hui & Sun, 2006; Min & Lee, 2005; Shin et al., 2005). Because the tuning parameter C and the kernel parameter γ are crucial to RBF SVM's classification performance, a grid search technique is used to find optimal parameter values by the criterion of 10-fold cross-validation error.

5.3. Experimental results and analysis

Thirty rounds of holdout testing errors for FDP are listed in Table 5. For direct comparison among different FDP methods, the mean values of the 30 holdout testing errors are also calculated in the last row of Table 5. As can be seen, no matter whether FDP is made one year, two years or three years in advance, SA (AdaBoost ensemble with single attribute test) outperforms the other three methods in terms of mean testing error. That is, in the 30 rounds of holdout testing, SA has the lowest mean testing errors of 2.78%, 12.81% and 27.51% respectively at years (t − 1), (t − 2) and (t − 3). However, DTA does not always obtain lower mean testing errors than the single classifiers of DT and SVM from year (t − 1) to (t − 3). In detail, the mean testing error of DTA at year (t − 2) is 13.04%, which is lower than both DT (14.26%) and SVM (13.43%), but the mean testing error of DTA is higher than DT at year (t − 1) and higher than SVM at year (t − 3). For the two single classifiers, SVM evidently outperforms DT at years (t − 2) and (t − 3), but performs a little worse than DT at year (t − 1). The possible reason for this phenomenon is that more linearity may exist between the explanatory variables and the output label at year (t − 1) than at years (t − 2) and (t − 3), while the RBF SVM applied in the experiment is more suitable for non-linear problems.

For clearer illustration, testing error curves over the 30 rounds of holdout experiments are graphed in Figs. 2–4 respectively for years (t − 1), (t − 2) and (t − 3). As can be seen, the testing error curves of SA are at a relatively lower position for all three years. Furthermore, the testing error curves of SA fluctuate in smaller ranges than those of DTA, DT and SVM, indicating that the FDP method based on SA is more stable than the other three. Such an advantage of SA over the other three methods is obvious from the shape of the testing error curves. When FDP is made at year (t − 1), RBF SVM tends to have the highest testing error and the largest fluctuation range, for the possible reason mentioned above. But when FDP is made at year (t − 2) or (t − 3), the points on DT's testing error curve tend to be in the higher position more frequently. Therefore, FDP methods based on AdaBoost ensemble are superior to the single classifiers of DT and SVM as a whole.
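The SA procedure evaluated above (AdaBoost over single-attribute threshold tests, per Table 1 and formulas (1)–(5)) can be sketched roughly as follows. This is only a sketch under assumptions: the weak learner here minimizes weighted error directly instead of resampling R_t as Table 1 does, the exact SAT routine of Table 2 is not reproduced, and all names are illustrative rather than the paper's.

```python
import math

def train_stump(X, y, w):
    """Single-attribute threshold test with minimum weighted error.
    X: feature vectors, y: labels in {-1, +1}, w: example weights."""
    best = None  # (weighted error, attribute index, threshold, polarity)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for thr in thresholds:
            for pol in (+1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, x):
    _, j, thr, pol = stump
    return pol if x[j] > thr else -pol

def adaboost(X, y, T=10):
    """AdaBoost per formulas (1)-(5): alpha_t = 0.5*ln((1-E_t)/E_t);
    weights scaled by exp(alpha_t) if misclassified, exp(-alpha_t) if not."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        stump = train_stump(X, y, w)
        E = stump[0]
        if E <= 0:                     # perfect weak learner: large finite vote
            ensemble.append((10.0, stump))
            break
        if E >= 0.5:                   # no better than random guessing: stop
            break
        alpha = 0.5 * math.log((1 - E) / E)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(alpha if stump_predict(stump, xi) != yi else -alpha)
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]       # formula (5): normalize to sum to one
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the weak learners (weights alpha_t)."""
    s = sum(a * stump_predict(st, x) for a, st in ensemble)
    return 1 if s >= 0 else -1
```

A toy call such as `adaboost([[1, 5], [2, 4], [3, 1], [4, 2]], [1, 1, -1, -1])` would boost a handful of one-ratio threshold tests into a weighted vote, mirroring how SA combines T single-attribute classifiers.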
Table 4
Financial ratios used as explanatory variables.

Profitability: gross income/operating revenue; net profit/operating revenue; earnings before interest and tax/total assets; net profit/total assets; net profit/current assets; net profit/fixed assets; profit margin; net profit/equity; return on invested capital.
Activity: accounts receivable turnover; inventory turnover; accounts payable turnover; working capital turnover; current assets turnover; fixed assets turnover; long-term assets turnover; total assets turnover; net assets turnover.
Solvency: current ratio; quick ratio; working capital ratio; asset–liability ratio; equity/debt ratio; current assets/total assets; fixed assets/total assets; equity/fixed assets; current liabilities/total liabilities; debt/tangible assets ratio; liabilities/market value of equity.
Growth ratios: growth rate of prime operating revenue; rate of capital preservation and appreciation; growth rate of total assets; growth rate of net profit.
Risk level: coefficient of financial leverage; coefficient of operating leverage.
Per-share ratios: operating revenue per share; earnings per share; net assets per share.
Cash flow ratios: cash flow/current liabilities ratio; cash rate of prime operating revenue; net operating cash flow per share; net cash flow per share; net operating cash flow/net profit ratio.
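The 30-round holdout protocol of Section 5.2 amounts to the following loop. This is a sketch only: the toy labels and the majority-class baseline stand in for the real 692-company sample and the four classifiers, and all names are ours.

```python
import random
import statistics

def holdout_errors(X, y, n_repeats=30, train_frac=2 / 3, seed=0):
    """Repeat: randomly split into ~2/3 training and 1/3 testing,
    fit a classifier, record its testing error; return all errors."""
    rng = random.Random(seed)
    errors = []
    n = len(X)
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = round(n * train_frac)
        train, test = idx[:cut], idx[cut:]
        # Baseline "classifier": predict the majority class of the training set.
        majority = 1 if sum(y[i] for i in train) >= 0 else -1
        err = sum(1 for i in test if y[i] != majority) / len(test)
        errors.append(err)
    return errors

# Mean testing error over the 30 splits, analogous to the last row of Table 5.
y_toy = [1] * 40 + [-1] * 20       # imbalanced toy labels in {-1, +1}
X_toy = [[v] for v in range(60)]   # placeholder features
errs = holdout_errors(X_toy, y_toy)
print(round(statistics.mean(errs), 4))
```

Reporting the mean (and spread) of the 30 errors, rather than a single split's error, is what makes the comparison among SA, DTA, DT and SVM statistically meaningful.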
Table 5
Holdout testing errors for FDP. [Table body not recovered; the mean values are discussed in the text.]

[Figs. 2–4 plot the holdout testing error (%) of SA, DTA, DT and SVM over the 30 rounds of holdout testing.]

Fig. 2. Testing error curve on 30 rounds of holdout experiments at year (t − 1).
Fig. 3. Testing error curve on 30 rounds of holdout experiments at year (t − 2).
Fig. 4. Testing error curve on 30 rounds of holdout experiments at year (t − 3).

For convincing support of the above comparison, statistical analysis by a left-tailed T test for mean comparison is carried out, and the results corresponding to years (t − 1), (t − 2) and (t − 3) are listed in Tables 6–8, respectively.
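A left-tailed T test for mean comparison can be sketched as follows. The paper does not state whether its test is paired or two-sample, so a paired form over the 30 per-round errors is assumed here; the p-values would come from the t distribution with n − 1 degrees of freedom, which this sketch omits.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for d_i = a_i - b_i.
    Under the left-tailed alternative ("method A has lower mean error
    than method B"), a clearly negative t supports method A."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

# Toy check: method A's per-round errors sit consistently below method B's.
a = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10, 0.12, 0.11]
b = [0.13, 0.14, 0.12, 0.15, 0.12, 0.13, 0.13, 0.14]
print(round(paired_t_statistic(a, b), 3))  # clearly negative
```

This matches how the tables below read: a negative t with a small p-value in a row means the row method's mean error is significantly lower than the column method's.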
Table 6
Left-tailed T testing results at year (t − 1). Each cell gives the t statistic of the row method versus the column method, with the p-value in parentheses.

        SA               DTA               DT                SVM
SA      –                −4.419 (0.000***) −4.095 (0.000***) −4.910 (0.000***)
DTA     4.419 (1.000)    –                 0.473 (0.680)     −1.742 (0.046**)
DT      4.095 (1.000)    −0.473 (0.320)    –                 −2.118 (0.021**)
SVM     4.910 (1.000)    1.742 (0.954)     2.118 (0.979)     –

** Significance level of 5%. *** Significance level of 1%.

Table 7
Left-tailed T testing results at year (t − 2).

        SA               DTA               DT                SVM
SA      –                −0.515 (0.305)    −4.385 (0.000***) −1.733 (0.047**)
DTA     0.515 (0.695)    –                 −3.288 (0.001***) −1.127 (0.135)
DT      4.385 (1.000)    3.288 (0.999)     –                 2.510 (0.991)
SVM     1.733 (0.953)    1.127 (0.865)     −2.510 (0.009***) –

** Significance level of 5%. *** Significance level of 1%.

Table 8
Left-tailed T testing results at year (t − 3). [Table body not recovered.]

        SA               DTA               DT                SVM

… play well. Hence, the choice of weak learner is crucial to the performance of the AdaBoost ensemble. For FDP, the AdaBoost ensemble with SAT is the better choice because of its acceptable prediction accuracy as well as its relatively low computational cost.

6. Conclusion

FDP plays an important role in the prevention of corporate failure, which makes the accuracy of FDP models a wide concern of FDP researchers. Though former researches have made comprehensive investigations of different single classifiers for FDP, FDP based on classifier ensembles arose only in recent years and has a good prospect of application. This paper further explores AdaBoost ensemble for FDP and makes an empirical comparison. After designing the framework of the AdaBoost ensemble method for FDP, it describes the algorithms of AdaBoost as well as SAT and DT, and uses weighted majority voting as the combination mechanism. On the sample from Chinese listed companies, 30 rounds of holdout experiments are carried out respectively for the four FDP methods of AdaBoost ensemble with SAT, AdaBoost ensemble with DT, single DT and single SVM. Experimental results show that AdaBoost ensemble with SAT outperforms the other three methods with statistical significance and especially suits FDP for Chinese listed companies. It is also confirmed that the choice of weak learner algorithm does affect the FDP performance of AdaBoost ensemble, because the DT weak learner is inferior to the SAT weak learner in our experiments with Chinese listed companies. FDP experiments are carried out respectively one year, two years and three years in advance, which makes the above conclusion more comprehensive. Therefore, this study contributes incremental evidence for FDP research based on AdaBoost and can guide the real-world practice of FDP to some extent. However, this study also has the limitation that the experimental data sets are only collected from Chinese listed companies, and further investigation can be done based on other countries' real-world data sets in future studies.
References

Fletcher, D., & Goss, E. (1993). Forecasting with neural networks: An application using bankruptcy data. Information and Management, 24, 159–167.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the 13th international conference on machine learning (pp. 148–156). San Francisco: Morgan Kaufmann.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Hui, X.-F., & Sun, J. (2006). An application of support vector machine to companies' financial distress prediction. Lecture Notes in Artificial Intelligence, 3885, 274–282.
Hung, C., & Chen, J. (2009). A selective ensemble based on expected probabilities for bankruptcy prediction. Expert Systems with Applications, 36, 5297–5303.
Jo, H., & Han, I. (1996). Integration of case-based forecasting, neural network, and discriminant analysis for bankruptcy prediction. Expert Systems with Applications, 11(4), 415–422.
Kim, E., Kim, W., & Lee, Y. (2002). Combination of multiple classifiers for the customer's purchase behavior prediction. Decision Support Systems, 34, 167–175.
Kim, M.-J., & Kang, D.-K. (2010). Ensemble with neural networks for bankruptcy prediction. Expert Systems with Applications, 37(4), 3373–3379.
Kim, M.-J., Min, S.-H., & Han, I. (2006). An evolutionary approach to the combination of multiple classifiers to predict a stock price index. Expert Systems with Applications, 31, 241–247.
Li, H., & Sun, J. (2009). Majority voting combination of multiple case-based reasoning for financial distress prediction. Expert Systems with Applications, 36, 4363–4373.
Min, J. H., & Lee, Y.-C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28(4), 603–614.
Odom, M., & Sharda, R. (1990). A neural networks model for bankruptcy prediction. Proceedings of the IEEE international conference on neural networks, 2, 163–168.
Ohlson, J. (1980). Financial ratios and probabilistic prediction of bankruptcy. Journal of Accounting Research, 18, 109–131.
Pendharkar, P. C. (2005). A threshold varying artificial neural network approach for classification and its application to bankruptcy prediction problem. Computers & Operations Research, 32, 2561–2582.
Ruta, D., & Gabrys, B. (2005). Classifier selection for majority voting. Information Fusion, 6, 63–81.
Shin, K.-S., Lee, T. S., & Kim, H.-J. (2005). An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications, 28(1), 127–135.
Sun, J., & Li, H. (2008). Data mining method for listed companies' financial distress prediction. Knowledge-Based Systems, 21(1), 1–5.
Sun, J., & Li, H. (2008). Listed companies' financial distress prediction based on weighted majority voting combination of multiple classifiers. Expert Systems with Applications, 35, 818–827.
Sun, J., & Li, H. (2009). Financial distress prediction based on serial combination of multiple classifiers. Expert Systems with Applications, 36, 8659–8666.
Sun, J., & Li, H. (2011). Dynamic financial distress prediction using instance selection for the disposal of concept drift. Expert Systems with Applications, 38, 2566–2576.
Tsai, C.-F., & Wu, J.-W. (2008). Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Systems with Applications, 34, 2639–2649.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.