DOI 10.1007/s10614-016-9581-4
1 Introduction
Ligang Zhou (corresponding author)
mrlgzhou@gmail.com
Kin Keung Lai
mskklai@cityu.edu.hk
1 School of Business, Macau University of Science and Technology, Taipa, Macau, China
2 Department of Industrial and Manufacturing Systems Engineering, The University of Hong
Kong, Pokfulam, Hong Kong, China
3 International Business School, Shaanxi Normal University, Xian, China
AdaBoost Models for Corporate Bankruptcy Prediction with...
The paper proceeds as follows. Section 2 provides a review of the literature on bankruptcy prediction. Section 3 describes three different AdaBoost algorithms. Section 4 introduces the missing data imputation methods used here. An empirical study of the models on real-world datasets is presented in Sect. 5. Finally, Sect. 6 presents the conclusion.
2 Literature Review
High prediction accuracy is the most important goal for bankruptcy prediction models.
In many applications, this goal is entirely appropriate (Dietrich 1984). Researchers
strive to improve the prediction accuracy of bankruptcy prediction models along two streams. One stream seeks more effective explanatory variables in terms of financial or accounting knowledge. Beaver (1966) identified thirty ratios considered
to be important factors for predicting corporate bankruptcy and the empirical study
showed that “Cash flow/Total debt”, “Net Income/Total assets”, “Total debt/Total
assets” are the three most effective ratios, achieving more than 80 % prediction
accuracy in terms of one-year-ahead forecasting. Altman (1968) selected five ratios,
employed a multivariate discriminant analysis model, and tested the model on 33
pairs of bankruptcy/non-bankruptcy firms. The model could correctly classify 90 %
of the firms one year prior to failure. The five selected ratios are: “Working capital/Total assets”, “Retained earnings/Total assets”, “EBIT/Total assets”, “Market value equity/Book value of total debt”, and “Sales/Total assets”. Ravi Kumar and Ravi (2007)
presented a comprehensive review of work on bankruptcy prediction between 1968
and 2005. They collected some 500 different ratios that had been used in 128 reviewed
papers. The top 30 variables with high frequency of usage are shown in Table 1.
One reason for the varied sets of variables in bankruptcy prediction models may be, as pointed out by Beaver (1966), that (a) not all ratios predict equally well; and (b) the ratios do not predict bankrupt and non-bankrupt firms with the same degree of success. Nevertheless, research in the fields of finance and accounting shows some degree of agreement on the predictive ability of certain ratios.
Another stream for improving predictive ability is to develop more powerful classification models. Many techniques have been used for developing bankruptcy prediction models with the objective of improving prediction accuracy. These techniques can be
categorized in three groups: (1) statistical techniques; (2) intelligent techniques; and (3)
hybrid and ensemble models. Statistical techniques employed for bankruptcy predic-
tion include: linear discriminant analysis, quadratic discriminant analysis, regression
analysis, naive Bayes classifier, and Bayes network, etc. (Sun and Shenoy 2007; Zhou
et al. 2012). Intelligent techniques applied in bankruptcy prediction include: neural
networks (Alfaro et al. 2008), self-organizing map (Zhu et al. 2007), decision trees
(Gepp et al. 2009), case-based reasoning (Park and Han 2002), evolutionary algorithms
(Varetto 1998), rough set (Sanchis et al. 2007), support vector machines (Härdle et al.
2009), etc. Ravi Kumar and Ravi (2007) provide a comprehensive review of the application of statistical and intelligent techniques to bankruptcy prediction problems from 1968 to 2005.
Much recent research on the development of bankruptcy prediction models focuses
mainly on hybrid models and ensemble models. The main idea of hybrid models is
ter than the GP/OLS model in bankruptcy prediction. Cho et al. (2010) proposed a
hybrid model for bankruptcy prediction with a combination of variables selected by
using a decision tree and case-based reasoning using the Mahalanobis distance with
variable weights. The experimental result indicates that the proposed hybrid model
outperforms some currently-in-use techniques. Chen et al. (2009) proposed a hybrid
neuro fuzzy approach which combines the functionality of fuzzy logic and the learning
ability of neural networks. The empirical results show that the neuro fuzzy approach
has a better accuracy rate than logit regression. Chaudhuri and De (2011) introduced fuzzy support vector machines (FSVM) to solve the bankruptcy prediction problem and demonstrated the efficiency of the FSVM. Other models based on support vector
machines (SVM) mainly focus on combining search techniques for optimizing parameters or input feature sets with the powerful classification capability of SVM.
Ahn et al. (2006) used genetic algorithms (GAs) to optimize the parameters in SVM
kernel functions and features subsets for bankruptcy prediction. Zhou et al. (2008)
introduced a direct search method to optimize parameters in SVM model.
The ensemble model is somewhat different from a hybrid model. The main idea
of ensemble models is to combine a set of base models, each of which is simple and weak, in order to obtain a more powerful model with more accurate and reliable classification than can be obtained from a single model. AdaBoost is a widely used ensemble algorithm that can be employed in conjunction with many other types of learning methods as base learners to improve their performance. West et al. (2005) employed the multilayer perceptron neural network as a base learner and investigated three ensemble strategies: cross-validation, bagging, and boosting. The
neural network ensemble is found to be superior to the single model for three real world
financial decision applications. Alfaro et al. (2008) provided an empirical comparison
of AdaBoost and neural network for bankruptcy prediction. The prediction accuracy
of both techniques on a set of European firms shows that the proposed AdaBoost
approach can decrease the generalization error by about thirty percent compared to
the error produced with a neural network.
In the development of bankruptcy prediction models, few researchers have addressed the issue of missing feature values; most have simply deleted the observations with missing feature values and considered only observations with complete predictor values (Cheng et al. 2010; Hwang et al. 2007). However, Shumway (2001) pointed out that a complete set of explanatory variables is not always observable for each firm year, and he substituted variable values from past years for missing values in some cases. Chava and Jarrow (2004) also handle missing accounting and market data by substituting the previously available observations. Little existing research discusses the performance of bankruptcy prediction models on observations with missing values, or the effect of different missing-value imputation methods.
3 AdaBoost Algorithms
many base learners into one high-quality ensemble predictor, such as uniform voting, distribution summation, Bayesian combination, etc. The elements for construction of AdaBoost include: input features and responses, ensemble methods, base learners, and the number of base learners in an ensemble. Different combinations of these elements yield different AdaBoost models. In the present study, three different AdaBoost models are employed.
For the bankruptcy prediction problem, let S = {(x_n, y_n)}_{n=1}^N denote the training data set, where each input vector x_n ∈ R^m (m is the number of features) and y_n ∈ {+1, −1} is its corresponding response.
3.1 AdaBoost.M1
Step 1: initialize the weight of each sample in S to 1/N, i.e. D_1(n) = 1/N, n = 1, 2, …, N; t = 1;
Step 2: for t = 1 to T
2.1 build classifier C_t using the base learner and distribution D_t
2.2 compute the weighted error ε_t from model C_t on S as Eq. (1):

    ε_t = Σ_{n=1}^{N} D_t(n) × err_t(x_n)    (1)

where err_t(x_n) = 1 if F_t(x_n) ≠ y_n and 0 otherwise.
    D_{t+1}(n) = D_t(n) / (2(1 − ε_t))   if F_t(x_n) = y_n
    D_{t+1}(n) = D_t(n) / (2ε_t)         otherwise,    n = 1, 2, …, N.    (3)

end if
2.4 normalize D_{t+1} to be a proper distribution:

    D_{t+1}(n) = D_{t+1}(n) / Σ_{n=1}^{N} D_{t+1}(n),    n = 1, 2, …, N.    (4)
2.5 t = t + 1
end for
Following the above method, a set of base learners C_t, which actually defines a set of functions {F_t | t = 1, 2, …, T}, can be obtained, and the final decision from this set of functions (the AdaBoost model) is defined as (5):

    ŷ(x) = argmax_{y ∈ {+1, −1}} Σ_{t: F_t(x) = y} log(1/β_t)    (5)

where

    β_t = ε_t / (1 − ε_t)    (6)
If a decision tree is selected as the base learner, the AdaBoost.M1 model is denoted by ABM1.DT.
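The AdaBoost.M1 loop above can be sketched in Python. This is a minimal illustration, not the paper's Matlab implementation: it uses depth-1 decision stumps as the base learner, and all function names are hypothetical.

```python
import numpy as np

def stump_train(X, y, D):
    """Weighted decision stump: pick the (feature, threshold, polarity)
    minimizing the weighted error under the distribution D."""
    best = (0, 0.0, 1, 1.0)  # (feature j, threshold, polarity, weighted error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                err = float(np.sum(D * (pred != y)))
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def stump_predict(X, stump):
    j, thr, pol, _ = stump
    return np.where(pol * (X[:, j] - thr) > 0, 1, -1)

def adaboost_m1(X, y, T=20):
    N = X.shape[0]
    D = np.full(N, 1.0 / N)                    # Step 1: uniform initial weights
    ensemble = []
    for _ in range(T):
        stump = stump_train(X, y, D)           # 2.1 build C_t with distribution D_t
        pred = stump_predict(X, stump)
        eps = float(np.sum(D * (pred != y)))   # 2.2 weighted error, Eq. (1)
        if eps >= 0.5:                         # base learner no better than chance: stop
            break
        beta = max(eps, 1e-10) / (1.0 - eps)   # Eq. (6), floored to avoid log(1/0)
        ensemble.append((stump, beta))
        if eps == 0.0:                         # perfect base learner: stop early
            break
        # Eq. (3): rescale weights of the correctly / incorrectly classified groups
        D = np.where(pred == y, D / (2.0 * (1.0 - eps)), D / (2.0 * eps))
        D = D / D.sum()                        # Eq. (4): renormalize
    return ensemble

def adaboost_predict(X, ensemble):
    """Eq. (5): weighted vote, each learner votes with weight log(1/beta_t)."""
    score = np.zeros(X.shape[0])
    for stump, beta in ensemble:
        score += np.log(1.0 / beta) * stump_predict(X, stump)
    return np.where(score >= 0, 1, -1)
```

Note that the update in Eq. (3) halves the total weight carried by the correctly classified group and by the misclassified group, which is what the two branches above implement.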
AdaBoosting neural networks (AdNN) use a neural network instead of a decision tree as the weak classifier in the traditional AdaBoost model, and are expected to generalize better than a single model. Schwenk and Bengio (2000) reported that AdNN is significantly more accurate than boosted decision trees on a data set of online handwritten digit recognition.
There are two ways to deal with the weighted instances. One is to sample with replacement (SWR) according to the weight distribution: samples with greater weight may be sampled several times, while those with less weight may not occur in the training sample sets at all. Another way is to train the base learner with respect to a weighted cost function (WCF) which assigns a larger weight to the incorrectly classified instances. The AdaBoosting neural network with SWR is denoted by ABNN.SWR, and that with WCF by ABNN.WCF.
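The SWR option can be illustrated with a short numpy sketch (a hypothetical helper, not from the paper): indices are drawn with replacement according to the current weight distribution D_t, so heavily weighted samples tend to appear several times while lightly weighted ones may be absent.

```python
import numpy as np

def sample_with_weights(D, size=None, seed=None):
    """Draw `size` indices with replacement; index n is drawn with probability D[n]."""
    rng = np.random.default_rng(seed)
    n = len(D)
    return rng.choice(n, size=n if size is None else size, replace=True, p=D)

# Example: a sample with weight 0.7 dominates the resampled training set
D = np.array([0.7, 0.1, 0.1, 0.1])
idx = sample_with_weights(D, seed=0)
```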
The details of the ABNN.SWR algorithm are as follows:
Input:
S, a set of samples for training with size N ;
T , the number of rounds to construct the AdaBoosting model;
C, base learner neural network;
Output: AdaBoost ABNN.SWR model
Algorithm: ABNN.SWR
Step 1: initialize the weight of each sample in S to 1/N , i.e. D1 (n) = 1/N ,
n = 1, 2, . . . , N ;
Step 2: for t = 1 to T
repeat
2.1 sample S with replacement according to Dt to obtain St ;
2.2 train neural network with St to obtain classifier model Ct ;
2.3 compute the weighted error ε_t from model C_t on S as (7):

    ε_t = Σ_{n=1}^{N} D_t(n) × err_t(x_n)    (7)

    err_t(x_n) = 1 if F_t(x_n) ≠ y_n, 0 otherwise

2.4 update the weight distribution:

    D_{t+1}(n) = D_t(n) × β_t   if F_t(x_n) = y_n
    D_{t+1}(n) = D_t(n)         otherwise,    n = 1, 2, …, N.    (8)

2.5 normalize the weights to a proper distribution:

    D_{t+1}(n) = D_{t+1}(n) / Σ_{n=1}^{N} D_{t+1}(n),    n = 1, 2, …, N.    (9)
[Fig. 1 Structure of the neural network base learner: input nodes and two output nodes, P_t^b and P_t^n]
where (9) normalizes the weights so that D_{t+1} is a probability distribution function, and β_t is obtained by formula (10):

    β_t = ε_t / (1 − ε_t)    (10)
end if
end for
Following the above steps, a set of neural network classifiers C_t, which actually defines a set of functions {F_t | t = 1, 2, …, T}, can be obtained, and the final decision from this set of functions is defined as (5).
From Formula (8), it can be observed that the weights of correctly classified samples decrease while the weights of misclassified samples increase after normalization. The
structure of the neural network is shown in Fig. 1: two nodes in the output layer which
indicate the probability that the company is non-bankrupt and bankrupt, denoted by
Ptn , Ptb respectively. For a non-bankrupt company, the completely correct output of
the neural network should be [1 0] and for a bankrupt company, it should be [0 1]
indicating that the probability for it to be non-bankrupt is 0 and to be bankrupt is 1.
The output function F_t(x) can be defined as (11):

    F_t(x) = +1 if P_t^b > P_t^n, −1 otherwise    (11)
In the ABNN.SWR algorithm, the training data set for each neural network is obtained by resampling S with replacement based on the weight distribution function D_t. Another way is to combine D_t with the cost function used in network training, which guides the optimization of the weights in the neural network. This AdaBoost neural network algorithm, ABNN.WCF, is described as follows (Zhou and Lai 2009):
Input:
S, a set of samples for training with size N ;
T , the number of rounds to construct the AdaBoosting model;
C, base learner neural network;
Output: AdaBoost ABNN.WCF model
Algorithm: ABNN.WCF
Step 1: initialize the weight of each sample in S to 1/N , i.e. D1 (n) = 1/N , n =
1, 2, . . . , N ;
Step 2: for t = 1 to T
repeat
2.1 train the neural network with Levenberg-Marquardt backpropagation on S with respect to the weight distribution D_t to obtain model C_t, which corresponds to the function F_t.
2.2 compute the weighted error ε_t from model C_t on S as Formula (12):

    ε_t = Σ_{n=1}^{N} D_t(n) × (1/2)(1 − (P_t^n(x_n) − P_t^b(x_n)) × y_n)    (12)

When y_n = 1, the completely correct output of the NN should be [1 0]; the actual output is [P_t^n  P_t^b], so the per-sample term is half the sum of the errors from the two output neurons, i.e. (1/2)((1 − P_t^n) + P_t^b). The same goes for y_n = −1.
2.3 if εt > 0.5, then set Dt (n) = 1/N , n = 1, 2, . . . , N ;
until εt < 0.5
if εt = 0, then set T = t, break;
else
update the weight function D_{t+1}(n) by Formulas (13) and (14):

    D_{t+1}(n) = D_t(n) × β_t^{(1/2)(1 + (P_t^n(x_n) − P_t^b(x_n)) × y_n)},    n = 1, 2, …, N.    (13)

    D_{t+1}(n) = D_{t+1}(n) / Σ_{n=1}^{N} D_{t+1}(n),    n = 1, 2, …, N.    (14)

where (14) normalizes the weights so that D_{t+1} is a probability distribution function, and β_t is obtained by formula (15):

    β_t = ε_t / (1 − ε_t)    (15)
end if
end for
The final decision function is the same as shown in Formula (5). Since 0 < ε_t < 0.5, we have 0 < β_t < 1, so the weights of correctly classified samples are reduced by the factor β_t, while the weights of misclassified samples increase after normalization.
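As an illustration of the WCF idea, a weighted squared-error cost can be sketched as below. This is a deliberate simplification (the paper trains with Levenberg-Marquardt in Matlab); the function name is hypothetical, and the distribution D scales each sample's contribution to the cost.

```python
import numpy as np

def weighted_cost(targets, outputs, D):
    """Squared-error cost weighted by the AdaBoost distribution D:
    samples with larger weight contribute more to the training error."""
    per_sample = np.sum((targets - outputs) ** 2, axis=1)  # sum over the two output neurons
    return float(np.sum(D * per_sample))
```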
4 Missing Data Imputation Methods

Missing data arise in a wide variety of statistical analyses, especially in the analysis of financial time series, market research and social science research. Many efforts have been made to handle missing data (Maimon and Rokach 2005; Little and Rubin 2002). In this study, three common imputation methods are employed.
The k-nearest neighbors average (knnA) method imputes the missing feature of an instance with the average of the corresponding feature over its k nearest neighbors in the training samples set. Suppose feature i of observation n is missing; the imputation then proceeds as follows:
Step 1: select all observations in the training samples set for which feature i is not missing and which have no missing values among the features that observation n has;
Step 2: calculate the Euclidean distance between observation n and each observation j in the selected samples:

    d(n, j) = sqrt( Σ_{l=1, l≠i}^{m} (x_{n,l} − x_{j,l})² )

Step 3: impute the missing value with the average of feature i over the k nearest neighbors:

    x_{n,i} = (1/k) Σ_{j=1}^{k} x_{j,i}

where j runs over the k closest observations.
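The three steps can be sketched in Python; this is a minimal illustration (the function name is hypothetical) with NaN marking a missing value:

```python
import numpy as np

def knn_average_impute(X, n, i, k=10):
    """Impute missing feature i of row n with the mean of that feature over
    the k nearest rows (Euclidean distance on the features row n has)."""
    obs = ~np.isnan(X[n])      # features that observation n actually has
    obs[i] = False             # exclude the feature being imputed
    # Step 1: candidates must have feature i and no missing value among obs
    cand = [j for j in range(X.shape[0])
            if j != n and not np.isnan(X[j, i]) and not np.isnan(X[j][obs]).any()]
    # Step 2: Euclidean distance over the shared observed features
    dist = [np.sqrt(np.sum((X[n][obs] - X[j][obs]) ** 2)) for j in cand]
    # Step 3: average feature i over the k nearest candidates
    nearest = [cand[t] for t in np.argsort(dist)[:k]]
    return float(np.mean([X[j, i] for j in nearest]))
```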
The global closest fit (GCM) method replaces a missing feature value with the known value in another observation that resembles as much as possible the instance with the missing attribute values (Grzymala-Busse et al. 2002). The closest fit case is the instance which has the minimum distance to the instance with the missing values. The distance function for two instances n, j with feature vectors x_n and x_j respectively is computed as follows (Maimon and Rokach 2005):
    d(n, j) = Σ_{i=1}^{m} d(x_{n,i}, x_{j,i})

where

    d(x_{n,i}, x_{j,i}) = 0                          if x_{n,i} = x_{j,i}
    d(x_{n,i}, x_{j,i}) = 1                          if x_{n,i} and x_{j,i} are symbolic and x_{n,i} ≠ x_{j,i}, or x_{n,i} or x_{j,i} is unknown
    d(x_{n,i}, x_{j,i}) = |x_{n,i} − x_{j,i}| / r    if x_{n,i} and x_{j,i} are numerical and x_{n,i} ≠ x_{j,i}

and r is the range of the numerical feature, ignoring missing values.
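A small sketch of this distance and the closest-fit lookup, assuming None marks an unknown value and symbolic features are strings (all names are hypothetical):

```python
MISSING = None  # sentinel for an unknown value in this sketch

def feature_distance(a, b, r):
    """Per-feature distance of the global closest fit method."""
    if a is MISSING or b is MISSING:
        return 1.0                    # unknown value: maximal penalty
    if a == b:
        return 0.0
    if isinstance(a, str) or isinstance(b, str):
        return 1.0                    # differing symbolic values
    return abs(a - b) / r             # differing numerical values, scaled by range r

def gcf_distance(x, y, ranges):
    """Total distance: sum of per-feature distances."""
    return sum(feature_distance(a, b, r) for a, b, r in zip(x, y, ranges))

def closest_fit(target, candidates, ranges):
    """Instance with minimum distance to `target`; its known value fills the gap."""
    return min(candidates, key=lambda c: gcf_distance(target, c, ranges))
```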
5 Empirical Study
[Fig. 2 Number of bankruptcies by observed year, 1981-2009]
[Fig. 3 Number of bankrupt and non-bankrupt instances with missing values by observed year, 1981-2009]
The input features used in this study from Table 1 are R1–R7, R10, R11 and R14, which include the five important features found by Altman (1968), i.e. R3–R6 and R7.
Figure 2 gives the number of bankruptcies by observed year over the sample period. As shown in Fig. 2, the number of bankruptcies is cyclical, with the largest numbers occurring in the late 1980s, the early 1990s, and 1997-1998. Although the sub-prime crisis started in the USA in 2007, there are only a few bankrupt firms after 2007 in this data set, since most bankrupt firms were in financial industries and some firms were officially declared bankrupt only after 2009. Finally, there are a total of 1168 bankrupt companies and 1168 non-bankrupt companies in USABDS, and the number of bankrupt and non-bankrupt companies is the same in each year.
Figure 3 shows the number of bankrupt or non-bankrupt firms with missing values
on the selected features by observed year over the sample period in USABDS. There
were a total of 183 (15.7 %) non-bankrupt firms with missing values and 244 (20.9 %)
bankrupt firms with missing values. As seen, the number of companies containing missing values decreased over the period, perhaps because corporate governance had improved and corporate finance had become more standardized. However, irregular management of finance and accounting is very common in small and medium enterprises (SMEs) in emerging countries, so missing-value problems may be common when predicting the financial bankruptcy of SMEs as suppliers, partners or borrowers.
The following four common performance measures are selected to evaluate the models:

1. Sensitivity (Sen) = TP / (TP + FN)
2. Specificity (Spe) = TN / (TN + FP)
3. Accuracy (Acc) = (TP + TN) / (TP + FN + TN + FP), where TP: positives classified as positive, FN: positives classified as negative, TN: negatives classified as negative, FP: negatives classified as positive.
4. Area under the ROC curve (AUC): ROC graphs are two-dimensional graphs in which Sensitivity is plotted on the Y axis and 1 − Specificity on the X axis. An ROC graph depicts the relative trade-off between benefits (true positives) and costs (false positives), which is useful for organizing classifiers and visualizing their performance, especially in domains with skewed class distributions and unequal classification error costs. The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance (Fawcett 2003).
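These four measures can be computed directly from the confusion counts; the sketch below (hypothetical helper names) also implements AUC through its rank interpretation, counting ties as one half:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """TP, FN, TN, FP with +1 as the positive class and -1 as the negative class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    return tp, fn, tn, fp

def sen_spe_acc(y_true, y_pred):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sen, spe, acc

def auc(y_true, scores):
    """Probability that a random positive is ranked above a random negative
    (ties count 1/2), matching Fawcett's rank interpretation of AUC."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```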
Each observation includes a company's financial status in year t and its financial ratios in year t − 1. A rolling time window is used to test the performance of the models and imputation methods. For example, to test model performance for observations in year 2005, the test samples set consists of all observations in 2005, while all observations in 2004 and earlier constitute the training samples set. There were a total of 143 bankrupt firms and 143 non-bankrupt firms in the test sets from year 2001 to 2009, of which 24 (16.78 %) bankrupt firms and 14 (9.79 %) non-bankrupt firms had missing values. In this experiment, ABM1.DT selects a decision tree as the base learner. The size of the decision tree in ABM1.DT is controlled by three parameters: (1) MaxNumSplits, the maximal number of decision splits; (2) MinLeafSize, the minimum number of observations per leaf; and (3) MinParentSize, the minimum number of observations per branch node. These three parameters are set to the Matlab default values of 1, 5 and 10, respectively. ABNN.WCF and ABNN.SWR use a simple multilayer perceptron neural network with four nodes in the hidden layer, implemented with the neural network toolbox in Matlab. The number of rounds T is set to 100, and k is set to 10 in the knnA imputation method.
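The rolling-time-window protocol can be sketched as follows, assuming observations are grouped by observed year (a simplified illustration, not the paper's code):

```python
def rolling_time_windows(data_by_year, test_years):
    """For each test year y, train on every observation strictly before y
    and test on the observations of year y (the rolling-window protocol)."""
    for y in sorted(test_years):
        train = [obs for yr, rows in sorted(data_by_year.items()) if yr < y
                 for obs in rows]
        test = data_by_year.get(y, [])
        if train and test:              # skip years with no usable split
            yield y, train, test
```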
Since ABM1.DT and the decision tree (DT) can handle missing values directly,
to see the effect of the imputation method, the performance of these two models
on USABDS with different imputation methods (including no imputation) is shown in Table 2. Each figure is the weighted average, in terms of the number of test samples of different classes, in each test year. In Table 2, Non-Imp denotes that no imputation method is employed. The row “Test samples with MV” lists the average performance on the 38 observations with missing values. The observations with missing values in each of the test years number no more than ten, which makes it almost impossible to obtain a smooth ROC curve; the AUC therefore becomes unreliable and is not presented for test samples with MV.
In Table 2, the best performance for each measure is marked in bold. It is interesting
to observe that when imputation is employed on the observations with missing values,
the performance of Acc from ABM1.DT and Decision Tree on the test samples with
missing values increases considerably, i.e. the imputation can help these two models to
classify bankrupt firms with missing values at a higher accuracy. The best classification accuracy on test samples with MV, 89.47 %, is achieved by ABM1.DT with AM, which at the same time achieves a classification accuracy of 72.73 % on all test samples.
From Table 2, it can be observed that the three imputation methods greatly improve the predictive accuracy in comparison with no imputation. Table 3 shows the results of the Wilcoxon signed-rank test on Acc from the decision tree on test samples with missing values between each pair of imputation methods. It demonstrates that the performance of the knnA imputation method is significantly greater than that of non-imputation; however, there is no statistically significant difference between knnA and the other two imputation methods at significance level α = 0.05.
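A comparison of this kind can be reproduced with scipy's Wilcoxon signed-rank test; the per-year accuracy values below are made-up placeholders, not the paper's data:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-test-year Acc values for two settings (NOT the paper's numbers)
acc_knnA    = np.array([0.82, 0.79, 0.85, 0.81, 0.84, 0.80, 0.83, 0.78])
acc_non_imp = np.array([0.81, 0.77, 0.82, 0.77, 0.79, 0.74, 0.76, 0.70])

# Paired, two-sided test on the per-year differences
stat, p_value = wilcoxon(acc_knnA, acc_non_imp)
```

A small p-value indicates that the paired accuracies differ systematically across test years, which is how the paper compares imputation methods.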
The number of correctly classified non-bankrupt and bankrupt samples with MV in
each year by ABM1.DT under Non-Imp and AM is shown in Figs. 4 and 5, respectively.
Regarding the samples with missing values in the test set, there is no observation in
year 2007 and no non-bankrupt observations from year 2004 to 2008. There are a total
of 14 non-bankrupt observations with MV and 24 bankrupt observations with MV.
Fig. 4 Number of non-bankrupt observations with missing values hit or failed by ABM1.DT with Non-Imp
and AM on USABDS
ABM1.DT with both Non-Imp and AM can correctly predict 23 bankrupt observations
and this number explains their Spe of 0.9583 (23/24) in Table 2.
To compare the AdaBoost models with other methods, Table 4 lists the average performance on all test samples of the AdaBoost models and four other widely used methods: a decision tree, linear regression (LR), logistic regression (LOGR), and a neural network (NN). The neural network here is a multilayer network trained with backpropagation from the Matlab neural network toolbox. It can be observed that the AdaBoost models always achieve the best Acc and AUC performance, except that NN achieves a slightly greater Acc than ABNN.SWR under GCM, although which AdaBoost model is best shifts across imputation methods. Table 5 lists the average performance of all these methods on the test samples with missing values. The best performance for each measure is marked in bold in Tables 4 and 5.
Fig. 5 Number of bankrupt observations with missing values hit or failed by ABM1.DT with Non-Imp
and AM on USABDS
To make a statistical comparison among the classifiers, the data from the three imputation methods on test samples from observed years 2001 to 2009 are regarded as different data groups (there is no test sample in year 2007). For each group, the performance of the seven classifiers is ranked. The Friedman test is conducted with rows representing blocks and classifiers representing treatments. The p-values of the Friedman test on AUC in Table 4 and Acc in Table 5 among the seven classifiers are 5.30 × 10^−14 and 0.0702, respectively. Given the significance level α = 0.10, the null hypothesis that all the classifiers are equivalent is rejected. The Nemenyi test is then used to compare all classifiers with each other. The performance of two classifiers is significantly different if the corresponding average ranks differ by at least the critical difference

    CD = q_α × sqrt( k(k + 1) / (6N) )
where q_α is the critical value for the two-tailed Nemenyi test with significance level α, k is the number of models, and N is the number of data sets.
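The critical difference formula is easy to check numerically; the sketch below reproduces the two CD values used in the text:

```python
import math

def critical_difference(q_alpha, k, N):
    """Nemenyi critical difference: CD = q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * N))

# Values used in the text: seven models; 3 imputation methods x 8 test years = 24
# groups for USABDS, and 3 x 5 = 15 groups for JPNBDS
cd_usabds = critical_difference(2.693, 7, 24)   # about 1.68
cd_jpnbds = critical_difference(2.693, 7, 15)   # about 2.12
```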
The average ranks (AR) of AUC performance on all test samples for all seven models are shown in Table 6. ABNN.SWR has the best AUC performance, with an average rank of 2.56. If the significance level α is set at 0.10, then q_α = 2.693, N = 3 × 8 = 24, and CD is 1.68. Any model with average rank less than 2.56 + 1.68 = 4.24 has no
Table 6 Comparison of average rank on AUC among seven models on all test samples from USABDS
Table 7 Comparison of average rank on Acc among seven models on test samples with missing values
from USABDS
companies’ financial statements. Only non-financial firms are included and bankruptcy
prediction is based on the financial data released just one year before the year in
question. JPNBDS includes samples with observed financial status (non-bankrupt or bankrupt) from 1995 to 2009.
Figure 6 gives the number of bankruptcies by observed year over the sample period.
There are a total of 76 bankrupt observations and 76 non-bankrupt observations in the
JPNBDS and the number of bankrupt and non-bankrupt observations is the same each
year.
[Fig. 6 Number of bankruptcies by observed year in JPNBDS, 1995-2009]
[Fig. 7 Number of bankrupt and non-bankrupt instances with missing values by observed year in JPNBDS]
JPNBDS has the same input features as USABDS. Figure 7 shows the number of
bankrupt or non-bankrupt firms with missing values by observed year over the sample
period in JPNBDS. There were a total of 11 (14.47 %) non-bankrupt firms with missing
values and 7 (9.21 %) bankrupt firms with missing values.
The model settings for JPNBDS are the same as those for USABDS. The test years are from 2001 to 2009. There were a total of 42 non-bankrupt firms and 42 bankrupt firms in the test samples; 11 of the 42 non-bankrupt observations and 7 of the 42 bankrupt observations had missing values.
Table 8 shows the average performance of the different imputation methods with ABM1.DT and the decision tree. The best performance for each measure is marked in bold. It can be observed that for either the ABM1.DT or the DT model, the performance of the different imputation methods is almost the same. This may be because almost all 18 observations with missing values miss the same group of features, and these features contribute little to the classification function implied by the models. For
JPNBDS, ABM1.DT with knnA or AM can achieve an Acc of 83.3 % on test samples with missing values while achieving an Acc of 83.9 % on all test samples.

Table 8 Average performance of ABM1.DT and the decision tree with different imputation methods on JPNBDS

Samples                Models          Imputation   Sen      Spe      Acc      AUC
All test samples       ABM1.DT         Non-Imp      0.8814   0.7627   0.8220   0.9113
                                       knnA         0.9153   0.7627   0.8390   0.9168
                                       AM           0.9153   0.7627   0.8390   0.9168
                                       GCM          0.8814   0.7627   0.8220   0.9099
                       Decision tree   Non-Imp      0.8475   0.8305   0.8390   0.8115
                                       knnA         0.8644   0.7966   0.8305   0.8165
                                       AM           0.8644   0.7966   0.8305   0.8165
                                       GCM          0.8644   0.7966   0.8305   0.8165
Test samples with MV   ABM1.DT         Non-Imp      0.8182   0.7143   0.7778   –
                                       knnA         1.0000   0.5714   0.8333   –
                                       AM           1.0000   0.5714   0.8333   –
                                       GCM          0.8182   0.7143   0.7778   –
                       Decision tree   Non-Imp      0.8182   0.7143   0.7778   –
                                       knnA         0.8182   0.7143   0.7778   –
                                       AM           0.8182   0.7143   0.7778   –
                                       GCM          0.8182   0.7143   0.7778   –

Table 9
shows the results of the Wilcoxon signed-rank test on Acc from ABM1.DT on all test samples between each pair of imputation methods. It shows that the Acc performance of both knnA and AM is significantly greater than that of non-imputation for ABM1.DT at significance level α = 0.10. However, for DT, the Acc of the three imputation methods shows no significant difference from non-imputation under the Wilcoxon signed-rank test.
The number of correctly classified observations with missing values in the test set of JPNBDS by ABM1.DT under Non-Imp and AM is shown in Fig. 8. There are a total of 18 observations with MV. ABM1.DT fails only four times with Non-Imp and three times with AM, which explains their Acc of 0.7778 (14/18) and 0.8333 (15/18), respectively, in Table 8.
Table 10 lists the average performance of all seven methods on JPNBDS. ABM1.DT
achieves the best AUC performance with various imputation methods. Table 11 lists
the average performance of all these methods on the test samples with missing values.
Fig. 8 Number of observations with missing values hit or failed by ABM1.DT with Non-Imp and AM on
test set of JPNBDS
The best performance for each measure is marked in bold in Tables 10 and 11. Since there are only 18 observations with missing values in the test samples, one more correctly classified observation raises the performance by more than 5 %. For JPNBDS, all models have better performance on Sen than on Spe, which means the models do better on non-bankrupt companies than on bankrupt companies.
The average ranks of AUC performance on all test samples for all seven models are shown in Table 12. ABM1.DT has the best AUC performance. The Friedman test was conducted with rows representing blocks and classifiers representing treatments. The p-values of the Friedman test on AUC in Table 10 and Acc in Table 11 among the seven classifiers are 2.42 × 10^−9 and 0.1769, respectively. Given the significance level α = 0.10, there is a significant difference among the seven models on AUC from all test samples but no difference on Acc from test samples with missing values. Therefore, the Nemenyi test can be used to compare each pair of models listed in Table 12. If α = 0.10 and N = 3 × 5 = 15, then q_α = 2.693 and CD = 2.12. ABM1.DT has the top performance on AUC with AR = 1.38. Any model with average rank less than 1.38 + 2.12 = 3.50 has no significant difference from ABM1.DT. Therefore, among the other six models, only LR, with AR = 3.33, shows no significant difference in AUC performance from ABM1.DT.
The average ranks of Acc performance on test samples with missing values for all seven models are shown in Table 13. Although these models have different average ranks, the Friedman test shows the differences are not statistically significant.
From the experimental results, it can be observed that ABM1.DT achieves the highest levels of AUC and Acc performance on both data sets despite their large differences in sample size. For all the test years in both data sets, the average ranks of ABM1.DT
Table 11 Average performance of the seven models on test samples with missing values from JPNBDS

Models      knnA                      AM                        GCM
            Sen     Spe     Acc       Sen     Spe     Acc       Sen     Spe     Acc
ABM1.DT     1.0000  0.5714  0.8333    1.0000  0.5714  0.8333    0.8182  0.7143  0.7778
DT          0.8182  0.7143  0.7778    0.8182  0.7143  0.7778    0.8182  0.7143  0.7778
ABNN.WCF    1.0000  0.8571  0.9444    1.0000  0.5714  0.8333    1.0000  0.2857  0.7222
ABNN.SWR    1.0000  0.5714  0.8333    1.0000  0.1429  0.6667    1.0000  0.0000  0.6111
LR          0.9091  0.7143  0.8333    0.9091  0.7143  0.8333    0.9091  0.7143  0.8333
LOGR        1.0000  0.5714  0.8333    1.0000  0.5714  0.8333    1.0000  0.5714  0.8333
NN          1.0000  0.5714  0.8333    1.0000  0.5714  0.8333    0.7273  0.5714  0.6667
Table 12 Comparison of average rank on AUC among 7 models on all test samples from JPNBDS
Table 13 Comparison of average rank on Acc among 7 models on test samples with missing values from
JPNBDS
among the seven models are at the top or very close to it. For ABM1.DT on both
data sets, knnA and AM always significantly outperform non-imputation.
6 Conclusion
This paper has investigated three AdaBoost models combined with imputation meth-
ods for bankruptcy prediction with missing data. Each AdaBoost model was tested on
two data sets (USABDS and JPNBDS) with large differences in sample size. The
ABM1.DT algorithm with the knnA or AM imputation method maintains consistently
good performance on observations with or without missing values in both data sets. The
experimental results show that the ABM1.DT algorithm is robust in bankruptcy pre-
diction for companies with or without missing financial data, even when only a few
training samples are available, and that the accuracy of the preprocessing methods is
better than that of the parallel methods implemented with ABM1.DT and decision
trees. This study selected samples in the same way as many other papers: it kept
equal numbers of non-bankrupt and bankrupt companies in the data set, even though
there were many more non-bankrupt companies in the original data. Excluding most
non-bankrupt companies from the training and test samples wastes data. Can the
non-bankrupt companies excluded from the sample set help to improve prediction
performance? How can the whole data set be exploited in bankruptcy prediction?
These two problems need further research.
Acknowledgments We thank the anonymous reviewer for the very helpful and valuable suggestions and
comments that improved this article. This work is partially supported by the Faculty Research Grants of
Macau University of Science and Technology (No. 0497) and the National Natural Science Foundation of
China (NSFC No. 71433001).
References
Ahn, H., Lee, K., & Kim, K. J. (2006). Global optimization of support vector machines using genetic
algorithms for bankruptcy prediction. Lecture Notes in Computer Science, 4234, 420–429.
Alfaro, E., García, N., Gámez, M., & Elizondo, D. (2008). Bankruptcy forecasting: An empirical comparison
of AdaBoost and neural networks. Decision Support Systems, 45, 110–122.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
Journal of Finance, 23, 589–609.
Beaver, W. H. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 4, 71–111.
Bergstra, J., Casagrande, N., Erhan, D., Eck, D., & Kégl, B. (2006). Aggregate features and AdaBoost for
music classification. Machine Learning, 65, 473–484.
Chaudhuri, A., & De, K. (2011). Fuzzy support vector machine for bankruptcy prediction. Applied Soft
Computing, 11, 2472–2486.
Chava, S., & Jarrow, R. A. (2004). Bankruptcy prediction with industry effects. Review of Finance, 8,
537–569.
Chen, H.-J., Huang, S. Y., & Lin, C.-S. (2009). Alternative diagnosis of corporate bankruptcy: A neuro
fuzzy approach. Expert Systems With Applications, 36, 7710–7720.
Cheng, K. F., Chu, C. K., & Hwang, R. C. (2010). Predicting bankruptcy using the discrete-time semipara-
metric hazard model. Quantitative Finance, 10, 1055–1066.
Cho, S., Hong, H., & Ha, B.-C. (2010). A hybrid approach based on the combination of variable selec-
tion using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy
prediction. Expert Systems With Applications, 37, 3482–3488.
Dietrich, J. R. (1984). Discussion of methodological issues related to the estimation of financial distress
prediction models. Journal of Accounting Research, 22, 83–86.
Divsalar, M., Javid, M. R., Gandomi, A. H., Soofi, J. B., & Mahmood, M. V. (2011). Hybrid genetic
programming-based search algorithms for enterprise bankruptcy prediction. Applied Artificial Intelli-
gence, 25, 669–692.
Fawcett, T. (2003, January). ROC graphs: Notes and practical considerations for data mining researchers.
HP Laboratories Report.
Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning:
Proceedings of the 13th international conference (pp. 325–332). San Francisco: Morgan Kaufmann.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application
to boosting. Journal of Computer and System Sciences, 55, 119–139.
Gepp, A., Kumar, K., & Bhattacharya, S. (2009). Business failure prediction using decision trees. Journal
of Forecasting, 29, 536–555.
Grzymala-Busse, J., Grzymala-Busse, W., & Goodwin, L. (2002). A comparison of three closest fit
approaches to missing attribute values in preterm birth data. International Journal of Intelligent Sys-
tems, 17, 125–134.
Grzymala-Busse, J. W. (2004). Data with missing attribute values: Generalization of indiscernibility relation
and rule induction. In J. Peters, J. Grzymala-Busse, B. Kostek, R. Swiniarski, & M. Szczuka (Eds.),
Transactions on rough sets I (pp. 78–95). New York: Springer.
Grzymala-Busse, J. W. (2004). Rough set approach to incomplete data. In L. Rutkowski, J. Siekmann,
R. Tadeusiewicz, & L. Zadeh (Eds.), Artificial intelligence and soft computing-ICAISC 2004 (pp.
50–55). New York: Springer.
Han, J., Kamber, M., & Pei, J. (2006). Data mining: Concepts and techniques (2nd ed.). San Francisco:
Morgan Kaufmann.
Härdle, W., Lee, Y. J., Schäfer, D., & Yeh, Y. R. (2009). Variable selection and oversampling in the use of
smooth support vector machines for predicting the default risk of companies. Journal of Forecasting,
28, 512–534.
Hwang, R.-C., Cheng, K. F., & Lee, J. C. (2007). A semiparametric method for predicting bankruptcy.
Journal of Forecasting, 26, 317–342.
Kawakita, M., Minami, M., Eguchi, S., & Lennert-Cody, C. E. (2005). An introduction to the predictive
technique AdaBoost with a comparison to generalized additive models. Fisheries Research, 76, 328–
343.
Lakshminarayan, K., Harp, S. A., & Samad, T. (1999). Imputation of missing data in industrial databases.
Applied Intelligence, 11, 259–275.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Chichester: Wiley.
Maimon, O., & Rokach, L. (2005). Data mining and knowledge discovery handbook. New York: Springer.
Ochs, R. A., Goldin, J. G., Abtin, F., Kim, H. J., Brown, K., Batra, P., et al. (2007). Automated classification
of lung bronchovascular anatomy in CT using AdaBoost. Medical Image Analysis, 11, 315–324.
Park, C. S., & Han, I. (2002). A case-based reasoning with the feature weights derived by analytic hierarchy
process for bankruptcy prediction. Expert Systems with Applications, 23, 255–264.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Francisco: Morgan Kaufmann.
Ravi Kumar, P., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent
techniques—a review. European Journal of Operational Research, 180, 1–28.
Sanchis, A., Segovia, M. J., Gil, J. A., Heras, A., & Vilar, J. L. (2007). Rough sets and the role of the monetary
policy in financial stability (macroeconomic problem) and the prediction of insolvency in insurance
sector (microeconomic problem). European Journal of Operational Research, 181, 1554–1573.
Schwenk, H., & Bengio, Y. (2000). Boosting neural networks. Neural Computation, 12, 1869–1887.
Shumway, T. (2001). Forecasting bankruptcy more accurately: A simple hazard model. Journal of Business,
74, 101–124.
Sun, L., & Shenoy, P. (2007). Using Bayesian networks for bankruptcy prediction: Some methodological
issues. European Journal of Operational Research, 180, 738–753.
Varetto, F. (1998). Genetic algorithms applications in the analysis of insolvency risk. Journal of Banking
and Finance, 22, 1421–1439.
West, D., Dellana, S., & Qian, J. (2005). Neural network ensemble strategies for financial decision appli-
cations. Computers and Operations Research, 32, 2543–2559.
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in
data mining. Knowledge and Information Systems, 14, 1–37.
Zhou, L., & Lai, K. K. (2009). Adaboosting neural networks for credit scoring. In The 6th international
symposium on neural networks (ISNN 2009) (pp. 875–884). New York: Springer.
Zhou, L., Lai, K. K., & Yu, L. (2008). Credit scoring using support vector machines with direct search for
parameters selection. Soft Computing, 13, 149–155.
Zhou, L., Lai, K. K., & Yen, J. (2012). Empirical models based on features ranking techniques for corporate
financial distress prediction. Computers & Mathematics with Applications, 64, 2484–2496.
Zhu, Z., He, H., Starzyk, J. A., & Tseng, C. (2007). Self-organizing learning array and its application to
economic and financial problems. Information Sciences, 177, 1180–1192.