DOI 10.1007/s10614-016-9581-4
1 Introduction
Ligang Zhou (corresponding author)
mrlgzhou@gmail.com
Kin Keung Lai
mskklai@cityu.edu.hk
1 School of Business, Macau University of Science and Technology, Taipa, Macau, China
2 Department of Industrial and Manufacturing Systems Engineering, The University of Hong
Kong, Pokfulam, Hong Kong, China
3 International Business School, Shaanxi Normal University, Xian, China
AdaBoost Models for Corporate Bankruptcy Prediction with...
The paper proceeds as follows. Section 2 provides a review of the literature on bankruptcy prediction. Section 3 describes three different AdaBoost algorithms. Section 4 introduces the missing data imputation methods used here. An empirical study of the models on real-world datasets is presented in Sect. 5. Finally, Sect. 6 presents the conclusion.
2 Literature Review
High prediction accuracy is the most important goal for bankruptcy prediction models.
In many applications, this goal is entirely appropriate (Dietrich 1984). Researchers
strive to improve the prediction accuracy of bankruptcy prediction models along two streams. One stream seeks more effective explanatory variables in terms of financial or accounting knowledge. Beaver (1966) identified thirty ratios considered
to be important factors for predicting corporate bankruptcy and the empirical study
showed that “Cash flow/Total debt”, “Net Income/Total assets”, “Total debt/Total
assets” are the three most effective ratios, achieving more than 80 % prediction
accuracy in terms of one-year-ahead forecasting. Altman (1968) selected five ratios,
employed a multivariate discriminant analysis model, and tested the model on 33
pairs of bankruptcy/non-bankruptcy firms. The model could correctly classify 90 %
of the firms one year prior to failure. The five selected ratios are: “Working capital/Total assets”, “Retained earnings/Total assets”, “EBIT/Total assets”, “Market value equity/Book value of total debt”, and “Sales/Total assets”. Ravi Kumar and Ravi (2007)
presented a comprehensive review of work on bankruptcy prediction between 1968
and 2005. They collected some 500 different ratios that had been used in 128 reviewed
papers. The top 30 variables with high frequency of usage are shown in Table 1.
One reason for the varied sets of variables in bankruptcy prediction models may be, as pointed out by Beaver (1966), that (a) not all ratios predict equally well; and (b) the ratios do not predict bankrupt and non-bankrupt firms with the same degree of success. Nevertheless, research in the fields of finance and accounting shows some degree of agreement on the predictive ability of certain ratios.
Another stream for improving predictive ability is to develop more powerful classification models. Many techniques have been used for developing bankruptcy prediction models with the objective of improving prediction accuracy. These techniques can be
categorized in three groups: (1) statistical techniques; (2) intelligent techniques; and (3)
hybrid and ensemble models. Statistical techniques employed for bankruptcy predic-
tion include: linear discriminant analysis, quadratic discriminant analysis, regression
analysis, naive Bayes classifier, and Bayes network, etc. (Sun and Shenoy 2007; Zhou
et al. 2012). Intelligent techniques applied in bankruptcy prediction include: neural
networks (Alfaro et al. 2008), self-organizing map (Zhu et al. 2007), decision trees
(Gepp et al. 2009), case-based reasoning (Park and Han 2002), evolutionary algorithms
(Varetto 1998), rough set (Sanchis et al. 2007), support vector machines (Härdle et al.
2009), etc. Ravi Kumar and Ravi (2007) provide a comprehensive review of the application of statistical and intelligent techniques to bankruptcy prediction problems from 1968 to 2005.
Much recent research on the development of bankruptcy prediction models focuses
mainly on hybrid models and ensemble models. The main idea of hybrid models is
ter than the GP/OLS model in bankruptcy prediction. Cho et al. (2010) proposed a
hybrid model for bankruptcy prediction with a combination of variables selected by
using a decision tree and case-based reasoning using the Mahalanobis distance with
variable weights. The experimental result indicates that the proposed hybrid model
outperforms some currently-in-use techniques. Chen et al. (2009) proposed a hybrid
neuro fuzzy approach which combines the functionality of fuzzy logic and the learning
ability of neural networks. The empirical results show that the neuro fuzzy approach
has a better accuracy rate than logit regression. Chaudhuri and De (2011) introduced fuzzy support vector machines (FSVM) to solve the bankruptcy prediction problem and demonstrated the efficiency of the FSVM. Other models based on support vector
machines (SVM) mainly focus on combining search techniques for optimizing parameters or input feature sets with the powerful classification capability of SVM.
Ahn et al. (2006) used genetic algorithms (GAs) to optimize the parameters in SVM
kernel functions and features subsets for bankruptcy prediction. Zhou et al. (2008)
introduced a direct search method to optimize parameters in SVM model.
The ensemble model is somewhat different from a hybrid model. The main idea
of ensemble models is to combine a set of base models, each of which is simple and weak, in order to obtain a more powerful model with more accurate and reliable classification than can be obtained from a single model. AdaBoost is a widely used ensemble algorithm that can be employed in conjunction with many other types of learning methods as base learners to improve their performance. West et al. (2005) employed the multilayer perceptron neural network as a base learner and investigated three ensemble strategies: cross-validation, bagging, and boosting. The
neural network ensemble is found to be superior to the single model for three real world
financial decision applications. Alfaro et al. (2008) provided an empirical comparison
of AdaBoost and neural network for bankruptcy prediction. The prediction accuracy
of both techniques on a set of European firms shows that the proposed AdaBoost
approach can decrease the generalization error by about thirty percent compared to
the error produced with a neural network.
In the development of bankruptcy prediction models, few researchers have addressed the issue of missing feature values; most have simply deleted the observations with missing feature values and considered only observations with complete predictor values (Cheng et al. 2010; Hwang et al. 2007). However, Shumway (2001) pointed out that a complete set of explanatory variables is not always observable for each firm year, and he substituted variable values from past years for missing values in some cases. Chava and Jarrow (2004) also handle missing accounting and market data by substituting the previously available observations. Little existing research discusses the performance of bankruptcy prediction models on observations with missing values, or the effect of different missing-value imputation methods.
3 AdaBoost Algorithms
many base learners into one high-quality ensemble predictor, such as uniform voting, distribution summation, Bayesian combination, etc. The elements for construction of AdaBoost include: input features and responses, ensemble methods, base learners, and the number of base learners in an ensemble. Different combinations of these elements yield different AdaBoost models. In the present study, three different AdaBoost models are employed.
For the bankruptcy prediction problem, let S = {(x_n, y_n)}_{n=1}^N denote the training data set, where each input vector x_n ∈ R^m (m is the number of features) and y_n ∈ {+1, −1} is its corresponding response.
3.1 AdaBoost.M1
Step 1: initialize the weight of each sample in S to 1/N, i.e. D_1(n) = 1/N, n = 1, 2, …, N; t = 1;
Step 2: for t = 1 to T
2.1 build classifier C_t using the base learner and distribution D_t
2.2 compute the weighted error ε_t from model C_t on S as Eq. (1):

    ε_t = Σ_{n=1}^{N} D_t(n) × err_t(x_n)    (1)

where err_t(x_n) = 1 if F_t(x_n) ≠ y_n and 0 otherwise.
    D_{t+1}(n) = D_t(n) / (2(1 − ε_t))   if F_t(x_n) = y_n
    D_{t+1}(n) = D_t(n) / (2ε_t)         otherwise,    n = 1, 2, …, N.    (3)

end if
2.4 normalize D_{t+1} to be a proper distribution:

    D_{t+1}(n) = D_{t+1}(n) / Σ_{n=1}^{N} D_{t+1}(n),    n = 1, 2, …, N.    (4)
2.5 t = t + 1
end for
Following the above method, a set of base learners C_t, which actually defines a set of functions {F_t | t = 1, 2, …, T}, can be obtained, and the final decision from this set of functions (the AdaBoost model) is defined as (5):

    ŷ(x) = argmax_{y ∈ {+1, −1}} Σ_{t: F_t(x) = y} log(1/β_t)    (5)

where

    β_t = ε_t / (1 − ε_t)    (6)
If a decision tree is selected as the base learner, the AdaBoost.M1 model is denoted by ABM1.DT.
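The AdaBoost.M1 loop above can be sketched in Python. This is a minimal illustration, not the paper's Matlab implementation: it uses depth-1 decision stumps as the base learner, and all function names are hypothetical.

```python
import numpy as np

def stump_train(X, y, D):
    """Weighted decision stump: pick the (feature, threshold, polarity)
    minimizing the weighted error under the distribution D."""
    best = (0, 0.0, 1, 1.0)  # (feature j, threshold, polarity, weighted error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                err = float(np.sum(D * (pred != y)))
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def stump_predict(X, stump):
    j, thr, pol, _ = stump
    return np.where(pol * (X[:, j] - thr) > 0, 1, -1)

def adaboost_m1(X, y, T=20):
    N = X.shape[0]
    D = np.full(N, 1.0 / N)                    # Step 1: uniform initial weights
    ensemble = []
    for _ in range(T):
        stump = stump_train(X, y, D)           # 2.1 build C_t with distribution D_t
        pred = stump_predict(X, stump)
        eps = float(np.sum(D * (pred != y)))   # 2.2 weighted error, Eq. (1)
        if eps >= 0.5:                         # base learner no better than chance: stop
            break
        beta = max(eps, 1e-10) / (1.0 - eps)   # Eq. (6), floored to avoid log(1/0)
        ensemble.append((stump, beta))
        if eps == 0.0:                         # perfect base learner: stop early
            break
        # Eq. (3): rescale weights of the correctly / incorrectly classified groups
        D = np.where(pred == y, D / (2.0 * (1.0 - eps)), D / (2.0 * eps))
        D = D / D.sum()                        # Eq. (4): renormalize
    return ensemble

def adaboost_predict(X, ensemble):
    """Eq. (5): weighted vote, each learner votes with weight log(1/beta_t)."""
    score = np.zeros(X.shape[0])
    for stump, beta in ensemble:
        score += np.log(1.0 / beta) * stump_predict(X, stump)
    return np.where(score >= 0, 1, -1)
```

Note that the update in Eq. (3) halves the total weight carried by the correctly classified group and by the misclassified group, which is what the two branches above implement.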
AdaBoosting neural networks (AdNN) use a neural network instead of a decision tree as the weak classifier in the traditional AdaBoost model, and are expected to generalize better than a single model. Schwenk and Bengio (2000) reported that AdNN is significantly more accurate than boosted decision trees on a data set of online handwritten digit recognition.
There are two ways to deal with the weighted instances. One is to sample with replacement (SWR) according to the weight distribution: samples with greater weight may be sampled several times, while those with less weight may not occur in the training sample sets at all. Another way is to train the base learner with respect to a weighted cost function (WCF) which assigns a larger weight to the incorrectly classified instances. The AdaBoosting neural network with SWR is denoted by ABNN.SWR, and that with WCF by ABNN.WCF.
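The SWR option can be illustrated with a short numpy sketch (a hypothetical helper, not from the paper): indices are drawn with replacement according to the current weight distribution D_t, so heavily weighted samples tend to appear several times while lightly weighted ones may be absent.

```python
import numpy as np

def sample_with_weights(D, size=None, seed=None):
    """Draw `size` indices with replacement; index n is drawn with probability D[n]."""
    rng = np.random.default_rng(seed)
    n = len(D)
    return rng.choice(n, size=n if size is None else size, replace=True, p=D)

# Example: a sample with weight 0.7 dominates the resampled training set
D = np.array([0.7, 0.1, 0.1, 0.1])
idx = sample_with_weights(D, seed=0)
```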
The details of the ABNN.SWR algorithm are as follows:
Input:
S, a set of samples for training with size N ;
T , the number of rounds to construct the AdaBoosting model;
C, base learner neural network;
Output: AdaBoost ABNN.SWR model
Algorithm: ABNN.SWR
Step 1: initialize the weight of each sample in S to 1/N , i.e. D1 (n) = 1/N ,
n = 1, 2, . . . , N ;
Step 2: for t = 1 to T
repeat
2.1 sample S with replacement according to Dt to obtain St ;
2.2 train neural network with St to obtain classifier model Ct ;
2.3 compute the weighted error ε_t from model C_t on S as (7):

    ε_t = Σ_{n=1}^{N} D_t(n) × err_t(x_n)    (7)

    err_t(x_n) = 1 if F_t(x_n) ≠ y_n, 0 otherwise

2.4 update the weight distribution:

    D_{t+1}(n) = D_t(n) × β_t   if F_t(x_n) = y_n
    D_{t+1}(n) = D_t(n)         otherwise,    n = 1, 2, …, N.    (8)

2.5 normalize the weights to a proper distribution:

    D_{t+1}(n) = D_{t+1}(n) / Σ_{n=1}^{N} D_{t+1}(n),    n = 1, 2, …, N.    (9)
[Fig. 1 Structure of the neural network base learner: input nodes and two output nodes, P_t^b and P_t^n]
where (9) normalizes the weights so that D_{t+1} is a probability distribution function, and β_t is obtained by formula (10):

    β_t = ε_t / (1 − ε_t)    (10)
end if
end for
Following the above steps, a set of neural network classifiers C_t, which actually defines a set of functions {F_t | t = 1, 2, …, T}, can be obtained, and the final decision from this set of functions is defined as (5).
From Formula (8), it can be observed that the weights of correctly classified samples decrease while the weights of misclassified samples increase after normalization. The
structure of the neural network is shown in Fig. 1: two nodes in the output layer which
indicate the probability that the company is non-bankrupt and bankrupt, denoted by
Ptn , Ptb respectively. For a non-bankrupt company, the completely correct output of
the neural network should be [1 0] and for a bankrupt company, it should be [0 1]
indicating that the probability for it to be non-bankrupt is 0 and to be bankrupt is 1.
The output function F_t(x) can be defined as (11):

    F_t(x) = +1 if P_t^b > P_t^n, −1 otherwise    (11)
In the ABNN.SWR algorithm, the training data set for each neural network is obtained by resampling S with replacement based on the weight distribution function D_t. Another way is to combine D_t with the cost function used in network training, which guides the optimization of the weights in the neural network. This AdaBoost neural network algorithm, ABNN.WCF, is described as follows (Zhou and Lai 2009):
Input:
S, a set of samples for training with size N ;
T , the number of rounds to construct the AdaBoosting model;
C, base learner neural network;
Output: AdaBoost ABNN.WCF model
Algorithm: ABNN.WCF
Step 1: initialize the weight of each sample in S to 1/N , i.e. D1 (n) = 1/N , n =
1, 2, . . . , N ;
Step 2: for t = 1 to T
repeat
2.1 train the neural network with Levenberg-Marquardt backpropagation on S with respect to the weight distribution D_t to obtain model C_t, which corresponds to the function F_t.
2.2 compute the weighted error ε_t from model C_t on S as Formula (12):

    ε_t = Σ_{n=1}^{N} D_t(n) × (1/2)(1 − (P_t^n(x_n) − P_t^b(x_n)) × y_n)    (12)

When y_n = 1, the completely correct output of the NN should be [1 0]; the actual output is [P_t^n  P_t^b], so the per-sample term is half the sum of the errors from the two output neurons, i.e. (1/2)((1 − P_t^n) + P_t^b). The same goes for y_n = −1.
2.3 if εt > 0.5, then set Dt (n) = 1/N , n = 1, 2, . . . , N ;
until εt < 0.5
if εt = 0, then set T = t, break;
else
update the weight function D_{t+1}(n) by Formulas (13) and (14):

    D_{t+1}(n) = D_t(n) × β_t^{(1/2)(1 + (P_t^n(x_n) − P_t^b(x_n)) × y_n)},    n = 1, 2, …, N.    (13)

    D_{t+1}(n) = D_{t+1}(n) / Σ_{n=1}^{N} D_{t+1}(n),    n = 1, 2, …, N.    (14)

where (14) normalizes the weights so that D_{t+1} is a probability distribution function, and β_t is obtained by formula (15):

    β_t = ε_t / (1 − ε_t)    (15)
end if
end for
The final decision function is the same as shown in Formula (5). Since 0 < ε_t < 0.5, we have 0 < β_t < 1, so the weights of correctly classified samples are reduced by the factor β_t, while the weights of misclassified samples increase after normalization.
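As an illustration of the WCF idea, a weighted squared-error cost can be sketched as below. This is a deliberate simplification (the paper trains with Levenberg-Marquardt in Matlab); the function name is hypothetical, and the distribution D scales each sample's contribution to the cost.

```python
import numpy as np

def weighted_cost(targets, outputs, D):
    """Squared-error cost weighted by the AdaBoost distribution D:
    samples with larger weight contribute more to the training error."""
    per_sample = np.sum((targets - outputs) ** 2, axis=1)  # sum over the two output neurons
    return float(np.sum(D * per_sample))
```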
4 Missing Data Imputation Methods

Missing data arise in a wide variety of statistical analyses, especially in the analysis of financial time series, market research and social science research. Many efforts have been made to handle missing data (Maimon and Rokach 2005; Little and Rubin 2002). In this study, three common imputation methods are employed.
The k-nearest neighbors average (knnA) method imputes the missing feature of an instance with the average of the corresponding feature over its k nearest neighbors in the training samples set. Suppose feature i of observation n is missing; the imputation then proceeds as follows:
Step 1: select all observations in the training samples set for which feature i is not missing and which have no missing values among the features that observation n has;
Step 2: calculate the Euclidean distance between observation n and each observation j in the selected samples:

    d(n, j) = sqrt( Σ_{l=1, l≠i}^{m} (x_{n,l} − x_{j,l})² )

Step 3: impute the missing value with the average of feature i over the k nearest neighbors:

    x_{n,i} = (1/k) Σ_{j=1}^{k} x_{j,i}

where j runs over the k closest observations.
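The three steps can be sketched in Python; this is a minimal illustration (the function name is hypothetical) with NaN marking a missing value:

```python
import numpy as np

def knn_average_impute(X, n, i, k=10):
    """Impute missing feature i of row n with the mean of that feature over
    the k nearest rows (Euclidean distance on the features row n has)."""
    obs = ~np.isnan(X[n])      # features that observation n actually has
    obs[i] = False             # exclude the feature being imputed
    # Step 1: candidates must have feature i and no missing value among obs
    cand = [j for j in range(X.shape[0])
            if j != n and not np.isnan(X[j, i]) and not np.isnan(X[j][obs]).any()]
    # Step 2: Euclidean distance over the shared observed features
    dist = [np.sqrt(np.sum((X[n][obs] - X[j][obs]) ** 2)) for j in cand]
    # Step 3: average feature i over the k nearest candidates
    nearest = [cand[t] for t in np.argsort(dist)[:k]]
    return float(np.mean([X[j, i] for j in nearest]))
```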
The global closest fit (GCM) method replaces a missing feature value with the known value in another observation that resembles as much as possible the instance with the missing attribute values (Grzymala-Busse et al. 2002). The closest fit case is the instance which has the minimum distance to the instance with the missing values. The distance function for two instances n, j with feature vectors x_n and x_j respectively is computed as follows (Maimon and Rokach 2005):
    d(n, j) = Σ_{i=1}^{m} d(x_{n,i}, x_{j,i})

where

    d(x_{n,i}, x_{j,i}) = 0                          if x_{n,i} = x_{j,i}
    d(x_{n,i}, x_{j,i}) = 1                          if x_{n,i} and x_{j,i} are symbolic and x_{n,i} ≠ x_{j,i}, or x_{n,i} or x_{j,i} is unknown
    d(x_{n,i}, x_{j,i}) = |x_{n,i} − x_{j,i}| / r    if x_{n,i} and x_{j,i} are numerical and x_{n,i} ≠ x_{j,i}

and r is the range of the numerical feature, ignoring missing values.
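A small sketch of this distance and the closest-fit lookup, assuming None marks an unknown value and symbolic features are strings (all names are hypothetical):

```python
MISSING = None  # sentinel for an unknown value in this sketch

def feature_distance(a, b, r):
    """Per-feature distance of the global closest fit method."""
    if a is MISSING or b is MISSING:
        return 1.0                    # unknown value: maximal penalty
    if a == b:
        return 0.0
    if isinstance(a, str) or isinstance(b, str):
        return 1.0                    # differing symbolic values
    return abs(a - b) / r             # differing numerical values, scaled by range r

def gcf_distance(x, y, ranges):
    """Total distance: sum of per-feature distances."""
    return sum(feature_distance(a, b, r) for a, b, r in zip(x, y, ranges))

def closest_fit(target, candidates, ranges):
    """Instance with minimum distance to `target`; its known value fills the gap."""
    return min(candidates, key=lambda c: gcf_distance(target, c, ranges))
```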
5 Empirical Study
[Fig. 2 Number of bankruptcies by observed year, 1981-2009]
[Fig. 3 Number of bankrupt and non-bankrupt instances with missing values by observed year, 1981-2009]
The input features used in this study from Table 1 are R1–R7, R10, R11 and R14, which include the five important features found by Altman (1968), i.e. R3–R6 and R7.
Figure 2 gives the number of bankruptcies by observed year over the sample period. As shown in Fig. 2, the number of bankruptcies is cyclical, with the largest numbers occurring in the late 1980s, the early 1990s, and 1997-1998. Although the sub-prime crisis started in the USA in 2007, there are only a few bankrupt firms after 2007 in this data set, since most bankrupt firms were in financial industries and some firms were officially declared bankrupt only after 2009. Finally, there are a total of 1168 bankrupt companies and 1168 non-bankrupt companies in USABDS, and the number of bankrupt and non-bankrupt companies is the same in each year.
Figure 3 shows the number of bankrupt or non-bankrupt firms with missing values
on the selected features by observed year over the sample period in USABDS. There
were a total of 183 (15.7 %) non-bankrupt firms with missing values and 244 (20.9 %)
bankrupt firms with missing values. As seen, the number of companies containing missing values decreased over the period, perhaps because corporate governance had improved and corporate finance had become more standardized. However, irregular management of finance and accounting is very common in small and medium enterprises (SMEs) in emerging countries, so missing-value problems may be common when predicting the financial bankruptcy of SMEs as suppliers, partners or borrowers.
The following four common performance measures are selected to evaluate the models:

1. Sensitivity (Sen) = TP / (TP + FN)
2. Specificity (Spe) = TN / (TN + FP)
3. Accuracy (Acc) = (TP + TN) / (TP + FN + TN + FP), where TP: positives classified as positive, FN: positives classified as negative, TN: negatives classified as negative, FP: negatives classified as positive.
4. Area under the ROC curve (AUC): ROC graphs are two-dimensional graphs in which Sensitivity is plotted on the Y axis and 1 − Specificity on the X axis. An ROC graph depicts the relative trade-off between benefits (true positives) and costs (false positives), which is useful for organizing classifiers and visualizing their performance, especially in domains with skewed class distributions and unequal classification error costs. The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance (Fawcett 2003).
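These four measures can be computed directly from the confusion counts; the sketch below (hypothetical helper names) also implements AUC through its rank interpretation, counting ties as one half:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """TP, FN, TN, FP with +1 as the positive class and -1 as the negative class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    return tp, fn, tn, fp

def sen_spe_acc(y_true, y_pred):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sen, spe, acc

def auc(y_true, scores):
    """Probability that a random positive is ranked above a random negative
    (ties count 1/2), matching Fawcett's rank interpretation of AUC."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```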
Each observation includes a company's financial status in year t and its financial ratios in year t − 1. A rolling time window is used to test the performance of the models and imputation methods. For example, to test model performance for observations in year 2005, the test samples set consists of all observations in 2005, while all observations in 2004 and earlier constitute the training samples set. There were a total of 143 bankrupt firms and 143 non-bankrupt firms in the test sets from year 2001 to 2009, of which 24 (16.78 %) bankrupt firms and 14 (9.79 %) non-bankrupt firms had missing values. In this experiment, ABM1.DT selects a decision tree as the base learner. The size of the decision tree in ABM1.DT is controlled by three parameters: (1) MaxNumSplits, the maximal number of decision splits; (2) MinLeafSize, the minimum number of observations per leaf; and (3) MinParentSize, the minimum number of observations per branch node. These three parameters are set to the Matlab default values of 1, 5 and 10, respectively. ABNN.WCF and ABNN.SWR use a simple multilayer perceptron neural network with four nodes in the hidden layer, implemented with the neural network toolbox in Matlab. The number of rounds T is set to 100, and k is set to 10 in the knnA imputation method.
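The rolling-time-window protocol can be sketched as follows, assuming observations are grouped by observed year (a simplified illustration, not the paper's code):

```python
def rolling_time_windows(data_by_year, test_years):
    """For each test year y, train on every observation strictly before y
    and test on the observations of year y (the rolling-window protocol)."""
    for y in sorted(test_years):
        train = [obs for yr, rows in sorted(data_by_year.items()) if yr < y
                 for obs in rows]
        test = data_by_year.get(y, [])
        if train and test:              # skip years with no usable split
            yield y, train, test
```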
Since ABM1.DT and the decision tree (DT) can handle missing values directly,
to see the effect of the imputation method, the performance of these two models
on USABDS with different imputation methods (including no imputation) is shown in Table 2. Each figure is the weighted average, in terms of the number of test samples of different classes, in each test year. In Table 2, Non-Imp denotes that no imputation method is employed. The row “Test samples with MV” lists the average performance on the 38 observations with missing values. The observations with missing values in each of the test years number no more than ten, which makes it almost impossible to obtain a smooth ROC curve; the AUC therefore becomes unreliable and is not presented for test samples with MV.
In Table 2, the best performance for each measure is marked in bold. It is interesting
to observe that when imputation is employed on the observations with missing values,
the performance of Acc from ABM1.DT and Decision Tree on the test samples with
missing values increases considerably, i.e. the imputation can help these two models to
classify bankrupt firms with missing values at a higher accuracy. The best classification accuracy on test samples with MV, 89.47 %, is achieved by ABM1.DT with AM, which at the same time achieves a classification accuracy of 72.73 % on all test samples.
From Table 2, it can be observed that the three imputation methods greatly improve the predictive accuracy in comparison with no imputation. Table 3 shows the results of the Wilcoxon signed-rank test on Acc from the decision tree on test samples with missing values between each pair of imputation methods. It demonstrates that the performance of the knnA imputation method is significantly greater than that of non-imputation; however, there is no statistically significant difference between knnA and the other two imputation methods at significance level α = 0.05.
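A comparison of this kind can be reproduced with scipy's Wilcoxon signed-rank test; the per-year accuracy values below are made-up placeholders, not the paper's data:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-test-year Acc values for two settings (NOT the paper's numbers)
acc_knnA    = np.array([0.82, 0.79, 0.85, 0.81, 0.84, 0.80, 0.83, 0.78])
acc_non_imp = np.array([0.81, 0.77, 0.82, 0.77, 0.79, 0.74, 0.76, 0.70])

# Paired, two-sided test on the per-year differences
stat, p_value = wilcoxon(acc_knnA, acc_non_imp)
```

A small p-value indicates that the paired accuracies differ systematically across test years, which is how the paper compares imputation methods.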
The number of correctly classified non-bankrupt and bankrupt samples with MV in
each year by ABM1.DT under Non-Imp and AM is shown in Figs. 4 and 5, respectively.
Regarding the samples with missing values in the test set, there is no observation in
year 2007 and no non-bankrupt observations from year 2004 to 2008. There are a total
of 14 non-bankrupt observations with MV and 24 bankrupt observations with MV.
Fig. 4 Number of non-bankrupt observations with missing values hit or failed by ABM1.DT with Non-Imp
and AM on USABDS
ABM1.DT with both Non-Imp and AM can correctly predict 23 bankrupt observations
and this number explains their Spe of 0.9583 (23/24) in Table 2.
To compare the AdaBoost models with other methods, Table 4 lists the average performance on all test samples of the AdaBoost models and four other widely used methods: a decision tree, linear regression (LR), logistic regression (LOGR), and a neural network (NN). The neural network here is a multilayer network trained with backpropagation from the Matlab neural network toolbox. It can be observed that the AdaBoost models always achieve the best Acc and AUC performance, except that NN achieves a slightly greater Acc than ABNN.SWR under GCM, although which AdaBoost model is best shifts across imputation methods. Table 5 lists the average performance of all these methods on the test samples with missing values. The best performance for each measure is marked in bold in Tables 4 and 5.
Fig. 5 Number of bankrupt observations with missing values hit or failed by ABM1.DT with Non-Imp
and AM on USABDS
To make a statistical comparison among the classifiers, the data from the three imputation methods on test samples from observed years 2001 to 2009 are regarded as different data groups (there is no test sample in year 2007). For each group, the performance of the seven classifiers is ranked. The Friedman test is conducted with rows representing blocks and classifiers representing treatments. The p-values of the Friedman test on AUC in Table 4 and Acc in Table 5 among the seven classifiers are 5.30 × 10^−14 and 0.0702, respectively. Given the significance level α = 0.10, the null hypothesis that all the classifiers are equivalent is rejected. The Nemenyi test is then used to compare all classifiers with each other. The performance of two classifiers is significantly different if the corresponding average ranks differ by at least the critical difference

    CD = q_α × sqrt( k(k + 1) / (6N) )
where q_α is the critical value for the two-tailed Nemenyi test with significance level α, k is the number of models, and N is the number of data sets.
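The critical difference formula is easy to check numerically; the sketch below reproduces the two CD values used in the text:

```python
import math

def critical_difference(q_alpha, k, N):
    """Nemenyi critical difference: CD = q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * N))

# Values used in the text: seven models; 3 imputation methods x 8 test years = 24
# groups for USABDS, and 3 x 5 = 15 groups for JPNBDS
cd_usabds = critical_difference(2.693, 7, 24)   # about 1.68
cd_jpnbds = critical_difference(2.693, 7, 15)   # about 2.12
```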
The average ranks (AR) of AUC performance on all test samples for all seven models are shown in Table 6. ABNN.SWR has the best AUC performance, with an average rank of 2.56. If the significance level α is set at 0.10, then q_α = 2.693, N = 3 × 8 = 24, and CD is 1.68. Any model with average rank less than 2.56 + 1.68 = 4.24 has no
Table 6 Comparison of average rank on AUC among seven models on all test samples from USABDS
Table 7 Comparison of average rank on Acc among seven models on test samples with missing values
from USABDS
companies’ financial statements. Only non-financial firms are included and bankruptcy
prediction is based on the financial data released just one year before the year in
question. JPNBDS includes samples with observed financial status (non-bankrupt or bankrupt) from 1995 to 2009.
Figure 6 gives the number of bankruptcies by observed year over the sample period.
There are a total of 76 bankrupt observations and 76 non-bankrupt observations in the
JPNBDS and the number of bankrupt and non-bankrupt observations is the same each
year.
[Fig. 6 Number of bankruptcies by observed year in JPNBDS, 1995-2009]
[Fig. 7 Number of bankrupt and non-bankrupt instances with missing values by observed year in JPNBDS]
JPNBDS has the same input features as USABDS. Figure 7 shows the number of
bankrupt or non-bankrupt firms with missing values by observed year over the sample
period in JPNBDS. There were a total of 11 (14.47 %) non-bankrupt firms with missing
values and 7 (9.21 %) bankrupt firms with missing values.
The model settings for JPNBDS are the same as those for USABDS. The test years are from 2001 to 2009. There were a total of 42 non-bankrupt firms and 42 bankrupt firms in the test samples; 11 of the 42 non-bankrupt observations and 7 of the 42 bankrupt observations had missing values.
Table 8 shows the average performance of the different imputation methods with ABM1.DT and the decision tree. The best performance for each measure is marked in bold. It can be observed that for either the ABM1.DT or the DT model, the performance of the different imputation methods is almost the same. This may be because almost all 18 observations with missing values miss the same group of features, and these features contribute little to the classification function implied by the models. For
JPNBDS, ABM1.DT with knnA or AM can achieve an Acc of 83.3 % on test samples with missing values while achieving an Acc of 83.9 % on all test samples.

Table 8 Average performance of ABM1.DT and the decision tree with different imputation methods on JPNBDS

Samples                Models          Imputation   Sen      Spe      Acc      AUC
All test samples       ABM1.DT         Non-Imp      0.8814   0.7627   0.8220   0.9113
                                       knnA         0.9153   0.7627   0.8390   0.9168
                                       AM           0.9153   0.7627   0.8390   0.9168
                                       GCM          0.8814   0.7627   0.8220   0.9099
                       Decision tree   Non-Imp      0.8475   0.8305   0.8390   0.8115
                                       knnA         0.8644   0.7966   0.8305   0.8165
                                       AM           0.8644   0.7966   0.8305   0.8165
                                       GCM          0.8644   0.7966   0.8305   0.8165
Test samples with MV   ABM1.DT         Non-Imp      0.8182   0.7143   0.7778   –
                                       knnA         1.0000   0.5714   0.8333   –
                                       AM           1.0000   0.5714   0.8333   –
                                       GCM          0.8182   0.7143   0.7778   –
                       Decision tree   Non-Imp      0.8182   0.7143   0.7778   –
                                       knnA         0.8182   0.7143   0.7778   –
                                       AM           0.8182   0.7143   0.7778   –
                                       GCM          0.8182   0.7143   0.7778   –

Table 9
shows the results of the Wilcoxon signed-rank test on Acc from ABM1.DT on all test samples between each pair of imputation methods. It shows that the Acc performance of both knnA and AM is significantly greater than that of non-imputation for ABM1.DT at significance level α = 0.10. However, for DT, the Acc of the three imputation methods shows no significant difference from non-imputation under the Wilcoxon signed-rank test.
The number of correctly classified observations with missing values in the test set of JPNBDS by ABM1.DT under Non-Imp and AM is shown in Fig. 8. There are a total of 18 observations with MV. ABM1.DT fails only four times with Non-Imp and three times with AM, which explains their Acc of 0.7778 (14/18) and 0.8333 (15/18), respectively, in Table 8.
Table 10 lists the average performance of all seven methods on JPNBDS. ABM1.DT
achieves the best AUC performance with various imputation methods. Table 11 lists
the average performance of all these methods on the test samples with missing values.
Fig. 8 Number of observations with missing values hit or failed by ABM1.DT with Non-Imp and AM on
test set of JPNBDS
The best performance for each measure is marked in bold in Tables 10 and 11. Since there are only 18 observations with missing values in the test samples, one more correctly classified observation raises the performance by more than 5 %. For JPNBDS, all models have better performance on Sen than on Spe, which means the models do better on non-bankrupt companies than on bankrupt companies.
The average ranks of AUC performance on all test samples for all seven models are shown in Table 12. ABM1.DT has the best AUC performance. The Friedman test was conducted with rows representing blocks and classifiers representing treatments. The p-values of the Friedman test on AUC in Table 10 and Acc in Table 11 among the seven classifiers are 2.42 × 10^−9 and 0.1769, respectively. Given the significance level α = 0.10, there is a significant difference among the seven models on AUC from all test samples but no difference on Acc from test samples with missing values. Therefore, the Nemenyi test can be used to compare each pair of models listed in Table 12. If α = 0.10 and N = 3 × 5 = 15, then q_α = 2.693 and CD = 2.12. ABM1.DT has the top performance on AUC with AR = 1.38. Any model with average rank less than 1.38 + 2.12 = 3.50 has no significant difference from ABM1.DT. Therefore, among the other six models, only LR, with AR = 3.33, shows no significant difference in AUC performance from ABM1.DT.
The average ranks of Acc performance on test samples with missing values for all seven models are shown in Table 13. Although these models have different average ranks, the Friedman test shows the differences are not statistically significant.
From the experimental results, it can be observed that ABM1.DT achieves the highest levels of AUC and Acc performance on both data sets despite their large differences in sample size. For all the test years in both data sets, the average ranks of ABM1.DT
Table 11 Average performance of the seven models on test samples with missing values from JPNBDS

Models      knnA                      AM                        GCM
            Sen     Spe     Acc       Sen     Spe     Acc       Sen     Spe     Acc
ABM1.DT     1.0000  0.5714  0.8333    1.0000  0.5714  0.8333    0.8182  0.7143  0.7778
DT          0.8182  0.7143  0.7778    0.8182  0.7143  0.7778    0.8182  0.7143  0.7778
ABNN.WCF    1.0000  0.8571  0.9444    1.0000  0.5714  0.8333    1.0000  0.2857  0.7222
ABNN.SWR    1.0000  0.5714  0.8333    1.0000  0.1429  0.6667    1.0000  0.0000  0.6111
LR          0.9091  0.7143  0.8333    0.9091  0.7143  0.8333    0.9091  0.7143  0.8333
LOGR        1.0000  0.5714  0.8333    1.0000  0.5714  0.8333    1.0000  0.5714  0.8333
NN          1.0000  0.5714  0.8333    1.0000  0.5714  0.8333    0.7273  0.5714  0.6667
Table 12 Comparison of average rank on AUC among 7 models on all test samples from JPNBDS
Table 13 Comparison of average rank on Acc among 7 models on test samples with missing values from
JPNBDS
among the seven models are at the top or very close to it. For ABM1.DT on both
data sets, knnA and AM always significantly outperform non-imputation.
6 Conclusion
This paper has investigated three AdaBoost models combined with imputation meth-
ods for bankruptcy prediction with missing data. Each AdaBoost model was tested on
two data sets (USABDS and JPNBDS) with large differences in sample size. The
ABM1.DT algorithm with the knnA or AM imputation method maintains consistently
good performance on observations with or without missing values in both data sets. The
experimental results show that the ABM1.DT algorithm is robust in bankruptcy pre-
diction for companies with or without missing financial data, even when only a few
training samples are available, and that the accuracy of the preprocessing methods is
better than that of the parallel methods implemented with ABM1.DT and decision
trees. This study selected samples in the same way as many other papers: it kept
equal numbers of non-bankrupt and bankrupt companies in the data set, even though
there were many more non-bankrupt companies in the original data. Excluding most
non-bankrupt companies from the training and test samples wastes data. Can the
non-bankrupt companies excluded from the sample set help to improve prediction
performance? How can the whole data set be exploited in bankruptcy prediction?
These two problems need further research.
Acknowledgments We thank the anonymous reviewer for the very helpful and valuable suggestions and
comments that improved this article. This work is partially supported by the Faculty Research Grants of
Macau University of Science and Technology (No. 0497) and the National Natural Science Foundation of
China (NSFC No. 71433001).
References
Ahn, H., Lee, K., & Kim, K. J. (2006). Global optimization of support vector machines using genetic
algorithms for bankruptcy prediction. Lecture Notes in Computer Science, 4234, 420–429.
Alfaro, E., García, N., Gámez, M., & Elizondo, D. (2008). Bankruptcy forecasting: An empirical comparison
of AdaBoost and neural networks. Decision Support Systems, 45, 110–122.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
Journal of Finance, 23, 589–609.
Beaver, W. H. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 4, 71–111.
Bergstra, J., Casagrande, N., Erhan, D., Eck, D., & Kégl, B. (2006). Aggregate features and AdaBoost for
music classification. Machine Learning, 65, 473–484.
Chaudhuri, A., & De, K. (2011). Fuzzy support vector machine for bankruptcy prediction. Applied Soft
Computing, 11, 2472–2486.
Chava, S., & Jarrow, R. A. (2004). Bankruptcy prediction with industry effects. Review of Finance, 8,
537–569.
Chen, H.-J., Huang, S. Y., & Lin, C.-S. (2009). Alternative diagnosis of corporate bankruptcy: A neuro
fuzzy approach. Expert Systems With Applications, 36, 7710–7720.
Cheng, K. F., Chu, C. K., & Hwang, R. C. (2010). Predicting bankruptcy using the discrete-time semipara-
metric hazard model. Quantitative Finance, 10, 1055–1066.
Cho, S., Hong, H., & Ha, B.-C. (2010). A hybrid approach based on the combination of variable selec-
tion using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy
prediction. Expert Systems With Applications, 37, 3482–3488.
Dietrich, J. R. (1984). Discussion of methodological issues related to the estimation of financial distress
prediction models. Journal of Accounting Research, 22, 83–86.
Divsalar, M., Javid, M. R., Gandomi, A. H., Soofi, J. B., & Mahmood, M. V. (2011). Hybrid genetic
programming-based search algorithms for enterprise bankruptcy prediction. Applied Artificial Intelli-
gence, 25, 669–692.
Fawcett, T. (2003, January). ROC graphs: Notes and practical considerations for data mining researchers.
HP Laboratories Report.
Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning:
Proceedings of the 13th international conference (pp. 325–332). San Francisco: Morgan Kaufmann.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application
to boosting. Journal of Computer and System Sciences, 55, 119–139.
Gepp, A., Kumar, K., & Bhattacharya, S. (2009). Business failure prediction using decision trees. Journal
of Forecasting, 29, 536–555.
Grzymala-Busse, J., Grzymala-Busse, W., & Goodwin, L. (2002). A comparison of three closest fit
approaches to missing attribute values in preterm birth data. International Journal of Intelligent Sys-
tems, 17, 125–134.
Grzymala-Busse, J. W. (2004). Data with missing attribute values: Generalization of indiscernibility relation
and rule induction. In J. Peters, J. Grzymala-Busse, B. Kostek, R. Swiniarski, & M. Szczuka (Eds.),
Transactions on rough sets I (pp. 78–95). New York: Springer.
Grzymala-Busse, J. W. (2004). Rough set approach to incomplete data. In L. Rutkowski, J. Siekmann,
R. Tadeusiewicz, & L. Zadeh (Eds.), Artificial intelligence and soft computing-ICAISC 2004 (pp.
50–55). New York: Springer.
Han, J., Kamber, M., & Pei, J. (2006). Data mining: Concepts and techniques (2nd ed.). San Francisco:
Morgan Kaufmann.
Härdle, W., Lee, Y. J., Schäfer, D., & Yeh, Y. R. (2009). Variable selection and oversampling in the use of
smooth support vector machines for predicting the default risk of companies. Journal of Forecasting,
28, 512–534.
Hwang, R.-C., Cheng, K. F., & Lee, J. C. (2007). A semiparametric method for predicting bankruptcy.
Journal of Forecasting, 26, 317–342.
Kawakita, M., Minami, M., Eguchi, S., & Lennert-Cody, C. E. (2005). An introduction to the predictive
technique AdaBoost with a comparison to generalized additive models. Fisheries Research, 76, 328–
343.
Lakshminarayan, K., Harp, S. A., & Samad, T. (1999). Imputation of missing data in industrial databases.
Applied Intelligence, 11, 259–275.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Chichester: Wiley.
Maimon, O., & Rokach, L. (2005). Data mining and knowledge discovery handbook. New York: Springer.
Ochs, R. A., Goldin, J. G., Abtin, F., Kim, H. J., Brown, K., Batra, P., et al. (2007). Automated classification
of lung bronchovascular anatomy in CT using AdaBoost. Medical Image Analysis, 11, 315–324.
Park, C. S., & Han, I. (2002). A case-based reasoning with the feature weights derived by analytic hierarchy
process for bankruptcy prediction. Expert Systems with Applications, 23, 255–264.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Francisco: Morgan Kaufmann.
Ravi Kumar, P., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent
techniques—a review. European Journal of Operational Research, 180, 1–28.
Sanchis, A., Segovia, M. J., Gil, J. A., Heras, A., & Vilar, J. L. (2007). Rough sets and the role of the monetary
policy in financial stability (macroeconomic problem) and the prediction of insolvency in insurance
sector (microeconomic problem). European Journal of Operational Research, 181, 1554–1573.
Schwenk, H., & Bengio, Y. (2000). Boosting neural networks. Neural Computation, 12, 1869–1887.
Shumway, T. (2001). Forecasting bankruptcy more accurately: A simple hazard model. Journal of Business,
74, 101–124.
Sun, L., & Shenoy, P. (2007). Using Bayesian networks for bankruptcy prediction: Some methodological
issues. European Journal of Operational Research, 180, 738–753.
Varetto, F. (1998). Genetic algorithms applications in the analysis of insolvency risk. Journal of Banking
and Finance, 22, 1421–1439.
West, D., Dellana, S., & Qian, J. (2005). Neural network ensemble strategies for financial decision appli-
cations. Computers and Operations Research, 32, 2543–2559.
Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in
data mining. Knowledge and Information Systems, 14, 1–37.
Zhou, L., & Lai, K. K. (2009). Adaboosting neural networks for credit scoring. In The 6th international
symposium on neural networks (ISNN 2009) (pp. 875–884). New York: Springer.
Zhou, L., Lai, K. K., & Yu, L. (2008). Credit scoring using support vector machines with direct search for
parameters selection. Soft Computing, 13, 149–155.
Zhou, L., Lai, K. K., & Yen, J. (2012). Empirical models based on features ranking techniques for corporate
financial distress prediction. Computers & Mathematics with Applications, 64, 2484–2496.
Zhu, Z., He, H., Starzyk, J. A., & Tseng, C. (2007). Self-organizing learning array and its application to
economic and financial problems. Information Sciences, 177, 1180–1192.