
Applied Soft Computing Journal 83 (2019) 105663


A new perspective of performance comparison among machine learning algorithms for financial distress prediction

Yu-Pei Huang a,b, Meng-Feng Yen c,∗

a College of Management, National Cheng Kung University, Taiwan
b Department of Electronic Engineering, National Quemoy University, No. 1, University Rd., Kinmen County 892, Taiwan
c Department of Accountancy and Institute of Finance, National Cheng Kung University, No. 1, University Rd., Tainan 701, Taiwan

highlights

• This paper reviewed the pros and cons of recent literature on various ML models for FDP.
• This paper compared the performance of six ML-based approaches using real-life data.
• Among the four supervised models, the XGBoost algorithm provided the most accurate FD prediction.
• The hybrid DBN-SVM model gave better forecasts than both the SVM and the classifier DBN models.

article info

Article history:
Received 14 March 2019
Received in revised form 22 July 2019
Accepted 25 July 2019
Available online 1 August 2019

JEL classification: G17, G32, O16, O31

Keywords:
HACT
GA-fuzzy clustering
XGBoost
Hybrid DBN-SVM
Financial distress prediction

abstract

We set out in this study to review a vast amount of recent literature on machine learning (ML) approaches to predicting financial distress (FD), including supervised, unsupervised and hybrid supervised–unsupervised learning algorithms. Four supervised ML models, including the traditional support vector machine (SVM), the recently developed hybrid associative memory with translation (HACT), hybrid GA-fuzzy clustering and extreme gradient boosting (XGBoost), were compared in prediction performance to the unsupervised classifier deep belief network (DBN) and the hybrid DBN-SVM model, whereby a total of sixteen financial variables were selected from the financial statements of publicly-listed Taiwanese firms as inputs to the six approaches. Our empirical findings, covering the 2010–2016 sample period, demonstrated that among the four supervised algorithms, the XGBoost provided the most accurate FD prediction. Moreover, the hybrid DBN-SVM model was able to generate more accurate forecasts than the use of either the SVM or the classifier DBN in isolation.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

When a firm is unable to continue to generate cash flows from its business, maintain its profitability or meet its maturing obligations as they fall due, it finds itself in a financial situation usually referred to as financial distress; when many firms simultaneously become mired in such financial distress, this may bring about a severe financial crisis which can manifest itself through serious social problems, such as economic recession and rising unemployment. Thus, the global financial tsunami which occurred in 2008 led to increasing attention being paid by financial institutions to credit risk and 'financial distress prediction' (FDP).

The prediction of the possibility of financial distress is clearly a challenging task; however, it is generally believed that there are many symptoms and alerts that can be observed prior to any financial problems that firms may encounter. A firm's basic financial statements, which are the written records of the financial situation of a business, include standard reports such as the statement of financial position, statement of comprehensive income, statement of cash flows and statement of changes in equity, all of which provide a wide range of users with the information on the results of the firm's operations and its financial position, including the cash flows of the business.

∗ Corresponding author. E-mail addresses: tim@nqu.edu.tw (Y.-P. Huang), yenmf@mail.ncku.edu.tw (M.-F. Yen).
https://doi.org/10.1016/j.asoc.2019.105663

Conventional FDP methods

Since financial distress usually produces various signals, such as a gradual or sudden significant reduction in profits, deferred

payment of obligations (interest, preferred dividends and financial bills), and even bankruptcy, the information contained within the financial statements can be used to establish diagnostic models of FDP [1]. However, since the traditional forms of assessment of the credit risk of firms are invariably reliant upon the subjective judgments of human experts – based upon their past experience and some guiding principles – such assessments tend to be reactive rather than predictive. In particular, credit scoring was not used until the late 1980s in the U.K. and the U.S. and perhaps, for a few lenders, until the late 1990s. Before the introduction of credit scoring methods, a bank manager, usually male, had to rely on his "gut feel", an assessment of the prospective borrower's character, ability to repay, and collateral or security, and an independent reference from a community leader or the applicant's employer to reach a decision. This process was slow and inconsistent [2]. Since the invention of the credit scoring system, moreover, "homegrown" scorecards may be informal or easily altered. Different users at a bank may input the data and analyze the results using their own approaches to achieve their desired outcome. Thus, the use of such approaches when attempting to make consistent estimates may provide erroneous results [3]. Therefore, it is desirable to develop fairly accurate quantitative prediction models using various internal and external factors.

Numerous techniques have been developed over the years in an attempt to provide analysts and decision-makers with effective methods of predicting financial distress based upon various financial ratios and mathematical models, with these models including linear and logistic regressions, multivariate adaptive regression splines, survival analysis, linear and quadratic programming and multiple-criteria programming [4–6]; for example, Altman and Narayanan suggested that when assessing the management of distressed firms, the Z-score model could be used as a guide to financial turnaround [7]. Most of these techniques are typically reliant upon the assumptions of linear separability and multivariate normality, and indeed, the independence of the explanatory variables [3]. However, these conditions are often violated in real-life situations.

Machine learning

ML techniques have the capability of extracting meaningful information from unstructured data whilst also effectively dealing with non-linearity. However, the application of advanced ML techniques to financial forecasting is still a relatively new area for researchers to explore [8,9]. ML algorithms can be categorized into two major branches: supervised learning versus unsupervised learning.

Supervised learning

As regards the application of ML in the field of finance, supervised methods such as Bayesian network algorithms, logistic regressions and the support vector machine (SVM) are used to detect financial misstatements with fraudulent intention [10]. Neural network (NN) models are also used as the means of detecting earnings management [11,12]. Although NN architectures have been found to demonstrate good performance in various financial applications [1], the SVM is a better technique than other classification and prediction methods since it produces more accurate results than its competitors. This finding makes the SVM particularly suitable for FDP problems since the number of financially-distressed firms is usually limited [13,14]. A recent SVM-based hybrid model was proposed [15], in which the author integrated a risk metric with a two-level data envelopment analysis (DEA) to describe the operating performance of a company. The hybrid model was comprised of rough set theory, an artificial fish swarm algorithm and fuzzy SVM. Based on empirical evidence, the hybrid model was effective at predicting the operating performance of both public and private companies.

Cleofas-Sánchez et al. [3] applied Santiago-Montero's hybrid associative classifier with translation (HACT) model to predict financial distress and provided empirical results supporting that HACT dominated four traditional neural networks, including the multi-layer perceptron (MLP), radial basis function (RBF), Bayesian network (BN), and voted perceptron (VP), one SVM and one multivariate logistic regression (LR) model. HACT was developed from the traditional linear associative memory model (LAMM), one of the artificial neural network models. It is characterized by mimicking the memorizing process of humankind via a series of associative memories. The HACT boasts two advantages over the traditional LAMM. First, while the HACT model is also a LAMM during the learning phase, it uses the Steinbuch Lernmatrix method to enhance its prediction performance compared to the LAMM during the recall phase. The LAMM can only take a binary 0/1 value as its input, whereas HACT accepts any type of numerical, raw data as the feature vector for model input. Second, the inputted vector does not need to meet the orthonormality condition required by the traditional LAMM. While the HACT model is quick to train due to its feed-forward learning framework, it is only suitable for simple data with a repetitive structure; it is not able to generate good learning and prediction results for complicated data.

Chou et al. [16] proposed a hybrid structure integrating the genetic algorithm (GA) with a fuzzy clustering algorithm. In particular, key financial ratios selected by the GA are clustered by fuzzy C-means clustering after the training data is divided into financially distressed and financially non-distressed samples. The optimal number of clusters for both samples is decided by the WB index. Given the clustered training data, the distance between the feature vectors of each cluster's center and a testing sample is calculated and used to predict whether the testing sample will be an FD case according to the concept of 'nearest neighbor'. For a clustering method, it is important to ensure that the selected features are capable of correctly categorizing the samples into their corresponding classes. The hybrid model integrates the advantages of both statistical theory and soft computing by using the genetic algorithm to select features in order to make the clusters of features meaningful for the subsequent data classifier. However, the major drawbacks of this method are the high possibility of overfitting the training data as well as the time consumed to find the optimal results.

Zięba et al. [17] proposed using extreme gradient boosting (XGBoost) to learn an ensemble of decision trees for bankruptcy prediction. Their so-called synthetic features are comprised of various arithmetic operations such as addition, subtraction, multiplication, and division. Each synthetic feature can be treated as a regression model and constructed in an evolutionary way. They used the model to predict the financial condition of Polish companies spanning the 2000–2013 period and found it able to generate prediction results significantly better than quite a few reference models such as linear discriminant analysis, a multilayer perceptron with a hidden layer, a decision rules inducer, a decision tree model, LR, AdaBoost, SVM and random forest. Xia et al. [18] employed XGBoost to structure a sequential ensemble credit scoring model, whereby they tuned the hyperparameters using a Bayesian optimization algorithm with a selected feature subset. Their empirical results showed that the Bayesian optimization performed better than random search, grid search, and manual search. The credit scoring results were interpreted by a decision tree using the relative importance among the selected features, ranked in descending order. Carmona et al. [19] used XGBoost to predict bankruptcy based on 156 national U.S.

commercial banks spanning the 2001–2015 period. A total of 30 financial ratios were collected on an annual basis. They found that the ratio of retained earnings over average equity, pretax ROA and the total risk-based capital ratio were negatively associated with the probability of bank failure. In contrast to the three ratios above, an exceptionally high return on earning assets brought about a higher risk of failure for the bank. In parallel with the study above, the same technique was applied by Climent et al. [20] to predict financial distress for commercial banks in the Eurozone using 25 yearly financial ratios spanning the 2006–2016 period. They found that the two-year-before-failure model performed better than its one-year and three-year counterparts. In addition, the size of the bank (measured by total assets) and the ratio of non-operating income over net income were positively associated with the risk of bank failure, while leverage (measured by equity divided by liabilities) and asset liquidity were negatively associated with the risk of bank failure.

Antunes et al. [21] proposed a probabilistic Gaussian processes (GP) classifier which is less sensitive to the class balance relative to SVMs and LRs. It provided a comparable performance on imbalanced and balanced datasets. Based on an entropy-based analysis of real-world bankruptcy data, the GP classifier was capable of tackling uncertainty and providing better predictions than SVMs and LRs.

Wang and Wu [22] proposed a two-stage ensemble model, which integrated a manifold learning algorithm with a kernel-based fuzzy self-organizing map (KFSOM) to predict bankruptcy. The model employs three manifold learning algorithms, Isomap, Laplacian Eigenmaps and Locally Linear Embedding, to select a feature subset, respectively. These algorithms are good at reducing the dimension of input data for any data distribution. Three base classifiers were constructed by the KFSOM using three kernel functions, Gaussian, Polynomial and Sigmoid. A total of 9 classifiers, 3 feature subsets vs. 3 KFSOMs, were then integrated by the two-stage selective ensemble algorithm. According to different criteria, the 9 classifiers were integrated based on a stepwise forward selection method in the first stage. In the second stage, three selective classifiers were further integrated to bring about the final prediction results. KFSOM improves upon the fuzzy self-organizing map (FSOM), which is subject to the limitation of a spherical data distribution. Financial data is usually so complicated that the requirement of a spherical data distribution by the traditional FSOM cannot be satisfied. Wang and Wu [22] used kernel-based methods to map the original data to a high-dimensional feature space without a specific mapping rule and conduct the non-linear transformation, which facilitated the classification or clustering of the transformed data.

García et al. [23] employed four linear classifiers, Fisher's linear discriminant, the linear discriminant classifier, SVM and LR, to predict corporate bankruptcy based on the dissimilarity space instead of the traditional feature space. Their empirical results suggested that the four classifiers tended to provide better predictions when applied to the dissimilarity space than the feature space in terms of overall accuracy, true-positive rate and true-negative rate.

Masmoudi et al. [24] reported that, based on two classes of bank-loan data, the default prediction performance of their discrete BN model was comparable to SVM but better than the decision tree approach. Including the latent variable in the BN enhanced the classification results. Their model is useful in identifying the relatively important loan characteristics which affect the default probability. Therefore, the model is informative for the management of non-performing loans. In addition, this type of model is quick and easy to train.

Unsupervised learning

Traditional ML algorithms need to extract certain features from the data, such as the moving average of a time series on the price of a stock; however, it is invariably found to be quite difficult to systematically extract these features or to obtain all of the representable factors. Deep learning (DL) has emerged as a new area of ML research since 2006, as it is found to be very effective in terms of learning the data characteristics [25,26]. The 'deep belief network' (DBN) introduced by Hinton and Salakhutdinov [27] is a generative neural network model and works well for dimensionality reduction and feature extraction.

Principal component analysis (PCA) is another unsupervised technique which is mainly used to learn the linear combinations of all original variables which best fit the distribution of the training samples. Stated otherwise, PCA aims to find a lower-dimensional region in the input space in which the training data have a higher probabilistic density. However, real-world data manifolds are usually complex and highly non-linear [28]. DBNs are thus a better choice for modeling highly non-linear, complex data.

The entropy-based evaluation method, Information Gain, is characterized by complicated mathematical theories and formulas about entropy. A classification is made by maximizing the amount of information that a certain feature can provide, which is obtained based on the difference value of the feature's entropy. All features with trivial information gain are deleted, while the preserved features are ranked in descending order according to their information gain. As a result, Information Gain is able to reduce the dimensionality of the vector space model by setting a threshold. However, the problem is that it is very difficult to choose a proper threshold [29]. Using unsupervised DBNs to select important features has the following advantages relative to PCA and Information Gain. First, in contrast to PCA, DBNs are able to characterize complex, non-linear data. Second, the capability of DBNs to discover important features in an unsupervised manner makes them less prone to the problem of overfitting than models such as feedforward neural networks [30]. Finally, classifier DBNs provide classification performance comparable to SVMs but better than other techniques like Maximum Entropy and Boosting-based classifiers [30].

Hybrid supervised–unsupervised learning

While the SVM is effective at producing decision surfaces from well-behaved feature vectors, it is time-consuming when tackling a large, high-dimensional dataset. Moreover, although the SVM can be used to reduce the high dimensionality of features, important information might be lost during dimensionality reduction. Therefore, an unsupervised deep learning technique, the DBN, was proposed since it is able to reduce the feature dimensionality if given enough hidden layers. The DBN model is capable of learning more profound features and capturing rich information hidden in the data [31]. The features obtained are substantially more representative than the original data, and thus the DBN model is more suitable for classification and visualization problems [25]. Yu et al. [31] proposed a DBN-based resampling SVM ensemble learning paradigm to solve the imbalanced data problem in credit classification. Through integration with a resampling SVM, the DBN model served as a competitive ensemble strategy, capable of improving the performance of classification, especially for highly imbalanced datasets. The invented revenue matrix, based on the principal redeemed (from non-failure borrowers) and interest collected (from failure borrowers), was very relevant to real-world scenarios. In a real-world setting, the weights or costs of the different classes are presumed to differ from each other significantly, and a certain class should attract more

attention than the other. Based on genuine German and Japanese credit datasets, the cost-sensitive hybrid SVM-DBN structure was effective at classifying credit risk for imbalanced data.

A hybrid DBN-SVM method was verified by Zhu et al. [32], who showed that it could be utilized to accurately recognize emotion status in speech, in which the DBN acted as a deep feature extractor and the SVM was a classifier. Combining them could achieve a better result than using only the DBN or the SVM in isolation. Their empirical results showed that DBN features can reflect emotion status better than artificial features, and the hybrid DBN-SVM classification approach achieved a high accuracy of 95.8%, which was higher than using either the DBN or the SVM separately. In summary, coupling a DBN with an SVM is advantageous since the former addresses the complexity and scalability issues of the SVM, especially when training with large-scale datasets [28]. The experimental results of [28] showed that the hybrid DBN-SVM model yielded anomaly detection performance comparable to a deep autoencoder, while reducing its training and testing time by factors of 3 and 1000, respectively.

Advantages and disadvantages of recent ML-based FDP methods

Following the literature review on the recent development of ML-based FDP methods, we highlight their advantages and disadvantages in Table 1 below.

Having surveyed a vast amount of literature on financial distress (bankruptcy) prediction models, we would like to mention a recent reference by Barboza et al. [9], who compared the performance of four types of ML models against three types of traditional approaches in terms of one-year-before-failure prediction. The four ML families were SVMs, bagging, boosting and RF, whilst the three kinds of traditional methods involved discriminant analysis, LR, and NNs. Based on 10,000 firm-year observations of Altman's Z-score as well as Carton and Hofer's [38] six financial variables for North American firms spanning the 1985–2013 period, ML models were found to provide better prediction results than traditional approaches. On average, Barboza et al. [9] found ML models to be about 10% more accurate than the traditional approaches (71% to 87% versus 52% to 77% in terms of accuracy ratio). The six financial variables were composed of operating margin, change in ROE, change in P/B ratio, and growth measures related to assets, sales and number of employees, which improved all models' prediction performance relative to using the Z-score alone. Among the ML models, however, the SVM did not bring about better prediction results than its competitors, a finding contradicting Cleofas-Sánchez et al. [3] but supporting Wang et al. [39], among others. In particular, RF performed best among all models.

Intended contribution

Given the disparity in FDP performance among the supervised and unsupervised ML methods reviewed above, we aimed in this study to provide a new perspective on the relative FDP performance between the two types of ML techniques. In particular, we used five recently developed methods as well as the traditional SVM to predict financial distress based on the financial data on publicly-listed companies in Taiwan. Four of the six models fell into the supervised family: SVM, HACT, hybrid GA-fuzzy clustering and XGBoost. One fell into the unsupervised family: the classifier DBN. The last fell into the hybrid supervised–unsupervised family: the hybrid DBN-SVM. We implemented the hybrid DBN-SVM algorithm mainly to test whether the unsupervised feature extractor can enhance the traditional supervised approach. The rest of the paper is organized as follows. Section 2 documents the data and machine learning techniques used in this study. Section 3 discusses the empirical results, while Section 4 concludes.

2. Methodology

2.1. Financial distress variables

Firms are required to provide formal records of their financial activities in the form of financial statements; thus, all publicly-listed enterprises are required to generate four basic financial statements, as shown below:

a. Statement of Financial Position — statement of financial position at a given date.
b. Statement of Comprehensive Income — reporting financial performance in terms of net profit or loss over a specific period.
c. Statement of Changes in Equity — reporting any movement in the owners' equity over a specific period.
d. Statement of Cash Flows — summarizing the movement in cash and bank balances over a specific period.

These four basic financial statements provide information on the results of the firms' operations, financial position and cash flows.

In order to establish the diagnosis model for financial distress prediction, we selected 16 financial variables recommended in numerous prior related works and extracted from the statements in this study [40–48]. We then constructed the proposed diagnosis model using the financial data provided by the Taiwan Economic Journal (TEJ). The selected variables and their definitions are shown in Table 2.

The TEJ database highlighted a total of 32 firms which had encountered a period of financial distress between 2010 and 2016. Each of these distressed firms was then matched with a non-distressed firm from the same industry in the same event year. The total assets of the non-distressed counterpart firm were required to be the closest to those of the distressed sample among all of the competitor firms in the same industry in the same event year, resulting in a total of 32 financially-distressed and 32 non-distressed firms in our 2010–2016 sample period. The firms were categorized as financially-distressed if they were either in bankruptcy or in the process of recovery.

We collected and analyzed financial statement data on these firms for one to four quarters prior to any financial distress event, with seven combinations of the latest four quarterly financial datasets then being used to evaluate the proposed model; that is, 1Q, 2Q, 3Q, 4Q, 1Q∼2Q, 1Q∼3Q and 1Q∼4Q. These financial data are obtained from at least one set of quarterly financial statements.

As illustrated in Fig. 1, our 64 sample firms were located in 16 different industry categories, of which the major industries were optoelectronic materials, semiconductors and electronic components.

The breakdown of the industries is summarized in Table 3, while Table 4 shows that the total assets of the firms examined in this study ranged from 'smaller than NT$ 1 billion' to 'larger than NT$ 10 billion', with about 47 per cent of the firms falling within the NT$ 1 billion to 10 billion range. The results reported in Table 5 reveal that the average total asset value of the distressed firms was around NT$ 25 billion, whilst the average for the non-distressed firms was around NT$ 29 billion.

2.2. Construction of FDP models

Fig. 2 illustrates how we used the three types of ML algorithms to perform FDP prediction. In particular, the four supervised ML algorithms consist of SVM, HACT, hybrid GA-fuzzy clustering and

Table 1
Advantages and disadvantages of ML-based FDP models.
Supervised learning Advantages Disadvantages
Support Vector Machine (SVM) SVM is capable of eliminating massive redundancy, and The model tends to be computationally demanding for
[13,14,31] thus has superiority in low algorithmic complexity and large-scale data. Moreover, it is not able to generate
high robustness. SVM can reduce the possibility of satisfactory classification results for the ‘small’ class in
overfitting by setting the parameter of cost function, imbalanced data. Re-sampling methods can help
which is a significant problem in credit risk evaluation. overcome these drawbacks. Although SVM can be used
Relative to other supervised learning algorithms, SVM to reduce high dimensionality of features, it might
has proved to show good generalization performance. cause loss of important information.
Hybrid Associative Classifier with HACT is a kind of feed-forward neural network. The It is only suitable for learning simple and repetitive
Translation (HACT) [3] model is very fast to train since it does not involve any data. It is not able to generate satisfactory results for
back-propagation framework. complicated data.
Hybrid genetic algorithm (GA) and The hybrid model integrates the advantages of both The model is computationally demanding, in particular,
fuzzy clustering [16] statistical theory and soft computing by using the for large-scale data. High possibility of overfitting might
genetic algorithm to select features in order to make be another problem.
the clusters of features meaningful for the subsequent
data classifier. The hybrid model proved to provide
better performance than the BPNN classifier in
predicting bankruptcy.
eXtreme Gradient Boosting The supervised learning algorithm integrates gradient It has a large and complex set of hyperparameters
(XGBoost) [17–20] descent and tree ensemble learning, which can be used which are not easy to be tuned. Relative to DL
for regression and classification problems. The model is algorithms, it does not tend to perform well when
regarded as one of the best supervised learners which is handling non-structural features such as images and
fast to train and supported by various packages and voices.
platforms.
Probabilistic Gaussian processes (GP) The probabilistic GP classifier is less sensitive to the Like SVM, the final performance behavior of this
classifier [21] class balance compared to SVM and LR. It provides a kernel-based model is sensitive to parameter selection.
comparable performance to both imbalanced and
balanced datasets.
Random Forest (RF) [9,32–35]
Advantages: Noisy data or outliers are tolerated in the training dataset of an RF model. RF not only classifies data but also provides information on the driving determinants of classification among groups, because the model is able to identify the importance of each variable in the classification results.
Disadvantages: RF usually does not perform as well in regression problems as in classification because it cannot produce continuous outputs. For regression problems, RF is unable to generate predictions outside the range of the training data, indicating a high possibility of overfitting when the training data contain noise.

Two-stage selective ensemble model [22]
Advantages: The manifold learning algorithms are good at reducing the dimension of input data for any data distribution. The original data can be mapped into a high-dimensional feature space by the kernel-based FSOM method to cluster the original data, which relaxes the limitation of the spherical distribution for the data in traditional FSOM. The stepwise forward selection method in the first stage of this two-stage ensemble model is easy to understand and implement, and can strengthen the model's performance effectively.
Disadvantages: The model's effectiveness is only examined using Chinese listed companies. The financial data of other countries should be included to validate the model.

Linear classifiers [23]
Advantages: The four linear classifiers (Fisher's linear discriminant, the linear discriminant classifier, SVM and LR) are good at handling sparse data and are easy to describe mathematically. In addition, they are computationally simple and easy to interpret. When applied to dissimilarity data, they are able to provide satisfactory performance.
Disadvantages: They are not applicable to non-linearly distributed data.

Discrete Bayesian network (BN) [24,36]
Advantages: BNs are one of the most consistent and general specifications for modeling complicated systems. The model is useful in identifying the relatively important loan characteristics which affect the default probability.
Disadvantages: Although the Bayesian formalism is useful in modeling the knowledge of experts, it is difficult to define the conditional probability table from experts' opinions, owing to the difficulty of reaching agreement on the BN model with experts.

Unsupervised learning

Classifier Deep Belief Network (DBN) [29–31,37]
Advantages: Given sufficient hidden layers, the model is good at learning features. In particular, it can identify more profound features and uncover rich information hidden in the data.
Disadvantages: Finding the optimal parameterization for the model is complicated and time-demanding.

Hybrid supervised–unsupervised method

Hybrid DBN-SVM [28,31,37]
Advantages: Since it is difficult for the SVM to classify large datasets with high dimensionality, using the aforementioned DBN to reduce the dimensionality of the features for the SVM classifier is advantageous.
Disadvantages: Finding the optimal parameterization for the DBN part of the hybrid model is complicated and time-demanding.
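The variable-importance property credited to RF above can be made concrete with a small, self-contained sketch (illustrative only, not tied to this paper's dataset): each feature is scored by the Gini-impurity decrease of its best single split, which is the per-split quantity an RF averages over many trees to report variable importance.

```python
# Minimal sketch (not the paper's implementation): rank features by the
# Gini-impurity decrease of their best single split, the quantity an RF
# averages over many trees to report variable importance.

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split_gain(xs, ys):
    """Largest impurity decrease achievable by thresholding one feature."""
    parent = gini(ys)
    best = 0.0
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        w = len(left) / len(ys)
        gain = parent - (w * gini(left) + (1 - w) * gini(right))
        best = max(best, gain)
    return best

# Toy data: feature 0 separates the classes perfectly, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.8, 5], [0.9, 1]]
y = [0, 0, 1, 1]

gains = [best_split_gain([row[j] for row in X], y) for j in range(2)]
# Feature 0 receives the larger importance score.
```

The informative feature obtains the full impurity decrease of 0.5, while the noise feature scores zero, mirroring how RF importance flags the driving determinants of classification.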

XGBoost, while the unsupervised ML algorithm refers to the classifier DBN and the hybrid supervised–unsupervised ML algorithm refers to the hybrid DBN-SVM model. All six models were used to distinguish whether a firm was financially distressed (FD) or financially non-distressed (FND).

6 Y.-P. Huang and M.-F. Yen / Applied Soft Computing Journal 83 (2019) 105663

Fig. 1. Breakdown of the 64 companies investigated, by industry.

The financial data on the FD and FND firms were retrieved from the four basic financial statements which are readily available from the TEJ database. We calculated the 16 variables which were regarded within the extant literature as having the greatest relevance to financial distress. This set of 16 financial ratios on all firms was then divided into validation and training sets for prediction accuracy evaluation.

Table 2
Selected financial variables for financial distress prediction.
V1: Current ratio [16,17,21,22,40–42]
V2: Cash flow/total debt [22,40–42]
V3: Cash flow/total asset [40,43,44]
V4: Cash flow/sales [40,43,45]
V5: Debt ratio [16,40,44,46]
V6: Working capital/total asset [9,17,21,40,47,48]
V7: Market value equity/total debt [40,42,45]
V8: Current asset/total asset [40,43]
V9: Quick asset/total asset [40,43]
V10: Sales/total asset [9,40,45,48]
V11: Current debt/sales [40,43]
V12: Quick asset/sales [40,43]
V13: Working capital/sales [40,42,44]
V14: Net income/total asset [40,41,44]
V15: Retained earnings/total asset [9,17,40,46,48]
V16: Earnings before interest and taxes/total asset [40,45,48]

Table 3
Summary of the industry categories of the companies investigated (number of firms).
Accommodation: 2; Air transport: 2; Basic metals: 6; Computers: 2; Construction: 2; Consumer electronics: 2; Electronic components: 8; Materials recovery: 2; Metallic furniture: 4; Optoelectronic materials: 14; Raw material medicines: 2; Semi-conductors: 8; Software: 4; Telecommunications equipment: 2; Textiles: 2; Wearing apparel: 2.

Table 4
Total assets of the candidate firms.
<1 billion NT$: 24 firms (38%); 1∼10 billion NT$: 30 firms (47%); >10 billion NT$: 10 firms (16%).

Table 5
Summary statistics on the total assets of the candidate firms (NT$ billion).
Distressed: average 25.83, max. 722.86, min. 0.03, median 0.88; 32 firms.
Non-distressed: average 29.11, max. 612.37, min. 0.26, median 2.7; 32 firms.

2.3. Supervised prediction models

The four supervised ML algorithms are specified as follows.

2.3.1. SVM
The SVM is a supervised ML technique, originally introduced by Cortes and Vapnik [49], which has since become a standard technique in many classification problems. When the SVM is exploited as a binary classifier, it constructs a hyperplane which separates two classes of training data with a maximal margin by a quadratic optimization method [50]. The SVM model of this study was constructed with the training dataset {x_i, y_i} (i = 1, 2, ..., n; y_i ∈ {−1, +1}), where x_i was the input financial data vector, y_i = 1 was defined as an output belonging to the FD class, y_i = −1 was set as an output in the FND class, and n was the size of the training dataset. In addition, to avoid the dominance of financial data of greater numerical variation over those of relatively smaller variation, all features of the training data were scaled to the range [−1, +1].

The aim of training an SVM is to find a maximum-margin hyperplane ω · x_i + b = 0 to separate y_i = 1 from y_i = −1, where ω represents the weight vector of the separating hyperplane and b represents the offset. It is usually difficult to linearly separate financial data points. As a result, a 'radial basis function' (RBF) kernel function k(x_i, x_j) was adopted in this research to map the training data vector x_i onto a higher-dimensional feature-space vector Φ(x_i) for separation. All the parameters of the SVM model were tuned in order to obtain the optimal maximal-margin hyperplane for classification.

2.3.2. Hybrid associative classifier with translation (HACT)
Following Cleofas-Sánchez et al. [3], in the learning phase of HACT, all features (financial ratios) of the training data needed to be normalized, in which the mean of a feature is subtracted from its original value. The feature vectors and classification labels of 48 out of the 64 samples were used for training a linear associative memory model which would learn and memorize the patterns of the data above. In the recall phase (i.e. prediction

Fig. 2. Data processing flow of the three types of ML FDP algorithms.

Fig. 3. Algorithm of HACT.

phase) of HACT, we normalized each feature of the testing dataset by subtracting the mean of all 48 samples of the training dataset from the feature's value in the testing dataset. The 16 normalized samples of the testing dataset were then used to produce FDP. We cross-validated the prediction accuracy by performing the experiment above four times, each corresponding to a non-overlapping testing dataset of 16 samples. Fig. 3 illustrates the algorithm of HACT.

2.3.3. Hybrid GA-fuzzy clustering
Figs. 4 and 5 below illustrate Chou et al.'s [16] algorithm in the context of this study. The genetic algorithm was used to select the optimal set of features, in which the number of genes of each chromosome was equal to the number of data features (financial ratios). If the value of a gene was equal to 1 (0), the corresponding feature would (not) be selected. We set the maximum number of generations equal to 100, each with 200 chromosomes. Only the 10 chromosomes with the highest fitness values were kept for the next generation. We used uniform crossover and set the mutation rate equal to 0.1. An early stop would apply when the best fitness value could not be further elevated within any 10 consecutive generations. To evaluate the feature selection outcome (fitness) of a chromosome, we reserved a random 30% of the training samples. In particular, the 70% of the training

Fig. 4. Algorithm of hybrid GA-fuzzy clustering.
samples were used to construct the model, while the remaining 30% of them were reserved to evaluate the FDP accuracy (i.e. the fitness of the chromosome). When constructing the model, we applied fuzzy C-means clustering to the FD samples and FND samples, respectively, using the Silhouette index to determine the optimal cluster number. Note that we replaced Chou et al.'s [16] WB index with the commonly-adopted Silhouette index, since the WB index was proposed in a non-published reference which is not available.

2.3.4. XGBoost
XGBoost is a supervised learning algorithm which can be used for regression and classification problems. The concept of XGBoost combines gradient descent and tree ensemble learning. Through additive training, one new tree is added at a time to the previous XGBoost model to improve on its prediction performance and optimize the value of a pre-specified objective function. There is a total of five hyperparameters for an XGBoost model: the learning rate, maximum depth, subsampling percentage for samples, subsampling percentage for features, and maximum number of iterations. To achieve the best prediction performance, we selected the optimal set of hyperparameters using a 4-fold cross-validation. In line with the hybrid GA-fuzzy clustering algorithm, we needed to normalize each feature of the training samples to the range [0, 1]. In particular,

x̂_train = (x_train − min(x_train)) / (max(x_train) − min(x_train))    (1)
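The min-max scaling of Eq. (1) can be sketched as follows (a Python illustration rather than the study's MATLAB code). The training-set minimum and maximum are also reused for out-of-sample data, and out-of-range values are clipped back into [0, 1]; the clipping is an assumption, since the paper does not spell out how its renormalization step is performed.

```python
# Per-feature min-max scaling: the range is learned on the training samples
# and reused for test samples; out-of-range values are clipped into [0, 1]
# (one plausible reading of the paper's renormalization step).

def fit_min_max(train_column):
    """Record a feature's range on the training samples."""
    return min(train_column), max(train_column)

def transform(column, lo, hi):
    """Scale values into [0, 1] using the training range, clipping overflow."""
    scaled = [(v - lo) / (hi - lo) for v in column]
    return [min(max(s, 0.0), 1.0) for s in scaled]

train = [2.0, 4.0, 10.0]
test = [1.0, 6.0, 12.0]                 # 1.0 and 12.0 lie outside the range

lo, hi = fit_min_max(train)
train_scaled = transform(train, lo, hi)  # [0.0, 0.25, 1.0]
test_scaled = transform(test, lo, hi)    # [0.0, 0.5, 1.0] after clipping
```

Using the training-set range for the test samples, as in Eq. (2) below, avoids leaking information from the test data into the scaling.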

Fig. 5. The feature selection process of the hybrid GA-fuzzy clustering algorithm.

The original range of each feature of the training samples was then used to normalize the testing samples as follows.

x̂_test = (x_test − min(x_train)) / (max(x_train) − min(x_train))    (2)

If any normalized feature of the testing samples fell outside the [0, 1] range, however, we would renormalize it back to the [0, 1] range. As regards the hyperparameters, we set the learning rate equal to 0.1; the maximum depth to 5, 7, 9, 10, and 11, respectively; the subsampling percentage for samples to 0.5, 0.7, and 0.9, respectively; and the subsampling percentage for features to 0.5, 0.7, and 0.9, respectively, amounting to a total of 45 combinations. The maximum number of iterations was set equal to 2000. When the loss function of the evaluation dataset could not be further reduced within any 20 consecutive iterations, the training was immediately terminated. We used a 4-fold cross-validation to obtain the optimal set of hyperparameters. Given the optimal hyperparameters, we used all 48 samples of the training dataset to construct the model, which was then used to predict FD for the testing dataset. Fig. 6 illustrates the algorithm of the XGBoost model.

Fig. 6. The algorithm of XGBoost.

2.4. Unsupervised prediction model (DBN)

DBN is one of the unsupervised ML algorithms attracting the most attention. The model is comprised of multiple layers of stacked 'restricted Boltzmann machines' (RBM). An RBM is a stochastic generative neural network with symmetric weighted connections (w) between the visible and hidden layers, but there are no connections between the neurons within a layer. As illustrated in Fig. 7, there are two layers of neurons in an RBM, which are the visible layer with neurons (v_i) and the hidden layer with latent neurons (h_j).

An autoencoder DBN is a powerful model for feature extraction applications [27]. As illustrated in Fig. 8, an autoencoder DBN is constructed by stacking RBMs on top of each other. In this study, the visible neurons of RBM#1 take real values because the calculated financial ratios are all continuous data, as opposed to the hidden neurons with stochastic binary values. The DBN model is trained from the bottom to the top layers; for instance, RBM#1 is trained first, and the activation probabilities of the neurons of its hidden layer (h1) then become the input data of the visible layer (v2) for the RBM#2 learning. All the RBMs of a DBN are sequentially trained in the aforementioned manner, and finally the activation values of the hidden neurons of the top RBM

are returned as extracted features. The extracted feature vector of the top RBM is more useful for classification than the original input data. This layer-by-layer learning can be repeated as many times as desired to find the best feature-extracting DBN.

Fig. 7. An example of RBM.

The efficient training method for RBMs proposed by Hinton is called contrastive divergence (CD), which provides an approximation to the maximum likelihood method to adjust the weights and biases between the visible and hidden neurons. The learning process of the CD algorithm is illustrated in Fig. 9. The CD method starts with a 'learning' procedure, which uses RBMs as a tool to extract the features of the input data. After that, a 'reconstruction' procedure learns features from the extracted features, and therefore the reconstructed vectors can be used to compare with the original data for weights and biases training. A higher probability of the joint configuration of the visible vectors (v_i) and hidden vectors (h_j) can be trained by adjusting the corresponding weights and biases to obtain a lower energy function value [51].

However, a DBN can also be used as a classifier, the primary aim of which was to obtain labels from the input data, namely distressed or non-distressed in the context of this study. For such a classifier DBN, it is necessary to include a discriminative RBM in the last layer for classification.
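The CD update described above can be sketched in a few lines (a toy Python illustration, not the paper's MATLAB implementation; layer sizes, learning rate and data are arbitrary): hidden units are sampled from the visible data, a reconstruction is produced, and the weights move toward the difference between the data-driven and reconstruction-driven correlations.

```python
import math
import random

# Toy CD-1 sketch for a single RBM: sample h from v, reconstruct v', resample
# h', then nudge the weights by <v h> - <v' h'>, as in Hinton's contrastive
# divergence. Sizes and learning rate are illustrative assumptions.

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b_h):
    """Activation probabilities and binary samples of the hidden units."""
    probs = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_h))]
    return probs, [1 if random.random() < p else 0 for p in probs]

def sample_visible(h, W, b_v):
    """Reconstruction probabilities of the visible units."""
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def cd1_update(v0, W, b_v, b_h, lr=0.1):
    """One CD-1 weight update for a single training vector."""
    p_h0, h0 = sample_hidden(v0, W, b_h)
    v1 = sample_visible(h0, W, b_v)          # "reconstruction" step
    p_h1, _ = sample_hidden(v1, W, b_h)
    for i in range(len(v0)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
    return v1

n_visible, n_hidden = 4, 2
W = [[0.01 * random.random() for _ in range(n_hidden)] for _ in range(n_visible)]
b_v, b_h = [0.0] * n_visible, [0.0] * n_hidden

v = [1, 0, 1, 0]
for _ in range(50):
    recon = cd1_update(v, W, b_v, b_h)       # reconstruction probabilities
```

Repeating the update pushes the reconstructions toward the training vector, i.e. toward a lower energy for the observed configuration.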

2.5. Hybrid supervised–unsupervised prediction model (DBN-SVM)

In addition to the supervised and unsupervised ML algorithms, we also examined a recently introduced hybrid FDP model which integrated the advantages of both supervised and unsupervised ML algorithms [52]. In particular, we used an autoencoder DBN to extract features from the financial ratios. Each visible neuron of the first layer represented a financial ratio, while each hidden neuron of the last RBM represented a feature after the training was completed. Given that the data were pre-trained using the autoencoder DBN network, the extracted features were then classified using the SVM. In this study, the number of features extracted by the autoencoder DBN was set equal to 3. As a result, the 16 financial ratios of each firm were transformed into three features which were then used as input data for the SVM. We employed the same parameter specifications for the classifier DBN (including the same numbers of input and hidden layers) to carry out the model training as in the previous unsupervised FDP model. Having experimented with a number of different frameworks, a five-layer DBN with three hidden layers of 200, 100 and 50 neurons, along with three features at the output layer, was found to provide the best performance in this study.

Fig. 8. An autoencoder DBN with four stacked RBMs [27].

Fig. 9. The learning process of the contrastive divergence algorithm.

2.6. Training and validation of the models

The DBN was implemented in the object-oriented Deep Learning Toolbox v1.2 in MATLAB. All of the code and experiments were carried out using MATLAB 2016a. The Parallel Computing Toolbox and a CUDA-enabled NVIDIA GPU were also used in order to accelerate DBN training. In contrast, the SVM was implemented using the MATLAB Toolbox, with the 'svmtrain' method being used for training and the 'svmclassify' method for classification. As regards the algorithms and experiments for HACT, hybrid GA-fuzzy clustering and XGBoost, they were all programmed and implemented using MATLAB 2016a.

As illustrated in Fig. 2, data on the 16 financial ratios for both FD and FND samples were divided into validation and training sets for prediction accuracy evaluation. In order to make the most of the available data and achieve an average level of classification accuracy, we employed a variant of the 'leave-one-out' cross-validation approach, since the cross-validation procedure is capable of circumventing the problem of over-fitting. In particular, in each experiment, the data belonging to each class (distressed or non-distressed) were randomly divided into four sub-sets, with each of these four sub-sets containing 25 per cent of the data in any given class. In each cross-validation, one data sub-set was selected as a test set, with the remaining sub-sets forming the training set. We carried out cross-validation four times, and then subsequently measured the average classification accuracy of the classifier. We also took into consideration the different time spans of the prior financial data in the study to identify whether prediction performance was dependent upon how many quarterly financial statements were used.

3. Empirical results

3.1. Classification results

The evaluation results reported in Table 6 revealed that, among all algorithms, XGBoost provided the most accurate FD prediction. Among the unsupervised and hybrid learning algorithms, on the other hand, the hybrid DBN-SVM generated the most accurate FD prediction. In general, the SVM, XGBoost, classifier DBN and hybrid DBN-SVM models produced better predictions using consecutive quarterly financial ratios (1Q∼2Q, 1Q∼3Q, 1Q∼4Q) than using a single quarter's financial ratios. For HACT and hybrid GA-fuzzy clustering, the difference in prediction performance between the use of consecutive quarterly data and the use of a single quarter's data was economically trivial.

In addition, the hybrid DBN-SVM model was capable of generating more accurate forecasts of financial distress, as compared to the use of either the SVM or the classifier DBN algorithm in isolation. Furthermore, the results also showed that the information contained in a consecutive series of quarterly data provided all three approaches with better forecasting accuracy than basic quarterly data.

Turning to the best FD predictor, XGBoost, Table 7 summarizes its prediction performance using various spans of quarterly data prior to the event quarter. Based on the results contained in Table 7, the best prediction rate was generated by using either the 1Q∼2Q or the 1Q∼4Q dataset. In both scenarios, the type I error was as low as 3.1% and the type II error was only 6.3%.

The prediction performance of the hybrid DBN-SVM model using different combinations of prior quarterly financial data was

Table 6
Comparisons of model prediction accuracy across different spans of quarterly data prior to each event quarter.
Prediction accuracy (%)
Learning types Models
1Q 2Q 3Q 4Q 1Q∼2Q 1Q∼3Q 1Q∼4Q
SVM 73.4 70.3 67.2 62.5 82.8 79.2 81.3
Supervised ML learning HACT 67.2 56.3 53.1 64.1 64.1 57.8 56.3
GA-Fuzzy 43.8 35.9 39.1 37.5 39.1 39.1 40.6
XGBoost 87.5 89.1 84.4 82.8 90.6 87.5 90.6
Unsupervised ML learning Classifier DBN 53.1 68.8 51.6 65.6 68.0 69.8 71.9
Hybrid supervised–unsupervised ML learning DBN-SVM 79.7 75.0 70.3 71.9 89.1 82.3 86.3

Table 7
The prediction performance of the XGBoost model using different spans of
quarterly data prior to each event quarter.
Prediction accuracy (%)
Variables
1Q 2Q 3Q 4Q 1Q∼2Q 1Q∼3Q 1Q∼4Q
Type I error 4.7 4.7 9.4 10.9 3.1 3.1 3.1
Type II error 7.8 6.3 6.3 6.3 6.3 9.4 6.3
True positive rate 84.4 87.5 87.5 87.5 87.5 81.3 87.5
True negative rate 90.6 90.6 81.3 78.1 93.8 93.8 93.8
Prediction accuracy 87.5 89.1 84.4 82.8 90.6 87.5 90.6

Note:
Type I error (%) = FP/(TP + FN + FP + TN); Type II error (%) = FN/(TP + FN + FP + TN);
True positive rate (%) = TP/(TP + FN); True negative rate (%) = TN/(FP + TN);
Accuracy (%) = (TP + TN)/(TP + FN + FP + TN).
Abbreviations: FP: false positives; FN: false negatives; TP: true positives; TN: true negatives.
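The note's formulas can be implemented directly. The sketch below uses hypothetical confusion-matrix counts chosen to be consistent with the 1Q∼2Q column of Table 7 (64 firms, 32 FD and 32 FND); it also highlights that the paper defines both error types relative to the whole sample (TP + FN + FP + TN) rather than to the class totals.

```python
# The evaluation formulas from the table note, implemented as given. Both
# error types are divided by the whole sample, not by class totals.

def table_metrics(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    return {
        "type_i_error": fp / total,
        "type_ii_error": fn / total,
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (fp + tn),
        "accuracy": (tp + tn) / total,
    }

# Hypothetical counts consistent with the 1Q~2Q column of Table 7:
# type I 3.1%, type II 6.3%, TPR 87.5%, TNR 93.8%, accuracy 90.6%.
m = table_metrics(tp=28, fn=4, fp=2, tn=30)
```

With 64 firms, a type I error of 3.1% corresponds to 2 false positives and a type II error of 6.3% to 4 false negatives.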

summarized in Table 8, where the results clearly indicated that the use of multiple quarterly data produced greater accuracy (>80 per cent), as compared to the use of basic quarterly data. Consistent with the results observed for feature extraction, the highest prediction accuracy was found to be obtained by using the previous two consecutive quarters of financial data. We found that the prediction accuracy was around 89 per cent, with a type I error of 4.7 per cent and a type II error of 6.3 per cent. However, our results also indicated that the use of longer-period consecutive quarterly data might not necessarily lead to better prediction accuracy.

Table 8
The prediction performance of the hybrid DBN-SVM model using different spans of quarterly data prior to each event quarter.
Prediction accuracy (%)
Variables 1Q 2Q 3Q 4Q 1Q∼2Q 1Q∼3Q 1Q∼4Q
Type I error 12.5 10.9 15.6 15.6 4.7 13.0 7.0
Type II error 7.8 14.1 14.1 12.5 6.3 5.2 6.6
True positive rate 84.4 71.9 71.9 75.0 87.5 89.6 86.7
True negative rate 75.0 78.1 68.8 68.8 90.6 74.0 85.9
Prediction accuracy 79.7 75.0 70.3 71.9 89.1 81.8 86.3

Fig. 10. Feature extraction result for the DBN (1Q∼2Q).

3.2. Feature extraction of the hybrid DBN-SVM model

In the hybrid model, the data on the 16 selected financial variables were pre-trained using a DBN network which extracted the features from the input data. The features were extracted by the DBN, layer by layer, with the hidden-neuron activation values then being returned into the output layer as extracted features. The empirical results showed that using these extracted features did improve the financial distress prediction performance relative to using the original input data for the SVM. The highest prediction accuracy was obtained by using the previous two consecutive quarters (1Q∼2Q) of financial data prior to any financial distress event. The DBN feature-extraction results using the two consecutive quarters of data are illustrated in Fig. 10. As shown in Fig. 10, the three extracted features were capable of roughly separating the financially-distressed (FD) and financially non-distressed (FND) classes. The features were produced by a 16-200-100-50-3 DBN which mapped the input financial data onto three features, where 16, 200, 100, 50 and 3 referred to the number of neurons in the input layer, the three hidden layers and the output layer, respectively. As a result, the DBN mapped the input data (16 features) onto three features.

Fig. 11. Separation of distressed and non-distressed firms (1Q∼2Q).

The data extracted from the DBN were subsequently classified by the SVM, with the results illustrated in Fig. 11 being based upon the two consecutive quarters of data prior to the occurrence of any financial distress event. The financially distressed and non-distressed classes were successfully separated by the hyperplane, with the highest accuracy level being provided by the previous two consecutive quarters of financial data, as clearly illustrated in Fig. 11.
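The separation step illustrated in Fig. 11, a hyperplane ω · x + b = 0 over the three DBN-extracted features, can be sketched as follows. The weights, offset and feature vectors below are illustrative assumptions only; the fitted ω and b come from SVM training and are not reported in the paper.

```python
# Sketch of the final classification step: the sign of w.x + b over the three
# extracted features decides the class. The weights, offset and sample
# vectors are hypothetical, chosen only to show the mechanics.

def decision(x, w, b):
    """Side of the hyperplane w.x + b = 0: 'FD' vs. 'FND' (toy mapping)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "FD" if score >= 0 else "FND"

w, b = [0.6, -0.3, 0.7], -0.1     # illustrative hyperplane only

x_fd = [0.9, 0.2, 0.8]            # lands on the positive side
x_fnd = [-0.7, 0.9, -0.6]         # lands on the negative side

labels = (decision(x_fd, w, b), decision(x_fnd, w, b))
```

In the actual model the scores come from the trained SVM (with the RBF kernel of Section 2.3.1), but the geometry is the same: points on opposite sides of the fitted hyperplane receive opposite class labels.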

4. Conclusions

A series of financial distress events at large companies might cause a chain reaction such as the 2008 financial crisis. If the firm's management, lenders or stockholders are unable to foresee such a crisis, chances are that their interests will be damaged to a significant extent in the end. However, traditional credit risk assessment approaches are inconsistent across various users, and the assessment result depends highly upon their desired outcome. As a result, accurate quantitative methods are necessary to provide objective and consistent estimates of the risk of financial failure. We set out in this study to compare the performance of six machine-learning approaches in the context of financial-distress prediction. By referring to the extant literature, we summarized sixteen financial ratios, calculated from the four basic financial statements of publicly-listed firms in the Taiwan stock market, which are regarded as being related to future financial distress. The data on these financial ratios were subsequently used as input data for six classification algorithms, the traditional SVM, HACT, hybrid GA-fuzzy clustering, XGBoost, classifier DBN and hybrid DBN-SVM, to predict financial distress. Our empirical results appeared to suggest that among the four supervised approaches, the XGBoost model produced the most accurate prediction performance. In addition, we demonstrated that the hybrid model comprising the DBN feature extractor and the SVM, within which the data were pre-trained using the DBN algorithm and then classified using the SVM algorithm, was capable of predicting the financial distress of a firm to a satisfactory extent. The latent features were extracted from the ratios calculated from the financial statement data using the DBN.

Our results showed the users of financial statements the feasibility of using financial statement data to construct an effective FDP model. Indeed, XGBoost and the use of deep learning techniques to augment supervised ML approaches have shown promise in this research area. The main contributions of and values added by this study are as follows. First, it summarized the advantages and limitations of recent literature on different supervised, unsupervised, and hybrid supervised–unsupervised models for financial distress prediction. Second, it compared the performance of six ML-based approaches in the context of financial distress prediction using real-life data. Third, it suggested that among the four supervised algorithms, the recently popular XGBoost algorithm provided the most accurate FD prediction. Finally, it verified that the hybrid DBN-SVM model was able to generate more accurate forecasts than the use of either the SVM or the classifier DBN in isolation. Future research is encouraged to examine whether the feature-extracting function of a DBN can be used to further improve on the latest supervised ML approaches, such as XGBoost, for financial distress prediction.

There is, however, one specific limitation that is worth noting. In line with the prior studies, we constructed the FDP model based upon the quarterly data of the 16 selected variables; future studies may succeed in improving prediction accuracy further by using different sets of financial variables sampled at different time frequencies, such as data on monthly revenue or daily stock prices.

Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have an impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105663.

Acknowledgments

The corresponding author is grateful to the Center for Innovative FinTech Business Models, National Cheng Kung University, Taiwan (ROC), for a research grant to support this work. We would like to thank the Reviewers for their valuable comments and suggestions, which have helped to improve the quality of this paper substantially.

References

[1] W.-S. Chen, Y.-K. Du, Using neural networks and data mining techniques for the financial distress prediction model, Expert Syst. Appl. 36 (2009) 4075–4086.
[2] L.C. Thomas, D.B. Edelman, J.N. Crook, Credit Scoring and Its Applications, SIAM, 2002.
[3] L. Cleofas-Sánchez, V. García, A.I. Marqués, J.S. Sánchez, Financial distress prediction using the hybrid associative memory with translation, Appl. Soft Comput. 44 (2016) 144–152.
[4] G.V. Karels, A.J. Prakash, Multivariate normality and forecasting of business bankruptcy, J. Bus. Finance Account. 14 (1987) 573–593.
[5] M. Ezzamel, C. Mar-Molinero, A. Beech, On the distributional properties of financial ratios, J. Bus. Finance Account. 14 (1987) 463–481.
[6] V. Ravi, H. Kurniawan, P.N.K. Thai, P.R. Kumar, Soft computing system for bank performance prediction, Appl. Soft Comput. 8 (2008) 305–315.
[7] E.I. Altman, P. Narayanan, An international survey of business failure classification models, Financial Mark. Inst. Instrum. 6 (1997) 1–57.
[8] R.C. Cavalcante, R.C. Brasileiro, V.L.F. Souza, J.P. Nobrega, A.L.I. Oliveira, Computational intelligence and financial markets: A survey and future directions, Expert Syst. Appl. 55 (2016) 194–211.
[9] F. Barboza, H. Kimura, E. Altman, Machine learning models and bankruptcy prediction, Expert Syst. Appl. 83 (2017) 405–417.
[10] Y.J. Kim, B. Baik, S. Cho, Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning, Expert Syst. Appl. 62 (2016) 32–43.
[11] J. Haga, J. Siekkinen, D. Sundvik, A neural network approach to measure real activities manipulation, Expert Syst. Appl. 42 (2015) 2313–2322.
[12] H. Höglund, Detecting earnings management with neural networks, Expert Syst. Appl. 39 (2012) 9564–9570.
[13] B.E. Erdogan, Prediction of bankruptcy using support vector machines: an application to bank bankruptcy, J. Stat. Comput. Simul. 83 (2013) 1543–1555.
[14] J. Sun, H. Li, Financial distress prediction using support vector machines: Ensemble vs. individual, Appl. Soft Comput. 12 (2012) 2254–2265.
[15] M.-F. Hsu, A fusion mechanism for management decision and risk analysis, Cybern. Syst. (2019) 1–19.
[16] C.-H. Chou, S.-C. Hsieh, C.-J. Qiu, Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction, Appl. Soft Comput. 56 (2017) 298–316.
[17] M. Zięba, S.K. Tomczak, J.M. Tomczak, Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction, Expert Syst. Appl. 58 (2016) 93–101.
[18] Y. Xia, C. Liu, Y. Li, N. Liu, A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring, Expert Syst. Appl. 78 (2017) 225–241.
[19] P. Carmona, F. Climent, A. Momparler, Predicting failure in the U.S. banking sector: An extreme gradient boosting approach, Int. Rev. Econ. Finance 61 (2019) 304–323.
[20] F. Climent, A. Momparler, P. Carmona, Anticipating bank distress in the Eurozone: An extreme gradient boosting approach, J. Bus. Res. 101 (2019) 885–896.
[21] F. Antunes, B. Ribeiro, F. Pereira, Probabilistic modeling and visualization for bankruptcy prediction, Appl. Soft Comput. 60 (2017) 831–843.
[22] L. Wang, C. Wu, Business failure prediction based on two-stage selective ensemble with manifold learning algorithm and kernel-based fuzzy self-organizing map, Knowl.-Based Syst. 121 (2017) 99–110.
[23] V. Garcia, A.I. Marques, J.S. Sanchez, H.J. Ochoa-Dominguez, Dissimilarity-based linear models for corporate bankruptcy prediction, Comput. Econ. 53 (2019) 1019–1031.
[24] K. Masmoudi, L. Abid, A. Masmoudi, Credit risk modeling using Bayesian network with a latent variable, Expert Syst. Appl. 127 (2019) 157–166.
[25] G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (2006) 1527–1554.
[26] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (2009) 1–127.
[27] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (2006) 504–507.

[28] S.M. Erfani, S. Rajasegarar, S. Karunasekera, C. Leckie, High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning, Pattern Recognit. 58 (2016) 121–134.
[29] H. Liang, X. Sun, Y. Sun, Y. Gao, Text feature extraction based on deep learning: a review, EURASIP J. Wireless Commun. Networking 2017 (2017) 211.
[30] R. Sarikaya, G.E. Hinton, A. Deoras, Application of deep belief networks for natural language understanding, IEEE/ACM Trans. Audio Speech Lang. Process. 22 (2014) 778–784.
[31] L. Yu, R. Zhou, L. Tang, R. Chen, A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data, Appl. Soft Comput. 69 (2018) 192–202.
[32] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[33] J. Kruppa, A. Schwarz, G. Arminger, A. Ziegler, Consumer credit risk: Individual probability estimates using machine learning, Expert Syst. Appl. 40 (2013) 5125–5131.
[34] C.-C. Yeh, D.-J. Chi, Y.-R. Lin, Going-concern prediction using hybrid random forests and rough set approach, Inform. Sci. 254 (2014) 98–110.
[35] C. Maione, E.S. de Paula, M. Gallimberti, B.L. Batista, A.D. Campiglia, F. Barbosa Jr., R.M. Barbosa, Comparative study of data mining techniques for the authentication of organic grape juice based on ICP-MS analysis, Expert Syst. Appl. 49 (2016) 60–73.
[36] L. Uusitalo, Advantages and challenges of Bayesian networks in environmental modelling, Ecol. Model. 203 (2007) 312–318.
[37] L. Zhu, L. Chen, D. Zhao, J. Zhou, W. Zhang, Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN, Sensors 17 (2017) 1694.
[38] R.B. Carton, C.W. Hofer, Measuring Organizational Performance: Metrics for Entrepreneurship and Strategic Management Research, Edward Elgar Publishing, 2006.
[39] G. Wang, J. Ma, S. Yang, An improved boosting based on feature selection for corporate bankruptcy prediction, Expert Syst. Appl. 41 (2014) 2353–2361.
[40] F. Lin, D. Liang, E. Chen, Financial ratio selection for business crisis prediction, Expert Syst. Appl. 38 (2011) 15094–15102.
[41] M.E. Zmijewski, Methodological issues related to the estimation of financial distress prediction models, J. Account. Res. 22 (1984) 59–82.
[42] D. Martens, L. Bruynseels, B. Baesens, M. Willekens, J. Vanthienen, Predicting going concern opinion with data mining, Decis. Support Syst. 45 (2008) 765–777.
[43] E.B. Deakin, Discriminant analysis of predictors of business failure, J. Account. Res. 10 (1972) 167–179.
[44] J.A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, J. Account. Res. 18 (1980) 109–131.
[45] H. Li, J. Sun, B.L. Sun, Financial distress prediction based on OR-CBR in the principle of k-nearest neighbors, Expert Syst. Appl. 36 (2009) 643–659.
[46] Y. Ding, X. Song, Y. Zen, Forecasting financial condition of Chinese listed companies based on support vector machine, Expert Syst. Appl. 34 (2008) 3081–3089.
[47] W.H. Beaver, Financial ratios as predictors of failure, J. Account. Res. 4 (1966) 71–111.
[48] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Finance 23 (1968) 589–609.
[49] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[50] V. Kecman, Support vector machines – An introduction, in: L. Wang (Ed.), Support Vector Machines: Theory and Applications, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 1–47.
[51] G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput. 14 (2002) 1771–1800.
[52] Z. Lanbouri, S. Achchab, A hybrid deep belief network approach for financial distress prediction, in: 2015 10th International Conference on Intelligent Systems: Theories and Applications, SITA, 2015, pp. 1–6.
