Highlights
• This paper reviewed the pros and cons of recent literature on various ML models for FDP.
• This paper compared the performance of six ML-based approaches using real-life data.
• Among the four supervised models, the XGBoost algorithm provided the most accurate FD prediction.
• The hybrid DBN-SVM model gave better forecasts than both the SVM and the classifier DBN models.
Article history:
Received 14 March 2019
Received in revised form 22 July 2019
Accepted 25 July 2019
Available online 1 August 2019

JEL classification: G17, G32, O16, O31

Keywords: HACT, GA-fuzzy clustering, XGBoost, Hybrid DBN-SVM, Financial distress prediction

Abstract

We set out in this study to review a vast amount of recent literature on machine learning (ML) approaches to predicting financial distress (FD), including supervised, unsupervised and hybrid supervised–unsupervised learning algorithms. Four supervised ML models, including the traditional support vector machine (SVM), the recently developed hybrid associative memory with translation (HACT), hybrid GA-fuzzy clustering and extreme gradient boosting (XGBoost), were compared in prediction performance to the unsupervised classifier deep belief network (DBN) and the hybrid DBN-SVM model, whereby a total of sixteen financial variables were selected from the financial statements of publicly-listed Taiwanese firms as inputs to the six approaches. Our empirical findings, covering the 2010–2016 sample period, demonstrated that among the four supervised algorithms, XGBoost provided the most accurate FD prediction. Moreover, the hybrid DBN-SVM model was able to generate more accurate forecasts than the use of either the SVM or the classifier DBN in isolation.

© 2019 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.asoc.2019.105663
2 Y.-P. Huang and M.-F. Yen / Applied Soft Computing Journal 83 (2019) 105663
payment of obligations (interest, preferred dividends and financial bills), and even bankruptcy, the information contained within the financial statements can be used to establish diagnostic models of FDP [1]. However, since the traditional forms of assessment of the credit risk of firms are invariably reliant upon the subjective judgments of human experts – based upon their past experience and some guiding principles – such assessments tend to be reactive rather than predictive. In particular, credit scoring was not used until the late 1980s in the U.K. and the U.S. and perhaps, for a few lenders, until the late 1990s. Before the introduction of credit scoring methods, a bank manager, usually male, had to rely on his ''gut feel'', an assessment of the prospective borrower's character, ability to repay, and collateral or security, and an independent reference from a community leader or the applicant's employer to reach a decision. This process was slow and inconsistent [2]. Moreover, since the invention of credit scoring systems, ''homegrown'' scorecards can be informal and easily altered. Different users at a bank may input the data and analyze the results using their own approaches to achieve their desired outcome. Thus, the use of such approaches when attempting to make consistent estimates may produce erroneous results [3]. Therefore, it is desirable to develop fairly accurate quantitative prediction models using various internal and external factors.

Numerous techniques have been developed over the years in an attempt to provide analysts and decision-makers with effective methods of predicting financial distress based upon various financial ratios and mathematical models, including linear and logistic regressions, multivariate adaptive regression splines, survival analysis, linear and quadratic programming and multiple-criteria programming [4–6]; for example, Altman and Narayanan suggested that when assessing the management of distressed firms, the Z-score model could be used as a guide to financial turnaround [7]. Most of these techniques are typically reliant upon the assumptions of linear separability and multivariate normality, and indeed, the independence of the explanatory variables [3]. However, these conditions are often violated in real-life situations.

Machine learning

ML techniques have the capability of extracting meaningful information from unstructured data whilst also effectively dealing with non-linearity. However, the application of advanced ML techniques to financial forecasting is still a relatively new area for researchers to explore [8,9]. ML algorithms can be categorized into two major branches: supervised learning versus unsupervised learning.

Supervised learning

As regards the application of ML in the field of finance, supervised methods such as Bayesian network algorithms, logistic regressions and the support vector machine (SVM) are used to detect financial misstatements with fraudulent intention [10]. Neural network (NN) models are also used as a means of detecting earnings management [11,12]. Although NN architectures have been found to demonstrate good performance in various financial applications [1], the SVM often produces more accurate classification and prediction results than its competitors. This makes the SVM particularly suitable for FDP problems, since the number of financially-distressed firms is usually limited [13,14]. A recent SVM-based hybrid model was proposed [15], in which the author integrated a risk metric with a two-level data envelopment analysis (DEA) to describe the operating performance of a company. The hybrid model was comprised of rough set theory, the artificial fish swarm algorithm and a fuzzy SVM. Based on empirical evidence, the hybrid model was effective at predicting the operating performance of both public and private companies.

Cleofas-Sánchez et al. [3] applied Santiago-Montero's hybrid associative classifier with translation (HACT) model to predict financial distress and provided empirical results showing that HACT dominated four traditional neural networks – multi-layer perceptron (MLP), radial basis function (RBF), Bayesian network (BN), and voted perceptron (VP) – one SVM and one multivariate logistic regression (LR) model. HACT was developed from the traditional linear associative memory model (LAMM), one of the artificial neural network models, and is designed to mimic the human memorizing process via a series of associative memories. HACT has two advantages over the traditional LAMM. First, while the HACT model is also a LAMM during the learning phase, it uses the Steinbuch Lernmatrix method to enhance its prediction performance relative to the LAMM during the recall phase. The LAMM can only take a binary 0/1 value as its input, whereas HACT accepts any type of numerical, raw data as the feature vector for model input. Second, the input vector does not need to meet the orthonormality condition required by the traditional LAMM. While the HACT model is quick to train due to its feed-forward learning framework, it is only suitable for simple data with a repetitive structure; it is not able to generate good learning and prediction results for complicated data.

Chou et al. [16] proposed a hybrid structure integrating the genetic algorithm (GA) with a fuzzy clustering algorithm. In particular, key financial ratios selected by the GA are clustered by fuzzy C-means clustering after the training data is divided into financially distressed and financially non-distressed samples. The optimal number of clusters for both samples is decided by the WB index. Given the clustered training data, the distance between the feature vectors of each cluster's center and a testing sample is calculated and used to predict whether the testing sample will be an FD case according to the concept of the 'nearest neighbor'. For a clustering method, it is important to ensure that selected features are capable of correctly categorizing the samples into their corresponding classes. The hybrid model integrates the advantages of both statistical theory and soft computing by using the genetic algorithm to select features in order to make the clusters of features meaningful for the subsequent data classifier. However, the major drawbacks of this method are the high possibility of overfitting the training data as well as the time consumed in finding the optimal results.

Zięba et al. [17] proposed using extreme gradient boosting (XGBoost) to learn an ensemble of decision trees for bankruptcy prediction. Their so-called synthetic features are composed of various arithmetic operations such as addition, subtraction, multiplication, and division. Each synthetic feature can be treated as a regression model and constructed in an evolutionary way. They used the model to predict the financial condition of Polish companies spanning the 2000–2013 period and found it able to generate prediction results significantly better than quite a few reference models, such as linear discriminant analysis, a multilayer perceptron with a hidden layer, a decision rules inducer, a decision tree model, LR, AdaBoost, SVM and random forest. Xia et al. [18] employed XGBoost to structure a sequential ensemble credit scoring model, whereby they tuned the hyperparameters using a Bayesian optimization algorithm with a selected feature subset. Their empirical results showed that the Bayesian optimization performed better than random search, grid search, and manual search. The credit scoring results were interpreted by a decision tree using the relative importance among selected features, ranked in descending order. Carmona et al. [19] used XGBoost to predict bankruptcy based on 156 national U.S.
attention than the other. Based on genuine German and Japanese credit datasets, the cost-sensitive hybrid SVM-DBN structure was effective at classifying credit risk for imbalanced data.

Zhu et al. [32] verified that a hybrid DBN-SVM method could be utilized to accurately recognize emotion status in speech, in which the DBN acted as a deep feature extractor and the SVM was a classifier. Combining them could achieve a better result than using only the DBN or the SVM in isolation. Their empirical results showed that DBN features can reflect emotion status better than artificial features, and the hybrid DBN-SVM classification approach achieved a high accuracy of 95.8%, which was higher than using either the DBN or the SVM separately. In summary, coupling a DBN with an SVM is advantageous since the former addresses the complexity and scalability issues of the SVM, especially when training with large-scale datasets [28]. The experimental results of [28] showed that the hybrid DBN-SVM model yielded anomaly detection performance comparable to a deep autoencoder, while reducing its training and testing time by factors of 3 and 1000, respectively.

2. Methodology

2.1. Financial distress variables

Firms are required to provide formal records of their financial activities in the form of financial statements; thus, all publicly-listed enterprises are required to generate four basic financial statements, as shown below:

a. Statement of Financial Position – reporting the financial position at a given date.
b. Statement of Comprehensive Income – reporting financial performance in terms of net profit or loss over a specific period.
c. Statement of Changes in Equity – reporting any movement in the owners' equity over a specific period.
d. Statement of Cash Flows – summarizing the movement in cash and bank balances over a specific period.
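The DBN-as-feature-extractor plus SVM-as-classifier coupling described above can be sketched with off-the-shelf components. The sketch below is illustrative only: it substitutes scikit-learn's single-layer BernoulliRBM for a full multi-layer DBN, and it runs on synthetic data rather than the financial or speech data of the cited studies; all sizes and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((64, 16))                    # synthetic: 64 firms x 16 ratios in [0, 1]
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # synthetic distress labels

# The RBM learns latent features; the SVM then classifies in that
# learned feature space, mirroring the hybrid DBN-SVM pipeline shape.
pipe = Pipeline([
    ("rbm", BernoulliRBM(n_components=8, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    ("svm", SVC(kernel="rbf")),
])
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds.shape)   # (64,)
```

In the cited work the DBN stack is deeper and pre-trained layer by layer; the single RBM here only mirrors the overall extractor-then-classifier structure.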
Table 1
Advantages and disadvantages of ML-based FDP models.

Supervised learning

Support Vector Machine (SVM) [13,14,31]
Advantages: SVM is capable of eliminating massive redundancy, and thus offers low algorithmic complexity and high robustness. SVM can reduce the possibility of overfitting, a significant problem in credit risk evaluation, by setting the parameter of the cost function. Relative to other supervised learning algorithms, SVM has proved to show good generalization performance.
Disadvantages: The model tends to be computationally demanding for large-scale data. Moreover, it is not able to generate satisfactory classification results for the 'small' class in imbalanced data; re-sampling methods can help overcome this drawback. Although SVM can be used to reduce the high dimensionality of features, doing so might cause a loss of important information.

Hybrid Associative Classifier with Translation (HACT) [3]
Advantages: HACT is a kind of feed-forward neural network. The model is very fast to train since it does not involve any back-propagation framework.
Disadvantages: It is only suitable for learning simple and repetitive data, and is not able to generate satisfactory results for complicated data.

Hybrid genetic algorithm (GA) and fuzzy clustering [16]
Advantages: The hybrid model integrates the advantages of both statistical theory and soft computing by using the genetic algorithm to select features in order to make the clusters of features meaningful for the subsequent data classifier. The hybrid model proved to provide better performance than the BPNN classifier in predicting bankruptcy.
Disadvantages: The model is computationally demanding, in particular for large-scale data. A high possibility of overfitting might be another problem.

eXtreme Gradient Boosting (XGBoost) [17–20]
Advantages: This supervised learning algorithm integrates gradient descent and tree ensemble learning, and can be used for regression and classification problems. The model is regarded as one of the best supervised learners; it is fast to train and supported by various packages and platforms.
Disadvantages: It has a large and complex set of hyperparameters which are not easy to tune. Relative to DL algorithms, it does not tend to perform well when handling non-structural features such as images and voices.

Probabilistic Gaussian processes (GP) classifier [21]
Advantages: The probabilistic GP classifier is less sensitive to class balance than SVM and LR. It provides comparable performance on both imbalanced and balanced datasets.
Disadvantages: Like SVM, the final performance of this kernel-based model is sensitive to parameter selection.

Random Forest (RF) [9,32–35]
Advantages: Noisy data or outliers are allowed in the training dataset of an RF model. RF not only classifies data but also provides information on the driving determinants of classification among groups, because the model is able to identify the importance of each variable in the classification results.
Disadvantages: RF usually does not perform as well in regression problems as in classification because it cannot produce continuous outputs. For regression problems, RF is not able to generate predictions outside the range of the training data, indicating a high possibility of overfitting when the training data contain noise.

Two-stage selective ensemble model [22]
Advantages: The manifold learning algorithms are good at reducing the dimension of input data for any data distribution. The original data can be mapped into a high-dimensional feature space by the kernel-based FSOM method to cluster the original data, which relaxes the spherical-distribution limitation on the data in traditional FSOM. The stepwise forward selection method in the first stage of this two-stage ensemble model is easy to understand and implement, and can strengthen the model's performance effectively.
Disadvantages: The model's effectiveness is only examined using Chinese listed companies. The financial data of other countries should be included to validate the model.

Linear classifiers [23]
Advantages: The four linear classifiers – Fisher's linear discriminant, the linear discriminant classifier, SVM and LR – are good at handling sparse data and easy to describe mathematically. In addition, they are computationally simple and easy to interpret. When applied to dissimilarity data, they are able to provide satisfactory performance.
Disadvantages: They are not applicable to non-linearly distributed data.

Discrete Bayesian network (BN) [24,36]
Advantages: BNs are one of the most consistent and general specifications for modeling complicated systems. The model is useful in identifying the relatively important loan characteristics which affect the default probability.
Disadvantages: Although the Bayesian formalism is useful in modeling the knowledge of experts, it is difficult to define the conditional probability table from experts' opinions, due to the difficulty of reaching agreement on the BN model with experts.

Unsupervised learning

Classifier Deep Belief Network (DBN) [29–31,37]
Advantages: Given sufficient hidden layers, the model is good at learning features. In particular, it can identify more profound features and uncover rich information hidden in the data.
Disadvantages: Finding the optimal parameterization for the model is complicated and time demanding.

Hybrid supervised–unsupervised method

Hybrid DBN-SVM [28,31,37]
Advantages: Since it is difficult for the SVM to classify large datasets with high dimensionality, using the aforementioned DBN to reduce the dimensionality of features for the SVM classifier is advantageous.
Disadvantages: Finding the optimal parameterization for the DBN part of the hybrid model is complicated and time demanding.
XGBoost, while the unsupervised ML algorithm refers to the classifier DBN and the hybrid supervised–unsupervised ML algorithm refers to the hybrid DBN-SVM model. All six models were used to distinguish whether a firm was financially distressed (FD) or financially non-distressed (FND).
The financial data on the FD and FND firms were retrieved from the four basic financial statements, which are readily available from the TEJ database. We calculated the 16 variables regarded within the extant literature as having the greatest relevance to financial distress. This set of 16 financial ratios on all firms was then divided into validation and training sets for prediction accuracy evaluation.

Table 2
Selected financial variables for financial distress prediction.
V1  Current ratio  [16,17,21,22,40–42]
V2  Cash flow/total debt  [22,40–42]
V3  Cash flow/total asset  [40,43,44]
V4  Cash flow/sales  [40,43,45]
V5  Debt ratio  [16,40,44,46]
V6  Working capital/total asset  [9,17,21,40,47,48]
V7  Market value equity/total debt  [40,42,45]
V8  Current asset/total asset  [40,43]
V9  Quick asset/total asset  [40,43]
V10 Sales/total asset  [9,40,45,48]
V11 Current debt/sales  [40,43]
V12 Quick asset/sales  [40,43]
V13 Working capital/sales  [40,42,44]
V14 Net income/total asset  [40,41,44]
V15 Retained earnings/total asset  [9,17,40,46,48]
V16 Earnings before interest and taxes/total asset  [40,45,48]

Table 3
Summary of the industry categories of the companies investigated (number of companies in parentheses).
Accommodation (2), Air transport (2), Basic metals (6), Computers (2), Construction (2), Consumer electronics (2), Electronic components (8), Materials recovery (2), Metallic furniture (4), Optoelectronic materials (14), Raw material medicines (2), Semi-conductors (8), Software (4), Telecommunications equipment (2), Textiles (2), Wearing apparel (2).

Table 4
Total assets of the candidate firms.
<1 billion NT$: 24 firms (38%); 1∼10 billion NT$: 30 firms (47%); >10 billion NT$: 10 firms (16%).

Table 5
Summary statistics on the total assets of the candidate firms (NT$).
                     Distressed   Non-Distressed
Average (billion)    25.83        29.11
Max. (billion)       722.86       612.37
Min. (billion)       0.03         0.26
Median (billion)     0.88         2.7
No. of firms         32           32

2.3. Supervised prediction models

The four supervised ML algorithms are specified as follows.

2.3.1. SVM
The SVM is a supervised ML technique, originally introduced by Cortes and Vapnik [49], which has since become a standard technique in many classification problems. When the SVM is exploited as a binary classifier, it constructs a hyperplane which separates the two classes of training data with a maximal margin, via a quadratic optimization method [50]. The SVM model of this study was constructed with the training dataset {x_i, y_i} (i = 1, 2, ..., n; y_i ∈ {−1, +1}), where x_i was the input financial data vector; y_i = 1 was defined as an output belonging to the FD class, and y_i = −1 was set as an output in the FND class; n was the size of the training dataset. In addition, to avoid the dominance of financial data of greater numerical variation over those of relatively smaller variation, all features of the training data were scaled to the range [−1, +1].

The aim of training an SVM is to find a maximum-margin hyperplane ω · x_i + b = 0 to separate y_i = 1 from y_i = −1, where ω represents the weight vector of the separating hyperplane, and b represents the offset. It is usually difficult to linearly separate financial data points. As a result, a 'radial basis function' (RBF) kernel function k(x_i, x_j) was adopted in this research to map the training data vector x_i onto a higher-dimensional feature space vector Φ(x_i) for separation. All the parameters of the SVM model were tuned in order to obtain the optimal maximal-margin hyperplane for classification.

2.3.2. Hybrid associative classifier with translation (HACT)
Following Cleofas-Sánchez et al. [3], in the learning phase of HACT, all features (financial ratios) of the training data needed to be normalized, in which the mean of a feature is subtracted from its original value. The feature vectors and classification labels of 48 out of the 64 samples were used to train a linear associative memory model which would learn and memorize the patterns of the data above. In the recall phase (i.e. prediction phase) of HACT, we normalized each feature of the testing dataset by subtracting the mean over the 48 samples of the training dataset from the feature's value in the testing dataset. The 16 normalized samples of the testing dataset were then used to produce FDP. We cross-validated the prediction accuracy by performing the experiment above four times, each time with a non-overlapping testing dataset of 16 samples. Fig. 3 illustrates the algorithm of HACT.

2.3.3. Hybrid GA-fuzzy clustering
Figs. 4 and 5 below illustrate Chou et al.'s [16] algorithm in the context of this study. The genetic algorithm was used to select the optimal set of features, in which the number of genes of each chromosome was equal to the number of data features (financial ratios). If the value of a gene was equal to 1 (0), the corresponding feature would (not) be selected. We set the maximum number of generations equal to 100, each with 200 chromosomes. Only the 10 chromosomes with the highest fitness values were kept for the next generation. We used uniform crossover and set the mutation rate equal to 0.1. An early stop would apply when the best fitness value could not be further improved within any 10 consecutive generations. To evaluate the feature selection outcome (fitness) of a chromosome, we reserved a random 30% of the training samples. In particular, 70% of the training samples were used to construct the model, while the remaining 30% were reserved to evaluate the FDP accuracy (i.e. the fitness of the chromosome). When constructing the model, we applied fuzzy C-means clustering to the FD samples and the FND samples, respectively, using the Silhouette index to determine the optimal cluster number. Note that we replaced Chou et al.'s [16] WB index with the commonly-adopted Silhouette index, since the WB index was proposed in an unpublished reference which is not available.

2.3.4. XGBoost
XGBoost is a supervised learning algorithm which can be used for regression and classification problems. The concept of XGBoost combines gradient descent and tree ensemble learning. Through additive training, one new tree is added at a time to the previous XGBoost model to improve on its prediction performance and optimize the value of a pre-specified objective function. There is a total of 5 hyperparameters for an XGBoost model: learning rate, maximum depth, subsampling percentage for samples, subsampling percentage for features, and maximum number of iterations. To achieve the best prediction performance, we selected the optimal set of hyperparameters using 4-fold cross-validation. In line with the hybrid GA-fuzzy clustering algorithm, we needed to normalize each feature of the training samples to the range [0, 1]. In particular,

x̂_train = (x_train − min(x_train)) / (max(x_train) − min(x_train))    (1)
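The min–max normalization of Eq. (1) can be sketched in a few lines; note that the test data are scaled with the training minima and maxima. This is a minimal illustration: the paper does not specify how out-of-range test values are 'renormalized' back into [0, 1], so simple clipping is assumed here.

```python
def minmax_fit(train_col):
    """Return the training minimum and maximum of one feature column."""
    return min(train_col), max(train_col)

def minmax_apply(x, lo, hi):
    """Scale a value to [0, 1] using the TRAINING range; out-of-range
    values are clipped back into [0, 1] (an assumed renormalization rule)."""
    scaled = (x - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))

train = [2.0, 4.0, 6.0]
lo, hi = minmax_fit(train)
print([minmax_apply(x, lo, hi) for x in train])   # [0.0, 0.5, 1.0]
print(minmax_apply(8.0, lo, hi))                  # 1.0 (clipped)
```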
Fig. 5. The feature selection process of the hybrid GA-fuzzy clustering algorithm.
The original range of each feature of the training samples was then used to normalize the testing samples as follows.

x̂_test = (x_test − min(x_train)) / (max(x_train) − min(x_train))    (2)

If any normalized feature of the testing sample fell outside the [0, 1] range, however, we would renormalize it back to the [0, 1] range. As regards the hyperparameters, we set the learning rate equal to 0.1; the maximum depth to 5, 7, 9, 10, and 11, respectively; the subsampling percentage for samples to 0.5, 0.7, and 0.9, respectively; and the subsampling percentage for features to 0.5, 0.7, and 0.9, respectively, amounting to a total of 45 combinations. The maximum number of iterations was set equal to 2000. When the loss function of the evaluation dataset could not be further reduced within any 20 consecutive iterations, the training was immediately terminated. We used 4-fold cross-validation to obtain the optimal set of hyperparameters. Given the optimal hyperparameters, we used all 48 samples of the training dataset to construct the model, which was then used to predict FD for the testing dataset. Fig. 6 illustrates the algorithm of the XGBoost model.

2.4. Unsupervised prediction model (DBN)

The DBN is one of the unsupervised ML algorithms attracting the most attention. The model is comprised of multiple layers of stacked 'restricted Boltzmann machines' (RBM). An RBM is a stochastic generative neural network with symmetric weighted connections (w) between the visible and hidden layers, but no connections between the neurons within a layer. As illustrated in Fig. 7, there are two layers of neurons in an RBM: the visible layer with neurons (v_i) and the hidden layer with latent neurons (h_j).

An autoencoder DBN is a powerful model for feature extraction applications [27]. As illustrated in Fig. 8, an autoencoder DBN is constructed by stacking RBMs on top of each other. In this study, the visible neurons of RBM#1 take real values because the calculated financial ratios are all continuous data, as opposed to the hidden neurons with stochastic binary values. The DBN model is trained from the bottom to the top layers; for instance, RBM#1 is trained first, and the activation probabilities of the neurons of its hidden layer (h1) then become the input data of the visible layer (v2) for RBM#2 learning. All the RBMs of a DBN are sequentially trained in the aforementioned manner, and finally the activation values of the hidden neurons of the top RBM
3. Empirical results
Table 6
Comparisons of model prediction accuracy across different spans of quarterly data prior to each event quarter.
Prediction accuracy (%)
Learning types / Models    1Q    2Q    3Q    4Q    1Q∼2Q    1Q∼3Q    1Q∼4Q
SVM 73.4 70.3 67.2 62.5 82.8 79.2 81.3
Supervised ML learning HACT 67.2 56.3 53.1 64.1 64.1 57.8 56.3
GA-Fuzzy 43.8 35.9 39.1 37.5 39.1 39.1 40.6
XGBoost 87.5 89.1 84.4 82.8 90.6 87.5 90.6
Unsupervised ML learning Classifier DBN 53.1 68.8 51.6 65.6 68.0 69.8 71.9
Hybrid supervised–unsupervised ML Learning DBN-SVM 79.7 75.0 70.3 71.9 89.1 82.3 86.3
Table 7
The prediction performance of the XGBoost model using different spans of
quarterly data prior to each event quarter.
Prediction accuracy (%)
Variables
1Q 2Q 3Q 4Q 1Q∼2Q 1Q∼3Q 1Q∼4Q
Type I error 4.7 4.7 9.4 10.9 3.1 3.1 3.1
Type II error 7.8 6.3 6.3 6.3 6.3 9.4 6.3
True positive rate 84.4 87.5 87.5 87.5 87.5 81.3 87.5
True negative rate 90.6 90.6 81.3 78.1 93.8 93.8 93.8
Prediction accuracy 87.5 89.1 84.4 82.8 90.6 87.5 90.6
Note:
Type I error (%) = FP/(TP + FN + FP + TN); Type II error (%) = FN/(TP + FN + FP + TN);
True positive rate (%) = TP/(TP + FN); True negative rate (%) = TN/(FP + TN);
Accuracy (%) = (TP + TN)/(TP + FN + FP + TN).
Abbreviations: FP: false positives; FN: false negatives; TP: true positives; TN: true negatives.
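The error and accuracy definitions in the note above translate directly into code. The counts used below are hypothetical, chosen only to exercise the formulas; they are not taken from the paper's tables.

```python
def fdp_metrics(tp, fn, fp, tn):
    """Confusion-matrix metrics as defined in the note to Table 7 (all in %)."""
    total = tp + fn + fp + tn
    return {
        "type_i_error": 100.0 * fp / total,
        "type_ii_error": 100.0 * fn / total,
        "true_positive_rate": 100.0 * tp / (tp + fn),
        "true_negative_rate": 100.0 * tn / (fp + tn),
        "accuracy": 100.0 * (tp + tn) / total,
    }

# Hypothetical counts for a 64-firm test fold:
m = fdp_metrics(tp=25, fn=7, fp=3, tn=29)
print(round(m["accuracy"], 1))   # 84.4
```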
Fig. 10. Feature extraction result for the DBN (1Q∼2Q).

Table 8
The prediction performance of the hybrid DBN-SVM model using different spans
of quarterly data prior to each event quarter.
Prediction accuracy (%)
Variables
1Q 2Q 3Q 4Q 1Q∼2Q 1Q∼3Q 1Q∼4Q
Type I error 12.5 10.9 15.6 15.6 4.7 13.0 7.0
Type II error 7.8 14.1 14.1 12.5 6.3 5.2 6.6
True positive rate 84.4 71.9 71.9 75.0 87.5 89.6 86.7
True negative rate 75.0 78.1 68.8 68.8 90.6 74.0 85.9
Prediction accuracy 79.7 75.0 70.3 71.9 89.1 81.8 86.3
4. Conclusions

A series of financial distress events at large companies might cause a chain reaction such as the 2008 financial crisis. If firm management, lenders or stockholders are not able to foresee such a crisis, chances are that the result will damage their interests to a significant extent. However, traditional credit risk assessment approaches are inconsistent across users, and the assessment result depends highly upon their desired outcome. As a result, accurate quantitative methods are necessary to provide objective and consistent estimates of the risk of financial failure. We set out in this study to compare the performance of six machine-learning approaches in the context of financial-distress prediction. By referring to the extant literature, we summarized sixteen financial ratios, calculated from the four basic financial statements of publicly-listed firms in the Taiwan stock market, which are regarded as being related to future financial distress. The data on these financial ratios were subsequently used as input data for six classification algorithms – traditional SVM, HACT, hybrid GA-fuzzy clustering, XGBoost, classifier DBN and hybrid DBN-SVM – to predict financial distress. Our empirical results appeared to suggest that among all four types of supervised approaches, the XGBoost model produced the most accurate prediction performance. In addition, we demonstrated that the hybrid model comprising the DBN feature extractor and the SVM – within which the data were pre-trained using the DBN algorithm and then classified using the SVM algorithm – was capable of predicting the financial distress of a firm to a satisfactory extent. The latent features were extracted from the ratios calculated from the financial statement data using the DBN.

Our results showed the users of financial statements the feasibility of using financial statement data to construct an effective FDP model. Indeed, XGBoost and the use of deep learning techniques to augment supervised ML approaches have shown promise in this research area. The main contributions of and value added by this study are as follows. First, it summarized the advantages and limitations of recent literature on different supervised, unsupervised, and hybrid supervised–unsupervised models for fi-

Acknowledgments

The corresponding author is grateful to the Center for Innovative FinTech Business Models, National Cheng Kung University, Taiwan (ROC), for a research grant supporting this work. We would like to thank the reviewers for their valuable comments and suggestions, which have helped to improve the quality of this paper substantially.

References

[1] W.-S. Chen, Y.-K. Du, Using neural networks and data mining techniques for the financial distress prediction model, Expert Syst. Appl. 36 (2009) 4075–4086.
[2] L.C. Thomas, D.B. Edelman, J.N. Crook, Credit Scoring and Its Applications, SIAM, 2002.
[3] L. Cleofas-Sánchez, V. García, A.I. Marqués, J.S. Sánchez, Financial distress prediction using the hybrid associative memory with translation, Appl. Soft Comput. 44 (2016) 144–152.
[4] G.V. Karels, A.J. Prakash, Multivariate normality and forecasting of business bankruptcy, J. Bus. Finance Account. 14 (1987) 573–593.
[5] M. Ezzamel, C. Mar-Molinero, A. Beech, On the distributional properties of financial ratios, J. Bus. Finance Account. 14 (1987) 463–481.
[6] V. Ravi, H. Kurniawan, P.N.K. Thai, P.R. Kumar, Soft computing system for bank performance prediction, Appl. Soft Comput. 8 (2008) 305–315.
[7] E.I. Altman, P. Narayanan, An international survey of business failure classification models, Financial Mark. Inst. Instrum. 6 (1997) 1–57.
[8] R.C. Cavalcante, R.C. Brasileiro, V.L.F. Souza, J.P. Nobrega, A.L.I. Oliveira, Computational intelligence and financial markets: A survey and future directions, Expert Syst. Appl. 55 (2016) 194–211.
[9] F. Barboza, H. Kimura, E. Altman, Machine learning models and bankruptcy prediction, Expert Syst. Appl. 83 (2017) 405–417.
[10] Y.J. Kim, B. Baik, S. Cho, Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning, Expert Syst. Appl. 62 (2016) 32–43.
[11] J. Haga, J. Siekkinen, D. Sundvik, A neural network approach to measure real activities manipulation, Expert Syst. Appl. 42 (2015) 2313–2322.
[12] H. Höglund, Detecting earnings management with neural networks, Expert Syst. Appl. 39 (2012) 9564–9570.
[13] B.E. Erdogan, Prediction of bankruptcy using support vector machines: an application to bank bankruptcy, J. Stat. Comput. Simul. 83 (2013) 1543–1555.
[14] J. Sun, H. Li, Financial distress prediction using support vector machines: Ensemble vs. individual, Appl. Soft Comput. 12 (2012) 2254–2265.
[15] M.-F. Hsu, A fusion mechanism for management decision and risk analysis,
nancial distress prediction. Second, it compared the performance Cybern. Syst. (2019) 1–19.
[16] C.-H. Chou, S.-C. Hsieh, C.-J. Qiu, Hybrid genetic algorithm and fuzzy
of six ML-based approaches in the context of financial distress clustering for bankruptcy prediction, Appl. Soft Comput. 56 (2017)
prediction using real-life data. Third, it suggested that among 298–316.
the four supervised algorithms, the recently popular XGBoost [17] M. Zięba, S.K. Tomczak, J.M. Tomczak, Ensemble boosted trees with syn-
algorithm provided the most accurate FD prediction. Finally, it thetic features generation in application to bankruptcy prediction, Expert
Syst. Appl. 58 (2016) 93–101.
verified that the hybrid DBN-SVM model was able to generate [18] Y. Xia, C. Liu, Y. Li, N. Liu, A boosted decision tree approach using Bayesian
more accurate forecasts than the use of either the SVM or the hyper-parameter optimization for credit scoring, Expert Syst. Appl. 78
classifier DBN in isolation. Future research is encouraged to ex- (2017) 225–241.
amine if the feature extracting function of a DBN can be used to [19] P. Carmona, F. Climent, A. Momparler, Predicting failure in the U.S banking
sector: An extreme gradient boosting approach, Int. Rev. Econ. Finance 61
further improve on the latest supervised ML approaches such as (2019) 304–323.
the XGBoost for financial distress prediction. [20] F. Climent, A. Momparler, P. Carmona, Anticipating bank distress in the
There is, however, one specific limitation that is worth noting. Eurozone: An extreme gradient boosting approach, J. Bus. Res. 101 (2019)
In line with the prior studies, we constructed the FDP model 885–896.
[21] F. Antunes, B. Ribeiro, F. Pereira, Probabilistic modeling and visualization
based upon the quarterly data of the 16 selected variables; future for bankruptcy prediction, Appl. Soft Comput. 60 (2017) 831–843.
studies may succeed in improving prediction accuracy further [22] L. Wang, C. Wu, Business failure prediction based on two-stage selec-
by using different sets of financial variables sampled at different tive ensemble with manifold learning algorithm and kernel-based fuzzy
time frequencies, such as data on monthly revenue or daily stock self-organizing map, Knowl.-Based Syst. 121 (2017) 99–110.
[23] V. Garcia, A.I. Marques, J.S. Sanchez, H.J. Ochoa-Dominguez, Dissimilarity-
prices. based linear models for corporate bankruptcy prediction, Comput. Econ.
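The two-stage DBN-SVM hybrid described above can be sketched as follows. This is a minimal illustration rather than our exact implementation: synthetic data stand in for the sixteen financial ratios, a stack of two Bernoulli RBMs (the building block of a DBN) approximates the unsupervised feature extractor, and all hyper-parameter values are arbitrary assumptions.

```python
# Sketch of a hybrid DBN-SVM financial-distress classifier: unsupervised
# RBM pre-training extracts latent features, then an SVM classifies them.
# Synthetic data and hyper-parameters are illustrative assumptions only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))               # 500 firms x 16 financial ratios
y = (X[:, :4].sum(axis=1) < 0).astype(int)   # 1 = distressed (synthetic rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = Pipeline([
    ("scale", MinMaxScaler()),               # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("svm", SVC(kernel="rbf", C=1.0)),       # supervised classification stage
])
model.fit(X_tr, y_tr)                        # RBM stages ignore y (unsupervised)
print(f"Hold-out accuracy: {model.score(X_te, y_te):.3f}")
```

Because the RBM layers implement `transform`, the whole pipeline exposes the usual `fit`/`predict` interface, so the pre-training and classification stages can be tuned or swapped independently.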
Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105663.

References

[1] W.-S. Chen, Y.-K. Du, Using neural networks and data mining techniques for the financial distress prediction model, Expert Syst. Appl. 36 (2009) 4075–4086.
[2] L.C. Thomas, D.B. Edelman, J.N. Crook, Credit Scoring and Its Applications, SIAM, 2002.
[3] L. Cleofas-Sánchez, V. García, A.I. Marqués, J.S. Sánchez, Financial distress prediction using the hybrid associative memory with translation, Appl. Soft Comput. 44 (2016) 144–152.
[4] G.V. Karels, A.J. Prakash, Multivariate normality and forecasting of business bankruptcy, J. Bus. Finance Account. 14 (1987) 573–593.
[5] M. Ezzamel, C. Mar-Molinero, A. Beech, On the distributional properties of financial ratios, J. Bus. Finance Account. 14 (1987) 463–481.
[6] V. Ravi, H. Kurniawan, P.N.K. Thai, P.R. Kumar, Soft computing system for bank performance prediction, Appl. Soft Comput. 8 (2008) 305–315.
[7] E.I. Altman, P. Narayanan, An international survey of business failure classification models, Financial Mark. Inst. Instrum. 6 (1997) 1–57.
[8] R.C. Cavalcante, R.C. Brasileiro, V.L.F. Souza, J.P. Nobrega, A.L.I. Oliveira, Computational intelligence and financial markets: A survey and future directions, Expert Syst. Appl. 55 (2016) 194–211.
[9] F. Barboza, H. Kimura, E. Altman, Machine learning models and bankruptcy prediction, Expert Syst. Appl. 83 (2017) 405–417.
[10] Y.J. Kim, B. Baik, S. Cho, Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning, Expert Syst. Appl. 62 (2016) 32–43.
[11] J. Haga, J. Siekkinen, D. Sundvik, A neural network approach to measure real activities manipulation, Expert Syst. Appl. 42 (2015) 2313–2322.
[12] H. Höglund, Detecting earnings management with neural networks, Expert Syst. Appl. 39 (2012) 9564–9570.
[13] B.E. Erdogan, Prediction of bankruptcy using support vector machines: An application to bank bankruptcy, J. Stat. Comput. Simul. 83 (2013) 1543–1555.
[14] J. Sun, H. Li, Financial distress prediction using support vector machines: Ensemble vs. individual, Appl. Soft Comput. 12 (2012) 2254–2265.
[15] M.-F. Hsu, A fusion mechanism for management decision and risk analysis, Cybern. Syst. (2019) 1–19.
[16] C.-H. Chou, S.-C. Hsieh, C.-J. Qiu, Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction, Appl. Soft Comput. 56 (2017) 298–316.
[17] M. Zięba, S.K. Tomczak, J.M. Tomczak, Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction, Expert Syst. Appl. 58 (2016) 93–101.
[18] Y. Xia, C. Liu, Y. Li, N. Liu, A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring, Expert Syst. Appl. 78 (2017) 225–241.
[19] P. Carmona, F. Climent, A. Momparler, Predicting failure in the U.S. banking sector: An extreme gradient boosting approach, Int. Rev. Econ. Finance 61 (2019) 304–323.
[20] F. Climent, A. Momparler, P. Carmona, Anticipating bank distress in the Eurozone: An extreme gradient boosting approach, J. Bus. Res. 101 (2019) 885–896.
[21] F. Antunes, B. Ribeiro, F. Pereira, Probabilistic modeling and visualization for bankruptcy prediction, Appl. Soft Comput. 60 (2017) 831–843.
[22] L. Wang, C. Wu, Business failure prediction based on two-stage selective ensemble with manifold learning algorithm and kernel-based fuzzy self-organizing map, Knowl.-Based Syst. 121 (2017) 99–110.
[23] V. García, A.I. Marqués, J.S. Sánchez, H.J. Ochoa-Domínguez, Dissimilarity-based linear models for corporate bankruptcy prediction, Comput. Econ. 53 (2019) 1019–1031.
[24] K. Masmoudi, L. Abid, A. Masmoudi, Credit risk modeling using Bayesian network with a latent variable, Expert Syst. Appl. 127 (2019) 157–166.
[25] G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (2006) 1527–1554.
[26] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (2009) 1–127.
[27] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (2006) 504–507.
[28] S.M. Erfani, S. Rajasegarar, S. Karunasekera, C. Leckie, High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning, Pattern Recognit. 58 (2016) 121–134.
[29] H. Liang, X. Sun, Y. Sun, Y. Gao, Text feature extraction based on deep learning: A review, EURASIP J. Wireless Commun. Networking 2017 (2017) 211.
[30] R. Sarikaya, G.E. Hinton, A. Deoras, Application of deep belief networks for natural language understanding, IEEE/ACM Trans. Audio Speech Lang. Process. 22 (2014) 778–784.
[31] L. Yu, R. Zhou, L. Tang, R. Chen, A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data, Appl. Soft Comput. 69 (2018) 192–202.
[32] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[33] J. Kruppa, A. Schwarz, G. Arminger, A. Ziegler, Consumer credit risk: Individual probability estimates using machine learning, Expert Syst. Appl. 40 (2013) 5125–5131.
[34] C.-C. Yeh, D.-J. Chi, Y.-R. Lin, Going-concern prediction using hybrid random forests and rough set approach, Inform. Sci. 254 (2014) 98–110.
[35] C. Maione, E.S. de Paula, M. Gallimberti, B.L. Batista, A.D. Campiglia, F. Barbosa Jr., R.M. Barbosa, Comparative study of data mining techniques for the authentication of organic grape juice based on ICP-MS analysis, Expert Syst. Appl. 49 (2016) 60–73.
[36] L. Uusitalo, Advantages and challenges of Bayesian networks in environmental modelling, Ecol. Model. 203 (2007) 312–318.
[37] L. Zhu, L. Chen, D. Zhao, J. Zhou, W. Zhang, Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN, Sensors 17 (2017) 1694.
[38] R.B. Carton, C.W. Hofer, Measuring Organizational Performance: Metrics for Entrepreneurship and Strategic Management Research, Edward Elgar Publishing, 2006.
[39] G. Wang, J. Ma, S. Yang, An improved boosting based on feature selection for corporate bankruptcy prediction, Expert Syst. Appl. 41 (2014) 2353–2361.
[40] F. Lin, D. Liang, E. Chen, Financial ratio selection for business crisis prediction, Expert Syst. Appl. 38 (2011) 15094–15102.
[41] M.E. Zmijewski, Methodological issues related to the estimation of financial distress prediction models, J. Account. Res. 22 (1984) 59–82.
[42] D. Martens, L. Bruynseels, B. Baesens, M. Willekens, J. Vanthienen, Predicting going concern opinion with data mining, Decis. Support Syst. 45 (2008) 765–777.
[43] E.B. Deakin, Discriminant analysis of predictors of business failure, J. Account. Res. 10 (1972) 167–179.
[44] J.A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, J. Account. Res. 18 (1980) 109–131.
[45] H. Li, J. Sun, B.L. Sun, Financial distress prediction based on OR-CBR in the principle of k-nearest neighbors, Expert Syst. Appl. 36 (2009) 643–659.
[46] Y. Ding, X. Song, Y. Zen, Forecasting financial condition of Chinese listed companies based on support vector machine, Expert Syst. Appl. 34 (2008) 3081–3089.
[47] W.H. Beaver, Financial ratios as predictors of failure, J. Account. Res. 4 (1966) 71–111.
[48] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Finance 23 (1968) 589–609.
[49] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[50] V. Kecman, Support vector machines – An introduction, in: L. Wang (Ed.), Support Vector Machines: Theory and Applications, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 1–47.
[51] G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput. 14 (2002) 1771–1800.
[52] Z. Lanbouri, S. Achchab, A hybrid deep belief network approach for financial distress prediction, in: 2015 10th International Conference on Intelligent Systems: Theories and Applications (SITA), 2015, pp. 1–6.