
Bankruptcy Prediction Using Data Mining Classification Techniques
Thesis · March 2018
Safwan Umer
University of Salford
All content following this page was uploaded by Safwan Umer on 13 March 2018.

MSc in Databases and Web-Based-Systems
School of Computing, Science and Engineering
MSc Dissertation
Bankruptcy Prediction Using Data Mining Classification Techniques
Author: Safwan Umer
Supervisor: Prof. Farid Meziane
2014
Abstract
At present, data mining holds a significant place in diverse fields of science, engineering
and management because of the need to extract hidden patterns and valuable information from
the large data sets now available. Data mining is also used in the field of bankruptcy
prediction to classify firms as bankrupt or non-bankrupt using their financial factors. In
this regard, it is important to provide efficient data mining models to predict bankruptcy.
These models can help people to understand, analyse and forecast the financial distress of a
company in order to avoid bankruptcy. This has inspired me to develop and apply various past
and present data mining models to predict bankruptcy.
The objective of bankruptcy prediction in the fields of data mining and machine learning is
to develop a model that gives higher prediction accuracy (Tsai, Hsu and Yen, 2014). This is
also the main objective of this thesis. In this dissertation, in order to assess the
efficiency of the data mining models, five years of financial ratios of 464 bankrupt and 464
non-bankrupt firms are used. This dissertation presents an application of nearly all the data
mining models used in the previous literature, together with many new techniques, using
state-of-the-art data mining software. The models are developed using SAS Enterprise Miner,
WEKA and IBM SPSS. This study shows the application of 11 models using SAS Enterprise Miner
(EM). The bankruptcy prediction accuracy of the Neural Network, Auto Neural, Regression and
High Performance Regression models was excellent using SAS Enterprise Miner. This study also
presents the application of 21 data mining models using the WEKA data mining software. Using
WEKA, the Simple Classification and Regression Trees (SimpleCART), Multi-Boost Ada-Boost
(MultiBoostAB), OneR and Radial Basis Function Network (RBFNetwork) models were efficient at
predicting bankruptcy. Finally, 6 models of IBM SPSS were employed to determine the
classification accuracy of bankrupt and non-bankrupt firms. The Multi-Layer Perceptron Neural
Network model proved to be the best predictor of bankruptcy using IBM SPSS. Overall, 37 data
mining models have been applied and the empirical results of all the models have been
analysed; the highest bankruptcy prediction accuracy is achieved by using Neural Networks.
The results of this study show that it is possible to forecast bankruptcy five years before
it occurs.
Keywords: Data mining; Neural Network; Auto Neural; Regression; High Performance
Regression; Simple Classification and Regression Trees; Multi-Boost Ada-Boost; OneR;
Multi-Layer Perceptron Neural Networks
Acknowledgments
I am immensely thankful to my supervisor, Prof. Farid Meziane, who has guided me
patiently throughout the process of my dissertation. I would never have been able to finish my
dissertation without his invaluable support, encouragement, supervision and important
suggestions.
I am also very thankful to Dr. Mo Saraee who gave me a strong theoretical and practical
understanding of data mining classification concepts during the course work.
I would also like to express my immense appreciation to Dr. Rasool Eskandari for providing
me financial data and basic understanding of financial factors. I would never have been able to
complete my research without his help.
Finally I am also very thankful to my parents, family and elder brother who were always
supporting me morally and encouraging me with their best wishes.
Contents
Abstract ................................................................................................................................................... 2
Acknowledgments................................................................................................................................... 3
Chapter 1 Introduction and Motivation................................................................................................ 10
1.1 Introduction ................................................................................................................................. 10
1.2 Research Motivations.................................................................................................................. 12
1.3 Objectives of the thesis ............................................................................................................... 12
1.4 Contributions: ............................................................................................................................. 12
1.5 Thesis Outline ............................................................................................................................. 13
Chapter 2 Literature Review ................................................................................................................. 14
2.1 Introduction ................................................................................................................................. 14
2.2 Statistical Techniques ................................................................................................................. 15
2.3 Uni-variate or Linear statistical methods .................................................................................... 15
2.4 Multiple Discriminant Analysis .................................................................................................. 17
2.5 Probability, Regression, Logistic and factor analysis models ..................................................... 20
2.5.1 Linear probability model ...................................................................................................... 20
2.5.2 Conditional probability models ............................................................................................ 21
2.6 Machine learning Models............................................................................................................ 25
2.6.1 Neural Networks .................................................................................................................. 25
2.6.2 Decision trees ........................................................................................................................... 26
2.6.3 Support Vector Machines..................................................................................................... 27
2.6.4 Fuzzy logic ........................................................................................................................... 28
2.6.5 Rough Sets ........................................................................................................................... 29
2.6.6 Case based reasoning ........................................................................................................... 30
2.7 Other Methods ............................................................................................................................ 31
Chapter 3 Financial Distress and Bankruptcy ....................................................................................... 33
3.1 Introduction ................................................................................................................................. 33
3.2 Financial Distress ........................................................................................................................ 33
3.1.1 Stages of Financial Distress ................................................................................................. 34
3.1.2 Factors of Financial distress ................................................................................................. 35
Internal Factors: ............................................................................................................................ 35
External factors: ............................................................................................................................ 35
3.1.3 Causes of Financial Distress ................................................................................................ 35
3.1.4 Result of corporate financial Distress .................................................................................. 36
3.2 Bankruptcy .................................................................................................................................. 36
3.2.1 Cost of bankruptcy ............................................................................................................... 37
3.1.3 Determining cost of bankruptcy ........................................................................................... 37
3.1.4 Direct costs of bankruptcy endured by the firm ................................................................... 38
3.1.5 Indirect costs of bankruptcy endured by the firm ................................................................ 38
Chapter 4 Data ..................................................................................................................................... 39
4.1 Introduction ................................................................................................................................. 39
4.2 Importance of Data sample ......................................................................................................... 39
4.2.1 Population ............................................................................................................................ 39
4.2.2 Sample.................................................................................................................................. 39
4.2.3 Importance ........................................................................................................................... 39
4.3 Source of Data............................................................................................................................. 41
4.4 Selection of Ratios ...................................................................................................................... 41
Table 4.1 Financial ratios used in this study ..................................................................................... 42
4.5 Data Pre-Processing .................................................................................................................... 43
4.5.1 Missing values ..................................................................................................................... 44
4.5.2 Outliers................................................................................................................................. 44
4.6 Descriptive Statistics of data samples ......................................................................................... 45
4.7 Summary ..................................................................................................................................... 45
Chapter 5: Model development and application .................................................................................. 46
5.1 Introduction ................................................................................................................................. 46
Part-1:.................................................................................................................................................... 46
5.2 Overview ..................................................................................................................................... 46
5.3 SAS Enterprise miner and its predictive modelling .................................................................... 46
5.3 Application of the Models .......................................................................................................... 48
5.3.1 Decision Trees ......................................................................................................................... 49
5.3.2 Decision Trees Model: ......................................................................................................... 49
5.3.3 High Performance Trees Model ........................................................................................... 49
5.3.4 Neural Network .................................................................................................................... 49
5.3.5 Neural Network Model ........................................................................................................ 50
5.3.6 Auto Neural Model .............................................................................................................. 50
5.3.7 High Performance Neural Model ......................................................................................... 50
5.3.8 Data Mining Neural Model .................................................................................................. 51
5.3.9 Regression Model ................................................................................................................ 51
5.3.10 High Performance Support Vector Machine Model .......................................................... 51
5.3.11 High Performance Regression Model ................................................................................ 52
5.3.12 Memory Based Reasoning Model ...................................................................................... 52
Part 2: .................................................................................................................................................... 54
5.4 WEKA: ....................................................................................................................................... 54
5.4.1 Naïve Bayes ......................................................................................................................... 55
5.4.2 Naïve Bayes Model .............................................................................................................. 55
5.4.3 BayesNet Model................................................................................................................... 55
5.4.4 SMO OR SVM Model ......................................................................................................... 55
5.4.5 RBFNetwork Model............................................................................................................. 56
5.4.6 Kstar Model ......................................................................................................................... 56
5.4.7 LWL Model ......................................................................................................................... 56
5.4.8 AdaBoostM1 Model............................................................................................................. 56
5.4.9 ClassificationViaRegression Model ..................................................................................... 56
5.4.10 Decorate Model .................................................................................................................. 57
5.4.11 Dagging Model .................................................................................................................. 57
5.4.12 LogisticBoost Model .......................................................................................................... 57
5.4.13 MultiBoostAB Model ........................................................................................................ 57
5.4.14 Random Committee Model ................................................................................................ 58
5.4.15 HyperPipes Model.............................................................................................................. 58
5.4.17 NNge Model....................................................................................................................... 58
5.4.18 OneR Model ....................................................................................................................... 58
5.4.19 ZeroR Model ...................................................................................................................... 59
5.4.20 Random Forest Model ........................................................................................................ 59
5.4.21 J48 Model........................................................................................................................... 59
5.4.22 SimpleCart Model .............................................................................................................. 59
5.4.23 END Model ........................................................................................................................ 60
Part 3 ..................................................................................................................................................... 61
5.5 IBM SPSS ................................................................................................................................... 61
5.5.1 MLP neural network Model ................................................................................................. 61
5.6 Models implementation using variations of decision trees ..................................................... 61
5.6.1 CHAID Model ..................................................................................................................... 61
5.6.2 CHAID Exhaustive Model ................................................................................................... 62
5.6.3 CART Model ....................................................................................................................... 62
5.6.4 QUEST Model ..................................................................................................................... 63
5.6.5 K-NN Model ........................................................................................................................ 63
5.7 Summary ................................................................................................................................. 63
Chapter 6 Results Analysis and Critical Evaluation .............................................................................. 64
6.1 Introduction ................................................................................................................................. 64
6.2 Type-I Error ................................................................................................................................ 64
6.3 Type-II Error ............................................................................................................................... 64
6.4 Total Error................................................................................................................................... 64
6.5 Classification Accuracy .............................................................................................................. 65
6.6 Empirical Results Analysis ......................................................................................................... 65
6.6.1 Analysis of Results of SAS Enterprise Miner Models ......................................................... 65
6.6.2 Analysis of Results of WEKA ............................................................................................. 67
6.6.3 Analysis of results of IBM SPSS models ............................................................................. 69
6.7 Critical Evaluation ...................................................................................................................... 70
6.8 Summary ..................................................................................................................................... 70
Chapter 7 Conclusion and Future Directions ...................................................................................... 71
7.1 Conclusions ................................................................................................................................. 71
7.2 Future Directions .................................................................................................................... 73
Bibliography .......................................................................................................................................... 74
Appendix-A:........................................................................................................................................... 87
Appendix B .......................................................................................................................................... 117
List of Figures
Figure 2.1 Neural Network basic understanding ................................................................................. 26
Figure 2.2 Basic understanding of decision trees ................................................................................ 27
Figure 2.3 Basic idea of the Hyperplanes and support vectors ............................................................ 28
Figure 2.4 Case Based Reasoning 4-step cycle................................................................................... 30
Figure 2.5 A comparison of different bankruptcy prediction approaches............................................. 31
Figure 2.6 Accuracy of different methods being used in the past ......................................................... 32
Figure 2.7 Studies using different model of bankruptcy prediction...................................................... 32
Figure 4.1 Method used in SPSS to find 5th and 95th percentile ........................................................ 45
Figure 5.1 step by step method of creating any project in SAS Enterprise miner ................................ 47
Figure 5.2 The step by step implementation of the model generation using SAS EM ......................... 48
Figure 5.3 Final implementation diagram of models using SAS .......................................................... 53
Figure 5.14 Final application diagram of models using WEKA ........................................................... 54
Figure 6.1 Bankrupt and non-Bankrupt firms prediction Accuracy ..................................................... 66
Figure 6.2 Bankrupt firms five years ahead prediction accuracy using WEKA models....................... 67
Figure 6.3 non-Bankrupt firms five years prediction accuracy using WEKA models ........................ 67
Figure 6.4 Bankrupt and non-bankrupt firms prediction accuracy ....................................................... 69
Figure 5.4 Model Decision Trees........................................................................................................ 117
Figure 5.5 Model HP Tree .................................................................................................................. 118
Figure 5.6 Neural Network Model ...................................................................................................... 119
Figure 5.7 Auto Neural Model ............................................................................................................ 120
Figure 5.8 HP Neural Model ............................................................................................................... 121
Figure 5.9 DMNeural Model .............................................................................................................. 122
Figure 5.10 Regression Model ............................................................................................................ 123
Figure 5.11 HP SVM Model ............................................................................................................... 124
Figure 5.12 HP Regression Model ...................................................................................................... 125
Figure 5.13 Memory Based Reasoning Model ................................................................................... 126
List of Tables
Table 2.1 some studies that used Univariate statistical methods to predict bankruptcy ....................... 16
Table 2.2 Studies using MDA model from 1968 to 1996 ..................................................................... 18
Table 2.3 the use of the logistic model in different studies ................................................................. 23
Table 4.1 Financial ratios used in this study ......................................................................................... 42
Table 6.1 Bankrupt and non-bankrupt five years ahead prediction accuracy table using SAS Enterprise
miner models......................................................................................................................................... 66
Table 6.2 Bankrupt and non-bankrupt firms five years ahead prediction accuracy table using WEKA
models ................................................................................................................................................... 68
Table 6.3 Bankrupt and non-bankrupt firms five years prediction accuracy table using SPSS ............ 69
Table 4.2 Containing 5th and 95th percentile for the data one year before bankruptcy ....................... 87
Table 4.3 Containing 5th and 95th percentile for the data 2 year before bankruptcy. .......................... 88
Table 4.7 Univariate Statistics for data sample one year before bankruptcy ....................................... 92
Table 4.8 Univariate Statistics for data sample two year before bankruptcy: ....................................... 93
Table 4.9 Univariate Statistics for data sample three year before bankruptcy ...................................... 94
Table 4.10 Univariate Statistics for data sample four year before bankruptcy ..................................... 95
Table 4.11 Univariate Statistics for data sample five year before bankruptcy ..................................... 94
Table 5.1 Prediction accuracy of the model starting from year one to five using Decision Trees Model
.............................................................................................................................................................. 97
Table 5.2 Prediction accuracy of the model starting from year one to five using HP Trees Model ..... 98
Table 5.3 Prediction accuracy of the model starting from year one to five using Neural Network
Model .................................................................................................................................................... 98
Table 5.4 Prediction accuracy of the model starting from year one to five using Auto Neural Model 99
Table 5.5 Prediction accuracy of the model starting from year one to five using HP Neural Model ... 99
Model .................................................................................................................................................. 100
Model .................................................................................................................................................. 100
Table 5.8 Prediction accuracy of the model starting from year one to five using HP SVM Model ... 101
Model .................................................................................................................................................. 102
Table 5.10 Prediction accuracy of the model starting from year one to five using MBR Model ....... 102
Table 5.11 Bankruptcy prediction accuracy using Naïve Bayes Model ............................................. 103
Table 5.12 Bankruptcy prediction accuracy using BayesNet Model .................................................. 103
Table 5.13 Bankruptcy prediction accuracy table using SMO OR SVM Model ................................ 104
Table 5.14 Bankruptcy prediction accuracy table using RBFNetwork Model ................................... 104
Table 5.15 Bankruptcy prediction accuracy table using KSTAR Model ............................................ 105
Table 5.16 Bankruptcy prediction accuracy table using LWL Model ................................................ 105
Table 5.17 Bankruptcy prediction accuracy table using AdaBoostM1 Model ................................... 106
Table 5.18 Bankruptcy prediction accuracy table using ClassificationviaRegression Model ............ 106
Table 5.19 Bankruptcy prediction accuracy table using Decorate Model .......................................... 107
Table 5.20 Bankruptcy prediction accuracy table using Dagging Model ........................................... 107
Table 5.21 Bankruptcy prediction accuracy table using LogisticBoost Model ..................................... 108
Table 5.22 Bankruptcy prediction accuracy table using MultiBoostAB Model ................................. 108
Table 5.23 Bankruptcy prediction accuracy table using Random Committee Model ........................ 109
Table 5.24 Bankruptcy prediction accuracy table using HyperPipes Model ...................................... 109
Table 5.25 Bankruptcy prediction accuracy table using NNge Model ............................................... 110
Table 5.26 Bankruptcy prediction accuracy table using OneR Model ............................................... 110
Table 5.27 Bankruptcy prediction accuracy table using ZeroR Model............................................... 111
Table 5.28 Bankruptcy prediction accuracy table using Random Forest Model ................................ 111
Table 5.29 Bankruptcy prediction accuracy table using J48 Model ................................................... 112
Table 5.30 Bankruptcy prediction accuracy table using SimpleCart Model....................................... 112
Table 5.31 Bankruptcy prediction accuracy table using END Model................................................. 113
Table 5.32 Bankruptcy prediction accuracy table using MLP neural network Model ....................... 113
Table 5.33 Bankruptcy prediction accuracy table using CHAID Model ............................................ 114
Table 5.34 Bankruptcy prediction accuracy table CHAID Exhaustive Model ................................... 114
Table 5.35 Bankruptcy prediction accuracy table CART Model ........................................................ 115
Table 5.36 Bankruptcy prediction accuracy table QUEST Model ..................................................... 115
Table 5.37 Bankruptcy prediction accuracy table K-NN Model ........................................................ 116
Chapter 1 Introduction and Motivation
1.1 Introduction
Data mining is used to find hidden patterns in large sets of data. It has been widely used in
many different fields to uncover the logic in data stored in databases (Shamsinejad, Saraee
and Shekholeslam, 2011). State-of-the-art data mining classification models are being used in
the field of bankruptcy prediction. The most popular techniques in use nowadays are Decision
Trees (DT), Artificial Neural Networks (ANN), Support Vector Machines (SVM), Case-Based
Reasoning (CBR), K-Nearest Neighbour (K-NN), Bayesian Networks, Regression and hybrid
methods (Chen et al. 2011).
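To make the classification setting concrete, the following is a minimal sketch of one of the techniques named above, K-Nearest Neighbour, written in plain Python. It is an illustration only and not the implementation used in this study: the function name knn_predict, the two-ratio feature vectors and all data values are hypothetical and invented for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Order the training examples by Euclidean distance to the query point
    # and keep the k closest ones.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Count the class labels among those neighbours and return the most common.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each firm is described by two made-up financial
# ratios. These values are invented for illustration and are not taken
# from the data set used in this dissertation.
train = [
    ((0.10, 0.20), "bankrupt"),
    ((0.15, 0.10), "bankrupt"),
    ((0.20, 0.25), "bankrupt"),
    ((0.80, 0.90), "non-bankrupt"),
    ((0.85, 0.70), "non-bankrupt"),
    ((0.90, 0.95), "non-bankrupt"),
]

prediction = knn_predict(train, (0.12, 0.18), k=3)
```

In this sketch a firm is assigned the class that is most frequent among its k closest neighbours in ratio space. In practice the ratios would normally be scaled to comparable ranges first, since Euclidean distance is sensitive to the units of each feature.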
Given the economic and financial consequences of bankruptcy for companies, it is no surprise
that bankruptcy prediction was and remains of great interest to researchers, creditors,
shareholders and auditors. All stockholders have a strong interest in observing the financial
performance of their firms (Wilson and Sharda, 1994).
Bankruptcy forecasting for an organisation has been a paramount subject in the accounting and
finance literature (Zhang, Hu, Patuwo & Indro, 1999). The financial failure of a company
significantly affects the company, its stakeholders, employees and customers, and the nation.
Bankruptcy prediction is one of the areas that has been extensively studied in the fields of
accounting and finance (Wilson and Sharda, 1994). No company is immune to
bankruptcy, and bankruptcy does not happen overnight. Therefore, it is very
important to understand and predict the phenomena that lead to bankruptcy (Kim and Kang,
2009).
Timely prediction of bankruptcy also helps in making the best business decisions for the future of
the company. The accuracy of the prediction is very important: if bankruptcy is not
predicted accurately, the results can be catastrophic for the company. Prediction of
corporate failure matters because it affects the company's employees,
management, auditors and debtors (Jardin, 2014).
Companies which do not have enough financial means to operate have to liquidate their
assets and pay their debts. If a company does not have enough money to pay its
debts, it enters financial distress. A company must remain
solvent to continue operating (Blum, 1974).
Bankruptcy can be caused by many factors, such as poor management, insufficient funds,
a shortage of fund providers, declining revenue, lack of assets, lack of management
knowledge, lack of stockholders for fund raising and lack of shares (David and Denis,
1995).
Numerous studies are available on the topic of bankruptcy prediction. These studies have
analysed different financial distress factors that lead to bankruptcy (Wilson and Sharda,
1994). This dissertation presents a comprehensive literature review spanning 1932 to 2014,
covering various theoretical, statistical and machine learning approaches to bankruptcy
prediction.
The major purpose of this dissertation is to evaluate bankruptcy prediction through the use of
data mining models. This study also illustrates the theoretical concepts and practical results of
the data mining models in the prediction of bankruptcy. The tools used in this study are
well known in the data mining community: SAS Enterprise Miner, WEKA and IBM
SPSS.
The process of bankruptcy prediction involves several important steps on data containing
financial ratios. First, the data is gathered. Second, the data is processed into a format
suitable for applying data mining techniques. Third, the processed data is used to build
different data mining classification models. Finally,
the results of the models are compared and the best model is selected.
The sample data that I have used was gathered from the Financial Analysis Made Easy (FAME)
database. The sample consists of 464 bankrupt and 464 non-bankrupt companies. This
dissertation shows the importance of data and its pre-processing phase using an effective
statistical method. The 41 financial ratios used in this study are also important because
they have been used in most of the research articles. An important contribution of this
dissertation is its use of ratios from five years prior, for different companies from 2000 to 2012, to
predict bankruptcy five years ahead.
1.2 Research Motivations
We are witnessing a very competitive era for companies, in which bankruptcy is seen as
tarnishing a company's reputation. Bankruptcy prediction is a very challenging
subject. A company becomes insolvent when it cannot pay its debts because of insufficient
liquidity, and it may not be able to return to a solvent state. In this
state, the company must either pay its debts or file for bankruptcy (Wruck, 1990).
Many large organizations, such as Delta Airlines, United Airlines, New Century Financial,
Calpine, Lyondell Chemicals, the telecom company Global Crossing, Thornburg Mortgage and
Pacific Gas, have filed for bankruptcy in the last two decades (Anon., 2014). These incidents
unsettled investors around the world and made it even more important to
predict financial distress before bankruptcy. Auditors, as a general duty, use bankruptcy
prediction techniques to assess the financial state of a company before investing in the
company (Wilson and Sharda, 2009). The managers who make decisions in companies
are always looking for the prediction model that gives the best results in bankruptcy prediction.
Many techniques have been used in the past.
1.3 Objectives of the thesis

Bankruptcy prediction is not a field related only to accounting and finance; it is a versatile area.
I have chosen this area because I want to apply state-of-the-art data mining software
to obtain the best models for bankruptcy prediction using ratios from five years prior.
The major objectives of this thesis are:
1. Utilization of different data mining methods and algorithms using SAS Enterprise
Miner, WEKA and IBM SPSS.
2. Analysis of the results obtained from implementing various data mining models.
1.4 Contributions
The contributions of the dissertation in the field of bankruptcy prediction are:
1. Bankruptcy prediction using ratios from five years prior, whereas most research articles
have used ratios from three years prior.
2. Prediction of bankruptcy five years ahead using ratios from five years prior.
3. Use of the 41 most important financial ratios.
4. Use of 11 SAS Enterprise Miner models, 21 WEKA models and 6 IBM SPSS models.
5. Identification of the most effective model for bankruptcy prediction.
1.5 Thesis Outline
On the basis of the theoretical and practical literature review, this dissertation describes
different features of bankruptcy prediction models. The thesis is divided into 7 chapters.
Chapter 2
This chapter provides a comprehensive literature review of the statistical and machine learning
techniques used in the domain of data mining to predict bankruptcy.
Chapter 3
This chapter elaborates on financial distress, its factors and causes,
bankruptcy definitions and the costs of bankruptcy.
Chapter 4
This chapter illustrates the importance of data, ratios and the pre-processing phase of data. It
also explains the method of winsorizing used to handle outliers in the data.
Chapter 5
This chapter offers a complete analysis and applications of different models using SAS
Enterprise Miner, WEKA and IBM SPSS. It also presents prediction accuracy results
provided by each model.
Chapter 6
This chapter gives a complete insight into, and a critical evaluation of, each data mining model. It
also gives the five-years-prior results of each model in detail.
Chapter 7
This chapter summarizes the major contributions of this dissertation and gives directions for
future work.
Chapter 2 Literature Review
2.1 Introduction
Various methods have been used in the literature for predicting business failure. Each
methodology has its own importance and contribution in this area, but each prediction technique
is basically used to divide firms into financially healthy and financially failed firms
(Dimitras, Zankis and Zopounidis, 1996).
Business failure studies have attracted worldwide interest from many researchers and
practitioners. Early techniques, used before statistical or machine learning techniques were
available, compared two companies, one in a healthy financial state and the other
in a failed financial state (Bellovary, Giacomino and Akers, 2007). According to Fitzpatrick
(1932), there are five stages of financial failure: incubation, financial
embarrassment, financial insolvency, total insolvency and confirmed insolvency.
Statistical bankruptcy prediction models then began with Beaver's (1966) single-variable model
and Altman's Linear Discriminant Analysis model (Altman, 1968).
Since then, bankruptcy prediction has become a hot topic for researchers, who have
used different techniques to get better and more reliable results. Many researchers
used different models to improve on the results of Altman's technique. Data mining
techniques were not used until 1980. The use of data mining techniques such as SVM, NN and
decision trees for bankruptcy prediction started in the late 1980s (Pompe and Feedlers,
1997).
There are various statistical, machine learning, soft computing, operational and evolutionary
approaches to predicting bankruptcy, and each has its own pros and cons (Kumar and Ravi,
2007). The most important methods used in the past, their research procedures and their prediction
accuracy results are discussed in the following sections.
2.2 Statistical Techniques
These techniques apply statistical methods to a sample of data containing bankrupt
and non-bankrupt companies. Many studies have applied statistical techniques
to different financial ratios. A statistical technique uses financial
parameters and ratios to predict financial distress. Beaver's univariate model was the
starting point of research on these techniques. Examples are Linear
Discriminant Analysis (LDA), Multiple Discriminant Analysis (MDA), Quadratic
Discriminant Analysis (QDA), logistic regression and factor analysis (Kumar and Ravi,
2007).
Traditional statistical methods can handle huge data sets without losing
prediction performance, while machine learning techniques obtain better performance with
smaller data sets and are adversely affected by large data sets (Chen, 2011).
2.3 Uni-variate or Linear statistical methods

Univariate statistical models are the simplest models and are based on the assumption of a
linear relationship between all ratios and the failure status. These models use
quantitative measures such as mean, median, mode, range, variance, frequency distribution and
standard deviation. In these models, individual ratios are used for bankruptcy prediction. In a univariate
analysis model of bankruptcy prediction, two aspects are most important (Balcaen and
Ooghe, 2006):
1. The optimal cut-off point for each ratio.
2. The classification procedure, carried out for each ratio separately.
These are the earlier techniques used to differentiate between a financially stable and
financially failed firm. Table 2.1 shows some of the studies that used Univariate statistical
methods to predict bankruptcy.
The Univariate models were heavily criticised but laid the path for other models like MDA,
Linear Probability Model (LPM), Logistic and Regression.
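The two aspects listed above (a cut-off point per ratio, and classification carried out on one ratio at a time) can be illustrated with a minimal sketch. The ratio values, the labels and the function name `best_cutoff` are hypothetical and chosen only for illustration; the rule "predict failed if the ratio is below the cut-off" is an assumption that suits ratios where low values signal distress.

```python
def best_cutoff(values, labels):
    """Return (cutoff, accuracy) for the rule 'predict failed if value < cutoff'.

    values: values of a single ratio, one per firm
    labels: 1 for a failed firm, 0 for a healthy firm
    """
    ordered = sorted(set(values))
    # Candidate cut-offs: midpoints between consecutive distinct ratio values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = (None, 0.0)
    for cutoff in candidates:
        predictions = [1 if v < cutoff else 0 for v in values]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best[1]:  # keep the first cut-off with the highest accuracy
            best = (cutoff, accuracy)
    return best


# Hypothetical values of one ratio (e.g. net income / total debt) for eight firms.
ratio = [-0.30, -0.10, 0.02, 0.05, 0.12, 0.20, 0.25, 0.40]
failed = [1, 1, 1, 0, 1, 0, 0, 0]

cutoff, accuracy = best_cutoff(ratio, failed)
print(f"cut-off = {cutoff:.3f}, accuracy = {accuracy:.3f}")
```

Repeating the search for each ratio separately gives one cut-off and one accuracy per ratio, which is how the studies in Table 2.1 can rank ratios by predictive ability.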
Table 2.1 Some studies that used univariate statistical methods to predict bankruptcy
| Researcher(s) | Special features of the study |
|---|---|
| Fitzpatrick (1932) | 1. Compared 13 ratios. 2. Used 20 pairs of healthy and failed firms. 3. The most significant ratios were Net Worth/Debt and Net Profits/Net Worth. 4. The least important ratios were the Current Ratio and Quick Ratio for firms with long-term liabilities. |
| Smith and Winakor (1935) | 1. Used ratios of 183 bankrupt firms from a variety of industries. 2. Prediction of bankruptcy was better using the Working Capital/Total Assets ratio. 3. The Current Assets/Total Assets ratio declined when a firm approached bankruptcy. |
| Merwin (1942) | 1. Used small manufacturers in his study. 2. Prediction could be possible five years before bankruptcy. 3. The most significant ratios for business failure were Net Working Capital to Total Assets, the Current Ratio and Net Worth to Total Debt. |
| Chudson (1945) | 1. Financial patterns were studied for the first time. 2. His study indicated that bankruptcy prediction models for general application are not as suitable as industry-specific models. |
| Jackendoff (1962) | 1. Compared the ratios of profitable and unprofitable firms. 2. Current Ratio and Net Working Capital to Total Assets were the most significant ratios, while Debt to Worth was the least significant ratio, for profitable firms. |
| Beaver (1966) | 1. Used 79 failed and 79 non-failed firms in 38 industries. 2. Used 30 ratios for the first time. 3. Identified which ratios have the highest predictive ability. 4. Net Income to Total Debt had 92%, Net Income to Sales 91% and Cash Flow to Total Assets 90% accuracy in bankruptcy prediction. |
| Pinches, Eubank, Mingo and Caruthers (1975) | 1. Used specific financial ratios which are more important in predicting bankruptcy. 2. Used financial data of 221 firms and 48 ratios. 3. Financial ratios and their predictive accuracies were as follows: Debt/Total Capital = 99%, Total Income/Total Capital = 97%, Cash/Total Assets = 91%. |
2.4 Multiple Discriminant Analysis
MDA is the most commonly used statistical method for bankruptcy prediction. It
has been used in more than 70 research studies from 1960 to the present. The method
classifies an observation into one of several a priori groups, depending on the
features of that observation. The technique is also well suited to predictions where the
dependent variable is qualitative. MDA examines a complete profile of features common to the
relevant group of corporations, and also considers the interaction of these characteristics. The
major benefit of MDA for classification problems is that it
considers the complete profile of financial factors together. MDA also reduces the
dimensionality of the analyst's space (Altman, 1968). An MDA model is a linear
combination of variables, which is used to discriminate between failing and non-failing firms
(Balcaen and Ooghe, 2006).
Altman (1968) specified the discriminant function of a firm as follows:
$$Z = V_1 X_1 + V_2 X_2 + V_3 X_3 + \dots + V_n X_n$$
where $V_1, V_2, V_3, \dots, V_n$ are the discriminant coefficients
and $X_1, X_2, X_3, \dots, X_n$ are the independent variables.
MDA calculates the discriminant coefficients $V_i$, while the independent variables $X_i$ are
actual values, where $i = 1, 2, 3, \dots, n$.
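As a worked illustration of the discriminant function, the sketch below computes a Z value as the weighted sum of ratio values and maps it to a zone. The coefficients (1.2, 1.4, 3.3, 0.6, 1.0) and the cut-offs (1.81 and 2.99) are the commonly quoted form of Altman's 1968 model and are not taken from this chapter; the firm's ratio values and the function names are hypothetical.

```python
def z_score(coefficients, ratios):
    """Discriminant score Z = V1*X1 + V2*X2 + ... + Vn*Xn."""
    if len(coefficients) != len(ratios):
        raise ValueError("one coefficient is needed per ratio")
    return sum(v * x for v, x in zip(coefficients, ratios))


def classify(z, lower=1.81, upper=2.99):
    """Map a Z value to a zone using the commonly quoted Altman cut-offs."""
    if z < lower:
        return "distress zone"
    if z > upper:
        return "safe zone"
    return "grey zone"


# Commonly quoted coefficients of Altman's five-ratio model (assumption, see lead-in).
ALTMAN_COEFFICIENTS = [1.2, 1.4, 3.3, 0.6, 1.0]

# Hypothetical firm: working capital/TA, retained earnings/TA, EBIT/TA,
# market value of equity/total liabilities, sales/TA.
firm_ratios = [0.20, 0.15, 0.10, 0.80, 1.50]

z = z_score(ALTMAN_COEFFICIENTS, firm_ratios)
print(round(z, 2), classify(z))
```

In an actual MDA study the coefficients are estimated from the sample of failed and non-failed firms rather than fixed in advance, so the constants above stand in for that estimation step.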
Many researchers have used the MDA bankruptcy prediction technique, based on the methodology of
the Altman Z-Score model. Deakin, Edmister and lis (1972) used the LDA method and obtained
prediction accuracies of 80%, 88% and 83% respectively. Table 2.2 shows the studies using
the MDA model for predicting bankruptcy from 1968 to 2004.
Varun (2009) applied these techniques to 78 failed companies and 91 non-failed companies
over the period 1999 to 2007. His research showed that the ratios total debt / total assets,
cash flow from operations / interest expense and net profit / total assets were the most
discriminating ratios one year before bankruptcy, while short-term debt / total assets and
sales / total assets were the most discriminating variables two years before
bankruptcy.
Table 2.2 Studies using MDA model from 1968 to 2004
| Reference | Application | No. of ratios used | No. of firms used in the data sample | Accuracy (%) | Special feature(s) |
|---|---|---|---|---|---|
| Altman (1968) | Manufacturing (Mfg.) firms | 5 | 33 failed and 33 non-failed firms | 79% | First use of the MDA method, which is also called Z-Score. Classification of data into bankrupt and non-bankrupt firms. |
| Deakin (1972) | General | 14 | 32 failed and 32 non-failed firms | Failed firms 77% to 96%; non-failed firms 78% to 92% | Finding of a decision rule validated over a different sample of firms. |
| Edmister (1972) | General small business | 7 | 562 failed and 562 non-failed firms | Failed firms 88%; non-failed firms 83% | Some ratios can be used to predict bankruptcy rather than using all financial variables. |
| Blum (1974) | General firms | 2 | 115 failed and 115 non-failed firms | 57% to 94% | Gave the failing company model. First use of the "F" statistic function for bankruptcy prediction. |
| Sinkey Jr. (1975) | Banks | 5 | 110 banks | Bankrupt banks 53.64% to 71.85% | Specified important factors to discriminate between failed and non-failed banks. |
| Altman and Loris (1976) | Dealers and brokers | 15 | 40 failed broker firms and 113 active entities | Failed firms 66.7% to 87.5%; non-failed firms 58.3% to 85.0% | A financial early warning system model was developed to detect failure. |
| Altman, Haldeman and Narayanan (1977) | General | 7 | 53 bankrupt firms and 58 non-bankrupt firms | Failed firms 61.7% to 92.5%; non-failed firms 84% to 91.4% | The Z-Score model was updated to the Zeta model. Compared linear and quadratic discriminant analyses and obtained efficient results. |
| Ketz (1978) | General | 16 | 75 failed firms and 597 non-failed firms | Failed firms 56%; non-failed firms 93% | Use of general price level statements to distinguish between failing and non-failing firms. |
| Castanga and Matolcsy (1981) | Australian firms | 10 | A sample of 21 companies | Failed firms 0% to 90%; non-failed firms 76% to 100% | This study proposed that it is not easy to use a distinct model to predict financial distress efficiently. |
| Izan (1984) | Australian firms | 5 | A sample of 53 failed and 50 non-failed firms | 40% to 100% | Used company ratios relative to their industry median and combined five variables in the discriminant model. |
| Keasey and Watson (1986) | Small UK firms | 5 | A sample of 10 failed and 10 non-failed firms | Failed firms 70%; non-failed firms 66.7% to 68.3% | Use of trade-credit specialists and a statistical model to predict financial failure. |
| Koh and Killough (1990) | General | 5 | A sample of 400 firms, of which only 14 were bankrupt | Failed firms 78.6%; non-failed firms 88.25% | SAS 34 and SAS 59 were used to make a prediction model. Development of a prediction model which was approximately 88 percent accurate. |
| Laitinen (1991) | Small and mid-size Finnish firms | 6 | 40 randomly selected failed and non-failed firms | Failed firms 57.5% to 90%; non-failed firms 52.5% to 87.5% | Found the existence of failure processes in firms. These processes were used on selected ratios to predict financial failure. |
| Alici (1996) | UK Mfg. firms | 4 | 29 failed and 31 non-failed British corporations | Failed firms 60.12%; non-failed firms 71.07% | Introduced wavelet networks; pruning techniques were examined in his model. |
| Pidado and Rodriques (2004) | Mfg. firms | 15 | 42 bankrupt and 42 non-bankrupt firms | 89.58% | Used the MDA technique in the footwear manufacturing industry. |
Lugovskaja (2009) also used the MDA technique to predict the financial failure of Russian small
and medium-sized enterprises (SMEs). He used two MDA models on a data set of 260
bankrupt and 260 non-bankrupt arbitrarily chosen SMEs. In the first model he found six important
bankruptcy prediction ratios, and the classification accuracy was 76.2% for the estimation sample
and 68.1% for the holdout sample. In the second model he combined non-financial variables such
as size and age with the financial factors of the SMEs, and the classification accuracy was 77.9% for the
estimation sample and 79% for the holdout sample.
Ivica Pervan et al. (2011) used this statistical technique on a sample of 78 bankrupt and 78
non-bankrupt companies from the Croatian manufacturing and trade industries. The study
reported that financial statements and financial factors are informative for predicting the
bankruptcy of a company. They obtained a bankruptcy prediction accuracy of 79.5%.
Recently, Lee and Choi (2013) provided a multi-industry prediction model. The study used
different sets of variables and produced a model which better reflects the
characteristics of each industry and the selection of ratios, giving distinct prediction results.
The accuracy of the MDA model is 74.82%. In addition to these outcomes, the
study also emphasises that bankruptcy prediction models should be built for
each industry specifically in order to generate efficient and reliable prediction results.
2.5 Probability, Regression, Logistic and factor analysis models

Discriminant Analysis (DA) methods have many problems, which Eisenbeis (1977)
summarised. According to him, there are 7 problems related to the
application of DA: violation of assumptions, use of a linear function, no interpretation
of variables separately, fewer dimensions, no group definition, unsuitable choice of prior
probabilities and estimation of classification errors. Due to these problems, researchers started to
introduce other statistical models such as linear probability, factor analysis, logit and
probit analysis (Eisenbeis, 1977). These models are discussed below.
2.5.1 Linear probability model

The probability of failure can be used to predict bankruptcy. Researchers therefore started to
develop different LPM models as a substitute for DA (Dimitras et al., 1996). This model
is a special case of least squares regression in which the dependent variable takes the value 0 or 1.
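A minimal sketch of a linear probability model follows: an ordinary least squares fit of a 0/1 failure indicator on a single ratio. The data and the names `fit_lpm` and `predict` are hypothetical, and one regressor is used only to keep the closed-form solution short.

```python
def fit_lpm(x, y):
    """Ordinary least squares fit of y = intercept + slope * x, with y coded 0/1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Fitted value, read as the estimated probability of failure."""
    return intercept + slope * x


# Hypothetical total debt / total assets ratios; 1 = failed, 0 = non-failed.
debt_ratio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
failed = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_lpm(debt_ratio, failed)
for x in (0.1, 0.35, 0.6):
    print(x, round(predict(intercept, slope, x), 3))
```

With these values the fitted "probabilities" at the extremes fall below 0 and above 1, a well-known limitation of the linear probability model that the logistic and related models avoid by passing the linear score through a bounded function.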
Meyer and Pifer (1970) presented simple least squares linear regression with
a dummy dependent variable (0 for non-failed and 1 for financially failed banks).
They applied this technique to a bank data set consisting of 18 financial ratios, and their
empirical classification accuracy was 67% to 100% for failed and 55% to 89% for non-failed
banks. Later on, Grammatikos and Gloubos (1984), Theodossiou (1991), Vranas (1992) and
Lennox (1999) also used this research method in their studies to predict bankruptcy.
2.5.2 Conditional probability models

These models are divided into two subcategories: logistic (logit) and probit models. These
models are of great importance in the field of bankruptcy prediction.
The logistic method gives the probability that a firm will go bankrupt. In the logistic
model, the probability that company $i$ goes bankrupt, given its vector of variables $X_i$,
is (Dimitras et al., 1996):
$$P(X_i, c) = F(d + c X_i)$$
where $F(d + c X_i)$ is the cumulative logistic function, given by
$$F(d + c X_i) = \frac{1}{1 + e^{-(d + c X_i)}}$$
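The logistic probability can be computed directly from the formula. In the sketch below the intercept `d`, the coefficient vector `c` and the ratio values are hypothetical; in practice they would be estimated from the sample by maximum likelihood, which is not shown here.

```python
import math


def failure_probability(d, c, x):
    """Cumulative logistic function F(d + c.x) = 1 / (1 + exp(-(d + c.x)))."""
    score = d + sum(cj * xj for cj, xj in zip(c, x))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical intercept and coefficients for two ratios
# (e.g. total debt / total assets and net income / total assets).
d = -2.0
c = [4.0, -6.0]

healthy_firm = [0.30, 0.10]
distressed_firm = [0.90, -0.15]

print(round(failure_probability(d, c, healthy_firm), 3))
print(round(failure_probability(d, c, distressed_firm), 3))
```

Whatever the value of the linear score, the result stays strictly between 0 and 1, so it can be read as a probability and compared with a chosen cut-off such as 0.5.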
Martin (1977) introduced the logistic regression model to predict the financial failure of
banks. He used a data set of about 5700 Federal Reserve member banks, of which 58 had
failed. Using ratios from one to six years before failure, he obtained a classification
accuracy for failed banks ranging from 91.3% to 41.7% one to six years before failure.
The results for non-failed banks were also remarkable, ranging from 91.1% to 82.2% one to six
years before failure.
Ohlson (1980) proposed the concept of the conditional probability model. For the first time, the data set was
drawn from 10-K financial statements (the 10-K is a firm's annual report, which gives a comprehensive summary of its financial
performance). In this study he elaborated on the
following four statistically important factors for bankruptcy prediction:
1. The size of the company.
2. Financial structure of the company.
3. Performance of the company.
4. Current liquidity of the company.
He criticised the MDA technique because of three problems associated with it: (i)
the use of matched samples; (ii) MDA behaves like a discriminating device and does not provide any
statistical importance of variables; and (iii) the MDA model gives its output in the form of a score, which
is difficult to interpret. The conditional logistic model avoids all of these problems.
The accuracy of this logistic prediction model was 96.12%, 95.55% and 92.84%
for one year, two years and one to two years respectively. Mensah (1983) also used the logistic
analysis method on a sample of 66 manufacturing firms and 32 factors; the classification
accuracy of his model was 18% to 55% for bankrupt firms and 80% to 86% for non-bankrupt
firms. Table 2.3 summarises the use of the logistic model in different studies.
Furthermore, Erkki and Teija (2000) used a combination of the logistic model and
Taylor's series. They used the logistic model to describe insolvency and Taylor's series to
approximate the exponent of the logistic function. They used a sample of 400 firms and
concluded that classification accuracy could be increased by using interacting ratios.
Kalori et al. (2002) applied this technique to develop an early warning system. They used the
model to predict the financial distress of banks. The classification accuracy of the model was
over 96% one year before failure and 95% two years before failure. In 2003, Foreman analysed
bankruptcy within the US local telecommunications industry using the logistic model.
Moreover, Jones and Hensher (2004) proposed a mixed logistic analysis model to predict
the financial distress of a firm. They specified financial distress in three states: state 0 for non-failed,
state 1 for insolvent and state 2 for failed firms. Mei and Lin (2005) also applied this
approach with a quadratic interval regression model. Their empirical findings show that the
quadratic model can help the logistic model to distinguish between failed and non-failed
firms.
Recently, Masten and Masten (2012) used the logistic model with a Classification and Regression
Trees (CART)-based methodology. This was a very simple approach and used dummy
variables. Their practical results show that the combination of these methods gives the
highest prediction accuracy, 95%.
Table 2.3 The use of logistic model in different studies
| Reference | Industry | No. of ratios used | Classification accuracy results |
|---|---|---|---|
| Mensah (1983) | Mfg. firms | 32 | Bankrupt firms 18% to 55% and non-bankrupt firms 80% to 86%. |
| Casey and Bartczak (1985) | General | 9 | Bankrupt firms 13% to 63% and non-bankrupt firms 95% to 98%. |
| Lau (1987) | General | 10 | Bankrupt firms 20% and non-bankrupt firms 85.4% to 93.7%. |
| Mahmood and Lawrence (1987) | General | 13 | Bankrupt firms 28.6% to 73.8% and non-bankrupt firms 90% to 96.6%. |
| Pantalone and Platt (1987a) | Bank | 5 | Failed banks 86.7% and non-failed banks 83.4%. |
| Peel (1987) | Private UK firms | 8 | Bankrupt firms 67% to 92% and non-bankrupt firms 79% to 88%. |
| Aziz, Emauel and Lawson (1988) | General | 6 | Bankrupt firms 79.6% to 88.8% and non-bankrupt firms 76.7% to 98.0%. |
| Aziz and Lawson (1989) | General | 10 | Bankrupt firms 53.9% to 92.3% and non-bankrupt firms 70.2% to 79.1%. |
| Hopwood, Mckeowon and Mutchler (1989) | General | 7 | Bankrupt firms 3.1% to 62.5% and non-bankrupt firms 87.5% to 100%. |
| Gilbert, Menon and Schwartz (1990) | General | 6 | Bankrupt firms 29.2% to 62.5% and non-bankrupt firms 90% to 97.9%. |
| Agarwal (1993) | General | 5 | Bankrupt firms 40% to 80% and non-bankrupt firms 56.5% to 86.6%. |
| Platt, Platt and Pedersen (1994) | Oil and gas companies | 6 | Bankrupt firms 80% to 94% and non-bankrupt firms 91% to 96%. |
| Dimitras, Slowinski, Susmaga and Zopounidis (1999) | Greek firms | 12 | Bankrupt firms 63.2% and non-bankrupt firms 84.2%. |
| Zhang, Hu, Patuwo and Indro (1999) | Mfg. firms | 6 | Bankrupt firms 85% to 93% and non-bankrupt firms 83% to 87%. |
The probit model is similar to the logistic model, but the function used to calculate the probability
differs from that of the logistic model. Grablowsky and Talley (1981) used probit
analysis for the classification of credit applicants and found that probit analysis could be
used as a substitute for discriminant analysis. The studies using this model are less accurate
than those using the logistic model, and only a few researchers have worked in this particular area.
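The difference between the two conditional probability models lies in the function applied to the linear score. The sketch below assumes that the model described here is the probit model, which uses the cumulative standard normal distribution, and compares it with the cumulative logistic function at a few hypothetical score values.

```python
import math


def logit_cdf(z):
    """Cumulative logistic function used by the logistic model."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z):
    """Cumulative standard normal distribution used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical linear scores d + c*X for three firms.
for score in (-2.0, 0.0, 2.0):
    print(score, round(logit_cdf(score), 4), round(probit_cdf(score), 4))
```

Both functions map a score of 0 to a probability of 0.5 and are S-shaped, but the normal distribution has thinner tails, so for the same score the probit probability lies closer to 0 or 1 than the logistic one.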
Hanweck (1977) applied this method to banks' financial data to test for financial distress.
He used 6 financial factors and obtained 67% accuracy for failed banks and 99% for non-
failed banks using a holdout sample.
Zmijewski (1984) also investigated this statistical method on a biased sample of the data set
consisting of 40 bankrupt and 800 non-bankrupt firms. He used probit and bivariate
probit analysis to assess the sample bias issue.
Skogsvik (1990) examined this model to inspect the bankruptcy of Swedish mining and
manufacturing firms on a data sample consisting of 17 financial factors over the period 1966
to 1980. His empirical results show a classification accuracy of 84.0% to 71.2% from 1 to 6
years respectively before bankruptcy. Moreover, Theodossiou (1995) used the probit model
for Greek manufacturing firms and obtained a classification accuracy of 95.5% for bankrupt
and 92.6% for non-bankrupt firms. Later on, Boritz and Kennedy (1995) and Lennox (1999) also
used this method for bankruptcy prediction.
Canbas et al. (2005) presented an Integrated Early Warning System (IEWS) to investigate the
financial problems of banks by incorporating logistic regression, DA, probit and
principal component analysis. This system helped a great deal in assessing the financial
condition of banks. Their calculated failure prediction probabilities for banks were 56%, 99%
and 99.9% for years one, two and three respectively.
Factor analysis is used to describe a set of variables in terms of factors on the basis of the
relations between the actual variables. This technique was used in combination with logistic
estimation by West (1985) to investigate the financial condition of banks.
2.6 Machine learning Models
Different intelligent techniques are used these days because statistical techniques rely on
distributional assumptions that financial data do not always satisfy. Machine learning
techniques, which do not require such parametric assumptions, overcome these limitations of traditional statistical
models. Machine learning models belonging to the data mining domain include artificial
neural networks, decision trees, case-based reasoning, SVM, fuzzy logic and rough
sets (Kim and Kang, 2010).
2.6.1 Neural Networks

Neural network models have always held a significant place in the history of bankruptcy
prediction. Researchers started to use neural networks for financial distress prediction in the
early 1990s and are still using the method in its different forms (Jeong et al., 2012). These
models simulate the information processing power of the human brain. They learn by
example and can make decisions on the basis of previous experience. A neural
network consists of three interconnected layers: an input layer, a hidden layer and a target or
output layer. The input layer contains the raw data given to the neural network; the
hidden layer assigns weights to the input units based on the connections between input and
hidden units. The output unit likewise depends on the weights between the hidden and output units.
Thus, instead of using parameters, NNs use weights for prediction (Tsai and Wu, 2008).
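The flow from input layer through hidden layer to output can be sketched as a single forward pass. All weights, biases and input values below are hypothetical, and the sigmoid activation is an assumption; a trained network would obtain its weights from the data, for example by back-propagation, which is not shown.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a network with a single hidden layer."""
    # Each hidden unit combines all inputs using its own set of weights.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit combines the hidden activations in the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical firm described by three financial ratios (VAR-1 to VAR-3).
ratios = [0.15, -0.05, 1.20]
hidden_weights = [[0.8, -1.5, 0.3], [-0.4, 2.0, -0.7]]  # two hidden units
hidden_biases = [0.1, -0.2]
output_weights = [1.2, -0.9]
output_bias = 0.05

score = forward(ratios, hidden_weights, hidden_biases, output_weights, output_bias)
print(round(score, 4))
```

The output can be read as a score between 0 and 1 for the target class, for example bankrupt versus non-bankrupt after applying a cut-off.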
Odom and Sharda (1990) were the first to apply this technique to bankruptcy prediction,
comparing it with the MDA technique. They used a data sample of 128 firms and obtained a
classification accuracy of 77.8% to 81.5%. There are many different variations of NN, such as Back
Propagation Neural Networks (BPNN), Self-Organizing Feature Map (SOM), Probabilistic
NN, Auto Associative NN and Cascade Correlation NN. These NNs are divided into different
categories according to their learning type, algorithm and the connection of nodes with each other
(Kumar and Ravi, 2007).
Vellido et al., (1999), Wong et al., (1997), Zhang et al., (1999), Atiya (2001), and Paliwal
and Kumar (2009) have reviewed the use of NN in business and other science and
engineering domains.
Jeong et al. (2012) proposed a new architecture of NN models using a hybrid tuning
method. The practical results show that the tuned model was significant in predicting financial
failure. Their research has several advantages: it reflects the nonlinear aspects of
ratios using a Generalized Additive Model (GAM), it secures the most favourable parameter values of the
variables, and the model was more profitable than non-tuned models such
as SVM, Generalized Logistic Model (GLM), MDA, CBR, DT and GAM.
Recently, Lee and Choi (2013) applied BNN and MDA models to the construction, retail and
manufacturing industries to predict financial distress. The study further elaborates on the
relative power of each independent variable, and the classification accuracy of the BNN model
was 81.43%. Figure 2.1 gives an example of a neural network architecture with one input layer,
one hidden layer and one target layer.
Figure 2.1 Neural network basic understanding (Tsai and Wu, 2008). Inputs VAR-1 to VAR-N enter the
input layer, pass through the hidden layer and reach the output layer (Target).
2.6.2 Decision trees

Decision trees work on the principle of repeatedly dividing a large amount of data into smaller,
more understandable pieces, using different algorithmic rules, until no further division is
possible. A decision tree consists of a root and leaf nodes. The root nodes are also called
decision nodes and the leaf nodes are also called terminal leaves (SAS Inc., 2013). The objective
of this partitioning is to group cases with similar target values. ID3 is an early decision tree
algorithm; it was proposed by Quinlan in 1979 and was later enhanced to C4.5.
Kumar and Ravi (2007) mention that decision trees provide if-then-else rules, which
are very simple to understand. They also describe different types of decision tree
algorithms, such as CRT, CHAID, QUEST and C5.0 (which is the enhanced version of C4.5). CRT and
CHAID are newer algorithmic techniques: CRT uses the twoing optimum split technique, whereas
CHAID uses chi-square statistics.
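To make the idea of partitioning concrete, the following Python sketch searches for the best single split on one numeric ratio. It uses information gain, the entropy-based criterion associated with ID3; CRT and CHAID use the twoing and chi-square criteria instead, which are not implemented here. The ratio values and class labels are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def best_threshold(values, labels):
    """Return (threshold, information_gain) of the best binary split value <= threshold."""
    base = entropy(labels)
    best = (None, 0.0)
    # The largest value is skipped so that the right-hand branch is never empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical current ratios for six firms.
current_ratio = [0.4, 0.6, 0.9, 1.5, 1.8, 2.2]
status = ["bankrupt", "bankrupt", "bankrupt",
          "non-bankrupt", "non-bankrupt", "non-bankrupt"]
threshold, gain = best_threshold(current_ratio, status)
```

A full tree would apply the same search recursively to each branch, and over every available ratio, until the cases in a branch share the same target value.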
Frydman et al. (1985), Bryant (1997), and Curram and Migers (1994) applied decision trees to
predict financial failures, whereas Hui et al. (2010) gave a comparative study of decision trees
with other data mining models, namely ANN, SVM, logistic and the MDA statistical method.
Decision trees are easily understood by humans and more accurate than NN and SVM, but
an excess of rules can sometimes make them difficult to comprehend (Olson et al., 2012). The
following figure gives a basic understanding of decision trees.
Figure 2.2 Basic understanding of decision trees (SAS Institute Inc., 2012)
Recently, Chih et al. (2014) made a comparative study of different classifiers for bankruptcy
prediction. They applied these techniques in combination with bagging and
boosting on a data set of 220 failed and 220 non-failed firms; the empirical results show
classification accuracies of 83% for DT-bagging and 85% for DT-boosting.
2.6.3 Support Vector Machines

The concept of support vector machines was derived from statistical learning theory by
Vapnik (1998). This model overcomes the limitation of linear boundaries because it
combines linear modelling with instance-based learning. SVM uses a linear model to
implement non-linear class boundaries through a non-linear mapping of the input vectors into a
high-dimensional feature space. In the new space, the maximum separation between the decision
classes is given by the maximum margin hyperplane. The training examples closest to this
hyperplane are called support vectors; all other training examples are irrelevant for
defining the class boundaries. The
maximum margin hyperplane is a special kind of linear model. Figure 2.3 gives the basic idea
of the hyperplanes and support vectors (circled in the figure).
Figure 2.3 Basic idea of the Hyperplanes and support vectors (Han et al., 2006)
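The relationship between the hyperplane, the margin and the support vectors can be sketched as follows. The sketch assumes a linear decision function f(x) = w.x + b that has already been scaled so that the closest training points satisfy y * f(x) = 1; the weights, bias and data points are hypothetical, and no training is performed.

```python
import math

def decision_value(w, b, x):
    """Signed score of point x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def support_vectors(w, b, points, labels, tol=1e-9):
    """Points lying exactly on a margin boundary, i.e. y * f(x) = 1 within tol."""
    return [x for x, y in zip(points, labels)
            if abs(y * decision_value(w, b, x) - 1.0) <= tol]

# Hypothetical two-dimensional data separated by the hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
points = [[1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [3.0, 3.0]]
labels = [-1, -1, 1, 1]
closest = support_vectors(w, b, points, labels)
```

Only the two points nearest the hyperplane are returned; moving either of the other two points further away would leave the hyperplane unchanged.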
SVM is very powerful because it integrates statistical methods and machine learning
methods. According to Chaudhuri and De (2011), SVM originated from the idea of
Structural Risk Minimization (SRM) (Shin, Lee and Kim, 2005) for building a model, and it is
becoming increasingly popular owing to its better predictive accuracy and performance.
A wide range of research articles have been written on this topic. Tay and Cao
(2001) and Kim (2004) used SVM in financial time series forecasting, with Tay and Cao (2001)
applying a modified version of SVM in their research. Shin, Lee and Kim (2005) investigated
the efficiency of SVM for bankruptcy prediction and concluded that it works better than BPN
for smaller training data sets. Min and Lee (2005) evaluated this technique to find the optimal
parameter values of the SVM kernel function, and Chih et al. (2007) implemented a real-valued
genetic algorithm to optimize the parameters of a support vector machine for predicting financial
distress. Later, Gao, Cui and Po (2008) predicted enterprise bankruptcy using a Noisy-Tolerant
Support Vector Machine. Recently, Fong et al. (2014) also used SVM in a comparative
study to predict bankruptcy.
2.6.4 Fuzzy logic

This technique is based on the mathematical theory of fuzzy sets presented by Zadeh (1965).
According to Zadeh, a fuzzy set is a collection of distinct objects in which each object is assigned
a grade of membership, ranging from 0 to 1, by a membership function. Fuzzy logic also assists
classification problems by extracting ‘if-then’ rules. These rules can easily be used to
understand the two-way logic of the data. The basic idea of fuzzy logic is to assimilate
experience and observation in order to convert experiential knowledge into a model. According to
Shapiro (2002), fuzzy logic provides a structure for approximate reasoning, which can be
used to translate qualitative knowledge about a problem into a set of comprehensible rules.
However, he also points out a disadvantage: it is difficult to build and tune the membership
functions and rules of a fuzzy logic model.
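A membership function and a single ‘if-then’ rule can be sketched as below. The triangular shape of the membership function, its breakpoints and the use of the minimum as the fuzzy AND are assumptions chosen for illustration; they are not taken from Zadeh (1965) or Shapiro (2002).

```python
def triangular(x, low, peak, high):
    """Membership grade in [0, 1] for a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

def distress_rule(debt_ratio, liquidity_ratio):
    """IF debt is high AND liquidity is low THEN distress risk is high.

    The rule strength is the minimum of the two membership grades.
    """
    debt_high = triangular(debt_ratio, 0.4, 1.0, 1.6)            # hypothetical breakpoints
    liquidity_low = triangular(liquidity_ratio, -0.5, 0.0, 1.0)  # hypothetical breakpoints
    return min(debt_high, liquidity_low)

strength = distress_rule(0.7, 0.5)
```

Choosing the breakpoints is exactly the tuning problem mentioned above: small changes to them alter the grade assigned to every firm.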
Fuzzy logic is used in many areas, including credit risk prediction (Chung et al., 2005),
commercial loan analysis systems (Levy et al., 1991), correlations of crude oil systems
(Sunday et al., 2011), disease of a firm (Hernan and Antonio, 2008) and forecasting
exchange rates (Korol, 2014).
2.6.5 Rough Sets

The concept of rough sets was introduced by Pawlak (1982, 1984) and Pawlak et al. (1995) to
address the problem of indiscernibility between objects in a set. It is convenient to
classify objects into precise classes, but such classification can be imprecise with crisp sets.
Bankruptcy prediction using rough sets is a relatively recent technique: the comprehensive
literature review by Dimitras et al. (1996) covered 158 research articles related to bankruptcy
prediction, and none of them used the rough sets technique. This technique eliminates rule
conflicts and gathers additional information about the ordering characteristics of attributes to
produce a very simple model (Mckee and Lensberg, 2002).
Slowinski and Stenfanowski (1994) described how the rough sets approach essentially
allows a large set of predictive ratios to be analysed in order to identify several reduced sets
of ratios that can forecast the characteristic of interest.
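The notion of indiscernibility can be illustrated with the lower and upper approximations that are standard in rough set theory. In the sketch below, firms with identical attribute values are indiscernible; the lower approximation holds firms that certainly belong to the target set and the upper approximation holds firms that possibly belong to it. The firms, attributes and values are hypothetical.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target` using the given attributes."""
    # Group objects that cannot be told apart on the chosen attributes.
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # every member is in the target: certain
            lower |= members
        if members & target:    # at least one member is in the target: possible
            upper |= members
    return lower, upper

firms = {
    "A": {"liquidity": "low", "leverage": "high"},
    "B": {"liquidity": "low", "leverage": "high"},
    "C": {"liquidity": "high", "leverage": "low"},
    "D": {"liquidity": "low", "leverage": "low"},
}
bankrupt = {"A", "D"}
lower, upper = approximations(firms, ["liquidity", "leverage"], bankrupt)
```

Firms A and B share the same attribute values but only A is bankrupt, so both fall in the upper approximation and neither falls in the lower one.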
This technique was first used to predict bankruptcy by Matarazzo et al. (1998a, 1998b).
They used both the dominance relation and the indiscernibility relation in their first study and
only the dominance relation in their second. Susmaga et al. (1999) also applied this
technique to predict bankruptcy, compared it with DA and logistic, and deduced that the
rough set technique performed better than the other two. Mckee (2000) employed a rough
set model on variables specified by a recursive partitioning technique, using a holdout sample of
100 companies; the empirical results show 88% classification accuracy. Popova and Bioch
(2001) used the rough set method with a slight modification, using monotone extensions, to
predict bankruptcy.
Slowinski et al. (2001) and Matarazzo et al. (2002) used the dominance-based rough sets
approach and concluded that it is the only data mining method that preserves the preference order
of the data. Furthermore, this theory can be used to solve classification problems by using
exact and possible induced decision rules (Kumar and Ravi, 2007). Moreover, research
articles on this topic by Francis and Lixiang (2002), Indrani (2006), Ching et al. (2010),
Chen (2012) and Zhi et al. (2012) also elaborate on the use of rough set techniques for
bankruptcy prediction.
Recently, Chiang et al. (2014) used a hybrid of the rough set and random forest methods, with
intellectual capital as a predictive variable for bankruptcy prediction, and concluded that the
hybrid approach provided the best classification rate with the lowest Type-I and Type-II errors.
2.6.6 Case based reasoning

Case-based reasoning (CBR) is an intelligent technique that solves new problems by
reusing solutions to similar problems experienced in the past (Kolodner, 1991). When it
encounters a new problem, CBR retrieves a similar case from the past cases and, if necessary,
modifies it to give the desired result. The new solution is thus prepared by retrieving and
modifying old experiences that closely match the given problem. CBR imitates the problem-solving
skill of human beings, who resolve present problems by using past experiences. CBR is
a technique for solving problems and making better decisions in a complex and changing business
environment (Han, 2002). According to Jeng and Liang (1995), the CBR process requires four
steps to solve a problem: (1) accepting the new problem, (2) retrieving applicable
cases from the library of cases, (3) modifying the retrieved cases to fit the problem in hand and
(4) assessing the solution. This process is illustrated by the 4-step cycle shown in Figure 2.4.
Figure 2.4 Case-Based Reasoning 4-step cycle (Aamodt and Plaza, 1994)
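The four steps can be sketched in Python as follows. The sketch assumes that cases are stored as vectors of financial ratios, that similarity is measured by Euclidean distance and that the retrieved outcome is reused without modification; these choices, the field names and the data are hypothetical and are not prescribed by Jeng and Liang (1995).

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length ratio vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(case_library, problem):
    """Step 2: return the stored case closest to the new problem."""
    return min(case_library, key=lambda case: distance(case["ratios"], problem))

def solve(case_library, problem, actual_outcome=None):
    """Run the cycle for one new problem (step 1 is accepting `problem`)."""
    nearest = retrieve(case_library, problem)   # step 2: retrieve
    proposed = nearest["outcome"]               # step 3: reuse, with no adaptation here
    # Step 4: assess the proposal when the true outcome is known.
    correct = None if actual_outcome is None else proposed == actual_outcome
    return proposed, correct

library = [
    {"ratios": [0.05, 2.0], "outcome": "non-bankrupt"},
    {"ratios": [-0.20, 0.6], "outcome": "bankrupt"},
]
prediction, correct = solve(library, [-0.15, 0.8], actual_outcome="bankrupt")
```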
CBR has not been widely used in the field of bankruptcy prediction, but it has been widely used
in the fields of management, engineering, medical diagnosis, conflict resolution in traffic
control, creating product indexes for e-shopping malls and the drawing of semiconductors
(Turban and Aronson, 2001). For further reading on this topic, the reader may refer to the
research articles by Ahn and Kim (2009), Sungbin et al. (2010) and Chuang (2013).
2.7 Other Methods

Many statistical and intelligent models have been discussed above with reference to a wide range
of literature. Nevertheless, there are many other methods for predicting financial bankruptcy;
each has its own advantages and disadvantages, but they are not discussed in this study.
These techniques include: soft computing methods, by Liang
et al. (1997), Soltys and Ignizio (1996), Chaudhuri and De (2011), Gordini (2014),
Kurniawan et al. (2008), Heo and Yang (2014) and Fong et al. (2014); operational research
techniques, by Sueyoshia and Goto (2009), Zhang et al. (1999), Leary (1992), Banks and
Parakash (1994), Sun and Shenoy (2007), Kao and Liu (2004) and Jardin (2014); and
self-organizing maps, by Kiviluoto (1998), Kwon et al. (1996), Peltonen et al. (2001),
Huysmansa et al. (2006) and Chen et al. (2013).
Furthermore, for comprehensive literature on statistical methods for bankruptcy
prediction, readers may refer to the literature reviews by Zanakis et al. (1982) and Jodi et al.
(2007). In addition, the extensive research reviews by Kumar and Ravi (2007) and Jodi et
al. (2007) give detailed descriptions of the different models used for bankruptcy prediction
from 1968 to 2007 and from 1932 to 2000 respectively. Finally, Figures 2.5, 2.6 and 2.7, from
Aziz and Dar (2006), give an overall understanding of the different approaches used in the past
and their bankruptcy prediction accuracies.
Figure 2.5 A comparison of different bankruptcy prediction approaches (Aziz and Dar, 2006)
Figure 2.6 Accuracy of different methods being used in the past (Aziz and Dar, 2006)
Figure 2.7 Studies using different model of bankruptcy prediction (Aziz and Dar, 2006)
Chapter 3 Financial Distress and Bankruptcy
3.1 Introduction
This chapter gives a basic understanding of the financial distress that leads to bankruptcy.
The causes and outcomes of financial distress are also elaborated. Finally, the cost of
bankruptcy is discussed in the last section of the chapter.
3.2 Financial Distress

Owing to sharp fluctuations in the world economy and in customer appetite, corporations are
facing high competition and an uncertain operating environment. Companies that fail to
recognise financial distress and take effective measures at an early stage run into
bankruptcy, which not only affects the reputation of the company and the stability of the social
economy, but also inflicts heavy losses on the stockholders, creditors, managers and employees of
the company (Sun and Li, 2009).
Financial distress is also referred to as bankruptcy or liquidation in different studies. If a
corporation does not have enough cash flow to pay its current contractual obligations, its debts
to suppliers of stock and the salaries of its workers, then it is considered to be in a state of
financial distress. These obligations may also include debts arising from court proceedings and
the repayment of interest. A breach of a debt contract can be a signal that financial
distress is forthcoming. Financial theory proposes that financial distress is a preliminary
phase in the life cycle of a corporation and that it also gives an indication to change the
management (Wruck, 1990). Moreover, Whitaker (1999) and Gaughan (2011) also
consider inadequate cash flow to be a major measure of financial distress.
Financial failure is the situation in which the return on invested capital, allowing for risk,
is lower than the prevailing rates on similar investments, and in which the average return of
the firm is persistently below its cost of capital. A firm is not considered to be in financial
distress merely because it is unable to pay a small amount of debt or has a minor shortfall in
its debts. Insolvency is another term used to describe negative corporate
performance. The financial distress of a firm is further described using four general terms in
many research studies: failure, default, insolvency and bankruptcy. Furthermore, the financial
idea of default also means that a company is not in a position to pay debt or interest to
creditors when due. Finally, financial distress is distinguished into technical and legal cases.
Technical financial distress is the case where a corporation is unable to keep the terms of its
contract, and the legal case refers to the failure of the company to meet regular repayments on
a loan (Altman and Hotchkiss, 2005).
It is important for a financially distressed company to start renegotiating in order to reach a
better agreement with its creditors. These negotiations require the closing down of loss-making
operations and the regeneration of the company through the temporary or permanent discharge of
workers. If, after all negotiations and new contracts with creditors, the company still faces
financial distress, then the physical exit of the company is the last option. The financial
distress of a company can also be resolved by placing the company under the control of a new owner
(Hashi, 1997).
According to Gaughan (2011), financial failure does not necessarily mean that a company is unable
to meet its due debt obligations; it can happen even when the corporation has enough net
worth to meet its present legal responsibilities. Additionally, financial distress is not a
necessary indicator of corporate bankruptcy, because some companies also default due to
management incompetence (Perold, 1999). Finally, the reader may refer to Karels and Prakash (1987)
and Lin and Mclean (2000) for further definitions and explorations of financial distress.
3.1.1 Stages of Financial Distress

According to research studies, financially distressed companies pass through an early
stage, a mid-stage and a final or later stage. The symptoms of companies in these stages are:
1. Early stage: customers start complaining about services and quality (Whitaker,
1999), the company starts to notice that sales are decreasing, and stock returns fall below
expectations (Opler and Titman, 1994).
2. Mid-stage: the company faces problems such as
cash shortages, lower profits (Makridakis, 2001), inability to pay dividends and
disruption in the payment of debts to suppliers (Altman and Hotchkiss, 2005).
3. Final or later stage: according to Altman and Hotchkiss (2005), the company has a
constant cash deficit and breaches its debt contracts with creditors.
The bankruptcy of a company can be predicted about five to six years before it happens,
since some of the researchers listed in Table 2.2 predicted bankruptcy five years
ahead.
3.1.2 Factors of Financial distress
Two kinds of factors of financial distress are discussed in different research studies: internal
and external.
Internal Factors:
There are many internal factors related to financial distress; some of the important
ones are (Keskin, 2002):
1. Bad management,
2. Lack of communication between the business entities,
3. Failure of major projects,
4. Expansion of business with no stability,
5. No agreement between domain growths.
Wruck (1990) and Whitaker (1999) also consider poor management a significant factor in
the financial distress of a company.
External factors:
Every company has to exist in an environment. The external factors are environmental
factors that lead to financial distress; some of those discussed by researchers are as
follows:
1. Social Environment (Sevil et al., 1997; Tezcan, 2002)
2. Economic Environment (Buker et al., 1997; Demir, 1997)
3. Legal and Political Environment (Turko, 1999)
4. Technological Environment
5. Natural Environment (Turko, 1999)
6. Industrial Endowment
3.1.3 Causes of Financial Distress

The most significant causes of financial distress following leveraged recapitalizations, as
discussed by David and Denis (1995), are:
1. Bad performance due to expanded industry
2. Low proceeds from asset sales
3. Negative stock price reactions
3.1.4 Result of corporate financial Distress
As mentioned above, financial distress is a continuous process, and it takes roughly six years to
reach its final stage, bankruptcy. According to Kumar and Ravi (2007), the health of a firm or
bank relies on its:
1. Solvency in the beginning,
2. Capability, workability and planning in generating cash,
3. Accessibility to the capital market,
4. Financial ability to endure a random cash deficiency.
When a company becomes more and more illiquid, it enters a danger zone which is
called bankruptcy.
3.2 Bankruptcy
The concept of bankruptcy has been used to describe firms bearing financial troubles. A few
researchers have used the generic term “failed” as a synonym for “bankrupt”. Nonetheless,
bankruptcy is a process that starts financially and ends legally. It is hard to identify the
particular moment at which bankruptcy occurs. It is intuitively understood as a state in which
financial distress continues until the firm or its creditors file a legal action. Financial
failure is a necessary, but not sufficient, condition of bankruptcy (Karels and Prakash, 1987).
Firms under the provisions of the National Bankruptcy Act are legally bankrupt, whether they
are in receivership or have been granted the right to restructure (Altman, 1968).
A firm is said to be operationally bankrupt, failed or in default when it is unable to pay its
financial obligations as they fall due, for example through a bond default, an
overdrawn bank account or non-payment of a preferred stock dividend (Blum, 1974). According to
Deakin (1972), a firm encountering insolvency, bankruptcy or liquidation for the benefit of
creditors is said to be a failed firm.
A variety of definitions have appeared to explain failure or bankruptcy. From a financial
point of view they include negative net worth, non-payment of creditors, bond defaults,
inability to pay debt obligations, overdrawn bank accounts, omission of dividends, receivership
and so on (Karels and Prakash, 1987). For more information about bankruptcy definitions, the
reader may also refer to the research studies by Elam (1975), Morris (1983), and Taffler and
Tisshaw (1977).
3.2.1 Cost of bankruptcy
The bankruptcy cost is generally divided into two categories (Kalay et al., 2007):
1. Direct cost
2. Indirect cost
The entire cost of bankruptcy for a firm, including direct and indirect costs, is 15% of
pre-distress firm value, and 7% for retail firms (Altman, 1984). According to Franks and Torous
(1994), the formal cost of bankruptcy exceeds the informal cost by 4.5%. In the same year,
Opler and Titman (1994) reported that firms with more leverage lose market share.
Finally, Kaplan (1994) concluded that profit from the liquidated financial reshuffle procedure
also increased the cost.
The bankruptcy cost can be divided into four sub-categories (Branch, 2002):
1. Real costs borne directly by the bankrupt firm.
2. Real costs borne directly by the claimants.
3. Losses to the bankrupt firm that are offset by gains to other institutions.
4. Real costs borne by parties other than the bankrupt firm.
Costs (1), (2) and (3) are considered to be sub-categories of direct costs, while (4)
belongs to the indirect costs.
3.1.3 Determining cost of bankruptcy

The cost of financial distress is related to the market value of the company or firm just
before it becomes bankrupt and is given by the following formula (Branch, 2002):
PDV = LCD + TDC + NVR
where PDV = pre-distress value,
TDC = cost endured by claims,
NVR = net value retrieved.
PDV is taken to be the entire value of the firm’s assets according to its last
financial report before bankruptcy. In most cases, at the final stage of financial distress, the
equity value of the company is close to zero when it is about to file for bankruptcy. On the other
hand, the balance sheet of the bankrupt firm will not show running losses but will represent
some outdated asset values (Branch, 2002).
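As a worked illustration, the identity PDV = LCD + TDC + NVR can be rearranged to recover one term when the other three are known. The text above does not define LCD, so the sketch treats it simply as the remaining term of the identity. All monetary figures are hypothetical.

```python
def total_distress_cost(pdv, lcd, nvr):
    """Rearrange PDV = LCD + TDC + NVR to obtain TDC."""
    return pdv - lcd - nvr

# Hypothetical values in thousands of pounds.
pdv = 1000.0   # pre-distress value
lcd = 300.0    # LCD term of the identity
nvr = 550.0    # net value retrieved
tdc = total_distress_cost(pdv, lcd, nvr)

# The three components add back up to the pre-distress value.
identity_holds = (lcd + tdc + nvr) == pdv
```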
3.1.4 Direct costs of bankruptcy endured by the firm

After filing for bankruptcy, the company has to hire a team of professionals. These
professionals may include lawyers, accountants, bankers, auctioneers, actuaries and
practitioners who sell the distressed assets. The professionals ask for a particular amount of
payment in return for their services. Moreover, at the stage of bankruptcy the firm also has
to bear the cost of internal staff and other resources (Branch, 2002).
3.1.5 Indirect costs of bankruptcy endured by the firm

The indirect bankruptcy cost can be described as the lost gains from previous sales, the cost of
selling assets at a discount, and the cost of disruptions in the firm during the period of
financial distress. These disruptions may be in the investment and financial policies of the
firm (Rajeev and Yun, 2013). Managers of the company have to bear a personal cost of
bankruptcy: either they lose their jobs or they give up 35% of their previous salary (Gilson and
Vetsuypens, 1994). Furthermore, research studies by Thorburn (2000), Bris et al. (2006),
Pulvino (1999) and Kaplan (1994) also explain the direct and indirect costs of bankruptcy.
Finally, Branch (2002) summarises who bears the costs of bankruptcy in four points. Firstly,
bankruptcy costs are imposed on landlords, suppliers, customers, employees and others.
Secondly, creditors and claimants also have to face the costs associated with the bankruptcy
of the firm. Thirdly, the par value of the liquidated firm’s debt before bankruptcy is apportioned
as follows: 28% to the loss causing bankruptcy, 16% to the cost of dealing with bankruptcy and 56%
to the claim-holders. Lastly, interest holders also bear a cost if the company goes
bankrupt.
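The 28% / 16% / 56% apportionment reported above can be applied to a given par value as in the following sketch. The percentages are those quoted from Branch (2002); the par value and the category names are hypothetical.

```python
# Percentage shares of the pre-bankruptcy par value of debt, as quoted above.
SHARES_PERCENT = {
    "loss_causing_bankruptcy": 28,
    "cost_of_dealing_with_bankruptcy": 16,
    "claim_holders": 56,
}

def apportion_par_value(par_value):
    """Split a par value of debt according to the quoted percentages."""
    return {name: par_value * pct / 100 for name, pct in SHARES_PERCENT.items()}

# Hypothetical par value of 1,000 (in any currency unit).
allocation = apportion_par_value(1000)
```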
Chapter 4 Data
4.1 Introduction
In this chapter I discuss the importance of the bankruptcy prediction data
sample and the database source I used to obtain the data. Finally, I discuss
the variable selection, the data pre-processing phase and the statistical description of the data
used in this dissertation.
4.2 Importance of Data sample

Before describing the importance of the data sample it is important to discuss two statistical
terms, Population and Sample.
4.2.1 Population
A population is the complete collection of objects or items that may be the subject of a study
(Kathleen and Jonathan, 2011), for instance, all manufacturing companies in the UK, all banks in
the UK, all bankrupt firms in the UK, or all non-bankrupt companies that are still in an active
state.
4.2.2 Sample
A sample is a sub-group of items from a particular population (Kathleen and Jonathan, 2011), for
example, a group of 63 bankrupt firms randomly selected from a large database containing
records of thousands of bankrupt firms. The data sample must be representative of the whole
population.
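Drawing a simple random sample from a population can be sketched as follows. The population of firm identifiers, its size and the seed are hypothetical; the sample size of 63 matches the example above.

```python
import random

def draw_sample(population, size, seed=0):
    """Draw `size` distinct items at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(population, size)

# Hypothetical population: identifiers of 5,000 bankrupt firms.
population = [f"firm_{i:04d}" for i in range(5000)]
sample = draw_sample(population, 63)
```

Simple random sampling gives every firm the same chance of selection, which is one common way of keeping a sample representative of its population.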
4.2.3 Importance
After an exhaustive reading of the literature, I have found that the selection of the data sample
is the most important aspect of bankruptcy prediction, because
computers provide information according to the data they are given to process. If computers are
given erroneous data to process, the results will also be erroneous.
Previous studies show that researchers were aware of the importance of the data sample
in predicting bankruptcy. Initially, researchers used data samples containing a limited number
of bankrupt and non-bankrupt firms. For example, Beaver (1966) used a data sample of 79
bankrupt and 79 non-bankrupt firms, Piches et al. (1975) used a data sample of 221 firms, and
Altman (1968) and Deakin (1972) used data samples of 32 bankrupt and 32 non-bankrupt
firms. Later on, some researchers also used large data samples; for instance, Zmijewski
(1984) used a data sample of 40 bankrupt and 800 non-bankrupt firms, and Erkki and Teija
(2000) used an equally divided data sample of 400 bankrupt and non-bankrupt firms.
Since my major concern in this study is to apply data mining classification techniques to
predict bankruptcy, it is very important for me to select unbiased training and test
data samples. The training data sample employed in this study consists of an unbiased
sample of 464 bankrupt and 464 non-bankrupt UK and Irish firms from the period 2000
to 2012, while the test data sample contains 64 bankrupt and 64 non-bankrupt companies from the
period 2010 to 2012. I selected the ratios for the five years prior to bankruptcy for the analysis.
Finally, I divided the data into five different data files to perform my analysis, as follows.
1. Data sample containing financial ratios one year before bankruptcy (dataset1.xlsx).
2. Data sample containing financial ratios two years before bankruptcy (dataset2.xlsx).
3. Data sample containing financial ratios three years before bankruptcy (dataset3.xlsx).
4. Data sample containing financial ratios four years before bankruptcy (dataset4.xlsx).
5. Data sample containing financial ratios five years before bankruptcy (dataset5.xlsx).
4.3 Source of Data
This data sample was collected from the Financial Analysis Made Easy (FAME)
database. This database gives detailed information on all significant private and public
companies in the UK and Ireland. The information provided includes name, number of
employees, profile, location, assets, identification number, status, legal form, incorporation
date, phone number, industry, stock data, mortgage data, account type, accounting figures,
financial statistics, custom data and information related to the directors and owners of the
companies. The past 10 years of financial data for a company can be accessed from this database.
Using the FAME database, detailed statistical descriptions, aggregation, linear
regression and segmentation of data can be obtained in seconds. Moreover, the FAME database
describes the status of companies in two categories:
1. Active
2. Inactive
The inactive companies are further subdivided into two classifications:
1. Dissolved
2. Liquidated
The FAME database contains financial information on approximately 3,147,877 active and
9,186,893 dissolved or liquidated companies. I selected the bankrupt firms from the
period 2000 to 2012 in the active and liquidation states. Similarly, I selected
non-bankrupt firms in the liquidation state.
4.4 Selection of Ratios

The literature review shows that different studies have used different numbers of ratios. Some
studies used only 4 financial ratios while others used more than 4, as mentioned in
Chapter 2. I selected 41 ratios for use in this study. These ratios are
important because they are the significant ratios used in the most cited previous
research papers. Each type of financial ratio measures a certain financial aspect of a
business. Table 4.1 describes the financial ratios used in this study. In this data
set some of the ratios, such as X2-T1, X2-T2, X2-T3, X2-T4 and X2-T5, were stored as string data
type. I converted these ratios to numeric data using IBM SPSS software.
Table 4.1 Financial ratios used in this study
Ratio name   Financial ratio                                      Approximate number of research
                                                                  articles containing this ratio
X1           Factor/Consideration                                 65
X2           Net income / Total assets                            60
X3           Current ratio                                        50
X4           Working capital / Total assets                       50
X5           Retained earnings / Total assets                     40
X6           Earnings before interest and taxes / Total assets    36
X7           Sales / Total assets                                 35
X8           Quick ratio                                          33
X9           Total debt / Total assets                            31
X10          Current assets / Total assets                        29
X11          Net income / Net worth                               25
X12          Total liabilities / Total assets                     23
X13          Cash / Total assets                                  21
X14          Market value of equity / Book value of equity        18
X15          Cash flow from operations / Total assets             17
X16          ... / Total liabilities                              ...
X17          Current liabilities / Total assets                   15
X18          ... / Total debt                                     ...
X19          Quick assets / Total assets                          14
X20          Current assets / Sales                               15
X21          Earnings before interest and taxes / Interest        15
X22          Inventory / Sales                                    15
X23          Operating income / Total assets                      13
X24          ... / Sales                                          ...
X25          Net income / Sales                                   14
X26          Long-term debt / Total assets                        15
X27          Net worth / Total assets                             13
X28          Total debt / Net worth                               14
X29          Total liabilities / Net worth                        14
X30          Cash / Current liabilities                           15
X31          ... / Current liabilities                            ...
X32          Working capital / Sales                              9
X33          Capital / Assets                                     7
X34          Net sales / Total assets                             8
X35          Net worth / Total liabilities                        7
X36          Total assets                                         7
X37          Cash flow (using net income) / Debt                  7
X38          Cash flow from operations                            7
X39          Operating expenses / Operating income                7
X40          Quick assets / Sales                                 7
X41          Sales / Inventory                                    7
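To show how such ratios are derived from accounting figures, the sketch below computes a few of them for one firm. It assumes the usual definitions of working capital (current assets minus current liabilities) and of the current ratio (current assets divided by current liabilities); the field names and figures are hypothetical and are not FAME field names.

```python
def safe_divide(numerator, denominator):
    """Return None when the denominator is zero so the ratio is recorded as missing."""
    if denominator == 0:
        return None
    return numerator / denominator

def compute_ratios(f):
    """Compute a handful of the Table 4.1 ratios from one firm's figures."""
    working_capital = f["current_assets"] - f["current_liabilities"]
    return {
        "X2": safe_divide(f["net_income"], f["total_assets"]),
        "X3": safe_divide(f["current_assets"], f["current_liabilities"]),
        "X4": safe_divide(working_capital, f["total_assets"]),
        "X7": safe_divide(f["sales"], f["total_assets"]),
        "X12": safe_divide(f["total_liabilities"], f["total_assets"]),
    }

# Hypothetical figures in thousands of pounds.
firm = {
    "net_income": 50, "total_assets": 1000, "current_assets": 400,
    "current_liabilities": 200, "sales": 1500, "total_liabilities": 600,
}
ratios = compute_ratios(firm)
```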
4.5 Data Pre-Processing
To apply data mining techniques, the data must be filtered and prepared so that
patterns in the data can be recognised efficiently. According to Han and Kamber (2000), the data
mining process involves six important steps: select data, filter data, give meaning (value) to
the filtered data, programming, data mining and report generation.
Data cleaning is very important as it removes errors from the data and improves its quality,
since data obtained from any source may contain missing values, outliers and noise. Data
pre-processing is the phase in which data is prepared for analysis using different data cleaning
and processing methods. If data is not pre-processed before different models are applied, the
results can be very different from those obtained with processed data. Therefore, it is important
to pre-process data for better classification results.
Moreover, the data used in this study is presented as a combination of X and T
variables, where X (from 1 to 41) denotes the ratio and T (from 1 to
5) denotes the number of years before bankruptcy. Since the data contains ratios for the five
years prior to bankruptcy and I have to apply data mining to each year's data separately, I made
different files of data containing the ratios related to each year. For example, to apply data
mining models to the data five years before bankruptcy, I deleted the ratios for years one to
four, leaving the year-five ratios specified as X,T (where X = 1 to 41 and T = 5). I used IBM SPSS
to make these samples of data. In addition, I deleted the columns that were not required in this
study, namely status and event data year. Since I also want to find the most
important ratios in bankruptcy prediction, I made further data files containing different
subsets of ratios (deleting the others). To make the data cleaner, I reduced values with more
than six decimal places to four decimal places using the Excel function
ROUNDUP(number, num_digits), which rounds away from zero rather than truncating.
Finally, the data recorded the bankruptcy of the firms in binary form (0 for bankrupt and
1 for non-bankrupt firms). I converted this variable to nominal form for classification
and changed 0 to “bankrupt” and 1 to “non-bankrupt” (string data type) for better classification
analysis. Lastly, I deleted the columns of the sample data that were not required in this
study: Company Name, status and Year of Event.
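The pre-processing steps described above were carried out in IBM SPSS and Excel; the following Python sketch only illustrates the same logic for a single record. It assumes column names of the form X2T5 and a class column called bankrupt_flag, both of which are hypothetical, and it rounds every value rather than only those with more than six decimal places.

```python
from decimal import Decimal, ROUND_UP

LABELS = {0: "bankrupt", 1: "non-bankrupt"}

def columns_for_year(column_names, year):
    """Keep only ratio columns X<n>T<year>, dropping other years and other fields."""
    suffix = f"T{year}"
    return [c for c in column_names if c.startswith("X") and c.endswith(suffix)]

def round_up(value, places=4):
    """Round away from zero to `places` decimals, as Excel's ROUNDUP does."""
    quantum = Decimal(10) ** -places
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_UP))

def prepare_row(row, year):
    """Build the cleaned record for one firm and one year before bankruptcy."""
    prepared = {c: round_up(row[c]) for c in columns_for_year(row.keys(), year)}
    prepared["class"] = LABELS[row["bankrupt_flag"]]   # 0/1 becomes a nominal label
    return prepared

# Hypothetical record with ratios from two different years.
row = {"Company Name": "Acme", "X1T1": 0.1234561, "X1T5": 2.5,
       "X2T5": -0.00001234, "bankrupt_flag": 0}
cleaned = prepare_row(row, 5)
```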
4.5.1 Missing values
Missing values have always been a problem for researchers, and it is up to the researcher how
to deal with them. According to Rubin (2002) there are three major kinds of missing-value
mechanisms:
1. Missing completely at random
2. Missing at random
3. Not missing at random
Missing values can be limited or scattered through the whole data (total): limited when only
a few values are missing, and total when the data is full of missing values. The most
commonly used method of handling missing values is to impute them with the average value.
The SPSS missing values analysis gives a complete picture of the missing values in the data
one year before bankruptcy. According to this analysis, the variables in each sample of data
have certain missing values. For example, the variables
X1T1, X3T1, X4T1, X5T1, X6T1, X8T1, X9T1, X10T1, X11T1, X12T1, X14T1, X16T1, X18T1,
X19T1, X20T1, X21T1, X22T1, X23T1, X24T1, X25T1, X26T1, X27T1, X28T1, X29T1, X30T1,
X31T1, X32T1, X33T1, X34T1, X35T1, X36T1, X39T1, X40T1 and X41T1
contain zero missing values, the variables X7T1, X13T1, X15T1 and X17T1 contain more than
50 missing values, and the variable X2T1 has five missing values. Both SAS and IBM SPSS
have methods to impute missing values. I used IBM SPSS to detect and impute the missing
values using the mean of nearby points method.
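To make the idea of imputing from nearby points concrete, the following is a minimal Python sketch. It is not the SPSS procedure itself: it assumes that a missing value (written here as None) is replaced by the mean of up to `span` valid values on each side, and that whatever neighbours exist are used near the ends of the column. The function name and the default span of 2 are assumptions made for illustration.

```python
def impute_nearby_mean(values, span=2):
    """Replace None entries with the mean of nearby valid values.

    For each missing entry, up to `span` valid values before it and up to
    `span` valid values after it are averaged. Only original (non-imputed)
    values are used as neighbours. Entries with no valid neighbour stay None.
    """
    result = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        before = [x for x in reversed(values[:i]) if x is not None][:span]
        after = [x for x in values[i + 1:] if x is not None][:span]
        neighbours = before + after
        if neighbours:
            result[i] = sum(neighbours) / len(neighbours)
    return result
```

For a column such as `[1.0, 2.0, None, 4.0, 5.0]` the gap is filled with the mean of its four neighbours.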
4.5.2 Outliers
Outliers are values that lie significantly far away from the other observations in the data
(Hansen et al., 1983). Outliers affect the results of the analysis method and also skew the
data away from a normal distribution. The most commonly used methods of dealing with outliers are
(Dhiren and Ghosh, 2012):
1. Do not disturb it and treat it like other data values.

2. Winsorizing
3. Eliminating
In the eliminating (trimming) method the outliers are removed from the data during analysis,
while winsorizing assigns an outlier the highest or lowest value in the data that is not an
outlier. A general method of winsorizing is to replace any data value above the 95th
percentile of the sample data by the 95th percentile and any value below the 5th percentile by
the 5th percentile (Dhiren and Ghosh, 2012).
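The winsorizing rule above can be sketched in a few lines of Python. This is an illustration only: the percentile is computed here by linear interpolation between neighbouring ranks, which is an assumption and may differ slightly from the percentile definition SPSS applies. The function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Linearly interpolated percentile (0 <= p <= 100) of a sorted list."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def winsorize(values, low=5, high=95):
    """Clip every value to the [low, high] percentile range of the sample."""
    ordered = sorted(values)
    low_cut = percentile(ordered, low)
    high_cut = percentile(ordered, high)
    return [min(max(value, low_cut), high_cut) for value in values]
```

Values inside the 5th to 95th percentile range are left unchanged, so only the extreme tails of each ratio are affected.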
4.5.2.1 Solution of outliers

I applied the descriptive statistics technique of IBM SPSS to find the 5th and 95th
percentiles of each ratio and applied the winsorizing method to remove the extreme values in
the data. Figure 4.1 shows the method used in SPSS to find the 5th and 95th percentiles of
each year's data.
Figure 4.1 Method used in SPSS to find 5th and 95th percentile
Tables 4.2, 4.3, 4.4, 4.5 and 4.6 in Appendix-A present the 5th and 95th percentiles of each
year's data.
4.6 Descriptive Statistics of data samples

Descriptive statistics describe the basic characteristics of the data and provide summaries
of the samples and measures. They are used to present quantitative measures, such as the mean
and standard deviation of the data, in a manageable form (Ibe, 2014). I used SPSS to determine
the descriptive statistics of the data. Tables 4.7, 4.8, 4.9, 4.10 and 4.11 in Appendix-A show
the descriptive statistics of the one- to five-year data.
4.7 Summary
The data has now been pre-processed and cleansed using different statistical methods, so it
is ready to be used in the development of the bankruptcy prediction models. The next
chapter presents this implementation.
Chapter 5: Model development and application
5.1 Introduction
This chapter consists of three parts: Part-1 presents the application of data mining methods
using SAS Enterprise Miner, Part-2 elaborates the use of data mining algorithms using
WEKA software and Part-3 presents the classification of bankruptcy data using IBM SPSS
Modeller.
Part-1:
5.2 Overview
This part gives a brief description of SAS Enterprise Miner and its predictive modelling
approach. It also introduces the step-by-step implementation of the models, with a brief
introduction to the data mining model nodes used and their execution using SAS
programming.
5.3 SAS Enterprise miner and its predictive modelling

SAS Enterprise Miner (EM) is a tool for generating reliable and accurate predictive and
illustrative models from huge amounts of data. SAS Enterprise Miner uses a data mining
process with five steps known as SEMMA: Sample, Explore, Modify, Model and Assess. Since I
have already performed the first three steps on the data in chapter 4, I will perform the last
two steps, Model and Assess, in this chapter.
SAS Enterprise Miner provides a GUI to perform different data mining tasks. The GUI
consists of a workspace where nodes can be dragged from a toolbar to create a process flow
diagram. Figure 5.1 shows the process of creating a project in SAS Enterprise Miner.
Figure 5.1 Step-by-step method of creating a project in SAS Enterprise Miner
[Flowchart: Open SAS Enterprise Miner; Create New Project; Create a Library; Create a Diagram; Assign a Data Source; Place Node in Workspace and Execute; Perform Data Mining Using Execution Results]
SAS Enterprise Miner has many features, including a set of data mining tools, an easy-to-use
GUI, more accurate predictions, development of better predictive models for later use, and a
text editor for writing code to perform tasks through SAS Enterprise Guide. SAS also supports
the goal of the data mining process, which is to develop predictive models. These models help
to find rules for prediction using the variables and data from one data source. Once a good
predictive model has been created, it can be applied to a new data source for prediction.
5.3 Application of the Models
To develop bankruptcy prediction models, a data set is required. This data set needs to be
imported into SAS miner. Since SAS cannot process the data set in its original format, it is converted to
a SAS data set so that SAS miner can work on it. In the next step, the SAS data set can be divided into
three parts, Training, Validation and Testing, and is explored using the StatExplore node. I have used 70%
of the data as training data and 30% as validation data to test the results. After the data has been divided
into these two categories, different predictive models are applied and compared. The validation data set is
used to prevent a modelling node from overfitting the training data and to compare different models.
Finally, the results of these models are acquired and the best ones are considered. Figure 5.2 presents the
step-by-step implementation of the model generation using SAS EM:
Figure 5.2 The step-by-step implementation of the model generation using SAS EM
[Flowchart: Data Set; Import Data Set into SAS Miner; Data Set Conversion into SAS Data Set; Partition of Data in Training and Validation Sets; Exploring the Descriptive Statistics; Different Models Implementation; Calculating Results including Type-1 and Type-2 Errors; Model Accuracy; Final Results]
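The 70%/30% partition was performed inside SAS Enterprise Miner. As a rough illustration of what such a split does, the following Python sketch shuffles the records with a fixed seed and cuts them at the requested fraction. The function name, the seed and the use of a simple (non-stratified) random split are assumptions, not a description of the SAS node.

```python
import random


def partition(records, train_fraction=0.7, seed=42):
    """Randomly split records into a training and a validation list.

    The seed only makes the illustration repeatable; it is not a value
    taken from the study.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With the 928 firms of this study (464 bankrupt and 464 non-bankrupt), such a split would give 650 training and 278 validation records.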
In this phase I apply the prediction model nodes to the 5 data sets separately to perform the
bankruptcy prediction task, with a short introduction to each model employed.
5.3.1 Decision Trees

Decision trees divide a large amount of data by applying a series of rules. These rules break
the data into smaller pieces, forming subgroups that are less mixed than the overall data
sample, so that records with similar target values are isolated together. SAS miner creates,
grows, prunes and assesses decision tree models. Chapter 2 elaborates the structure of
decision trees. There are two decision tree variations in SAS, Decision Trees and HP Trees:
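The notion of a subgroup being "less mixed" can be made concrete with an impurity measure. The sketch below uses the Gini index, which is one common choice associated with CART; it is an assumption made for illustration and is not stated to be the criterion SAS applied in this study. The function and column names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity of the two groups produced by one rule.

    The rule sends a row to the left group when row[feature] <= threshold.
    """
    left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A tree-growing procedure would compare candidate rules and prefer the one with the lowest weighted impurity.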
5.3.2 Decision Trees Model:

The Decision Tree node is used to generate this model. This node allows multipath
splitting of the data using the data variables. SAS applies the best of the CHAID, CART and
C4.5 algorithms using a hybrid approach (SAS Institute Inc., 2003). The overall model accuracy
using the Decision Trees Model is 67.2%, 56.0%, 69.0%, 63.0% and 67.5% for one to five years
respectively. Table 5.1 in Appendix-A presents the bankruptcy prediction accuracy.
Moreover, Figure 5.4 in Appendix B shows the classification bar chart and score mode for
each year.
5.3.3 High Performance Trees Model

This model also applies the F-test to find the splitting rules. It also helps to create a tree
model with interval targets. The overall model accuracy using the HP Tree Model is 61%, 68.3%,
70.5%, 62.0% and 61.3% for one to five years respectively. Table 5.2 in Appendix-A presents
the bankruptcy prediction accuracy. Moreover, Figure 5.5 in Appendix B shows the
classification bar graph and score mode for each year.
5.3.4 Neural Network

Neural networks consist of interlinked artificial neurons, modelled on the billions of neurons
of the human brain, that can send and receive information from each other. They imitate the
way humans learn from experience. SAS provides several variations of neural networks:
Neural Network, DM Neural, Auto Neural and HP Neural. In this section I have applied each
of these to analyse and find their predictive classification accuracy.
5.3.5 Neural Network Model
This node helps to generate, train and test multilayer feed-forward neural networks
(SAS Institute Inc., 2003). The overall model accuracy using the Neural Network Model is
95.4%, 97.7%, 93.25%, 92.2% and 90.1% for one to five years respectively. Table 5.3 in
Appendix-A presents the bankruptcy prediction accuracy. Moreover, Figure 5.6 in Appendix B
shows the classification bar graph and score mode for each year's bankrupt and non-bankrupt
classification.
5.3.6 Auto Neural Model

The Auto Neural node can be found in the Model group of SAS miner. This model is used to
find the optimal configuration for a neural network model. The Auto Neural node performs
only a limited number of searches to find a better network configuration. The model offers
many options to control this configuration: for example, one hidden node may contain more
than two neurons, the iterations used to estimate and fit the vector can be set, previously
used layers can be frozen, and error functions can be chosen (SAS Institute Inc., 2013). The
overall model accuracy using the Auto Neural Model is 93.5%, 99.5%, 50.0%, 97.7% and 50.0%
for one to five years respectively. Table 5.4 in Appendix-A presents the bankruptcy prediction
accuracy. In addition to the classification accuracy table, Figure 5.7 in Appendix B shows the
classification bar graph and score mode for each year’s bankrupt and non-bankrupt
classification.
5.3.7 High Performance Neural Model

This node produces a multi-layer neural network, which passes information between its
layers to map particular inputs to a predicted value. It allows neural networks to be built
on huge data sets very quickly. This node has two goals (SAS Institute Inc., 2013):
1. Conduct efficient and rapid training of the NN.
2. Generate an easy-to-use and reliable model.
The overall model accuracy using the HP Neural Model is 51.0%, 47.25.0%, 84.0%, 89.4% and
54.6% for one to five years respectively. Table 5.5 in Appendix-A presents the bankruptcy
prediction accuracy. In addition to the classification accuracy table, Figure 5.8 in Appendix B
shows the classification bar graph, NN diagram and score mode for each year's bankrupt and
non-bankrupt classification.
5.3.8 Data Mining Neural Model
The DMNeural node is used to create an additive nonlinear model. The major purpose of
the algorithm used in the DMNeural node is to overcome certain problems, such as nonlinear
estimation, computing time and finding a global optimal solution. The training process of
DMNeural creates eight functions. Each function performs a particular task and each is
optimized individually. The DMNeural node then chooses the function that gives the most
appropriate results (SAS Institute Inc., 2013). The overall model accuracy using the
DMNeural Model is 46.64%, 55.15%, 52.4%, 61.1% and 64.7% for one to five years
respectively. Table 5.6 in Appendix-A presents the bankruptcy prediction accuracy. In
addition to the classification accuracy table, Figure 5.9 in Appendix B shows the
classification bar graph, NN diagram and score mode for each year's bankrupt and
non-bankrupt classification.
5.3.9 Regression Model

This node also belongs to the Model group of SAS miner. The Regression node can be used to
create both linear and logistic regression models. Linear regression predicts the target
using one or more input variables. Logistic regression predicts the probability of an event
of interest as a function of the input variables. Two functions are used in the Regression
node:
1. Link function
2. Error function
The link function is used for the distribution problems and the error function is used to
perform linear regression on the data (SAS Institute Inc., 2013). The overall model accuracy
using the Regression Model is 46.64%, 55.15%, 52.4%, 61.1% and 64.7% for one to five years
respectively. Table 5.7 in Appendix-A presents the bankruptcy prediction accuracy. In
addition to the classification accuracy table, Figure 5.10 in Appendix B shows the
classification bar graph, NN diagram and score mode for each year's bankrupt and
non-bankrupt classification.
5.3.10 High Performance Support Vector Machine Model

The High Performance Support Vector Machine is a supervised intelligent technique used to
conduct classification and regression analysis. The HP SVM node of SAS Enterprise Miner
requires exactly one binary target variable, coded as 0 and 1. The input variables can
be of any type supported by SAS miner (SAS Institute Inc., 2013). The overall model accuracy
using the HP SVM Model is 58.41%, 54.0%, 54.0%, 54.2% and 48.29% for one to five years
respectively. Table 5.8 in Appendix-A presents the bankruptcy prediction accuracy. In
addition to the classification accuracy table, Figure 5.11 in Appendix B shows the
classification bar graph, NN diagram and score mode for each year's bankrupt and
non-bankrupt classification.
5.3.11 High Performance Regression Model

The HP Regression node also provides linear and logistic regression, but in a
high-performance environment and with interval as well as binary class values.
It predicts the target values from the input variables. In contrast to the Regression
model, this node supports interval, binary, nominal and ordinal class target values. HP
Regression can perform particular selection techniques:
1. Forward, backward and stepwise for interval targets.
2. Forward, backward, stepwise, LAR and LASSO for the selection methods.
The overall model accuracy using the HP Regression Model is 99.0%, 50.0%, 47.25%, 49.0% and
50.5% for one to five years respectively. Table 5.9 in Appendix-A presents the bankruptcy
prediction accuracy. In addition to the classification accuracy table, Appendix B shows
the classification bar graph, NN diagram and score mode for each year's bankrupt and
non-bankrupt classification.
5.3.12 Memory Based Reasoning Model

Memory-Based Reasoning (MBR) is similar to the Case-Based Reasoning (CBR) method. The MBR
node recognizes similar cases and applies the information acquired from those cases to a new
situation or record. Like CBR, this model uses the k-nearest neighbour method to predict
target values. K-nearest neighbour works with a data sample and a probe: the data sample
contains a collection of variables, and the probe has a specific value for each variable.
The distance between the variable values of each observation and the probe is calculated.
The observations with the smallest distance to the probe are the k-nearest neighbours of
that probe (SAS Institute Inc., 2013). The overall model accuracy using the MBR Model is
52.1%, 61.9%, 59.5%, 61.3% and 59.55% for one to five years respectively.
Table 5.10 in Appendix-A presents the bankruptcy prediction accuracy.
In addition to the classification accuracy table, Figure 5.13 in Appendix B shows the
classification bar graph, NN diagram and score mode for each year's bankrupt and
non-bankrupt classification accuracy.
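The probe-and-distance idea behind k-nearest neighbour can be illustrated with a short Python sketch. It assumes Euclidean distance and a simple majority vote among the k closest observations. These are assumptions made for illustration, and the sketch does not reproduce the internals of the SAS MBR node. The function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(samples, labels, probe, k=3):
    """Predict the label of `probe` by majority vote of its k nearest samples.

    `samples` is a list of equal-length numeric tuples (the ratios of each
    firm) and `labels` holds the class of each sample.
    """
    # Pair every sample's Euclidean distance to the probe with its label.
    distances = sorted(
        (math.dist(sample, probe), label)
        for sample, label in zip(samples, labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

In practice the ratios would usually be put on a comparable scale first, since a ratio with a large numeric range would otherwise dominate the distance.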
In this part I have applied the different data mining node models available to the data
samples. Each model has its strengths and weaknesses. Chapter 6 gives a complete insight into
each model's results and accuracy. The following is the final implementation diagram of all
the SAS data mining models that I have used in this study for data set 1.
Figure 5.3 Final implementation diagram of models using SAS
Part 2:
This part gives a brief introduction to WEKA and the method of applying data mining in WEKA.
It also elaborates the application of data mining algorithms to the data samples using
WEKA software.
5.4 WEKA:
WEKA is open source software consisting of a collection of algorithms for performing data
mining tasks on large amounts of data. Using WEKA it is possible to apply different data
mining techniques, such as classification, regression, clustering and association rule
mining [Mark et al. (2009)]. WEKA divides classification algorithms into different groups:
Bayes classifiers, Functions classifiers, Lazy classifiers, Meta classifiers, MI classifiers,
Rules classifiers and Trees classifiers. Figure 5.14 gives the step-by-step implementation of
data mining algorithms on data using WEKA.
Figure 5.14 Final application diagram of models using WEKA
[Flowchart: Open Data File; Pre-process; Select Bankrupt/Non-Bankrupt as Target; Select Classification Algorithms and Apply; Select Different Test Options; Calculate Results with Type-1 and Type-2 Errors; Calculate Model Accuracy]
Since WEKA provides the algorithmic models, this section presents the application of these
models and the empirical findings on their classification accuracy using the implementation
approach described above. I process the confusion matrix and calculate the
classification accuracy in each case. For every model generated I used the 10-fold
cross-validation technique to validate the accuracy of the model.
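The two calculations mentioned above can be sketched in Python as follows. The sketch counts the four cells of a two-class confusion matrix, derives the overall accuracy as the share of correct predictions, and generates index sets for k folds. It is an illustration only: WEKA randomizes and stratifies its folds, which this simple generator does not, and the function names are hypothetical. Following the definition used in Chapter 6, a bankrupt firm predicted as non-bankrupt is counted as a Type-I error, and the opposite mistake is labelled Type-II here.

```python
def confusion_counts(actual, predicted, positive="bankrupt"):
    """Count the four cells of a two-class confusion matrix."""
    counts = {"correct_bankrupt": 0, "type_1": 0, "type_2": 0, "correct_non_bankrupt": 0}
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            # Bankrupt firm: either found, or missed (Type-I error).
            counts["correct_bankrupt" if guess == positive else "type_1"] += 1
        else:
            # Non-bankrupt firm: either found, or flagged wrongly (Type-II error).
            counts["type_2" if guess == positive else "correct_non_bankrupt"] += 1
    return counts


def accuracy(counts):
    """Overall accuracy: correct predictions divided by all predictions."""
    total = sum(counts.values())
    return (counts["correct_bankrupt"] + counts["correct_non_bankrupt"]) / total


def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k folds over n records."""
    folds = [list(range(start, n, k)) for start in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]
```

In a cross-validated run, a model would be trained on each `train` set, scored on the matching `test` set, and the confusion counts accumulated over all ten folds before the accuracy is computed.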
5.4.1 Naïve Bayes

The Naïve Bayes classifier gives a very simple method, with clear semantics, for
representing, using and learning probabilistic knowledge. This technique is used for
supervised data mining tasks in which the goal is to predict the target class of test
instances, while the training data contains the class information. Naïve Bayes can be used
with binary, missing and nominal class values. It also handles binary, numeric, empty nominal,
unary, nominal and missing attributes (George and Pat, 1995).
5.4.2 Naïve Bayes Model

The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the
Naïve Bayes Model is 85.8%, 51.1%, 85.8%, 52.2% and 93.70% for one to five years
respectively. Moreover, Table 5.11 in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms.
5.4.3 BayesNet Model

BayesNet, or Bayes network, is a general network used to infer the probability of an event
from observations of other events in the same network (Sankaran and Ramesh, 2005). The
overall prediction accuracy considering both bankrupt and non-bankrupt firms using the
BayesNet Model is 85.8%, 51.1%, 88.0%, 51.1% and 50.1% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms.
5.4.4 SMO OR SVM Model

The sequential minimal optimization (SMO) algorithm was presented by Platt (1998) to solve
the SVM quadratic programming problem. The overall prediction accuracy considering both
bankrupt and non-bankrupt firms using the SMO or SVM Model is 61.7%, 55.1%, 59.4%, 53.2% and
50.7% for one to five years respectively. Moreover, Table 5.13 in Appendix-A gives the
detailed prediction accuracy of both bankrupt and non-bankrupt firms.
5.4.5 RBFNetwork Model
A radial basis function network is a neural network that employs radial basis functions as
activation functions (Schwenker et al., 2001). The overall prediction accuracy considering
both bankrupt and non-bankrupt firms using the RBFNetwork Model is 61.7%, 77.5%, 63.5%,
55.7% and 88.3% for one to five years respectively.
Moreover, Table 5.14 in Appendix-A gives the detailed prediction accuracy of both bankrupt
and non-bankrupt firms.
5.4.6 Kstar Model

K* belongs to the group of instance-based classifiers, but it uses an entropy-based distance
function rather than other distance functions (John and Leonard, 1995). The overall
prediction accuracy considering both bankrupt and non-bankrupt firms using the KStar Model
is 100%, 49.8%, 50.3%, 50.4% and 50.2% for one to five years respectively. Moreover,
Table 5.15 in Appendix-A gives the detailed prediction accuracy of both bankrupt and
non-bankrupt firms applying the KStar model.
5.4.7 LWL Model

The locally weighted learning (LWL) algorithm is also an instance-based algorithm, but it
uses Naïve Bayes for classification problems (Eibe et al., 2003). The overall prediction
accuracy considering both bankrupt and non-bankrupt firms using the LWL Model is 51.7%,
50.3%, 52.1%, 93.6% and 49.05% for one to five years respectively. Moreover, Table 5.16 in
Appendix-A gives the detailed prediction accuracy of both bankrupt and non-bankrupt firms
using the LWL model.
5.4.8 AdaBoostM1 Model

This algorithm belongs to the group of nominal class classifiers and can classify only nominal
class problems, using the boosting technique. It is reported to help with the problem of
overfitting (Yaov and Robert, 1996). The overall prediction accuracy considering both bankrupt
and non-bankrupt firms using the AdaBoostM1 Model is 58.6%, 57.0%, 56.2%, 54.4% and 49.8% for
one to five years respectively. Moreover, Table 5.17 in Appendix-A gives the detailed
prediction accuracy of both bankrupt and non-bankrupt firms using the AdaBoostM1 model.
5.4.9 ClassificationViaRegression Model

This algorithm belongs to the class of classification algorithms that use regression for
performing data mining. The target class is binarized and one regression model is built for
each class value (Frank et al., 1998). The overall prediction accuracy considering both
bankrupt and non-bankrupt firms using the ClassificationViaRegression Model is 50.6%, 47.8%,
67.75%, 48.22% and 56.9% for one to five years respectively. Moreover, Table 5.18 in
Appendix-A gives the detailed prediction accuracy of both bankrupt and non-bankrupt firms
applying the ClassificationViaRegression model.
5.4.10 Decorate Model

Decorate belongs to the meta learner group of algorithms in WEKA. It is reported to be a
highly accurate meta algorithm that builds its classifiers using specially constructed
training cases. Further information about this algorithm can be found in the conference paper
by Melville et al. (2003). The overall prediction accuracy considering both bankrupt and
non-bankrupt firms using the Decorate Model is 55.0%, 52.6%, 51.8%, 53.4% and 63.79% for one
to five years respectively. Moreover, Table 5.19 in Appendix-A gives the detailed prediction
accuracy of both bankrupt and non-bankrupt firms using the Decorate model.
5.4.11 Dagging Model

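This algorithm trains copies of the base classifier on disjoint, stratified folds of the
training data. The overall prediction accuracy considering both bankrupt and non-bankrupt
firms using the Dagging Model is 62.3%, 59.9%, 63.06%, 61.9% and 63.79% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms employing the Dagging model.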
5.4.12 LogisticBoost Model

This meta algorithm also performs classification using regression, with a regression scheme
as the base learner. It can also handle multi-class problems in classification data
mining. The overall prediction accuracy considering both bankrupt and non-bankrupt firms
using the LogisticBoost Model is 54.5%, 62.7%, 69.8%, 45.5% and 66.09% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms using the LogisticBoost model.
5.4.13 MultiBoostAB Model

This algorithm is a combination of AdaBoost and Wagging. It uses the capabilities of both
algorithms to reduce bias and variance. It is also reported to give a lower error rate than
either AdaBoost or Wagging (Geoffrey and Webb, 2000). The overall prediction accuracy
considering both bankrupt and non-bankrupt firms using the
MultiBoostAB Model is 57.8%, 52.15%, 57.7%, 87.7% and 56.1% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms using the MultiBoostAB model.
5.4.14 Random Committee Model

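This meta algorithm uses randomizable base classifiers. Each base classifier is built using
a distinct random seed. The final result is the average of the predictions produced by the
individual classifiers. The overall prediction accuracy considering both bankrupt and
non-bankrupt firms using the
Random Committee Model is 54.0%, 49.5%, 51.9%, 50.4% and 50.4% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms using the Random Committee model.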
5.4.15 HyperPipes Model

This algorithm belongs to the miscellaneous group of the WEKA algorithms. It uses
HyperPipe classifiers. A hyperpipe is created for each class and contains all points of
that class. A new observation is assigned to the class whose hyperpipe contains most of
that observation. The overall prediction accuracy considering both bankrupt and
non-bankrupt firms using the
HyperPipes Model is 49.6%, 48.50%, 48.60%, 49.3% and 46.9% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms using the HyperPipes model.
5.4.17 NNge Model

This algorithm is similar to nearest neighbour but uses non-nested exemplars. The overall
prediction accuracy considering both bankrupt and non-bankrupt firms using the NNge Model is
50.8%, 50.0%, 52.3%, 44.3% and 48.8% for one to five years respectively. Moreover, Table
5.25 in Appendix-A gives the detailed prediction accuracy of both bankrupt and non-bankrupt
firms using the NNge model.
5.4.18 OneR Model

This is a very simple classification algorithm that works by creating a 1R classifier.
Further information about this algorithm can be obtained from the research article by
Holte (1993). The overall prediction accuracy considering both bankrupt and non-bankrupt
firms using the
OneR Model is 51.3%, 51.02%, 51.3%, 51.02% and 50.05% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms using the OneR model.
5.4.19 ZeroR Model

This algorithm belongs to the rules group in WEKA. It is the simplest algorithm and
relies completely on the target variable, without taking the predictors into consideration.
The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the
ZeroR Model is 49.5%, 49.5%, 49.5%, 49.5% and 49.5% for one to five years respectively.
Moreover, a table in Appendix-A gives the detailed prediction accuracy of both bankrupt and
non-bankrupt firms using the ZeroR model.
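Because ZeroR ignores the predictors, its behaviour is easy to illustrate. The following Python sketch, with hypothetical function names, predicts the most frequent class of the training labels for every test record. It is a minimal illustration of the idea rather than WEKA's implementation.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent class among the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def zero_r_accuracy(train_labels, test_labels):
    """Accuracy obtained by predicting the majority class for every test record."""
    majority = zero_r_fit(train_labels)
    hits = sum(1 for label in test_labels if label == majority)
    return hits / len(test_labels)
```

On a sample with roughly equal numbers of bankrupt and non-bankrupt firms, such a baseline can be expected to score close to 50%, which makes it a useful reference point for the other models.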
5.4.20 Random Forest Model

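This algorithm is a combination of tree predictors in which each tree predictor relies on the
values of a random vector. Breiman (2001) gives a complete account of the random forest
algorithm. The overall prediction accuracy considering both bankrupt and non-bankrupt firms
using the
Random Forest Model is 51.2%, 49.4%, 49.2%, 47.05% and 50.3% for one to five years
respectively. Moreover, a table in Appendix-A gives the detailed prediction accuracy of
both bankrupt and non-bankrupt firms using the Random Forest model.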
5.4.21 J48 Model

This algorithm is used to produce a C4.5 pruned or unpruned decision tree. The overall
prediction accuracy considering both bankrupt and non-bankrupt firms using the J48 Model is
52.5%, 48.6%, 49.5%, 51.0% and 50.9% for one to five years respectively. Moreover, a table
in Appendix-A gives the detailed prediction accuracy of both bankrupt and non-bankrupt
firms using the J48 model.
5.4.22 SimpleCart Model

This algorithm belongs to the trees group of algorithms in WEKA and is used to create
classification trees using fractional instances. The overall prediction accuracy considering
both bankrupt and non-bankrupt firms using the SimpleCart Model is 49.86%, 49.87%, 50.15%,
58.4% and 53.51% for one to five years respectively. Moreover, Table 5.30 in Appendix-A
gives the detailed prediction accuracy of both bankrupt and non-bankrupt firms using the
SimpleCart model.
5.4.23 END Model
This algorithm belongs to the meta group of algorithms in WEKA. It is used to handle
multi-class problems with two-class classifiers. The overall prediction accuracy considering
both bankrupt and non-bankrupt firms using the END Model is 52.5%, 52.4%, 54.1%, 51.0% and
52.5% for one to five years respectively. Moreover, Table 5.31 in Appendix-A gives the
detailed prediction accuracy of both bankrupt and non-bankrupt firms using the END model.
Part 3
This part consists of a brief introduction to IBM SPSS and the application of MLP neural
networks, different variations of decision trees and the nearest neighbour algorithm to
predict bankruptcy.
5.5 IBM SPSS

This is a program designed and developed to perform predictive analytics using different
machine learning algorithms. It offers a wide range of algorithms and methods for performing
statistical and data mining tasks.
5.5.1 MLP neural network Model

This is the most extensively used neural network in data analysis and in constructing
classifiers (Asil and Shahsavand, 2014). The multilayer perceptron is built from an input
layer and hidden units. Every hidden layer accepts a collection of input variables, and the
activation function converts the results and passes them to the final layer, called the
output. The overall prediction accuracy considering both bankrupt and non-bankrupt firms
using the MLP neural network Model is 100.0%, 86.2%, 94.5%, 58.7% and 51.6% for one to five
years respectively. Moreover, a table in the appendix gives the detailed prediction accuracy
of both bankrupt and non-bankrupt firms using the MLP neural network model.
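The flow of values from the inputs through a hidden layer to the output can be illustrated with a small forward pass in Python. This is a sketch under stated assumptions: it uses a single hidden layer, a logistic (sigmoid) activation and hand-supplied weights, none of which are taken from the SPSS configuration used in this study. The function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer perceptron with a single output.

    hidden_weights holds one weight list per hidden unit (one weight per input);
    output_weights holds one weight per hidden unit.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
```

The returned value can be read as a score between 0 and 1 for one of the two classes. Training, which adjusts the weights to reduce the classification error, is not shown.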
5.6 Models implementation using variations of decision trees

As discussed in chapter 2, there are many types of decision tree algorithms; the most
important ones are CHAID, CHAID Exhaustive, CART and QUEST. I now implement these
algorithmic models to find the most efficient and reliable model for bankruptcy event
prediction.
5.6.1 CHAID Model

The CHAID tree-based model was proposed by Kass (1980) to assess the relationship between
input variables and target variables. This model divides explanatory variables into
homogeneous subgroups according to the response variable. In its stepwise process, CHAID
examines each input (explanatory) variable in turn and identifies the categories that differ
least with respect to the target (response) variable. If the difference falls below a
particular significance level (p-value), the two categories are considered similar and are
combined into one category. Split iteration ensures that the best partition is found for each
response (target) variable [Andrea et al. (2014)]. The overall prediction accuracy
considering both bankrupt and non-bankrupt firms using the CHAID Model is 56.2%, 56.1%,
56.0%, 53.0% and 61.2% for one to five years respectively. Moreover, a table in the appendix
gives the detailed prediction accuracy of both bankrupt and non-bankrupt firms using the
CHAID model.
5.6.2 CHAID Exhaustive Model

The CHAID Exhaustive algorithm was presented by Biggs et al. (1991). This algorithm is
based on three steps:
1. Merging
2. Splitting
3. Stopping
In the merging step, the non-significant categories of each explanatory (input) variable are
merged, and each final category results in one child node. The merging step also calculates
the p-value that is used in the splitting step. The splitting step then finds the best split
for each predictor found in the merging step and selects which predictor is to be used to
split the node. In the final step, the stopping step stops the tree-growing process:
➢ If the node is pure.
➢ If no further split is possible.
➢ If the node size is less than the node size specified by the user.
➢ If the split would produce a child node whose size is less than that specified by the
user.
➢ If the tree depth reaches the user-specified limit.
The following is the classification accuracy obtained by performing CHAID Exhaustive on the
data using SPSS. The overall prediction accuracy considering both bankrupt and non-bankrupt
firms using the CHAID Exhaustive Model is 58.5%, 82.2%, 55.2%, 53.0% and 66.3% for one to
five years respectively. Moreover, a table in the appendix gives the detailed prediction
accuracy of both bankrupt and non-bankrupt firms using the CHAID Exhaustive model.
5.6.3 CART Model

Classification and regression trees (CART) are an intelligent technique for building
prediction models from the data provided. In CART the models are generated by recursive
partitioning of the data, with a simple model fitted within each partition, which results in
a graphical decision tree. Classification trees are built for dependent variables that take
a finite number of unordered values, while regression trees are built for dependent
variables that are continuous or ordered (Loh, 2011).
The overall prediction accuracy using both bankrupt and non-bankrupt firms using CART
bankrupt firms using CART model.
5.6.4 QUEST Model

The quick, unbiased, efficient statistical tree (QUEST) is a statistical algorithm for
classification and data mining proposed by Loh and Shih (1997). The major features of this
algorithm are to:
1. Use unbiased variable selection.

2. Use Fisher’s LDA technique.
3. Impute missing values.
4. Predict variables with many categories.
The overall prediction accuracy using both bankrupt and non-bankrupt firms using QUEST
bankrupt firms using QUEST model.
5.6.5 K-NN Model

K-Nearest Neighbour (K-NN) is one of the oldest and simplest non-parametric classification
techniques. In K-NN, an object is assigned to the most common class among
its k nearest neighbours. In other words, the target is a class membership, and
each object is placed in a class by the majority vote of its closest neighbours. The
features of K-NN are its simplicity, ease of interpretation and high accuracy rate (Hui et al.,
2011).
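The majority vote described above can be sketched in a few lines. This is a minimal illustration and not the SPSS K-NN procedure: it assumes Euclidean distance, the function name `knn_predict` is hypothetical, and the two-ratio toy data are invented.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` to the most common class among its k nearest
    training points, using Euclidean distance."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical firms described by two financial ratios each.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
train_y = ["bankrupt", "bankrupt", "non-bankrupt", "non-bankrupt", "non-bankrupt"]
prediction = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
```

The choice of k and of the distance measure both affect the result, so in practice they would be tuned on the data.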
The overall prediction accuracy using both bankrupt and non-bankrupt firms using K-NN
Model is 61.3%, 53.4%, 45.2, % and 47.1% for one to five years respectively. Moreover,
bankrupt firms using KNN model.
5.7 Summary
This chapter gives a complete account of the implementation and generation of the models and the overall
classification accuracy of each model using SAS Enterprise Miner, WEKA and IBM SPSS. The next step is
to critically analyse these results and select the most efficient model from each data mining software
package used in this chapter.
Chapter 6 Results Analysis and Critical Evaluation
6.1 Introduction
This chapter gives a brief description of the Type-I and Type-II errors of bankruptcy prediction
models. It also presents the analysis and critical evaluation of the results obtained from
applying the models using SAS Enterprise Miner, WEKA and IBM SPSS.
6.2 Type-I Error

Type-I error, also called alpha error, occurs when bankrupt firms are predicted as non-bankrupt.
In terms of credit analysis, a Type-I error represents the loss of loan capital and interest
on a client that goes bankrupt after being predicted as non-bankrupt. Hence, Type-I error has a
greater cost for banks than Type-II error (Neves and Vieira, 2006). According to Altman et al. (1977),
Type-I error costs are 35 times higher for banks than Type-II error costs.
According to Neves and Vieira (2006), the overall Type-I error is calculated as:
\[
\text{Type-I Error} = \frac{\text{Number of bankrupt firms predicted as non-bankrupt}}{\text{Number of observations classified as non-bankrupt}}
\]
6.3 Type-II Error

Type-II error, also called beta error, occurs when non-bankrupt firms are predicted as
bankrupt. In terms of credit analysis, a Type-II error means lost business with a potential customer
that is healthy but was predicted as bankrupt. Type-II error costs could be higher than Type-I error
costs if a government decides to impose a formal early warning system. However, Type-I and Type-II
costs are not reported in most of the literature and remain largely unknown (Neves and
Vieira, 2006).
According to Neves and Vieira (2006), the overall Type-II error is calculated as:
\[
\text{Type-II Error} = \frac{\text{Number of non-bankrupt firms predicted as bankrupt}}{\text{Number of observations classified as bankrupt}}
\]
6.4 Total Error

The total error of a prediction model is given as the sum of Type-I and Type-II errors divided by the total
number of observations in the data.
According to Neves and Vieira (2006), the total error is calculated as:
\[
\text{Total Error} = \frac{\text{Type-I Error} + \text{Type-II Error}}{\text{Total number of observations in the data sample}}
\]
6.5 Classification Accuracy
The classification accuracy of a bankruptcy prediction model is generally measured by the percentage
of correctly classified observations. The classification accuracy is calculated as (Neves and Vieira,
2006):
\[
\text{Classification Accuracy} = \frac{\text{Total number of correct classifications}}{\text{Total number of observations in the data sample}}
\]
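The four measures defined in this chapter can be computed from the counts of a two-class confusion matrix. The sketch below is illustrative: the function and argument names are hypothetical, the counts are invented, and for the total error it assumes that the numerator counts the misclassified observations of both types.

```python
def error_measures(b_as_b, b_as_nb, nb_as_b, nb_as_nb):
    """Compute the measures from confusion-matrix counts.

    b_as_b   : bankrupt firms predicted as bankrupt
    b_as_nb  : bankrupt firms predicted as non-bankrupt
    nb_as_b  : non-bankrupt firms predicted as bankrupt
    nb_as_nb : non-bankrupt firms predicted as non-bankrupt
    """
    total = b_as_b + b_as_nb + nb_as_b + nb_as_nb
    classified_nb = b_as_nb + nb_as_nb  # all observations classified as non-bankrupt
    classified_b = b_as_b + nb_as_b     # all observations classified as bankrupt
    return {
        "type_1": b_as_nb / classified_nb if classified_nb else 0.0,
        "type_2": nb_as_b / classified_b if classified_b else 0.0,
        # Assumption: the numerator is the number of misclassified observations.
        "total_error": (b_as_nb + nb_as_b) / total,
        "accuracy": (b_as_b + nb_as_nb) / total,
    }


# Hypothetical counts for a sample of 464 bankrupt and 464 non-bankrupt firms.
measures = error_measures(b_as_b=400, b_as_nb=64, nb_as_b=40, nb_as_nb=424)
```

Under this reading the total error and the classification accuracy sum to one, which gives a quick consistency check on any reported pair of values.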
6.6 Empirical Results Analysis

After computing the Type-I and Type-II errors, I calculated the accuracy of each model for every
year. This section is divided into three parts according to the software used:
1. Analysis of results of SAS Enterprise Miner models.
2. Analysis of results of WEKA models.
3. Analysis of results of IBM SPSS models.
6.6.1 Analysis of Results of SAS Enterprise Miner Models

The results obtained from the different SAS models show that four
SAS Enterprise Miner models predict bankruptcy efficiently. Table 6.1 (Part-1) and
Figure 6.1, which give the classification accuracy for bankrupt firms for the five years prior to the event,
clearly show that four models, Neural Network (NN), HP Neural, Regression and HP Regression, give a bankruptcy
prediction accuracy of at least 90% for each year before the event.
Table 6.1 (Part-2) and Figure 6.1, which give the classification accuracy for non-bankrupt
firms for the same five years, show that the Neural Network and Auto Neural models give a
non-bankrupt classification accuracy of more than 90% for each year before the event. According to
Table 6.1 (Part-1), the prediction accuracy of the Neural Network is 95.90%, 97.80%, 95.50%, 95.00% and
95.00% for one to five years before the bankruptcy year respectively, which shows that NN is
more efficient than the other three models, as the others fluctuate in some years. Similarly,
Table 6.1 (Part-2) shows that the Auto Neural model, which is also a type of NN, gives 93%, 99.5%,
99%, 97.6% and 99% for years one to five respectively.
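The selection described above can be reproduced directly from Table 6.1 by keeping the models whose accuracy is at least 90% in every year. The sketch below copies the accuracies from the table; the function name `consistent_models` and the threshold argument are illustrative assumptions.

```python
# Accuracy (%) for one to five years before the event, copied from Table 6.1.
bankrupt = {  # Part-1
    "Decision Trees": [73.27, 60.00, 64.50, 32.00, 39.80],
    "HP Trees":       [78.44, 84.00, 71.70, 51.00, 32.00],
    "Neural Network": [95.90, 97.80, 95.50, 95.00, 95.00],
    "Auto Neural":    [94.00, 99.50, 0.0, 97.80, 0.0],  # reported as 0 in the table
    "HP Neural":      [90.00, 95.00, 95.00, 90.00, 97.80],
    "DMNeural":       [59.30, 93.70, 92.27, 74.20, 39.80],
    "Regression":     [98.70, 95.00, 92.27, 94.42, 95.90],
    "HP SVM":         [67.70, 40.70, 40.70, 38.70, 33.40],
    "HP Regression":  [98.00, 92.27, 95.60, 95.60, 93.00],
    "MBR":            [39.80, 48.20, 49.10, 47.20, 47.20],
}
non_bankrupt = {  # Part-2
    "Decision Trees": [52.00, 61.20, 94.80, 73.50, 95.60],
    "HP Trees":       [50.40, 52.90, 70.00, 73.00, 90.20],
    "Neural Network": [95.40, 97.60, 92.00, 92.40, 93.10],
    "Auto Neural":    [93.00, 99.50, 99.00, 97.60, 99.00],
    "HP Neural":      [12.00, 0.00, 73.00, 88.90, 12.00],
    "DMNeural":       [63.57, 16.60, 13.00, 48.00, 90.51],
    "Regression":     [99.50, 0.00, 0.00, 6.20, 3.40],
    "HP SVM":         [49.13, 67.70, 67.70, 70.00, 63.57],
    "HP Regression":  [100.00, 0.00, 0.00, 3.00, 3.00],
    "MBR":            [64.47, 75.60, 70.00, 75.40, 71.90],
}


def consistent_models(table, threshold=90.0):
    """Return the models whose accuracy is at least `threshold` in every year."""
    return [name for name, acc in table.items() if min(acc) >= threshold]


best_bankrupt = consistent_models(bankrupt)
best_non_bankrupt = consistent_models(non_bankrupt)
```

Only the Neural Network model appears in both lists, which is consistent with the conclusion drawn in the text.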
According to the research conducted in the field of bankruptcy prediction, researchers have
used various statistical and intelligent methods to predict bankruptcy, but neural networks and their
variants are the most commonly used intelligent methods (Kumar and Ravi, 2007). Cadden (1991)
used a neural network model to predict bankruptcy with a three-year-ahead forecast; his classification
accuracy was 90%, 90% and 80% respectively for bankrupt firms and 100%, 90% and 90% for non-bankrupt
firms. Moreover, Leshno and Spector (1996) also used a neural network method to predict
bankruptcy and obtained a prediction accuracy of 76.4% to 76.4% for the two-years-ahead case.
Table 6.1 Bankrupt and non-bankrupt five years ahead prediction accuracy table using SAS Enterprise Miner models

(Part-1) Bankruptcy Prediction Accuracy (%) Prior Event

| Model Name | One Year | Two Years | Three Years | Four Years | Five Years |
|---|---|---|---|---|---|
| Decision Trees | 73.27 | 60.00 | 64.50 | 32.00 | 39.80 |
| HP Trees | 78.44 | 84.00 | 71.70 | 51.00 | 32.00 |
| Neural Network | 95.90 | 97.80 | 95.50 | 95.00 | 95.00 |
| Auto Neural | 94.00 | 99.50 | 0 | 97.80 | 0 |
| HP Neural | 90.00 | 95.00 | 95.00 | 90.00 | 97.80 |
| DMNeural | 59.30 | 93.70 | 92.27 | 74.20 | 39.80 |
| Regression | 98.70 | 95.00 | 92.27 | 94.42 | 95.90 |
| HP SVM | 67.70 | 40.70 | 40.70 | 38.70 | 33.40 |
| HP Regression | 98.00 | 92.27 | 95.60 | 95.60 | 93.00 |
| MBR | 39.80 | 48.20 | 49.10 | 47.20 | 47.20 |

(Part-2) Non-Bankruptcy Prediction Accuracy (%) Prior Event

| Model Name | One Year | Two Years | Three Years | Four Years | Five Years |
|---|---|---|---|---|---|
| Decision Trees | 52.00 | 61.20 | 94.80 | 73.50 | 95.60 |
| HP Trees | 50.40 | 52.90 | 70.00 | 73.00 | 90.20 |
| Neural Network | 95.40 | 97.60 | 92.00 | 92.40 | 93.10 |
| Auto Neural | 93.00 | 99.50 | 99.00 | 97.60 | 99.00 |
| HP Neural | 12.00 | 0.00 | 73.00 | 88.90 | 12.00 |
| DMNeural | 63.57 | 16.60 | 13.00 | 48.00 | 90.51 |
| Regression | 99.50 | 0.00 | 0.00 | 6.20 | 3.40 |
| HP SVM | 49.13 | 67.70 | 67.70 | 70.00 | 63.57 |
| HP Regression | 100.00 | 0.00 | 0.00 | 3.00 | 3.00 |
| MBR | 64.47 | 75.60 | 70.00 | 75.40 | 71.90 |
[Figure 6.1: two bar charts, "Bankrupt Prediction Accuracy" and "Non-Bankrupt Prediction Accuracy", each with a vertical axis from 0 to 120 and series for One Year to Five Years.]
Figure 6.1 Bankrupt and non-bankrupt firms prediction accuracy chart
6.6.2 Analysis of Results of WEKA
The results obtained after the implementation of the WEKA data mining algorithms show
that WEKA is also very good software for performing classification using different algorithms.
Table 6.2 (Part-1) and Figure 6.2 clearly show that SimpleCart, RBFNetwork and
MultiBoostAB are the most efficient algorithms for predicting bankruptcy. The
prediction accuracy of the SimpleCart algorithm is 89.60%, 70.00%, 89.80%, 86.40% and
85.70% for one to five years respectively in a five-years-ahead forecast of
bankruptcy. The MultiBoostAB algorithm also shows good prediction accuracy of 82.10%,
76.29% and 80.40% for the first, second and fourth years, but its classification accuracy is below 70%
for the third and fifth years. Figure 6.2 also shows that RBFNetwork is a very good
predictor of bankruptcy, with a prediction rate of over 70% in the first two years, over 90% in the third and
fifth years, and 62.90% in the fourth year.
The non-bankrupt firms forecast is handled efficiently by the OneR, HyperPipes and
Dagging algorithms. Table 6.2 (Part-2) and Figure 6.3 show that the classification
accuracy of OneR is over 95.0% in every year of the five-years-ahead forecast. It can also be
observed that the non-bankrupt classification accuracy of the HyperPipes algorithm is more than 80%
for the first four years and 77.6% for the fifth year, while that of Dagging ranges from 71.20% to 84.50%.
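One way to check the ranking of the WEKA algorithms for bankrupt firms is to order them by their mean accuracy over the five years. This criterion is an assumption made here for illustration, since the text does not state how the most efficient algorithms were chosen; the values are copied from Table 6.2 (Part-1) for six of the 21 models, and the function name is hypothetical.

```python
# Bankrupt-firm accuracy (%) for one to five years before the event,
# copied from Table 6.2 (Part-1) for six of the 21 WEKA models.
bankrupt_accuracy = {
    "Naive Bayes":  [92.00, 6.20, 79.70, 93.75, 92.80],
    "RBFNetwork":   [76.70, 75.40, 92.30, 62.90, 95.00],
    "LWL":          [81.50, 61.20, 74.80, 91.60, 10.60],
    "MultiBoostAB": [82.10, 76.29, 69.40, 80.40, 61.80],
    "OneR":         [6.00, 4.70, 6.00, 4.70, 5.30],
    "SimpleCart":   [89.60, 70.00, 89.80, 86.40, 85.70],
}


def rank_by_mean(table):
    """Sort model names by mean accuracy over the five years, best first."""
    return sorted(table, key=lambda m: sum(table[m]) / len(table[m]), reverse=True)


ranking = rank_by_mean(bankrupt_accuracy)
top_three = ranking[:3]
```

Among these six models, the three with the highest mean are the same three named in the text, while OneR, which performs well on non-bankrupt firms, ranks last for bankrupt firms.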
Figure 6.2 Bankrupt firms five years ahead prediction accuracy using WEKA models chart
[Figure 6.3: bar chart "Non-Bankrupt Prediction Accuracy", vertical axis from 0.00% to 150.00%, one group of bars per WEKA model, with series for One Year to Five Years.]
Figure 6.3 Non-bankrupt firms five years prediction accuracy using WEKA models chart
Table 6.2 Bankrupt and non-bankrupt firms five years ahead prediction accuracy table using WEKA models

(Part-1) Bankruptcy Prediction Accuracy Prior

| Model Name | One Year | Two Years | Three Years | Four Years | Five Years |
|---|---|---|---|---|---|
| Naïve Bayes | 92.00% | 6.20% | 79.70% | 93.75% | 92.80% |
| BayesNet | 100.00% | 6.20% | 78.00% | 57.10% | 57.30% |
| SMO | 73.70% | 58.20% | 62.50% | 51.50% | 55.60% |
| RBFNetwork | 76.70% | 75.40% | 92.30% | 62.90% | 95.00% |
| KSTAR | 100.00% | 50.20% | 49.80% | 49.80% | 52.80% |
| LWL | 81.50% | 61.20% | 74.80% | 91.60% | 10.60% |
| AdaBoostM1 | 53.20% | 51.00% | 53.20% | 83.80% | 45.90% |
| ClassificationviaRegression | 32.30% | 29.31% | 62.50% | 18.70% | 24.70% |
| Decorate | 52.80% | 92.70% | 23.70% | 55.20% | 88.20% |
| Dagging | 50.21% | 37.90% | 44.60% | 52.80% | 43.10% |
| LogitBoost | 68.10% | 73.92% | 68.90% | 70.00% | 60.30% |
| MultiBoostAB | 82.10% | 76.29% | 69.40% | 80.40% | 61.80% |
| Random Committee | 56.50% | 51.30% | 49.50% | 53.20% | 53.20% |
| HyperPipes | 19.20% | 16.80% | 16.80% | 19.50% | 16.20% |
| NNge | 53.01% | 55.80% | 57.90% | 46.90% | 53.50% |
| OneR | 6.00% | 4.70% | 6.00% | 4.70% | 5.30% |
| ZeroR | 39.60% | 39.60% | 39.60% | 39.60% | 39.60% |
| Random Forest | 55.20% | 32.80% | 42.00% | 47.10% | 36.60% |
| J48 | 64.10% | 40.80% | 28.40% | 54.00% | 49.70% |
| SimpleCart | 89.60% | 70.00% | 89.80% | 86.40% | 85.70% |
| END | 64.00% | 60.00% | 66.20% | 49.80% | 64.00% |

(Part-2) Non-Bankruptcy Prediction Accuracy Prior

| Model Name | One Year | Two Years | Three Years | Four Years | Five Years |
|---|---|---|---|---|---|
| Naïve Bayes | 79.70% | 96.30% | 92.00% | 10.50% | 94.60% |
| BayesNet | 79.70% | 96.30% | 98.00% | 45.20% | 43.00% |
| SMO | 49.70% | 51.90% | 59.40% | 53.20% | 45.90% |
| RBFNetwork | 46.70% | 79.50% | 34.60% | 62.50% | 81.70% |
| KSTAR | 100.00% | 47.40% | 54.50% | 51.00% | 47.60% |
| LWL | 21.90% | 46.70% | 29.50% | 95.70% | 87.50% |
| AdaBoostM1 | 64.00% | 64.00% | 37.20% | 25.00% | 49.80% |
| ClassificationviaRegression | 68.90% | 66.40% | 73.06% | 78.50% | 89.22% |
| Decorate | 57.30% | 12.50% | 79.90% | 51.50% | 16.60% |
| Dagging | 74.40% | 81.90% | 81.50% | 71.20% | 84.50% |
| LogitBoost | 44.80% | 51.50% | 70.90% | 21.20% | 72.20% |
| MultiBoostAB | 33.60% | 34.00% | 46.10% | 95.20% | 50.31% |
| Random Committee | 51.50% | 47.60% | 54.30% | 47.60% | 50.40% |
| HyperPipes | 80.20% | 80.20% | 80.30% | 80.20% | 77.60% |
| NNge | 48.70% | 44.30% | 46.70% | 41.59% | 44.10% |
| OneR | 96.70% | 97.50% | 96.70% | 96.70% | 95.01% |
| ZeroR | 59.50% | 59.50% | 59.50% | 59.50% | 59.50% |
| Random Forest | 47.20% | 66.16% | 56.40% | 47.00% | 64.20% |
| J48 | 40.90% | 56.40% | 70.60% | 48.00% | 52.10% |
| SimpleCart | 10.10% | 29.74% | 10.50% | 30.40% | 21.30% |
| END | 40.90% | 44.80% | 40.90% | 52.20% | 40.90% |
6.6.3 Analysis of results of IBM SPSS models
The results obtained after the implementation of the SPSS models demonstrate that SPSS
can also be used to predict the bankruptcy of a firm effectively. Table 6.3 (Part-1)
and Figure 6.4 illustrate that the Multi-Layer Perceptron Neural Network (MLP
Neural Network) is the most effective model for predicting bankruptcy. The prediction accuracy
of this model is 100.00%, 90.40%, 98.10%, 74.40% and 32.10% for the first to the
fifth year forecast respectively. It can also be observed that the Classification and Regression Tree
(CART) model takes second position in the prediction of bankruptcy. The classification
accuracy of the CART model is 84.90%, 72.20%, 86.20%, 83.80% and 95.00% for one to five years
before bankruptcy respectively.
Non-bankrupt firms are also predicted well by the MLP Neural Network model. Table 6.3 (Part-2)
and Figure 6.4 give its classification accuracy for non-bankrupt firms as 100.00%,
82.00%, 91.00%, 42.50% and 72.20% for one to five years respectively. Figure 6.4 also shows
that the Quick, Unbiased, Efficient Statistical Tree (QUEST) provides a good classification
accuracy of 88.10%, 56.20%, 100.00% and 100.00% for the first four years, but 0% for the fifth
year, for non-bankrupt firms.
Table 6.3 Bankrupt and non-bankrupt firms five years prediction accuracy table using SPSS

(Part-1) Bankruptcy Prediction Accuracy Prior

| Model Name | One Year | Two Years | Three Years | Four Years | Five Years |
|---|---|---|---|---|---|
| MLP neural network | 100.00% | 90.40% | 98.10% | 74.40% | 32.10% |
| CHAID | 78.90% | 41.80% | 85.30% | 12.90% | 59.30% |
| CHAID Exhaustive | 65.10% | 65.00% | 75.20% | 12.90% | 91.40% |
| CART | 84.90% | 72.20% | 86.20% | 83.80% | 95.00% |
| QUEST | 88.10% | 65.00% | 56.20% | 0.00% | 100.00% |
| K-NN | 61.40% | 57.60% | 51.80% | 50.30% | 51.10% |

(Part-2) Non-Bankruptcy Prediction Accuracy Prior

| Model Name | One Year | Two Years | Three Years | Four Years | Five Years |
|---|---|---|---|---|---|
| MLP neural network | 100.00% | 82.00% | 91.00% | 42.50% | 72.20% |
| CHAID | 33.60% | 70.50% | 26.70% | 93.10% | 63.10% |
| CHAID Exhaustive | 51.90% | 100.00% | 35.10% | 93.10% | 41.20% |
| CART | 30.80% | 42.00% | 26.30% | 25.00% | 10.30% |
| QUEST | 88.10% | 56.20% | 100.00% | 100.00% | 0.00% |
| K-NN | 61.20% | 49.10% | 42.70% | 47.20% | 44.90% |
[Figure 6.4: two bar charts, "Bankrupt Prediction Accuracy" and "Non-Bankrupt Prediction Accuracy", each with a vertical axis from 0.00% to 120.00% and series for One Year to Five Years.]
Figure 6.4 Bankrupt and non-bankrupt firms prediction accuracy
6.7 Critical Evaluation
There is a pitfall associated with each data mining software package I have used in this empirical
study. Despite the various advantages of SAS Enterprise Miner, it has one disadvantage:
it works on nodes and does not specify the name of the
algorithm used in the development of a model. WEKA resolves this
problem, but it has another problem: it does not provide a
graphical user interface. Both of these problems are eliminated by IBM SPSS, but I did not
have access to the complete IBM SPSS Modeler data mining package.
The data samples used in this study were also a big hindrance in applying the different data
mining techniques. All data samples had a great deal of missing values. Although I applied
the IBM SPSS missing value imputation technique to overcome this drawback, I am not sure that all the
values were imputed efficiently by IBM SPSS.
The final drawback of this approach is that it cannot predict human errors and fraud.
All financial statements are prepared by the accountants and concerned staff of the
company. If they make mistakes or do not give correct information about the company's ratios,
these models are unable to predict the bankruptcy of the company. So, if the financial
ratios are faulty, the results will be correspondingly faulty.
6.8 Summary
This chapter presents the results of all major software used in this study. I have concluded that
all models of the software used in this study have their particular importance in the field of
bankruptcy prediction. The most important models for predicting bankruptcy using SAS
Enterprise Miner are Neural Network, Auto Neural, Regression and HP Regression. The
most efficient models for forecasting bankruptcy using WEKA are SimpleCart,
RBFNetwork, OneR and MultiBoostAB. For IBM SPSS, the most reliable models for bankruptcy
classification are MLP Neural Network, CART and QUEST. Finally, the
main pitfall in the study is the missing values in the data.
Chapter 7 Conclusion and Future Directions
7.1 Conclusions
In this study I have used a variety of data mining classification methods to deal with
bankruptcy prediction. I have applied numerous data mining models, using the most
commonly used software, to predict bankruptcy effectively and accurately.
In this dissertation there were three major objectives, pursued using five years of prior
financial ratios of 464 bankrupt and 464 non-bankrupt firms. The first was to develop different
data mining models to predict bankruptcy using three data mining software packages: SAS Enterprise
Miner, WEKA and IBM SPSS. The second was to apply these models and analyse the
accuracy of each model separately. The third was to identify the most accurate model provided by
each data mining software package. The first motivation of this study was to
understand the financial distress that leads to bankruptcy, the effects and costs of
bankruptcy, and the factors involved in bankruptcy. The second motivation was to find the most
commonly used data mining models from 1932 to the present and apply those models to test
their accuracy. Extensive research has been carried out in the field of bankruptcy prediction
because of the importance of the topic. Nevertheless, each research study has used only a few
machine learning or statistical methods to predict bankruptcy.
Developing an effective data mining classification model is a very significant but somewhat
difficult task for financial organisations. These prediction models test whether or not a new
individual or company will go bankrupt. If the classification accuracy of these prediction
models is poor, this can lead to wrong decisions and cause huge financial losses (Tsai et
al., 2014).
To achieve the goals of my study mentioned above, I developed six chapters, each of which is a
building block towards those goals: Chapter 1 is the introduction, Chapter 2 is the
literature review, Chapter 3 defines bankruptcy and its costs, Chapter 4 gives a complete
account of the data and test samples used, Chapter 5 gives a detailed description of the
development and application of each model using SAS EM, WEKA and SPSS, and Chapter 6
provides a critical evaluation of these models.
After carrying out extensive research in the field of bankruptcy prediction, I have
understood the importance of an effective model for bankruptcy prediction. Furthermore,
bankruptcy is an important phenomenon for any company, big or small. Finally, I concluded that
most researchers used only one or two methods to predict bankruptcy. So, I chose to
apply a variety of data mining models using SAS Enterprise Miner, WEKA and
IBM SPSS.
Then, to give the reader a better understanding of bankruptcy, corporate financial distress,
the actual cause of bankruptcy, was defined and elaborated. Moreover, the different stages of
financial distress, the factors behind it, and the causes and results of corporate distress were
discussed. Later on, bankruptcy was defined and four types of costs associated with
bankruptcy were illustrated.
In the next step, data was gathered from the FAME (Financial Analysis Made Easy) database.
This data was cleansed and pre-processed by applying statistical techniques and tools.
Missing values were minimized using the SPSS missing value imputation technique. Outliers
were handled using the winsorization method. In addition, the data was divided into five different
data sets, one for each year prior to the bankruptcy year. Research in the field of bankruptcy prediction
shows that the selection of financial identifiers (ratios) is also a very important factor in
creating an effective model. If significant identifiers are not selected, the results of the
developed model will not be accurate. Keeping this in mind, I chose 41
financial ratios most commonly used in various research studies from different ratio groups:
liquidity, leverage, solvability, profitability, efficiency and cash flow.
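The winsorization step mentioned above replaces extreme values with the nearest value inside chosen percentile limits instead of deleting them. The sketch below is a minimal illustration: the 5th and 95th percentile limits, the simple index-based percentile rule and the function name are assumptions, since the limits used in the study are not stated here.

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values below the lower percentile or above the upper
    percentile to the percentile values themselves.

    Percentiles are taken by position in the sorted data, which is a
    simple convention chosen for this sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(n * lower_pct / 100)]
    high = ordered[min(n - 1, int(n * upper_pct / 100))]
    return [min(max(v, low), high) for v in values]


# Hypothetical ratio values 1.0, 2.0, ..., 100.0.
ratios = [float(i) for i in range(1, 101)]
clean = winsorize(ratios)
```

The number of observations is unchanged, which matters here because each firm must remain in the sample for every year.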
Chapter 5 consists of three parts. The first part elaborates the step-by-step procedure of model
development using SAS EM. I developed 10 models using the Decision Trees, HP Trees,
Neural Network, Auto Neural, HP Neural, DMNeural, Regression, HP SVM, HP Regression
and Memory Based Reasoning (MBR) nodes of SAS Enterprise Miner and applied these
models to the five distinct yearly data samples. The best bankruptcy prediction models using
SAS EM are Neural Network, Auto Neural, Regression and HP Regression. Next, I
illustrated the step-by-step process of model generation using WEKA and developed 21
distinct models using the Naïve Bayes, BayesNet, SMO, RBFNetwork, KSTAR, LWL,
AdaBoostM1, ClassificationviaRegression, Decorate, Dagging, LogitBoost, MultiBoostAB,
Random Committee, HyperPipes, NNge, OneR, ZeroR, Random Forest, J48, SimpleCart and
END algorithms. The best bankruptcy prediction models using
WEKA are SimpleCart, RBFNetwork, OneR and MultiBoostAB. Finally, I gave a step-by-step
plan of model development using SPSS and developed six individual models: MLP neural network,
CHAID, CHAID Exhaustive, CART, QUEST and K-NN. The best
classification accuracy for bankruptcy prediction is given by the MLP Neural Network.
Finally, Chapter 6 critically evaluates the results provided by each software package and model
separately. It is concluded that the classification accuracy of the Neural Network model is higher
than that of all the other models. In the case of SAS EM, the NN model provided results of 95.90%,
97.80%, 95.50%, 95.00% and 95.00% in bankruptcy prediction using five years of prior ratios of
the firms, and Auto Neural provided a classification accuracy of
93%, 99.5%, 99%, 97.6% and 99% for non-bankrupt firms (Table 6.1, Part-2). Moreover, the WEKA
SimpleCart data mining (DM) algorithm provided
89.60%, 70.00%, 89.80%, 86.40% and 85.70% classification accuracy for one to five years
respectively, while the RBFNetwork algorithm, which works with hidden layers,
provided 76.70%, 75.40%, 92.30%, 62.90% and 95.00% bankruptcy prediction accuracy on
five years of financial ratios of different firms. Finally, the MLP neural network model of IBM
SPSS also provided remarkable classification accuracy of 100.00%, 90.40%, 98.10%,
74.40% and 32.10% for one to five years respectively.
In the history of bankruptcy prediction studies, neural network models have
held a significant place. Research on the application of NN models to financial
distress prediction problems began in the 1990s and is still active today.
For two decades, researchers have verified the superiority of NN models over
numerous statistical models such as MDA, logistic regression and k-NN (Jeong et al., 2012).
This dissertation also confirms the superiority of NN models over other data mining
models.
7.2 Future Directions

In this dissertation, I have employed about 37 distinct data mining classification models using
SAS EM, WEKA and IBM SPSS, whereas many researchers have used only one or two
prediction models. I have come to the conclusion that NN models and their variants are the most
effective models for predicting bankruptcy. In future, it would be interesting to predict
bankruptcy using the financial statements themselves instead of financial ratios.
Bankruptcy prediction for five years ahead has been carried out in this study using numerous
data mining models, but financial statements such as balance sheets, income statements and
statements of cash flows could also be used to predict bankruptcy. Moreover,
the models could also be used to predict the bankruptcy of individuals.
I have applied many data mining models in this study, but many other
methods are also available to predict bankruptcy, and future research could apply them,
including data mining applied directly to financial statements rather than to financial
ratios.
Bibliography
Guoqiang Zhang, Michael Y. Hu, B. Eddy Patuwo, Daniel C. Indro, 1999. Artificial neural networks in
bankruptcy prediction: General framework and cross-validation analysis. European Journal of
Operational Research, 116(1), pp. 16-32.
H. Kurniawan, Peter Nwe, Kok Thai, P. Ravi Kumar, V. Ravi, 2008. Soft computing system for bank
performance prediction. Applied Soft Computing, 8(1), pp. 305-315.
SAS Institute Inc., 2003. Data Mining Using SAS® Enterprise Miner™: A Case Study Approach.
2nd ed. Cary, NC: SAS Institute Inc.
A. Aamodt, E. Plaza, 1994. Case-based reasoning: foundational issues, methodological variations,
and system approaches. AI Communications, 7(1), pp. 39-59.
A. Garmroodi Asil, A. Shahsavand, 2014. Reliable estimation of optimal sulfinol concentration in gas
treatment unit via novel stabilized MLP and regularization network. Journal of Natural Gas Science
and Engineering, Volume 21, pp. 791-804.
A. Vellido, P. Lisboa and J. Vaughan, 1999. Neural networks in business: a survey of
applications (1992-1998). Expert Systems with Applications, Volume 17, pp. 51-71.
A.I. Dimitras, S.H. Zanakis, C. Zopounidis, 1996. A survey of business failures with an emphasis on
prediction methods and industrial applications. European Journal of Operational Research, 90, pp.
487-513.
Altman, E., R. Haldeman, P. Narayanan, 1977. Zeta analysis: A new model to identify bankruptcy risk
of corporations. Journal of Banking and Finance, 1(1), pp. 29-51.
Altman, E., B. Loris, 1976. A financial early warning system for over-the-counter broker-dealers.
Journal of Finance, 4(12), pp. 1201-1217.
Altman, E.I., Hotchkiss, E., 2005. Corporate Financial Distress and Bankruptcy: Predict and Avoid
Bankruptcy, Analyze and Invest in Distressed Debt. 3rd ed. New Jersey: John Wiley & Sons.
Altman, E.I., 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
Journal of Finance, 4(1968), pp. 589-609.
Altman, E.I., 1984. A further empirical investigation of the bankruptcy cost question. The Journal of
Finance, XXXIX(4), pp. 1067-1089.
Andrea Bichler, Arnold Neumaier, Thilo Hofmann, 2014. A tree-based statistical classification
algorithm (CHAID) for identifying variables responsible for the occurrence of faecal indicator bacteria
during waterworks operations. Journal of Hydrology, 519, Part A (27), pp. 909-917.
Anon., 2014. The Street. [Online]
Available at: http://www.thestreet.com/gallery/tsc-bankruptcy2-decade/0/photo-closed.html
[Accessed 09 09 2014].
Arindam Chaudhuri, Kajal De, 2011. Fuzzy Support Vector Machine for bankruptcy prediction.
Applied Soft Computing, 11(2), pp. 2472-2486.
Arjana Brezigar-Masten, Igor Masten, 2012. CART-based selection of bankruptcy predictors for the
logit model. Expert Systems with Applications, Volume 39, pp. 10153-10159.
B. Matarazzo, R. Slowinski and S. Greco, 2002. Rough approximation by dominance relations.
International Journal of Intelligent Systems, 17(2), pp. 153-171.
B. Wong, T. Bodnovich and Y. Selvi, 1997. Neural network applications in business: A review and
analysis of the literature (1988-95). Decision Support Systems, Volume 19, pp. 301-320.
Bankruptcy prediction with rough sets. (2001) ERIM Report Series Research in Management (ERS-
2001-11-LIS).
Beaver, W., 1966. Financial ratios as predictors of failure. Journal of Accounting Research, 3(1966),
pp. 71-111.
Biggs, D., Ville, B., and Suen, E., 1991. A Method of Choosing Multiway Partitions for Classification
and Decision Trees. Journal of Applied Statistics, 18(1), pp. 49-62.
Blum, M., 1974. Failing company discriminant analysis. Journal of Accounting Research, 1(12), pp. 1-
25.
Bose, I., 2006. Deciding the financial health of dot-coms using rough sets. Information &
Management, 43(7), pp. 835-846.
Branch, B., 2002. A cost of bankruptcy: A review. International Review of Financial Analysis, Volume
11, pp. 39-57.
Breiman, L., 2001. Random Forests. Machine Learning, Volume 45, pp. 5-32.
Bris, A., Welch, I., Zhu, N., 2006. The costs of bankruptcy: Chapter 7 liquidation versus Chapter 11
reorganization. Journal of Finance, Volume 61, pp. 1253-1303.
Bryant, S. M., 1997. A case-based reasoning approach to bankruptcy prediction modeling. Intelligent
Systems in Accounting, Finance and Management, Volume 6, pp. 195-214.
Büker, S., Asikoglu, R., Sevil, G., 1997. Finansal Yönetim. 2nd ed. Eskişehir: Anadolu Üniversitesi.
C. Kao, S.-T. Liu, 2004. Predicting bank performance with financial forecasts: A case of Taiwan
commercial banks. Journal of Banking & Finance, Volume 28, pp. 2353-2368.
Castagna, A. a. Z. M., 1981. The prediction of corporate failure: Testing the. Australian Journal of
Management, 1(6), pp. 23-50.
Chen, M.-Y., 2012. Visualization and dynamic evaluation model of corporate financial structure with
self-organizing map and support vector regression. Applied Soft Computing, 12(8), pp. 2274-2288.
Chen, Y.-S., 2012. Classifying credit ratings for Asian banks using integrating feature selection and
the CPDA-based rough sets approach. Knowledge-Based Systems, Volume 26, pp. 259-270.
Chih-Fong Tsai, Jhen-Wei Wu, 2008. Using neural network ensembles for bankruptcy prediction and
credit scoring. Expert Systems with Applications, Volume 34, pp. 2639-2649.
Chih-Fong Tsai, Yu-Feng Hsu, David C. Yen, 2014. A comparative study of classifier ensembles for
bankruptcy prediction. Applied Soft Computing, Volume 24, pp. 977-984.
Chih-Hung Wu, Gwo-Hshiung Tzeng, Yeong-Jia Goo, Wen-Chang Fang, 2007. A real-valued genetic
algorithm to optimize the parameters of support vector machine for predicting bankruptcy. Expert
Systems with Applications, 32(2), pp. 397-408.
Ching-Chiang Yeh, Der-Jang Chi and Ming-Fu Hsu, 2010. A hybrid approach of DEA, rough set and
support vector machines for business failure prediction. Expert Systems with Applications, 37(2),
pp. 1535-1541.
Chuang, C.-L., 2013. Application of hybrid case-based reasoning for enhanced performance in
bankruptcy prediction. Information Sciences, Volume 236, pp. 174-185.
Chudson, W., 1945. The Pattern of Corporate Financial Structure. New York: National Bureau of
Economic Research.
Chulwoo Jeong, Jae H. Min, Myung Suk Kim, 2012. A tuning method for the architecture of neural
network models incorporating GAM and GA as applied to bankruptcy prediction. Expert Systems
with Applications, 39(3), pp. 3650-3658.
Curram, S. P., & Mingers, J., 1994. Neural networks, decision tree induction and discriminant
analysis: An empirical comparison. Journal of the Operational Research Society, 4(45), pp. 440-450.
David J. Denis, Diane K. Denis, 1995. Causes of financial distress following leveraged
recapitalizations. Journal of Financial Economics, 37(2), pp. 129-157.
David L. Olson, Dursun Delen, Yanyan Meng, 2012. Comparative analysis of data mining methods for
bankruptcy prediction. Decision Support Systems, 52(2), pp. 464-473.
Deakin, E. E., 1972. A Discriminant Analysis of Predictors of Business Failure. Journal of Accounting
Research, 1(10), pp. 167-179.
Demir, H., 1997. İşletmelerde Başarısızlığın Nedenleri ve Çıkış Yolları. Dış Ticaret Dergisi, 6.
Dhiren Ghosh and Andrew Vogt, 2012. Outliers: An Evaluation of Methodologies. Section on survey
Research Methods., pp. 3455-3460.
E. Frank, Y. Wang, S. Inglis, G. Holmes, I.H. Witten, 1998. Using model trees for classification.
Machine Learning, 32(1), pp. 63-76.
E. Turban, J.E. Aronson, 2001. Decision Support Systems and Intelligent Systems.. 6th ed. Upper
Saddle River, NJ: Prentice Hall.
E.I.Altman,E. Hotchkiss, 2005. Corporate Financial Distress and Bankruptcy : predict and avoid
bankruptcy. 3rd ed. New Jersey.: John Wiley & Sons .
Edmister, R., 1972. An Empirical test of financial ratio analysis for small business failurer prediction.
Journal of financial and quantitative analysis, 2(7), pp. 1477-1493.
Eibe Frank, Mark Hall, Bernhard Pfahringer, 2003. Locally Weighted Naive Bayes. In: 19th Conference
in Uncertainty in Artificial Intelligence, 249-256,. New York, s.n.
Eisenbeis, R., 1977. Pitfalls in the application of discriminant analysis in business and economics. The
Journal of Finance, Issue 32, pp. 875-900.
Elam, R., 1975. The effect of lease data on the predictive ability of financial ratios. The Accounting
Review, pp. 25-43.
Erkki K. Laitinen, Teija Laitinen, 2000. Bankruptcy prediction: Application of the Taylor's expansion in
logistic regression. International Review of Financial Analysis, 9(4), pp. 327-349.
F.E.H. Tay, L. Cao, 2001. Application of support vector machines in financial time series forecasting.
Omega, 29(4), pp. 309-317.
F.E.H. Tay, L. Cao, 2002. Modified support vector machines in financial time series forecasting.
Neurocomputing, 48(1-4), pp. 847-861.
Fang-Mei Tseng, Yi-Chung Hu, 2010. Comparing four bankruptcy prediction models: Logit,
quadratic interval logit, neural and fuzzy neural networks. Expert Systems with Applications, Volume
37, pp. 1846-1853.
Fang-Mei Tseng, L. Lin, 2005. A quadratic interval logit model for forecasting bankruptcy.. Omega
The international Journal of Management Science., Volume 33, pp. 85-91.
Fitzpatrick, P., 1932. A comparison of ratios of successful industrial enterprises with those of failed
companies. s.l.:s.n.
Foreman, R. D., 2003. A logistic analysis of bankruptcy within the US local. Journal of Economics and
Business, Volume 55, p. 135–166.
Francis E.H. Tay and Lixiang Shen, 2002. Economic and financial prediction using rough sets model..
European Journal of Operational Research., 141(3), pp. 641-659.
Franks, J. and Torous, W., 1994. A comparison of financial restructuring in distressed exchanges and Chapter
11 reorganization. Journal of Financial Economics, Volume 27, pp. 315-353.
G. Zhang, M. Hu, and B. Patuwo et al., 1999. Artificial neural networks in bankruptcy prediction:
General framework and cross-validation analysis. European Journal of Operational Research, Volume
116, pp. 16-32.
Gaughan, P., 2011. Merger, Acquisitions and Corporate Restructuring. 3rd ed. New York: John Wiley.
Geoffrey I. Webb, 2000. MultiBoosting: A Technique for Combining Boosting and Wagging. Machine
Learning., 40(2), pp. 1-50.
Gilson, S. C. and Vetsuypens, M. R., 1994. CEO Compensation in Financially Distressed Firms: An
Empirical Analysis. The Journal of Finance, 48(2), pp. 425-458.
Gleb Lanine , Rudi Vander Vennet, 2006. Failure prediction in the Russian bank sector with logit and
trait recognition models.. Expert Systems with Applications., Volume 30, pp. 463-478.
Gordini, N., 2014. A genetic algorithm approach for SMEs bankruptcy prediction: Empirical evidence
from Italy. Expert Systems with Applications, 41(14), p. 6433–6445.
Grablowsky, B.J. and Talley, W.K., 1981. Probit and discriminant factors for classifying credit
applicants: A comparison. Journal of Economics and Business, Volume 33, pp. 254-261.
Grammatikos, T. and Gloubos, G., 1984. Predicting bankruptcy of industrial firms in
Greece. Spoudai, The University of Piraeus Journal of Economics, Business, Statistics and
Operations Research, 3-4, pp. 421-443.
Guoqiang Zhang, Michael Y. Hu, Eddy Patuwo, Daniel C. Indro, 1999. Artificial neural networks in
bankruptcy prediction: General framework and cross-validation analysis. European Journal of
Operational Research, 116(1), pp. 16-32.
H. Frydman, E.I. Altman, D. Kao, 1985. Introducing recursive partitioning for financial classification:
The case of financial distress. Journal of Finance, 40(1), pp. 269-291.
Izan, H., 1984. Corporate distress in Australia. Journal of Banking and Finance, Issue 8, pp. 303-320.
Taffler, R. and Tisshaw, H., 1977. Going, Going, Gone - Four Factors Which Predict. Accountancy, p. 50.
Park, C.-S. and Han, I., 2002. A case-based reasoning with the feature weights derived by analytic
hierarchy process for bankruptcy prediction. Expert Systems with Applications, 23(3), pp. 255-264.
Hansen, M., Madow, W., and Tepping, B., 1983. An Evaluation of Model-Dependent and Probability
Sampling Inferences in Sample Surveys.. J. Amer. Stat. Assoc., Volume 78, pp. 776-793.
Hanweck, G., 1977. Predicting bank failures. Research Papers in Banking and Financial Economics,
Financial Studies Section, Board of Governors of the Federal Reserve System. Washington D.C: s.n.
Hashi, I., 1997. The Economics of Bankruptcy, Reorganization and Liquidation. Lessons for east
European Transition Economics.. Russian And East European Finance and Trade, 33(4), pp. 6-34.
Hernan Pedro Vigier and Antonio Terceno, 2008. A model for the prediction of disease of firms by
means of fuzzy relations.. Fuzzy sets and systems., 159(17), pp. 2299-2316.
Holte, R., 1993. Very simple classification Rules Perform well on most commonly used datasets..
Machine Learning, Volume 11, pp. 63-91.
Hsueh-Ju Chen, Shaio Yan Huang and Chin-Shien Lin, 2009. Alternative diagnosis of corporate
bankruptcy: A neuro fuzzy approach. Expert Systems with Applications, 36(4), pp. 7710-7720.
Hui Li , Young-Chan Lee , Yan-Chun Zhou , Jie Sun , 2011. The random subspace binary logit (RSBL)
model for bankruptcy prediction.. Knowledge-Based Systems, Volume 24, pp. 1380-1388.
Hui-Ling Chen, Bo Yang, Gang Wang, Jie Liu, Xin Xu, Su-Jing Wang, Da-You Liu, 2011. A novel
bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method.
Knowledge-Based Systems, 24(8), pp. 1348-1359.
Hyunchul Ahn and Kyoung-jae Kim, 2009. Bankruptcy prediction modeling with hybrid case-based
reasoning and genetic algorithms approach.. Applied Soft Computing, 9(2), pp. 599-607.
I.M. Premachandra , Gurmeet Singh Bhabra , Toshiyuki Sueyoshi, 2009. DEA as a tool for bankruptcy
assessment: A comparative study with logistic regression technique.. European Journal of
Operational Research., Volume 193, pp. 412-424.
Ibe, O. C., 2014. Introduction to Descriptive Statistics.. 2nd ed. Elsevier Inc.: Academic Press .
Ivica Pervan, Maja Pervan, Bruno Vukoja, 2011. PREDICTION OF COMPANY BANKRUPTCY USING
STATISTICAL. Croatian Operational Research Review, Volume 2, pp. 158-167.
J. C. NEVES, A. VIEIRA, 2006. Improving Bankruptcy Prediction with Hidden Layer Learning Vector
Quantization.. European Accounting Review, 15(2), pp. 253-271.
J. Levy, E. Mallach, P. Duchessi, 1991. A fuzzy logic evaluation system for commercial loan analysis.
Omega, International Journal of Management Science, 19(6), pp. 651-669.
J. Peltonen,S. Kaski, J. Sinkkonen,, 2001. Bankruptcy analysis with self-organizing maps in learning
metrics. IEEE Transactions on Neural Networks, 12(4).
J.P. Ignizio, J.R. Soltyas, 1996. Simultaneous design and training of ontogenic neural network
classifier. Computers Operations Research, 23(6), p. 535–546.
Jackendoff, N., 1962. A Study of Published Industry Financial and Operating Ratios. Philadelphia:
Temple University, Bureau of Economic and Business Research.
Jae H. Min, Young-Chan Lee, 2005. Bankruptcy prediction using support vector machine with optimal
choice of kernel function parameters. Expert Systems with Applications, 28(4), pp. 603-614.
Jardin, P. d., 2014. Bankruptcy prediction using terminal failure processes. European Journal of
Operational Research.
Jie Sun and Hui Li, 2009. Financial distress early warning based on group decision making. Computers
and Operations Research, Volume 36, pp. 885-906.
Jodi Bellovary, Don Giacomino, Michael Akers, 2007 . A Review of Bankruptcy Prediction Studies:
1930 to Present. Journal of Financial Education, Volume 33, pp. 1-42.
Johan Huysmans, Bart Baesens, Jan Vanthienen, Tony van Gestel, 2006. Failure prediction with self
organizing maps. Expert Systems with Applications, 30(3), pp. 479-487.
John G. Cleary, Leonard E. Trigg, 1995. K*: An Instance-based Learner Using an Entropic Distance
Measure. In: 12th International Conference on Machine Learning. 108-114. s.l., s.n.
Junyoung Heo and Jin Yong Yang , 2014. AdaBoost based bankruptcy forecasting of Korean
construction companies.. Applied Soft Computing, Volume 24, pp. 494-499.
K.C. Lee, I. Han, Y. Kwon, 1996. Hybrid neural network models for bankruptcy predictions. Decision
Support Systems, Volume 18, pp. 63-72.
K.F. Lam, J.W. Moy, 2002. Combining discriminant methods in solving classification problems in two-
group discriminant analysis. European Journal of Operational Research, Volume 138, pp. 294-301.
K.Kim, 2004. Financial time series forecasting using support vector machines. Neurocomputing ,
Volume 55, pp. 307-319.
K.S. Shin, T.S. Lee, H.J. Kim, 2005. An application of support vector machines in bankruptcy prediction
model. Expert Systems with Applications, Volume 28, pp. 127-135.
Kalay, A., Singhal, R., Tashjian, E, 2007. Is Chapter 11 costly?. Journal of Financial Economics., Volume
84, pp. 772-796.
Kaplan, S., 1994. Campeau's acquisition of Federated: post-bankruptcy results. Journal of Financial
Economics, Volume 35, pp. 123-136.
Karels, G. V. and Prakash, A. P., 1987. Multivariate Normality and Forecasting of Business
Bankruptcy.. Journal of Business Finance and Accounting. , 14(4), pp. 573-593.
Kass, G., 1980. An Exploratory Technique For Investigating Large Quantities of Categorical data..
Applied Statistics., 29(2), pp. 119-127.
Kathleen McMillan and Jonathan Dundee, 2011. How to Write Dissertation and Project Reports.. 2nd
ed. Dundee: Pearson.
Keasey, K. and R. Watson. , 1986. The prediction of small company failure: Some behavioral evidence
for the UK.. Accounting and Business Research, Issue 17, pp. 49-57. .
Keskin, Y., 2002. İşletmelerde Finansal Başarısızlığın Tahmini, Çok Boyutlu Model Önerisi ve
Uygulaması, Doktora Tezi, Hacettepe Üniversitesi.. s.l.:s.n.
Ketz, J. E., 1978. The effect of general price-level adjustments on the predictive ability of. Journal of
Accounting Research, Supplement(16), pp. 273-284.
Kiviluoto, K., 1998. Predicting bankruptcies with the self-organizing map. Neurocomputing, 21(1-3),
p. 191–201.
Kolodner, J., 1991. Improving human decision making through case-based decision aiding.. AI
Magazine, 12(2), pp. 52-68.
Korol, T., 2014. A fuzzy logic model for forecasting exchange rates.. Knowledge-Based Systems,
Volume 67, pp. 49-60.
Kyung-Shik Shin, Taik Soo Lee, Hyun-jung Kim, 2005. An application of support vector machines in
bankruptcy prediction model.. Expert Systems with Applications., 28(1), pp. 127-135.
Laitinen, E., 1991. Financial ratios and different failure processes. Journal of Business Finance &
Accounting, 18(5), pp. 649-673.
Lennox, C., 1999. The accuracy and incremental information content of audit reports in predicting
bankruptcy.. Journal of Business Finance & Accounting., 26(5/6), pp. 757-778.
Jeng, B. and Liang, T., 1995. Fuzzy indexing and retrieval in case-based systems. Expert Systems with
Applications, 8(1), pp. 135-142.
Lili Sun, Prakash P. Shenoy, 2007. Using Bayesian networks for bankruptcy prediction: Some
methodological issues. European Journal of Operational Research, 180(2), pp. 738-753.
Lin, F.Y. and McClean, S., 2000. The Prediction of Financial Distress Using Structured Financial Data
From the Internet. IJCSS International Journal of Computers Science and Signal, 1(1), pp. 43-57.
Lin, T.-H., 2009. A cross model study of corporate financial distress prediction in Taiwan: Multiple
discriminant analysis, logit, probit and neural networks models. Neurocomputing, Volume 72, pp. 3507-3516.
Loh, W.-Y. and Shih, Y.-S, 1997. Split Selection Method For Classification Trees.. Statistica Sinica,
Volume 7, pp. 815-840.
Loh, W.-Y., 2011. Classification and Regression Trees. 1st ed. NY: Wiley & Sons Inc.
Lugovskaja, L., 2009. Predicting default of Russian SMEs on the basis of financial and non-financial
variables. Journal of Financial Services Marketing, 14(4), pp. 301-313.
M. Adnan Aziz, Humayon A. Dar, 2006. Predicting corporate bankruptcy: where we stand? The
International Journal of Business in Society, 6(1), pp. 18-33.
M. Odom and R. Sharda, 1990. A neural network model for bankruptcy prediction. In: Proc. Int. Joint
Conf. Neural Networks. San Diego, CA, s.n.
Makridakis, S., 2001. Insider Trading Behavior Prior to Chapter 11 Bankruptcy Announcements..
Journal of Business Research, 54(1), pp. 63-70.
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten, 2009.
The WEKA Data Mining Software: An Update;. SIGKDD Explorations, 11(1), pp. 1-50.
Martin, D., 1977. Early warning of bank failures: A logit regression approach.. Journal of Banking and
Finance., Volume 1, pp. 249-276.
McKee, T., 2000. Developing a bankruptcy prediction model via rough sets theory. International
Journal of Intelligent Systems in Accounting, Finance and Management, Volume 9, pp. 59-173.
Mensah, Y. M., 1983. The differential Bankruptcy predictive ability of specific price level
adjustments:Some Empirical Evidence. The Accounting Review, LVIII(2), pp. 228-246.
Merwin, G., 1942. Financial Small corporations in five manufacturing industries, 1926-1936. New
York: National Bureau of Economic Research.
Meyer, P. and H. Pifer, 1970. Prediction of bank failures. Journal of Finance, 25(4), pp. 853-868.
El Hennawy, R. and Morris, R., 1983. The significance of base year in developing failure prediction models.
Journal of Business Finance and Accounting, pp. 209-223.
Myong-Jong Kim and Dae-Ki Kang, 2010. Ensemble with neural networks for bankruptcy
prediction. Expert Systems with Applications, Volume 37, pp. 3373-3379.
Myoung-Jong Kim, Dae-Ki Kang, 2009. Ensemble with neural networks for bankruptcy prediction.
Expert Systems with Applications, 37(4), pp. 3373-3379.
Ning Chen, Bernardete Ribeiro, Armando Vieira, An Chen, 2013. Clustering and visualization of
bankruptcy trajectory using self-organizing map. Expert Systems with Applications, 40(1), pp. 385-393.
Ohlson, J. A., 1980. Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal of
Accounting Research, 18(1), pp. 109-131.
O'Leary, D., 1992. On bankruptcy information systems. European Journal of Operational Research,
56(1), pp. 67-69.
Opler, T. C. and Titman, S., 1994. Financial Distress and Corporate Performance. The Journal of
Finance, 49(3), pp. 1015-1040.
P. Melville, R. J. Mooney, 2003. Constructing Diverse Classifier Ensembles Using Artificial Training
Examples. In: Eighteenth International Joint Conference on Artificial Intelligence, 505-510. New York,
s.n.
P. Ravi Kumar , V. Ravi, 2007. Bankruptcy Prediction in banks and firms via statistical and intelligent
techniques - A review. European Journal of Operational Research, Volume I, pp. 1-28.
Paliwal, M., and Kumar, U., 2009. Neural networks and statistical techniques: A review of
applications.. Expert Systems with Applications, 36(1), pp. 2-17.
Pawlak, Z., 1982. Rough Sets. International journal of Computer and Information Science, Volume 11,
pp. 341-356.
Perold, F., 1999. Long term Capital Management Case Study Harvard Business School. s.l.:s.n.
Pindado, J. and Rodrigues, L.F., 2004. Parsimonious Models of Financial Insolvency in Small
Companies. Small Business Economics, pp. 51-66.
Pirooz Shamsinejad, Mohammad Saraee and Farid Sheikholeslam, 2010. A New Path Planner for
Autonomous Mobile Robots Based on Genetic Algorithm. In: The 3rd IEEE International Conference on
Computer Science and Information Technology (ICCSIT 2010). Chengdu, China, IEEE, pp. 115-120.
Pompe, P., Feelders, A., 1997. Using Machine Learning, Neural Networks and Statistics to Predict
Corporate Bankruptcy. s.l., s.n., pp. 267-276.
Pulvino, T., 1999. Effects of bankruptcy court protection on asset sales.. Journal of financial
Economics., Volume 52, pp. 151-186.
R. Slowinski,S. Greco, B. Matarazzo, 2001. Rough sets theory for multicriteria decision analysis..
European Journal of Operational Research., 129(1), pp. 1-47.
R. Susmaga,C. Zopounidis,A.I. Dimitras, R. Slowinski, 1999. Business failure prediction using rough
sets. European Journal of Operational Research, Volume 114, pp. 263-280.
R. Slowinski and J. Stefanowski, 1994. RoughDas: Rough set based data analysis system, Version 2.0,
User's Guide Book. Poznan, Poland. s.l.:s.n.
Rajeev Singhal, Yun (Ellen) Zhu, 2013. Bankruptcy risk, costs and corporate diversification. Journal of
Banking & Finance, Volume 37, pp. 1475-1489.
Rubin, D. B., 2002. Statistical Analysis With Missing Data. 2nd ed. New York: Wiley.
S. Greco, B. Matarazzo, R. Slowinski, 1998. A new rough set approach to evaluation of bankruptcy
risk.. C. Zopounidis (Ed.), Operational Tools in the Management of Financial Risks, Kluwer Academic
Publishers, Dordrecht, pp. 121-136.
S. Greco, B. Matarazzo, R. Slowinski, 1998. A new rough set approach to multicriteria and
multiattribute classification.. Rough Sets and Current Trends in Computing, pp. 60-67.
S. Jones, D.A. Hensher, 2004. Predicting firm financial distress: A mixed logit model. Accounting
Review, 4(79), p. 1011–1038.
S. Balcaen, H. Ooghe, 2006. 35 years of studies on business failure: an overview of the classic
statistical methodologies and their related problems. The British Accounting Review, Issue 38, pp. 63-93.
Sangjae Lee and Wu Sung Choi, 2013. A multi-industry bankruptcy prediction model using back-
propagation neural network and multivariate discriminant analysis.. Expert Systems with
Applications, 40(8), pp. 2941-2946.
Sankaran Mahadevan, Ramesh Rebba, 2005. Validation of reliability computational models using
Bayes networks. Reliability Engineering & System Safety, 87(2), pp. 223-232.
SAS Institute Inc., 2012. Applied Analytics Using SAS® Enterprise Miner. Cary, NC: SAS Institute Inc.
SAS Institute Inc., 2013. Getting Started with SAS® Enterprise Miner. Cary, NC: SAS Institute Inc.
SAS Institute Inc., 2013. SAS Enterprise Miner 13.2 Reference Help. 1st ed. Cary, NC: SAS Institute Inc.
Schwenker, Friedhelm; Kestler, Hans A.; Palm, Günther, 2001. Three Learning Phases for Radial Basis
Function Networks. Neural Networks, Volume 14, pp. 439-458.
Shapiro, A. F., 2002. The merging of neural networks, fuzzy logic, and genetic algorithms. Insurance:
Mathematics and Economics., 31(1), pp. 115-131.
Sinkey, J., 1975. A multivariate statistical analysis of the characteristics of problem. Journal of
Finance, 1(30), pp. 21-36.
Skogsvik, K., 1990. Current cost accounting ratios as predictors of business failure: The Swedish
case.. Journal of Business Finance and Accounting., 17(1), pp. 137-160.
Smith, R. and A. Winakor, 1935. Changes in the Financial Structure of Unsuccessful Industrial
Corporations. Urbana: University of Illinois Press.
Sunday Olusanya Olatunji, Ali Selamat, Abdul Azeez, Abdul Raheem, 2011. Predicting correlations
properties of crude oil systems using type-2 fuzzy logic systems.. Expert Systems with Applications.,
38(9), pp. 10911-10922.
Sungbin Cho, Hyojung Hong and Byoung-Chun Ha, 2010. A hybrid approach based on the
combination of variable selection using decision trees and case-based reasoning using the
Mahalanobis distance: For bankruptcy prediction.. Expert Systems with Applications., 37(4), p. 3482–
3488.
Sung-Hwan Min, Jumin Lee and Ingoo Han, 2006. Hybrid genetic algorithms and support vector
machines for bankruptcy prediction. Expert Systems with Applications, 31(3), pp. 652-660.
T.-P. Liang, B. Jeng, Y.-M. Jeng, 1997. FILM: A fuzzy learning method for automated knowledge
acquisition. Decision Support Systems, Volume 21, p. 61–73.
Takahashi, K., Kurokawa, Y. and Watase, K., 1984. Corporate bankruptcy prediction in Japan. Journal
of Banking and Finance, 8(2), pp. 229-247.
Tezcan, N., 2002. Firmalarda Mali Başarısızlığın Tahmini. Yüksek Lisans Tezi, Yıldız Teknik
Üniversitesi, Sosyal Bilimler Enstitüsü.
Theodossiou, P., 1991. Alternative models for assessing the financial condition of business in
Greece.. Journal of Business Finance & Accounting., 18(5), pp. 697-720.
Thomas E. McKee and Terje Lensberg, 2002. Genetic programming and rough sets: A hybrid
approach to bankruptcy classification. European Journal of Operational Research, 138(2), pp. 436-451.
Thorburn, K. S., 2000. Bankruptcy auctions: costs, debt recovery and firm survival.. Journal of
Financial Economics., Volume 58, pp. 337-368.
Toshiyuki Sueyoshi, Mika Goto, 2009. Methodological comparison between DEA (data envelopment
analysis) and DEA–DA (discriminant analysis) from the perspective of bankruptcy assessment..
European Journal of Operational Research, 199(2), p. 561–575.
Tseng-Chung Tang and Li-Chiu Chi, 2005. Predicting multilateral trade credit risks: comparisons of
Logit and Fuzzy Logic models using ROC curve analysis.. Expert Systems with Applications., 28(3), pp.
547-556.
Turko, R., 1999. Finansal Yönetim. Istanbul: Alfa Yayin.
V. Popova and J.C. Bioch, 2001. Bankruptcy prediction with rough sets, ERIM Report Series Research
in Management (ERS-2001-11-LIS). s.l.:s.n.
Vapnik, V., 1998. in: S. Haykin (Ed.) Statistical Learning Theory. Adaptive and Learning systems,
Volume 736.
Varun, B., 2009. Prediction of Business Failure: A Comparison of Discriminant and Logistic Regression
Analyses. Istanbul University Journal of the School of Business Administration, 38(1), pp. 21-36.
Vranas, A., 1992. The significance of financial characteristics in predicting business failure: An
analysis in the Greek context. Foundations of Computing and Decision Sciences, 17(4), pp. 257-275.
W.J. Banks, L.A. Prakash, 1994. On the performance of linear programming heuristics applied on a
quadratic transformation in the classification problem.. European Journal of Operational Research.,
74(23), pp. 23-28.
West, R., 1985. A factor analytic approach to bank condition.. Journal of Banking and Finance,
Volume 9, pp. 253-266.
Wheelen, T. L. and Hunger, J. D, 2000. Strategic Management: Business Policy.. 7th ed. New Jersey:
Prentice Hall.
Whitaker, R. B., 1999. The Early Stages of Financial Distress.. Journal of Economics and Finance,
23(2), pp. 123-133.
Wruck, K. H., 1990. Financial distress, reorganization, and organizational efficiency.. Journal of
Financial Economics , Volume 27, pp. 419-444.
Yoav Freund, Robert E. Schapire, 1996. Experiments with a new boosting algorithm. In: Thirteenth
International Conference on Machine Learning,148-156. San Francisco, s.n.
Z. Pawlak, J. Grzymala-Busse, R. Slowinski, W. Ziarko, 1995. Rough sets. Communications of the ACM
Association for Computing Machinery, 38(11), pp. 89-97.
Z.Pawlak, 1984. Rough classification. International Journal of Man–Machine Studies, Volume 20, p.
469–483.
Zhi Xiao, Xianglei Yang, Ying Pang, Xin Dang, 2012. The prediction for listed companies’ financial
distress by using multiple prediction methods with rough set and Dempster–Shafer evidence theory..
Knowledge-Based Systems., Volume 26, pp. 196-206.
Zhong Gao, Meng Cui and Lai-Man Po, 2008. Enterprise Bankruptcy Prediction Using Noisy-Tolerant
Support Vector Machine. Leicestershire, International Seminar on Future Information Technology and
Management Engineering.
Zijiang Yang, Wenjie You, Guoli Ji, 2011. Using partial least squares and support vector machines for
bankruptcy prediction. Expert Systems with Applications, 38(7), pp. 8336-8342.
Zmijewski, M., 1984. Methodological issues related to the estimation of financial distress prediction
models. Journal of Accounting Research, Volume 22, pp. 59-82.
Appendix-A:
Table 4.2 The 5th and 95th percentiles of the data one year before bankruptcy
Ratio 5th Percentile 95th Percentile

X1T1 -1.8087 0.2034
X2T1 0.03740 7.2770
X3T1 -0.03100 .498100
X4T1 -11.92 0.533
X5T1 -1.4155 .210230
X6T1 .02918 2.9812
X7T1 -.918965 .188265
X8T1 .00 2.00
X9T1 0.037400 0.9623
X10T1 -.8959 .2414
X11T1 0.03740 2.220
X12T1 .0025 .6847
X13T1 -.918965 .1940
X14T1 -1.1998 .2599
X15T1 -.9189 .1882
X16T1 .04000 1.43100
X17T1 -.920121 0.18837
X18T1 0.037400 1.4271
X19T1 0.03739 5.6254
X20T1 -70.6875 39.1222
X21T1 0.03740 55.2085
X22T1 -1.00 0.00
X23T1 -4.00 .00
X24T1 -5.00 .00
X25T1 .000 .788467
X26T1 .03739 1.9779
X27T1 .037739 .9996
X28T1 -50.0074 4.201
X29T1 .005612 5.11744
X30T1 -3.6791 1.4446
X31T1 -.06169 .5469
X32T1 .03739 4.1172
X33T1 .03195 2.9812
X34T1 -3.8257 5.0731
X35T1 -39779.2582 130226.9675
X36T1 -25.6936 52.7188
X37T1 .037395 5.6457
X38T1 .037395 105.4681
X39T1 -.029436 .534420
X40T1 .037395 89.23444
X41T1 -.031650 .497679
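
The percentile bounds listed above can be reproduced in outline as follows. This is a minimal sketch, assuming linear interpolation between order statistics; the thesis does not state which percentile definition its statistics package applied, so values computed this way may differ slightly from the table. The function names `percentile` and `percentile_bounds` and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple


def percentile(values: List[float], p: float) -> float:
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation.

    Assumption: interpolation between neighbouring order statistics;
    other packages may use a different percentile definition.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted sample.
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def percentile_bounds(column: List[Optional[float]]) -> Tuple[float, float]:
    """Return (5th percentile, 95th percentile), ignoring missing values (None)."""
    observed = [v for v in column if v is not None]
    return percentile(observed, 5), percentile(observed, 95)


# Hypothetical ratio column holding the values 0, 1, ..., 100 plus one missing entry.
sample: List[Optional[float]] = [float(i) for i in range(101)]
sample.append(None)
low, high = percentile_bounds(sample)
print(low, high)  # 5.0 95.0
```

Applying `percentile_bounds` to each ratio column of one yearly sample would give one row of such a table per ratio.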
Table 4.3 The 5th and 95th percentiles of the data two years before bankruptcy
Ratio 5th Percentile 95th Percentile
X1T2 -1.115026 .159229
X2T2 .037400 8.751000
X3T2 -.027860 .501547
X4T2 -8.560863 .519579
X5T2 -.989231 .193456
X6T2 .020722 2.735512
X7T2 -.955825 .201191
X8T2 .037395 1.703400
X9T2 .037395 .953118
X10T2 -.498550 .233775
X11T2 .037395 1.689679
X12T2 .001994 .729849
X13T2 -.955825 .202553
X14T2 -.827642 .206167
X15T2 -.955825 .201191
X16T2 .037395 1.047462
X17T2 -.955825 .201191
X18T2 .037395 1.047462
X19T2 .037395 6.960619
X20T2 -80.712785 30.762810
X21T2 .037400 79.994575
X22T2 -.989223 .192470
X23T2 -4.007975 .435580
X24T2 -5.241754 .331631
X25T2 0.000000 .783154
X26T2 .037395 2.424602
X27T2 .037395 1.184873
X28T2 -44.313079 5.601510
X29T2 .004474 7.137088
X30T2 -3.781695 1.531244
X31T2 -.052498 .628665
X32T2 .037395 5.029267
X33T2 .024548 2.735016
X34T2 -4.211561 3.792801
X35T2 -48565.080866 122014.484723
X36T2 -41.690082 47.516025
X37T2 .037395 6.977095
X38T2 .037395 103.189517
X39T2 -.028087 .539383
X40T2 .037395 104.394512
X41T2 -.027860 .539383

X1T3 -.972076 .141759
X2T3 .037400 8.890000
X3T3 -.016905 .468870
X4T3 -6.905555 .511092
X5T3 -.920121 .188337
X6T3 .026572 2.737783
X7T3 -.774949 .223634
X8T3 .037395 1.541811
X9T3 .037395 .941941
X10T3 -.375261 .212126
X11T3 .037395 1.534541
X12T3 .001649 .710965
X13T3 -.774949 .225917
X14T3 -.818240 .213688
X15T3 -.774949 .223634
X16T3 .037395 .952067
X17T3 -.774949 .223634
X18T3 .037395 .952067
X19T3 .037395 8.166963
X20T3 -52.219235 29.680575
X21T3 .037395 82.063272
X22T3 -.892493 .201859
X23T3 -4.720051 .354988
X24T3 -7.170293 .232486
X25T3 0.000000 .713828
X26T3 .037395 2.390569
X27T3 .037395 1.211031
X28T3 -29.093187 6.118316
X29T3 .007012 6.618499
X30T3 -3.698125 1.328322
X31T3 -.029597 .587621
X32T3 .037395 4.839647
X33T3 .031965 2.721380
X34T3 -3.305242 3.313479
X35T3 -26.372779 54.999548
X36T3 .037395 8.169533
X37T3 .037395 89.432601
X38T3 -.016905 .474131
X39T3 .037395 105.529853
X40T3 -.016368 .484197
X41T3 .031965 4.839647

X1T4 -1.219065 .136217
X2T4 .037400 8.865000
X3T4 -.013334 .460847
X4T4 -6.094200 .507600
X5T4 -.955825 .201191
X6T4 .037395 2.700640
X7T4 .028180 2.981235
X8T4 .037395 1.367371
X9T4 .037395 .936470
X10T4 -.416678 .223132
X11T4 .037395 1.321276
X12T4 .002393 .746375
X13T4 .028180 2.981235
X14T4 -.994880 .195949
X15T4 .028180 2.981235
X16T4 .036170 .887481
X17T4 .028180 2.981235
X18T4 .036170 .887481
X19T4 .037395 6.167819
X20T4 -68.358510 49.605810
X21T4 .037395 83.020426
X22T4 -.940580 .208534
X23T4 -5.099491 .384648
X24T4 -5.571105 .281673
X25T4 0.000000 .701105
X26T4 .037395 2.577224
X27T4 .037395 1.024419
X28T4 -20.857862 5.726517
X29T4 .008468 6.916481
X30T4 -3.820015 1.493944
X31T4 -.027031 .624435
X32T4 .037395 4.408267
X33T4 .037395 2.683662
X34T4 -3.345348 4.089810
X35T4 -22860.596748 92513.787496
X36T4 -25.852989 43.391217
X37T4 .037395 6.171712
X38T4 .037395 105.721041
X39T4 -.018068 .479826
X40T4 .037395 109.861564
X41T4 .037395 .496959

X1T5 -1.065177 .184879
X2T5 .037400 8.210000
X3T5 -.016852 .484403
X4T5 -4.427293 .536836
X5T5 -.774949 .223634
X6T5 .037198 2.824415
X7T5 .020722 2.735512
X8T5 .037395 1.316398
X9T5 .037395 .939327
X10T5 -.322255 .233877
X11T5 .037395 1.275609
X12T5 .003160 .668179
X13T5 .020722 2.735512
X14T5 -.846037 .223866
X15T5 .020722 2.735512
X16T5 .037395 .907009
X17T5 .020722 2.735512
X18T5 .037395 .895978
X19T5 .037395 8.613158
X20T5 -85.765685 70.377725
X21T5 .037395 70.371335
X22T5 -.834801 .225834
X23T5 -6.172182 .467050
X24T5 -7.063776 .339079
X25T5 0.000000 .665245
X26T5 .037395 2.500346
X27T5 .037395 .999834
X28T5 -22.505820 7.307410
X29T5 .009204 6.400484
X30T5 -3.076038 1.431133
X31T5 -.016499 .621382
X32T5 .037395 3.948719
X33T5 .036859 2.740754
X34T5 -3.111253 2.892728
X35T5 -20652.587857 97398.145554
X36T5 -24.799585 45.484270
X37T5 .037395 8.202192
X38T5 .037395 107.601496
X39T5 -.017567 .495606
X40T5 .037395 92.653018
X41T5 -.018878 .493702
Table 4.7 Univariate Statistics for data sample one year before bankruptcy
Ratio  N  Mean  Std. Deviation  Missing Count  Missing Percent  Extremes Low (a)  Extremes High (a)
X1T1 928 -.436121 4.8265051 0 .0 108 21

X2T1 923 2.267315 5.0382807 5 .5 0 71
X3T1 928 .142327 .3068697 0 .0 8 32
X4T1 928 -19.474041 450.1726461 0 .0 129 2
X5T1 928 -.217274 2.1586339 0 .0 101 13
X6T1 928 1.300405 6.9325752 0 .0 0 43
X7T1 927 -.102521 .9911883 1 .1 112 23
X8T1 928 1.64 14.798 0 .0 0 42
X9T1 928 .491088 .3110961 0 .0 0 2
X10T1 928 -.144232 .9620095 0 .0 143 62
X11T1 928 1.620557 14.7969397 0 .0 1 68
X12T1 928 .159110 .2187743 0 .0 0 101
X13T1 927 -.098013 1.0012761 1 .1 112 24
X14T1 928 -.285414 4.1479253 0 .0 126 26
X15T1 927 -.102521 .9911883 1 .1 112 23
X16T1 928 1.032683 9.3955239 0 .0 1 62
X17T1 927 -.102524 .9911746 1 .1 112 23
X18T1 928 1.028998 9.3953480 0 .0 2 65
X19T1 928 8.819350 116.9838048 0 .0 2 103
X20T1 928 20.108646 3566.1037146 0 .0 127 99
X21T1 928 30.772723 259.7486124 0 .0 0 118
X22T1 928 -.25 2.554 0 .0 . .
X23T1 928 -2.46 28.439 0 .0 . .
X24T1 928 -1.48 17.134 0 .0 . .
X25T1 928 .463984 6.9189794 0 .0 0 58
X26T1 928 .404242 .7750083 0 .0 0 141
X27T1 928 .291357 .5574048 0 .0 0 36
X28T1 928 -15.215299 162.8420164 0 .0 145 38
X29T1 928 1.369037 5.9758064 0 .0 1 117
X30T1 928 .890412 30.0707478 0 .0 113 53
X31T1 928 .499001 8.2881821 0 .0 19 51
X32T1 928 7.685521 132.2526113 0 .0 0 67
X33T1 928 1.276195 6.9909388 0 .0 1 44
X34T1 928 .924524 29.4145612 0 .0 96 115
X35T1 928 8351.470574 547133.0107292 0 .0 84 140
X36T1 928 7.815929 159.0286135 0 .0 76 107
X37T1 927 8.847431 117.0451446 1 .1 0 104
X38T1 927 39.087343 325.2562168 1 .1 0 136
X39T1 928 .164003 .3871066 0 .0 8 35
X40T1 928 26.543483 124.7393648 0 .0 0 127
X41T1 928 .138622 .3487470 0 .0 11 31
a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

b. A dot (.) indicates that the inter-quartile range (IQR) is zero.
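
Footnotes a and b describe the rule behind the Low and High extreme counts. The sketch below illustrates that rule under stated assumptions: quartiles are obtained by linear interpolation (the package that produced the table may use another quartile definition, so counts may differ), missing values are represented as `None`, and the names `quantile` and `count_extremes` are hypothetical.

```python
from typing import List, Optional, Tuple


def quantile(ordered: List[float], q: float) -> float:
    """Quantile q (0..1) of an already sorted list, by linear interpolation."""
    rank = (len(ordered) - 1) * q
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def count_extremes(column: List[Optional[float]]) -> Optional[Tuple[int, int]]:
    """Count cases below Q1 - 1.5*IQR and above Q3 + 1.5*IQR.

    Returns None when the IQR is zero (shown as "." in the table)
    or when the column has no observed values.
    """
    observed = sorted(v for v in column if v is not None)
    if not observed:
        return None
    q1 = quantile(observed, 0.25)
    q3 = quantile(observed, 0.75)
    iqr = q3 - q1
    if iqr == 0:
        return None
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    low = sum(1 for v in observed if v < low_fence)
    high = sum(1 for v in observed if v > high_fence)
    return low, high


# Hypothetical ratio column: nine ordinary values, two extremes, one missing.
ratio = [-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0, None]
extremes = count_extremes(ratio)
print(extremes)  # (1, 1)

# A constant column has IQR = 0, so no count is reported.
constant = count_extremes([2.0, 2.0, 2.0, 2.0])
print(constant)  # None
```

A column such as X22T1, whose extreme counts appear as dots, corresponds to the `None` branch.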
Table 4.8 Univariate Statistics for data sample two years before bankruptcy
X1T2 928 -.126149 10.6946103 0 .0 101 18

X2T2 925 2.755301 6.3077397 3 .3 0 93
X3T2 928 .145116 .2131140 0 .0 5 34
X4T2 928 -3.554374 26.2637603 0 .0 133 1
X5T2 927 -.210058 1.6877541 1 .1 106 16
X6T2 927 1.183268 5.8505294 1 .1 0 29
X7T2 927 -.116987 .7929120 1 .1 119 15
X8T2 928 1.113855 5.8408280 0 .0 0 55
X9T2 928 .497185 .2880474 0 .0 0 0
X10T2 928 -.327446 5.3577094 0 .0 158 81
X11T2 928 1.110447 5.8406352 0 .0 1 54
X12T2 928 .170353 .2279073 0 .0 0 99
X13T2 927 -.113741 .7996759 1 .1 119 16
X14T2 928 -.657221 10.6261883 0 .0 109 22
X15T2 927 -.116987 .7929120 1 .1 119 15
X16T2 928 .666211 4.7486737 0 .0 1 52
X17T2 927 -.116987 .7929120 1 .1 119 15
X18T2 928 .665877 4.7495542 0 .0 1 52
X19T2 928 5.003501 42.0326727 0 .0 2 110
X20T2 928 60.944146 3137.1724015 0 .0 142 89
X21T2 928 44.484094 309.1130122 0 .0 0 122
X22T2 928 -.183365 1.6769597 0 .0 108 20
X23T2 928 -2.683131 22.9349743 0 .0 154 59
X24T2 928 -2.667787 23.6768343 0 .0 154 37
X25T2 928 .289044 1.4165818 0 .0 0 51
X26T2 928 .609075 .9297194 0 .0 0 51
X27T2 928 .465085 1.3492928 0 .0 0 28
X28T2 928 -41.537831 685.2767919 0 .0 148 55
X29T2 928 1.556448 5.5777834 0 .0 1 125
X30T2 928 .756777 21.4480823 0 .0 113 52
X31T2 928 .367848 2.6778065 0 .0 14 56
X32T2 928 44.045620 1237.6237326 0 .0 0 63
X33T2 928 1.180883 5.8503652 0 .0 1 29
X34T2 928 20.466627 561.3885151 0 .0 93 95
X35T2 928 -6393.958732 546295.4197156 0 .0 70 138
X36T2 928 -2.303385 192.2746617 0 .0 87 101

X37T2 927 5.027756 42.0527685 1 .1 0 111
X38T2 927 27.958886 132.8239327 1 .1 0 127
x39T2 928 .178540 .4327857 0 .0 5 39
X40T2 928 32.422630 231.9920376 0 .0 0 125
X41T2 928 3.159643 83.4001079 0 .0 5 40
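Table 4.9 Univariate Statistics for data sample three years before bankruptcy
Columns: Variable, N, Mean, Std. Deviation, Missing Count, Missing Percent, No. of Extremes Low, No. of Extremes High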
X1T3 928 -.152962 1.0385898 0 .0 104 11

X2T3 927 2.749813 5.7115676 1 .1 0 102
X3T3 928 .146560 .1873058 0 .0 4 20
X4T3 928 -1.803658 13.7484274 0 .0 135 3
X5T3 927 -.102524 .9911746 1 .1 112 23
X6T3 927 1.010380 1.1190518 1 .1 0 36
X7T3 927 -.126381 .9023768 1 .1 112 17
X8T3 928 .694191 1.0725480 0 .0 0 50
X9T3 928 .491663 .2826344 0 .0 0 0
X10T3 928 -.006830 .3131531 0 .0 157 96
X11T3 928 .689235 1.0703604 0 .0 0 48
X12T3 928 .168888 .2205183 0 .0 0 82
X13T3 927 -.123669 .9066376 1 .1 112 18
X14T3 928 -.092008 1.0123813 0 .0 113 22
X15T3 927 -.126381 .9023768 1 .1 112 17
X16T3 928 .362830 .8012899 0 .0 1 47
X17T3 927 -.126381 .9023768 1 .1 112 17
x18T3 928 .376288 .6172879 0 .0 2 47
x19T3 928 5.787770 58.1963690 0 .0 2 108
X20T3 928 -4.660024 482.4983818 0 .0 127 99
X21T3 928 23.629440 103.6703401 0 .0 0 122
X22T3 928 -.095571 .9982193 0 .0 110 25
X23T3 928 -4.637831 45.1477117 0 .0 155 46
X24T3 928 -5.496356 50.9316150 0 .0 157 22
X25T3 928 .209726 .3162577 0 .0 0 38
X26T3 928 .830881 6.3655513 0 .0 0 56
X27T3 928 .889831 8.3839439 0 .0 0 36
X28T3 928 -8.455995 107.5723951 0 .0 133 70
X29T3 928 2.094125 18.7134149 0 .0 1 124
X30T3 928 1.004896 32.5207291 0 .0 120 56
X31T3 928 .281926 1.6874224 0 .0 12 52
X32T3 928 2.094638 20.2466047 0 .0 0 71
X33T3 928 .978278 1.3907070 0 .0 1 36
X34T3 928 -1.258732 30.1861351 0 .0 96 92
X36T3 928 10.128026 265.9225798 0 .0 59 103
X37T3 927 5.807971 58.2262456 1 .1 0 108
X38T3 927 31.327416 229.7271188 1 .1 0 127
x39T3 928 .151201 .1844662 0 .0 4 22
X40T3 928 35.842833 269.1824970 0 .0 0 124
X41T3 928 3.124092 83.5912666 0 .0 3 28

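Table 4.10 Univariate Statistics for data sample four years before bankruptcy
Columns: Variable, N, Mean, Std. Deviation, Missing Count, Missing Percent, No. of Extremes Low, No. of Extremes High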
X1T4 928 -.173175 .9779860 0 .0 121 16

X2T4 925 3.019160 7.1987185 3 .3 0 98
X3T4 928 .145791 .1779401 0 .0 5 23
X4T4 927 -1.470527 10.3967189 1 .1 142 5
X5T4 927 -.116987 .7929120 1 .1 119 15
X6T4 927 1.042635 1.3907275 1 .1 0 31
X7T4 928 1.300405 6.9325752 0 .0 0 43
X8T4 928 .687550 .9687480 0 .0 0 42
X9T4 928 .503605 .2788197 0 .0 0 0
X10T4 928 -.007038 .3408096 0 .0 170 113
X11T4 928 .681218 .9674495 0 .0 0 39
X12T4 928 .183454 .2560843 0 .0 1 80
X13T4 928 1.302593 6.9324967 0 .0 0 43
X14T4 928 -.105828 .8994755 0 .0 128 17
X15T4 928 1.300405 6.9325752 0 .0 0 43
X16T4 928 .369948 .6631395 0 .0 1 39
X17T4 928 1.300405 6.9325752 0 .0 0 43
x18T4 928 .374560 .6326094 0 .0 2 39
x19T4 928 10.155274 151.9705440 0 .0 2 125
X20T4 928 -13.382993 815.4188959 0 .0 125 113
X21T4 928 30.324822 194.2054425 0 .0 0 138
X22T4 928 -.102105 .7818219 0 .0 127 21
X23T4 928 -5.795678 89.3203831 0 .0 164 46
X24T4 928 -6.311462 91.4136545 0 .0 158 30
X25T4 928 .218274 .4325690 0 .0 0 33
X26T4 928 .668108 1.2811048 0 .0 0 55
X27T4 928 .392867 .7428170 0 .0 0 24
X28T4 928 -6.885582 100.9061376 0 .0 144 73
X29T4 928 1.818821 6.8452573 0 .0 0 129
X30T4 928 1.139221 31.2651135 0 .0 114 60
X31T4 928 .452752 8.6250310 0 .0 12 56
X32T4 928 1.467498 7.0327693 0 .0 0 69
X33T4 928 1.036377 1.3866071 0 .0 1 31
X34T4 928 -.916501 16.8631678 0 .0 103 104
X35T4 928 26884.588919 373743.1971263 0 .0 52 136
X36T4 928 4.360924 62.1136661 0 .0 65 96
X37T4 927 10.165556 152.0521921 1 .1 0 124
X38T4 927 35.577803 267.1218918 1 .1 0 121
x39T4 928 .142655 .3950704 0 .0 7 28
X40T4 928 39.247402 219.7461392 0 .0 0 124
X41T4 928 2.616322 64.7991745 0 .0 4 33

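Table 4.11 Univariate Statistics for data sample five years before bankruptcy
Columns: Variable, N, Mean, Std. Deviation, Missing Count, Missing Percent, No. of Extremes Low, No. of Extremes High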
X1T5 928 -.172782 1.0149037 0 .0 121 20

X2T5 927 2.829966 6.0655194 1 .1 0 87
X3T5 928 .151702 .1815187 0 .0 5 16
X4T5 927 -1.145720 6.3335731 1 .1 139 8
X5T5 927 -.126381 .9023768 1 .1 112 17
X6T5 927 1.040013 1.0276340 1 .1 0 37
X7T5 927 1.183268 5.8505294 1 .1 0 29
X8T5 928 .659396 .8909585 0 .0 0 40
X9T5 928 .502371 .2808699 0 .0 0 0
X10T5 928 .006600 .2795171 0 .0 156 162
X11T5 928 .646685 .8831673 0 .0 1 37
X12T5 928 .176178 .2386720 0 .0 0 66
X13T5 927 1.185012 5.8504283 1 .1 0 28
X14T5 928 -.113455 .9443782 0 .0 122 24
X15T5 927 1.183268 5.8505294 1 .1 0 29
X16T5 928 .378503 .6858182 0 .0 1 45
X17T5 927 1.183268 5.8505294 1 .1 0 29
x18T5 928 .376491 .6925033 0 .0 2 44
x19T5 928 6.138037 77.5741934 0 .0 2 119
X20T5 928 159.759521 5091.7697254 0 .0 129 117
X21T5 928 18.588667 92.6717126 0 .0 0 133
X22T5 928 -.116831 .8905469 0 .0 116 21
X23T5 928 -4.435434 55.1402660 0 .0 158 58
X24T5 928 -4.851490 58.2163124 0 .0 159 43
X25T5 928 .204382 .4084191 0 .0 0 32
X26T5 928 .647224 1.1521097 0 .0 0 52
X27T5 928 .572762 6.1248955 0 .0 0 22
X28T5 928 -8.703963 227.1230790 0 .0 140 90
X29T5 928 1.656466 6.3122065 0 .0 1 138
X30T5 928 .734257 19.9489760 0 .0 120 52
X31T5 928 -.041617 11.6041550 0 .0 13 56
X32T5 928 1.307071 5.8451373 0 .0 0 64
X33T5 928 1.025311 1.0571790 0 .0 1 35
X34T5 928 2.744535 70.1411300 0 .0 98 98
X35T5 928 32491.360418 426169.2246856 0 .0 50 126
X36T5 928 -4.221002 418.1366420 0 .0 60 88
X37T5 927 6.127394 77.6160452 1 .1 1 117
X38T5 927 39.323022 219.8241031 1 .1 0 123
x39T5 928 .161733 .2420099 0 .0 5 17
X40T5 928 46.842724 408.8061708 0 .0 0 119
X41T5 928 .156068 .2768338 0 .0 7 16

Table 5.1 Prediction accuracy of the model starting from year one to five using Decision Trees Model
Classification table for data one year before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 340 124 73.27%
Non-Bankrupt 180 284 61.2%
Overall Accuracy % 67.2%

Classification table for data two years before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 279 185 60.0%
Non-Bankrupt 220 244 52.0%
Overall Accuracy % 56.0%

Classification table for data three years before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 300 164 64.5%
Non-Bankrupt 123 341 73.5%
Overall Accuracy % 69.0%

Classification table for data four years before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 150 314 32.0%
Non-Bankrupt 24 440 94.8%
Overall Accuracy % 63.0%

Classification table for data five years before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 185 279 39.8%
Non-Bankrupt 20 444 95.6%
Overall Accuracy % 67.5%
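The accuracy columns in the classification tables follow directly from the four cell counts. The sketch below shows the arithmetic under the usual definitions: per-class accuracy is the share of each observed class that is predicted correctly, and overall accuracy is the share of all firms predicted correctly. The function name `classification_accuracy` and its argument names are hypothetical and are not taken from the thesis or from any modelling tool.

```python
def classification_accuracy(bb, bn, nb, nn):
    """Return (bankrupt %, non-bankrupt %, overall %) for a 2x2 table.

    bb: bankrupt firms predicted bankrupt
    bn: bankrupt firms predicted non-bankrupt
    nb: non-bankrupt firms predicted bankrupt
    nn: non-bankrupt firms predicted non-bankrupt
    """
    bankrupt_acc = 100.0 * bb / (bb + bn)
    non_bankrupt_acc = 100.0 * nn / (nb + nn)
    overall_acc = 100.0 * (bb + nn) / (bb + bn + nb + nn)
    return bankrupt_acc, non_bankrupt_acc, overall_acc


# Cell counts of the one-year-before table for the Decision Trees model.
# Table 5.1 reports 73.27%, 61.2% and 67.2% for these counts.
b_acc, n_acc, overall = classification_accuracy(340, 124, 180, 284)
print(f"{b_acc:.2f}% {n_acc:.2f}% {overall:.2f}%")
```

Because the sample is balanced (464 firms in each class), the overall accuracy is the simple average of the two class accuracies.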
Table 5.2 Prediction accuracy of the model starting from year one to five using HP Trees Model
Bankrupt Bankrupt
364 100 78.44%
Non- Non- 220 244 52.9%
68.3%
Overall Accuracy % 61% Overall Accuracy %
Bankrupt Bankrupt
Non- 325 139 70.0 % Non- 225 239 73.0%
Bankrupt Bankrupt

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 150 314 32.0%
Non-Bankrupt 44 420 90.2%
Overall Accuracy % 61.3%
Table 5.3 Prediction accuracy of the model starting from year one to five using Neural Network Model
Bankrupt Bankrupt
445 19 95.9%
Non- Non- 9 455 97.6%
97.7%
Bankrupt Bankrupt
Non- 34 430 92.0 % Non- 30 434 92.4 %
Bankrupt Bankrupt

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 440 24 95.0%
Non-Bankrupt 64 400 86.2%
Table 5.4 Prediction accuracy of the model starting from year one to five using Auto Neural Model
Bankrupt Bankrupt
Bankrupt 440 24 94.0 % Bankrupt 463 1 99.5%
Non- 26 439 93.0 % Non- 1 463 99.5%
Bankrupt Bankrupt
99.5%
Bankrupt Bankrupt
Bankrupt 0 464 0 Bankrupt 454 10 97.8%
Non- 0 464 100.00 Non- 9 455 97.6%
Bankrupt Bankrupt

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 0 464 0.0%
Non-Bankrupt 0 464 100.0%
Table 5.5 Prediction accuracy of the model starting from year one to five using HP Neural Model
Bankrupt Bankrupt
Bankrupt 420 44 90.0 % Bankrupt 440 24 95.0%
Non- 404 60 12.0 % Non- 464 0 0.0%
Bankrupt Bankrupt
47.25.0%
Bankrupt Bankrupt
Bankrupt 440 24 95.0% Bankrupt 420 44 90.0 %
Non- 225 239 73.0% Non- 52 412 88.9%
Bankrupt Bankrupt

Observed Predicted
Bankrupt 454 10 97.8%
Classification Table for data one year before Classification Table for data Two years before
Event: Event:
Bankrupt Non- Accuracy Bankrupt Non- Accuracy
Bankrupt % Bankrupt %
Non- 169 295 63.57 % Non- 392 72 16.6%
Bankrupt Bankrupt
55.15%
Classification Table for data Three years before Classification Table for data Four years before
Event: Event:
Non- 405 59 13.0% Non- 240 224 48.0%
Bankrupt Bankrupt
61.1%
Classification table for data five years before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 185 279 39.8%
Non-Bankrupt 44 420 90.51%
Classification Table for data one year before Classification Table for data Two years before
Event: Event:
Non- 169 295 63.57 % Non- 392 72 16.6%
Bankrupt Bankrupt
55.15%
Classification Table for data Three years before Classification Table for data Four years before
Event: Event:
Non- 405 59 13.0% Non- 240 224 48.0%
Bankrupt Bankrupt
61.1%
Classification table for data five years before event:
Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 185 279 39.8%
Non-Bankrupt 44 420 90.51%
Table 5.8 Prediction accuracy of the model starting from year one to five using HP SVM Model
Bankrupt Bankrupt
Non- 230 224 49.13 % Non- 151 313 67.7%
Bankrupt Bankrupt
54.0%
Bankrupt Bankrupt
Non- 151 313 67.7% Non- 163 325 70.0%
Bankrupt Bankrupt
54.2%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 155 309 33.40%
Non-Bankrupt 169 295 63.57%
Bankrupt Bankrupt
454 10 98.0%
Non- Non- 464 0 0.0%
50.0%
Bankrupt Bankrupt
Non- 464 0 0.0% Non- 450 14 3.0 %
Bankrupt Bankrupt

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 454 10 98.0%
Non-Bankrupt 450 14 3.0%
Table 5.10 Prediction accuracy of the model starting from year one to five using MBR Model
Bankrupt Bankrupt
Non- 165 299 64.47 % Non- 113 351 75.6%
Bankrupt Bankrupt
61.9%
Bankrupt Bankrupt
Non- 139 325 70.0% Non- 114 350 75.4%
Bankrupt Bankrupt

Observed Predicted
Bankrupt 221 243 47.2%
Table 5.11 Bankruptcy prediction accuracy using Naïve Bayes Model
Bankrupt Bankrupt
Non- 94 370 79.7 Non- 17 447 96.3%
Bankrupt Bankrupt
51.1%
Bankrupt Bankrupt
Non- 37 427 92.0% Non- 415 49 10.5%
Bankrupt Bankrupt

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 431 33 92.8%
Non-Bankrupt 25 439 94.6%
Table 5.12 Bankruptcy prediction accuracy using BayesNet Model
Bankrupt Bankrupt
Non- 12 452 79.7 Non- 46 420 96.3%
Bankrupt Bankrupt
51.1%
Bankrupt Bankrupt
Non- 7 457 98.0% Non- 254 210 45.2%
Bankrupt Bankrupt
88.0% 51.1%
Overall Accuracy % Overall Accuracy %

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 263 201 57.3%
Non-Bankrupt 266 198 43.0%
Table 5.13 Bankruptcy prediction accuracy table using SMO or SVM Model
Bankrupt Bankrupt
Non- 233 231 49.7% Non- 223 241 51.9%
Bankrupt Bankrupt
55.1%
Bankrupt Bankrupt
Non- 204 260 56.2% Non- 210 254 54.7%
Bankrupt Bankrupt
59.4% 53.2%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 258 206 55.6%
Non-Bankrupt 251 213 45.9%
Table 5.14 Bankruptcy prediction accuracy table using RBFNetwork Model
Bankrupt Bankrupt
Non- 247 217 46.7% Non- 95 369 79.5%
Bankrupt Bankrupt
77.5%
Bankrupt Bankrupt
Non- 303 161 34.6% Non- 174 290 62.5%
Bankrupt Bankrupt
63.5% 55.7%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 441 23 95.0%
Non-Bankrupt 85 379 81.7%
Table 5.15 Bankruptcy prediction accuracy table using KSTAR Model
Bankrupt Bankrupt
Bankrupt 464 0 100% Bankrupt 233 231 50.2%
Non- 0 464 100% Non- 244 220 47.4%
Bankrupt Bankrupt
49.8%
Overall Accuracy % 100% Overall Accuracy %
Bankrupt Bankrupt
Non- 211 253 54.5% Non- 227 237 51.0%
Bankrupt Bankrupt
50.3% 50.4%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 245 219 52.8%
Non-Bankrupt 243 221 47.6%
Table 5.16 Bankruptcy prediction accuracy table using LWL Model
Bankrupt Bankrupt
Non- 362 102 21.9% Non- 247 217 46.7%
Bankrupt Bankrupt
50.3%
Bankrupt Bankrupt
Non- 327 137 29.5% Non- 20 444 95.7%
Bankrupt Bankrupt
52.1% 93.6%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 49 415 10.6%
Non-Bankrupt 58 406 87.5%
Table 5.17 Bankruptcy prediction accuracy table using AdaBoostM1 Model
Bankrupt Bankrupt
Non- 167 297 64.0% Non- 167 297 64.0%
Bankrupt Bankrupt
57.0%
Bankrupt Bankrupt
Non- 291 173 37.2% Non- 348 116 25.0%
Bankrupt Bankrupt
56.2% 54.4%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 213 251 45.90%
Non-Bankrupt 233 231 49.8%
Table 5.18 Bankruptcy prediction accuracy table using ClassificationViaRegression Model
Bankrupt Bankrupt
Non- 144 320 68.9% Non- 156 308 66.4%
Bankrupt Bankrupt
47.8%
Bankrupt Bankrupt
Non- 125 339 73.06% Non- 100 364 78.5%
Bankrupt Bankrupt
67.75% 48.22%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 115 349 24.7%
Non-Bankrupt 50 414 89.22%
Table 5.19 Bankruptcy prediction accuracy table using Decorate Model
Bankrupt Bankrupt
Non- 198 266 57.3% Non- 406 58 12.5%
Bankrupt Bankrupt
52.6%
Bankrupt Bankrupt
Non- 93 371 79.9% Non- 225 239 51.5%
Bankrupt Bankrupt
51.8% 53.4%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 409 55 88.2%
Non-Bankrupt 387 77 16.6%
Table 5.20 Bankruptcy prediction accuracy table using Dagging Model
Bankrupt Bankrupt
Non- 120 344 74.4% Non- 84 380 81.9%
Bankrupt Bankrupt
59.9%
Bankrupt Bankrupt
Non- 86 378 81.5% Non- 134 330 71.2%
Bankrupt Bankrupt
63.06% 61.9%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 200 264 43.10%
Non-Bankrupt 72 392 84.5%
Table 5.21 Bankruptcy prediction accuracy table using LogisticBoost Model
Bankrupt Bankrupt
Non- 256 208 44.8% Non- 225 239 51.5%
Bankrupt Bankrupt
62.7%
Bankrupt Bankrupt
Non- 136 328 70.9% Non- 366 98 21.2%
Bankrupt Bankrupt
69.8% 45.5%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 280 184 60.3%
Non-Bankrupt 129 335 72.2%
Table 5.22 Bankruptcy prediction accuracy table using MultiBoostAB Model
Bankrupt Bankrupt
Non- 308 156 33.6% Non- 306 158 34.0%
Bankrupt Bankrupt
52.15%
Bankrupt Bankrupt
Non- 250 214 46.1% Non- 51 413 95.2%
Bankrupt Bankrupt
57.7% 87.7%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 287 177 61.8%
Non-Bankrupt 230 234 50.31%
Table 5.23 Bankruptcy prediction accuracy table using Random Committee Model
Bankrupt Bankrupt
Non- 225 239 51.5% Non- 243 221 47.6%
Bankrupt Bankrupt
49.5%
Bankrupt Bankrupt
Non- 212 252 54.3% Non- 243 221 47.6%
Bankrupt Bankrupt
51.9% 50.4%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 247 217 53.2%
Non-Bankrupt 243 221 47.6%
Table 5.24 Bankruptcy prediction accuracy table using HyperPipes Model
Bankrupt Bankrupt
Non- 92 372 80.2% Non- 92 372 80.2%
Bankrupt Bankrupt
48.50%
Bankrupt Bankrupt
Non- 91 373 80.3% Non- 92 372 80.2%
Bankrupt Bankrupt
48.60% 49.3%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 75 389 16.2%
Non-Bankrupt 104 360 77.6%
Table 5.25 Bankruptcy prediction accuracy table using NNge Model
Bankrupt Bankrupt
Non- 238 226 48.7% Non- 258 206 44.3%
Bankrupt Bankrupt
50.0%
Bankrupt Bankrupt
Non- 247 217 46.7% Non- 271 193 41.59%
Bankrupt Bankrupt
52.3% 44.3%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 244 220 53.5%
Non-Bankrupt 259 205 44.1%
Table 5.26 Bankruptcy prediction accuracy table using OneR Model
Bankrupt Bankrupt
Non- 15 449 96.7% Non- 17 447 97.5%
Bankrupt Bankrupt
51.02%
Bankrupt Bankrupt
Non- 15 449 96.7% Non- 15 449 96.7%
Bankrupt Bankrupt
51.3% 51.02%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 25 439 5.3%
Non-Bankrupt 24 440 94.8%
Table 5.27 Bankruptcy prediction accuracy table using ZeroR Model
Bankrupt Bankrupt
Non- 188 276 59.5% Non- 188 276 59.5%
Bankrupt Bankrupt
49.5%
Bankrupt Bankrupt
Non- 188 276 59.5% Non- 188 276 59.5%
Bankrupt Bankrupt
49.5% 49.5%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 184 280 39.6%
Non-Bankrupt 188 276 59.5%
Table 5.28 Bankruptcy prediction accuracy table using Random Forest Model
Bankrupt Bankrupt
Non- 245 219 47.2% Non- 157 307 66.16%
Bankrupt Bankrupt
49.4%
Bankrupt Bankrupt
Non- 202 262 56.4% Non- 220 244 47.0%
Bankrupt Bankrupt
49.2% 47.05%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 170 294 36.6%
Non-Bankrupt 166 298 64.2%
Table 5.29 Bankruptcy prediction accuracy table using J48 Model
Bankrupt Bankrupt
Non- 274 190 40.9% Non- 202 262 56.4%
Bankrupt Bankrupt
48.6%
Bankrupt Bankrupt
Non- 136 328 70.6% Non- 242 222 48.0%
Bankrupt Bankrupt
49.5% 51.0%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 231 233 49.7%
Non-Bankrupt 222 242 52.1%
Table 5.30 Bankruptcy prediction accuracy table using SimpleCart Model
Bankrupt Bankrupt
Non- 417 47 10.1% Non- 326 138 29.74%
Bankrupt Bankrupt
49.87%
Bankrupt Bankrupt
Non- 415 49 10.5% Non- 323 141 30.4%
Bankrupt Bankrupt
50.15% 58.4%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 398 66 85.7%
Non-Bankrupt 365 99 21.3%
Table 5.31 Bankruptcy prediction accuracy table using END Model
Bankrupt Bankrupt
Non- 274 190 40.9% Non- 256 208 44.8%
Bankrupt Bankrupt
52.4%
Bankrupt Bankrupt
Non- 274 190 40.9% Non- 222 242 52.2%
Bankrupt Bankrupt
54.1% 51.0%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 297 167 64.0%
Non-Bankrupt 274 190 40.9%
Table 5.32 Bankruptcy prediction accuracy table using MLP neural network Model
Bankrupt Bankrupt
463 0 100.0%
Non- Non- 58 265 82.0%
86.2%
Bankrupt Bankrupt
Non- 29 293 91.0% Non- 185 137 42.5%
Bankrupt Bankrupt
58.7%

Observed Predicted
Bankrupt 108 228 32.1%
Table 5.33 Bankruptcy prediction accuracy table using CHAID Model
Bankrupt Bankrupt
Bankrupt Bankrupt
366 98 78.9% 194 270 41.8%
Non- Non-
56.1%
Bankrupt Bankrupt
Bankrupt Bankrupt
396 68 85.3% 60 404 12.9%
Non- Non-
56.0%
Overall Accuracy % Overall Accuracy % 53.0%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 275 189 59.3%
Non-Bankrupt 171 293 63.1%
Table 5.34 Bankruptcy prediction accuracy table using CHAID Exhaustive Model
Bankrupt Bankrupt
Bankrupt Bankrupt
302 162 65.1% 300 164 65.0%
Non- Non-
82.2%
Bankrupt Non-Bankrupt Accuracy % Bankrupt Non-Bankrupt Accuracy %
Bankrupt Bankrupt
349 115 75.2% 60 404 12.9%
Non-Bankrupt Non-Bankrupt
301 163 35.1% 32 432 93.1%
55.2%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 424 40 91.4%
Non-Bankrupt 273 191 41.2%
Overall Accuracy % 66.3%
Table 5.35 Bankruptcy prediction accuracy table using CART Model
Bankrupt Bankrupt
Bankrupt Bankrupt
394 70 84.9% 335 129 72.2%
Non- Non-
Bankrupt Bankrupt
Bankrupt Bankrupt
400 64 86.2% 389 75 83.8%
Non- Non-
56.2%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 441 23 95.0%
Non-Bankrupt 416 48 10.3%
Overall Accuracy % 52.7%
Table 5.36 Bankruptcy prediction accuracy table using QUEST Model
Bankrupt Bankrupt
Bankrupt Bankrupt
404 60 88.1% 298 166 65.0%
Non- Non-
Bankrupt Non-Bankrupt Accuracy % Bankrupt Non-Bankrupt Accuracy %
Bankrupt Bankrupt
200 264 56.2% 0 464 0.0%
Non-Bankrupt Non-Bankrupt
0 464 100.0% 0 464 100.0%
78.2%

Observed Predicted
Bankrupt Non-Bankrupt Accuracy %
Bankrupt 464 0 100.0%
Non-Bankrupt 464 0 0.0%
Overall Accuracy % 50.0%
Table 5.37 Bankruptcy prediction accuracy table using K-NN Model
Classification table one year before Event Classification table two years before Event
Classification table three years before Event Classification table four years before Event
Classification table five years before Event
Appendix B
Figure 5.4 Model Decision Trees
Figure 5.5 Model HP Tree
Figure 5.6 Neural Network Model
Figure 5.7 Auto Neural Model
Figure 5.8 HP Neural Model
Figure 5.9 DMNeural Model
Figure 5.10 Regression Model
Figure 5.11 HP SVM Model
Figure 5.12 HP Regression Model
Figure 5.13 Memory Based Reasoning Model