An evaluation of alternative scoring models in private banking

Article in The Journal of Risk Finance · January 2009


DOI: 10.1108/15265940910924481 · Source: RePEc


1 author:

Hussein A Abdou
University of Huddersfield

The Journal of Risk Finance, Vol. 10 No. 1, 2009, pp. 38-53, © Emerald Group Publishing Limited, 1526-5943
Hussein A. Abdou
Salford Business School, Salford University, Salford, UK

Abstract
Purpose – This paper aims to investigate the efficiency and effectiveness of alternative
credit-scoring models for consumer loans in the banking sector. In particular, the focus is upon the
financial risks associated with both the efficiency of alternative models in terms of correct
classification rates, and their effectiveness in terms of misclassification costs (MCs).
Design/methodology/approach – A data set of 630 loan applicants was provided by an Egyptian
private bank. A two-thirds training sample was selected for building the proposed models, leaving a
one-third testing sample to evaluate the predictive ability of the models. In this paper, an investigation
is conducted into both neural nets (NNs), such as probabilistic and multi-layer feed-forward neural
nets, and conventional techniques, such as the weight of evidence measure, discriminant analysis and
logistic regression.
Findings – The results revealed that a best net search, which selected a multi-layer feed-forward net
with five nodes, generated both the most efficient classification rate and the most effective MC. In
general, NNs gave better average correct classification rates and lower MCs than traditional techniques.
Practical implications – By reducing the financial risks associated with loan defaults, banks can
achieve a more effective management of such a crucial component of their operations, namely, the
provision of consumer loans.
Originality/value – The use of NNs and conventional techniques in evaluating consumer loans
within the Egyptian private banking sector utilizes rigorous techniques in an environment which
merits investigation.
Keywords Banking, Financial risk, Scoring procedures (tests)
Paper type Research paper

Introduction
The effective management of financial and operational risks has become especially important for
bankers, who have come to realize that banking operations both affect and are affected by the
economic and social environmental risks they face. It is believed that the Egyptian banking sector
has been “tough” since 1999, and is “expected to remain so”, for the performance of the banking
sector in Egypt has shown an “ongoing profitability weakness due to revenue pressure” and a high
incidence of problem loans (Oldham and Young, 2004; CBE, 2006/2007). Scoring techniques can
therefore play a critical role in helping banks to reduce the current and/or expected risks they
face, given that risk reduction through efficient diversification has been inadequate.
With the fast growth of the credit industry and the management of huge loan portfolios, credit
scoring is regarded as one of the most important techniques in banks and has become a critical
tool in recent decades. A number of credit-scoring models have been developed to evaluate the
credit risk process of both new and existing loan clients. Scoring techniques assess, and
therefore help to decide, “who will get credit, how much credit they should get, and what
operational strategies will maintain the profitability of the borrowers to the lenders”
(Thomas et al., 2002; Hand and Jacka, 1998).
Statistical techniques used in building scoring models, such as discriminant analysis (DA),
regression analysis, and logistic regression (LR), have been examined (Crook et al., 2007;
Greene, 1998; Altman et al., 1994; Orgler, 1971). A few studies have explored scoring models
using the weight of evidence (WOE) measure, in terms of good, poor and bad credit, and their
results were comparable with those from other techniques (Banasik et al., 2003; Bailey, 2001;
Siddiqi, 2006). The evaluation of new clients' loans has attracted some attention in recent
decades and is considered one of the most critical applications of credit-scoring models
(Sarlija et al., 2004; Chen and Huang, 2003; Malhotra and Malhotra, 2003).
Neural network models have been reported to achieve the highest average correct classification
(ACC) rate when compared with traditional techniques such as DA and LR, although the results were
very close. A few credit-scoring studies have investigated the use of probabilistic neural nets
(PNNs) (Ganchev et al., 2007; Zekic-Susac et al., 2004; Masters, 1995), whilst many have applied
multi-layer feed-forward nets (MLFNs) (Erbas and Stefanou, 2008; Dimla and Lister, 2000; West,
2000; Reed and Marks, 1999; Desai et al., 1996; Bishop, 1995).
Other statistical models, as well as neural networks and fuzzy algorithms, have been used in
building scoring models (Tsai and Wu, 2008; Anderson, 2007; Crook et al., 2007; Nur Ozkan-Gunay
and Ozkan, 2007; Yu et al., 2007; Seow and Thomas, 2006; Hand et al., 2005; Zhang and
Bhattacharyya, 2004). Furthermore, comparisons between conventional and advanced statistical
scoring techniques have been investigated as well (Lee and Chen, 2005; Ong et al., 2005; Lee
et al., 2002; Desai et al., 1997). Comparisons have also been extended to include other neural
networks, such as back-propagation and feed-forward nets (Min and Lee, 2008; Malhotra and
Malhotra, 2003; Arminger et al., 1997). Neural network models have been found to be better
representations of data than LR and CART, as shown by statistical association measures
(Zekic-Susac et al., 2004; Zhang et al., 1999). As noted by Liang (2003), traditional techniques
such as DA in general have a better classification ability but a worse prediction ability,
whereas LR has a moderately better prediction ability.
The chosen environment is the Egyptian banking sector. From the review of the literature to
date, the author is not aware of other studies covering credit-scoring techniques in Egypt. The
intention is therefore to fill this gap in the Egyptian banking sector. This paper is organized
as follows: part two details the research methodology and data collection; part three explains
the results; and part four presents the conclusions of the study and suggests areas for future
research.

Methodology and data collection


This part begins with a description of four statistical techniques used in building
credit-scoring models. The first is the WOE measure, which is one of the earliest techniques used
in credit scoring. The second is the DA model, which was first proposed by Fisher (1936) as a
discrimination and classification technique. The third is the LR model which, unlike other
conventional statistical techniques, can suit different kinds of distribution functions and is
more suitable for credit-scoring problems. The fourth is neural nets (NNs), one of the best
statistical techniques used in building scoring models, which are regarded as a practical
technology with successful applications in many fields in financial institutions, especially
banks (Bishop, 1995; Masters, 1995). Two different nets were utilized in this paper: PNNs and
MLFNs with four nodes (selected automatically by the software design). In addition, the best net
search (BNS) option of the package was selected, covering a PNN and MLFNs with two to six nodes.
The advantage of the BNS is that the package tests all checked net configurations, including
PNNs and MLFNs with node counts in the entered minimum-maximum range (here, two to six nodes),
which means more alternative models in the training and testing process.
Later, the data collection method and the identification of variables are discussed. The
validation technique applied in this paper is based on a training sample (389 cases; 67 per cent)
and a testing sample (192 cases; 33 per cent). The testing sample tests the predictive
effectiveness of the fitted model.

Weight of evidence measure


One of the earliest measures used in credit-scoring models is the WOE measure. It depends on the
odds ratio of good scores expressed as a proportion of bad scores. The information odds (IO) are
a ratio of proportions and can therefore be used to make inferences about the difference between
two distributions without the effect of changes to the overall population. The first equation is:

$$\mathrm{IO} = \frac{\text{Goods in the sub-classification (per cent)}}{\text{Bads in the sub-classification (per cent)}}.$$

The WOE, which can be considered a raw score, is calculated from the IO using the logarithmic
function, as follows:

$$\mathrm{WOE} = \ln(\mathrm{IO}).$$

To determine the Point Score, or WOE Score, the following equation might be used:

$$\text{Point Score} = \sum \left(\frac{P}{\ln(2)}\right) \times R_w \times [\mathrm{WOE} + c],$$

where $P$ is the score at which the odds are doubled; $R_w$ is the correlation coefficient
(from a multiple regression) between the respective variable and the WOE for the variable; and
$c$ is a constant applied to each variable (Bailey, 2001; Siddiqi, 2006).
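
The three WOE equations above can be illustrated with a minimal Python sketch. The function names, the attribute proportions and the values chosen for P, Rw and c are hypothetical and serve only to show the arithmetic; they are not taken from the bank's data.

```python
import math

def information_odds(good_pct, bad_pct):
    """IO: share of goods in a sub-classification divided by the share of bads."""
    return good_pct / bad_pct

def weight_of_evidence(good_pct, bad_pct):
    """WOE = ln(IO), the raw score for one sub-classification."""
    return math.log(information_odds(good_pct, bad_pct))

def point_score(woe_terms, p):
    """Sum of (P / ln 2) * Rw * (WOE + c) over the variables of one applicant.

    woe_terms: iterable of (woe, rw, c) tuples, one per variable.
    p: the score at which the odds are doubled.
    """
    factor = p / math.log(2)
    return sum(factor * rw * (woe + c) for woe, rw, c in woe_terms)

# Hypothetical attribute holding 30% of all goods but only 15% of all bads.
woe_example = weight_of_evidence(0.30, 0.15)

# Hypothetical applicant with two variables; Rw = 1 and c = 0 are assumed values.
score_example = point_score([(woe_example, 1.0, 0.0), (0.0, 1.0, 0.0)], p=20)
print(round(woe_example, 4), round(score_example, 2))
```

In this illustration an attribute whose goods share is twice its bads share has IO = 2, so its WOE equals ln(2) and, with Rw = 1 and c = 0, it contributes exactly P points.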

Discriminant analysis
As to the statistical assumptions implicit in its implementation, DA requires the data to be
independent and normally distributed. The general formula of DA is as follows:

$$Z = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n,$$

where $Z$ represents the discriminant Z-score, $a$ is the intercept term, and $b_i$ represents
the respective coefficient in the linear combination of explanatory variables, $X_i$, for
$i = 1, \ldots, n$ (Lee et al., 2002).

Logistic regression
LR is a widely used statistical modeling technique in which the probability of a binary
outcome (zero or one) is related to a set of potential predictor variables in the form:

$$\ln\left(\frac{p}{1-p}\right) = a + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n,$$

where $p$ is the probability of the outcome of interest, $a$ is the intercept term, and $b_i$
represents the respective coefficient in the linear combination of explanatory variables, $X_i$,
for $i = 1, \ldots, n$. The dependent variable is the logarithm of the odds, $\ln[p/(1-p)]$,
which is the logarithm of the ratio of two probabilities of the outcome of interest
(Lee et al., 2002).
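
As a brief illustration of the LR form above, the following Python sketch converts a linear score into a probability and back into log-odds. The intercept, coefficients and input values are hypothetical, and the 0.50 cut-off is the one reported with the classification tables in this paper.

```python
import math

def log_odds(p):
    """Dependent variable of the LR model: ln[p / (1 - p)]."""
    return math.log(p / (1.0 - p))

def lr_probability(intercept, coefficients, x):
    """Invert the logit: p = 1 / (1 + exp(-(a + b1*X1 + ... + bn*Xn)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, cutoff=0.50):
    """Return 1 (good credit) if p reaches the cut-off, otherwise 0 (bad credit)."""
    return 1 if p >= cutoff else 0

# Hypothetical intercept, coefficients and applicant values (not estimated from data).
a, b, x = -0.5, [0.8, -0.3], [2.0, 1.0]
p_good = lr_probability(a, b, x)
print(round(p_good, 4), classify(p_good))
```

A probability of exactly 0.50 corresponds to log-odds of zero, so classifying at the 0.50 cut-off is equivalent to checking the sign of the linear combination.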

Probabilistic neural nets

A PNN is an implementation of a statistical technique called kernel DA, in which the operations
are organized into a multi-layer feed-forward net with four layers. A PNN is therefore
predominantly a classifier, mapping inputs to a number of classifications, although it can also
be adapted to more general function mapping (as shown in Figure 1).
When a case is introduced to the PNN, each node in the first layer (the “pattern layer”)
calculates the distance between the input case and the training case represented by that node.
The value passed to the second layer (the “summation layer”) is a function of this distance and
of the smoothing factors, taking into account that each input has its own smoothing factor. The
second layer has one node per category of the dependent variable; each node sums the output
values of the pattern nodes corresponding to the training cases in that category. The second
layer output values can be interpreted as probability function predictions for each class.
Finally, the output node selects the category with the highest probability function value as
the estimated category (Bishop, 1995; Masters, 1995).
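
The layer-by-layer description above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the Gaussian kernel, the normalization of the summation-layer outputs and the toy data are assumptions made here, not the implementation used by the Neural Tools software.

```python
import math
from collections import defaultdict

def pnn_predict(train_x, train_y, case, smoothing):
    """Return (estimated category, per-category scores) for one input case."""
    sums = defaultdict(float)
    for x, label in zip(train_x, train_y):
        # Pattern layer: squared distance between the input case and this
        # training case, each input scaled by its own smoothing factor.
        dist2 = sum(((a - b) / s) ** 2 for a, b, s in zip(case, x, smoothing))
        # Assumed Gaussian kernel turns the distance into an activation, which
        # the summation layer accumulates per category.
        sums[label] += math.exp(-dist2 / 2.0)
    # Normalize so the scores can be read as class probability estimates
    # (the fallback avoids dividing by zero if every activation underflows).
    total = sum(sums.values()) or 1.0
    scores = {label: value / total for label, value in sums.items()}
    # Output layer: pick the category with the highest score.
    best = max(scores, key=scores.get)
    return best, scores

# Toy training cases (hypothetical): 1 = good credit, 0 = bad credit.
train_x = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
train_y = [1, 1, 0, 0]
category, scores = pnn_predict(train_x, train_y, (1.1, 0.9), smoothing=(1.0, 1.0))
print(category, scores)
```

Smaller smoothing factors make the net rely more heavily on the nearest training cases, while larger ones average over more of the training sample.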

Multi-layer feed-forward nets


Where relationships between variables are complex, it may be advisable to model a system using
MLFNs (multi-layer perceptron networks). As explained by Palisade Corporation (2005), the
behavior of the net is determined by: the structure of the net in terms of the numbers of nodes
and hidden layers; the parameters associated with connections and neurons; and the conversion
functions for each neuron, which map inputs to outputs. The output at a given level (layer) may
be expressed as a connection-weighted summation of outputs from the previous level (layer) plus
a neuron bias. A sigmoid function, which is also employed in LR, is sometimes used in NNs.
However, in the Neural Tools software the sigmoid function is not utilized (Figure 1). The reason
is to avoid a restriction on output values, in order to create a better model for training
purposes.

Figure 1. PNN and MLFN structures
[Probabilistic neural net structure: inputs, pattern layer, summation layer, output.
Multi-layer feed-forward net structure: inputs, first hidden layer, second hidden layer, output.]

Data collection and proposed variables


The structure of the banking system in Egypt includes: first, public sector banks (six banks);
second, private and joint venture banks (27 banks); and third, branches of foreign banks
(six banks) (CBE, 2007/2008). In order to build the proposed scoring models, a consumer loans
data set was provided by one of the commercial banks in Egypt, comprising 630 cases, reduced by
49 cases to account for rejected applications. The final data set consists of 433 good loans and
148 bad loans. It should be emphasized that this data set is pertinent because of the large
proportion of bad loans (25.5 per cent) relative to good loans (74.5 per cent). Each bank
customer in this data set is linked to 20 independent variables, in addition to the dependent
variable, loan quality, which takes two values: good credit (1) and bad credit (0), as shown in
Table I. Some variables had identical values for all cases and hence were excluded, e.g. loan
duration, utility bill, and credit card. In addition, all clients must have a CBE report from
the Central Bank of Egypt, which provides a comprehensive history of the clients' dealings with
all banks in Egypt. Finally, the variables selected for the proposed models were reduced to 12,
as shown in Table I.

Empirical results
In order to run the proposed models, SPSS 15.00, STATGRAPHICS Plus 5.1, and Neural Tools
software were used in this research. The detailed credit-scoring results using the
above-mentioned modeling techniques can be summarized as follows. Because of the high
correlation between the loan amount and monthly salary (0.963), an orthogonalization test was
used to keep the effect of both in the proposed models because of their potential importance.
The revised correlation, after running the test, was 0.269; all other variables had correlations
within an acceptable range.

Table I. List of predictor variables proposed in building the credit-scoring models

| Variable | Predictor variable | Code |
|---|---|---|
| X1 | Loan amount (a) | LOAN AMO |
| X2 | Loan duration | LOAN DUR |
| X3 | Company (a) | COMP |
| X4 | Branch | BRAN |
| X5 | Gender (a) | GENDER |
| X6 | Marital status (a) | MAR STA |
| X7 | Age (a) | AGE |
| X8 | Monthly salary (a) | MON SAL |
| X9 | Additional income (a) | ADD INC |
| X10 | House owned or rented (a) | HOR |
| X11 | House rent . loan tenure | HRLT |
| X12 | Home telephone (a) | TELE |
| X13 | Utility bill | UTI BILL |
| X14 | Title/position | TIT |
| X15 | Education level (a) | EDU |
| X16 | Loans from other banks (a) | LFOB |
| X17 | Relation with other banks | RWOB |
| X18 | Credit card status | CC STA |
| X19 | Corporate guarantee (a) | COR GUAR |
| X20 | Other guarantors | OTH GUAR |
| Y | Loan quality (dependent variable) (a) | LOAN QUA |

Note: (a) Variables finally selected in building the scoring models

Weight of evidence measure

In total, 12 variables were used in building the WOE scoring models in this section, based on
the training sample, including one weak independent variable (COMP) and one poor independent
variable (HOR). These two variables were kept in the final analysis because of their potential
importance and for comparison with other techniques. As shown in Table II, the WOE model gave a
49.61 per cent ACC rate in the training sample and a 51.04 per cent ACC rate in the testing
sample, for which the data played no role in building the model.
To improve the proposed model, three more trial applications were developed as a form of
sensitivity analysis, taking into account the fact that 139 of the accepted (630 – 49) cases had
a corporate guarantee, which means that there was effectively no chance of any of them suffering
default. Three trials (WOET1, WOET2, and WOET3) were therefore investigated to improve the
classification results of the WOE scoring model.
It can be observed from the results in Table II that, using WOET1, in which the score for the
139 COR GUAR cases was not included in the total score, the ACC rates for the training and
testing samples were 63.75 and 65.10 per cent, respectively. Using WOET2, in which all 139
COR GUAR cases in both the training sample (91 cases) and the testing sample (48 cases) were
excluded from these samples, the ACC rates were 71.81 per cent for the training sample and 74.31
per cent for the testing sample. Finally, using WOET3, in which the excluded 139 COR GUAR cases
were added back to the WOET2 samples, the ACC rates for the training and testing samples were
78.41 and 80.73 per cent, respectively (Table II).

Table II. Classification results for the WOEs: predictions (in columns) versus observations (in rows)

| Model | Observed | Training G | Training B | Training T | Training overall % | Testing G | Testing B | Testing T | Testing overall % |
|---|---|---|---|---|---|---|---|---|---|
| WOE | G | 95 | 196 | 291 | 32.65 | 48 | 94 | 142 | 33.80 |
| WOE | B | 0 | 98 | 98 | 100.00 | 0 | 50 | 50 | 100.00 |
| WOE | T | | | 389 | 49.61 | | | 192 | 51.04 |
| WOET1 | G | 154 | 137 | 291 | 52.92 | 80 | 62 | 142 | 56.34 |
| WOET1 | B | 4 | 94 | 98 | 95.92 | 5 | 45 | 50 | 90.00 |
| WOET1 | T | | | 389 | 63.75 | | | 192 | 65.10 |
| WOET2 | G | 120 | 80 | 200 | 60.00 | 62 | 32 | 94 | 65.96 |
| WOET2 | B | 4 | 94 | 98 | 95.92 | 5 | 45 | 50 | 90.00 |
| WOET2 | T | | | 298 | 71.81 | | | 144 | 74.31 |
| WOET3 | G | 211 | 80 | 291 | 72.51 | 110 | 32 | 142 | 77.46 |
| WOET3 | B | 4 | 94 | 98 | 95.92 | 5 | 45 | 50 | 90.00 |
| WOET3 | T | | | 389 | 78.41 | | | 192 | 80.73 |

Notes: Cut-off point 0.50. G = good; B = bad; T = total

Discriminant analysis
DA credit-scoring models are designed to develop discriminating functions that can help predict
the dependent variable. All 12 predictor variables were entered. The single discriminating
function, with a P-value of 0.0000, was statistically significant at the 95 per cent confidence
level. Table III summarizes the classification results for the DA model. The ACC rate for the
training sample was 87.40 per cent, based on prior probabilities of 0.5 for the groups, while an
85.42 per cent ACC rate was found using the holdout (testing) sample.
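
The ACC rate and the two error rates follow directly from a classification table. The Python sketch below, in which the function and key names are illustrative choices, reproduces the DA testing-sample figures from the counts reported in Table III (observations in rows, predictions in columns).

```python
def classification_rates(gg, gb, bg, bb):
    """Rates from a 2x2 classification table.

    gg: good predicted good; gb: good predicted bad (Type I error);
    bg: bad predicted good (Type II error); bb: bad predicted bad.
    """
    total = gg + gb + bg + bb
    return {
        "acc": 100.0 * (gg + bb) / total,       # average correct classification, %
        "good_pct": 100.0 * gg / (gg + gb),     # correctly classified good, %
        "bad_pct": 100.0 * bb / (bg + bb),      # correctly classified bad, %
        "type_i": gb / (gg + gb),               # good misclassified as bad
        "type_ii": bg / (bg + bb),              # bad misclassified as good
    }

# DA testing sample counts from Table III: G row 125, 17; B row 11, 39.
da_test = classification_rates(gg=125, gb=17, bg=11, bb=39)
print({name: round(value, 4) for name, value in da_test.items()})
```

The same function can be applied to any of the classification tables in this paper to recover the overall percentages and the Type I and Type II error rates used later in the cost comparison.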

Logistic regression
Four variables, namely ADD INC, HOR, MAR STA, and GENDER, were not significant at the 90 per
cent confidence level. They were kept in the final model because of their potential importance
and for comparison with other techniques. Because the P-value for the LR model was less than
0.01, there was a statistically significant relationship between the variables at the 99 per cent
confidence level. In addition, the P-value for the residuals was greater than or equal to 0.10,
indicating that the model was not significantly worse than the best possible model for these data
at the 90 per cent or higher confidence level. Table III summarizes the results of the LR
credit-scoring model, using the original 12 independent variables. The ACC rates were 88.95 and
82.81 per cent for the training and testing samples, respectively.

Probabilistic neural net


The PNN model was built using the same training sample and the same 12 predictor variables. As
shown in Table IV, the ACC rates were 96.40 per cent for the training sample and 83.85 per cent
for the testing sample, for which the data played no role in building the model.

Multi-layer feed-forward net


Following the same procedure used for the PNN model, an MLFN model was developed using four
nodes (selected automatically as part of the software design), applying the same training and
testing samples and the same predictor variables. ACC rates of 97.69 and 89.06 per cent were
found for the training and testing samples, respectively, as revealed in Table IV.

Table III. Classification results for the DA and LR; predictions (in columns) versus observations (in rows)

| Model | Observed | Training G | Training B | Training T | Training overall % | Testing G | Testing B | Testing T | Testing overall % |
|---|---|---|---|---|---|---|---|---|---|
| DA | G | 254 | 37 | 291 | 87.29 | 125 | 17 | 142 | 88.03 |
| DA | B | 12 | 86 | 98 | 87.76 | 11 | 39 | 50 | 78.00 |
| DA | T | | | 389 | 87.40 | | | 192 | 85.42 |
| LR | G | 272 | 19 | 291 | 93.47 | 135 | 7 | 142 | 95.07 |
| LR | B | 24 | 74 | 98 | 75.51 | 26 | 24 | 50 | 48.00 |
| LR | T | | | 389 | 88.95 | | | 192 | 82.81 |

Note: Cut-off point 0.50

Powerful neural net models
The main difference between the PNN and MLFN models above and the powerful NN models is that,
for the former, the data cases in both the training and testing samples were selected
judgmentally by dividing the whole data set into a 33 per cent testing sample and a 67 per cent
training sample (the same samples were used with the other scoring techniques in this paper,
namely WOE, DA, and LR). By contrast, for the latter, the data cases in the testing and training
samples were selected automatically by the Neural Tools software, applying a different 33 per
cent of the data set as a testing sample and a different 67 per cent as a training sample for
each trial.
The experiment was repeated several times in order to investigate whether different results, in
terms of the ACC rate, were being achieved because of the random selection procedure that is
part of the software design. There is evidence of significant differences between the powerful
neural net models, as described below. The following are the classification results for the best
powerful NN models among each of the 20 PNN, MLFN and BNS models, respectively.
From the results in Table V, PNN3 (trial number three applying PNNs) was the best model (based
on the highest overall testing ACC rate) among the 20 powerful PNN models, achieving ACC rates of
94.09 and 89.58 per cent for the training and testing samples, respectively. Three different
MLFNs, namely MLFN5, MLFN7 and MLFN10, were the best (based on the highest predictive ability)
among the 20 powerful MLFN models. Their ACC rates in the training samples were 95.12, 96.66 and
96.40 per cent, respectively, and all three nets achieved an 86.98 per cent ACC rate in the
testing samples, as shown in Table V.
As explained earlier, the BNS option of the software, which searches a PNN and MLFNs with two to
six nodes, was applied in this paper as well. It can be observed from Table V that BNS4-MLFN-5N
(trial number four under the BNS, with an MLFN with five nodes selected as the best net) was the
best model among the 20 powerful BNS models. ACC rates of 94.60 and 91.15 per cent were found in
the training and testing samples, respectively (for more details regarding all models, including
all trials, see the Appendix).
As shown in Table VI, there is evidence of significant differences between the NN models; the
ANOVA F-ratio was 13.04, which is significant at the 99 per cent confidence level.

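Table IV. Classification results for the PNN and MLFN; predictions (in columns) versus observations (in rows)

| Model | Observed | Training G | Training B | Training T | Training overall % | Testing G | Testing B | Testing T | Testing overall % |
|---|---|---|---|---|---|---|---|---|---|
| PNN | G | 288 | 3 | 291 | 98.97 | 135 | 7 | 142 | 95.07 |
| PNN | B | 11 | 87 | 98 | 88.78 | 24 | 26 | 50 | 52.00 |
| PNN | T | | | 389 | 96.40 | | | 192 | 83.85 |
| MLFN | G | 287 | 4 | 291 | 98.63 | 133 | 9 | 142 | 93.66 |
| MLFN | B | 5 | 93 | 98 | 94.90 | 12 | 38 | 50 | 76.00 |
| MLFN | T | | | 389 | 97.69 | | | 192 | 89.06 |
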
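Table V. Classification results for the best powerful NNs (60 models comprising 20 PNN, 20 MLFN and 20 BNS)

| Model | Observed | Training G | Training B | Training T | Training overall % | Testing G | Testing B | Testing T | Testing overall % |
|---|---|---|---|---|---|---|---|---|---|
| PNN3 | G | 276 | 8 | 284 | 97.18 | 141 | 8 | 149 | 94.63 |
| PNN3 | B | 15 | 90 | 105 | 85.71 | 12 | 31 | 43 | 72.09 |
| PNN3 | T | | | 389 | 94.09 | | | 192 | 89.58 |
| MLFN5 | G | 278 | 9 | 287 | 96.86 | 135 | 11 | 146 | 92.47 |
| MLFN5 | B | 10 | 92 | 102 | 90.20 | 14 | 32 | 46 | 69.57 |
| MLFN5 | T | | | 389 | 95.12 | | | 192 | 86.98 |
| MLFN7 | G | 280 | 10 | 290 | 96.55 | 126 | 17 | 143 | 88.11 |
| MLFN7 | B | 3 | 96 | 99 | 96.97 | 8 | 41 | 49 | 83.67 |
| MLFN7 | T | | | 389 | 96.66 | | | 192 | 86.98 |
| MLFN10 | G | 279 | 8 | 287 | 97.21 | 134 | 12 | 146 | 91.78 |
| MLFN10 | B | 6 | 96 | 102 | 94.12 | 13 | 33 | 46 | 71.74 |
| MLFN10 | T | | | 389 | 96.40 | | | 192 | 86.98 |
| BNS4-MLFN-5N | G | 279 | 18 | 297 | 93.94 | 123 | 13 | 136 | 90.44 |
| BNS4-MLFN-5N | B | 3 | 89 | 92 | 96.74 | 4 | 52 | 56 | 92.86 |
| BNS4-MLFN-5N | T | | | 389 | 94.60 | | | 192 | 91.15 |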

Table VI. A comparative statistical evaluation of NN models

| Powerful NN models | PNN | MLFN | BNS | Overall |
|---|---|---|---|---|
| Count | 20 | 20 | 20 | 60 |
| Average (mean) | 84.5055 | 83.6980 | 87.2140 | 85.1392 |
| SD | 2.18984 | 2.74707 | 1.80751 | 2.70667 |
| ANOVA F-ratio | – | – | – | 13.04*** |
| Fisher's least significant difference test: PNN-MLFN | – | – | – | 0.8075 |
| Fisher's least significant difference test: PNN-BNS | – | – | – | −2.7085** |
| Fisher's least significant difference test: MLFN-BNS | – | – | – | −3.5160** |
| Cochran's C-test | – | – | – | 0.483468 |
| Bartlett's test | – | – | – | 1.060090 |
| Levene's test | – | – | – | 1.604170 |
| Kruskal-Wallis median test: average rank | 25.10 | 21.60 | 44.80 | – |
| Kruskal-Wallis median test: test statistic | – | – | – | 20.6185*** |

Notes: Statistically significant difference at: **5 and ***1 per cent levels, respectively

Moreover, BNS differs significantly from both PNN and MLFN at the 95 per cent confidence level,
as revealed by Fisher's least significant difference test, whereas there was no statistically
significant difference between the PNN and MLFN powerful models. Since the smallest of the
P-values was greater than or equal to 0.05, there was no statistically significant difference
amongst the standard deviations at the 95 per cent confidence level according to Cochran's C,
Bartlett's and Levene's tests. Finally, the Kruskal-Wallis median test shows statistically
significant differences between the NN models at the 99 per cent confidence level, with a test
statistic of 20.62, which means that the ACC rates differ significantly across the neural net
types.
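
The ANOVA F-ratio can be approximately recomputed from the group summaries in Table VI alone. The Python sketch below assumes that the reported SDs are sample standard deviations (divisor n − 1); the function name is an illustrative choice.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F-ratio from (n, mean, sd) summaries of each group."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares, assuming sd is the sample SD (n - 1 divisor).
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Count, mean and SD of the testing ACC rates for PNN, MLFN and BNS (Table VI).
f_ratio = anova_f_from_summary([
    (20, 84.5055, 2.18984),
    (20, 83.6980, 2.74707),
    (20, 87.2140, 1.80751),
])
print(round(f_ratio, 2))
```

Under that assumption the recomputed value agrees with the reported F-ratio of 13.04 to two decimal places, and the differences between the group means match the Fisher's least significant difference entries in the table.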
Differences between the NN models can also be observed in the graphical analysis in Figure 2.
The box-and-whisker plot reveals that the inter-quartile ranges (colored shaded boxes) for
Categories 0 and 1 overlap with each other but not with that of Category 2; that Category 2 has
a narrower inter-quartile range; and that the medians are different for the three categories.
The analysis of means plot with 95 per cent decision limits reveals that Category 0 is between
the CL and the LDL, whilst Category 2 is higher than the UDL and Category 1 is lower than the
LDL.

Comparison of results of different credit-scoring models


Two different criteria were used in this paper to compare different models’ results; the
ACC rate criterion, as a significant criterion in evaluating the classification efficiency of
the scoring results; and the estimated misclassification cost (EMC) criterion, as a
crucial effectiveness criterion to find the minimum expected MC for the credit-scoring
models. The following equation is used in computing the EMC:

$$\mathrm{EMC} = C(B/G)\,P(B/G)\,\pi_2 + C(G/B)\,P(G/B)\,\pi_1,$$

where $C(B/G)$ (predicted bad/actually good) and $C(G/B)$ (predicted good/actually bad) are the
corresponding MCs of Type I and Type II errors; $P(B/G)$ and $P(G/B)$ measure the probabilities
of Type I and Type II errors; and $\pi_2$ and $\pi_1$ are the prior probabilities of good and
bad, respectively (West, 2000).
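
The EMC equation can be checked numerically with a short Python sketch. The function name is illustrative; the default values are the 5:1 relative cost ratio and the prior probabilities of 74.5 and 25.5 per cent that this paper applies, and the error rates are the testing-sample Type I and Type II errors reported in Table VII.

```python
def expected_misclassification_cost(type_i, type_ii,
                                    prior_good=0.745, prior_bad=0.255,
                                    cost_type_i=1.0, cost_type_ii=5.0):
    """EMC = C(B/G) * P(B/G) * prior_good + C(G/B) * P(G/B) * prior_bad.

    type_i:  P(B/G), probability that good credit is classified as bad.
    type_ii: P(G/B), probability that bad credit is classified as good.
    """
    return cost_type_i * type_i * prior_good + cost_type_ii * type_ii * prior_bad

# Type I and Type II error rates for DA and BNS4-MLFN-5N (Table VII).
emc_da = expected_misclassification_cost(0.1197, 0.2200)
emc_bns = expected_misclassification_cost(0.0956, 0.0714)
print(round(emc_da, 4), round(emc_bns, 4))
```

Because the Type II cost is five times the Type I cost, a model with a low Type II error rate can achieve a lower EMC than a model with a similar ACC rate but more bad loans classified as good.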
Because it was a challenging task to provide consistent estimates of the MCs in the Egyptian
banking sector, a standardized cost ratio was applied. It is generally believed in scoring
applications that the cost associated with a Type II error (bad credit misclassified as good
credit) is much higher than the MC associated with a Type I error (good credit misclassified as
bad credit), which is also true in other case studies based on housing loans (Lee and Chen,
2005). Hofmann, who compiled the German credit data, reported that the ratio of MCs associated
with Types II and I errors is 5:1, as noted by West (2000). In this paper, this relative cost
ratio is used to calculate the EMC for the proposed models (MCs have been calculated for all
models, including all trial applications; see the Appendix). The prior probabilities of good and
bad credit are set at 74.5 and 25.5 per cent, respectively, using the ratio of good and bad
credit in the Egyptian data set.

Figure 2. Box-and-whisker plot and analysis of means plot with 95 per cent decision limits
[Left panel: box-and-whisker plot of testing ACC by category (CAT). Right panel: analysis of
means plot for testing ACC with 95% decision limits; UDL = 86.14, CL = 85.14, LDL = 84.14.]
Notes: ACC = average correct classification; CAT = category: 0 = PNN, 1 = MLFN, and 2 = BNS.
Box-and-whisker plot: the vertical line within the colored shaded area represents the median;
the right end of the colored shaded area is equal to the upper quartile; the left end of the
colored shaded area is equal to the lower quartile; the right end of the “whisker” represents
the minimum of (i) the upper quartile plus the inter-quartile range, and (ii) the maximum
response; the left end of the “whisker” equals the maximum of (i) the lower quartile minus the
inter-quartile range, and (ii) the minimum response; outliers are shown individually.
Analysis of means plot with 95% decision limits: CL = Central Limit (overall mean),
UDL = Upper Decision Limit, LDL = Lower Decision Limit

Table VII summarizes the ACC rates, errors, and EMC results for the conventional techniques
(classifications based on a 0.50 cut-off point), namely WOE, DA, and LR, and for the NN
techniques, namely PNN, PNN3, MLFN, MLFN5, MLFN7, MLFN10 and BNS4-MLFN-5N.
As shown in Table VII, DA had the highest ACC rate, at 85.42 per cent, amongst the conventional
techniques, while BNS4-MLFN-5N had the highest ACC rate, at 91.15 per cent, amongst all
techniques. All models predict good credit better than bad credit, except two, namely WOE and
BNS4-MLFN-5N. In addition, the highest rate of correctly classified bad credit was 100.00 per
cent for WOE, whilst the highest rate of correctly classified good credit was 95.07 per cent for
both LR and PNN. It can be concluded from Table VII that the average performance of the NN
models was better than the average performance of the conventional models.
Furthermore, amongst the conventional techniques, the lowest EMC, at 0.3697, was for DA, which
was also the model chosen according to the ACC rate, at 85.42 per cent. Amongst the NN models,
the lowest EMC, at 0.1623, was for BNS4-MLFN-5N, which was also the model chosen according to
the ACC rate, at 91.15 per cent (Table VII). Finally, comparing all conventional and NN
techniques, the highest ACC rate, 91.15 per cent, leads to selecting BNS4-MLFN-5N, and this
model also provides the lowest EMC, 0.1623 (for more details regarding ACC rates and EMCs for
all models, including all trial applications, see the Appendix, Table AI).

Table VII. Comparing classification results, errors and EMCs for the selected techniques

| Scoring model | Testing G (per cent) | Testing B (per cent) | Testing overall percentage | Type I error | Type II error | EMC (5:1) |
|---|---|---|---|---|---|---|
| WOE | 33.80 | 100.00 | 51.04 | 0.6620 | 0.0000 | 0.4932 |
| DA | 88.03 | 78.00 | 85.42 | 0.1197 | 0.2200 | 0.3697 |
| LR | 95.07 | 48.00 | 82.81 | 0.0493 | 0.5200 | 0.6997 |
| PNN | 95.07 | 52.00 | 83.85 | 0.0493 | 0.4800 | 0.6487 |
| PNN3 | 94.63 | 72.09 | 89.58 | 0.0537 | 0.2791 | 0.3959 |
| MLFN | 93.66 | 76.00 | 89.06 | 0.0634 | 0.2400 | 0.3532 |
| MLFN5 | 92.47 | 69.57 | 86.98 | 0.0753 | 0.3043 | 0.4441 |
| MLFN7 | 88.11 | 83.67 | 86.98 | 0.1189 | 0.1633 | 0.2968 |
| MLFN10 | 91.78 | 71.74 | 86.98 | 0.0822 | 0.2826 | 0.4216 |
| BNS4-MLFN-5N (a) | 90.44 | 92.86 | 91.15 | 0.0956 | 0.0714 | 0.1623 |

Note: (a) Best model amongst all models according to the highest ACC rate and the lowest MC

Conclusions and directions for future research
Credit scoring is regarded as one of the effective management tools to reduce both
financial and operational risks facing banks. Furthermore, credit scoring is regarded as
one of the basic applications of classification problems that have attracted
more and more attention during the past decades. This research presents an evaluation
of personal loans to help strengthen the financial and operational risk evaluation
process in the Egyptian banking sector using four credit-scoring statistical techniques:
WOE, DA, LR and NNs. The intention in this paper has been to investigate both the
classification efficiency rate of consumer loans and the cost effectiveness associated
with misclassification errors.
The preferred model did not vary according to the decision criterion: using
both the highest ACC rate and the lowest EMC, BNS4-MLFN-5N is preferred.
Correspondingly, it has been suggested that the ACC rate is more reliable, while the
EMC calculated in this paper is more subjective. Some of the predictor variables, for
example COR GUAR and LFOB, have not normally been used in published studies of
credit-scoring models; they are particularly appropriate within the Egyptian
banking sector.
Future studies should aim to use other advanced statistical scoring techniques, such
as genetic algorithms, alongside the NNs and traditional scoring models used in the
current paper, perhaps integrated with other techniques, such as fuzzy DA. Finally,
future research could use more evaluation criteria, such as the area under the ROC
curve.

References
Altman, E.I., Marco, G. and Varetto, F. (1994), “Corporate distress diagnosis: comparisons using
linear discriminant analysis and neural networks (the Italian experience)”, Journal of
Banking & Finance, Vol. 18 No. 3, pp. 505-29.
Anderson, R. (2007), The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk
Management and Decision Automation, Oxford University Press, New York, NY.
Arminger, G., Enache, D. and Bonne, T. (1997), “Analyzing credit risk data: a comparison of
logistic discriminant, classification tree analysis, and feedforward networks”,
Computational Statistics, Vol. 12 No. 2, pp. 293-310.
Bailey, M. (2001), Credit Scoring: The Principles and Practicalities, White Box Publishing, Bristol.
Banasik, J., Crook, J. and Thomas, L. (2003), “Sample selection bias in credit scoring models”,
Journal of the Operational Research Society, Vol. 54 No. 8, pp. 822-32.
Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford University Press,
New York, NY.
CBE (2006/2007), “Banking developments”, Economic Review, Vol. 47 No. 3, pp. 32-56.
CBE (2007/2008), “Structure of the Egyptian banking system”, Economic Review, Vol. 48 No. 3,
p. 93.
Chen, M. and Huang, S. (2003), “Credit scoring and rejected instances reassigning through
evolutionary computation techniques”, Expert Systems with Applications, Vol. 24 No. 4,
pp. 433-41.
Crook, J., Edelman, D. and Thomas, L. (2007), “Recent developments in consumer credit risk
assessment”, European Journal of Operational Research, Vol. 183 No. 3, pp. 1447-65.
JRF Desai, V.S., Crook, J.N. and Overstreet, G.A. (1996), “A comparison of neural networks and linear
scoring models in the credit union environment”, European Journal of Operational
10,1 Research, Vol. 95 No. 1, pp. 24-37.
Desai, V.S., Conway, D.G., Crook, J.N. and Overstreet, G.A. (1997), “Credit scoring models in the
credit union environment using neural networks and genetic algorithms”, IMA Journal of
Mathematics Applied in Business and Industry, Vol. 8 No. 4, pp. 323-3463.
50 Dimla, D.E. and Lister, P.M. (2000), “On-line metal cutting tool condition monitoring: II: tool-state
classification using multi-layer perceptron neural networks”, International Journal of
Machine Tools & Manufacture, Vol. 40 No. 5, pp. 769-81.
Erbas, B. and Stefanou, S. (2008), “An application of neural networks in microeconomics:
input-output mapping in a power generation subsector of the US electricity industry”,
Expert Systems with Applications, Vol. 36 No. 2, pp. 2317-26.
Fisher, R.A. (1936), “The use of multiple measurements in taxonomic problems”, Annals of
Eugenics, Vol. 7 No. 2, pp. 179-88.
Ganchev, T., Tasoulis, D., Varhatis, M. and Fakotakis, N. (2007), “Generalized locally recurrent
probabilistic neural network with application to text-independent speaker verification”,
Neurocomputing, Vol. 70 Nos 7/9, pp. 1424-38.
Greene, W. (1998), “Sample selection in credit-scoring models”, Japan and the World Economy,
Vol. 10 No. 3, pp. 299-316.
Hand, D.J. and Jacka, S.D. (1998), Statistics in Finance, Arnold Applications of Statistics, London.
Hand, D.J., Sohn, S.Y. and Kim, Y. (2005), “Optimal bipartite scorecards”, Expert Systems with
Applications, Vol. 29 No. 3, pp. 684-90.
Lee, T. and Chen, I. (2005), “A two-stage hybrid credit scoring model using artificial neural
networks and multivariate adaptive regression splines”, Expert Systems with Applications,
Vol. 28 No. 4, pp. 743-52.
Lee, T., Chiu, C., Lu, C. and Chen, I. (2002), “Credit scoring using the hybrid neural discriminant
technique”, Expert Systems with Applications, Vol. 23 No. 3, pp. 245-54.
Liang, Q. (2003), “Corporate financial distress diagnosis in china: empirical analysis using credit
scoring models”, Hitotsubashi Journal of Commerce and Management, Vol. 38 No. 1,
pp. 13-28.
Malhotra, R. and Malhotra, D.K. (2003), “Evaluating consumer loans using neural networks”,
Omega the International Journal of Management Science, Vol. 31 No. 2, pp. 83-96.
Masters, T. (1995), Advanced Algorithms for Neural Networks: ACþ þ Sourcebook, Wiley,
New York, NY.
Min, J.H. and Lee, Y-C. (2008), “A practical approach to credit scoring”, Expert Systems with
Applications, Vol. 35 No. 4, pp. 1762-70.
Nur Ozkan-Gunay, E. and Ozkan, M. (2007), “Prediction of bank failures in emerging financial
markets: an ANN approach”, Journal of Risk Finance, Vol. 8 No. 5, pp. 465-80.
Oldham, M. and Young, M. (2004), “Egypt: difficult times remains for the banking sector”,
Special Report, FitchRatings, New York, NY.
Ong, C., Huang, J. and Tzeng, G. (2005), “Building credit scoring models using genetic
programming”, Expert Systems with Applications, Vol. 29 No. 1, pp. 41-7.
Orgler, Y.E. (1971), “Evaluation of bank consumer loans with credit scoring models”, Journal of
Bank Research, Vol. 2 No. 1, pp. 31-7.
Palisade Corporation (2005), Neural Tools: Neural Networks Add-in for Microsoft Excel,
Version 1.0, Palisade Corporation, New York, NY.
Reed, R.D. and Marks, R.J. (1999), Neural Smithing: Supervised Learning in Feedforward Alternative
Artificial Neural Networks, The MIT Press, London.
Sarlija, N., Bensic, M. and Bohacek, Z. (2004), “Multinomial model in consumer credit scoring”,
scoring models
paper presented at the 10th International Conference on Operational Research, Trogir.
Seow, H. and Thomas, L.C. (2006), “Using adaptive learning in credit scoring to estimate take-up
probability distribution”, European Journal of Operational Research, Vol. 173 No. 3,
pp. 880-92. 51
Siddiqi, N. (2006), Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring,
Wiley, Hoboken, NJ.
Thomas, L.C., Edelman, D.B. and Crook, L.N. (2002), Credit Scoring and Its Applications, Society
for Industrial and Applied Mathematics, Philadelphia, PA.
Tsai, C. and Wu, J. (2008), “Using neural network ensembles for bankruptcy prediction and credit
scoring”, Expert Systems with Applications, Vol. 34 No. 4, pp. 2639-49.
West, D. (2000), “Neural network credit scoring models”, Computers & Operations Research,
Vol. 27 Nos 11/12, pp. 1131-52.
Yu, L., Wang, S. and Lai, K. (2007), “An intelligent-agent-based fuzzy group decision making
model for financial multicriteria decision support: the case of credit scoring”, European
Journal of Operational Research, July.
Zekic-Susac, M., Sarlija, N. and Bensic, M. (2004), “Small business credit scoring: a comparison of
logistic regression, neural networks, and decision tree models”, paper presented at the
26th International Conference on Information Technology Interfaces, Dubrovnik.
Zhang, G., Hu, M.Y., Patuwo, B.E. and Indro, D.C. (1999), “Artificial neural networks in
bankruptcy prediction: general framework and cross-validation analysis”, European
Journal of Operational Research, Vol. 116 No. 1, pp. 16-32.
Zhang, Y. and Bhattacharyya, S. (2004), “Genetic programming in classifying large-scale data:
an ensemble method”, Information Sciences, Vol. 163 Nos 1/3, pp. 85-101.



Appendix. ACC rates and EMCs for all the proposed models based on the testing results
Scoring TRIAL-MODELS Testing ACC rates (per cent) Estimated misclassification cost

WOE 51.04 0.4931900
WOET1 65.10 0.4527670
WOET2 74.31 0.3810980
WOET3(a) 80.73 0.2954230
MDA(a) 85.42 0.3696765
LR 82.81 0.6997285
PNN 83.85 0.6487285
PNN1 84.38 0.5627485
PNN2 80.21 0.5478655
PNN3(a) 89.58 0.3958590
PNN4 84.38 0.4881895
PNN5 87.50 0.3486675
PNN6(a) 85.94 0.3433690
PNN7 86.46 0.4891580
PNN8 83.85 0.4599020
PNN9 85.94 0.5033080
PNN10 83.85 0.4792355
PNN11 83.33 0.5159050
PNN12 84.38 0.5279940
PNN13 85.42 0.5838990
PNN14 86.46 0.4729960
PNN15 83.33 0.5576850
PNN16 81.25 0.3872680
PNN17 83.85 0.4812070
PNN18 82.29 0.4735590
PNN19 82.29 0.6142950
PNN20 85.42 0.5096840
MLFN 89.06 0.3532330
MLFN1 82.81 0.4055805
MLFN2 84.38 0.3852480
MLFN3 85.42 0.4709525
MLFN4 83.33 0.4650425
MLFN5(a) 86.98 0.4440810
MLFN6 82.29 0.4824780
MLFN7(a) 86.98 0.2967880
MLFN8 77.08 0.6070260
MLFN9 80.73 0.5028725
MLFN10(a) 86.98 0.4215540
MLFN11 82.81 0.5376720
MLFN12 83.85 0.3412575
MLFN13 85.42 0.4506675
MLFN14 82.81 0.3800235
MLFN15 85.42 0.5422965
MLFN16 86.46 0.5139520
MLFN17 81.77 0.4934140
MLFN18 78.65 0.6591290
MLFN19 83.33 0.4435355
MLFN20 86.46 0.4572215
BNS1-MLFN-2N 88.02 0.3855765
BNS2-MLFN-6N 86.98 0.3412720
BNS3-MLFN-5N 86.98 0.5194255
BNS4-MLFN-6N(a) 91.15 0.1622570
BNS5-MLFN-5N 85.94 0.3926660
BNS6-PNN 88.02 0.2982340
BNS7-MLFN-5N 88.54 0.3234920
BNS8-MLFN-2N 88.02 0.2596150
BNS9-MLFN-5N 88.02 0.4295675
BNS10-MLFN-5N 86.98 0.4312835
BNS11-PNN 85.42 0.4228990
BNS12-PNN 87.50 0.4163430
BNS13-MLFN-2N 88.02 0.3434525
BNS14-PNN 82.81 0.4929810
BNS15-PNN 89.06 0.3475385
BNS16-MLFN-4N 85.94 0.3964670
BNS17-PNN 87.50 0.3141325
BNS18-MLFN-6N 89.06 0.3652085
BNS19-MLFN-3N 85.42 0.4358655
BNS20-MLFN-6N 84.90 0.4812480
Notes: (a) Models associated with the lowest EMCs and/or the highest ACC rates under each modeling
technique. Firstly, amongst the conventional techniques, the lowest EMC, at 0.2954, was for WOET3;
this was not the model chosen according to the ACC rate, which was highest for MDA at 85.42 per cent.
Secondly, amongst all the PNN models, the lowest EMC, at 0.3434, was for PNN6; this was not the
model chosen according to the ACC rate, which was highest for PNN3 at 89.58 per cent. Thirdly,
amongst the MLFN models, the lowest EMC, at 0.2968, was for MLFN7, which was also one of the
models chosen according to the ACC rate, at 86.98 per cent. Fourthly, amongst the BNS models, the
lowest EMC, at 0.1623, was for BNS4-MLFN-6N, which also had the highest ACC rate. Finally,
comparing all the techniques, the lowest-EMC criterion leads to selecting BNS4-MLFN-6N, with a
minimum cost of 0.1623; this model also provides the highest ACC rate, at 91.15 per cent.
Nevertheless, the ACC rates calculated in this paper are more reliable, while the EMCs are more
subjective.
Table AI.
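The selection rule described in the notes can be sketched as follows, using only the conventional-technique rows of Table AI. The list layout and the helper names `best_by_acc` and `best_by_emc` are illustrative assumptions rather than anything defined in the paper; the figures are copied from the table.

```python
# (model, testing ACC rate in per cent, estimated misclassification cost)
# for the conventional techniques listed in Table AI.
CONVENTIONAL = [
    ("WOE", 51.04, 0.4931900),
    ("WOET1", 65.10, 0.4527670),
    ("WOET2", 74.31, 0.3810980),
    ("WOET3", 80.73, 0.2954230),
    ("MDA", 85.42, 0.3696765),
    ("LR", 82.81, 0.6997285),
]


def best_by_acc(rows):
    """Return the row with the highest ACC rate."""
    return max(rows, key=lambda row: row[1])


def best_by_emc(rows):
    """Return the row with the lowest estimated misclassification cost."""
    return min(rows, key=lambda row: row[2])


acc_choice = best_by_acc(CONVENTIONAL)
emc_choice = best_by_emc(CONVENTIONAL)

print("Chosen by ACC rate:", acc_choice[0])
print("Chosen by EMC:", emc_choice[0])
print("Criteria agree:", acc_choice[0] == emc_choice[0])
```

For the conventional techniques the two criteria pick different models, which is the disagreement the notes describe; applying the same two helpers to the BNS rows would show the case where both criteria select the same net.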

About the author


Hussein A. Abdou is a Lecturer in Finance and Banking at the University of Salford Business
School, UK, and member of the RMA (Risk Management Association), serving the Financial
Services Industry (USA). He is also affiliated with the Economics and Finance Research Unit of
the Peninsula Centre for Sustainable Governance (PCSG) at The University of Plymouth
Business School, UK. As an active researcher in finance, he has published widely in international
journals, including Expert Systems with Applications, Journal of International Business Strategy,
and Banks and Bank Systems. Hussein A. Abdou can be contacted at: h.abdou@salford.ac.uk
