

Journal of the Operational Research Society (2011) 62, 1067–1074 © 2011 Operational Research Society Ltd. All rights reserved. 0160-5682/11

www.palgrave-journals.com/jors/

Statistical merging of rating models


S Figini∗ and P Giudici
University of Pavia, Pavia, Italy
In this paper we introduce and discuss statistical models aimed at predicting default probabilities of Small
and Medium Enterprises (SMEs). Such models are based on two separate sources of information: quantitative
balance sheet ratios and qualitative information derived from the opinion mining process on unstructured data.
We propose a novel methodology for data fusion in longitudinal and survival duration models, using quantitative
and qualitative variables separately in the likelihood function and then combining their scores linearly by a
weight to obtain the corresponding probability of default for each SME. With a real financial database at
hand, we have compared the results achieved in terms of model performance and predictive capability using
single models and our own proposal. Finally, we select the best model in terms of out-of-sample forecasts
considering key performance indicators.
Journal of the Operational Research Society (2011) 62, 1067 – 1074. doi:10.1057/jors.2010.41
Published online 12 May 2010

Keywords: predictive models; Bayesian merging; probability of default; parametric models; survival analysis;
model selection

1. Introduction

Over the last 40 years, an impressive amount of theoretical and empirical research has been produced concerning the analysis and prediction of default risk. Three main approaches can be distinguished.

A first approach, usually relying on accounting-based indicators, deals with the identification of an appropriate regression technique. The aim of these studies is to define a limited set of explicative variables together with a classification procedure that should be able to discriminate between safe and risky debtors (see eg Altman and Sabato, 2006). A second approach relies on the option-pricing theory stemming from Merton's work (Merton, 1974). A third approach derives a relation between the credit spreads of risky securities and their issuers' default probability (see eg Duffie and Singleton, 1997).

Recently, a series of non-parametric methods from machine learning, data mining, artificial intelligence and operations research have been employed. For an exhaustive review of such approaches see, for example, Crook et al, 2006 and Abrahams et al, 2008 and 2009.

In this paper we present a novel approach to credit risk estimation. Instead of choosing the best model from a set of models, we ask whether performance can be improved by combining the outputs of several models. This has been an area of growing research in recent years and is related to developments in the data fusion literature (see eg Webb, 2003). Considering the problem at hand, two sources of information are available to estimate the probability of default for each statistical unit of interest: quantitative financial ratios and categorical variables derived from business expertise. Our general idea is to develop longitudinal predictive models and survival duration models, first using the two data sources separately in the likelihood function, and then combining their scores linearly by a weight to obtain the corresponding probability of default for each statistical unit.

By using a real data set of German Small and Medium Enterprises (SMEs), we describe how our proposal works in terms of out-of-sample forecasting on the basis of the confusion matrix and related key performance indicators. We compare the results achieved through our approach as opposed to standard ones, based on a one-step analysis of combined data sources. As a result, we observe that our approach, described in detail in Section 3, performs better than classical models for credit risk estimation. Furthermore, we point out that our contribution is theoretically justified in the Bayesian literature (see eg Bernardo and Smith, 1994).

The paper is structured as follows: Section 2 describes the statistical models employed, while Section 3 presents our proposal to derive a two-step merging model. In order to compare the models, key performance indicators are described in Section 4. Section 5 reports the empirical application. Finally, conclusions and further ideas for research are reported in Section 6.

∗Correspondence: S Figini, Department of Statistics and Applied Economics "Libero Lenti", University of Pavia, Strada Nuova 65, 27100 Pavia, Italy. E-mail: silvia.figini@unipv.it

2. Predictive models for default estimation

Most rating agencies usually analyze each company on site and evaluate the probability of default (PD) on the basis of different quantitative financial criteria considered in a single year, or over a multiple-year time horizon.

However, this kind of evaluation does not take into account unstructured and qualitative information for PD estimation. In our opinion, unstructured information—such as business knowledge and expert opinions—is relevant and should be considered for the rating measurement. Unstructured data is not directly integrable in a statistical model and, for this reason, information extraction processes such as opinion mining are required to derive qualitative characteristics of SMEs.

Opinion analysis is an active research topic in data mining and knowledge discovery. So far two strategies have been mainly pursued in opinion analysis: one is based on linguistic knowledge about subjective language (see eg Wiebe et al (2005) and Ku et al (2006)); the other applies machine learning methods, incorporating linguistic features, to some concrete opinion analysis task (see eg Choi et al (2005), Riloff et al (2006), Lin et al (2006)).

In order to extract qualitative information, in this paper we have used an opinion-mining prototype for text analysis proposed by the University of Sheffield, a partner in the MUSING project (European Project on Multi-Industry Semantic-Based Business Intelligence Solutions; for more details, see http://www.musing.eu/). More precisely, MUSING proposes to employ Support Vector Machine (SVM) implementations inside the GATE system (Li et al, 2004). For a binary classification problem—like default in SMEs—the SVM tries to find a hyper-plane in the feature space which separates the positive training examples from the negative ones. As a result, the system proposed is able to derive categorical qualitative variables with ordinal levels starting from unstructured documents.
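The extraction step can be pictured with a generic text-classification pipeline. The sketch below is not the MUSING/GATE implementation; it is a minimal stand-in using scikit-learn, with an invented toy corpus and invented ordinal labels, showing how an SVM over lexical features can map unstructured text to an ordinal qualitative variable.

```python
# A minimal sketch of SVM-based opinion classification in the spirit of the
# pipeline described above -- NOT the actual MUSING/GATE system; the corpus,
# labels and features here are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical snippets from analyst reports, labelled with an ordinal
# opinion level (0 = negative, 1 = neutral, 2 = positive).
docs = [
    "payment reminders sent repeatedly, irregular settlements",
    "order situation stable, no arrears reported",
    "strong order book and timely payments",
]
labels = [0, 1, 2]

# TF-IDF features feed a linear SVM, which finds a separating hyper-plane
# in the feature space, as in the binary-default case described above.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(docs, labels)

# The predicted ordinal level becomes a categorical covariate X*_j.
print(clf.predict(["late payments and declining orders"]))
```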
The problem at hand can be formalised as follows: X_1, …, X_k is a set of k quantitative financial ratios; X*_1, …, X*_p are p categorical variables derived from opinions and business knowledge; and Y is the target binary variable. All of these variables are measured for a common set of n statistical units at T subsequent times.

On the basis of the information collected, we have to estimate, for each SME, a measure of relative risk. Our aim is to propose a novel methodology, described in Section 3, to merge quantitative and categorical data in credit risk estimation, comparing logistic parametric longitudinal models and semi-parametric duration models based on survival analysis.
2.1. Parametric longitudinal models (LPM)

In this section we introduce how to model default on the basis of a longitudinal predictive approach. For observation i (i = 1, …, n) and time t (t = 1, …, T), let Y_it denote the binary response solvency variable, let X_1it, …, X_kit denote a set of k quantitative candidate predictors, and X*_1it, …, X*_pit the p qualitative candidate predictors.

Considering the quantitative and the qualitative data, the elements of Y_t = (y_1t, …, y_nt) are modelled as conditionally independent random variables from a simple exponential family:

π(Y_it | X_1it, …, X_kit) ∝ exp{ [Y_it θ_it − b(θ_it)] / a_it(φ) + c(Y_it, φ) },   (1)

π*(Y_it | X*_1it, …, X*_pit) ∝ exp{ [Y_it θ*_it − b(θ*_it)] / a_it(φ) + c(Y_it, φ*) },   (2)

where θ_it is the canonical parameter related to the linear predictor η_it = X_it β, with β a k × 1 vector of fixed-effects regression coefficients, φ a scalar dispersion parameter, and a_it, b, c known functions with a_it(φ) = φ/ω_it, where ω_it is a known positive weight. Equation (2) reports the same parameters for the qualitative data.

We are interested in predicting the expectation of the response as a function of the covariates. For the problem at hand the logit link is appealing because it produces a linear model for the log of the odds, ln{ π(Y_it = 1 | X_1it, …, X_kit) / [1 − π(Y_it = 1 | X_1it, …, X_kit)] } for the quantitative data and ln{ π*(Y_it = 1 | X*_1it, …, X*_pit) / [1 − π*(Y_it = 1 | X*_1it, …, X*_pit)] } for the qualitative data, implying a multiplicative model for the odds themselves (for more details, see eg Dobson, 2002).

As a result we derive the corresponding π_i (probability of default estimated following Equation (1)) and π*_i (probability of default estimated following Equation (2)) for each SME i = 1, …, n. As pointed out before, Equations (1) and (2) consider quantitative and qualitative risk factors, respectively.
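As an illustration, the two separate logit fits can be sketched as follows. This is a simplified stand-in, not the authors' estimation code: the file name and column names are hypothetical, and the pooled logistic fit ignores the longitudinal (panel) correlation structure that a full LPM would model.

```python
# A minimal sketch of the two separate logit fits behind Equations (1)-(2):
# one model on quantitative ratios, one on qualitative opinion variables.
# File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("sme_panel.csv")          # hypothetical panel: one row per SME-year
quant_cols = ["equity_ratio", "liabilities_ratio", "result_ratio"]
qual_cols = ["ZwsUrt", "KdtUrt", "Entw"]   # ordinal levels from opinion mining

glm_quant = LogisticRegression(max_iter=1000).fit(df[quant_cols], df["default"])
glm_qual = LogisticRegression(max_iter=1000).fit(df[qual_cols], df["default"])

# pi_i and pi*_i: two separate PD scores per SME, to be merged in Section 3.
pi_quant = glm_quant.predict_proba(df[quant_cols])[:, 1]
pi_qual = glm_qual.predict_proba(df[qual_cols])[:, 1]
```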
2.2. Semi-parametric duration models (SDM)

We will now show how to model the probability that a company may default in a given year, following a semi-parametric duration model. Even if the time to default could be viewed, in principle, as a continuous variable, data concerning a company's survival or default are available on a discrete time basis, usually yearly. Indeed, what we can typically observe is whether a specific company survives or defaults in a given time interval.

Let T_i be the duration time expressed by the duration of the relationship between the ith SME and the bank. The instantaneous default risk or hazard rate h_i(t) for the ith SME at time t is defined as:

h_i(t) = lim_{Δt→0} P(t ≤ T_i < t + Δt | T_i ≥ t) / Δt.

The hazard rate is assumed to take the Cox proportional hazards form for the quantitative

h_i(t | X_1i, …, X_ki) = h_0(t) exp{β_1 X_1i + ⋯ + β_k X_ki},   (3)

and qualitative variables:

h*_i(t | X*_1i, …, X*_pi) = h_0(t) exp{β*_1 X*_1i + ⋯ + β*_p X*_pi},   (4)

where, in Equations (3) and (4), h_0(t) is the baseline hazard at time t and β is a vector of regression coefficients.

This model implies that the hazards ratio is constant over time for two SMEs, provided that the covariates do not change. In more detail, there are n SMEs and, associated with the ith SME (i = 1, …, n), there is a survival time t_i and a fixed censoring time c_i. The t_i times are assumed to be independent and identically distributed with density f(t) and survival function S(t). The exact survival time t_i of an SME will be observed only if t_i ≤ c_i. Data in this framework can be represented by the n pairs of random variables (y_i, v_i), where

y_i = min(t_i, c_i),

and v_i is 1 if t_i ≤ c_i and 0 otherwise.

The likelihood function for a set of right-censored quantitative data on n SMEs is given by:

L(β, h_0(t) | D) ∝ ∏_{i=1}^{n} [h_0(y_i) exp(η_i)]^{v_i} (S_0(y_i))^{exp(η_i)}
               = ∏_{i=1}^{n} [h_0(y_i) exp(η_i)]^{v_i} exp{ −Σ_{i=1}^{n} exp(η_i) H_0(y_i) },

where D = (n, y, X, v), y = (y_1, …, y_n)′, v = (v_1, …, v_n)′, η_i = x_i′β is the linear predictor for SME i, x_i is a vector of quantitative covariates for SME i, and X is a matrix of quantitative covariates composed of n SMEs (rows) and k quantitative variables (columns). Here S_0 denotes the baseline survival function and H_0 the cumulative baseline hazard.

A similar likelihood function holds for a set of right-censored qualitative data on n SMEs:

L(β*, h_0(t) | D*) ∝ ∏_{i=1}^{n} [h_0(y_i) exp(η*_i)]^{v_i} (S_0(y_i))^{exp(η*_i)}
                 = ∏_{i=1}^{n} [h_0(y_i) exp(η*_i)]^{v_i} exp{ −Σ_{i=1}^{n} exp(η*_i) H_0(y_i) }.

A very important remark is that the Cox model generates survival functions that are adjusted for covariate values. More precisely, it is possible to derive the probability that a company may default in a given year (T = t_i), conditional upon its being solvent up to that point in time, on the basis of quantitative and qualitative variables. In particular, S_i(X_1i, …, X_ki | T = t) and S*_i(X*_1i, …, X*_pi | T = t) can be viewed as π_i and π*_i, respectively.
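A sketch of the two Cox fits, and of the covariate-adjusted survival probabilities used as scores, might look as follows. It relies on the third-party lifelines package; the data layout, file and column names are hypothetical, and the baseline hazard is estimated separately in each fit rather than shared across Equations (3) and (4).

```python
# A minimal sketch of the two Cox fits in Equations (3)-(4) using lifelines;
# file and column names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("sme_survival.csv")  # one row per SME: duration, event, covariates
quant_cols = ["equity_ratio", "liabilities_ratio", "result_ratio"]
qual_cols = ["ZwsUrt", "KdtUrt"]

cox_quant = CoxPHFitter().fit(
    df[quant_cols + ["years_observed", "defaulted"]],
    duration_col="years_observed", event_col="defaulted")
cox_qual = CoxPHFitter().fit(
    df[qual_cols + ["years_observed", "defaulted"]],
    duration_col="years_observed", event_col="defaulted")

# Covariate-adjusted survival curves; the survival probability at a given
# horizon t yields the scores that Section 2.2 denotes S_i(. | T = t).
t = 1.0
S_quant = cox_quant.predict_survival_function(df[quant_cols], times=[t]).loc[t]
S_qual = cox_qual.predict_survival_function(df[qual_cols], times=[t]).loc[t]
```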

3. Merging model: a proposal

In this section we describe how to combine the quantitative and qualitative risk scores π_i and π*_i for the n SMEs at hand. Our proposal is well justified in the Bayesian paradigm. In general, the Bayesian framework provides a unified and intuitively appealing approach to the problem of drawing inferences from observations. Bayesian statistics views statistical inference as a problem in belief dynamics, and it uses evidence about a phenomenon to revise and update the knowledge about it. Following Bayesian theory (see eg Bernardo and Smith, 1994), it is scientifically justifiable to integrate informed expert judgment with empirical data in a two-step approach. Considering the problem at hand, expert judgments are represented by qualitative data, while empirical data are derived from quantitative information.

More precisely, following the predictive models described in Section 2, for each SME i = 1, …, n we observe a score π_i, derived from quantitative information, and a score π*_i, derived from qualitative information.

Our proposal is based on the following steps (a code sketch follows the list):

1. Starting from the original training data D of size n, composed of k quantitative variables and p qualitative variables, we generate r new training sets by sampling examples from D uniformly and with replacement. As a result, we obtain r bootstrapped training sets.
2. We build r predictive models on the r bootstrapped training data sets derived from step 1, separately for the quantitative and qualitative variables.
3. Considering the quantitative data on the r bootstrapped data sets, we compute the variance of the π_i as: σ²(π_i) = 1/(r − 1) Σ_{j=1}^{r} (π_ij − π̄_i)², where π̄_i = 1/r Σ_{j=1}^{r} π_ij.
4. Considering the qualitative data on the r bootstrapped data sets, we compute the variance of the π*_i as: σ²(π*_i) = 1/(r − 1) Σ_{j=1}^{r} (π*_ij − π̄*_i)², where π̄*_i = 1/r Σ_{j=1}^{r} π*_ij, j = 1, …, r.
5. We derive the weight w_i = [1/σ²(π_i)] / [1/σ²(π_i) + 1/σ²(π*_i)] = σ²(π*_i) / [σ²(π*_i) + σ²(π_i)].
6. The final probability of default for each SME is a linear combination of π and π* weighted by w_i, computed as: PD_i = w_i π_i + (1 − w_i) π*_i, i = 1, …, n.

We remark that the weight w_i must satisfy suitable regularity conditions: σ²(π*_i) + σ²(π_i) ≠ 0, σ²(π_i) < ∞ and σ²(π*_i) < ∞.
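A minimal sketch of steps 1–6 is given below. The functions fit_quant and fit_qual are hypothetical stand-ins for fitting the LPM or SDM of Section 2 on the quantitative and qualitative variables, respectively; each is assumed to return an object exposing a pd_score(data) method yielding a PD per SME.

```python
# A minimal sketch of the six-step merging procedure; fit_quant/fit_qual are
# hypothetical callables, and `data` is assumed to be a pandas DataFrame.
import numpy as np

def merged_pd(data, fit_quant, fit_qual, r=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    pi_boot = np.empty((r, n))
    pi_star_boot = np.empty((r, n))
    for j in range(r):                                 # steps 1-2
        boot = data.iloc[rng.integers(0, n, size=n)]   # resample with replacement
        pi_boot[j] = fit_quant(boot).pd_score(data)
        pi_star_boot[j] = fit_qual(boot).pd_score(data)
    var_q = pi_boot.var(axis=0, ddof=1)                # step 3: variance of pi_i
    var_s = pi_star_boot.var(axis=0, ddof=1)           # step 4: variance of pi*_i
    # step 5: inverse-variance weight; the regularity condition above
    # guarantees a non-zero denominator
    w = var_s / (var_s + var_q)
    pi = fit_quant(data).pd_score(data)                # scores on the original data
    pi_star = fit_qual(data).pd_score(data)
    return w * pi + (1 - w) * pi_star                  # step 6: merged PD
```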
The resulting PD_i in step 6 can be interpreted, in a Bayesian sense, as a posterior probability for the models proposed (see eg Bernardo and Smith, 1994). For example, let y_i indicate the default variable, where y_i = 0 means a good SME and y_i = 1 means a bad SME. We can assume that the y_i are i.i.d. distributed according to a Bernoulli probability distribution function:

p(y_i | θ) = θ^{y_i} (1 − θ)^{1−y_i},  θ ∈ (0, 1).   (5)

We assume that the prior distribution of θ is:

h(θ) = [1/B(α_0, β_0)] θ^{α_0−1} (1 − θ)^{β_0−1},  B(α_0, β_0) = Γ(α_0)Γ(β_0)/Γ(α_0 + β_0).   (6)

It follows that the posterior probability distribution of θ is:

h(θ | y) ∝ θ^{Σ y_i + α_0 − 1} (1 − θ)^{n − Σ y_i + β_0 − 1},   (7)

where y is the n × 1 vector of the observed defaults (y_i = 0 means a good SME, y_i = 1 means a bad SME).

Such a posterior corresponds to a Beta(α, β) probability distribution, with α = Σ_{i=1}^{n} y_i + α_0 and β = n − Σ_{i=1}^{n} y_i + β_0.

Setting Σ_{i=1}^{n} y_i = k, where k is the observed frequency of default, the posterior expectation of θ is:

E(θ | y) = (k + α_0) / (n + α_0 + β_0)
         = [n / (n + α_0 + β_0)] · (k/n) + [(α_0 + β_0) / (n + α_0 + β_0)] · [α_0 / (α_0 + β_0)].   (8)

If we set w_i = n / (n + α_0 + β_0), π_i = k/n and π*_i = α_0 / (α_0 + β_0), we obtain the merging formula of step 6 in the special case in which the scores π_i and π*_i are constant among SMEs.

A further extension considers the multinomial probability distribution with a Dirichlet prior (see eg Bernardo and Smith, 1994).
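A small numeric check makes the decomposition in Equation (8) concrete; the portfolio size, default count and Beta prior parameters below are purely illustrative.

```python
# Numeric check of Equation (8): the posterior mean is a weighted average of
# the observed default rate k/n and the prior mean a0/(a0 + b0).
n, k = 100, 8          # hypothetical portfolio: 8 defaults out of 100 SMEs
a0, b0 = 2.0, 18.0     # hypothetical Beta prior (prior mean 0.10)

post_mean = (k + a0) / (n + a0 + b0)
w = n / (n + a0 + b0)
combined = w * (k / n) + (1 - w) * (a0 / (a0 + b0))
print(post_mean, combined)   # both 0.0833..., as Equation (8) states
```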
4. Model selection and evaluation

In order to compare different models, the empirical literature typically uses criteria based on statistical tests (see Burnham and Anderson, 1998), criteria based on scoring functions (see Akaike, 1973; Schwarz, 1978; Vapnik, 1998), computational criteria (see eg Hastie et al, 2001) and criteria based on loss functions (see eg Kohavi and John, 1997).

Considering the problem at hand, a clear comparison between the models described in Section 2 can be derived using the confusion matrix (see eg Kohavi and John, 1997). Table 1 reports a theoretical confusion matrix containing the number of elements that have been correctly or incorrectly classified for each class.

Table 1  Theoretical confusion matrix

  Observed/Predicted   ŷ_i = 1   ŷ_i = 0
  y_i = 1                 a         b
  y_i = 0                 c         d

In the context of our study, for a given cut-off p (0 < p < 1), the entries in the confusion matrix have the following meaning: a is a true positive (for example, a bad SME is classified as bad); c is a false positive (for example, a good SME is classified as bad); d is a true negative (for example, a good SME is classified as good); b is a false negative (for example, a bad SME is classified as good). In principle, each of these outcomes would have some associated loss or reward.

The cut-off point can be selected taking into account the false-negative classification and the a priori incidence of the target variable (P opt), or the value at which sensitivity and specificity are equal (P fair). It is also possible to derive a cut-off maximising the Kappa statistic (see eg Cohen, 1960).

A related instrument to validate the performance of a predictive model for probabilities is the Receiver Operating Characteristics (ROC) curve (Hastie et al, 2001; Figini and Giudici, 2009). A recommended index of accuracy associated with a ROC curve is the Area Under the Curve (AUC), a threshold-independent measure of predictive performance. For such a measure, bootstrapped confidence intervals can be calculated with the percentile method (Hosmer and Lemeshow, 2000). Furthermore, statistical tests based on the χ² statistic can be useful to compare different models on the basis of the AUC (see eg DeLong et al, 1988).
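The indicators of this section can be assembled into a single evaluation routine, sketched below with scikit-learn; y_true and pd_scores stand for hypothetical held-out labels and PD scores, and the confidence interval follows the bootstrap percentile method. The DeLong test is not reproduced here.

```python
# A minimal sketch of the evaluation toolkit of this section; inputs are
# hypothetical held-out labels (0/1) and predicted PD scores.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

def evaluate(y_true, pd_scores, cutoff=0.5, n_boot=1000, seed=0):
    y_true = np.asarray(y_true)
    pd_scores = np.asarray(pd_scores)
    y_hat = (pd_scores >= cutoff).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)      # true-positive rate on bad SMEs
    specificity = tn / (tn + fp)      # true-negative rate on good SMEs
    kappa = cohen_kappa_score(y_true, y_hat)
    auc = roc_auc_score(y_true, pd_scores)
    # bootstrap percentile confidence interval for the AUC
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(y_true), size=(n_boot, len(y_true)))
    aucs = [roc_auc_score(y_true[i], pd_scores[i])
            for i in idx if y_true[i].min() != y_true[i].max()]
    ci = np.percentile(aucs, [2.5, 97.5])
    return sensitivity, specificity, kappa, auc, ci
```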
5. Application

Our empirical analysis is based on annual 1999 to 2004 data from Creditreform, one of the major rating agencies for SMEs in Germany, covering 1003 firms belonging to 352 different business sectors.

When handling bankruptcy data it is natural to label one of the categories as success (healthy) or failure (default) and to assign them the values 0 and 1, respectively. Our data set consists of a binary response variable (solvency), Y_it, and a set of explanatory variables: X_1it, …, X_kit, quantitative financial ratios, and X*_1it, …, X*_pit, qualitative features. The sample available is composed of about 1000 SMEs.

Considering the quantitative variables, the balance sheet is divided into two parts which, based on the following equation, must equal (or balance out) each other: assets = liabilities + equity. This means that assets, or the means used to operate the company, are balanced by a company's financial obligations along with the equity investment brought into the company and its retained earnings.

Given this understanding of our balance sheet data and how it is constructed, we can discuss some techniques used to analyze the information therein contained. This is mainly done through financial ratio analysis. Financial ratio analysis uses formulas to gain insight into a company and its operations. Based on the balance sheet and using financial ratios (like the debt-to-equity ratio), it can give a better idea of a company's financial condition along with its operational efficiency. It is important to note that some ratios will need information from more than one financial statement, such as the balance sheet and the income statement.

The main types of ratios that use information from the balance sheet are financial strength ratios and activity ratios. Financial strength ratios, such as the debt-to-equity ratio, provide information on how well the company can meet its obligations and how it is leveraged. This can give investors an idea of how financially stable a company is and how the company finances itself. Activity ratios focus mainly on current accounts to show how well a company manages its operating cycle. These ratios can provide insight into the company's operational efficiency.

There is a wide range of individual financial ratios that Creditreform uses to learn more about a company. Given our available dataset, we computed a set of 11 financial ratios suggested by Creditreform based on its experience:

• Supplier target days: a temporal measure of financial sustainability, expressed in days, that considers all short- and medium-term debts as well as other payables.

• Outside capital structure: this ratio evaluates a firm's capability to receive forms of financing other than banks' loans.
• Cash ratio: this ratio indicates the cash a company can generate in relation to its size.
• Capital tied up: this ratio evaluates the turnover of short-term debts with respect to sales.
• Equity ratio: it measures a company's financial leverage, calculated by dividing a particular measure of equity by the firm's total assets.
• Cash flow to effective debt: this ratio indicates the cash a company can generate in relation to its size and debts.
• Cost income ratio: an efficiency measure, similar to the operating margin, which is useful to measure how costs are changing compared to income.
• Trade payable ratio: this ratio reveals how often the firm's payables turn over during the year; a high ratio means a relatively short time between purchase of goods and services and their payment, while a low ratio may be a sign that the company has chronic cash shortages.
• Liabilities ratio: a measure of a company's financial leverage, calculated by dividing a gross measure of long-term debt by the firm's assets; it also highlights what proportion of debt the company is using to finance its assets.
• Result ratio: an index of how profitable a company is relative to its total assets; it gives an idea as to how efficient management is at using its assets to generate earnings.
• Liquidity ratio: this ratio measures the extent to which a firm can quickly liquidate assets and cover short-term liabilities. It is therefore of interest to short-term creditors.

Furthermore, we considered the following additional annual account positions, which were standardised in order to avoid computational problems with the previous ratios (a computational sketch follows the list):

• Total assets: the sum of current and long-term assets owned by a firm.
• Total equity: total assets minus total liabilities; it is also referred to as equity, net worth or book value.
• Total liabilities: all the current liabilities, long-term debt, and any other miscellaneous liabilities a company may have.
• Net income: the income that a firm has after subtracting costs and expenses from the total revenue.
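For illustration only, a few of the ratios above can be computed from these account positions as in the sketch below; the formulas are simplified textbook versions rather than Creditreform's exact definitions, and the file and column names are hypothetical.

```python
# A minimal sketch of ratio computation and standardisation; simplified
# textbook formulas, NOT Creditreform's proprietary definitions.
import pandas as pd

bs = pd.read_csv("balance_sheets.csv")   # hypothetical annual account positions

ratios = pd.DataFrame({
    "equity_ratio": bs["total_equity"] / bs["total_assets"],
    "liabilities_ratio": bs["long_term_debt"] / bs["total_assets"],
    "result_ratio": bs["net_income"] / bs["total_assets"],
})

# Standardise the raw positions, as described above, to keep them on a
# scale comparable with the ratios.
positions = bs[["total_assets", "total_equity", "total_liabilities", "net_income"]]
standardised = (positions - positions.mean()) / positions.std()
```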
As pointed out in Section 2, we have derived the following categorical variables on the basis of opinions expressed by Creditreform business experts:

• KdtUrt: this information is relevant in order to define whether the business relationship between an SME and Creditreform is acceptable or is not recommended. This feature is based on past credit decisions, and it shows a value of 0 if the relationship is acceptable and 1 otherwise.
• ZwsUrt: this variable summarises the payment history for each SME. The levels are 0 if payments are on time and 1 if irregular payments are present. Furthermore, textual descriptions highlight reminders to encourage payment for each SME.
• Entw: this variable describes the company's development level. A level equal to 2, 1 or 0 means, respectively, a positive, a stagnating or a declining company development.
• Auft: a categorical variable that reports the order situation for each SME. If the order situation is good, the variable is equal to 2; if the order situation is declining or bad, it is equal to 0 or 1.
• AnzMta: this feature is a grouped variable derived from quantitative information. We have computed three groups composed of different numbers of employees in relation to special company structures.

We remark that all of the previous variables are true expert opinions, although most of them are based on available objective information recorded in a subjective way.

5.1. Exploratory analysis

In this section we report univariate statistical measures based on variability and tendency for the quantitative financial ratios, and exploratory heterogeneity indexes for the qualitative features.

The results derived on the qualitative features are summarised in Table 2, which reports classical heterogeneity indexes (such as Gini and the entropy) for the qualitative variables considered, for good SMEs (Solvency = 0) and bad SMEs (Solvency = 1), respectively.

Table 2  Heterogeneity measures for qualitative features

                                           Year    AnzMta   Rfo     ZwsUrt   KdtUrt   Auft    Entw
  Solvency = 0, Gini Heterogeneity Index   0.952   0.830    0.617   0.009    0.110    0.612   0.571
  Solvency = 0, Entropy Index              0.910   0.800    0.559   0.014    0.148    0.538   0.460
  Solvency = 1, Gini Heterogeneity Index   0.773   0.923    0.737   0.633    0.135    0.834   0.836
  Solvency = 1, Entropy Index              0.581   0.895    0.706   0.522    0.200    0.792   0.815

In order to derive information about the relationship between each variable and the solvency, bivariate explorative analyses on the data available are reported in Tables 3 and 4. Table 3 shows the pairwise correlations computed between each quantitative variable and a quantitative variable provided by Creditreform to measure the solvency (index of creditworthiness). Based on the p-value, the significant variables are: outside capital structure, cash ratio, equity ratio, cash flow to effective debt, trade payable ratio, liabilities ratio, result ratio and liquidity ratio.

Table 3  Quantitative risk factors: discriminant power

  Variable                      Coefficient of correlation   p-value
  Supplier target days                  −0.017                0.6073
  Outside capital structure              0.169              < 0.0001
  Cash ratio                            −0.121                0.0002
  Capital tied up                        0.032                0.3299
  Equity ratio                          −0.146              < 0.0001
  Cash flow to effective debt           −0.096                0.0036
  Cost income ratio                      0.021                0.5306
  Trade payable ratio                    0.096                0.0034
  Liabilities ratio                      0.287              < 0.0001
  Result ratio                          −0.221              < 0.0001
  Liquidity ratio                       −0.116                0.0004

Table 4  Qualitative risk factors: discriminant power

  Qualitative variable   Cramer index   p-value
  AnzMta                     0.16      < 0.0001
  ZwsUrt                     0.99      < 0.0001
  KdtUrt                     0.947     < 0.0001
  Auft                       0.687     < 0.0001
  Entw                       0.710     < 0.0001

Table 4 reports the pairwise qualitative relationships based on the Cramer index (which ranges between 0 and 1). As we can observe, the discriminant variables are the company development (Entw), the credit decision (KdtUrt) and the payment experience/history (ZwsUrt).

5.2. Parametric and semi-parametric inferential analysis: LPM and SDM

In this section we report the inferential results for the predictive models described in Section 2. In particular, Tables 5 and 6 show the quantitative and qualitative chosen risk factors based on the LPM model.

Table 5  Quantitative data results: chosen risk factors (LPM)

  Quantitative variables      Estimate   Std. Error   p-value
  (Intercept)                 −17.312      1.541     < 0.0001
  Outside capital structure     1.371      0.681       0.043
  Liabilities ratio             4.200      1.561       0.007
  Result ratio                −12.132      2.277     < 0.0001

Table 6  Qualitative data results: chosen risk factors (LPM)

  Qualitative variables   Estimate   Std. Error   p-value
  (Intercept)              −0.447      0.150       0.002
  ZwsUrt                   −2.17746    0.39617     0.0001

The capability of an SME to receive forms of financing other than banks' loans (outside capital structure), the proportion of debt an SME is using to finance its assets (liabilities ratio) and how profitable an SME is relative to its total assets (result ratio) are the relevant features linked with the solvency indicator. In contrast, only the payment experience (ZwsUrt) is a relevant piece of qualitative information.

The quantitative SDM model, reported in Table 7, shows the result ratio, the liabilities ratio, a temporal measure of financial sustainability expressed in days covering short- and medium-term debts (supplier target days) and the cost income ratio as significant. Again, in this model, the significant qualitative variable is the payment experience (ZwsUrt), as reported in Table 8.

Table 7  Quantitative data results: chosen risk factors (SDM)

  Quantitative variables   Estimate   Std. Error   p-value
  Result ratio               1.088      0.389     < 0.0001
  Supplier target days       0.156      0.051     < 0.0001
  Cost income ratio         −0.360      0.181     < 0.0001
  Liabilities ratio         −1.167      0.211     < 0.0001

Table 8  Qualitative data results: chosen risk factors (SDM)

  Qualitative variables   Estimate   Std. Error   p-value
  ZwsUrt                    0.515      0.1749      0.0032

We remark that the results achieved on the basis of the LPM and SDM models confirm business practice and the scientific findings (see eg Altman and Sabato, 2006).

Tables 9 and 10 report the results achieved on the basis of a single model approach.

Table 9  Chosen risk factors for one-step LPM

  Variables           Estimate   Std. Error   p-value
  (Intercept)          −6.454      1.67      < 0.0001
  Result ratio        −10.732      2.44      < 0.0001
  Liabilities ratio     3.952      1.716       0.021
  ZwsUrt                0.527      0.101     < 0.0001

Table 10  Chosen risk factors for one-step SDM

  Variables      Estimate   Std. Error   p-value
  Result ratio   −11.662      1.88      < 0.0001
  ZwsUrt           0.363      0.080     < 0.0001

On the basis of a single model approach based on the LPM, the significant variables are: the result ratio, the liabilities ratio and the information related to the payment experience for each SME. The SDM selects the result ratio and ZwsUrt as relevant variables.

Table 11  Model comparison based on AUC

  Model                                             Quantitative   Qualitative   Merged model   Single model
  Parametric longitudinal predictive models (LPM)       0.868         0.795          0.909          0.837
  Semi-parametric duration models (SDM)                 0.79          0.687          0.812          0.818

5.3. Model comparison

To assess the predictive performance of each model, we implemented a cross-validation procedure. We used 70% of the observations as training set and 30% as validation set. For each model, we computed the AUC and related tests. The results are summarised in Table 11.

In Table 11, the quantitative and qualitative columns report the results for each model derived from quantitative and qualitative variables. The merged column shows the results based on our integrated approach described in Section 3, and the last column reports the results achieved using a single model approach. As we can observe from Table 11, the proposed merged model performs better than the separate qualitative and quantitative versions of the LPM model. Considering the SDM, we confirm similar evidence, but we highlight that a single model shows similar measures of performance with respect to our proposal.

For each model we have computed the AUC with the relative confidence interval on the basis of a bootstrap percentile method (see eg Hosmer and Lemeshow, 2000). The resulting test shows that the AUCs computed are significantly different from 0.7. In business practice, an AUC bounded between 0.7 and 0.8 means that the discrimination made by the corresponding model is acceptable.

So far we have derived four different models following a merged approach and a single approach. It is also interesting to test whether the models differ in terms of predictive ability. To reach this objective we may compare the performances of two models by testing whether the difference between the estimated AUCs is significant at the 1% level. First we test whether the difference between the merged and the single LPM models is significant. The related test statistic, χ²(1), is equal to 7.89; we reject the hypothesis that the models are equal. Similarly, the difference between the SDM models, with a test statistic χ²(1) equal to 4.94, is significant considering α = 1%.

Following our proposal, on the data at hand the LPM derived with the merging approach shows a clear superiority over the others. In contrast, considering the results in Table 11, we think that the SDM models perform worse than the LPM: they should be improved, for example by taking into account a generalised version of the Cox model. Indeed, we remark that, in our opinion, the Cox proportional hazards model shows some weaknesses as to default modelling, because it assumes that the event times are independent and the risk is proportional across the SMEs.

The selected model is the merged LPM. For the sake of completeness, Table 12 reports the efficiency measures described in Section 4.

Table 12  Merged LPM model: threshold selection

                 P opt   P fair   p kappa   p = 0.5
  P krit         0.32    0.085    0.282     0.5
  Sensitivity    0.508   0.801    0.594     0
  Specificity    0.974   0.819    0.960     1
  % correct      0.915   0.816    0.914     0.875
  Kappa          0.556   0.425    0.588     0

More precisely, Table 12 shows the optimal threshold values for the LPM merged model. In order to classify bad and good SMEs, we have selected a cut-off equal to 0.32, with corresponding specificity and sensitivity equal to 0.974 and 0.508, respectively. Table 13 reports the corresponding confusion matrix.

Table 13  Merged LPM model: corresponding confusion matrix (P opt)

             0         1        Total
  0        85.23%    2.26%     87.49%
  1         6.14%    6.35%     12.49%
  Total    91.37%    8.61%    100%

From Table 13, the proportion of correct classifications is equal to 91.58%; considering the errors, the false negatives and the false positives are respectively equal to 2.26% and 6.14%. In credit risk models, the magnitude as well as the number of correct predictions are a matter of regulatory concern. This concern can be readily incorporated into a so-called loss function by introducing a magnitude term. The best forecasting model becomes the model that produces the smallest expected loss. Following the framework proposed in Granger and Pesaran, 2000 and Fuertes and Kalotychou, 2006, in our application we confirm the merged LPM as the best model.
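A minimal sketch of such a loss-based comparison is given below; the unit costs are purely illustrative, and any asymmetric cost pair reflecting regulatory concern could be substituted.

```python
# A minimal sketch of loss-based model selection in the spirit of the
# framework cited above: each error cell of the confusion matrix gets a
# cost, and the model with the smallest expected loss is preferred.
import numpy as np

def expected_loss(y_true, y_hat, cost_fn=10.0, cost_fp=1.0):
    """Average cost per SME: missed defaults (false negatives) are penalised
    more heavily than false alarms (false positives); costs are illustrative."""
    y_true, y_hat = np.asarray(y_true), np.asarray(y_hat)
    fn = np.sum((y_true == 1) & (y_hat == 0))
    fp = np.sum((y_true == 0) & (y_hat == 1))
    return (cost_fn * fn + cost_fp * fp) / len(y_true)

# Choose among candidate models by comparing expected losses on validation data:
# best = min(models, key=lambda m: expected_loss(y_val, m.predict(X_val)))
```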

6. Concluding remarks

Considering the fundamental role played by small and medium sized enterprises (SMEs) in the economy of many countries, and the considerable attention placed on SMEs in the new Basel Capital Accord, we have compared a set of predictive models to predict default probabilities.

We have compared the results achieved using single models and merged models in terms of model performance and predictive capability.

Empirical evidence collected on a real data source shows that the merged longitudinal predictive model performs better than survival duration models. On the basis of the data at hand, the LPM derived using our proposal for data merging shows a clear superiority in terms of predictive performance. In contrast, in our opinion, the Cox proportional hazards model employed in the SDM models shows some weaknesses for default modelling, because it assumes that the event times are independent and the risk is proportional across the SMEs. We believe that our proposed model can be a good starting point for more accurate risk estimation.

Acknowledgements — This work has been supported by MUSING, contract number 027097, 2006 to 2010. The paper is the result of a collaboration between the authors; however, it has been written by Silvia Figini.

References

Abrahams CR and Zhang M (2008). Fair Lending Compliance: Intelligence and Implications for Credit Risk Management. Wiley: Chichester, NY.
Abrahams CR and Zhang M (2009). Credit Risk Assessment: The New Lending System for Borrowers, Lenders and Investors. Wiley: Chichester, NY.
Akaike H (1973). Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, pp 267–281.
Altman E and Sabato G (2006). Modelling credit risk for SMEs: Evidence from the US market. ABACUS 19(6): 716–723.
Bernardo J and Smith A (1994). Bayesian Theory. Wiley: London.
Burnham KP and Anderson DR (1998). Model Selection and Inference: A Practical Information-Theoretic Approach. Springer-Verlag: New York.
Choi Y, Cardie C, Riloff E and Patwardhan S (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In: Proceedings of the 2005 Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing. Available at the ACL Anthology, a digital archive of research papers in computational linguistics.
Cohen J (1960). A coefficient of agreement for nominal scales. Educ Psychol Meas 20: 37–46.
Crook J, Edelman D and Thomas L (2006). Recent developments in consumer credit risk assessment. Eur J Opl Res 183: 1569–1581.
DeLong E, DeLong D and Clarke-Pearson D (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44: 837–845.
Dobson AJ (2002). Introduction to Generalised Linear Models. Chapman Hall: London.
Duffie D and Singleton KJ (1997). An econometric model of the term structure of interest-rate swap yields. J Financ 52: 1287–1322.
Figini S and Giudici P (2009). Applied Data Mining for Business and Industry. Wiley: London.
Fuertes AM and Kalotychou E (2006). Early warning systems for sovereign debt crises: The role of heterogeneity. Comput Stat Data An 51: 1420–1441.
Granger C and Pesaran M (2000). Economic and statistical measures of forecast accuracy. J Forecasting 19: 537–560.
Hastie T, Tibshirani R and Friedman JH (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag: New York.
Hosmer DW and Lemeshow S (2000). Applied Logistic Regression. Wiley: New York.
Kohavi R and John G (1997). Wrappers for feature subset selection. Artif Int 97: 273–324.
Ku LW, Liang Y-T and Chen H-H (2006). Opinion extraction, summarization and tracking in news and blog corpora. In: Proceedings of the AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs.
Li Y, Bontcheva K and Cunningham H (2004). An SVM based learning algorithm for information extraction. Machine Learning Workshop, Sheffield.
Lin W, Wilson T, Wiebe J and Hauptmann A (2006). Which side are you on? Identifying perspectives at the document and sentence levels. In: Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), June, New York. AAAI Press: Menlo Park, CA, pp 109–116.
Merton RC (1974). On the pricing of corporate debt: The risk structure of interest rates. J Finance 29: 449–470.
Riloff E, Patwardhan S and Wiebe J (2006). Feature subsumption for opinion analysis. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-06). Available at the ACL Anthology, a digital archive of research papers in computational linguistics.
Schwarz G (1978). Estimating the dimension of a model. Ann Stat 6: 461–464.
Vapnik V (1998). Statistical Learning Theory. Wiley: New York.
Webb A (2003). Statistical Pattern Recognition, 2nd edn. Wiley: Chichester, NY.
Wiebe J, Wilson T and Cardie C (2005). Annotating expressions of opinions and emotions in language. Lang Resour Eval 39(2–3): 165–210.

Received April 2009;
accepted February 2010 after two revisions
