
Computational Statistics and Data Analysis 56 (2012) 981–994


Credit scoring analysis using a fuzzy probabilistic rough set model


Andrea Capotorti, Eva Barbanera
Università degli Studi di Perugia, Italy

Article info

Article history: Available online 12 July 2011

Keywords: Rough fuzzy multicriteria classification; Coherent conditional probability assessments; Credit scoring

Abstract

Credit scoring analysis is an important activity, especially nowadays after a huge number of defaults has been one of the main causes of the financial crisis. Among the many different tools used to model credit risk, the recent development of rough set models has proved effective. The original development of rough set theory has been widely generalized and combined with other approaches to uncertain reasoning, especially probability and fuzzy set theories. Since coherent conditional probability assessments cope well with the problem of unifying these different approaches, a merging of fuzzy rough set theory with this subjectivist approach is proposed. Specifically, expert partial probabilistic evaluations are encompassed inside a gradual decision rule structure, with coherence of the conclusion as a guideline. In line with Bayesian rough set models, credibility degrees of multiple premises are introduced through conditional probability assessments. Nonetheless, discernibility with this method remains too fine. Therefore, the basic partition is coarsened by equivalence classes based on the arity of positively, negatively and neutrally related criteria. A membership function, which grades the likelihood of default, is introduced by a peculiar choice of t-norms and t-conorms. To build and test the model, real data related to a sample of firms are used.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

In this paper we propose a hybrid methodology for classification based on the methodologies of rough sets, partial
conditional probability assessments and fuzzy sets. Due to the common concomitance of uncertainty and imprecision in real-life data, the generalization of the original Pawlak rough set theory (Pawlak, 1982) and its combination with probability and fuzzy set theory have already been proposed (see, among others, Dubois and Prade, 1992; Greco et al., 1999b, 2004, 2006, 2007; Yasdi, 1995; Yao, 2008; Yao and Wong, 1992; Yao et al., 1990; Ziarko, 2005, 2008). Even though a further proposal in this
direction might appear unnecessary, we believe the use of the more general setting of coherent partial conditional probability
assessments, instead of the common probabilistic approaches, could bring new insights. The framework of coherent partial
conditional probability assessments finds its roots in the work of de Finetti (1974–1975), and it has been recently shown
(Coletti and Scozzafava, 2002, 2004, 2006) to be a powerful tool for unifying different approaches to uncertain reasoning.
Similarly to Lyra et al. (2010), our approach has been inspired by credit scoring, which is the general term used to
indicate methods utilized for classifying credit applicants into classes of risk on the basis of probability of default values.
The probability of default, also named expected default frequency or probability of insolvency, is a basic parameter of the Basel
II Accord used in the calculation of economic capital. The probability of default expresses the likelihood that a loan will not
be repaid and will fall into default. Consequently a firm that is either unwilling or unable to pay its debt is classified in a
state of default, or as a defaulting firm.

Corresponding address: Dip. Matematica e Informatica, via Vanvitelli 1, 06126 Perugia, Italy. Tel.: +39 0755855011; fax: +39 0755855024.
E-mail address: capot@dmi.unipg.it (A. Capotorti).

doi:10.1016/j.csda.2011.06.036

Credit scoring analysis is an important activity, especially nowadays, when a huge number of defaults has been one of the main causes of the continuing world financial crisis. In an attempt to reduce the costs of protection for firms, both academics and practitioners have studied models for evaluating business failures, according to their differing interests and the differing conditions of the firms being examined. Nevertheless, there is as yet no generally accepted model for the prediction of business failures.
In the past, classical statistical methods were predominantly used to predict credit risk (see Balcaen and Ooghe, 2006, for
an overview of applications using univariate statistical models, multiple discriminant analysis, linear probability models,
logit regression, and probit analysis). However, these models remain ineffective as they have to rely on various restrictive
assumptions, such as a large number of samples, normally distributed independent variables, and a linear relationship
between all variables. Nevertheless, the main drawback of these statistical methods is that, as stressed in Doumpos and
Zopounidis (2002, Sec. 3), statistical properties of the data are rarely known, and it is difficult to specify the underlying
population being considered. Hence non-parametric techniques are preferable. Such approaches can be flexible enough to
adjust themselves to the peculiarities of the data under consideration.
Recently, rough set theory has shown its effectiveness in constructing prediction models or in being combined with other approaches to identify key attributes relevant to risk. This procedure usually starts with a pre-processing step of analyzing the attributes crucial for detecting similar cases and predicting the value of the target variables (see, among others, Słowiński and Zopounidis, 1995; Słowiński et al., 1997; Tay and Shen, 2002 and Lin et al., 2009). In particular, Słowiński et al. (1997) demonstrate how the rough set approach can outperform classical discriminant analysis; and Greco et al. (1998) show how
a generalization through the dominance principle of rough set theory properly deals with financial classification problems.
Here we begin by following the same line. The rough set approach is used to aggregate similar instances into equivalence classes, and fuzzy set theory is used to grade the likelihood of default for the elements in each class. The difference in our development arises when the original attributes are transformed by experts' probabilistic evaluations into criteria;
the partition generated by such criteria will be coarsened into meta-classes by profiting from a reasonable assumption of
conditional exchangeability; and these meta-classes will then be graded by values built by t-norms and t-conorms. These
are induced naturally by the interpretation of membership functions as coherent conditional probability assessments.
The development of our reasoning is similar to the historical evolution of the original rough set theory (RS) into the so-called stochastic dominance-based rough set approach (Stochastic-DRSA) (Dembczyński et al., 2007). For this reason, in Section 2 we reformulate the main steps of the passage from RS to a Bayesian rough set model (BRS) (Ślęzak and Ziarko, 2002), following the probabilistic reinterpretation illustrated in Yao (2008). However, a BRS could involve several drawbacks
that have inspired further generalizations, such as those briefly listed in Section 2.4. These have led to the development of
Stochastic-DRSA. In contrast, we propose avoiding the same drawbacks in a different way, which we illustrate in Section 4
through the exemplification of default risk analysis performed over a sample of firms of the Umbria region of central
Italy. The main novelties introduced are the coarsening of the original ordered categories into a meta-criterion, and the
methodology adopted to obtain the membership function. Finally, Section 5 concludes the paper with a short comment.

2. Evolution of rough set classification methods

In this section we briefly review the evolution of the rough set classification methods that led to the so-called two-parameter probabilistic rough set approximation. In doing this we mainly follow Yao (2008), taking advantage of the unifying approach that can be expressed through conditional probabilities.

2.1. Rough set theory

Classic rough set theory (RS) was proposed by Pawlak (1982) in order to approach approximate classification problems.
The starting point is the availability of a finite set U of objects described through a set of attributes A.
Let E ⊆ U × U be an equivalence relation on U. That is, E is reflexive, symmetric, and transitive. The basic building blocks of rough set theory are the equivalence classes of E. For an element x ∈ U, the equivalence class containing x is given by:

[x]E = {y ∈ U : xEy}.   (1)

When no confusion arises, we also simply write [x]. The family of all equivalence classes is also known as the quotient set of U, and is denoted by U/E = {[x] : x ∈ U}. It defines a partition of the universe, namely, a family of pairwise disjoint subsets whose union is the universe. Usually, the equivalence relations adopted are those induced by subsets of the attributes A. Moreover, A is commonly divided into disjoint sets of condition attributes C and decision attributes D. For simplicity, we assume the set D to be a singleton D = {D}. Equivalence classes are the elementary definable, measurable, or observable sets in the approximation space apr = (U, E). By taking unions of elementary definable sets, one can derive larger definable sets.
The key idea of rough sets is to approximate one body of knowledge by another, less detailed, one. In classical RS the approximated knowledge is the partition of the finite set of elements U into classes induced by the decision attribute D; the knowledge used for approximation is another partition of U into elementary sets of objects that are indiscernible by the set of condition attributes C. The elementary sets are seen as granules of knowledge used for approximation.

Let X ⊆ U be one of these decision classes (in our example of Section 4, X will coincide with the defaulting firms). Its lower approximation is the greatest definable set contained in X, and its upper approximation is the least definable set containing X. That is, for X ⊆ U,

apr̲(X) = ⋃{[x] ∈ U/E : [x] ⊆ X}   (2)

and

apr̄(X) = ⋃{[x] ∈ U/E : [x] ∩ X ≠ ∅}.   (3)

Given an approximation space apr = (U, E) and a subset X ⊆ U, the universe U can be divided into three disjoint regions, namely the positive, the negative and the boundary regions:

POS(X) = apr̲(X),   (4)

NEG(X) = POS(Xᶜ) = (apr̄(X))ᶜ   (5)

and

BND(X) = apr̄(X) \ apr̲(X),   (6)

where Xᶜ denotes the complementary set of X.
An element of the positive region POS(X) definitely belongs to X, an element of the negative region NEG(X) definitely does not belong to X, and an element of the boundary region BND(X) only possibly belongs to X. One may therefore use any of the three pairs to represent a subset X ⊆ U:

(POS(X), POS(X) ∪ BND(X)) = (apr̲(X), apr̄(X)).   (7)
Note that the lower and upper approximations are strongly related to the notions of core and support of a fuzzy set.
Hence, it was a straightforward step to generalize them by a fuzzy set membership.
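
For concreteness, the construction of the quotient set and of the regions (2)–(6) takes only a few lines of code. The following Python snippet is our own illustration on a toy decision table; the objects and attribute values are invented, not taken from the paper's data:

```python
def quotient_set(universe, key):
    """Partition `universe` into equivalence classes of objects sharing
    the same condition-attribute signature key(x)."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())

def approximations(universe, key, X):
    """Return the (lower, upper) approximations of X as in Eqs. (2)-(3)."""
    lower, upper = set(), set()
    for c in quotient_set(universe, key):
        if c <= X:            # [x] contained in X -> contributes to the lower approx.
            lower |= c
        if c & X:             # [x] meets X -> contributes to the upper approx.
            upper |= c
    return lower, upper

# Toy decision table: four objects described by two condition attributes.
U = {'a': (1, 0), 'b': (1, 0), 'c': (0, 1), 'd': (1, 1)}
X = {'a', 'c'}                               # target decision class
low, up = approximations(set(U), lambda x: U[x], X)
POS, NEG, BND = low, set(U) - up, up - low   # regions (4)-(6)
print(POS, NEG, BND)                         # {'c'} {'d'} {'a', 'b'}
```
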

2.2. Rough membership function

To overcome the rigidity of the lower approximation due to the inclusion relation, the notion of a rough membership function was explicitly introduced by Pawlak and Skowron (1994), although it was already implicitly and equivalently used by many authors (see e.g. Pawlak et al., 1988; Wong and Ziarko, 1987; Yao and Wong, 1992; Yao et al., 1990). The original immediate way of grading the membership of elements of [x] in X was

μX(x) = |X ∩ [x]| / |[x]|,   (8)

where |·| denotes the cardinality of a set. Going beyond a purely frequentist approach, in line with Coletti and Scozzafava (2004, 2006) and Yao and Wong (1992), we can in the sequel redefine (8) in more general probabilistic terms as

μX(x) ≔ P(X|[x]).   (9)

Nevertheless, Coletti and Scozzafava (2004, 2006) give to the μX in (9) the semantic interpretation of 'an element x is judged to belong to X' and not the usual set interpretation x ∈ X. This difference will be relevant for a correct understanding of our proposal, as described in Section 4.
A first proposal to adopt a probabilistic approach for approximations was actually given by Pawlak et al. (1988). Their approximation is essentially a majority rule: an object x is put into the lower approximation of X if the majority of its equivalence class [x] is in X. That is, by (8),

apr̲0.5(X) = {x ∈ U : μX(x) > 0.5}   (10)

and

apr̄0.5(X) = {x ∈ U : μX(x) ≥ 0.5}.   (11)

The majority rule was soon extended to a fully probabilistic rough set approximation by Yao and Wong (1992) and Yao et al. (1990). Their model defines the positive region as an area where, on the basis of available data, the conditional probability, interpretable now as the rough membership (9), of objects to the given set is certain to some degree:

for 0 ≤ β < α ≤ 1,

apr̲α(X) = {x ∈ U : μX(x) ≥ α}   (12)

and

apr̄β(X) = {x ∈ U : μX(x) > β};   (13)

for β = 1 − α with 0.5 < α ≤ 1,

apr̲α(X) = {x ∈ U : μX(x) ≥ α}   (14)

and

apr̄1−α(X) = {x ∈ U : μX(x) > 1 − α},   (15)

where the first pair of approximations (12)–(13) is referred to as the asymmetric bounds (α, β), while the second pair (14)–(15) as the symmetric bounds (α, 1 − α). The same model was equivalently obtained through a set inclusion measure by Ziarko (1993) and was named the variable precision rough set model (VPRS). Nevertheless, while the effectiveness of VPRS critically relies on the choice of the bounds, Yao and Wong (1992) legitimize the bounds through the introduction of loss functions in a decision-theoretic approach.
Trying to give a reasonable interpretation of the required parameters without introducing auxiliary instruments, Ślęzak and Ziarko (2002) introduced the Bayesian rough set model (BRS), which corresponds to the approximations

apr̲P(X)(X) = {x ∈ U : μX(x) ≥ P(X)}   (16)

and

apr̄1−P(X)(X) = {x ∈ U : μX(x) > 1 − P(X)},   (17)

obtainable from VPRS by α = 1 − β = P(X), the a priori probability of observing an object x in the considered subset X ⊆ U.
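
The following sketch (ours, on invented data) shows the frequency-based membership (8) and the two-parameter regions of (12)–(13); choosing α = P(X) and β = 1 − P(X) would reproduce the BRS approximations (16)–(17), provided P(X) > 0.5 so that β < α:

```python
def rough_membership(block, X):
    """Frequency estimate (8) of mu_X on an equivalence class `block`."""
    return len(block & X) / len(block)

def probabilistic_regions(blocks, X, alpha, beta):
    """Two-parameter approximation (12)-(13): returns (POS, NEG, BND)."""
    assert 0 <= beta < alpha <= 1
    pos, neg, bnd = set(), set(), set()
    for b in blocks:
        mu = rough_membership(b, X)
        if mu >= alpha:
            pos |= b           # in the lower approximation, Eq. (12)
        elif mu <= beta:
            neg |= b           # outside the upper approximation, Eq. (13)
        else:
            bnd |= b
    return pos, neg, bnd

# Invented quotient set and target class; alpha/beta chosen arbitrarily.
blocks = [{'a', 'b'}, {'c'}, {'d', 'e', 'f'}]
X = {'a', 'b', 'd'}
print(probabilistic_regions(blocks, X, alpha=0.6, beta=0.4))
# ({'a', 'b'}, {'c', 'd', 'e', 'f'}, set())
```
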

2.3. A further contamination

Before proceeding to analyze the main drawbacks of the approximation models we have seen until now, let us consider
a last step forward.
Greco et al. (2005) observed that rough membership functions, as defined either by frequency counts (8) or by conditional probabilities (9), consider the overlap between X and [x] and do not explicitly consider the overlap between X and [x]ᶜ. Hence they introduced a relative rough membership function:

μ′X(x) = |X ∩ [x]|/|[x]| − |X ∩ [x]ᶜ|/|[x]ᶜ| = P(X|[x]) − P(X|[x]ᶜ),   (18)

which is a particular instance of a class of measures known as Bayesian confirmation measures (for comparisons among such measures see e.g. Greco et al., 2008; Tentori et al., 2007). They proposed the two-parameter approximation

apr̲α,a(X) = {x ∈ U : μX(x) ≥ α and μ′X(x) ≥ a}   (19)

and

apr̄β,b(X) = {x ∈ U : μX(x) > β and μ′X(x) > b}.   (20)

Hence, in order to include object x in the positive region of the subset X, it is not sufficient to have a minimum membership grade. It is also necessary that the grade is sufficiently greater than the membership associated with objects distinguishable from x, i.e. objects outside [x]. In other words, it is necessary that both the absolute and the relative memberships of x in X are not smaller than the given thresholds α and a, respectively. Despite having quite reasonable motivations, this model still suffers (even more than VPRS) from the problem of choosing the parameters.
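
A minimal sketch of the relative membership (18) and of the two-threshold test entering (19), again our own code on invented data:

```python
def relative_membership(block, universe, X):
    """Bayesian-confirmation-style measure (18):
    P(X | [x]) - P(X | [x]^c), both estimated by frequencies."""
    comp = universe - block                      # the complement [x]^c
    return len(block & X) / len(block) - len(comp & X) / len(comp)

U = {1, 2, 3, 4, 5, 6}
X = {1, 2, 5}
b = {1, 2, 3}                                    # an equivalence class [x]
mu = len(b & X) / len(b)                         # absolute membership (8)
mu_rel = relative_membership(b, U, X)            # 2/3 - 1/3 = 1/3
in_positive = mu >= 0.6 and mu_rel >= 0.2        # two-parameter test of (19)
print(mu, mu_rel, in_positive)                   # 0.666... 0.333... True
```
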

2.4. Main drawbacks and direct remedies

The evolution of rough set theory sketched so far carries several drawbacks that various authors have solved in different ways. Let us list the most important ones.
First of all, note that as the number of attributes grows, the indiscernibility property loses significance. In fact, in such cases there will be a lot of equivalence classes [x], all with low cardinality. To avoid this trouble, Słowiński and Vanderpooten (2000) generalized the indiscernibility relation through similarity conditions.
However, classical rough set theory runs into several troubles whenever preference-ordered attribute domains (criteria) are to be considered: it cannot manage inconsistencies which result in violations of the dominance principle. Such a problem is typically encountered in multicriteria decision analysis (MCDA) problems such as sorting, choice or ranking. To bypass this drawback it is necessary to replace the indiscernibility relation with a dominance relation, which allows approximation of ordered sets in multicriteria sorting. For this, Greco et al. (1999a,b) introduced the so-called dominance-based rough set approach (DRSA). In DRSA, where the condition attributes C are criteria and the classes are preference-ordered, the knowledge approximated is a collection of upward and downward unions of classes, and the granules of knowledge are sets of objects defined using a dominance relation instead of an indiscernibility relation.
The dominance relation ≽C is defined as a binary relation on U in the following way:

≽C = {(y, z) ∈ U × U : f(y; Cl) ≥ f(z; Cl) ∀ Cl ∈ C},   (21)

where f(y; Cl) ≥ f(z; Cl) means 'y is at least as good as z with respect to criterion Cl'. The dominance relation ≽C is a partial preorder (i.e., it is reflexive and transitive). The dominance principle can therefore be expressed as follows:

y ≽C z ⟹ f(y; D) ≥ f(z; D),   (22)

where D is the decision criterion. Two objects y, z ∈ U are said to be consistent if they satisfy the dominance principle. Anyhow, in real applications, inconsistent pairs are easily found. Hence, in line with VPRS, Greco et al. (2001) generalized DRSA into the variable consistency dominance-based rough set approach (VC-DRSA) by introducing a tolerance level for the inconsistencies in the approximations.
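
The dominance relation (21) and the consistency test under the dominance principle (22) translate directly into code. The sketch below is ours; the firm records and criterion values are hypothetical:

```python
def dominates(y, z):
    """y >=_C z iff y is at least as good as z on every criterion, Eq. (21)."""
    return all(fy >= fz for fy, fz in zip(y['crit'], z['crit']))

def inconsistent_pairs(objects):
    """Pairs violating the dominance principle (22): y dominates z
    on the criteria but has a strictly worse decision value."""
    return [(y['id'], z['id'])
            for y in objects for z in objects
            if dominates(y, z) and y['dec'] < z['dec']]

firms = [
    {'id': 'y1', 'crit': (3, 2, 3), 'dec': 1},
    {'id': 'y2', 'crit': (2, 2, 1), 'dec': 1},   # dominated by y3, better decision
    {'id': 'y3', 'crit': (3, 3, 2), 'dec': 0},   # dominates y2, worse decision
]
print(inconsistent_pairs(firms))                 # [('y3', 'y2')]
```
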
A last critical aspect concerns definitions (8) and (9) of the rough membership functions. In fact, they are based on the hypothesis of a uniform distribution, or of common conditional probabilities, inside each equivalence class. Frequently these assumptions are not met, either because of a non-random mechanism in the selection of the objects x ∈ U, or because of the scarce representativeness of U with respect to the whole population. To avoid this trouble, Dembczyński et al. (2007) and Kotłowski et al. (2008) introduced the so-called Stochastic-DRSA method, which is mainly based on the estimation of conditional probabilities by the maximum likelihood method.

3. Theoretical prerequisites

To keep the paper self-contained, in the following we briefly introduce the basic concepts of the conditional coherence paradigm and of t-norms and t-conorms that will be needed to develop the classification procedure explained in the next section.
Coherence, or equivalently consistency, for partial assessments can be reduced to the property of compatibility with a well-established mathematical model. For conditional probabilities the reference models are the so-called full conditional probabilities, as introduced by Dubins (1975) and in line also with the thoughts of de Finetti (1949), Krauss (1968), and Rényi (1955). Full conditional probabilities are characterized by the following set of axioms:

Definition 1. Given a Boolean algebra B, a full conditional probability on B × B⁰ (B⁰ = B \ {∅}) is a function P : B × B⁰ → [0, 1] such that
(i) P(·|H) is a finitely additive probability on B for any given H in B⁰;
(ii) P(H|H) = 1 for all H ∈ B⁰;
(iii) P(A|C) = P(A|B)P(B|C) for every A ∈ B and B, C ∈ B⁰ with A ⊆ B ⊆ C.

Note that, whenever (i)–(ii) are satisfied, condition (iii) is equivalent to

(iii′) P(A ∧ B|C) = P(B|C)P(A|B ∧ C) for every A, B ∈ B and C, B ∧ C ∈ B⁰,

where ∧ denotes the usual logical conjunction. The pairs (A|H) ∈ B × B⁰ are called conditional events. Usually in discrete settings, like our U, the power set P(Ω) of some sample space Ω is adopted for the Boolean algebra B.
As a consequence we have:

Definition 2. If E = {A1|H1, . . . , An|Hn} is an arbitrary set of conditional events, an assessment P(·|·) on E is said to be coherent if there exists a full conditional probability P̃(·|·) defined on P(Ω) × P(Ω)⁰ (with P(Ω) the power set of the sample space Ω spanned by the events A1, H1, . . . , An, Hn) which coincides with P(·|·) on E.

An operational check of coherence is possible thanks to the following characterization theorem (Coletti, 1994; Coletti and Scozzafava, 1996, 1999, 2002):

Theorem 1. Let E be an arbitrary finite family of conditional events and let Ω denote the set of atoms ωr generated by the events A1, H1, . . . , An, Hn. For a real function P on E the following two statements are equivalent:
(a) P is a coherent conditional probability assessment on E;
(b) there exists (at least) one class of probabilities {P0, P1, . . . , Pk}, each probability Pℓ being defined on a suitable subset Ωℓ ⊆ Ω, such that for any Ai|Hi ∈ E there is a unique Pℓ inside the class with

Σ_{ωr ⊆ Hi} Pℓ(ωr) > 0  and  P(Ai|Hi) = Σ_{ωr ⊆ Ai∧Hi} Pℓ(ωr) / Σ_{ωr ⊆ Hi} Pℓ(ωr);

moreover Ωℓ′ ⊂ Ωℓ for ℓ′ > ℓ, and Pℓ(ωr) = 0 if ωr ∈ Ωℓ+1.

Any class {P0, P1, . . . , Pk} satisfying condition (b) is said to agree with the conditional probability assessment P.
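
Theorem 1 also suggests an operational test of coherence: at each layer one searches for a probability on the atoms reproducing every assessed value P(Ai|Hi) as a ratio of masses, which is a linear feasibility problem. The sketch below is our own and implements only the first layer, using SciPy's linear programming routine; a full check would recurse on the conditioning events receiving zero mass:

```python
import numpy as np
from scipy.optimize import linprog

def layer_feasible(n_atoms, assessment):
    """One layer of the coherence check suggested by Theorem 1: look for
    a probability P0 on the atoms, concentrated on the union H0 of the
    conditioning events, with P0(Ai & Hi) = pi * P0(Hi) for every
    assessed conditional event.  `assessment` is a list of triples
    (A_and_H, H, p) with events encoded as sets of atom indices.  The
    full check would recurse on the conditioning events given zero mass."""
    H0 = set().union(*(H for _, H, _ in assessment))
    mass = np.zeros(n_atoms)
    mass[list(H0)] = 1.0
    A_eq, b_eq = [mass], [1.0]              # normalization: P0(H0) = 1
    for AH, H, p in assessment:
        row = np.zeros(n_atoms)
        row[list(AH)] += 1.0                # + P0(A & H)
        row[list(H)] -= p                   # - p * P0(H)
        A_eq.append(row)
        b_eq.append(0.0)
    res = linprog(c=np.zeros(n_atoms), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(0, 1)] * n_atoms)
    return res.success

# Atoms 0..3; is P(A|H) = 0.5 with H = {0, 1, 2} and A & H = {0} solvable?
print(layer_feasible(4, [({0}, {0, 1, 2}, 0.5)]))    # True
```
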
For our purposes it is important to underline the so-called qualitative side of the coherence approach. In fact we will profit from expert comparative evaluations about more or less favorable premises to default. Therefore we will deal with comparative assessments among some pairs of conditional events {Ai|Hi, Bi|Ki}, i ∈ I ⊆ ℕ. Such comparisons lead to a preference order formalized through the following binary relation:

Ai|Hi ≼ Bi|Ki ⟺ 'P(Ai|Hi) ≤ P(Bi|Ki)' has been assessed.   (23)

The problem is to check the coherence, intended as in Definition 2, of a conditional probability P(·|·) representing ≼ in (23). This warranty can be obtained either directly, by building a numerical evaluation agreeing with ≼ whose coherence can be checked through Theorem 1, or indirectly, by verifying whether ≼ satisfies characteristic properties. Such properties can be of simple form whenever some specific conditions hold (see e.g. Coletti et al., 1993), or they can be integrated in the following general axiom (Coletti and Vantaggi, 2006):

Aj|Hj ≼ Bj|Kj ⟹ ∃ αj, βj ∈ [0, 1], with αj < βj if Aj|Hj ≺ Bj|Kj, s.t.

sup_{H⁰} Σ_{j=1}^{n} [λj (I_{Bj∧Kj} − βj I_{Kj}) + γj (αj I_{Hj} − I_{Aj∧Hj})] ≥ 0   (24)

∀ n ∈ ℕ and λj, γj ≥ 0,

where H⁰ is the union of the conditioning events whose corresponding λj, γj are positive, and I(·) is the indicator function.
We briefly report also the main notions about t-norms and t-conorms (for more details see, among others, Bloch, 1996; Dubois and Prade, 1985; Klement et al., 2000) that will be used in the next section to build a peculiar membership function.
t-norms are a generalization of the usual two-valued logical conjunction to fuzzy logics. A t-norm is a function T : [0, 1] × [0, 1] → [0, 1] which satisfies the following properties:

Commutativity: T(a, b) = T(b, a);
Monotonicity: T(a, b) ≤ T(c, d) if a ≤ c and b ≤ d;
Associativity: T(a, T(b, c)) = T(T(a, b), c);
The number 1 is an identity element: T(a, 1) = a.

t-norms can be classified on the basis of their analytical properties. In particular, a t-norm is called continuous if it is continuous as a function, in the usual interval topology on [0, 1]² (similarly for left- and right-continuity); a t-norm is called Archimedean if it has the Archimedean property, i.e., if for each x, y in the open interval (0, 1) there is a natural number n such that T(x, T(x, T(. . . , x))) (the n-times composition) is less than or equal to y. A continuous Archimedean t-norm is called strict if 0 is its only nilpotent element; otherwise it is called nilpotent. The usual partial ordering of t-norms is pointwise, i.e., T1 ≼ T2 if T1(a, b) ≤ T2(a, b) for all a, b ∈ [0, 1]. As functions, pointwise larger t-norms are sometimes called stronger than those pointwise smaller. In the semantics of fuzzy logic, however, the larger the t-norm, the weaker (in terms of logical strength) the conjunction it represents.
On the other side, t-conorms are used to represent logical disjunction in fuzzy logic and union in fuzzy set theory. t-conorms (also called S-norms) can be introduced as duals of t-norms under the usual negation connective ¬, i.e. the order-reversing operation which assigns 1 − x to x on [0, 1], as S(a, b) = 1 − T(1 − a, 1 − b). However, t-conorms can be defined independently of t-norms as functions S : [0, 1] × [0, 1] → [0, 1] which satisfy the following conditions:

Commutativity: S(a, b) = S(b, a);
Monotonicity: S(a, b) ≤ S(c, d) if a ≤ c and b ≤ d;
Associativity: S(a, S(b, c)) = S(S(a, b), c);
The number 0 is an identity element: S(a, 0) = a.

The use of specific t-norms and t-conorms depends on the semantic properties one wants to represent; hence their formulation can change from one application to another (Bloch, 1996; Dubois and Prade, 1985).
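
For concreteness, here is a small sketch of ours with two standard pairs of connectives (the product t-norm with its dual probabilistic sum, and the Łukasiewicz pair), folded over several arguments thanks to associativity:

```python
from functools import reduce

# Product t-norm and its dual t-conorm (probabilistic sum):
t_prod = lambda a, b: a * b
s_prod = lambda a, b: 1 - (1 - a) * (1 - b)   # S(a, b) = 1 - T(1 - a, 1 - b)

# Lukasiewicz (nilpotent) pair:
t_luka = lambda a, b: max(0.0, a + b - 1)
s_luka = lambda a, b: min(1.0, a + b)

def combine(op, values, identity):
    """Fold a binary t-norm/t-conorm over many arguments;
    associativity makes the result order-independent."""
    return reduce(op, values, identity)

degrees = [0.9, 0.8, 0.7]
print(combine(t_prod, degrees, 1.0))   # 0.504  (conjunction of the degrees)
print(combine(s_prod, degrees, 0.0))   # 0.994  (disjunction of the degrees)
```
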

4. Classification through comparison of coherent membership functions

As already anticipated in the Introduction, we want to follow the same kind of reasoning that led from classical rough set theory to the more general stochastic dominance-based rough set approach, but with the paradigm of coherent partial conditional probability assessments as the unique guide.
Generally speaking, we propose the following methodology:

- conversion of attributes into criteria;
- coarsening of the information by grouping criteria into a single meta-criterion;
- elicitation of proper membership functions through specific choices of t-norms to combine the joint information contained in the meta-criterion;
- estimation of the parameters present in the membership functions through their genuine Bayesian interpretation;
- a final merging of information obtained by comparing the concordance or discordance of classifications performed in different periods.
More specifically, starting with a disaggregated decision table (U, C, D), we want to have a multicriteria sorting procedure based on some graduation function. For this, we will first transform the attributes into criteria. This will be done

by the aforementioned expert probabilistic evaluations that, as in BRS, compare the positive or negative association of attribute values with respect to the a priori probability of an object x ∈ U to belong to the target decision class
(in the following prototypical example the target will be the class of defaulting firms). However the partition generated
by these criteria will be too fine (i.e. we obtain equivalence classes with small cardinalities). Hence the basic granules
will be coarsened into a meta-criterion based on the arity of positively, negatively and neutrally related criteria. This will
be possible by taking advantage of a reasonable assumption of conditional exchangeability among the criteria given the
decision class. The same assumption will allow us to grade the likelihood of an object to belong to the target decision class
through a peculiar membership function. This membership function is built through t-norms and t-conorms inspired by the
conditional probabilistic interpretation (9). In the following credit risk analysis, the choices made for such operators quite
naturally stem from the assumed equivalent relevance among criteria and from the propensity (safeness) of firms with at
least one positive (negative) criterion to be in default, respectively. If different reasonable assumptions are made, different
connectives must be adopted.
Once an overall membership is obtained, the decision class assignment will be based on a comparative process, to avoid
the introduction of arbitrary parametric thresholds. This procedure is similar to the approach used in Bayesian confirmation
measures.

4.1. A prototypical credit risk analysis example

We shall now display the details and the effectiveness of our procedure through an example of credit risk analysis. The decision table (U, C, D) is composed of two years' balance sheets concerning 80 firms, with the learning data sample kindly provided by the Cassa di Risparmio di Spoleto (CARISPO) bank. It is a paired sample where, in each pair, a firm still in business in 2007 is matched against a defaulting one belonging to the same economic sector and having, as far as possible, the same economic size. The firms x ∈ U are all located in central Italy, especially in the Umbria region, to avoid as much as possible the influence of geographical characteristics. The condition attributes C represent the values of the following economic indexes (which are usually adopted in this kind of analysis because they do not depend on the size of the firms):
V1 = net earnings/shareholders' equity.
V2 = EBITDA/net interests.
V3 = fixed liabilities/current assets.
V4 = current liabilities/current assets.
V5 = EBITDA/current assets.
V6 = (shareholders' equity + fixed liabilities)/fixed assets.
V7 = (fixed liabilities + current liabilities)/shareholders' equity.
V8 = (fixed liabilities + current liabilities)/EBITDA.
V9 = revenues/(current assets + fixed assets).
V10 = EBITDA/total net assets.
To make the information manageable, the observations of each attribute in (V1, V2, . . . , V10) are synthesized into 6 equi-frequent classes, so that in the following we will refer to the observations of ten discrete variables (DF1, DF2, . . . , DF10).
The decision attribute D is a binary variable representing the solvency status of the firm: a 1 is associated with the defaulting firms, while a 0 is associated with the healthy ones.
The objective of the analysis is to use the information provided by the condition attributes (DF1, DF2, . . . , DF10) to approximate, as well as possible, the solvency status D. Hence the subset X, target of the approximation, is formed by the defaulting firms:

X = {x ∈ U : f(x, D) = 1}.   (25)

4.2. Conversion of attributes into criteria

Credit risk analysis is an activity that seems particularly appropriate for applications that involve the expression of subjective judgments. In fact, credit institutions usually employ several experts to handle and evaluate credit applications. Hence we endow the analysis with an expert decision maker's (DM) opinion.
We have seen that, classically, the rough set approach considers attributes rather than criteria. In general, the notion of attribute differs from that of criterion because the domain (scale) of a criterion has to be ordered according to a decreasing or increasing preference, while the domain of an attribute does not have to be ordered. We will use the notion of criterion because the orderings of the evaluations considered are crucial for representing the DM's behavior.
The attributes DFl can be converted into criteria Cl, l = 1, . . . , 10, by grouping the classes of each DFl into ranges El^σ that modify the expert's opinion about the firm's propensity to default:

El^σ =
  El⁺ if P(X|El⁺) > P(X),
  El⁰ if P(X|El⁰) = P(X),   (26)
  El⁻ if P(X|El⁻) < P(X),

Fig. 1. Criteria about a firm's propensity to default w.r.t. the different attributes.

with P(X) the DM's a priori probability that a firm could be in default, and P(X|El^σ) the posterior probability that a firm could be in default given that the value of the l-th attribute DFl lies inside a positive (El⁺), a neutral (El⁰) or a negative (El⁻) range, respectively. The expert's propensity opinions are illustrated in Fig. 1. Since the El^σ, l = 1, . . . , 10, σ = +, −, 0, form a partition of U, coherence of the comparative evaluations (26) can be easily checked by a direct construction of a full conditional probability distribution agreeing with the inequalities therein. Note moreover that three attributes, DF1, DF5 and DF9, are judged fully irrelevant (i.e. they express only neutral values), so that we are actually reduced to 7 criteria; continuing, with an abuse of notation, we denote them by Cl, but now with l = 1, . . . , 7.
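
A sketch of this conversion step (ours; the a priori probability and the posterior values attached to the ranges are invented, and in practice the DM provides only the qualitative comparisons of (26)):

```python
def sign_of_range(p_default_given_range, p_default):
    """Label a range of an attribute as '+', '0' or '-' according to
    whether it raises, leaves unchanged or lowers the a priori
    probability of default, as in Eq. (26)."""
    if p_default_given_range > p_default:
        return '+'
    if p_default_given_range < p_default:
        return '-'
    return '0'

p_X = 0.5                        # DM's a priori default probability (invented)
# Hypothetical expert evaluations P(X | E) for three ranges of one attribute:
ranges = {'low': 0.7, 'mid': 0.5, 'high': 0.3}
criterion = {r: sign_of_range(p, p_X) for r, p in ranges.items()}
print(criterion)                 # {'low': '+', 'mid': '0', 'high': '-'}
```
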
However, this information transformation from the Vl to the Cl is not enough for our goal. In fact, in U we observe a great variety of rows (few firms x ∈ U have the same combination of values El^σ, l = 1, . . . , 7). Hence we need a further step of synthesis. After the reduction of the number of criteria from 10 to 7, each remaining criterion Cl is evaluated by the DM to be equivalently relevant in the qualitative comparisons (26). It is therefore quite reasonable to assume that the values El^σ are exchangeable given the solvency status (for a detailed exposition of the conditional exchangeability property refer e.g. to Lad, 1996, sec. 3.9): with such an assumption, two firms y, z ∈ U are judged to be equivalent if they exhibit the same number of positive criteria values (dark grey rectangles), the same number of negative ones (light grey rectangles) and consequently the same number of neutral ones (white rectangles). This can be expressed as follows: if we consider two different criteria expressions

ξ = E1^σ1 ∧ E2^σ2 ∧ · · · ∧ E7^σ7
ξ′ = E1^τ1 ∧ E2^τ2 ∧ · · · ∧ E7^τ7     with σl, τm ∈ {+, 0, −}, l, m = 1, . . . , 7,   (27)

but with the same total numbers of signs +, 0, −, we have that

P(ξ|X) = P(ξ′|X)   (28)

and

P(ξ|Xᶜ) = P(ξ′|Xᶜ).   (29)

Such exchangeability constraints reflect the expert's idea that, whether a firm x is a defaulting one, i.e. in accordance with (25) x ∈ X, or a healthy one, i.e. x ∈ Xᶜ, what is probabilistically relevant is how many positive and negative criteria it expresses, and not which ones. Note that, as a consequence of conditional exchangeability, we implicitly obtain the equivalences of conditional probabilities

P(El^σ|X) = P(Em^σ|X)
P(El^σ|Xᶜ) = P(Em^σ|Xᶜ)     ∀ l ≠ m ∈ {1, . . . , 7}, σ = +, 0, −,   (30)

that formalize the judgement of equal relevance among the criteria. In fact (30) expresses that, given the solvency status, positive, neutral and negative regions are equally likely irrespective of which attributes they derive from.

Fig. 2. Coarser partitions based on the meta-criterion Fijk applied to the penultimate (left) and ultimate (right) balance sheets, and the corresponding cardinalities and standard rough membership function values.

Through the conditional exchangeability assumption it is now possible to coarsen the partition induced by the criteria. In fact, denoting by i the number of dark grey rectangles, by j the number of light grey rectangles and by k the number of white rectangles, it is possible to create new equivalence classes [x] ≡ Fijk:

x ∈ Fijk ⟺ x exhibits E1^σ1 ∧ · · · ∧ E7^σ7 with |{l : σl = +}| = i, |{l : σl = −}| = j, |{l : σl = 0}| = k.   (31)

To stress the use of these new coarser equivalence classes (31) and of the probabilistic interpretation (9), in the following we modify, with a slight abuse of notation, the membership function notation:

μ(X|Fijk) ≔ μX(x) = P(X|[x]) = P(X|Fijk),   (32)

where the two equalities follow from (9) and (31), respectively. Endowed with the equivalence classes Fijk, U obviously continues to be a partially ordered set (poset), with the meta-criterion having i increasing in preference (hence i could be named a gain), j decreasing (cost) and k irrelevant (null). Hence the dominance relation ≼ijk results as

Fi′j′k′ ≼ijk Fijk ⟺ i′ ≤ i and j′ ≥ j.   (33)
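
Given the vector of criterion signs expressed by a firm, the meta-class (31) and the dominance test (33) are immediate to compute, as in the following sketch of ours:

```python
from collections import Counter

def meta_class(signs):
    """Map a 7-tuple of criterion signs to the triple (i, j, k) of
    Eq. (31): the counts of '+', '-' and '0' values."""
    c = Counter(signs)
    return (c['+'], c['-'], c['0'])

def dominated_by(f1, f2):
    """F_{i'j'k'} <=_{ijk} F_{ijk} iff i' <= i and j' >= j, Eq. (33)."""
    (i1, j1, _), (i2, j2, _) = f1, f2
    return i1 <= i2 and j1 >= j2

firm = ('+', '+', '-', '0', '+', '-', '-')     # invented sign vector
print(meta_class(firm))                        # (3, 3, 1)
print(dominated_by((3, 3, 1), (3, 2, 2)))      # True: F331 <= F322, cf. (34)
```
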


Applied to our two years' balance sheet sample, we obtain the partitions illustrated in Fig. 2 where, for each equivalence class Fijk, its number of defaulting (D) and healthy (H) firms and the standard rough membership function values (computed as in (8)) have also been plotted. Note the inappropriateness of such memberships for grading the propensity to default, due to the violation of the dominance principle: for example, in both the penultimate and ultimate balance partitions we have, among others, by (33) that

F331 ≼ijk F322 while μ(X|F331) > μ(X|F322).   (34)

This phenomenon is mainly due to the elementary standard rough membership function (8), and it could be avoided by introducing a more appropriate consistency measure, as done e.g. in Błaszczyński et al. (2009). We will follow a different route that also involves membership functions, but takes advantage of the more general probabilistic interpretation (9). Before illustrating how this is possible, notice the high overall inconsistency of the decision tables in Fig. 2, in spite of all the information syntheses performed to obtain them.

4.3. Definition of the new rough membership functions

To overcome the inappropriateness of the standard rough membership functions, we introduce for our purposes a different membership function. Since the firms x ∈ U belonging to a class Fijk express the conjunction of i positive, j negative and k neutral criteria values, a resulting membership function in line with (9) which grades their propensity to become a defaulting firm (25) can be computed through a proper t-norm like

μ(X|Fijk) = T1(μ1, . . . , μ7) = α₊^i α₋^j α₀^{I(i+j=0)},   (35)

with α₊, α₋ and α₀ being proportional to the contributions to default brought by the positive, negative and neutral criteria expressions, respectively. Such parameters can be estimated through a maximum-likelihood type procedure explained in Section 4.4.
The choice of the multiplicative form for the first part of (35) is due to the assumed conditional exchangeability (28) among the different attributes and the consequent de Finetti representation theorem (Diaconis, 1977; Heath and Sudderth, 1976), so that sufficient information is encompassed in the counts of the positive and negative criteria values. If in some other application such an assumption were not valid, another choice should be made, depending on what the sufficient information is (Bloch, 1996; Dubois and Prade, 1985). On the contrary, the neutral values El⁰ are irrelevant for the specification of the membership, except in the case where all neutral values (E1⁰, . . . , E7⁰) are observed. For this reason the last term α₀^{I(i+j=0)} appears in (35), to guarantee consistency in such an extreme case, so that

μ(X|F007) = P(X)   (36)

holds.
Similarly, to grade the propensity of x ∈ U to be a healthy firm, obviously given the specific levels of the meta-criterion ijk, we can use a t-norm of the following kind:

μ(Xᶜ|Fijk) = T2(μ1, . . . , μ7) = β₊^i β₋^j β₀^{I(i+j=0)},   (37)

where now β₊, β₋ and β₀ are proportional to the contributions to healthiness brought by the positive, negative and neutral criteria expressions, respectively. Also for these parameters a maximum-likelihood type estimate will be presented in the next subsection. Note that, with an abuse of notation usual for fuzzy set partitions, Xᶜ does not represent the logical contrary of X, but just the proposition 'the firm x is judged a healthy one'. Hence we do not have the constraint μ(X|Fijk) + μ(Xᶜ|Fijk) = 1 (for more details refer e.g. to Coletti and Scozzafava, 2006, Remark 2).
The parameters α₀ and β₀ cannot be completely neglected in the membership expressions (35) and (37), and they could bring some trouble when estimated. In fact they are proportional to P(X) and P(Xᶜ), respectively, and these probabilities can hardly be estimated from the data (e.g. because of the specific sample design) or elicited from the DM (recall that X is exactly the target of the analysis). Anyhow, these parameters have been introduced only to deal with the extreme situation i + j = 0. Hence it is possible to approximate the set of defaulting firms X by the following straightforward comparisons:

POS(X) = {x ∈ U : μ(X|Fijk) > μ(Xᶜ|Fijk)},   (38)

NEG(X) = {x ∈ U : μ(X|Fijk) < μ(Xᶜ|Fijk)}   (39)

and

BND(X) = {x ∈ U : μ(X|Fijk) = μ(Xᶜ|Fijk)},   (40)

where the proper membership expressions (35) and (37) are used whenever i + j ≠ 0. Otherwise, since for i + j = 0 both (35) and (37) reduce to P(X) and P(Xᶜ), respectively, a direct comparison avoiding the use of μ(X|Fijk) and μ(Xᶜ|Fijk) is possible (e.g. in our example the DM, hopefully, judges that P(X) < P(Xᶜ)).
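
The classification thus reduces to comparing two products. A sketch of ours follows, with placeholder parameter values of our own choosing (the neutral parameters alpha_0 and beta_0 only matter when i + j = 0):

```python
def mu_default(i, j, a_pos, a_neg, a_0):
    """Membership (35): propensity to default of the class F_ijk."""
    return a_pos**i * a_neg**j * a_0**(1 if i + j == 0 else 0)

def mu_healthy(i, j, b_pos, b_neg, b_0):
    """Membership (37): propensity to be healthy of the class F_ijk."""
    return b_pos**i * b_neg**j * b_0**(1 if i + j == 0 else 0)

def classify(i, j, alphas, betas):
    """Regions (38)-(40) by direct comparison of the two memberships."""
    d = mu_default(i, j, *alphas) - mu_healthy(i, j, *betas)
    return 'POS' if d > 0 else 'NEG' if d < 0 else 'BND'

# Placeholder parameters (alpha_+, alpha_-, alpha_0) and (beta_+, beta_-, beta_0):
alphas, betas = (0.98, 0.875, 0.4), (0.9, 0.975, 0.6)
print(classify(3, 2, alphas, betas))   # 'POS': 0.98^3 * 0.875^2 > 0.9^3 * 0.975^2
```
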

4.4. Membership estimation

To perform the comparisons (38)–(40) we have, when i + j ≠ 0, to estimate from the available data the other quantities α₊, α₋, β₊ and β₋. Note that such values should fundamentally express the propensity of firms to be, or not to be, in default given at least one positive or negative criterion value, respectively. Consequently they can be properly computed by t-conorms of the form

α₊ = S1(μ(X|E1⁺), . . . , μ(X|E7⁺)) ≔ P(X|E1⁺ ∨ · · · ∨ E7⁺) ∝ P(E1⁺ ∨ · · · ∨ E7⁺|X),   (41)

α₋ = S1(μ(X|E1⁻), . . . , μ(X|E7⁻)) ≔ P(X|E1⁻ ∨ · · · ∨ E7⁻) ∝ P(E1⁻ ∨ · · · ∨ E7⁻|X),   (42)

β₊ = S2(μ(Xᶜ|E1⁺), . . . , μ(Xᶜ|E7⁺)) ≔ P(Xᶜ|E1⁺ ∨ · · · ∨ E7⁺) ∝ P(E1⁺ ∨ · · · ∨ E7⁺|Xᶜ)   (43)

and

β₋ = S2(μ(Xᶜ|E1⁻), . . . , μ(Xᶜ|E7⁻)) ≔ P(Xᶜ|E1⁻ ∨ · · · ∨ E7⁻) ∝ P(E1⁻ ∨ · · · ∨ E7⁻|Xᶜ),   (44)

where the proportionalities on the right hand sides of (41)–(44) derive directly from Bayes' rule

P(X̃|E1^σ ∨ · · · ∨ E7^σ) = P(X̃) P(E1^σ ∨ · · · ∨ E7^σ|X̃) / P(E1^σ ∨ · · · ∨ E7^σ),   (45)

with σ = +, − and X̃ = X, Xᶜ.

Table 1
Estimates (46)–(49) on the basis of the penultimate (left) and ultimate (right) balance data.

        Penultimate   Ultimate
α̂₊     0.980         0.980
α̂₋     0.875         0.775
β̂₊     0.900         0.900
β̂₋     0.975         0.900

Fig. 3. Classification of the firms on the basis of the penultimate (left) and ultimate (right) balance sheets. POS subsets are identified by positive differences μ(X|Fijk) − μ(Xᶜ|Fijk), NEG by negative ones. No boundary situations appear.

These proportionalities are crucial because they allow direct evaluations through usual estimates of the maximum-likelihood type:

α̂₊ = |⋃{Fijk ∩ X : i ≥ 1}| / |X|,   (46)

α̂₋ = |⋃{Fijk ∩ X : j ≥ 1}| / |X|,   (47)

β̂₊ = |⋃{Fijk ∩ Xᶜ : i ≥ 1}| / |Xᶜ|   (48)

and

β̂₋ = |⋃{Fijk ∩ Xᶜ : j ≥ 1}| / |Xᶜ|,   (49)

which, via Theorem 1 (as done e.g. in Vantaggi (2008)), lead to certainly coherent conditional probability values. Note that these frequency-based estimates are computed transversally in the universe U, i.e. across the two classes X and Xᶜ. This avoids the problems connected with the scarcity of observations and the lack of independence that we encountered in the evaluations performed locally, i.e. class by class, as for example in the classical rough membership function definition (8). Note moreover that the t-conorms (41)–(44) are not the duals of the t-norms (35) and (37) under standard negation; they just stem from the meaning of the parameters α₊, α₋, β₊ and β₋.
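
A sketch of the frequency estimates (46)–(49) (ours; the list of meta-class/solvency records is invented):

```python
def estimate_parameters(firms):
    """Maximum-likelihood type estimates (46)-(49): relative frequencies,
    within each solvency class, of firms with at least one positive
    (i >= 1) or at least one negative (j >= 1) criterion value.
    `firms` is a list of ((i, j, k), defaulted) pairs."""
    defaulting = [(m, d) for m, d in firms if d]
    healthy = [(m, d) for m, d in firms if not d]
    frac = lambda grp, pos: sum(1 for (i, j, _), _ in grp
                                if (i if pos else j) >= 1) / len(grp)
    return {'alpha+': frac(defaulting, True), 'alpha-': frac(defaulting, False),
            'beta+': frac(healthy, True), 'beta-': frac(healthy, False)}

firms = [((3, 2, 2), True), ((0, 4, 3), True), ((2, 0, 5), False),
         ((1, 1, 5), False), ((4, 1, 2), True), ((0, 3, 4), False)]
print(estimate_parameters(firms))
# {'alpha+': 0.666..., 'alpha-': 1.0, 'beta+': 0.666..., 'beta-': 0.666...}
```
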
In our example, the values of the estimates computed separately for each balance year are reported in Table 1. With these estimates at our disposal, we can conclude that the approximation induced by (38)–(39) can be performed, except for the extreme case i + j = 0, on the basis of the quantity

μ(X|Fijk) − μ(Xᶜ|Fijk) = α̂₊^i α̂₋^j − β̂₊^i β̂₋^j,   (50)

which gives the values plotted in Fig. 3. Note that the jumps in the monotonicity paths correspond to passages between incomparable meta-classes Fijk, so that the figures respect the dominance principle.
Hence the firms belonging to classes whose membership difference (50) is negative will be assigned to NEG(X), hence classified as healthy; those with a positive difference will be assigned to POS(X), hence classified as defaulting; and the rest would be assigned to the boundary (but in our example there are no such classes). The assignment performances with respect to the two different balance sheets are summarized in Table 2. Notice that, despite the highly inconsistent initial decision table, we obtained acceptable results, especially with respect to other approaches. For example, with ordinary linear discriminant analysis we obtained classification errors of 42.5% and 36.3% with respect to the penultimate and last year balance sheets, respectively; with logistic regression the same errors reduce to 23.8% and 31.3%, respectively. Recall, anyhow, the aforementioned inappropriateness of such statistical methods in our context. In comparison, the JAMM software (IDSS, 2009), which is designed to perform VC-DRSA, at a consistency level of 0.8 returns a classification error of 15% but with 47 (59%) unclassified firms for the penultimate year balance sheets, and a classification error of 11% but with 53 (66%) unclassified firms for the last year balance sheets.

Table 2
Confusion matrices for the classification of the firms on the basis of the penultimate (left) and ultimate (right) balance sheets, based on the rough approximation (38)–(40). Rows give the actual solvency status D, columns the attribution.

Penultimate                    Ultimate
D      0     1                 D      0     1
0      28    12                0      26    14
1      18    22                1      10    30
Tot    46    34                Tot    36    44
Attribution error 37.5%        Attribution error 30%

Fig. 4. Joint membership comparisons (50) for the equivalence classes Fij on the basis of both balance sheets, with the boundary region highlighted.

4.5. A joint classification

A final effort can now be made to join the information arising separately from the two balance sheets. Once the numerical estimates of the quantities (46)–(49) are available, the classification differences (50) can also be computed for non-observed meta-classes Fijk. Due to the constraint k = 7 − i − j and to the exclusion of the extreme case i + j = 0, we can drop the k value from the notation, letting the pairs (i, j) vary from (0, 7) to (7, 0).
Finally, we can join the two classifications by assigning to the positive and negative regions those equivalence classes that have concordant assignments for both balance sheets, and by assigning to the boundary set those with discordant classifications. Applying this method to our data set produces the approximation reported in Fig. 4 where, contrary to the separate analyses, a boundary region appears, and it is reasonably located in the situation with high uncertainty, i.e. when positive and negative attributes could compensate. Note, however, that this compensation is not based on the arity (in fact the boundary meta-class is F32 and not F33), but is weighted by the memberships (35) and (37). With this modification, the assignments of our firms performed in the two periods (penultimate and last year) produce the results summarized in Table 3. Because of the modest cardinality of the boundary class F32, especially for the last balance information, the performances are almost the same as for the previous approximation. Different results appear if we go back and analyze the behavior of each single firm. In fact, for each x ∈ U we can compare the concordance of the potential attributions based on the two separate balance sheets, definitely assigning x to POS(X) if it belongs to concordant classes with positive membership comparisons (50), to NEG(X) if the two period classes are concordant but with negative membership comparisons (50), and to BND(X) in case of passage through discordant classes. This corresponds to considering as equivalence classes those induced by the pairs ((ijk)′, (ijk)″) of meta-criteria related to the two periods. The assignment performance is summarized in Table 4. With respect to the previous approximation, a larger number of non-assigned (boundary) firms appears, but the increase of uncertainty is balanced by a better performance (true negative ratio) in the case of assigned firms. In fact, with respect to the ultimate balance attribution (Table 3, right), the total attribution error remains the same, while the proportion of healthy firms classified correctly passes from 25/39 = 64.1% to 24/31 = 77.4%.
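
Operationally, the per-firm joint rule is a simple concordance test on the two membership differences (50), as in this final sketch of ours:

```python
def joint_assignment(diff_year1, diff_year2):
    """Joint classification of a firm from the membership differences (50)
    computed on the two balance sheets: POS if both positive, NEG if both
    negative, BND (boundary) if the two periods disagree."""
    if diff_year1 > 0 and diff_year2 > 0:
        return 'POS'              # concordant: classified as defaulting
    if diff_year1 < 0 and diff_year2 < 0:
        return 'NEG'              # concordant: classified as healthy
    return 'BND'                  # discordant: left unclassified

print(joint_assignment(+0.03, +0.11))   # POS
print(joint_assignment(-0.05, +0.02))   # BND
```
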

Table 3
Confusion matrices for the classification of the firms on the basis of the penultimate (left) and ultimate (right) balance sheets, based on the concordance of the attributions for the two periods (Fig. 4). Rows give the actual solvency status D, columns the attribution.

Penultimate                         Ultimate
D      0     1     None             D      0     1     None
0      28    9     3                0      25    14    1
1      18    18    4                1      10    30    0
Tot    46    27    7                Tot    35    44    1
Attribution error 37%               Attribution error 30.4%
Boundary 8.8%                       Boundary 1.3%

Table 4
Confusion matrix for the classification of the firms on the basis of the joint balance sheets, based on the concordance of the attributions for the two periods (Fig. 4). Rows give the actual solvency status D, columns the attribution.

D      0     1     None
0      24    7     9
1      10    15    15
Tot    34    22    24
Attribution error 30.4%
Boundary 30%

5. Conclusion

In this paper we have shown, through a prototypical credit risk analysis example, how to profitably integrate rough set theory (in its more general formulation), fuzzy set theory and probability theory. This was possible thanks to the high level of generality of the coherent conditional probability assessments framework (Coletti and Scozzafava, 2004, 2006), which allows a unification of different approaches to uncertainty and, in particular, links the probabilistic and the semantic interpretations of membership functions (9). What is also important is that with such an approach we can soundly incorporate expert (DM) opinions, expressed through (26), in a procedure mainly based on data analysis. This is especially important in the field of credit risk, where decisions about credit applications are aided by automatic procedures but, even today, still require judgment by human beings. In particular, such integration allowed us to transform simple attributes into criteria, to coarsen the corresponding partition into another one induced by a meta-criterion, and to properly elicit membership functions able to grade the likelihood of being in default.
Our approach, by a gradual coarsening of the information, improves the classification capabilities of standard rough set theory in credit scoring analysis, and we are confident that the hybridization we have proposed could be adopted in other areas of application as well.

References

Balcaen, S., Ooghe, H., 2006. 35 years of studies on business failure: an overview. The British Accounting Review 38, 63–93.
Błaszczyński, J., Greco, S., Słowiński, R., Szeląg, M., 2009. Monotonic variable consistency rough set approaches. International Journal of Approximate Reasoning 50, 979–999.
Bloch, I., 1996. Information combination operators for data fusion: A comparative review with classification. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 26 (1), 52–67.
Coletti, G., 1994. Coherent numerical and ordinal probabilistic assessments. IEEE Transactions on Systems, Man, and Cybernetics 24, 1747–1754.
Coletti, G., Gilio, A., Scozzafava, R., 1993. Comparative probability for conditional events: A new look through coherence. Theory and Decision 35 (3), 237–258.
Coletti, G., Scozzafava, R., 1996. Characterization of coherent conditional probabilities as a tool for their assessment and extension. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 4, 103–127.
Coletti, G., Scozzafava, R., 1999. Conditioning and inference in intelligent systems. Soft Computing 3, 118–130.
Coletti, G., Scozzafava, R., 2002. Probabilistic Logic in a Coherent Setting. In: Trends in Logic, vol. 15. Kluwer.
Coletti, G., Scozzafava, R., 2004. Conditional probability, fuzzy sets, and possibility: A unifying view. Fuzzy Sets and Systems 144, 227–249.
Coletti, G., Scozzafava, R., 2006. Conditional probability and fuzzy information. Computational Statistics and Data Analysis 51 (1), 115–132.
Coletti, G., Vantaggi, B., 2006. Representability of ordinal relations on a set of conditional events. Theory and Decision 60, 137–174.
de Finetti, B., 1949. Sull'impostazione assiomatica del calcolo delle probabilità. Annali Triestini dell'Università di Trieste 19, 3–55. Engl. transl. in: (Chapter 5) of Probability, Induction, Statistics, Wiley, London, 1972.
de Finetti, B., 1974–1975. Theory of Probability, vol. 2. Wiley, New York. A.F.M. Smith and A. Machì (trs.).
Dembczyński, K., Greco, S., Kotłowski, W., Słowiński, R., 2007. Statistical model for rough set approach to multicriteria classification. In: Kok, J.N., Koronacki, J., de Mantaras, R.L., Matwin, S., Mladenic, D., Skowron, A. (Eds.), Knowledge Discovery in Databases: PKDD 2007, Warsaw, Poland. In: Lecture Notes in Computer Science, vol. 4702. pp. 164–175.
Diaconis, P., 1977. Finite forms of de Finetti's theorem on exchangeability. Synthese 36, 271–281.
Doumpos, M., Zopounidis, C., 2002. Multicriteria Decision Aid Classification Methods. Kluwer Academic Publishers, Dordrecht.

Dubins, L.E., 1975. Finitely additive conditional probabilities, conglomerability and disintegrations. The Annals of Probability 3, 89–99.
Dubois, D., Prade, H., 1985. A review of fuzzy set aggregation connectives. Information Sciences 36, 85–121.
Dubois, D., Prade, H., 1992. Putting rough sets and fuzzy sets together. In: Słowiński, R. (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory. Kluwer, Dordrecht, pp. 203–232.
Greco, S., Inuiguchi, M., Słowiński, R., 2004. A new proposal for rough fuzzy approximations and decision rule representation. In: Dubois, D., Grzymala-Busse, J., Inuiguchi, M., Polkowski, L. (Eds.), Transactions on Rough Sets II: Rough Sets and Fuzzy Sets. In: LNCS, vol. 3135. Springer-Verlag, Berlin, pp. 156–164.
Greco, S., Inuiguchi, M., Słowiński, R., 2006. Fuzzy rough sets and multiple-premise gradual decision rules. International Journal of Approximate Reasoning 41 (2), 179–211.
Greco, S., Matarazzo, B., Słowiński, R., 1998. A new rough set approach to evaluation of bankruptcy risk. In: Zopounidis, C. (Ed.), Operational Tools in the Management of Financial Risks. Kluwer, Dordrecht, pp. 121–136.
Greco, S., Matarazzo, B., Słowiński, R., 1999a. Rough approximation of a preference relation by dominance relations. European Journal of Operational Research 117, 63–83.
Greco, S., Matarazzo, B., Słowiński, R., 1999b. The use of rough sets and fuzzy sets in MCDM. In: Gal, T., Stewart, T., Hanne, T. (Eds.), Advances in Multiple Criteria Decision Making. Kluwer Academic Publishers, Boston, pp. 14.1–14.59 (Chapter 14).
Greco, S., Matarazzo, B., Słowiński, R., 2005. Rough membership and Bayesian confirmation measures for parameterized rough sets. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Proceedings of RSFDGrC-05. In: LNAI, vol. 3641. pp. 314–324.
Greco, S., Matarazzo, B., Słowiński, R., 2007. Dominance-based rough set approach as a proper way of handling graduality in rough set theory. In: Transactions on Rough Sets VII. In: Marek, Victor W., Orłowska, Ewa, Słowiński, Roman, Ziarko, Wojciech (Eds.), Lecture Notes in Computer Science, vol. 4400. Springer-Verlag, Berlin, Heidelberg, pp. 36–52.
Greco, S., Matarazzo, B., Słowiński, R., 2008. Parameterized rough set model using rough membership and Bayesian confirmation measures. International Journal of Approximate Reasoning 49 (2), 285–300.
Greco, S., Matarazzo, B., Słowiński, R., Stefanowski, J., 2001. Variable consistency model of dominance-based rough sets approach. In: RSCTC 2000. In: LNAI, vol. 2005. pp. 170–181.
Heath, D., Sudderth, W., 1976. de Finetti's theorem on exchangeable variables. The American Statistician 30 (4), 188–189.
IDSS: Laboratory of Intelligent Decision Support Systems of the Poznan University of Technology. JAMM Software, 2009. http://idss.cs.put.poznan.pl/site/jamm.html.
Klement, E.P., Mesiar, R., Pap, E., 2000. Triangular Norms. Kluwer, Dordrecht.
Kotłowski, W., Dembczyński, K., Greco, S., Słowiński, R., 2008. Stochastic dominance-based rough set model for ordinal classification. Information Sciences 178 (21), 4019–4037.
Krauss, P.H., 1968. Representation of conditional probability measures on Boolean algebras. Acta Mathematica Academiae Scientiarum Hungaricae 19, 229–241.
Lad, F., 1996. Operational Subjective Statistical Methods: A Mathematical, Philosophical, and Historical Introduction. John Wiley, New York.
Lin, Rong-Ho, Wang, Yao-Tien, Wu, Chih-Hung, Chuang, Chun-Ling, 2009. Developing a business failure prediction model via RST, GRA and CBR. Expert Systems with Applications 36, 1593–1600.
Lyra, M., Paha, J., Paterlini, S., Winker, P., 2010. Optimization heuristics for determining internal rating grading scales. Computational Statistics and Data Analysis 54 (11), 2693–2706.
Pawlak, Z., 1982. Rough sets. International Journal of Computing and Information Sciences 11 (5), 341–356.
Pawlak, Z., Skowron, A., 1994. Rough membership functions. In: Yager, R.R., Fedrizzi, M., Kacprzyk, J. (Eds.), Advances in the Dempster–Shafer Theory of Evidence. John Wiley and Sons, New York, pp. 251–271.
Pawlak, Z., Wong, S.K.M., Ziarko, W., 1988. Rough sets: probabilistic versus deterministic approach. International Journal of Man-Machine Studies 29, 81–95.
Rényi, A., 1955. On a new axiomatic theory of probability. Acta Mathematica Academiae Scientiarum Hungaricae 6, 285–335.
Ślęzak, D., Ziarko, W., 2002. Bayesian rough set model. In: Proc. of the International Workshop on Foundation of Data Mining and Discovery, FDM-2002, pp. 131–135.
Słowiński, R., Vanderpooten, D., 2000. A generalized definition of rough approximations based on similarity. IEEE Transactions on Knowledge and Data Engineering 12 (2), 331–336.
Słowiński, R., Zopounidis, C., 1995. Application of the rough set approach to evaluation of bankruptcy risk. International Journal of Intelligent Systems in Accounting, Finance & Management 4 (1), 27–41.
Słowiński, R., Zopounidis, C., Dimitras, A.I., 1997. Prediction of company acquisition in Greece by means of the rough set approach. European Journal of Operational Research 100, 1–15.
Tay, F.E.H., Shen, L., 2002. Economic and financial prediction using rough sets model. European Journal of Operational Research 141, 641–659.
Tentori, K., Crupi, V., Bonini, N., Osherson, D., 2007. Comparison of confirmation measures. Cognition 103 (1), 107–119.
Vantaggi, B., 2008. Statistical matching of multiple sources: A look through coherence. International Journal of Approximate Reasoning 49 (3), 701–711.
Wong, S.K.M., Ziarko, W., 1987. Comparison of the probabilistic approximate classification and the fuzzy set model. Fuzzy Sets and Systems 21, 357–362.
Yao, Y.Y., 2008. Probabilistic rough set approximations. International Journal of Approximate Reasoning 49 (2), 255–271.
Yao, Y.Y., Wong, S.K.M., 1992. A decision theoretic framework for approximating concepts. International Journal of Man-Machine Studies 37, 793–809.
Yao, Y.Y., Wong, S.K.M., Lingras, P., 1990. A decision-theoretic rough set model. In: Ras, Z.W., Zemankova, M., Emrich, M.L. (Eds.), Methodologies for Intelligent Systems, vol. 5. North-Holland, New York, pp. 17–24.
Yasdi, R., 1995. Combining rough sets learning and neural learning method to deal with uncertain and imprecise information. Neurocomputing 7, 61–84.
Ziarko, W., 1993. Variable precision rough sets model. Journal of Computer and System Sciences 46 (1), 39–59.
Ziarko, W., 2005. Probabilistic rough sets. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Proceedings of RSFDGrC-05. In: LNAI, vol. 3641. pp. 283–293.
Ziarko, W., 2008. Probabilistic approach to rough sets. International Journal of Approximate Reasoning 49 (2), 272–284.
