Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics

Department of Applied Econometrics Working Papers
Warsaw School of Economics Al. Niepodleglosci 164 02-554 Warszawa, Poland

Working Paper No. 10-06

Application scoring: logit model approach and the divergence method compared

Izabela Majer
Warsaw School of Economics

This paper is available at the Warsaw School of Economics Department of Applied Econometrics website at: http://www.sgh.waw.pl/instytuty/zes/wp/

Application scoring: logit model approach and the divergence method compared
Izabela Majer Warsaw School of Economics im25961@sgh.waw.pl

Abstract

This study presents an example of application scoring. Two methods are considered: the logit model approach and the divergence method. The practical example uses contemporary data on loan applications from a Polish bank. The constructed scoring models are validated on a hold-out sample. Both types of models appear acceptable and have high discriminatory power. The prediction accuracy measures indicate that the scoring based on the divergence method is better than the one based on the logit model approach.

Keywords: credit scoring, logit model, divergence method, credit risk, classification

JEL codes: G21, C10


1. Introduction

Application scoring models are used by loan institutions to evaluate the creditworthiness of potential clients applying for a credit product. They are used in the retail and small business segments, as they enable the automation of the creditworthiness evaluation process and help in making quick and objective credit decisions. Application scoring models take into account all relevant information about applicants that is known at the application date and reported in the application form (i.e. demographic characteristics such as age, education, marital and accommodation status, as well as income and employment).

The aim of scoring models is to classify applicants into two groups: those who will not default and those who will default. An important aspect of scoring model building is therefore the definition of the dependent variable; the definition of default may, however, vary between models. Out of the variety of methods for building scoring models, in this study we focus on two: the logit model approach and the divergence method. The logit model is a widely used statistical parametric model for a binary dependent variable and is supported by statistical tests verifying the estimated parameters. The divergence method is a kind of optimisation method, not supported by econometric theory and statistical testing. The aim of the study is to show how a scoring model can be constructed.

In section 2 we present the two methods used for building scoring models: the logit model approach and the divergence method. Section 3 provides a detailed data description. In section 4 the dependencies between explanatory variables and their association with the dependent variable are examined. Sections 5 and 6 present the models constructed with the use of the logit approach and the divergence method, respectively. In section 7 the resulting models are evaluated in terms of their predictive power. Section 8 concludes the report.

2. Theoretical background

The preliminary action undertaken in the model building process is to collect an appropriate data set and to divide it into a base sample (used for model building) and a hold-out sample (used for model validation). In most cases the dependent variable is a binary one which distinguishes between two groups of applicants: defaulted and non-defaulted ones. Let us denote by Y the dependent dummy variable, which equals 0 for non-defaulted applicants and 1 for defaulted ones, and by Y_j the value of the variable Y for the j-th applicant.

The next step is to select a set of predictors which are significantly associated with the dependent variable and possibly free of near multicollinearity. Depending on the type of the analysed variables we can use: Pearson's linear correlation coefficients (two quantitative variables), Yule's coefficients of association (two dummy variables), or significance tests for the difference in means of a given quantitative variable in the populations of applicants characterised by different values of a dummy variable (a quantitative variable and a dummy one). These measures of dependencies between variables are described in detail, for example, by Gruszczynski [1999].

As to the decision about the modelling approach, one can choose among a great variety of methods. The scoring methods were reviewed, for example, by Hand and Henley [1997], Janc and Kraska [2001], Baesens et al. [2003] and Matuszyk [2004]. An overview of scoring methods can also be found in the whitepapers of Fair Isaac [2003] and Fractal [2003]. In this study we focus on the logit model and the divergence method, as they represent two different approaches to scoring model building.

The logit model is a widely used statistical parametric model for a binary dependent variable; it is described in detail, for example, in Greene [1997], Gourieroux [2000] and Gruszczynski [2002]. We assume that an unobservable variable Y* determines the value of the observable dependent dummy variable Y as follows:

Y_j = 1 for Y_j* > 0,   Y_j = 0 for Y_j* ≤ 0.

The unobservable variable Y* depends on the applicant's characteristics as well as on an error term:

Y_j* = β'x_j + ε_j,

where β is a vector of parameters, x_j is a vector of the values of the explanatory variables for the j-th applicant and ε_j is an error term. We assume that the distribution of ε_j is logistic. This implies that the probabilities of default and non-default are equal, respectively, to:

P(Y_j = 1) = F_L(β'x_j) = (1 + e^(−β'x_j))^(−1),

P(Y_j = 0) = 1 − F_L(β'x_j) = 1 − (1 + e^(−β'x_j))^(−1).

The parameters of the logit model are usually estimated with the maximum likelihood method.
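As a quick illustration of the formulas above, the logit probability and the log-likelihood that is maximised during estimation can be computed directly. The following Python sketch uses our own variable names and toy inputs, not the paper's estimates:

```python
import math

def logit_prob(beta, x):
    """P(Y_j = 1) = F_L(beta'x_j) = (1 + exp(-beta'x_j))**(-1)."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, X, y):
    """Log-likelihood of the logit model, the function maximised
    when estimating the parameter vector beta."""
    ll = 0.0
    for x_j, y_j in zip(X, y):
        p = logit_prob(beta, x_j)
        ll += y_j * math.log(p) + (1 - y_j) * math.log(1.0 - p)
    return ll
```

For the constructed scorecard, beta would be the estimated parameter vector and x_j the coded application-form variables of a new applicant.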

The second approach, the divergence method, represents a kind of optimisation method, not supported by econometric theory and statistical testing. The aim of this method is to build a model which results in score distributions for defaulted and non-defaulted applicants that are as far apart as possible; the distance between the score distributions is measured by divergence.

In the divergence method each characteristic is defined by a group of attributes. Each attribute is assigned a score equal to its weight of evidence (WOE), calculated according to the following formula:

WOE_ij = ln[ (n_ij|0 / n_0) / (n_ij|1 / n_1) ],

where n_ij|0 is the number of non-defaulted applicants characterised by the j-th attribute of the i-th characteristic, n_ij|1 is the number of defaulted applicants characterised by the j-th attribute of the i-th characteristic, and n_0 and n_1 are the total numbers of non-defaulted and defaulted applicants, respectively. The total scoring for a given applicant is calculated as the sum of the weights of evidence assigned to the respective attributes of the characteristics included in the model.

The discriminatory power of a particular characteristic is measured and compared with the use of information values (IV) and divergences (DIV), defined as:

IV_i = Σ_j (n_ij|0 / n_0 − n_ij|1 / n_1) · WOE_ij,

DIV_i = (μ_i0 − μ_i1)² / [0.5 · (σ²_i0 + σ²_i1)],

where IV_i stands for the information value of the i-th characteristic, DIV_i stands for the divergence of the i-th characteristic, μ_i0 and μ_i1 are the means of the weights of evidence calculated for the i-th characteristic for non-defaulted and defaulted applicants, respectively, and σ²_i0 and σ²_i1 are the corresponding variances. The ratios used in the divergence method are described, for example, by Hand and Adams [2000] and by Janc and Kraska [2001].

The aim is to find the subset of characteristics which results in the highest value of divergence. In this study this is done by exhaustive search (i.e. we calculated divergences for all possible subsets of characteristics).

The idea is presented in figure 1. There are two models, A and B. For both models a cut-off of 40 results in 20% of non-defaulted applicants being rejected. However, model A ensures rejecting 90% of defaulted applicants, whereas model B rejects merely 50% of them. That is why model A definitely outperforms model B.

As soon as a scoring model is built, the cut-off value has to be chosen. Applicants with a score higher than the cut-off will be accepted, whereas those with a lower score will be rejected. The decision on the cut-off level is critical, as it sets the level of risk acceptable to the decision maker. There is a variety of performance measures that can be used to evaluate the quality of scoring models. One can distinguish between performance measures depending on a cut-off point and those which depend only on the distributions of scores for defaulted and non-defaulted applicants (e.g. the K-S statistic and the Gini coefficient). The measures commonly used for the evaluation of scoring models are presented, for example, by Kraft et al. [2002], Gruszczynski [2002] and Wilkie [2004]. The validation on the hold-out sample is the last step of the analysis.

Figure 1. Divergence method – comparison of two models.

3. Data description

The scoring models presented in this study were built on the basis of data supplied by one of the banks operating in Poland. The data set consists of 500 credit applications from August 2004 to May 2005. The applications are for new clients of the bank as well as for those who have already used the bank's products. The rejected applications are not available. The information from the application forms consists of 21 characteristics of applicants as well as the date of application.

The data set on credit performance covers the period from May 2004 to July 2005 (snapshot data at the end of each month). A defaulted client is defined as a client who during the period May 2004 – July 2005 had at least one payment delay of more than 30 days; other clients are regarded as non-defaults. For some clients credit performance data was also available for the months preceding the application date (because they used other products of the bank); for those clients a default could have happened before the application date. The data was selected in such a way that in the first month of the credit performance period all the clients had no payment delay; the relevant application forms were submitted in August 2004. The relevant applications cover the period August 2004 – May 2005 (among them 56 applications were submitted after 1st January 2005). During the analysed period some clients used more than one credit product, e.g. a credit card and a mortgage loan.

Credit performance information is a kind of aggregate referring to all products used by a client: for each month the maximum number of days past due and the total amount owed to the bank are available. For example, if a client had a payment delay of 20 days on a mortgage and a payment delay of 10 days on a credit card, our data set contained the information about 20 days past due.

Due to the fact that the initial data set consisted of a small number of applications, some simplifying assumptions were unavoidable. Because of the very low default rate, it seemed reasonable to increase the number of applicants by 50 randomly selected clients. As a result we came up with a sample of 550 applicants covering 50 pairs of applicants with the same profile (the same attributes of all characteristics as well as the same credit performance). The number of defaulted clients amounted to 250. The next step was to select the hold-out sample by randomly drawing 20% of the applications (110 applications, out of which 55 were defaulted clients and 55 non-defaulted ones) to be used for validation. The remaining 440 applications were used as the base sample for model construction.

The confidentiality of the data does not allow for disclosing the names of the applicants' characteristics (from the application form); therefore the variables used in the models are coded. Table 1 presents the list of characteristics collected from the application forms. Some of the data collected in the application forms are dates; the others are either quantitative or qualitative variables. In the case of qualitative variables the number of attributes is given.
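The labelling rule described above can be written down directly. A sketch with illustrative field names (the bank's actual data layout is not disclosed):

```python
def monthly_days_past_due(per_product_delays):
    """Client-level monthly figure: the maximum delay over all products,
    as in the example (20 days on a mortgage, 10 on a credit card -> 20)."""
    return max(per_product_delays)

def is_defaulted(monthly_max_delays):
    """A client is defaulted if any monthly snapshot in the performance
    period shows more than 30 days past due."""
    return any(days > 30 for days in monthly_max_delays)
```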

Table 1. Characteristics used in application forms. [Columns: characteristic code (Application date and Char. 1 – Char. 21), description (date, continuous, discrete or qualitative variable) and, for qualitative variables, the number of attributes (between 2 and 13). Source: Own analysis.]

4. Associations between variables

The association between dummy variables and the variable Y has been verified with the use of Yule's coefficients of association as well as with the independence test for dummy variables based on the Chi-square statistic. It should be noted that each qualitative variable has been transformed into a set of binary variables, each representing one attribute. Table 2 presents the variables which are statistically significantly associated with the variable Y.

Table 2. Qualitative variables statistically significantly associated with the dependent variable. [Columns: characteristic code, attribute, variable code (X1–X15), Yule coefficient, Chi-square statistic. Source: Own calculations.]

In the case of quantitative variables (continuous or discrete ones) the verification of the association with the variable Y was based on significance tests for the difference in means of a given variable in the populations of defaulted and non-defaulted applicants. The test is based on the normally distributed statistic U. Table 3 presents the values of the statistic U for all analysed quantitative variables.

Table 3. Analysis of association between quantitative variables and the dependent variable. [Columns: codes of the characteristics used to create the variable, variable code (X16–X25), U statistic; the variables significantly associated with Y are listed first, followed by those insignificantly associated with Y. Source: Own calculations.]

Next, all the continuous and discrete variables have also been transformed into dummy variables: for each such variable a set of binary ones has been constructed. The associations between the newly created dummy variables and the dependent variable (Yule's coefficients and Chi-square statistics) are presented in Table 4.

The collinearity of explanatory variables was analysed only for the variables which were significantly associated with Y. For each pair of quantitative variables we calculated the value of Pearson's linear correlation coefficient (see Table 5). The correlation coefficients which differ statistically significantly from zero are marked with a blue background, while the ones which indicate a kind of dangerous collinearity are marked with bold lettering. In the case of 12 pairs of quantitative variables the correlation coefficients are higher than 0.85, whereas in the case of 4 pairs of variables they are even higher than 0.9.
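The screening in this section rests on 2×2 contingency tables of a dummy variable against Y. Below is a sketch of the Chi-square independence statistic together with a signed coefficient of the form sqrt(χ²/n), which is consistent with the magnitudes of the Yule coefficients and Chi-square values reported here; the paper does not state its exact formula, so treat this definition as our assumption:

```python
def chi2_and_yule(a, b, c, d):
    """Chi-square independence statistic for the 2x2 table [[a, b], [c, d]]
    (rows: dummy variable 0/1, columns: Y = 0/1) and the signed
    association coefficient sqrt(chi2 / n)."""
    n = a + b + c + d
    num = a * d - b * c
    chi2 = n * num ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    sign = 1.0 if num >= 0 else -1.0
    return chi2, sign * (chi2 / n) ** 0.5
```

With a base sample of n = 440, a coefficient of 0.2 corresponds to χ² ≈ 17.6, in line with the magnitudes reported in Table 2.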

Table 4. Dummy variables statistically significantly associated with the dependent variable, created by the transformation of quantitative variables. [Columns: codes of the characteristics used to create the variable, interval, variable code (X26–X86), Yule coefficient, Chi-square statistic. Source: Own calculation.]

Table 5. Pearson's linear correlation coefficients for quantitative explanatory variables. [Correlation matrix for X16–X25. Source: Own calculation.]

The analysis of collinearity for pairs of binary variables (qualitative variables as well as transformed quantitative ones) is based on Yule's coefficients of association. In Table 6 we present the matrix of the coefficients calculated for the qualitative variables.

Table 6. Yule's coefficients of association for qualitative explanatory variables. [Matrix for X1–X15; the coefficients differing statistically significantly from zero are marked with a blue background, and the ones indicating significant association are marked with bold lettering. Source: Own calculation.]

As we can see, only 2 pairs of variables are significantly associated.

The Yule's coefficients of association were also calculated for pairs of transformed quantitative variables and for pairs consisting of a transformed quantitative variable and a qualitative one. Due to the high dimension of this matrix we present only the pairs of variables for which the values of the Yule's coefficient are higher than 0.3 (Table 7).

Table 7. Yule's coefficients of association for chosen pairs of transformed quantitative variables and for chosen pairs consisting of a transformed quantitative variable and a qualitative one. [Columns: variable pair, Yule coefficient; all reported coefficients exceed 0.3 in absolute value. Source: Own calculation.]

We have also analysed the relationships between quantitative variables and qualitative ones, verifying the statistical significance of the difference in means of a quantitative variable for the groups of clients with different values of a dummy (qualitative) variable. The statistically significant relationships (i.e. those for which the difference in means differs statistically from zero) are marked with a blue background. The directions of the associations are presented in Table 8.

Table 8. Relationships between quantitative variables and qualitative ones. [Matrix with rows X16–X25 and columns X1–X15; a '+' marks a statistically significant association between the pair. Source: Own calculation.]

5. Logit model

To adequately specify the logit model we examined subsets of explanatory variables. The initial subsets included only variables not significantly associated with each other, i.e. not significantly pair-wise associated or correlated. For each subset the parameters of a logit model have been estimated, and the variables with low t-ratios have been excluded. Table 9 presents the subsets: the explanatory variables originally included in a given model are marked with X, the variables finally included in a given model are marked with a green background, and the variables included in the best model (i.e. the one with the highest value of the likelihood ratio index) are marked with a blue background. Table 10 presents the estimation results.

We also estimated a logit model using only dummy variables (i.e. transformed quantitative as well as qualitative ones). For this final model the value of the log-likelihood is −96.1, while the likelihood ratio index equals 0.68.

To sum up, the results of the variable selection analysis show that some variables have no influence on Y. Moreover, some of the variables significantly associated with Y cannot be included in the model because of a high degree of collinearity with other explanatory variables.
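The likelihood ratio index quoted above is McFadden's pseudo-R², LRI = 1 − ln L / ln L0. On a balanced sample the null model predicts 0.5 for everyone, so ln L0 = n·ln(0.5); under that assumption the reported pair (ln L = −96.1 on 440 observations, LRI ≈ 0.68) is internally consistent:

```python
import math

def likelihood_ratio_index(log_lik, n):
    """McFadden's LRI = 1 - lnL / lnL0, with lnL0 = n * ln(0.5) for a
    balanced sample (equal numbers of defaulted and non-defaulted)."""
    log_lik_null = n * math.log(0.5)
    return 1.0 - log_lik / log_lik_null
```

likelihood_ratio_index(-96.1, 440) evaluates to roughly 0.685, matching the reported 0.68.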

Table 9. Analysed subsets of explanatory variables and variables finally included in various logit models. [32 analysed subsets; rows: variables X1–X25, marked when included; the last two rows report the logarithm of the likelihood function (between −181 and −187) and the likelihood ratio index (between 0.388 and 0.407). Source: Own calculation.]

Table 10. Estimation results for the logit model. [Columns: explanatory variable (constant, X4, X5, X10, X51, X52, X53, X54, X55, X62, X63, X66, X71, X72, X75, X79), estimated parameter, standard error, t-ratio, p-value. Source: Own calculation.]

All estimates have the expected signs. Both the likelihood ratio test (LR statistic of about 417) and the Wald test (W of about 90) rejected the hypothesis that all estimated parameters except for the constant are zero. Table 11 presents the marginal effects and elasticities of the probability of default, calculated for an applicant characterised by the average values of all explanatory variables (in the case of binary variables we used the probability that a given variable equals 1).

Table 11. Marginal effects and elasticities of the probability of default for the logit model. [Columns: explanatory variable, marginal effect, elasticity. Source: Own calculation.]

Table 12 presents the comparison of marginal effects for pairs of explanatory variables: the element in the i-th row and j-th column is the ratio of the estimated parameter for the variable in the i-th row to the estimated parameter for the variable in the j-th column.

Table 12. Comparison of marginal effects for pairs of explanatory variables in the logit model. [Matrix of ratios of estimated parameters for the variables included in the model. Source: Own calculation.]

Due to the fact that the model was estimated on a balanced sample, there is no need to adjust the constant, and the cut-off can be set at 0.5 (according to the standard prediction rule).

6. Divergence method

The first step in the scorecard building process was to calculate information values, weights of evidence and contributions for each attribute of each characteristic. Table 13 presents the information values and divergences for all analysed characteristics.

Table 13. Information values and divergences. [Columns: characteristic (Application date and Char. 1 – Char. 21, including combinations of characteristics), information value, divergence. Source: Own calculation.]

The divergence method amounts in fact to finding the combination of characteristics giving the highest divergence for the model as a whole. The collinearity analysis left 24 subsets of variables that were not collinear. For each such subset the combination of characteristics with the highest divergence value has been chosen. Table 14 presents the selected subsets of characteristics and the characteristics composing the most predictive combinations (i.e. the ones with the highest value of divergence among all the combinations of characteristics of a given subset). The characteristics primarily chosen for the model are marked with X; those finally included are marked with a green background. The variables included in the best model (i.e. the one with the highest value of divergence) are marked with a blue background.

Table 15 presents the scores associated with the attributes of the characteristics included in the model, after appropriate scaling which ensures that the total scoring of any applicant is not lower than 0 and not higher than 100.

Setting the cut-off point was the final step. The score for which the Mahalanobis distances to the mean score of defaulted clients and to the mean score of non-defaulted ones are equal amounted to 40.66, while the weighted average of the mean score for defaulted clients and the mean score for non-defaulted ones was equal to 42.03. The criterion applied to the cut-off choice was the minimisation of the percentage of incorrectly classified defaulted clients (the misclassification matrices are presented in Table 16), so finally we set the cut-off at 42.03.
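The exhaustive search and the weighted-average cut-off can be sketched as follows; divergence_of stands for a user-supplied function scoring a subset of characteristics (an assumption for illustration, not the paper's code):

```python
from itertools import combinations

def best_subset(characteristics, divergence_of):
    """Evaluate every non-empty subset of characteristics and return
    the one with the highest divergence (exhaustive search)."""
    best, best_div = None, float("-inf")
    for k in range(1, len(characteristics) + 1):
        for subset in combinations(characteristics, k):
            d = divergence_of(subset)
            if d > best_div:
                best, best_div = subset, d
    return best, best_div

def weighted_average_cutoff(mean_def, n_def, mean_nondef, n_nondef):
    """Cut-off as the weighted average of the class mean scores."""
    return (mean_def * n_def + mean_nondef * n_nondef) / (n_def + n_nondef)
```

The search is exponential in the number of characteristics, which is why the prior collinearity screening that reduced the problem to 24 candidate subsets matters.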

Table 14. Subsets of characteristics and characteristics finally included in alternative models – divergence method. [24 candidate subsets; rows: characteristics (Application date and Char. 1 – Char. 21, including combined characteristics); the last row reports the divergence of each model. Source: Own calculation.]

Table 15. Scorecard from the divergence method – scores associated with attributes of characteristics. [Columns: characteristic, attribute, score; the scores are scaled so that the total scoring lies between 0 and 100. Source: Own calculation.]

Table 16. Misclassification matrices for different cut-off rules (base sample, 440 applicants).

Cut-off: Mahalanobis distance    Ŷ=1   Ŷ=0   Total
Y=1                              197    23     220
Y=0                               18   202     220
Total                            215   225     440

Cut-off: Weighted average        Ŷ=1   Ŷ=0   Total
Y=1                              202    18     220
Y=0                               22   198     220
Total                            224   216     440

Source: Own calculation.

7. Predictive accuracy of constructed scoring models

The predictive accuracy was verified with the use of the hold-out sample of 110 applicants (55 defaulted clients and 55 non-defaulted ones). Table 17 presents misclassification matrices for both models.

Table 17. Misclassification matrices – hold-out sample.

Logit model
           Ŷ=1    Ŷ=0    Total
Y=1         46      9      55
Y=0          8     47      55
Total       54     56     110

Divergence method
           Ŷ=1    Ŷ=0    Total
Y=1         49      6      55
Y=0          4     51      55
Total       53     57     110

Source: Own calculation.

When comparing both models it is easily visible that in the case of the logit model both type I and type II errors are higher than in the case of the divergence model. The measures of predictive accuracy presented in Table 18 clearly indicate the dominance of the divergence approach (please refer to the appendix for the mathematical formulae of the presented measures).

Table 18. Measures of predictive accuracy – hold-out sample.

Predictive power measure                                      Logit model   Divergence method
Total percentage of correctly classified                      84.5%         90.9%
Percentage of correctly classified non-defaulted applicants   85.5%         92.7%
Percentage of correctly classified defaulted applicants       83.6%         89.1%
Odds ratio                                                    30.028        104.125
Mean difference                                               3.896         6.855
K-S statistic                                                 0.593         0.874
Gini coefficient                                              0.745         0.988
Information value                                             3.602         9.551

Source: Own calculation.

Figure 2 presents score distributions for both scoring models constructed in this study. Distributions for defaulted applicants are marked with red lines whereas distributions for non-defaulted applicants are represented by yellow ones. In the case of the logit model the score intervals for defaulted and non-defaulted applicants are almost overlapping, contrary to the divergence model. For the divergence approach the maximum score assigned to a defaulted applicant amounted to 66.5 whereas the minimum score assigned to a non-defaulted applicant was equal to 30.
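The percentage measures and the odds ratio follow directly from the hold-out misclassification counts (46/9/8/47 for the logit model and 49/6/4/51 for the divergence method). A small sketch, using the paper's counts but our own helper name:

```python
# Recompute the simple hold-out accuracy measures from the
# misclassification counts: n11 = correctly classified defaulted,
# n00 = correctly classified non-defaulted, n10/n01 = the corresponding
# misclassifications.
def accuracy_measures(n11, n10, n01, n00):
    total = n11 + n10 + n01 + n00
    return {
        "total_correct": (n11 + n00) / total,
        "correct_non_defaulted": n00 / (n00 + n01),
        "correct_defaulted": n11 / (n11 + n10),
        "odds_ratio": (n11 * n00) / (n01 * n10),
    }

# Counts as reported for the hold-out sample of 110 applicants
logit = accuracy_measures(n11=46, n10=9, n01=8, n00=47)
diverg = accuracy_measures(n11=49, n10=6, n01=4, n00=51)
```

Running this reproduces the simple measures: 84.5% total accuracy and odds ratio 30.028 for the logit model versus 90.9% and 104.125 for the divergence method.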

Figure 2. Score distributions.
[Two panels (Logit model, Divergence model), each plotting the cumulative score distributions of non-defaulted and defaulted applicants; horizontal axis: scores (0–100), vertical axis: cumulative percentage.]
Source: Own calculation.

Figure 3 presents Lorenz curves for both models (blue lines) as well as for hypothetical models resulting in perfect (yellow lines) and random (green lines) classifications of applicants. The figures show that both models are definitely better than random classification of applicants. However, in this study the divergence approach outperforms the logit model.

Figure 3. Lorenz curves.
[Two panels (Logit model, Divergence model), each showing the actual Lorenz curve together with the random-classification and perfect-classification reference lines; horizontal axis: non-defaulted applicants, vertical axis: defaulted applicants.]
Source: Own calculation.

8. Conclusions

The study presents an example of application scoring. On the basis of the same data set we constructed two scoring models, one based on the logit approach and the second on the divergence method. The validation on the hold-out sample shows that both models are acceptable and have high discriminatory power. However, in this study the model constructed with the divergence method outperforms the one resulting from the logit approach. Obviously, such results are restricted to the particular data set.


Appendix: Measures of predictive accuracy

Odds ratio:

$$OR = \frac{n_{11} \cdot n_{00}}{n_{01} \cdot n_{10}},$$

where $n_{11}$ and $n_{00}$ are the numbers of correctly classified defaulted and non-defaulted applicants, respectively, whereas $n_{01}$ and $n_{10}$ are the corresponding numbers of incorrectly classified applicants.

Mean difference:

$$MDIF = \frac{U_0 - U_1}{D(U)},$$

where $U_0$ is the mean score for non-defaulted applicants, $U_1$ is the mean score for defaulted applicants, and $D(U)$ is the pooled score standard deviation:

$$D(U) = \left[ \frac{n_0 D_0^2(U) + n_1 D_1^2(U)}{n_0 + n_1} \right]^{1/2},$$

with

$$D_0(U) = \left[ \sum_i i^2 \frac{n_0(i)}{n_0} - U_0^2 \right]^{1/2}, \qquad D_1(U) = \left[ \sum_i i^2 \frac{n_1(i)}{n_1} - U_1^2 \right]^{1/2},$$

where $n_0(i)$ and $n_1(i)$ are the numbers of non-defaulted and defaulted applicants, respectively, with score $i$, while $n_0$ and $n_1$ are the total numbers of non-defaulted and defaulted applicants.

Information value:

$$IV = \sum_i \left( \frac{n_0(i)}{n_0} - \frac{n_1(i)}{n_1} \right) \cdot \ln \left( \frac{n_0(i)/n_0}{n_1(i)/n_1} \right).$$

K-S statistic:

$$KS = \max_i \left| N_1(i) - N_0(i) \right|,$$
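The distribution-based measures above can be computed from the per-score counts $n_0(i)$ and $n_1(i)$. A minimal sketch, representing the counts as dictionaries mapping score values to counts (function names are ours):

```python
# Mean difference, information value and K-S statistic from per-score
# counts n0 (non-defaulted) and n1 (defaulted): {score i: count}.
from math import log, sqrt

def mean_difference(n0, n1):
    N0, N1 = sum(n0.values()), sum(n1.values())
    u0 = sum(i * c for i, c in n0.items()) / N0
    u1 = sum(i * c for i, c in n1.items()) / N1
    # variances via E[i^2] - mean^2, then pooled standard deviation
    d0sq = sum(i * i * c for i, c in n0.items()) / N0 - u0 ** 2
    d1sq = sum(i * i * c for i, c in n1.items()) / N1 - u1 ** 2
    pooled = sqrt((N0 * d0sq + N1 * d1sq) / (N0 + N1))
    return (u0 - u1) / pooled

def information_value(n0, n1):
    N0, N1 = sum(n0.values()), sum(n1.values())
    iv = 0.0
    for i in set(n0) | set(n1):
        p0, p1 = n0.get(i, 0) / N0, n1.get(i, 0) / N1
        if p0 > 0 and p1 > 0:   # skip empty cells to avoid log(0)
            iv += (p0 - p1) * log(p0 / p1)
    return iv

def ks_statistic(n0, n1):
    N0, N1 = sum(n0.values()), sum(n1.values())
    cum0 = cum1 = ks = 0.0
    for i in sorted(set(n0) | set(n1)):
        cum0 += n0.get(i, 0) / N0   # N0(i)
        cum1 += n1.get(i, 0) / N1   # N1(i)
        ks = max(ks, abs(cum1 - cum0))
    return ks
```

Note that in practice scores are usually binned before computing the information value, so that no bin is empty for either group.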

where

$$N_0(i) = \sum_{j=0}^{i} \frac{n_0(j)}{n_0}, \qquad N_1(i) = \sum_{j=0}^{i} \frac{n_1(j)}{n_1}.$$

Gini coefficient:

$$GINI = \frac{A}{A+B} = 2A,$$

where $A$ is the area between the Lorenz curve for the actual model and the Lorenz curve for random classification of applicants, and $B$ stands for the area between the Lorenz curve for perfect discrimination of applicants and the Lorenz curve for the actual model. Each point of the Lorenz curve is related to a given score $i$: the horizontal axis represents the percentage of non-defaulted applicants with score not higher than $i$, i.e. $N_0(i)$, and the vertical axis represents the percentage of defaulted applicants with score not higher than $i$, i.e. $N_1(i)$.

[Figure: illustrative Lorenz curve plot, showing the actual Lorenz curve between the random-classification diagonal and the perfect-discrimination curve.]
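Using the identity GINI = 2A, the coefficient can be obtained from the Lorenz curve points $(N_0(i), N_1(i))$ by a trapezoid-rule area. A sketch with the same dictionary representation of per-score counts (function name is ours):

```python
# Gini coefficient as twice the area A between the actual Lorenz curve
# and the random-classification diagonal, with the curve built from
# cumulative shares (N0(i), N1(i)) and the area taken by the trapezoid rule.
def gini_from_counts(n0, n1):
    N0, N1 = sum(n0.values()), sum(n1.values())
    points = [(0.0, 0.0)]
    cum0 = cum1 = 0.0
    for i in sorted(set(n0) | set(n1)):
        cum0 += n0.get(i, 0) / N0
        cum1 += n1.get(i, 0) / N1
        points.append((cum0, cum1))
    # area under the Lorenz curve; the diagonal's area is 1/2, so A = area - 1/2
    area = sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return 2 * (area - 0.5)
```

A model that gives every defaulted applicant a lower score than every non-defaulted one yields a Gini of 1, while identical score distributions for both groups yield 0.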
