
Sample Selection Bias in Acquisition Credit Scoring Models:

An Evaluation of the Supplemental-Data Approach

Irina Barakova Dennis Glennon Ajay Palvia

Office of the Comptroller of the Currency

April 2013

Abstract:

Models evaluating credit applicants rely on payment performance data, which is only
available for accepted applicants. This sampling limitation could lead to biased parameter
estimates. We use a nationally representative sample of credit bureau records to examine
sample selection bias in account acquisition scoring models and to evaluate the
effectiveness of the industry practice of using proxy payment performance for rejected
applicants. Our results show that ignoring the rejected applicants significantly affects
forecast accuracy of credit scores while it has little effect on their discriminatory power.
Finally, we document that validating scores only on accepted applicants can be
misleading.

Keywords: credit scoring, sample selection bias, reject inference, validation


The authors thank David Hand, Nicholas Kiefer, Christopher Henderson, OCC and Federal Reserve Bank
of Chicago seminar participants, and an anonymous referee for helpful comments and suggestions. A
previous version of this paper was also circulated as “Adjusting for Sample Selection Bias in
Acquisition Credit Scoring Models.” The views expressed in this paper are those of the authors and do not
necessarily reflect the views of the Office of the Comptroller of the Currency (OCC), or the US Treasury
Department. The authors can be reached via email at irina.barakova@occ.treas.gov,
dennis.glennon@occ.treas.gov, and ajay.palvia@occ.treas.gov.



1. Introduction

Consumer credit scoring models are a key input in banks’ credit acquisition

strategies and banks increasingly rely on such models to evaluate credit risk when

deciding whether or not to extend credit. Such credit scores can be developed based on

the payment performance history of previous applicants, assuming they are representative

of future applicants. However, banks can only observe the performance of their

customers and not of those applicants that have been rejected. To the extent past rejection

of applicants is not random, such a model development sample would not be

representative of the full pool of applicants and could bias the estimated scoring model.

In turn, such a biased score could lead to a misguided acquisition strategy and future

losses in the bank’s portfolio.

This sample selection problem is well known in the industry and the academic

literature. Different parametric and non-parametric approaches, known as reject

inference, have been proposed in order to account for the missing data. Many of the

proposed reject inference techniques have been examined in theoretical or empirical

studies but are not widely used in practice. Lenders’ most common way to address the

sample selection bias is to obtain proxy credit performance information on the applicants

that they have rejected in the past from their credit bureau records. However, the general

impact of this approach on the performance of credit scoring models has not been

documented, which is a primary objective of this paper.

We exploit a unique proprietary database of credit bureau data with a large

number of credit card applicants over a 10-year period to examine potential sample

selection bias in credit card acquisition scoring models. Extensive out-of-sample



evaluation of delinquency risk rank ordering and of the score's forecast accuracy is conducted to assess the scope of, and the gains from, supplemental bureau data. Our

paper documents three key findings.

First, we find that generic credit bureau scores and other risk factors are

significantly worse for loan applicants who were rejected by a major credit card issuer at

least once before receiving credit and worse still for applicants who did not obtain credit

from a major credit card issuer but were able to obtain credit elsewhere. The results

suggest that accounting for the effect of applicants who were rejected at least once could

improve scoring models. The results also indicate, however, that a large sub-portion of

applicants attempt to obtain credit but do not succeed in doing so, suggesting

considerable limitations in inferring the behavior of rejects through obtaining additional

bureau data.

Next, in terms of score performance impact, we show that the discriminatory

power of the score is not substantially different when acquisition scoring models are built

excluding rejected applicants. The score forecast accuracy, however, does improve when

supplemental data is used to infer the performance of rejected applicants. We also find

that older models perform substantially worse in terms of accuracy regardless of whether

reject inference is used. Thus, our findings indicate reject inference is important for

improving model accuracy but not a substitute for building newer models or building

dynamic models when legacy models have outlived their usefulness.

Finally, our out-of-sample score validation results show that ignoring the rejected

applicants when validating the score could lead to misleading results. Delinquency rates

expected for each score range appear overestimated for the first couple of years after

score implementation when the score is validated only on the accepted applicants. This

finding is important since it is not standard industry practice to use bureau supplemental

data for score validation unlike for score development.

The rest of this paper is organized as follows. The next section provides some

background on the issue of sample selection bias in credit scoring models. Section 3

presents the data and methodology. Section 4 discusses our analysis and the final section

presents our conclusions.

2. Literature

A small but growing literature examines the statistical techniques and issues in

credit scoring. Hand and Henley (1997) offer an excellent review of the statistical

techniques used in building credit scoring models and Glennon (1999) outlines the

conceptual framework, current practices, and modeling issues in retail credit scoring.

More recently, Glennon et al. (2008) utilize a proprietary data-set to estimate and validate

credit scoring models for bank card borrowers. The results indicate that current industry

best practices can be effective at ranking borrower risk but may fall short when it comes

to accurately estimating default rates.

Kiefer and Larsen (2006) further discuss the key conceptual and statistical issues

involved in developing sound credit-scoring models. In particular, they consider a central

issue in developing acquisition scoring models – whether the applicant data used to build

such models, given that it excludes rejected applicants, is appropriate. The inability to

build a new model on the entire sample of past applicants need not necessarily lead to

selection bias. As noted by Little and Rubin (1987), the missing data can be categorized

in one of three ways: missing completely at random (MCAR), missing at random (MAR),

and missing not at random (MNAR); in the first two cases, account performance does

not depend on the selection process. In contrast, if the data is MNAR, then credit

performance is a function of the selection process and it is in this case that sample

selection bias will occur. Such bias has become well known in the literature and banking

industry and numerous reject inference techniques have been proposed in order to

mitigate its effect (see Joanes (1993), Hand and Henley (1994), Hand (2001a, 2001b),

Ash and Meester (2002), and Greene (2007)).

One of the most commonly used reject inference methods is to obtain external performance data (from credit bureaus) on applicants that were rejected. Using such

performance data, a lender can seek to infer how rejected applicants would have

performed if accepted; this is often referred to as the supplemental data method. For the

subset of rejected applicants for which performance data is available, the method assumes

that default on some other credit product is equivalent to default on the product of

interest; the new model is then created while factoring in the behavior of these rejects.

The drawbacks of this approach include the cost of obtaining the bureau data and the assumption that default on another product is equivalent to default on the product being modeled. Also,

because the rejects with no performance data are likely to be non-random, this method

will not completely eliminate bias. To date, there is little evidence on the effectiveness of

this method in practice.
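To make the mechanics concrete, the sketch below pools booked accounts with rejects whose proxy performance is observable and refits a logistic good/bad model. It is a minimal illustration on simulated data with hypothetical names and coefficients, not the implementation of any particular lender:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)

def simulate(n, intercept):
    """Hypothetical applicant pool: one bureau attribute x, binary bad flag."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(intercept - 1.2 * x)))  # lower x -> riskier
    return pd.DataFrame({"x": x, "bad": rng.binomial(1, p)})

booked = simulate(5000, intercept=-2.5)      # BK: on-book performance
rejects_ri = simulate(1500, intercept=-1.0)  # RI: proxy performance from bureau records

# Supplemental-data method: treat proxy default as default and pool the samples.
ttd = pd.concat([booked, rejects_ri], ignore_index=True)

bk_fit = sm.Logit(booked["bad"], sm.add_constant(booked[["x"]])).fit(disp=0)
ttd_fit = sm.Logit(ttd["bad"], sm.add_constant(ttd[["x"]])).fit(disp=0)
print(bk_fit.params)   # model on accepts only understates risk of the full pool
print(ttd_fit.params)
```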

A second method also seeks to improve scoring models by using information on

rejected applicants that would not normally be available. But instead of obtaining

performance data on a sub-set of rejected applicants, this method involves randomly

accepting a subsample of applicants that would otherwise be rejected. Obtaining

additional performance information through accepting normally rejected accounts, often

referred to as enlargement, is an ideal way to mitigate selection bias; but it is also very

costly since rejected applicants are the most likely to default. Parnitzke (2005), using simulated data, examines this method and finds that it does reduce selection

bias.

A third class of inference methods is based on extrapolation techniques. One of

the more frequently used types of extrapolation, the re-weighting method, assumes that

the relation between the borrower characteristics and default is identical for accepted and

rejected applicants. 1 The method essentially works by giving greater weight to the lower-

scoring accepted applicants relative to higher scoring ones. Crook and Banasik (2004),

using proprietary data, find that the re-weighting technique does not improve the

performance of the good-bad model. Other extrapolation methods that assign default

status to rejected applicants and in essence “create” data include “re-classification” and

“parceling”. These methods, while easy to implement, tend to be quite arbitrary and often

result in false precision or lead to a distortion of the actual default data (Kiefer and

Larsen (2006)).
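A minimal sketch of the re-weighting idea described above, on simulated data (all names and parameters are hypothetical, and this is not the procedure of any cited study): accepts are weighted by the inverse of an estimated acceptance probability, so lower-scoring accepts stand in for the rejects they resemble.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical data: x is a bureau attribute, acceptance depends on x,
# and performance (bad) is observed only for accepts.
n = 20000
x = rng.normal(size=n)
p_accept = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))   # high-x more likely accepted
accepted = rng.binomial(1, p_accept).astype(bool)
p_bad = 1.0 / (1.0 + np.exp(-(-2.0 - 1.0 * x)))
bad = rng.binomial(1, p_bad)

# Step 1: model acceptance on the full through-the-door population.
sel = sm.Logit(accepted.astype(int), sm.add_constant(x)).fit(disp=0)
w = 1.0 / sel.predict(sm.add_constant(x))[accepted]  # inverse acceptance probability

# Step 2: weighted good/bad model on accepts only; low-scoring accepts
# (who resemble rejects) receive the largest weights.
X_acc = sm.add_constant(x[accepted])
rw = sm.GLM(bad[accepted], X_acc, family=sm.families.Binomial(),
            freq_weights=w).fit()
print(rw.params)
```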

The last category of reject inference techniques includes those based on variations

of Heckman’s 2-step sample bias correction procedure where the first step is the selection

equation and the second step is the default model. Greene (1998) examines the impact of

Heckman’s procedure on predicting loan default and finds that coefficients for key

1
Extrapolation techniques such as re-weighting do not reduce selection bias in the sense that if the model
determining good/bad is assumed to be the same for accepts and rejects, there is no bias to correct for. In
these cases, reject inference is really intended to reduce variability and thus make the model more efficient.
For a more complete discussion, see Banasik and Crook (2007).

default determinants differ substantially from results obtained without correcting for

selection bias. Banasik and Crook (2007) use a proprietary data set where applicants that

would have normally been rejected were accepted and find that using a bivariate probit

model design to address sample selection bias improves model performance. Banasik et

al. (2003) examine a Heckman-type selection procedure (bivariate probit model). Again, a

proprietary data set is used and the authors find that the procedure can improve model

accuracy in some cases but the improvements are small. Finally, Wu and Hand (2007),

using simulated data, find that Heckman’s procedure improves the good-bad model, but

only if the normality assumption holds, when “enough” customers are rejected and

accepted, and when the original accept/reject decision was not primarily determined by

the variables used in the selection equation.
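statsmodels offers no built-in bivariate probit, so the sketch below illustrates only the two-step logic on simulated data: a probit selection equation whose inverse Mills ratio is carried into the default equation. This is an illustrative approximation with hypothetical names, not the estimator used in the cited papers:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 20000

# Simulated applicants: z drives acceptance, x drives default; the error
# correlation (0.5) makes the missing performance data MNAR.
z, x = rng.normal(size=n), rng.normal(size=n)
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
accepted = (0.3 + 1.0 * z + u[:, 0]) > 0
bad = ((-1.5 - 0.8 * x + u[:, 1]) > 0).astype(int)

# Step 1: probit selection equation on the full applicant pool.
Zc = sm.add_constant(z)
sel = sm.Probit(accepted.astype(int), Zc).fit(disp=0)
xb = Zc @ sel.params                   # linear index from the selection fit
imr = norm.pdf(xb) / norm.cdf(xb)      # inverse Mills ratio

# Step 2: default equation on accepts only, with the IMR as an extra
# regressor; a significant IMR coefficient signals selection on unobservables.
X2 = sm.add_constant(np.column_stack([x[accepted], imr[accepted]]))
out = sm.Probit(bad[accepted], X2).fit(disp=0)
print(out.params)
```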

In summary, the extant literature has highlighted sample selection issues in credit

scoring and, in particular, has evaluated several methods to adjust credit scoring models

for sample selection bias arising from building/validating models on previously booked

applicants. The evidence regarding the effectiveness of these models is mixed, however.

Further, inferring the behavior of rejected applicants using supplemental bureau data is

widely used but, to the best of our knowledge, the effectiveness of this approach has not

previously been directly examined in the literature. Our paper is motivated, in part, by the need to document the effectiveness of this commonly used approach.

3. Sample design

3.1 Data

Banks usually purchase credit history information for the accepted and rejected

applicants from one or more of the credit bureaus for the development of acquisition

scores. Following industry practices, we use data from one of the three major credit

bureaus. 2 We have access to a unique nationally representative consumer credit database

(CCDB) with information for a growing sample of 1 to 1.5 million individuals during the

period 1999 to 2009. For each individual, the CCDB contains a credit bureau score and

information on all debt exposures, inquiries, public records, and any other reports to the

credit bureau, also known as tradelines. In addition, for each tradeline the CCDB includes

the type, amount, and payment history, which allow for the calculation of the hundreds of

credit risk attributes that credit bureaus provide to banks. The credit standing of each individual is reported as of June 30th of each year, so the data consists of a

series of snapshots of individuals and their risk attributes as of June 30th. One exception

is the payment status for each tradeline, which is available at the actual monthly

frequency at which it is realized and recorded. This allows us to track performance

closely and identify the exact instance of missed payments.

The CCDB sample includes both scoreable and unscoreable individuals. The

unscoreable individuals are those that have been inactive in the past 12 months or have such a limited credit history that they cannot be assigned a valid bureau credit score.

They constitute around a quarter of the full credit bureau data. In the CCDB, the

scoreable individuals are over-represented relative to the full population of individuals

reported to the credit bureau such that the sample is well suited for score development.

2
The three major credit bureaus are Equifax, Experian, and TransUnion and they maintain credit files for
around 200 million individuals. The credit files contain information from grantors of consumer credit and
collectors of public records. The bureaus use the information to build consumer credit history, consumer
credit score and consumer credit attributes used for evaluating consumer credit quality.

For the 1999 sample there are 950,000 scoreable and 50,000 unscoreable individuals with

each scoreable individual having on average roughly 5 bankcard tradelines.

Going forward, individuals are kept in the panel if they remain or become scoreable and dropped if they become unscoreable. In addition, another 50,000 unscoreable

individuals are added to the panel each year as well as another random sample of

scoreable individuals averaging six percent of the existing scoreable individuals. The

design of the panel is illustrated by the vertical bars in figure 1. The large portion of

individuals that are kept in the panel from year to year allows us to observe future

performance for credit risk evaluation purposes. The added unscoreable individuals

ensure that individuals with relatively short credit history are represented in the sample.

This is important for our purposes since the individuals with relatively short credit history

are likely to apply for credit. The additional scoreable individuals are selected based on

the distribution over the bureau score to make the sample more representative of the population. 3

This unbalanced panel sample design allows us to track performance over different

horizons for score development and validation purposes as well as to test the scores on

individuals that have not been part of the development sample.

3.2 Methodology

In this section we describe the selection of the sample that we will use for

evaluating the impact of reject inference on acquisition credit score performance. Bank

credit card acquisition strategies target the pool of potential customers. Each bank might

have different acquisition channels and depending on the channel, the customer pool

3
For more detailed illustration of the sample design please see Glennon et al. (2008).

might consist of actual applicants or the population identified for mailing applications. 4

The acceptance rate and thus the need for reject inference can vary across channels.

Since banks need to construct a measure of credit risk based on individual

characteristics known up front, for modeling purposes they take a snapshot (ie, cross-

section sample) of applicants as of a particular month or quarter and construct a

performance measure for them. For consistency with this industry practice, we turn the

panel data into a series of snapshot samples used for score development and validation,

which allows us to test the robustness of our results through time. One concern is that the

aging of the population in the sample could bias downward the number of

rejected applicants in our analysis since the individuals with less credit history, such as

the young borrowers, are more likely to be rejected rather than granted a credit card. As a

result, we suspect that any impact of reject inference on score performance that we find in

our data could in fact be larger.

Since we can construct the individual risk attributes only as of June 30th, we take

all credit card applicants from the following quarter as our sample window for identifying

the through-the-door (TTD) population. Figure 1 illustrates the construction of the sample.

We cannot completely replicate a development sample for any given bank since the bank-reported card inquiries and newly opened card accounts in the CCDB are anonymous and

cannot be associated with a particular institution. Instead we take all newly opened

bankcards and inquiries observed during the third quarter of each year as our model

4
The use of the score for identifying a mailing base is also known as pre-approval or front-end evaluation.
Back-end evaluation refers to the use of the score for acceptance/rejection of actual applicants once the
applicants have responded to the pre-approved offers.

development sample, which is a random sample of the credit card applicants for the

industry rather than for a particular bank. 5

While for a particular bank, the accepted and rejected applicants are naturally

defined, we need to identify these subsets for the industry given our sample. We define as

booked (BK) the set of all individuals that have applied for a card to a bank during the

third quarter of the year and have been granted credit in each of those instances. The BK

individuals have not been rejected by any institution during the chosen quarter even if

they have been rejected at some other point in the past. The rejected individuals are those

that have made at least one inquiry and have been rejected. They could have received a

card from one bank during the quarter but at least one other bank has rejected them,

which implies that there is less certainty about their desirability as customers; because they would fall in the rejected pool of at least one institution, we consider them rejected.

Following industry practices, we classify the rejected applicants further depending on our

ability to infer performance. Individuals that, after being rejected by at least one card lender, manage to open a bank card during the same or the following quarter make up our main reject inference group (RI). The selection window is extended in this way to allow more of the rejected individuals to open cards whose performance can serve as a proxy.

To further expand the possible inference set, it is common industry practice under the supplemental approach to use, for individuals that do not manage to open any bankcard during the extended window, nonbankcard tradelines (such as a retail credit card or loan) opened during the third or fourth quarters as a proxy for performance

tracking. As part of our analysis, we also mimic this industry practice to evaluate its

5
To the extent very large banks have nationally representative applicant pools, one could argue that for the
largest few bankcard providers our development sample is indeed representative of the industry applicants.

performance. In particular, we augment the reject-inference sample with individuals that

were rejected and opened tradelines on other retail credit products during the observation

window. We label the extended reject inference sample RI*. 6 The set of individuals for

which we cannot make any inference because they do not open any new tradeline during

the selected window are labeled RNI. Note that it is possible to use existing bank cards or

other tradelines to proxy performance but that would not be a directly relevant

comparison because of the impact of seasoning and account management. Similarly, as a

variation of proxy performance information for rejected applicants, banks can use the

credit bureau score at the end of the performance period. However, as a summary

statistic, the score reflects any credit performance deterioration and cannot distinguish

between the performance of newly opened and existing cards. 7
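Operationally, the group definitions above amount to a simple decision rule; the sketch below applies it to hypothetical applicant records (the boolean field names are illustrative stand-ins for flags derived from inquiry and tradeline data, not CCDB fields):

```python
import pandas as pd

def classify(applicant):
    """Assign a hypothetical applicant record to BK / RI / RI* / RNI,
    mirroring the definitions in the text (field names are illustrative).

    applicant: record with boolean flags derived from bureau data:
      rejected_q3        - at least one Q3 card application was declined
      card_opened_q3_q4  - opened a bankcard in Q3 or the extended Q4 window
      other_trade_q3_q4  - opened any non-bankcard tradeline in Q3/Q4
    """
    if not applicant["rejected_q3"]:
        return "BK"    # every Q3 card application was granted
    if applicant["card_opened_q3_q4"]:
        return "RI"    # rejected somewhere, but a new card elsewhere proxies performance
    if applicant["other_trade_q3_q4"]:
        return "RI*"   # only a non-bankcard tradeline available as proxy
    return "RNI"       # no new tradeline: no inference possible

sample = pd.DataFrame([
    {"rejected_q3": False, "card_opened_q3_q4": True,  "other_trade_q3_q4": False},
    {"rejected_q3": True,  "card_opened_q3_q4": True,  "other_trade_q3_q4": False},
    {"rejected_q3": True,  "card_opened_q3_q4": False, "other_trade_q3_q4": True},
    {"rejected_q3": True,  "card_opened_q3_q4": False, "other_trade_q3_q4": False},
])
print(sample.apply(classify, axis=1).tolist())  # ['BK', 'RI', 'RI*', 'RNI']
```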

The BK applicants combined with the RI and RI* group of applicants make up the

through-the-door (TTD) sample that can be used for score development with reject

inference, which is depicted for each of the year/samples in figure 2. Around half of the

individuals are immediately booked (BK). Another 15 percent are rejected somewhere

but manage to receive a credit card (RI). With the expanded proxy performance (RI*),

another 5 percent of the applicants can be used for score development. However, around

6
To the extent that performance of different credit products are driven by different underlying factors,
assuming such products are similar could lead to biases. Though we accept that such biases might occur,
our main goal is to evaluate whether such models, which are widely used in industry, are nevertheless
helpful in inferring the behavior of rejected applicants. In terms of delinquency rates the RI and RI* groups
are much more similar to each other than to the BK group. Even if the RI* performance measure is not a
true representation of their possible performance on a bank card, a comparison between the RI and RI*
groups across the performance of non bank card accounts shows that the RI* group is riskier.
7
The 2004 data shows that for the BK group there is negative 50% correlation between 90 days plus
delinquency within 24 months and the fresh bureau score at the end of the 24 month period. For the group
for which we have performance on a new account (the RI group) the correlation is negative 40%. Although
the correlation is relatively high for both groups, it shows that the score is influenced by more than the
performance on newly opened accounts.

20 percent of our random sample consists of individuals for which no inference can be

made (RNI) from credit bureau data. This implies that the problem of censoring can still be significant even after the augmentation with bureau data.

A commonly used performance measure in the acquisition scoring area is a delinquency or major derogatory flag over a fixed horizon, eg 12, 18, or 24 months. In industry practice, setting the fixed horizon often amounts to selecting another point in time as of which performance is evaluated. We follow this practice and

assess performance as of the end of the fourth quarter of the following year which results

in a horizon ranging from 12 to 18 months. Thus, there is variation of the performance

horizon with the booked sample having a longer horizon than the rejected sample. Given

that default risk can only increase as the horizon lengthens, we may be underestimating the true default risk of the rejected population. The rejected applicants nevertheless exhibit significantly higher delinquency rates than the booked ones, so any such underestimation does not undermine our, or banks', relative analysis.

Our main performance flag is defined as 90 days or worse delinquency (90+

DPD) although alternative definitions such as 60 days delinquency are used for

robustness. The performance for the BK and RI groups is based on the worst

performance they have for any of the opened credit cards during the window. For the RI*

individuals, performance is based on the worst delinquency of any of their tradelines

opened during the selected window. The RNI individuals do not open any new tradeline

from which we can infer performance.
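A minimal sketch of how the performance flag can be computed, assuming hypothetical tradeline-level records (field names are illustrative):

```python
import pandas as pd

# Hypothetical tradeline-level history: one row per (person, tradeline) with
# worst_dpd = the worst days-past-due status recorded over the 12-18 month horizon.
trades = pd.DataFrame({
    "person":           [1, 1, 1, 2, 2],
    "opened_in_window": [True, True, False, True, True],
    "worst_dpd":        [0, 90, 120, 30, 60],
})

# Performance flag: worst status on any tradeline opened in the selection
# window; 90+ DPD marks a "bad".
flag = (trades[trades["opened_in_window"]]
        .groupby("person")["worst_dpd"].max()
        .ge(90).astype(int))
print(flag.to_dict())   # {1: 1, 2: 0}
```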

4. Analysis

4.1 Univariate analysis

Our analysis begins with comparing the BK, RI, RI* and RNI groups of credit

card applicants defined in the previous section. In figure 3, which shows the annual bad rate for the three groups with observable performance (BK, RI, and RI*), we see that bad rates are substantially different across these groups during all time periods. The booked accounts exhibit less than half the bad

rate of the RI group. Table 1 presents the mean and standard deviations for all the credit

bureau attributes that we use in building scoring models across the nine

development/validation samples. We use tradeline-specific data to construct attributes

that are consistent with the definitions developed by the credit bureau. We select a subset

of these attributes from the five broadly defined categories used in scorecard

development outlined in Fair-Isaac (2006). Each of the categories: payment history,

amount owed, length of credit history, types of credit in use and new credit is presented

in a separate panel. Moving from the BK group across the rejected groups RI, RI*, and RNI, the mean credit bureau score decreases, the percent of unscoreable individuals increases, the number of inquiries increases, non-zero balances are more frequent even if the average balance itself is not higher, and the balance-to-credit ratio is higher. The RNI group has on average a shorter credit history and a worse payment history, with the highest balance on major derogatory accounts.

The differences in the population characteristics can also be seen in figure 4, which

depicts the full distribution of the generic bureau score across the four groups in the 2003

sample. Clearly the BK group is of higher credit quality than the three rejected groups

and the RNI group has the lowest credit quality; the RI and RI* groups appear to have

very similar distributions. 8

Similarly, we compare the distributions of attributes across the four groups. We

evaluate the magnitude of the differences in the distributions across the BK and rejected

data sets (RI, RI*, and RNI) using a non-parametric Kolmogorov-Smirnov (K-S) test,

which measures the level of separation between two distributions. 9 We report in table 2,

for a subset of attributes, the K-S statistics for the difference in the distributions from the

various rejected groups (RI, RI*, and RNI) and the distribution of values from the booked

(BK) data set using the 2003 development data. We report the results for a subset of

attributes for which the K-S statistic is more than 20 percent for at least one of the

comparisons; that is, the difference between the attribute distribution for the BK data is

large relative to the distribution of values for at least one of the data sets made of rejected

accounts. The shading indicates the level of the K-S statistic for each pair of distributions

with darker corresponding to higher K-S. The first column shows the K-S between the

BK and RI groups, which exhibit the smallest differences. The third column, comparing the BK and RNI distributions, is the darkest, indicating the greatest amount of difference and

the most characteristics where the groups differ in distribution. Such differences in the

distribution suggest that building a score only on the booked accounts but applying it on

all booked and rejected applicants could be misleading. Although the RNI group is large, the

8
A common practice is for banks to incorporate in their acquisition strategy a cut off based on the generic
bureau score and in this way they eliminate at least half of the RNI group from their customer base. We do
not take this route for our analysis since such a subjective cut off depends on the bank’s risk strategy while
our goal is to document the scope for reject inference.
9
If the distribution of values for a specific attribute varies significantly across the data groups, the K-S statistic will be large (e.g., greater than 20 percent). Conversely, a small K-S value (e.g., less than 10 percent) implies the distributions of values for that attribute are similar across the data sets.

addition of the RI and RI* groups to the BK for score development has the potential to

address some of the censoring given the differences across these groups.
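For reference, the two-sample K-S statistic used in these comparisons is straightforward to compute; the sketch below uses simulated score draws whose means and spreads are purely illustrative of the kind of separation shown in figure 4:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# Hypothetical bureau-score draws for booked (BK) vs. no-inference rejects (RNI).
score_bk = rng.normal(760, 90, size=10000)
score_rni = rng.normal(640, 110, size=3000)

stat, pval = ks_2samp(score_bk, score_rni)
print(f"K-S = {stat:.2f}")   # a large value indicates strong distributional separation
```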

4.2 Score development

For score development purposes, we split the full sample further into

“clean” individuals, ie those that have not had any major delinquency in the past, “dirty”

individuals, ie those that have been at least 60 days delinquent in the past or currently,

and individuals with “thin” files for which it is hard to determine the credit quality. Banks

do split further the thin into dirty and clean but the size of our sample would make such a

split impractical. Figure 5 shows the subgroups of BK, RI, RI* and RNI for each of the

clean, dirty and thin segments. Consistently across our samples the dirty and thin

segments have much fewer booked accounts (BK) and it is in those segments that we

expect the reject inference to be of most importance.

We build our score as a means to evaluate the risk of 90 days delinquency or

worse in the next 12-18 months. Following industry practice, we estimate a logistic

regression for each of the three segments clean, dirty, and thin using the 90 plus day

delinquency as the dependent variable. Banks may have a more granular segmentation

based on their portfolio but in some way the clean, dirty, thin split is usually part of any

segmentation scheme as there are significant differences in these populations. For

robustness, we also evaluate the score if built on the full population without any

segmentation. Although banks sometimes use some form of expert judgment in selecting

characteristics for scorecard development, the final score is usually based on a statistical

selection of variables, often through a stepwise regression. Glennon et al. (2009) provide

evidence that indeed this method performs relatively well compared to semi-parametric

and non-parametric alternatives.
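A minimal sketch of a stepwise selection of this kind, assuming a greedy forward search by AIC on simulated data (this stands in for, and need not match, the authors' actual selection procedure):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)

def forward_stepwise_logit(df, target, candidates, max_vars=10):
    """Greedy forward selection by AIC for a logistic regression; a minimal
    stand-in for stepwise selection, not the authors' actual procedure."""
    selected, best_aic = [], np.inf
    while len(selected) < max_vars:
        trials = {}
        for var in candidates:
            if var not in selected:
                X = sm.add_constant(df[selected + [var]])
                trials[var] = sm.Logit(df[target], X).fit(disp=0).aic
        if not trials:
            break
        var, aic = min(trials.items(), key=lambda kv: kv[1])
        if aic >= best_aic:       # no remaining candidate improves the fit
            break
        selected, best_aic = selected + [var], aic
    final = sm.Logit(df[target], sm.add_constant(df[selected])).fit(disp=0)
    return selected, final

# Hypothetical development data: five highly correlated bureau attributes.
n = 5000
base = rng.normal(size=n)
df = pd.DataFrame({f"x{i}": base + rng.normal(scale=0.7, size=n) for i in range(5)})
p = 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * df["x0"] - 0.5 * df["x1"])))
df["bad"] = rng.binomial(1, p)
kept, model = forward_stepwise_logit(df, "bad", [f"x{i}" for i in range(5)])
print(kept)   # with high multicollinearity, few variables survive selection
```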

We start with more than 80 credit bureau attributes, summarized in table 1, but the final models in some cases have fewer than 10 attributes. Given the high multicollinearity across the attributes, the coefficients may not be informative, and although we develop 9 separate scoring models (one for each annual cohort, 1999 to 2007), we do not present all

estimated models. Instead we focus on the types of variables and how often they are

selected to be included in a scoring model. Tables 3, 4, and 5 show the number of times a

particular attribute is selected through the stepwise selection process in one of the 9

cohort-based scoring models for the clean, dirty and thin segments respectively. The

tables are sorted from the most to least selected variables, which allows for a comparison

between the scoring models built only on the booked (i.e., BK) accounts versus on the

full TTD population.

Those results show that the scoring models developed on the TTD data

incorporate a wider range, and an alternative mix, of variables relative to the BK-based

models. The TTD scoring models more often include information on the length of credit

history and both the credit line and balances for bankcards and other trades. The bureau

score is selected in all models, and since it may capture most of the information about the individual’s credit quality, we also estimate the models without the score; this results in a different mixture of selected variables and estimated coefficients but, as discussed later, has little impact on performance. Because of the high correlations across

all the bureau attributes, it is not clear whether any of those differences would lead to

significant variation in performance of the scoring models built on the TTD population

versus those developed on the BK sample data only, which is discussed next.

4.3 Score performance

An important part of the evaluation of a scoring model is the out-of-sample, out-

of-time performance of the scores with respect to objectives and purpose of the model.

Acquisition strategies are often based on several models. 10 Banks use risk scores to set

acceptance/rejection cutoff values based on risk tolerance, which implies, at least implicitly, that the expected default rate (or odds) at the cutoff is consistent with a given

bank’s pricing and risk/return objectives. The score is also used for assigning credit line

levels and terms of the contract. For those later purposes, the discriminatory power of the

model is important because the bank needs to be able to differentiate the potential

customers by their level of risk. Higher lines and better terms or special offers like

balance transfers could be made to the better quality customers in order to maximize

market share and profits.

For many of the decisions in the acquisition area the scores need to also indicate

the account-specific likelihood of becoming seriously delinquent (or to default). The

score associated likelihood of delinquency (or, equivalently, the odds ratio) is used for

setting score cutoff levels for the approval decision but also for account and portfolio

10
A risk score or the likelihood of the account becoming severely delinquent, which is the focus of this
paper, is usually combined with a response likelihood forecast as well as a balance and/or revenue forecast.

profitability and loss analysis. 11 Thus, an equally important quality of the credit score is

the accuracy of the actual associated odds.

In evaluating the impact of sample selection on performance, we use the

performance of the reject inference score built on the TTD sample when applied on the

TTD pool as a benchmark. Using the same TTD validation sample we apply the score

built only on the BK subset and compare the results to the benchmark in order to evaluate

the selection bias. We also track the performance of the TTD score and the BK score

when applied only on the BK subset as a validation sample. These last two cases are

actually common model validation practices in the industry because, unlike for model

development purposes, for model validation banks do not typically gather performance

information for the individuals that have been rejected by the model. At the same time,

for acquisition purposes the scoring model is applied on the full pool of potential

customers, so proper validation has to be done on at least the above-defined TTD

population.

Discriminatory power

For ease of presenting the results, we show first the performance of the scoring

models built on the 2003 data sample. The two scoring models that we test are those built

on the BK and TTD samples labeled BK2003 and TTD2003 respectively. The first panel

of table 6 exhibits the K-S statistic for both in-sample and out-of-time samples. The

results are provided by segment (clean\dirty\thin) and also for the aggregate portfolio

(all). Each column shows a particular combination of score and validation sample:

11
Banks use additional layers of business logic to determine final cutoffs beyond the odds suggested by the
score. Risk management might have different targets and risk tolerance across geographic regions or
products as they monitor the performance of the score within such portfolio segments.

TTD2003_TTD, BK2003_TTD, TTD2003_BK, and BK2003_BK. 12 Note that the

discriminatory power is relatively high and decreases very little through time for the

2004-2007 validation samples. This finding is consistent with the results for behavioral

scoring models reported in Glennon et al. (2008) and is generally also assumed in the

industry. The robustness of the models to maintain their ability to discriminate between

good and bad accounts over time justifies the industry practice of relying on models

based on relatively old data.

As expected, the thin and dirty segment scores exhibit lower discriminatory power. The TTD2003_TTD statistics are either the same or a couple of percentage points higher than the BK2003_TTD ones, but not sufficiently higher to imply that reject inference matters for the discriminatory power of the scoring model: a result that holds for all years (not reported). For the thin and dirty segments, the models developed on the TTD data exhibit discriminatory power roughly 5 percentage points higher when validated on the BK accounts only (ie, TTD2003_BK) than when validated on the TTD population (ie, TTD2003_TTD). While

this is not a large enough difference to warrant concerns about the model’s ability to

differentiate between the default and non-default distributions, it shows that validating

only on booked accounts may tend to overestimate the discriminatory power of the score.

The results are confirmed in the second panel of table 6 where we show the Somers’ D

discriminatory power statistic as an alternative to the K-S. Unlike the K-S statistic,

12
TTD2003_TTD (_BK) refers to applying the scoring model developed on the 2003 TTD sample to the
TTD (BK) data. BK2003_TTD (_BK) refers to applying the scoring model developed on the 2003 booked
accounts only to the TTD (BK) data.

Somers’ D captures the impact across the full distribution and would reflect any change

in score discriminatory power. 13
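Using the identity in footnote 13, Somers’ D follows directly from the area under the ROC curve; a small sketch with hypothetical scores and outcomes:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical predicted bad probabilities and realized outcomes.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.05, 0.20, 0.30, 0.10, 0.70, 0.85, 0.40, 0.55]

auc = roc_auc_score(y, p)
somers_d = 2 * auc - 1      # Somers' D = AR = 2*AUC - 1 (footnote 13)
print(round(somers_d, 3))
```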

Forecast accuracy

Unlike the impact on the scoring model’s discriminatory power, accounting for

reject inference appears to clearly affect forecasting accuracy. The top panel of table 7

shows the Hosmer-Lemeshow (H-L) statistic for the same combinations of TTD2003 and

BK2003 scores applied to the TTD and BK samples in and out-of-time. The results show

that the forecast accuracy deteriorates relatively quickly on the out-of-time sample, which

is consistent with industry and academic findings. On the TTD sample the TTD2003

score performs much better than the BK2003. Given that the out-of-time H-L statistic is asymptotically distributed as chi-square with 10 degrees of freedom, the bottom panel provides the probability values for

each of the calculated H-L statistics, which are very low. 14 These probability values are

expected given the large number of observations and the fact that the score does not

account for all the risk drivers and thus cannot fully capture the delinquency risk. In the

industry, often the score odds are further calibrated to historical delinquency rates in

order to achieve better forecast accuracy. For our study, we need to compare the relative

accuracy of the scores with and without reject inference, which is independent of any

refinement of the scoring model.
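A minimal sketch of the H-L computation over score deciles, on simulated data that is well calibrated by construction (the decile grouping and degrees of freedom follow footnote 14; this is illustrative, not the authors' code):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """H-L statistic over score deciles: sum of (observed - expected)^2
    normalized by the binomial variance in each decile; out-of-time it is
    compared against a chi-square with `groups` degrees of freedom."""
    df = pd.DataFrame({"y": y, "p": p})
    df["decile"] = pd.qcut(df["p"], groups, labels=False, duplicates="drop")
    g = df.groupby("decile").agg(n=("y", "size"), obs=("y", "sum"), exp=("p", "sum"))
    var = g["exp"] * (1 - g["exp"] / g["n"])
    hl = (((g["obs"] - g["exp"]) ** 2) / var).sum()
    return hl, chi2.sf(hl, df=groups)

rng = np.random.default_rng(5)
p = rng.uniform(0.01, 0.4, size=10000)
y = rng.binomial(1, p)                  # well calibrated by construction
print(hosmer_lemeshow(y, p))            # small H-L, large p-value
```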

13
Note that Somers’ D is equivalent to the Accuracy Ratio (AR) of the Cumulative Accuracy Profile (CAP)
curve and is also related to the area under the curve (AUC) of the Receiver Operating Characteristic (ROC)
curve (Somers’ D=AR=2AUC-1). The industry often relies on monitoring the change in log odds which
similarly involves the full distribution, which is reflected in the Somers’ D measure.
14
Note that in sample the mean and variance are estimated from the data which implies that the appropriate
chi square distribution for the H-L statistic as a goodness-of-fit test is 2 degrees of freedom less ie χ2(8) as
shown in Hosmer and Lemeshow (1980). However, out-of-time the test statistic calculated for the 10
deciles is asymptotically distributed as the sum of 10 squared random normal variables which is a χ2(10)
distribution (see Hosmer and Lemeshow 2000).

The H-L test statistic treats under- and over-estimation equally and also does

not indicate which scores are less accurate. In order to get a better sense of the forecast

accuracy issue, we summarize model performance in a more granular way in table 8. For

each score decile the corresponding probability values are given as well as the expected

minus actual delinquency rates. These results show that the likelihood of becoming

seriously delinquent is underestimated in the more risky deciles if sample selection bias is

ignored. 15 With time, the underestimation spreads to all deciles and to both scores

BK2003 and TTD2003.

We also find that when the scores are tested on the BK subset instead of the realistic TTD pool, the accuracy appears better. In particular, BK2003_BK exhibits the lowest H-L and the

TTD2003_BK exhibits H-L levels lower than BK2003_TTD. Furthermore, table 8 shows

that using the booked accounts BK for validation of a score that accounts for reject

inference can lead to overestimation of the delinquency rates, especially in the high-risk deciles, and that may be the case for a few years after redevelopment (TTD2003_BK). In

this case, which is common practice in the industry, the deterioration of score

performance will not be evident right away and the validation results could be misleading

in terms of the actual forecasting accuracy of the score.

The results are similar across segments and across time. Table 9 shows the H-L

test for all scoring models (1999-2007) using both in and out-of-time samples.

Comparing the top two panels for which the validation sample is the TTD, the second

panel which shows the scoring models built only on booked accounts (BK_TTD) has

much higher H-L values than the first (TTD_TTD) – ie, lower accuracy. The bottom two

15
For each decile, the statistic that is evaluated is the part used for the H-L test calculation which has a
Normal asymptotic distribution.

panels also confirm our conclusions about the misleading performance when evaluated

only on the booked accounts.

Unlike in our sample where acceptance is exogenous and does not depend on our

score value, banks’ acceptance rules based on an internal score that underestimates

delinquency risk will lead to a definite deterioration in the credit quality of booked

accounts. Our evidence strongly suggests that accounting for reject inference as part of

the model-development process is important when the scores are used for defining

acquisition strategy cut offs based on targeted delinquency rates.

4.4 Robustness

In this section we discuss further the robustness of our main finding that reject

inference impacts the forecast accuracy of scores. The unbalanced panel design results in

a set of residual accounts that, by design, are not included in the development data; these

include accounts that were unscoreable at the time of development but have since become scoreable and also the small percentage of new accounts that are added each year. This

sample structure allows us to test the forecast accuracy on individuals that have not been

part of the model-development sample. We report the results for the H-L test across

scoring models and validation years in table 10 using the format from table 9. Note that

the number of observations is significantly lower (roughly 4 thousand accounts, compared to 100 thousand in the full sample), so the H-L test statistics are lower than in table 9. The first panel, as in table 9, has for the most part lower values than the

second panel. Another caveat for this validation sample is that it consists of a relatively

large number of unscoreable individuals, for whom the model score is not expected to

perform as well. Although these results are not as clear, they confirm our earlier findings

that the TTD scores are relatively more accurate than the BK scores. Similarly, the bottom

two panels show that validating only on booked accounts attributes more forecast

accuracy to the model score than it actually has.

We also check the robustness of the results with respect to the development of the

score by varying the definition of bad performance (60 rather than 90 days delinquency) and the performance horizon (6-12 months instead of 12-18), and by estimating the score without segmentation, without the credit bureau score variable, and with a smaller set of explanatory variables. We also vary the

development sample selection period to be the first two quarters of the year rather than

the last two in case there is any seasonality. However, our credit bureau attributes data is

only as of June 30th which means that attributes that we use for score building already

reflect any inquiries and newly opened credit cards of the individuals in the TTD sample.

As expected, in that case the scoring model has better performance especially in terms of

discriminatory power. The impact of reject inference, although smaller, appears again in

terms of the forecast accuracy of the score.

5. Conclusions

The problem of acquisition credit score development based only on booked

individuals while ignoring the rejected ones has been well studied in the literature. Many

techniques for inferring the behavior of rejected applicants have been proposed but the

empirical evidence supporting such techniques is weak or focuses on methods

not typically used in the industry. In this paper, we use a nationally representative

sample of credit bureau data to evaluate the problem of sample selection in credit card

acquisition score development. We evaluate the credit bureau supplemental data method

used by banks for reject inference by examining its impact on score performance.

We find that reject inference has little impact on the discriminatory power of the

score, but basing the score only on booked accounts leads to underestimation of

delinquency risk especially in the high risk deciles. Furthermore, ignoring rejected

individuals when validating the score leads to underestimating the deterioration in score

performance with time. These results suggest that although the data augmentation method

is not a perfect solution to this sample selection issue, it can significantly improve the forecast accuracy of scores, which is important for credit acquisition strategies.

References

Ash, D. and S. Meester (2002). “Best Practices in Reject Inferencing”, Conference on


Credit Risk Modeling and Decisioning. Wharton Financial Institutions Center,
Philadelphia.

Banasik, J., J. Crook, and L. C. Thomas (2003). “Sample selection bias in credit
scoring models”, Journal of the Operational Research Society 54, pp 822–832.

Banasik, J. and J. Crook (2007). “Reject Inference, Augmentation, and Sample


Selection”, European Journal of Operational Research, 183, pp 1582-1594.

Crook, J. and J. Banasik (2004). “Does reject inference really improve the performance
of application scoring models?” Journal of Banking and Finance, 28, pp 857-874.

Crook, J., D. Edelman, and L. Thomas (2007). “Recent Developments in Consumer Credit Risk


Assessment”, European Journal of Operational Research, 183, pp 1447-1465.

Fair-Isaac (2006). "Understanding Your Credit Score",


http://www.myfico.com/CreditEducation/WhatsInYourScore.aspx.

Feelders A.J. (2000). “Credit scoring and reject inference with mixture models”,
International Journal of Intelligent Systems in Accounting, Finance and Management, 8
(4), pp 271-279.

Glennon, D. (1999). “Evaluating Credit Scoring Models: Theory and Practice”, OCC
Working Paper.

Glennon, D., C.E. Larson, N. Kiefer, and H. Choi (2008). "Development and
Validation of Credit-Scoring Models", Journal of Credit Risk, 4, pp 1-61.

Greene, W. (1998). “Sample selection in credit-scoring models”, Japan and the World
Economy, 10, pp 299-316.

Greene, W. (2007). “A Statistical Model for Credit Scoring,” in Credit Risk: Quantitative
Methods and Analysis, Hensher, D. and S. Jones, eds., Cambridge University Press.

Hand D. J. (2001a). “Reject inference in credit operations: theory and methods” in The
Handbook of Credit Scoring, Glenlake Publishing Company, pp 225-240.

Hand D. J. (2001b). “Modeling consumer credit risk”, IMA Journal of Management


Mathematics, 12 (2), pp 139-155.

Hand D.J. and W.E. Henley (1994). “Can reject inference ever work?”, IMA Journal of
Mathematics Applied in Business and Industry, 5 (1), pp 45-55.

Hand, D.J. and W.E. Henley (1997). “Statistical classification methods in consumer
credit scoring: a review”, Journal of the Royal Statistical Society A, 160, pp 523-541.

Hosmer, D. W. and S. Lemeshow (1980). "A goodness-of-fit test for the multiple logistic
regression", Communications in Statistics, A10, pp 1043-1069.

Hosmer, D. W. and S. Lemeshow (2000). Applied Logistic Regression -2nd ed. Wiley,
New York.

Joanes D.N. (1993). “Reject inference applied to logistic regression for credit scoring”,
IMA Journal of Mathematics Applied in Business and Industry, 5 (1), pp 35-43.

Kiefer, N. M. and C.E. Larson (2006). “Specification and Informational Issues in Credit


Scoring”, International Journal of Statistics and Management Systems, 1, pp 152-178.

Little, R.J.A. and D.B. Rubin (1987). Statistical Analysis with Missing Data. New York:
John Wiley.

Parnitzke, T. (2005). “Credit Scoring and the Sample Selection Bias”, Institute of Insurance
Economics, Working Paper.

Wu, I. and D. Hand (2007). “Handling selection bias when choosing actions in retail
credit applications”, European Journal of Operational Research, 183, pp 1560-1568.

Table 1. Average mean and standard deviation by type of variable following Fair Isaac
categories across the 9 samples used for developing and validating the score (1999-2007).
Invalid extreme values are set to missing and they represent less than half of one percent
of the TTD sample.

Credit amount
BK RI RI* RNI
BUREAU ATTRIBUTE MEAN STD MEAN STD MEAN STD MEAN STD
# OF OPEN AUTO LOAN TRADES BAL > 0 0.4 0.7 0.4 0.7 0.5 0.7 0.4 0.6
AGG BAL FOR OPEN AUTO LOAN TRADES 5874 10783 5746 10490 6847 11833 4821 10075
AGG CREDIT FOR OPEN AUTO LOAN TRADES 8605 14451 8264 14039 9774 15740 6921 13388
# OF BANKCARD TRADES BAL > 0 2.0 1.8 2.3 2.0 2.4 2.2 2.3 2.4
AGG BAL FOR OPEN BANKCARD TRADES 5486 10055 5395 10150 6621 12560 5198 11970
AGG CREDIT FOR OPEN BANKCARD TRADES 26948 32045 22246 30507 19757 29295 12352 24524
AGG BAL TO CREDIT RATIO FOR OPEN BANKCARD TRADES 28 465 36 623 41 39 58 2242
# OF OPEN HOME EQUITY TRADES BAL > 0 0.2 0.4 0.1 0.4 0.1 0.4 0.1 0.3
AGG BAL FOR OPEN HOME EQUITY TRADES 7436 29329 6092 25724 6926 29450 4038 22171
AGG CREDIT FOR OPEN HOME EQUITY TRADES 12801 47881 9854 40972 10658 48150 5925 35129
# OF INST TRADES BAL > 0 1.3 1.7 1.6 2.0 2.2 2.6 2.3 2.6
AGG CREDIT FOR OPEN INST TRADES 18285 37118 17603 41267 22295 50576 14406 31262
AGG BAL TO CREDIT RATIO FOR OPEN INST TRADES 41.1 238.3 44.0 329.5 50.4 73.9 42.4 193.7
# OF OPEN AUTO LEASE TRADES BAL > 0 0.1 0.3 0.1 0.3 0.1 0.3 0.1 0.2
AGG BAL FOR OPEN AUTO LEASE TRADES 805 3973 757 3961 950 4511 579 3674
AGG CREDIT FOR OPEN AUTO LEASE TRADES 1533 6786 1398 6575 1868 7883 1043 5972
# OF OPEN MORTGAGE TRADES BAL > 0 0.6 0.7 0.5 0.7 0.5 0.7 0.3 0.6
AGG BAL FOR OPEN MORTGAGE TRADES 72367 126908 59220 116031 65951 138920 37885 100355
AGG CREDIT FOR OPEN MORTGAGE TRADES 78231 132319 63569 121104 70270 144766 40584 105792
AGG BAL TO CREDIT RATIO FOR OPEN MORTGAGE TRADES 44.3 53.3 36.3 47.4 38.1 45.6 25.8 239.3
# OF RETAIL TRADES BAL > 0 0.7 1.1 0.8 1.2 0.9 1.3 0.8 1.3
AGG CREDIT FOR OPEN RETAIL TRADES 4327 7852 3319 4830 3067 4600 1857 3551
AGG BAL TO CREDIT RATIO FOR OPEN RETAIL TRADES 8.6 24.1 12.0 78.0 15.6 32.9 15.9 141.9
# OF REVOLVING TRADES BAL > 0 3.0 2.6 3.2 2.8 3.6 3.1 3.2 3.2
AGG CREDIT FOR OPEN REVOLVING TRADES 31929 32478 26172 32407 23282 30263 14480 25332
# OF OPEN BANKCARD TRADES 3.7 3.0 3.5 3.3 3.3 3.1 2.3 2.8
AGG BAL TO CREDIT RATIO FOR OPEN REVOLVING TRADES 24.5 117.9 32.4 76.3 39.4 37.7 41.0 128.7

Credit type
BK RI RI* RNI

BUREAU ATTRIBUTE MEAN STD MEAN STD MEAN STD MEAN STD
# OF TRADES 14.1 7.9 14.6 8.3 15.7 8.5 13.1 7.8
# OF AUTO LOAN TRADES 0.8 1.1 0.8 1.1 1.0 1.3 0.8 1.1
# OF AUTO LOAN OPENED W/I 12 MOS 0.2 0.6 0.3 0.7 0.3 0.7 0.3 0.7
# OF BANKCARD TRADES 5.3 4.0 5.6 4.5 5.4 4.2 4.6 4.1
# OF INST TRADES 3.5 3.2 4.4 3.7 5.7 4.5 5.5 4.4
# OF AUTO LEASE TRADES 0.1 0.4 0.1 0.4 0.1 0.5 0.1 0.4
# OF MORTGAGE TRADES 1.4 1.8 1.3 1.9 1.5 2.1 1.0 1.7
# OF RETAIL TRADES 3.4 3.1 3.2 3.0 3.2 3.1 2.4 2.7
# OF OPEN RETAIL TRADES BAL DATE W/I 12 MOS 2.5 2.4 2.1 2.4 2.1 2.4 1.4 2.0
# OF REVOLVING TRADES 8.8 5.9 8.5 6.3 8.1 6.1 6.3 5.7

Table 1. (cont.)

Length of history

BK RI RI* RNI
BUREAU ATTRIBUTE MEAN STD MEAN STD MEAN STD MEAN


AGE OF OLDEST TRADE 173.2 109.0 150.4 102.2 140.0 98.3 122.5
AGE OF OLDEST AUTO LOAN TRADE 13.6 19.3 13.8 19.8 15.3 20.5 13.1
AGE OF OLDEST BANKCARD TRADE 124.3 91.9 103.6 86.7 95.0 83.7 79.2
AGE OF OLDEST HOME EQUITY TRADE 10.0 26.4 7.2 22.2 6.9 21.3 4.7
AGE OF OLDEST AUTO LEASE TRADE 2.6 9.7 2.4 9.4 2.9 10.1 1.8
AGE OF OLDEST MORTGAGE TRADE 29.8 48.6 24.7 45.9 24.1 44.3 18.3
AGE OF OLDEST RETAIL TRADE 122.6 114.4 104.4 104.2 96.7 99.9 78.5

New credit

BK RI RI* RNI
BUREAU ATTRIBUTE MEAN STD MEAN STD MEAN STD MEAN


# OF OPEN BANKCARD TRADES BAL DATE W/I 24 MOS 3.6 3.0 3.4 3.2 3.2 3.0 2.3
# OF INQUIRIES W/I 12 MOS 1.6 1.9 2.1 2.2 2.7 2.6 2.4
# OF MORTGAGE OPENED W/I 24 MOS 0.9 1.6 0.9 1.6 1.1 1.9 0.7
# OF RETAIL OPENED W/I 24 MOS 0.9 1.3 0.9 1.3 0.9 1.4 0.6
# OF RETAIL OPENED W/I 12 MOS 0.5 1.0 0.5 1.0 0.6 1.1 0.4

Payment

BK RI RI* RNI
BUREAU ATTRIBUTE MEAN STD MEAN STD MEAN STD MEAN


# OF CLOSED TRADES W/I 6 MOS 1.2 1.4 1.3 1.6 1.4 1.6 1
# OF TRADES OPENED W/I 24 MOS FOR CURRENT W/ MINOR DELINQ 0.0 0.3 0.1 0.3 0.1 0.4 0
# OF TRADES OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 0.2 0.7 0.3 1.0 0.6 1.3 1
# OF BANKCARD OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 0.0 0.1 0.0 0.2 0.0 0.3 0
# OF TRADES CURRENTLY 30 DPD BAL > 0 0.0 0.2 0.0 0.2 0.1 0.4 0
# OF TRADES CURRENTLY 60 DPD BAL > 0 0.0 0.1 0.0 0.2 0.0 0.3 0
# OF TRADES CURRENTLY 90 DPD BAL > 0 0.0 0.1 0.0 0.1 0.0 0.2 0
# OF BANKCARD TRADES - 90 DPD W/I 12 MOS 0.0 0.3 0.1 0.5 0.1 0.6 0
# OF RETAIL TRADES - 90 DPD W/I 12 MOS 0.0 0.1 0.0 0.2 0.1 0.3 0
# OF TRADES MAJOR DEROG 0.4 1.5 1.2 2.4 1.4 2.6 2
AGG BAL FOR MAJOR DEROG 731 8303 2256 13210 2161 12607 46
# OF TRADES 30-180 DPD W/I 12 MOS 0.3 0.9 0.6 1.4 0.9 1.7 1
# OF TRADES 60-180 DPD W/I 12 MOS 0.1 0.6 0.4 1.0 0.5 1.2 1
# OF TRADES 90-180 DPD W/I 12 MOS 0.1 0.5 0.3 0.8 0.3 1.0 0
# OF BANKRUPTCIES 0.1 0.5 0.3 1.1 0.2 0.8 0
# OF ALL PUBLIC RECORD INCLUDING TRADELINE BANKRUPTCIES 0.1 0.6 0.3 1.1 0.2 0.9 0

Credit score
BK RI RI* RNI
BUREAU ATTRIBUTE MEAN STD MEAN STD MEAN STD MEAN
VALID BUREAU SCORE 765 93 709 110 678 112 6
UNSCOREABLE 0.4% 6.1% 0.6% 7.1% 0.9% 8.9% 1

Table 2. Bureau attributes exhibiting the largest distributional difference between booked
BK and the different rejected groups based on level of inference (RI, RI*, RNI), as
measured by the K-S statistic for the 2003 development sample. Shown are the variables
with significant difference for at least one of the pairs. The shading corresponds to the
level of K-S with darker indicating higher K-S (10-20, 20-30, 30-100).

VARIABLE BK_RI BK_RI* BK_RNI RI_RNI RI*_RNI


AGE OF OLDEST TRADE 0.10 0.13 0.24 0.14 0.12
# OF TRADES OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 0.09 0.15 0.36 0.27 0.21
# OF OPEN BANKCARD TRADES 0.07 0.08 0.27 0.19 0.18
# OF OPEN BANKCARD TRADES BAL DATE W/I 24 MOS 0.08 0.08 0.26 0.19 0.18
AGG CREDIT FOR OPEN BANKCARD TRADES 0.18 0.19 0.39 0.21 0.21
AGG BAL TO CREDIT RATIO FOR OPEN BANKCARD TRADES 0.14 0.23 0.26 0.15 0.15
AGE OF OLDEST BANKCARD TRADE 0.13 0.15 0.26 0.14 0.12
# OF COLLECTION TRADES W/I 24 MOS 0.09 0.15 0.34 0.25 0.19
# OF BANKCARD TRADES - 30 DPD W/I 12 MOS 0.07 0.13 0.30 0.23 0.17
# OF BANKCARD TRADES - 60 DPD W/I 12 MOS 0.06 0.09 0.26 0.20 0.17
# OF BANKCARD TRADES - 90 DPD W/I 12 MOS 0.04 0.06 0.21 0.16 0.14
# OF TRADES MAJOR DEROG 0.19 0.23 0.46 0.27 0.23
AGG BAL FOR MAJOR DEROG 0.15 0.19 0.43 0.27 0.23
# OF TRADES 30-180 DPD W/I 12 MOS 0.13 0.21 0.42 0.29 0.21
# OF TRADES 60-180 DPD W/I 12 MOS 0.10 0.15 0.37 0.27 0.22
# OF TRADES 90-180 DPD W/I 12 MOS 0.08 0.11 0.31 0.23 0.20
# OF INST TRADES 0.10 0.22 0.21 0.11 0.01
# OF MORTGAGE TRADES 0.08 0.02 0.22 0.14 0.21
# OF OPEN MORTGAGE TRADES BAL > 0 0.09 0.04 0.23 0.14 0.19
AGG BAL FOR OPEN MORTGAGE TRADES 0.09 0.04 0.23 0.14 0.19
AGG CREDIT FOR OPEN MORTGAGE TRADES 0.09 0.04 0.23 0.14 0.19
AGE OF OLDEST MORTGAGE TRADE 0.10 0.05 0.25 0.15 0.20
AGG BAL TO CREDIT RATIO FOR OPEN MORTGAGE TRADES 0.09 0.04 0.23 0.14 0.19
# OF OPEN RETAIL TRADES BAL DATE W/I 12 MOS 0.08 0.08 0.23 0.15 0.15
AGG CREDIT FOR OPEN RETAIL TRADES 0.11 0.11 0.27 0.16 0.17
AGG CREDIT FOR OPEN REVOLVING TRADES 0.19 0.19 0.40 0.21 0.21
AGG BAL TO CREDIT RATIO FOR OPEN REVOLVING TRADES 0.15 0.25 0.29 0.17 0.13
VALID BUREAU SCORE 0.26 0.35 0.59 0.38 0.30
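
The distances above can be reproduced, in spirit, with a short computation. Below is a
minimal sketch, not the paper's actual code: it simulates stand-in bureau-score samples for
two groups (the BK mean and standard deviation are taken from the summary table above; the
RNI parameters are invented for illustration, since that column is truncated in the source)
and computes the two-sample K-S statistic with scipy.

```python
# Hypothetical sketch: two-sample K-S statistic between applicant groups.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
bureau_score = {
    "BK": rng.normal(765, 93, 5000),    # mean/std from the summary table
    "RNI": rng.normal(640, 110, 5000),  # illustrative stand-in values
}

# K-S distance between the two empirical score distributions, as in Table 2.
stat, pval = ks_2samp(bureau_score["BK"], bureau_score["RNI"])
print(f"K-S(BK, RNI) = {stat:.2f} (p = {pval:.3g})")
```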

Table 3. Number of times a variable is selected by the stepwise regression used to
develop the acquisition scores for the CLEAN segment, 1999-2007. An illustrative sketch
of this selection tally follows the table.

VARIABLE BK TTD
VALID BUREAU SCORE 9 9
AGG BAL TO CREDIT RATIO FOR OPEN REVOLVING TRADES 7 6
# OF REVOLVING TRADES BAL > 0 6 6
AVERAGE AGE OF TRADES 5 8
AGG CREDIT FOR OPEN MORTGAGE TRADES 5 7
AGE OF OLDEST BANKCARD TRADE 4 7
# OF TRADES 30-180 DPD W/I 12 MOS 4 5
# OF INQUIRIES W/I 6 MOS 4 5
AGE OF OLDEST HOME EQUITY TRADE 3 4
# OF INQUIRIES W/I 12 MOS 3 4
AGE OF OLDEST TRADE 3 2
# OF MORTGAGE OPENED W/I 24 MOS 3 2
AGG BAL FOR OPEN FINANCE TRADES 2 4
# OF TRADES OPENED W/I 24 MOS FOR CURRENT W/ MINOR DELINQ 2 2
# OF BANKCARD TRADES - 30 DPD W/I 12 MOS 2 2
AGG BAL TO CREDIT RATIO FOR OPEN INST TRADES 2 2
# OF BANKCARD TRADES 2 0
# OF INST TRADES 2 0
AGG BAL FOR OPEN HOME EQUITY TRADES 1 3
# OF ALL PUBLIC RECORD INCLUDING TRADELINE BANKRUPTCIES 1 3
AGG BAL FOR OPEN AUTO LOAN TRADES 1 2
AGG BAL TO CREDIT RATIO FOR OPEN BANKCARD TRADES 1 2
AGG BAL FOR OPEN MORTGAGE TRADES 1 2
# OF OPEN HOME EQUITY TRADES BAL > 0 1 1
# OF OPEN MORTGAGE TRADES BAL > 0 1 1
# OF RETAIL OPENED W/I 12 MOS 1 1
# OF REVOLVING TRADES 1 1
# OF CLOSED TRADES W/I 6 MOS 1 0
# OF BANKCARD TRADES BAL > 0 1 0
AGG BAL FOR OPEN INST TRADES 1 0
AGE OF OLDEST AUTO LEASE TRADE 1 0
AGG BAL TO CREDIT RATIO FOR OPEN MORTGAGE TRADES 1 0
# OF OPEN RETAIL TRADES BAL DATE W/I 12 MOS 1 0
AGG BAL FOR OPEN BANKCARD TRADES 0 6
AGG CREDIT FOR OPEN BANKCARD TRADES 0 5
AGE OF OLDEST RETAIL TRADE 0 3
# OF TRADES 0 2
AGG CREDIT FOR OPEN AUTO LOAN TRADES 0 2
# OF OPEN BANKCARD TRADES BAL DATE W/I 24 MOS 0 2
AGG CREDIT FOR OPEN HOME EQUITY TRADES 0 2
# OF AUTO LEASE TRADES 0 2
AGG CREDIT FOR OPEN AUTO LEASE TRADES 0 2
# OF MORTGAGE TRADES 0 2
# OF OPEN AUTO LOAN TRADES BAL > 0 0 1
# OF OPEN BANKCARD TRADES 0 1
AGG BAL FOR OPEN AUTO LEASE TRADES 0 1
AGE OF OLDEST MORTGAGE TRADE 0 1
# OF RETAIL TRADES BAL > 0 0 1
# OF RETAIL OPENED W/I 24 MOS 0 1
AGG CREDIT FOR OPEN RETAIL TRADES 0 1
AGG CREDIT FOR OPEN REVOLVING TRADES 0 1
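
The counts in Tables 3-5 are, in effect, a tally over nine annual model fits. The sketch
below illustrates that bookkeeping under stated assumptions: scikit-learn's forward
sequential selection stands in for the paper's stepwise regression, and load_cohort(), the
attribute list, and the number of retained features are all hypothetical.

```python
# Hypothetical sketch of the selection-count bookkeeping behind Tables 3-5.
from collections import Counter
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

def load_cohort(year):
    """Stand-in loader: returns (X, y, names) for one development cohort."""
    rng = np.random.default_rng(year)
    names = [f"attr_{i}" for i in range(20)]
    X = rng.normal(size=(1000, 20))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)
    return X, y, names

counts = Counter()
for year in range(1999, 2008):          # nine annual cohorts, 1999-2007
    X, y, names = load_cohort(year)
    sfs = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=5, direction="forward",
    )
    sfs.fit(X, y)
    # Tally the attributes retained in this cohort's model.
    counts.update(n for n, keep in zip(names, sfs.get_support()) if keep)

for name, k in counts.most_common():
    print(f"{name}: selected in {k} of 9 cohorts")
```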

Table 4. Number of times a variable is selected by the stepwise regression used to
develop the acquisition scores for the DIRTY segment, 1999-2007.

VARIABLE BK TTD
VALID BUREAU SCORE 9 9
AGE OF OLDEST BANKCARD TRADE 6 9
# OF TRADES MAJOR DEROG 6 9
# OF TRADES OPENED W/I 24 MOS FOR CURRENT W/ MINOR DELINQ 6 5
AGG CREDIT FOR OPEN REVOLVING TRADES 6 2
AVERAGE AGE OF TRADES 5 8
# OF INQUIRIES W/I 12 MOS 5 7
# OF BANKRUPTCIES 5 7
# OF ALL PUBLIC RECORD INCLUDING TRADELINE BANKRUPTCIES 5 5
# OF BANKCARD OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 4 7
# OF REVOLVING TRADES 4 7
AGG BAL TO CREDIT RATIO FOR OPEN REVOLVING TRADES 4 1
AGG BAL FOR OPEN BANKCARD TRADES 3 9
# OF TRADES 30-180 DPD W/I 12 MOS 3 9
AGG BAL FOR OPEN MORTGAGE TRADES 3 1
# OF REVOLVING TRADES BAL > 0 3 1
AGG CREDIT FOR OPEN MORTGAGE TRADES 2 8
AGE OF OLDEST HOME EQUITY TRADE 2 6
# OF TRADES CURRENTLY 30 DPD BAL > 0 2 3
# OF INST TRADES BAL > 0 2 3
AGG BAL TO CREDIT RATIO FOR OPEN MORTGAGE TRADES 2 3
AGE OF OLDEST TRADE 2 2
# OF TRADES OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 2 2
AGG CREDIT FOR OPEN INST TRADES 2 2
AGG BAL TO CREDIT RATIO FOR OPEN INST TRADES 2 2
# OF TRADES 2 1
AGG CREDIT FOR OPEN BANKCARD TRADES 1 7
# OF INQUIRIES W/I 6 MOS 1 4
AGG BAL FOR OPEN FINANCE TRADES 1 4
# OF INST TRADES 1 3
# OF OPEN BANKCARD TRADES 1 2
# OF TRADES CURRENTLY 60 DPD BAL > 0 1 2
# OF BANKCARD TRADES - 30 DPD W/I 12 MOS 1 2
# OF BANKCARD TRADES - 60 DPD W/I 12 MOS 1 2
AGG CREDIT FOR OPEN AUTO LEASE TRADES 1 2
# OF OPEN MORTGAGE TRADES BAL > 0 1 2
# OF OPEN RETAIL TRADES BAL DATE W/I 12 MOS 1 2
# OF BANKCARD TRADES - 90 DPD W/I 12 MOS 1 1
AGG BAL FOR OPEN INST TRADES 1 1
# OF RETAIL TRADES 1 1
# OF RETAIL TRADES BAL > 0 1 1
AGG BAL FOR OPEN AUTO LOAN TRADES 1 0
# OF MORTGAGE - SEVERE DELINQUENCY INCLUDES 1 0
# OF OPEN HOME EQUITY TRADES BAL > 0 1 0
AGG BAL FOR MAJOR DEROG 1 0
# OF MORTGAGE OPENED W/I 24 MOS 1 0
AGE OF OLDEST MORTGAGE TRADE 1 0
AGE OF OLDEST RETAIL TRADE 1 0
# OF BANKCARD TRADES BAL > 0 0 4
# OF TRADES 60-180 DPD W/I 12 MOS 0 3
# OF CLOSED TRADES W/I 6 MOS 0 2
# OF TRADES 90-180 DPD W/I 12 MOS 0 2
# OF AUTO LEASE TRADES 0 2
AGE OF OLDEST AUTO LEASE TRADE 0 2
# OF MORTGAGE TRADES 0 2
AGE OF OLDEST AUTO LOAN TRADE 0 1
# OF OPEN BANKCARD TRADES BAL DATE W/I 24 MOS 0 1
AGG BAL TO CREDIT RATIO FOR OPEN BANKCARD TRADES 0 1
# OF COLLECTION TRADES W/I 24 MOS 0 1
AGG BAL FOR OPEN HOME EQUITY TRADES 0 1
# OF RETAIL OPENED W/I 24 MOS 0 1

Table 5. Number of times a variable is selected by the stepwise regression used to
develop the acquisition scores for the THIN segment, 1999-2007.

VARIABLE BK TTD
VALID BUREAU SCORE 9 9
AGG CREDIT FOR OPEN REVOLVING TRADES 8 7
# OF TRADES 30-180 DPD W/I 12 MOS 4 8
# OF TRADES MAJOR DEROG 4 5
# OF INQUIRIES W/I 6 MOS 4 4
AGG BAL TO CREDIT RATIO FOR OPEN REVOLVING TRADES 4 4
AGG BAL TO CREDIT RATIO FOR OPEN INST TRADES 4 3
AGE OF OLDEST BANKCARD TRADE 3 6
# OF INST TRADES 3 3
# OF REVOLVING TRADES BAL > 0 3 2
AGG BAL TO CREDIT RATIO FOR OPEN BANKCARD TRADES 3 1
# OF INST TRADES BAL > 0 2 4
# OF INQUIRIES W/I 12 MOS 2 4
AGE OF OLDEST TRADE 2 3
# OF MORTGAGE TRADES 2 2
# OF REVOLVING TRADES 2 1
# OF TRADES CURRENTLY 60 DPD BAL > 0 2 0
# OF BANKCARD TRADES - 30 DPD W/I 12 MOS 2 0
AGG CREDIT FOR OPEN MORTGAGE TRADES 1 4
# OF TRADES 1 2
AVERAGE AGE OF TRADES 1 1
# OF BANKCARD TRADES 1 1
# OF BANKCARD OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 1 1
# OF COLLECTION TRADES W/I 24 MOS 1 1
# OF BANKCARD TRADES - 60 DPD W/I 12 MOS 1 1
# OF BANKCARD TRADES - 90 DPD W/I 12 MOS 1 1
# OF MORTGAGE - SEVERE DELINQUENCY INCLUDES 1 1
# OF OPEN MORTGAGE TRADES BAL > 0 1 1
# OF BANKRUPTCIES 1 1
AGE OF OLDEST RETAIL TRADE 1 1
AGG BAL FOR OPEN AUTO LOAN TRADES 1 0
# OF OPEN BANKCARD TRADES BAL DATE W/I 24 MOS 1 0
# OF TRADES 60-180 DPD W/I 12 MOS 1 0
AGE OF OLDEST AUTO LEASE TRADE 1 0
AGG BAL FOR OPEN BANKCARD TRADES 0 5
AGG CREDIT FOR OPEN BANKCARD TRADES 0 3
# OF RETAIL TRADES BAL > 0 0 3
# OF TRADES OPENED W/I 24 MOS W/ MAJOR DELINQ/DEROG 0 2
# OF BANKCARD TRADES BAL > 0 0 2
# OF OPEN BANKCARD TRADES 0 2
# OF AUTO LEASE TRADES 0 2
# OF OPEN RETAIL TRADES BAL DATE W/I 12 MOS 0 2
# OF TRADES OPENED W/I 24 MOS FOR CURRENT W/ MINOR DELINQ 0 1
# OF AUTO LOAN TRADES 0 1
AGG CREDIT FOR OPEN AUTO LOAN TRADES 0 1
# OF TRADES CURRENTLY 30 DPD BAL > 0 0 1
# OF OPEN HOME EQUITY TRADES BAL > 0 0 1
AGG BAL FOR OPEN INST TRADES 0 1
AGG CREDIT FOR OPEN INST TRADES 0 1
AGG BAL FOR OPEN AUTO LEASE TRADES 0 1
AGG BAL FOR OPEN MORTGAGE TRADES 0 1
AGG BAL TO CREDIT RATIO FOR OPEN MORTGAGE TRADES 0 1
# OF ALL PUBLIC RECORD INCLUDING TRADELINE BANKRUPTCIES 0 1
AGG BAL TO CREDIT RATIO FOR OPEN RETAIL TRADES 0 1

Table 6. Discriminatory power results for the scores built on just booked (BK2003) and
all through-the-door (TTD2003) accounts from the 2003 data, tested in-sample and
out-of-time. An illustrative sketch of the K-S and Somer's D computations follows the table.

K-S
segment   validation year   TTD2003_TTD   BK2003_TTD   TTD2003_BK   BK2003_BK
all       2003              0.59          0.58         0.59         0.59
all       2004              0.59          0.57         0.56         0.56
all       2005              0.58          0.56         0.56         0.55
all       2006              0.56          0.55         0.57         0.56
all       2007              0.52          0.51         0.52         0.51
clean     2003              0.59          0.57         0.57         0.57
clean     2004              0.56          0.55         0.51         0.51
clean     2005              0.55          0.54         0.51         0.51
clean     2006              0.55          0.52         0.56         0.55
clean     2007              0.51          0.46         0.51         0.46
dirty     2003              0.4           0.39         0.48         0.47
dirty     2004              0.41          0.38         0.44         0.43
dirty     2005              0.38          0.36         0.4          0.39
dirty     2006              0.37          0.35         0.39         0.36
dirty     2007              0.35          0.35         0.35         0.35
thin      2003              0.47          0.4          0.58         0.59
thin      2004              0.47          0.41         0.52         0.48
thin      2005              0.5           0.42         0.57         0.49
thin      2006              0.51          0.46         0.53         0.54
thin      2007              0.46          0.44         0.46         0.45

Somer's D
segment   validation year   TTD2003_TTD   BK2003_TTD   TTD2003_BK   BK2003_BK
all       2003              0.74          0.73         0.74         0.75
all       2004              0.74          0.72         0.72         0.71
all       2005              0.72          0.71         0.71         0.7
all       2006              0.71          0.69         0.72         0.71
all       2007              0.67          0.66         0.67         0.65
clean     2003              0.71          0.71         0.69         0.7
clean     2004              0.7           0.68         0.64         0.64
clean     2005              0.69          0.67         0.65         0.64
clean     2006              0.67          0.63         0.69         0.65
clean     2007              0.63          0.55         0.62         0.55
dirty     2003              0.54          0.51         0.6          0.61
dirty     2004              0.53          0.51         0.56         0.55
dirty     2005              0.49          0.47         0.54         0.52
dirty     2006              0.48          0.45         0.5          0.48
dirty     2007              0.46          0.46         0.47         0.47
thin      2003              0.6           0.52         0.7          0.71
thin      2004              0.58          0.53         0.66         0.63
thin      2005              0.58          0.55         0.65         0.61
thin      2006              0.62          0.58         0.68         0.67
thin      2007              0.58          0.56         0.59         0.56
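
For a binary good/bad outcome, Somer's D equals 2 x AUC - 1, and the K-S statistic is the
maximum gap between the cumulative score distributions of bads and goods, which equals
max(TPR - FPR) along the ROC curve. The minimal sketch below illustrates both computations;
the score and outcome arrays are simulated stand-ins, not the paper's data.

```python
# Hypothetical sketch of the two discriminatory-power measures in Table 6.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
bad = rng.integers(0, 2, 10000)             # 1 = bad (90+ DPD), stand-in
score = rng.normal(size=10000) - 0.8 * bad  # lower score -> more likely bad

# Higher -score means riskier, so use it as the "probability of bad" ranking.
auc = roc_auc_score(bad, -score)
somers_d = 2 * auc - 1

# K-S is the largest vertical gap between the bad and good score CDFs.
fpr, tpr, _ = roc_curve(bad, -score)
ks = np.max(tpr - fpr)
print(f"Somer's D = {somers_d:.2f}, K-S = {ks:.2f}")
```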

Table 7. Score forecast accuracy based on the H-L test statistic for the scores built on
just booked (BK2003) and all through-the-door (TTD2003) accounts from the 2003 data,
tested in-sample and out-of-time. An illustrative sketch of the H-L computation follows
the table.

H-L statistic
segment   validation year   TTD2003_TTD   BK2003_TTD   TTD2003_BK   BK2003_BK
all       2003              23.3          816.3        193.5        17.8
all       2004              95.1          1799.1       178.1        23.6
all       2005              26.3          1020.8       208.3        48.7
all       2006              318.3         2319.9       111.8        278.6
all       2007              1712.3        5829.6       419.4        1411.5
clean     2003              22            149.1        55.3         11.6
clean     2004              32            241.2        25.2         16.5
clean     2005              24.4          162.2        19.3         42.3
clean     2006              197.6         643.4        28           138.3
clean     2007              1049.6        2903.8       295          1045.2
dirty     2003              30.3          569.5        119.7        26.4
dirty     2004              120.3         1393.2       137.8        9.2
dirty     2005              81.6          837.5        171.8        20.9
dirty     2006              214.3         1557.2       99.8         129.5
dirty     2007              765.1         2796.3       165          502.3
thin      2003              17.1          368          56           7.7
thin      2004              63.1          662.1        30.3         30.1
thin      2005              52.9          409.4        41.9         32.8
thin      2006              208.1         1321.7       46.5         142.3
thin      2007              81.2          1013.2       26.3         225.3

p-value, χ2 (10 d.f.)
segment   validation year   TTD2003_TTD   BK2003_TTD   TTD2003_BK   BK2003_BK
all       2003              0.010         0.000        0.000        0.058
all       2004              0.000         0.000        0.000        0.009
all       2005              0.003         0.000        0.000        0.000
all       2006              0.000         0.000        0.000        0.000
all       2007              0.000         0.000        0.000        0.000
clean     2003              0.015         0.000        0.000        0.313
clean     2004              0.000         0.000        0.005        0.086
clean     2005              0.007         0.000        0.037        0.000
clean     2006              0.000         0.000        0.002        0.000
clean     2007              0.000         0.000        0.000        0.000
dirty     2003              0.001         0.000        0.000        0.003
dirty     2004              0.000         0.000        0.000        0.513
dirty     2005              0.000         0.000        0.000        0.022
dirty     2006              0.000         0.000        0.000        0.000
dirty     2007              0.000         0.000        0.000        0.000
thin      2003              0.072         0.000        0.000        0.658
thin      2004              0.000         0.000        0.001        0.001
thin      2005              0.000         0.000        0.000        0.000
thin      2006              0.000         0.000        0.000        0.000
thin      2007              0.000         0.000        0.003        0.000

Table 8. Score forecast accuracy based on a normal test by deciles for the scores built
on just booked (BK2003) and all through-the-door (TTD2003) accounts from the 2003 data,
tested in-sample and out-of-time. Provided are results for the full sample, showing the
p-value and the expected less observed (exp-obs) number of bads. An illustrative sketch
of the decile test follows the table.

validation            TTD2003_TTD          BK2003_TTD           TTD2003_BK           BK2003_BK
year    decile        p-value   exp-obs    p-value   exp-obs    p-value   exp-obs    p-value   exp-obs
2003    1             0.46      0          0.30      1          0.22      1          0.04      3
2003    2             0.01      9          0.34      2          0.04      5          0.24      2
2003    3             0.17      -5         0.48      0          0.25      2          0.10      4
2003    4             0.04      11         0.40      -1         0.38      1          0.20      3
2003    5             0.01      18         0.01      -16        0.00      17         0.27      3
2003    6             0.42      -2         0.00      -39        0.01      17         0.32      3
2003    7             0.15      -15        0.00      -125       0.01      20         0.12      8
2003    8             0.11      26         0.00      -177       0.00      34         0.00      -25
2003    9             0.06      42         0.00      -340       0.00      119        0.11      15
2003    10            0.02      -84        0.00      -583       0.00      268        0.23      -16
2004    1             0.06      -4         0.12      -3         0.22      1          0.37      -1
2004    2             0.29      -2         0.00      -10        0.02      -6         0.39      -1
2004    3             0.31      3          0.24      -3         0.01      8          0.40      1
2004    4             0.45      1          0.02      -12        0.07      7          0.25      -3
2004    5             0.06      -13        0.00      -37        0.14      6          0.27      3
2004    6             0.03      -22        0.00      -89        0.25      5          0.05      -9
2004    7             0.04      -28        0.00      -166       0.09      12         0.13      -8
2004    8             0.00      -59        0.00      -294       0.00      45         0.00      -39
2004    9             0.00      -83        0.00      -493       0.00      111        0.23      -10
2004    10            0.00      -344       0.00      -921       0.00      294        0.10      -30
2005    1             0.15      -2         0.00      -10        0.06      -3         0.31      -1
2005    2             0.18      3          0.12      4          0.19      2          0.38      1
2005    3             0.03      9          0.39      -1         0.12      4          0.36      1
2005    4             0.28      -4         0.01      -13        0.26      3          0.35      -1
2005    5             0.01      -18        0.00      -26        0.25      4          0.03      -9
2005    6             0.04      -19        0.00      -100       0.43      -1         0.00      -18
2005    7             0.06      -24        0.00      -140       0.13      10         0.00      -30
2005    8             0.46      2          0.00      -267       0.00      37         0.00      -41
2005    9             0.09      -39        0.00      -398       0.00      128        0.12      -17
2005    10            0.00      123        0.00      -454       0.00      358        0.14      27
2006    1             0.00      -16        0.00      -23        0.00      -6         0.00      -6
2006    2             0.00      -15        0.00      -17        0.05      -4         0.01      -6
2006    3             0.01      -10        0.00      -18        0.28      -2         0.05      -5
2006    4             0.01      -13        0.00      -22        0.39      1          0.31      2
2006    5             0.00      -30        0.00      -52        0.09      7          0.00      -13
2006    6             0.00      -86        0.00      -129       0.01      -15        0.00      -35
2006    7             0.00      -118       0.00      -236       0.00      -34        0.00      -67
2006    8             0.00      -134       0.00      -323       0.29      -7         0.00      -83
2006    9             0.00      -172       0.00      -587       0.00      68         0.00      -114
2006    10            0.01      -103       0.00      -683       0.00      256        0.00      -100
2007    1             0.00      -37        0.00      -73        0.00      -16        0.00      -33
2007    2             0.00      -23        0.00      -33        0.00      -11        0.00      -15
2007    3             0.00      -28        0.00      -44        0.00      -15        0.00      -16
2007    4             0.00      -60        0.00      -65        0.00      -16        0.00      -24
2007    5             0.00      -97        0.00      -123       0.00      -30        0.00      -42
2007    6             0.00      -195       0.00      -216       0.00      -48        0.00      -66
2007    7             0.00      -269       0.00      -375       0.00      -108       0.00      -107
2007    8             0.00      -267       0.00      -439       0.00      -78        0.00      -170
2007    9             0.00      -315       0.00      -685       0.02      -41        0.00      -176
2007    10            0.00      -337       0.00      -885       0.00      127        0.00      -236
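
Table 8 decomposes calibration by decile: within each score decile the observed number of
bads is compared with the expected number implied by the score, using a normal
approximation to the sum of Bernoulli outcomes. The sketch below is one plausible version
of such a test; the one-sided tail convention and the simulated inputs are assumptions,
not the paper's exact procedure.

```python
# Hypothetical sketch of the decile-level normal test reported in Table 8.
import numpy as np
from scipy.stats import norm

def decile_normal_test(pd_hat, bad, n_bins=10):
    order = np.argsort(pd_hat)
    for d, idx in enumerate(np.array_split(order, n_bins), start=1):
        exp_bad = pd_hat[idx].sum()                 # expected bads in decile
        obs_bad = bad[idx].sum()                    # observed bads in decile
        # Std. dev. of the bad count under independent Bernoulli outcomes.
        sd = np.sqrt((pd_hat[idx] * (1 - pd_hat[idx])).sum())
        z = (exp_bad - obs_bad) / sd
        pval = norm.sf(abs(z))                      # one-sided tail, assumed
        print(f"decile {d}: p={pval:.2f}, exp-obs={exp_bad - obs_bad:+.0f}")

rng = np.random.default_rng(3)
pd_hat = rng.uniform(0.01, 0.2, 100_000)            # stand-in bad rates
bad = (rng.uniform(size=100_000) < pd_hat).astype(int)
decile_normal_test(pd_hat, bad)
```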

Table 9. The H-L test across all scores and validation years for the scores built on just
booked (BK) and all through-the-door (TTD) individuals, with values below the χ2 (10 d.f.)
critical value shaded. The number of observations is around 100K for all cohorts.

TTD_TTD
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006   2007
1999         54     .      .      .      .      .      .      .      .
2000         227    37     .      .      .      .      .      .      .
2001         139    40     27     .      .      .      .      .      .
2002         172    411    393    14     .      .      .      .      .
2003         218    438    414    32     23     .      .      .      .
2004         119    211    191    65     95     28     .      .      .
2005         325    561    547    38     26     103    19     .      .
2006         589    378    455    215    318    184    284    12     .
2007         2190   978    1248   1370   1712   1299   1469   296    31

BK_TTD
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006   2007
1999         388    .      .      .      .      .      .      .      .
2000         1487   972    .      .      .      .      .      .      .
2001         1329   772    1434   .      .      .      .      .      .
2002         275    100    211    333    .      .      .      .      .
2003         235    61     227    337    816    .      .      .      .
2004         389    144    438    773    1799   1362   .      .      .
2005         310    105    328    372    1021   760    756    .      .
2006         851    346    885    1188   2320   1891   1722   568    .
2007         2911   1304   2841   3176   5830   4721   4329   1740   501

TTD_BK
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006   2007
1999         143    .      .      .      .      .      .      .      .
2000         118    299    .      .      .      .      .      .      .
2001         141    331    312    .      .      .      .      .      .
2002         223    433    392    123    .      .      .      .      .
2003         369    587    558    242    194    .      .      .      .
2004         425    610    583    206    178    321    .      .      .
2005         488    724    695    242    208    365    201    .      .
2006         403    518    562    105    112    197    95     202    .
2007         806    423    601    315    419    393    306    57     183

BK_BK
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006   2007
1999         25     .      .      .      .      .      .      .      .
2000         32     16     .      .      .      .      .      .      .
2001         15     38     16     .      .      .      .      .      .
2002         51     88     38     7      .      .      .      .      .
2003         146    188    130    47     18     .      .      .      .
2004         157    179    117    58     24     7      .      .      .
2005         228    265    214    97     49     19     15     .      .
2006         225    147    196    182    279    172    109    13     .
2007         788    314    768    741    1412   1012   917    205    15

Table 10. The H-L test across booked (BK) and through-the-door (TTD) scores and
validation years, applied only to the individuals that have not been part of the
model-development sample. This validation sample averages fewer than 4000 observations.
Note also that this subset of the TTD population has a large share of unscoreable
individuals, which also affects the accuracy of the results.

TTD_TTD
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006
2000         59     .      .      .      .      .      .      .
2001         51     48     .      .      .      .      .      .
2002         75     63     85     .      .      .      .      .
2003         63     70     88     72     .      .      .      .
2004         79     88     96     80     42     .      .      .
2005         71     74     84     73     51     58     .      .
2006         78     76     83     61     55     69     35     .
2007         93     95     94     93     95     66     43     50

BK_TTD
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006
2000         138    .      .      .      .      .      .      .
2001         146    94     .      .      .      .      .      .
2002         94     53     69     .      .      .      .      .
2003         56     35     53     63     .      .      .      .
2004         99     73     83     108    76     .      .      .
2005         77     59     72     99     76     74     .      .
2006         102    86     111    134    90     59     48     .
2007         124    78     137    140    166    97     79     65

TTD_BK
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006
2000         55     .      .      .      .      .      .      .
2001         42     56     .      .      .      .      .      .
2002         39     41     48     .      .      .      .      .
2003         36     43     54     42     .      .      .      .
2004         68     77     79     65     43     .      .      .
2005         77     69     87     71     60     73     .      .
2006         83     65     65     55     42     52     40     .
2007         87     67     75     60     69     66     33     37

BK_BK
validation   score year
year         1999   2000   2001   2002   2003   2004   2005   2006
2000         36     .      .      .      .      .      .      .
2001         30     23     .      .      .      .      .      .
2002         36     20     45     .      .      .      .      .
2003         28     23     23     37     .      .      .      .
2004         62     79     44     57     29     .      .      .
2005         83     40     59     79     35     40     .      .
2006         70     49     64     53     38     30     16     .
2007         60     53     87     65     92     40     23     22

Figure 1. Illustration of the consumer credit database CCDB and the timing of the TTD
sample selection and performance horizon. The vertical bars represent the cross-sectional
bureau data in the CCDB at each June 30th snapshot, covering both scoreable and
unscoreable individuals. The difference in shading indicates the mixture of individuals
that makes up the unbalanced-panel form of the CCDB. Those that remain or become
scoreable are part of the sample in the following year, and in each year new unscoreable
and scoreable individuals are added. The TTD sample is taken from the full cross-sectional
snapshot and can include both scoreable and unscoreable individuals, as well as
individuals that were in the sample in the previous year or are new to the panel.

[Figure: June 2002, June 2003, and June 2004 snapshots with attributes, each split into
unscoreable and scoreable individuals; the 2002, 2003, and 2004 TTD proxy-account samples
are drawn from the respective snapshots, with the 2002 and 2003 TTD performance horizons
running over the subsequent quarters.]

Figure 2. Annual sample distribution by type of applicant

[Figure: percent of borrowers that are Booked (BK), Reject Inference (RI and RI*), and
Rejects-No Inference (RNI), together with the total through-the-door TTD count ('000),
by cohort year 1999-2007.]

Figure 3. Account performance by type of applicant

[Figure: bad rate, percent bad (90+ DPD), for the Booked (BK) and rejected (RI and RI*)
subsets by cohort year 1999-2007.]

Figure 4. Distribution of the credit bureau score for the booked and the three groups of
rejected individuals based on level of inference. Extreme invalid credit score values are
set to missing.

Figure 5. Distribution of booked and rejected applicants by segment

[Figure: three panels, Clean segment TTD, Dirty segment TTD, and Thin segment TTD, each
showing the percent breakdown into BK, RI, RI*, and RNI by cohort year 1999-2007.]

Figure 6. Segment distribution across the development and validation samples, 1999-2008.
The first panel shows the full sample in thousands and the second only the new
individuals used for out-of-sample, out-of-time validation.

[Figure: two panels. The first shows the count of individuals ('000) by segment (clean,
dirty, thin) in the TTD sample, 1999-2007; the second shows the count of new individuals
by segment in the TTD sample, 2000-2007.]
