
The development of credit scoring quality measures for consumer credit applications

Kevin J. Leonard
School of Business and Economics, Wilfrid Laurier University, Waterloo, Ontario, Canada
Received November 1993

Introduction
The process of modelling the variables important in the extension of credit is
referred to as “credit scoring”. This modelling process, which often uses
statistical methodology, has been carried out by banks and other financial
institutions. Based on statistical analysis of historical data (the good customers
and the bad), certain financial variables are determined to be important in the
evaluation process of a credit applicant’s financial stability and strength. This
analysis produces coefficients which are translated into “score weights”.
Subsequently, information on these important variables is obtained for new
bank customers. An overall score for these new applicants is produced by
adding the weighted scores which were generated from the responses to the
different variables. If this overall score is above a predetermined cut-off point,
the loan applicant receives a certain line of credit. If not, the applicant is denied
credit.
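The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration only: the variable names, the answer categories, the score weights and the cut-off value of 40 are all hypothetical and are not taken from any actual scorecard.

```python
# Hypothetical score weights: the variable names, categories and point
# values below are illustrative assumptions, not taken from the article.
SCORE_WEIGHTS = {
    "years_at_address": {"under_2": 5, "2_to_5": 12, "over_5": 20},
    "residential_status": {"rent": 8, "own": 18},
    "existing_bank_customer": {"no": 0, "yes": 15},
}

CUT_OFF = 40  # hypothetical predetermined cut-off point


def overall_score(responses):
    """Add the weighted scores generated from the applicant's responses."""
    return sum(SCORE_WEIGHTS[var][answer] for var, answer in responses.items())


def decide(responses, cut_off=CUT_OFF):
    """Grant credit when the overall score is above the cut-off; otherwise deny."""
    return "approve" if overall_score(responses) > cut_off else "deny"


applicant = {
    "years_at_address": "2_to_5",
    "residential_status": "own",
    "existing_bank_customer": "yes",
}
print(overall_score(applicant), decide(applicant))
```

Because the decision is a simple comparison against the cut-off, every applicant with the same responses receives the same outcome, which is the consistency the scorecard is meant to provide.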
The primary objective in credit scoring is to develop an effective scoring
model which contains only a small number of predictor variables. These scoring
algorithms or “scorecards” are then used to evaluate all credit applicants in the
future. This allows for consistency in credit evaluation and efficiency in
processing. When implemented effectively, the scorecard should be able to rank
order the entire population of applicants by risk.
Applying statistical methods to the credit decision is not new. In particular,
the area of credit scoring for consumer credit has developed into a multi-million
dollar industry[1]. Further, credit scoring for consumer lending is well
documented in the literature[2]. However, measuring the effectiveness and the
success of scoring initiatives has not had the same widespread appeal. In fact,
the emphasis is often on the efficiency of scoring algorithms (lowering loan
losses) rather than on the effectiveness derived from scoring models (lowering
application approval time or increasing customer dollars spent per account). As
a result of this void in the industry, it is the objective of this article to develop a
“scoring effectiveness” or “scoring impact” measurement tool. To ensure the
applicability of this measurement, the “scoring quality index” comprises
financial and credit industry standard portfolio statistics. These performance
measures are then applied to a real-life credit portfolio for illustration
purposes.

International Journal of Quality & Reliability Management, Vol. 12 No. 4, 1995, pp. 79-85, © MCB University Press, 0265-671X

Management information
Normally, in credit scoring, the following statistics are generated to evaluate
the performance of scoring models. They are created in an effort to measure
portfolio efficiency or output. They are:
● acceptance rate based on approvals of application volume;
● adherence to expected score distributions;
● frequency of reversing score decision (overrides);
● bad rate (number of accounts) as a percentage of total number of
accounts; and
● loan losses versus profitability.
While these statistics are valid measures, they describe how well the
scoring algorithms are working only within a very narrow focus. For instance,
acceptance levels or approval rates are a direct result of the cut-off score. To
control the approval rate, the cut-off score can be adjusted. Since the
scorecard rank-orders applicants by risk, the higher the cut-off score, the fewer
(and the higher in quality) the applications which will be accepted.
Measurement of approval rates is well established within the industry.
As a second example, bad rate can be adjusted directly as well (through
scoring) by reducing approval rate. Fewer accounts approved will result
eventually in lower write-off levels. Once again, this is a valid measure for
evaluating scorecard efficiency. If the scoring instrument is not working
properly, then these changes would not produce the expected results. Although
these statistics indicate the performance of scorecards in the short term, more
information is often required. In particular, they do not capture
the true impact on the portfolio brought about by the introduction of
scoring. The quality measures introduced in this article attempt to measure
the true effectiveness derived from scoring.
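To illustrate how the cut-off score drives both the acceptance rate and the bad rate, the following Python sketch applies three cut-offs to a small set of scored applicants. The applicant records and the cut-off values are invented for illustration, and the function names are assumptions rather than industry terminology.

```python
# Hypothetical scored applicants: (overall score, turned out bad?).
# These records are invented for illustration only.
APPLICANTS = [
    (25, True), (30, True), (35, False), (40, True), (45, False),
    (50, False), (55, True), (60, False), (65, False), (70, False),
]


def acceptance_rate(applicants, cut_off):
    """Share of applications whose score is above the cut-off."""
    accepted = [a for a in applicants if a[0] > cut_off]
    return len(accepted) / len(applicants)


def bad_rate(applicants, cut_off):
    """Bad accounts as a share of all accepted accounts."""
    accepted = [a for a in applicants if a[0] > cut_off]
    if not accepted:
        return 0.0
    return sum(1 for _, is_bad in accepted if is_bad) / len(accepted)


for cut_off in (20, 40, 60):
    print(cut_off, acceptance_rate(APPLICANTS, cut_off), bad_rate(APPLICANTS, cut_off))
```

In this invented sample, raising the cut-off lowers both rates together, which is the behaviour expected when the scorecard rank-orders risk correctly.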

Quality measures
As stated, the author wishes to develop a scoring quality index which measures
scoring impact on the rest of the organization. As an example, consider the
average time it takes to approve a credit application. If the introduction of
scoring initiatives at a credit operation has been effective, then approval time
should decrease substantially. More applications can be processed in
less time, giving customers faster access to their
credit. Further, accuracy levels should begin to increase due to the automated
nature of the adjudication process.
In all, the customer, as well as the employee, will, over time, begin to
experience a higher level of quality. These are factors which will not change
quickly but rather improve slowly yet steadily over time – truly reflective of the
impact of scoring.
The ten criteria to be evaluated are:
(1) approval time (each application for credit);
(2) approval accuracy;
(3) authorization time (each transaction);
(4) override level;
(5) average dollars spent per account;
(6) interest revenue per account;
(7) current balance as a percentage of outstanding dollars;
(8) dollars 30 days delinquent as a percentage of outstandings;
(9) write-off dollars per account;
(10) collection time per account.
These ten components will provide some insight into the magnitude of the
impact from scoring throughout the organization. Following the
recommendations of Noori and Gillen[3], these criteria are evaluated based on a
subjective scale peculiar to the particular organization (and its objectives) being
evaluated. The scale is produced by establishing three benchmarks.
The first benchmark represents the worst case scenario and receives zero
points. Second, the best case represents the full achievement of realistic goals
within the short term and this value receives a maximum of ten points. The
third benchmark establishes the organization at a particular point in time
(usually somewhere between the first two). This value receives three points.
The remaining points corresponding to achievement levels are established
using linear interpolation between the present case and best case. Then, at
some future point, these ten criteria are measured again. Points are awarded for
the measurement values across the ten factors and are totalled to provide an
overall scoring quality index.
As an illustration of this technique, these quality measures are applied to a
banking example. Although the bank requested that its identification remain
confidential, the data do represent actual goals and accomplishments which
have been achieved since the introduction of credit scoring (i.e. 18 months
previously). In addition, bank executives were interviewed to obtain feedback and
input into the scale of the values and scoring accomplishments which have been
witnessed elsewhere within the financial industry in Canada.
The ranges for the ten criteria are discussed below.

Approval time
The approval time is affected greatly by scoring. Scoring helps to streamline the
whole approval process and allows for rapid decisions when the answer is clear-
cut. When the applicant scores in the grey area around cut-off, then more time
can be allocated for review. As a whole, however, scoring will bring the average
approval time down. In this particular application, the approval time is
estimated to range from a worst case of ten days on average down to one day
(see Table I for full details). As an illustration of the calculations, the historical
or experienced (before the introduction of scoring) approval time was an
average of nine days and is awarded three points. The best case of a one-day
turn-around time is given ten points. As a result, each one-day decrease (from nine
to one) is awarded an increase of (10 – 3)/(9 – 1) = 0.875 points. After the
implementation and utilization of scoring for a period of approximately a
year-and-a-half, the time to approve has improved to three days on average. This
equates to an improvement of (9 – 3) × 0.875 = 5.25 points, which is then added
to the score of three awarded to the historical level of nine days. The quality
score for this criterion is therefore 8.25.
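The arithmetic above can be expressed as a small Python function. This is a sketch under the assumption that points are read off a straight line through the present case (three points) and the best case (ten points); the function name `quality_points` and its parameter names are hypothetical.

```python
def quality_points(present, best, new, present_points=3.0, best_points=10.0):
    """Interpolate linearly between the present case (3 points) and the
    best case (10 points), then read off the points for the new value."""
    points_per_unit = (best_points - present_points) / (present - best)
    return present_points + (present - new) * points_per_unit


# Approval time: present case nine days, best case one day, new value three days.
print(quality_points(present=9, best=1, new=3))  # 3 + 6 * 0.875 = 8.25
```

The same function serves criteria where a higher value is better, such as approval accuracy, because the sign of `present - best` reverses along with the sign of `present - new`.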

Approval accuracy
The accuracy levels corresponding to the application evaluation and
subsequent approvals will be affected. Due to the automation of much of the
process, routine applications will have very few errors. The worst and best case
conditions here are estimated to be 80 and 95 per cent accurate, respectively. In
this banking application, scoring accuracy in the one-and-a-half years since the
inception of scoring has increased from 85 to 92 per cent for an effectiveness
rating or quality score of 7.90.

Authorization time
Authorization time is another area which can be enhanced greatly by scoring.
Whether accounts are scored based on performance information (on an ongoing
basis) or not, proper scoring can influence which characteristics are
investigated at the time of any particular transaction. The more streamlined the
decision process, the quicker the authorization is approved. Quality will be
perceived by the customer when the bank responds to his/her purchase request
in a shorter interval of time. Worst and best case conditions here are 50 and five
seconds respectively. The quality score measure in this case equates to 9.65.

Table I. Evaluation of criteria

                                               Historical          Present day
Criteria                                  Worst  Current   Best    New  Quality score
(1) Approval time (days)                     10        9      1      3           8.25
(2) Approval accuracy (per cent)             80       85     95     92           7.90
(3) Authorization time (seconds)             50       40      5      7           9.65
(4) Override level (per cent)                20       15      5     13           4.40
(5) $ spent/account                          50      100    200    160           7.20
(6) Interest/account                          0     0.20   1.00   0.80           8.25
(7) Current $/account (per cent)           0.50     0.60   0.85   0.80           8.60
(8) Two plus cycle $/account (per cent)    0.10     0.06   0.02   0.03           8.25
(9) Write-off $/account (per cent)         0.02    0.016   0.01  0.013           6.50
(10) Collection time                         20       18     10     12           8.25
Scoring quality index                                                           77.25

Override level
Overrides are a particularly interesting component of scoring. Even when the
scorecards are working at peak efficiency, a significant number of
overrides still occurs. This is because no scorecard can incorporate
every possible factor which will influence payment behaviour. There will
always be times when factors have to be considered in addition
to the scorecard characteristics. Ideally, with an effective scoring algorithm,
these exceptions are limited to about 5 per cent. Here the rating is only 4.40 as
the override rate is still high at about 13 per cent of the applications.

Average dollars spent per account
Improving the quality of the customer base will eventually result in more
outstanding balances being current and fewer dollars outstanding being
delinquent or forced to be written off. This and the other “percentage of
outstanding dollars” measures that follow are critical individual measures of
effectiveness. Here the quality measure is 7.20.

Interest revenue per account
The greater the spending and the current balances, the more interest revenue
will accrue. The improvement in this measure is reflected by a quality score of 8.25.

Current balance as a percentage of dollars outstanding
This measure matters because the objective is to match each customer
with the appropriate credit limit. Further, if collection initiatives are targeted
effectively, the incidence of delinquent accounts should decrease, resulting in more
current balances. Improvement is shown here with a score of 8.60.

Dollars two or more cycles delinquent as a percentage of outstandings
Not all accounts can be monitored in the collection department. Correct
management of a collection department through scoring can reduce workloads
and reduce delinquency. The resulting score of 8.25 for this criterion reflects a
large reduction in the number of collection cases.

Dollars written off as a percentage of outstandings
If the criteria discussed above are all managed effectively, then this factor
should also show improvement. However, because write-off values
often reflect accounts which were booked some two years previously, this is
usually the last quality measure to be affected by a scoring initiative. The
improvement here has been modest, with a score of 6.50.

Collection time spent per account
Once again, this can be a direct result of the portfolio improvement realized
from scoring. Fewer accounts are in collection and, as a result, less time is spent
on the phone chasing customers to honour their payment agreements. The
quality measure in this case is 8.25.

Total scoring quality index
This measure reflects the overall or combined effectiveness of the scoring
programme and provides a direct assessment of performance. The score of
77.25 represents the overall effectiveness or achievement of scoring in this bank.
To gain an insight into the reasons for deviation from objectives, the individual
scores for the ten criteria need to be examined. At this bank, the two major
deviations from objectives related to the override level and write-off dollars per
account. As a result, current policies, procedures and objectives surrounding
overrides and write-offs must be addressed.
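As a check on the index, the individual scores can be recomputed from the present-case, best-case and new values reported in Table I. The Python sketch below assumes the rule of three points at the present case and ten at the best case with a straight line in between; the names `TABLE_I` and `quality_points` are hypothetical. Under that assumption nine of the ten scores agree with the table, while authorization time comes out at 9.60 rather than the tabulated 9.65, so the recomputed total is 77.20 against the reported 77.25. Sorting the scores shows the two lowest to be the override level and write-off dollars per account.

```python
# (criterion, present case, best case, new value, score reported in Table I)
TABLE_I = [
    ("Approval time (days)", 9, 1, 3, 8.25),
    ("Approval accuracy (per cent)", 85, 95, 92, 7.90),
    ("Authorization time (seconds)", 40, 5, 7, 9.65),
    ("Override level (per cent)", 15, 5, 13, 4.40),
    ("$ spent/account", 100, 200, 160, 7.20),
    ("Interest/account", 0.20, 1.00, 0.80, 8.25),
    ("Current $/account", 0.60, 0.85, 0.80, 8.60),
    ("Two plus cycle $/account", 0.06, 0.02, 0.03, 8.25),
    ("Write-off $/account", 0.016, 0.01, 0.013, 6.50),
    ("Collection time", 18, 10, 12, 8.25),
]


def quality_points(present, best, new):
    """Three points at the present case, ten at the best case, linear in between."""
    return 3.0 + (present - new) * 7.0 / (present - best)


recomputed = {name: quality_points(p, b, n) for name, p, b, n, _ in TABLE_I}
reported_total = sum(score for *_, score in TABLE_I)
recomputed_total = sum(recomputed.values())

# Rank criteria from lowest to highest score to surface the largest deviations.
for name, points in sorted(recomputed.items(), key=lambda kv: kv[1]):
    print(f"{name:32s} {points:5.2f}")
print(f"reported total {reported_total:.2f}, recomputed total {recomputed_total:.2f}")
```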

Conclusions
One caution is in order. There has been much debate over whether
individual statistics can or should be combined to produce one overall quality
measure. This measure is a static snapshot, and the
author hesitates to compare future growth with this number. Some individual
measurement values could increase and some decrease and, as a result, the
overall number could increase while overall performance measures such as
profitability go down. Therefore, in order to be truly effective, this scoring
quality index should be tied to whatever criteria senior management ultimately
uses to evaluate the entire portfolio. For example, if profitability is
the key success driver, then each factor should be weighted by its effect on
profitability and then combined. Otherwise, a single quality number of this type
will confuse more than enlighten the scoring participants.
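One way to read this suggestion is as a weighted average of the individual quality scores. The Python sketch below takes three scores from Table I and combines them using weights that stand for each factor's effect on profitability. The weights are purely hypothetical, since the article supplies no such figures, and the function name `weighted_index` is an assumption.

```python
# Quality scores from Table I for three of the criteria.
SCORES = {
    "override_level": 4.40,
    "interest_per_account": 8.25,
    "write_off_per_account": 6.50,
}

# Hypothetical weights expressing each factor's assumed effect on
# profitability; the article gives no such figures.
PROFIT_WEIGHTS = {
    "override_level": 0.2,
    "interest_per_account": 0.5,
    "write_off_per_account": 0.3,
}


def weighted_index(scores, weights):
    """Weighted average of quality scores, on the same 0-10 scale as each score."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


print(round(weighted_index(SCORES, PROFIT_WEIGHTS), 3))
```

Dividing by the total weight keeps the result on the zero-to-ten scale, so a weighted index can be compared with the individual scores it summarizes.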

References
1. Fair, Isaac and Company Incorporated Annual Report, San Rafael, CA, 1990.
2. Reichert, A.K., Cho, C.C. and Wagner, G.M., “An examination of the conceptual issues
involved in developing credit-scoring models”, Journal of Business and Economic
Statistics, Vol. 1 No. 2, 1983, pp. 101-14.
3. Noori, H. and Gillen, D., “A performance measuring matrix for capturing the impact of
advanced manufacturing technology”, Working paper, REMAT, Wilfrid Laurier
University, Waterloo, Ontario, 1992.

