## How to Evaluate Credit Scorecards, and Why Using the Gini Coefficient Has Cost You Money

David J. Hand

Imperial College London Quantitative Financial Risk Management Centre

August 2009

A scorecard produces a measurement of a property. Scorecards have many uses:

- compare the score with a threshold, to make a classification (e.g. predict default / not default)
- predict what will happen; decide what action to take
- monitor changing values for management (e.g. increase or decrease a credit limit if the score drops too much)
- monitor one's own credit score
- decide which bank to go to for a loan


**How to evaluate a scorecard depends on what it is being used for**

This talk is about using scorecards for predictive classification, e.g. using a scorecard to predict the likely future class of a customer (default / not default).

If you are using the scorecard for some other purpose, the arguments below may not apply.


Scorecards for classification: problem structure

[Diagram: a design sample from the past, with measured characteristics and known outcomes, is used to build the scorecard; the scorecard is then applied to the measured characteristics of present cases to predict their future outcomes.]


**Scorecards for classification**

Notation: measured characteristics x, outcome c (here assume two classes, c = 0, 1)

Aim: use the design sample (x_i, c_i), i = 1, ..., n to
- construct a score s = f(x)
- compare the score with a threshold t: assign to class 1 if s > t, and to class 0 if s ≤ t

⇒ two issues:
- how to construct the score
- how to choose the threshold


The distribution of class 1 scores is f1(s), with cdf F1(s).
The distribution of class 0 scores is f0(s), with cdf F0(s).


**Methods for deriving the score function**

- logistic regression
- linear discriminant analysis
- naive Bayes
- segmented scorecards, e.g. logistic regression trees

Other methods: quadratic discriminant analysis, regularised discriminant analysis, perceptrons, neural networks, radial basis function methods, vector quantization methods, nearest-neighbour and kernel nonparametric methods, tree classifiers such as CART and C4.5, support vector machines, rule-based methods, random forests, and many more.


**How to choose between them?**

- compare performance: we need a performance criterion
- constructing rules means estimating parameters: do this by optimising a performance criterion

So: how should we measure performance?


Basic issues

- Business vs statistical criteria
- Design (training) set and test set: apparent performance
- Symmetric and asymmetric problems: in business most are asymmetric, e.g. good/bad risk, profitable/not, fraud/legitimate

Here let class 1 = 'good'.


**Basic misclassification table**

|                   | True class 0 | True class 1 |
|-------------------|--------------|--------------|
| Predicted class 0 | a            | b            |
| Predicted class 1 | c            | d            |

with a + b + c + d = n, a + c = n0, b + d = n1.

Class priors: proportion in class 0 = π0 = n0/n; proportion in class 1 = π1 = n1/n.


From the table, at threshold t:

- C = proportion correctly classified = (a + d)/n
- E = proportion misclassified = (b + c)/n
- 1 − F1(t) = proportion of class 1 correct = d/(b + d)
- F0(t) = proportion of class 0 correct = a/(a + c)
- Br = bad rate amongst 'accepts' = c/(c + d)
- Gr = good rate amongst 'rejects' = b/(a + b)
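The quantities above can be sketched numerically. A minimal Python example, using small illustrative score and class vectors (all data hypothetical):

```python
# Hypothetical scores and true classes (class 1 = 'good'); threshold t.
scores  = [0.2, 0.7, 0.6, 0.9, 0.1, 0.3, 0.8, 0.4]
classes = [0,   0,   1,   1,   0,   1,   1,   0]
t = 0.5  # predict class 1 ('accept') if score > t

a = sum(1 for s, k in zip(scores, classes) if s <= t and k == 0)  # true 0, predicted 0
b = sum(1 for s, k in zip(scores, classes) if s <= t and k == 1)  # true 1, predicted 0
c = sum(1 for s, k in zip(scores, classes) if s > t and k == 0)   # true 0, predicted 1
d = sum(1 for s, k in zip(scores, classes) if s > t and k == 1)   # true 1, predicted 1

n = a + b + c + d
C = (a + d) / n      # proportion correctly classified
E = (b + c) / n      # proportion misclassified
sens = d / (b + d)   # 1 - F1(t): proportion of class 1 correct
spec = a / (a + c)   # F0(t): proportion of class 0 correct
Br = c / (c + d)     # bad rate amongst 'accepts'
Gr = b / (a + b)     # good rate amongst 'rejects'
```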


We can represent performance for all choices of t simultaneously with an ROC curve.

The AUC is the area under the curve. The Gini coefficient is twice the area between the curve and the diagonal: G = 2 AUC − 1.
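As a sketch (not part of the talk), the empirical AUC can be computed directly from the two score samples as the probability that a randomly chosen class-1 score exceeds a randomly chosen class-0 score; the scores below are illustrative:

```python
def auc(scores0, scores1):
    # Empirical AUC = P(class-1 score > class-0 score), ties counted 1/2.
    total = 0.0
    for s1 in scores1:
        for s0 in scores0:
            total += 1.0 if s1 > s0 else 0.5 if s1 == s0 else 0.0
    return total / (len(scores0) * len(scores1))

scores0 = [0.2, 0.7, 0.1, 0.4]  # class 0 ('bad') scores
scores1 = [0.6, 0.9, 0.3, 0.8]  # class 1 ('good') scores
A = auc(scores0, scores1)
G = 2 * A - 1                   # Gini: twice the area between ROC curve and diagonal
```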


**How to choose the threshold t? Another way of looking at things**

Let the cost of misclassifying a class i case be c_i. Then the overall loss due to misclassifications is

L(t) = c0 π0 (1 − F0(t)) + c1 π1 F1(t)


**The threshold T which minimises the loss is given by**

T(c0, c1) = arg min over t of { c0 π0 (1 − F0(t)) + c1 π1 F1(t) }

Setting the derivative with respect to t to zero leads to a mapping between c1/c0 and T:

c1/c0 = π0 f0(T) / (π1 f1(T))

**⇒ choosing the threshold ≡ choosing the cost ratio c1/c0**

But how can we choose the costs?
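A minimal sketch of this threshold choice: given assumed costs, minimise the empirical loss L(t) over candidate thresholds. The scores and costs below are purely illustrative:

```python
scores0 = [0.2, 0.7, 0.1, 0.4]   # class 0 scores
scores1 = [0.6, 0.9, 0.3, 0.8]   # class 1 scores
c0, c1 = 5.0, 1.0                # assumed: misclassifying a class-0 case is 5x worse
n0, n1 = len(scores0), len(scores1)
pi0, pi1 = n0 / (n0 + n1), n1 / (n0 + n1)

def F0(t):  # empirical cdf of class 0 scores
    return sum(s <= t for s in scores0) / n0

def F1(t):  # empirical cdf of class 1 scores
    return sum(s <= t for s in scores1) / n1

def loss(t):
    return c0 * pi0 * (1 - F0(t)) + c1 * pi1 * F1(t)

T = min(sorted(scores0 + scores1), key=loss)  # loss-minimising threshold
```

With class-0 misclassification made five times as costly, the optimal threshold moves up, so fewer applicants are accepted.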


Strategy 1: make default assumptions

- Example 1: c0 = c1 = 1, leading to the error rate
- Example 2: c0 = π1, c1 = π0, leading to the Kolmogorov-Smirnov statistic
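A sketch of the two defaults on illustrative data: with c0 = c1 = 1 the minimised loss is the minimum error rate, and with c0 = π1, c1 = π0 the loss becomes π0 π1 (1 − (F0(t) − F1(t))), minimised where F0 − F1 is largest, i.e. at the Kolmogorov-Smirnov statistic:

```python
scores0 = [0.2, 0.7, 0.1, 0.4]  # class 0 scores (illustrative)
scores1 = [0.6, 0.9, 0.3, 0.8]  # class 1 scores (illustrative)
n0, n1 = len(scores0), len(scores1)
pi0, pi1 = n0 / (n0 + n1), n1 / (n0 + n1)

def F0(t): return sum(s <= t for s in scores0) / n0
def F1(t): return sum(s <= t for s in scores1) / n1

ts = sorted(scores0 + scores1)

# Example 1: unit costs -> minimised loss is the minimum error rate
err = min(pi0 * (1 - F0(t)) + pi1 * F1(t) for t in ts)

# Example 2: c0 = pi1, c1 = pi0 -> loss = pi0*pi1*(1 - (F0(t) - F1(t))),
# minimised where F0 - F1 is largest (the Kolmogorov-Smirnov statistic)
ks = max(F0(t) - F1(t) for t in ts)
loss2 = min(pi1 * pi0 * (1 - F0(t)) + pi0 * pi1 * F1(t) for t in ts)
```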


BUT costs should be chosen on the basis of the problem, on business grounds, not picked for mathematical convenience! Should you be choosing costs just to make the maths easy?


Strategy 2: average over all possible costs

L = ∫₀^∞ ∫₀^∞ { c0 π0 (1 − F0(T(c0, c1))) + c1 π1 F1(T(c0, c1)) } u(c0, c1) dc0 dc1

Writing c = (1 + c1/c0)^(−1), this becomes a single integral

L = ∫₀¹ { c π0 (1 − F0(T(c))) + (1 − c) π1 F1(T(c)) } w(c) dc

with c = (1 + c1/c0)^(−1) and, at the loss-minimising threshold, c = π1 f1(T) / (π0 f0(T) + π1 f1(T)).

(

)


A particular choice of w(c) in

L = ∫₀¹ { c π0 (1 − F0(T(c))) + (1 − c) π1 F1(T(c)) } w(c) dc

leads to

L = 1 − 2 π0 π1 ∫₋∞^∞ F0(s) f1(s) ds = 1 − 2 π0 π1 AUC = (1 − π0 π1) − π0 π1 G

This choice is

w*(c) = { π0 f0(T(c)) + π1 f1(T(c)) } · dT(c)/dc


That is, using the AUC or Gini is equivalent to averaging the loss over a weight function w(c) which depends on the observed score distributions.

BUT knowledge of c (or, equivalently, of T or of the costs c0 and c1) must come from information in the problem other than the score distributions!


Using the AUC or Gini is equivalent to saying that your belief about the relative severity of the different types of misclassification depends on the choice of scorecard.

This is absurd

Like measuring my height in millimetres, and yours in feet, and saying I am taller because my number is larger than yours


**But (you say) the AUC and Gini have several nice interpretations**

e.g. the AUC is the average value of F0(s) when v = F1(s) is drawn from a uniform distribution on [0, 1]:

AUC = ∫₀¹ F0( F1^(−1)(v) ) dv


F1 and F0 are operating characteristics of the scorecard. I can choose F1 (or F0) according to whim. In particular, I might choose a different 1 − F1 for different rules:

|            | Rule 1 | Rule 2 |
|------------|--------|--------|
| 1 − F1(t)  | 95     | 95     |
| F0(t)      | 89     | 10     |
| 1 − F1(t′) | 80     | 80     |
| F0(t′)     | 90     | 90     |


● In contrast, the distribution w(c) is a matter of belief, not choice

● It measures beliefs about the relative severity of the different kinds of misclassification

● w(c) cannot vary from scorecard to scorecard

● I cannot say: "if I use logistic regression then misclassifying a good as a bad is ten times as serious as the reverse, but if I use linear discriminant analysis then it is a hundred times as serious"

● c is a property of the problem, not the scorecard


The relationship between the costs and the sensitivity 1 − F1 depends on the empirical score distributions f0 and f1 ⇒ a fundamental complementarity:

- ϕ(v): choice of the v distribution for averaging 1 − F1
- w(c): choice of the c distribution for averaging costs

One cannot simultaneously choose ϕ and w independently of the empirical distributions: if one distribution is independent of the empirical distributions, the other necessarily depends on the classification rule used.


Which should we use? v is a matter of choice; c is a property of the problem. AUC and Gini use ϕ(v) ~ U[0, 1] ⇒ AUC and Gini average over cost distributions which vary from scorecard to scorecard. [The same applies to the partial AUC.]


**It is OK for different people to choose different w**

- because they are interested in different aspects of performance

**It is wrong for a single person to choose different w for different scorecards**

- doing so implies using different measures for different scorecards


Conclusion

• The Gini coefficient does not compare like with like • The Gini coefficient can lead to mistaken performance comparisons • The Gini coefficient can result in incorrect decisions


What to do instead?

We need a weight function which is the same for all scorecards.

- Choice (1): choose w to reflect your personal beliefs about the likely values of c (for each problem)
- Choice (2): choose a universal standard w; I suggest beta(2, 2)

Recommendation: use both.


Alternative measure

Choose a weight function invariant to the choice of scorecard:

w(c) = w_{α,β}(c) = c^(α−1) (1 − c)^(β−1) / B(1; α, β)

This leads to the H measure. Standardise it to lie between 0 and 1, with a large value being good.


Standard H: with (α , β ) = ( 2, 2 )
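A numerical sketch of the H measure (not the author's R implementation): average the minimum achievable loss over c with a beta(2, 2) weight, and standardise against the loss of the best trivial classifier that assigns every case to a single class. The function name and the simple grid integration are my own choices:

```python
def h_measure(scores0, scores1, grid=2001):
    # H = 1 - L / L_max, with L the beta(2,2)-weighted minimum loss and
    # L_max the corresponding weighted loss of the best trivial classifier.
    n0, n1 = len(scores0), len(scores1)
    pi0, pi1 = n0 / (n0 + n1), n1 / (n0 + n1)
    ts = sorted(set(scores0 + scores1))

    def F0(t): return sum(s <= t for s in scores0) / n0
    def F1(t): return sum(s <= t for s in scores1) / n1

    def min_loss(c):
        # minimum over thresholds of c*pi0*(1-F0(t)) + (1-c)*pi1*F1(t),
        # including the trivial 'accept all' / 'reject all' rules
        best = min(c * pi0, (1 - c) * pi1)
        return min(best, min(c * pi0 * (1 - F0(t)) + (1 - c) * pi1 * F1(t)
                             for t in ts))

    w = lambda c: 6 * c * (1 - c)        # beta(2, 2) density
    dc = 1 / (grid - 1)
    cs = [i * dc for i in range(grid)]
    L = sum(min_loss(c) * w(c) for c in cs) * dc
    L_max = sum(min(c * pi0, (1 - c) * pi1) * w(c) for c in cs) * dc
    return 1 - L / L_max
```

A perfectly separating scorecard gives H = 1; a scorecard no better than the trivial rules gives H = 0.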


**Example (but not needed!)**

Whether or not someone will settle outstanding debts immediately: 9 predictors, 6378 cases.

Classifier (i): linear discriminant analysis. Classifier (ii): logistic regression.

|      | Classifier (i) | Classifier (ii) |
|------|----------------|-----------------|
| Gini | 0.580          | 0.500           |
| H    | 0.016          | 0.027           |

The two measures rank the classifiers in opposite orders.

The Gini coefficient uses different weight distributions for the two classifiers. [Figure: the implied weight functions w*(c) for linear discriminant analysis and for logistic regression.]

Some references:

- Hand D.J. (2009). Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine Learning (to appear). R code for the H measure is available at http://stats.ma.ic.ac.uk/d/djhand/public_html/
- Krzanowski W.J. and Hand D.J. (2009). ROC Curves for Continuous Data. CRC Press.

