
Common Problems, Practical Solutions

Part 2: Modeling Strategies

by Jeffrey S. Morrison

Continuing the series dealing with requirements for Basel II and building PD and LGD models, this article offers some tips in model-building strategy so the analyst can get the most out of the data available. The focus is on selecting the right predictors, dummy variables, transformations, and interactions; evaluating influential outliers; and developing segmentation schemes.

Volumes of literature have been written on probability of default, loss given default, and the statistical techniques used to estimate them. There's a deficit of literature, however, on practical advice so the analyst can produce better models more quickly. For each topic there are a variety of approaches, many of which can work equally well. Regardless of whether the analyst is building a PD or LGD model, some practical strategies can provide a good jump-start in aligning your modeling process with the New Basel Capital Accord. In this article, we will refer to a hypothetical dataset to model the probability of default using such predictive attributes as LTV, payment history, bureau scores, and income. And although these techniques can be implemented in just about any statistical language, examples will be given using the SAS software language as well as S-PLUS.

Selecting the Right Predictors

Although good modeling data may be initially hard to find, once your bank has put together the informational infrastructure for the New Capital Accord, you might have more data than you know what to do with. In building a PD model using a regression technique, typically only 10 to 15 predictor variables are finally chosen. However, you may have 200 or more potentially predictive attributes coming from your loan accounting systems, not to mention outside vendors such as economic service providers. So how do you narrow down all that information? As indicated in the May 2003 issue of The RMA Journal, sound judgment, combined with knowledge of simple correlations in your data, is a good starting point. Luckily, there are some additional tools to make the model-building process a little less stressful.
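One simple way to use those correlations is to screen predictor pairs that move together too closely, keeping whichever member of each pair is more correlated with the dependent variable. A minimal sketch in Python rather than SAS (the variable names and toy data here are invented for illustration; the .75 cutoff follows the advice given later in this article):

```python
import statistics

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def screen_predictors(candidates, target, max_abs_r=0.75):
    """Drop one of each too-correlated pair of predictors, keeping
    whichever is more correlated with the dependent variable."""
    keep = dict(candidates)
    names = list(keep)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in keep and b in keep and abs(pearson_r(keep[a], keep[b])) > max_abs_r:
                # Keep the member of the pair closer to the target.
                drop = a if abs(pearson_r(keep[a], target)) < abs(pearson_r(keep[b], target)) else b
                del keep[drop]
    return sorted(keep)
```

With toy data where a hypothetical employment series tracks income almost exactly, the screen drops employment and keeps income and LTV, mirroring the multicollinearity advice discussed below.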

©2004 by RMA. Jeff Morrison is vice president, Credit Metrics-PRISM Team, at SunTrust Banks, Inc., Atlanta, Georgia.

98 The RMA Journal May 2004

Preparing for Basel II: Part 2

The first is the automatic variable selection routines found in most statistical packages. Generally, they are called forward stepwise, backward stepwise, and best-fit procedures. Forward selection builds the model from the ground up, entering a new variable and eliminating any previous variables not found statistically significant according to a specified criteria rule. The backward stepwise procedure works just the opposite. It starts with all available predictors and eliminates them one by one based upon their level of significance. Best-fit procedures may take much more computer time because they may try a variety of different variables to maximize some measure of fit rather than the significance level of each variable. What's the advice? Experiment with them all, but don't be surprised if the backward stepwise procedure works a little better in the long run. However, a word of caution: only include variables in your model that make business sense, regardless of what comes out of any automatic procedure.

One major drawback of these procedures is that they can lead to variables in your final model that are too correlated with one another. For example, if local area employment and income are highly correlated with one another, the stepwise procedures may not be able to correctly isolate their individual influence. As a result, they could both end up in your model. This is a condition called multicollinearity and may cause problems in the model-building process. One way of avoiding this problem is to perform a correlation analysis before you run any regression procedure. The advice: only include predictor variables in your regression that have a correlation with each other less than .75 in absolute value. If you do run into variables that are too highly correlated with one another, choose the one that is the most highly correlated with your dependent variable and drop the other.

A second tool is called variable clustering, a set of procedures that can help the analyst jump some of the hurdles associated with using an automatic selection routine. This procedure seeks to find groups or clusters of variables that look alike. The optimal predictive attributes would be those that are highly correlated with variables within the cluster, but correlated very little with variables in other clusters. SAS, for example, has an easy-to-use procedure called PROC VARCLUS that produces a table such as the one shown in Figure 1.

Figure 1: Variable Clustering

  Variable (A)   R-Squared        R-Squared         1-R-Squared
                 Own Cluster (B)  Next Closest (C)  Ratio (D)

  Cluster 1
  VAR1           .334             .544              1.460
  VAR2           .544             .622              1.206
  VAR7           .322             .242              0.894

  Cluster 2
  VAR5           .988             .433              0.021
  VAR8           .544             .231              0.592
  VAR3           .764             .434              0.416
  VAR4           .322             .511              1.386

Clustering routines do not offer any insight into the relationship of the individual variables and our dependent variable (PD or LGD). What they do offer, however, is a much easier way of looking at your predictive attributes. It may be that Cluster 1 represents delinquent payment behavior data. Cluster 2 could reflect geographic or economic data. Therefore, in a regression procedure you should be sure to test variables from each cluster. So how do you know which to pick? One way is to choose those variables with the lowest ratio values as indicated in Column D, a measure easily computed by most popular statistical software packages such as SAS. In our example, we might select VAR7 from Cluster 1 and VAR5 from Cluster 2 to try in a stepwise regression.

Dummy Variables

Much of the information you might have on an account is in the form of a categorical variable. These are variables such as product type, collateral code, or the state in which the customer or collateral resides. For example, let's say you have a product type code stored as an integer ranging from 1 to 4. If the variable was entered into a regression model without any changes, then its coefficient's meaning might not make much sense, let alone be predictive. However, if we recoded it as follows, then the regression could pick up the differences each product category makes in the PD or LGD, all other things remaining equal.

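That recoding is mechanical in any language. Here is a hypothetical Python equivalent of the four product-code dummies before turning to the article's SAS version:

```python
def product_dummies(product_code, levels=(1, 2, 3, 4)):
    """One 0/1 dummy per product type. The regression itself should
    include only three of the four, dropping one as the reference."""
    return {f"DUMMY_{lvl}": int(product_code == lvl) for lvl in levels}
```

For instance, `product_dummies(2)` turns on only DUMMY_2, so exactly one dummy is set for any valid code.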

In SAS, the recoding would look something like this (the original printing reads "ELSE PRODUCT CODE=0;", which would overwrite the product code itself; the intended statements set each dummy to zero):

IF PRODUCT_CODE=1 THEN DUMMY_1=1; ELSE DUMMY_1=0;
IF PRODUCT_CODE=2 THEN DUMMY_2=1; ELSE DUMMY_2=0;
IF PRODUCT_CODE=3 THEN DUMMY_3=1; ELSE DUMMY_3=0;
IF PRODUCT_CODE=4 THEN DUMMY_4=1; ELSE DUMMY_4=0;

What you now have are four product code dummy variables that will allow the model to estimate the default impact among different product types. By convention, one of the variables has to be left out in the regression model or an error will result. Therefore, you would include dummy_1, dummy_2, and dummy_3 in your regression model rather than a single variable representing all values of the product code.

Variable Binning

Building on the dummy variable concept, a procedure called binning sometimes helps the analyst increase the model's predictive power even further. Although there are a variety of different binning approaches available, all leverage off the idea that breaking down a variable into intervals can provide additional information. When dealing with a variable that has continuous values, such as a bureau score, a two-step process can be implemented. First we do something called discretizing, where the original value is changed into ranges in order to smooth the data, creating a small number of discrete values. An example of this process is shown in Figure 2, where the bureau score was recoded into having only five values. Now that you have regrouped the data into five range values, simply create dummy variables from them as shown in Figure 3. Note that the last grouping variable was not done because we would automatically leave this out of the model by convention. The reason this procedure sometimes works to increase the accuracy of the model is that it allows the model to be more flexible in estimating the relationship between the bureau score, for example, and the default when the relationship may not be strictly linear. This is a great technique to try on other variables such as LTV, income, or months on books.

Figure 2: Step 1, Discretizing Your Data

  Range     SAS Code
  455-569   IF B_SCORE >455 AND B_SCORE<=569 THEN B_SCORE=569;
  570-651   IF B_SCORE >569 AND B_SCORE<=651.5 THEN B_SCORE=651;
  652-695   IF B_SCORE >651 AND B_SCORE<=695 THEN B_SCORE=695;
  696-764   IF B_SCORE >695 AND B_SCORE<=764 THEN B_SCORE=764;
  765-850   IF B_SCORE >764 AND B_SCORE<=850 THEN B_SCORE=849;

Figure 3: Step 2, Creating Dummy Variables

  B_SCORE_1=0; IF B_SCORE=569 THEN B_SCORE_1=1;
  B_SCORE_2=0; IF B_SCORE=651 THEN B_SCORE_2=1;
  B_SCORE_3=0; IF B_SCORE=695 THEN B_SCORE_3=1;

Transformations

Many times, simple transformations of the data end up making the model more predictive. If you are working with a PD model, one thing you could do as part of the preliminary analysis is to run a logistic regression using only a single predictor at a time.

Figure 4: Transformation Analysis (Logistic)

  Obs  _Status_     _Lnlike_   Name     T_Type   Recomm
  1    0 Converged  -423.744   INCOME   XSQ      <-------*
  2    0 Converged  -423.744   INCOME   QUAD
  3    0 Converged  -434.979   INCOME   NONE
  4    0 Converged  -443.669   INCOME   SQRT
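Going back to Figures 2 and 3, the two-step recoding translates readily into other languages. A hypothetical Python sketch (the cut points come from Figure 2; for simplicity each score is collapsed to the top of its range, and the last group is the omitted reference category):

```python
# Cut points from Figure 2: (lower bound exclusive, upper bound inclusive).
BINS = [(455, 569), (569, 651), (651, 695), (695, 764), (764, 850)]

def discretize(score):
    """Step 1: collapse a raw bureau score to the top of its range."""
    for low, high in BINS:
        if low < score <= high:
            return high
    return None  # out-of-range scores flagged for manual review

def score_dummies(score):
    """Step 2: dummy-code the grouped score, leaving the last
    group out as the reference category by convention."""
    grouped = discretize(score)
    return {f"B_SCORE_{i + 1}": int(grouped == high)
            for i, (_, high) in enumerate(BINS[:-1])}
```

A score of 600 turns on only B_SCORE_2, while a score in the top range yields all zeros, which is exactly the omitted-category convention described above.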

The log-likelihood measure for each run, shown in Figure 4, is automatically produced by most statistical software packages. If you do the same for various transformations such as the square, the square root, and the log of the variable, pick the transformation where the log-likelihood measure is closest to 0. Include this winning transformation in your full regression model along with other predictors. However, there is no guarantee that such a transformation will improve the overall fit of the model once the other predictor variables have been added. Remember, modeling is as much of an art as it is a science!

Interactions

Additional variables can be created reflecting the interaction of two variables together that may sometimes end up increasing your model's accuracy. Usually the creation of these variables comes from prior knowledge as to what should influence default probabilities or LGD. To create such a variable, simply multiply together the variables you wish to interact and include the new variable in the regression.

Influential Outliers

Observations called influence points can impact the parameter estimates more than others. During the model development process, it is always good to produce an influence plot so you can decide what corrective action (if any) is necessary. Figure 5 is a bubble chart showing that observation #100 exerts significant influence on the size of your regression estimates. This type of analysis usually uses a shortcut approach to see how the predicted probabilities (and estimated parameters) would change if each observation were removed. The rule of thumb is to examine any observations that have a change in deviance greater than 4.0, as represented by the dotted horizontal line in Figure 5. Some software packages call these measures by different names, but most produce some option for looking at influential observations. In reality, there may always be some observations that show up on a graph of this type, but the worst ones should be reviewed and a decision made as to whether they should be deleted. With respect to a model's implementation code, it is always a good idea to limit extreme predictor values to certain maximums or minimums. For example, you might want to restrict an LTV value to be between 0 and 100 in the coding algorithm before coefficients or weights are used to predict the probability of default.

Figure 5: Influence Plot (Logistic Influence Results). Bubble chart of change in deviance (0 to 6) against estimated probability (0.60 to 0.95), with a dotted horizontal cutoff line at 4.0.

Segmentation

Sometimes a better overall PD or LGD model can be built if a segmentation analysis is performed. Segmentation in this context refers to the development of more than one PD or LGD model for better overall accuracy. The effectiveness of devoting the time needed to build additional models depends on the predictive information available and whether or not the segmented populations are really that different from one another. Sometimes the segmentation groups are obvious, such as modeling consumer loans apart from loans to small businesses. Sometimes, the segmentation is not so obvious. For example, you could build a PD model for loans with high, medium, or low outstanding balances, or separate models for accounts with different levels of income or LTV.

So how do you know how to define the high, medium, or low segmentation breaks? One common approach is something called CART. CART stands for Classification and Regression Trees, a method quite different from regression analysis.

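Before walking through the tree itself, here is a hypothetical Python sketch of how terminal-node assignments from a CART run become model inputs. The split points (LTV 0.545, score cutoffs 651.5 and 719.5, income 71,000) are the ones shown in Figure 6; the exact branch layout below the LTV split is partly an assumption read off the figure:

```python
def cart_node(ltv, score, income):
    """Assign an account to a terminal node of the Figure 6 tree.
    The branch layout below the LTV split is assumed from the figure."""
    if ltv < 0.545:
        return 1 if score < 651.5 else 2
    if income < 71_000:
        return 3 if score < 719.5 else 4
    return 5

def node_dummies(node, n_nodes=5):
    """Dummy-code the node id, leaving the last node out as the
    reference category, mirroring the dummy-variable convention."""
    return {f"NODE_{i}": int(node == i) for i in range(1, n_nodes)}
```

For example, `cart_node(0.40, 600, 80_000)` returns 1, matching IF LTV<.545 AND SCORE<651.5 THEN NODE=1; the resulting node dummies then enter the logistic regression as predictors.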

It uses an approach referred to as recursive partitioning to determine breaks in your data. It does this by first finding the best binary split in the data, followed by the next best binary split, and so forth.

CART is an exploratory analysis tool that attempts to describe the structure of your data in a tree-like fashion. S-PLUS, one of the more popular CART vendors, uses deviance to determine the tree structure. As in regression, CART relates a single dependent variable (in our case, default or LGD) to a set of predictors. An example of this tree-like structure is shown in Figure 6. CART first splits on accounts where LTV is greater than and less than 54.5%. Therefore, you could design a PD model for accounts with LTVs below this cutoff and one for accounts greater than this cutoff. If you had even more time on your hands, you might want to consider using score and income to further define your segmentation schemes.

Sometimes it is possible to use CART analysis directly in a logistic regression model. This is a much quicker process than building and implementing separate regression models for each segment. So how can we integrate the CART methodology into a single PD model? The answer is to construct a variable that reflects the tree structure for the branches in the tree. Figure 6 shows that the tree structure produces five nodes. A node represents the end of the splitting logic and the segmentation branch of the tree. Therefore, in order to create a CART segmentation variable for our regression model, we would code something like this:

IF LTV<.545 AND SCORE<651.5 THEN NODE=1;
IF LTV<.545 AND SCORE>=651.5 THEN NODE=2;
Etc.

The same type of coding would be done for nodes 3-5. To introduce this information content into logistic regression, we would create dummy variables for each node and use them as predictors in our PD model.

Figure 6: Typical CART Segmentation Analysis in S-PLUS Software. The tree splits first on LTV < 0.545; one branch splits on Score < 651.5 into Node 1 and Node 2, while the other splits on Income < 71,000 and then Score < 719.5, producing Nodes 3, 4, and 5.

Summary

Getting the most out of the data is one of the biggest challenges facing the model builder. Anyone can build a model. However, the challenge is building the most predictive model possible. This article has offered some tips on building a better mousetrap and how to go about it in a way that is based on sound statistical approaches that should satisfy most regulators. On the other hand, it must be remembered that building PD and LGD models is more an art than a science. Today's model builders come from a variety of disciplines, have different academic training and experience, and are accustomed to using certain statistical tools. Regardless of these differences, there is simply no substitute for knowing your data. Understanding how it was developed, what values are reasonable, and how the business works is essential in the model development process.

Jeff.Morrison@suntrust.com

References

1. Jeffrey S. Morrison, "Preparing for Modeling Requirements in Basel II, Part 1: Model Development," The RMA Journal, May 2003.
2. Logistic Regression, John Wiley & Sons, Inc., 1989.
3. Bryan D. Nelson, "Variable Reduction for Modeling Using PROC VARCLUS," Paper 261-26, SAS user-group publication.
