
Challenges arising when including macroeconomic variables in survival models of default
Dr Tony Bellotti
Department of Mathematics, Imperial College London
a.bellotti@imperial.ac.uk

Royal Statistical Society, 10 February 2016

Discrete Survival Models for Retail Credit Scoring

Outline of presentation
1. Background: credit scoring models
2. Including macroeconomic variables (MEVs) using a discrete survival model.
3. Challenges
4. Results based on mortgage data including stress test

Background - Credit Scoring

Typically, risk models for retail credit are models of default.

Hence this is a statistical classification problem.

Almost universally, logistic regression is the model of choice in retail banks:

$$P(Y = 0 \mid X = x) = F(s(x)) \quad \text{where} \quad s(x) = \beta_0 + \beta^{T} x$$

Here $Y \in \{0, 1\}$ is a default indicator, $F$ is the logit link function and $s$ is the scorecard.

Default models are used for various functions including application decisions, behavioural scoring and loss provisioning.
For example, for application scoring, the bank sets a threshold $c$, depending on risk appetite, then accepts a new application $x_{\text{new}}$ iff $s(x_{\text{new}}) > c$.
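The accept rule can be sketched as follows. This is only an illustration: it assumes the score is mapped to a probability of non-default by the logistic function (the inverse of the logit), and the coefficients, applicants, threshold and function names are all made up.

```python
import math

def scorecard(x, beta0, beta):
    """Linear score s(x) = beta0 + beta . x; a higher score means lower risk here."""
    return beta0 + sum(b * v for b, v in zip(beta, x))

def prob_non_default(score):
    """P(Y = 0 | x) under a logistic model: the inverse logit of the score."""
    return 1.0 / (1.0 + math.exp(-score))

def accept(x, beta0, beta, threshold):
    """Accept the application iff its score exceeds the bank's threshold."""
    return scorecard(x, beta0, beta) > threshold

# Hypothetical coefficients and applicants (not taken from the presentation).
beta0, beta = 1.0, [0.8, -1.5]
good_applicant = [2.0, 0.2]   # score = 1.0 + 1.6 - 0.3 = 2.3
weak_applicant = [0.5, 1.0]   # score = 1.0 + 0.4 - 1.5 = -0.1
threshold = 1.0
decisions = [accept(a, beta0, beta, threshold) for a in (good_applicant, weak_applicant)]
```

Raising the threshold corresponds to a lower risk appetite: fewer applications clear it.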

Introducing macroeconomic variables (MEVs)

More accurate risk management, sensitive to changing economic conditions.

Pressure from regulators (Basel Accord internationally and the Prudential Regulation Authority specifically in the UK) to develop models that calibrate economic conditions against credit risk, enabling forecasts of risk during recession periods (stress test).

Logistic regression does not naturally allow inclusion of macroeconomic time series, but survival models do, through time-varying covariates (TVCs).

A discrete survival model is a good option since:
(1) The default (failure) event is observed at discrete intervals (ie monthly accounting data).
(2) It is computationally efficient.
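To carry a time-varying covariate, the data are usually arranged with one row per account per month, so each row can hold the MEV value for its own calendar month. A minimal sketch of that expansion, assuming integer month indices and hypothetical field names (`origination_month`, `months_observed`, `defaulted`):

```python
def expand_to_panel(accounts, mev_by_month):
    """Turn one record per account into one row per account-month.

    Each row carries the account age, the calendar month (origination + age),
    the MEV value for that calendar month, and a 0/1 default flag that is 1
    only in the final observed month of an account that defaulted.
    """
    rows = []
    for acc in accounts:
        for age in range(1, acc["months_observed"] + 1):
            calendar_month = acc["origination_month"] + age
            is_last = age == acc["months_observed"]
            rows.append({
                "account": acc["id"],
                "age": age,
                "calendar_month": calendar_month,
                "mev": mev_by_month[calendar_month],
                "default": int(acc["defaulted"] and is_last),
            })
    return rows

# Hypothetical data: months are integer indices, the MEV is an unemployment rate.
accounts = [
    {"id": "A", "origination_month": 0, "months_observed": 3, "defaulted": True},
    {"id": "B", "origination_month": 1, "months_observed": 2, "defaulted": False},
]
mev_by_month = {1: 5.0, 2: 5.2, 3: 5.6}
panel = expand_to_panel(accounts, mev_by_month)
```

Once the data are in this form, the discrete survival model can be fitted as a binary regression on the rows, which is one reason the approach is computationally convenient.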

List of Challenges
1. Selection of MEVs
2. Structural form of MEVs (ie lag and transformations)
3. Time trend in MEVs
4. Correlations amongst MEVs
5. Systematic effects not fully explained by MEVs
6. Change in MEV risk factors over time
7. Segmentation and interactions
8. Confounding economic effects with behavioural variables

We will illustrate these challenges in this presentation using an example of US mortgage credit risk modelling.

Discrete survival model structure for credit risk


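$$P(Y_{it} = 1 \mid Y_{is} = 0 \text{ for } s < t,\ w_i,\ x_{i,t-k},\ z_{a_i+t-l})$$
$$= F\left(\beta_0 + \alpha^{T}\varphi(t) + \beta_1^{T} w_i + \beta_2^{T} x_{i,t-k} + u_i + \beta_3^{T} z_{a_i+t-l} + \delta_{a_i+t}\right)$$

where

$Y_{it}$: Outcome on account $i$ after some discrete duration $t > 0$: 1 = default, 0 = non-default. Typically, duration is age of the account.
$\varphi(t)$: Non-linear transformation of duration; baseline hazard. eg $\varphi(t) = (t, t^2, \log t, (\log t)^2)$.
$w_i$: Static variables; eg application variables and cohort effect.
$x_{i,t-k}$: Behavioural variables over time (with some lag $k$).
$a_i$: Date of origination of account $i$.
$u_i$: Frailty term on account $i$.
$z_{a_i+t-l}$: MEVs over calendar time $a_i + t$ (with some lag $l$).
$\delta_{a_i+t}$: Unknown systematic (calendar time) effect.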
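The quantity passed to the link function is a sum of the components listed above. A minimal sketch of that sum for one account-month follows; the coefficient names (`beta0`, `alpha`, `beta1`, `beta2`, `beta3`) and all numerical values are assumptions made for illustration, not estimates from the presentation.

```python
import math

def phi(t):
    """Duration basis (t, t^2, log t, (log t)^2) for the baseline hazard."""
    log_t = math.log(t)
    return [t, t * t, log_t, log_t * log_t]

def dot(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))

def linear_predictor(t, w, x_lagged, z_lagged, coef, frailty=0.0, calendar_effect=0.0):
    """Sum of intercept, baseline, static, behavioural, frailty, MEV and calendar terms."""
    return (coef["beta0"]
            + dot(coef["alpha"], phi(t))       # baseline hazard in account age
            + dot(coef["beta1"], w)            # static (application) variables
            + dot(coef["beta2"], x_lagged)     # lagged behavioural variables
            + frailty                          # account-level random effect
            + dot(coef["beta3"], z_lagged)     # lagged MEVs at calendar time
            + calendar_effect)                 # unexplained systematic effect

# Hypothetical coefficients and inputs.
coef = {"beta0": -6.0, "alpha": [0.02, -0.0001, 0.5, -0.05],
        "beta1": [0.3], "beta2": [0.8], "beta3": [0.2]}
eta = linear_predictor(t=12, w=[1.0], x_lagged=[0.0], z_lagged=[5.5], coef=coef)
```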

Model estimation

This is a panel model structure over accounts $i$ and duration $t$.

Need to specify a link function $F$. This could be logit or probit.
Taking $F$ to be complementary log-log, ie $F(\eta) = 1 - \exp(-\exp(\eta))$, yields a discrete version of the Cox proportional hazard model.

Most of the variables are included as fixed effect terms.

Frailty can be included as a random effect term to deal with heterogeneity.

Maximum marginal likelihood can be used to estimate coefficients on fixed effects ($\beta_0, \alpha, \beta_1, \beta_2, \beta_3$) and variance of the random effects.
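The link with proportional hazards can be checked numerically. Under the complementary log-log form, the cumulative hazard within a period is $-\log(1 - h) = \exp(\eta)$, so adding a constant $b$ to the linear predictor multiplies it by $\exp(b)$ whatever the baseline. A small sketch, with arbitrary illustrative values for the predictor and the shift:

```python
import math

def cloglog_hazard(eta):
    """Discrete hazard h = 1 - exp(-exp(eta)); expm1 keeps precision for small h."""
    return -math.expm1(-math.exp(eta))

def period_cumulative_hazard(eta):
    """-log(1 - h), which equals exp(eta) under the complementary log-log form."""
    return -math.log(1.0 - cloglog_hazard(eta))

# Illustrative values only: three baseline levels and one covariate shift.
shift = 0.7
ratios = [
    period_cumulative_hazard(eta + shift) / period_cumulative_hazard(eta)
    for eta in (-6.0, -4.0, -2.0)
]
# Each ratio equals exp(shift), independent of the baseline level eta.
```

With a logit or probit link the corresponding ratio would depend on the baseline level, which is why the complementary log-log choice is the one that mirrors the Cox model.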

US Mortgage data
We will use a large data set of account-level mortgage data for
illustration.

Freddie Mac loan-level mortgage data set.


Origination: 1999 to 2012.

181,000 loans (random sample, stratified by origination).


Default event: D180 (180 days delinquency), short sale or short
payoff prior to D180 or deed-in-lieu of foreclosure prior to D180.

Heat map: Default rate by calendar month and account age

[Figure: heat map of default rate, with account age (months; ticks at 50, 100, 150) against calendar month (Jan 1999 to Jan 2013).]

Age of mortgage effect = Hazard probability

[Figure: hazard probability against loan age (months; ticks at 50, 100, 150).]

Calendar time effect

[Figure: default risk against calendar time (Jan 1999 to Jan 2011).]

This is the estimated risk over calendar time with (full line) and without (dashed line) age, vintage and seasonality included in the model.

We see that not including other time components would lead to inaccurate coefficient estimates for the MEV effects.

Challenge 1: Selection of MEVs


1. Consider MEVs that we would expect to have a direct effect on default.
2. National versus local MEVs.
3. Consider MEVs that are required for stress testing, as specified by regulators or the business.

For this exercise: US GDP, Unemployment rate (UR), House price index (HPI) and interest rate (IR).

Challenge 2: Structural form of MEVs


How should we include the MEVs in our model?
Things to consider:
- Choice of lag structure (including possibly geometric lag);
- Whether to use difference in MEV;
- Whether to smooth the MEV time series prior to inclusion in the model;
- Whether to include seasonally adjusted or real values;
- Cumulative effects (eg we may expect high unemployment to have a greater effect, the longer it continues);
- Whether the MEV needs to be transformed prior to inclusion in the model (eg log transform for price index variables).
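Several of these structural choices are simple transformations of the monthly MEV series. The sketch below shows one possible form of each; the function names, the decay parameter and the threshold are hypothetical, and the "months above a threshold" count is just one way of representing a cumulative effect.

```python
import math

def lag(series, k):
    """Value k periods earlier; None where there is not enough history."""
    n = len(series)
    return [None] * min(k, n) + list(series[: max(n - k, 0)])

def geometric_lag(series, decay):
    """Geometrically weighted sum of current and past values:
    g_t = x_t + decay * g_(t-1), ie sum over j of decay**j * x_(t-j)."""
    out, g = [], 0.0
    for x in series:
        g = x + decay * g
        out.append(g)
    return out

def months_above(series, threshold):
    """Number of consecutive months the series has stayed above a threshold."""
    run, out = 0, []
    for value in series:
        run = run + 1 if value > threshold else 0
        out.append(run)
    return out

def log_transform(series):
    """Log transform, eg for a price index (values must be positive)."""
    return [math.log(x) for x in series]

# Hypothetical unemployment rate series (percent).
ur = [5.0, 7.0, 8.0, 6.0, 9.0]
ur_lag1 = lag(ur, 1)
ur_persistence = months_above(ur, 6.5)
```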

Challenge 3: Time trends in MEVs


Time trends in the MEVs are problematic since they could lead to spurious correlation with default risk.
Simulation studies in the survival model setting show time trends in MEVs can lead to errors in coefficient estimates.
Therefore do not include MEVs with time trends. This affects GDP and HPI, in particular.

Solution #1: First difference? But this will only fit default risk against short-term changes in MEVs (is this just noise?).
Solution #2: Annual difference? This is better since it gives information regarding change in the economy over a period of time; eg GDP growth. Also, there is no need to worry whether or not to seasonally adjust.
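The point about seasonal adjustment can be illustrated on a synthetic monthly series made of a linear trend plus a fixed 12-month seasonal pattern (both invented for this sketch). The annual difference removes the trend level and the seasonal pattern together, while the first difference still carries the seasonal swings.

```python
def difference(series, periods):
    """x_t - x_(t-periods); None where there is not enough history."""
    return [None if t < periods else series[t] - series[t - periods]
            for t in range(len(series))]

# Synthetic monthly index: trend of 0.5 per month plus a repeating seasonal pattern.
season = [2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
index = [100.0 + 0.5 * t + season[t % 12] for t in range(36)]

first_diff = difference(index, 1)     # month-on-month change, still seasonal
annual_diff = difference(index, 12)   # year-on-year change, seasonality cancels
```

Here every defined value of `annual_diff` equals 12 times the monthly trend, whereas `first_diff` varies from month to month. For a quarterly series such as GDP the same idea would use a period of 4.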

First attempt at modelling


Variable         Lag* (months)   Coefficient estimate   SE       P-value    Expected sign
IR                               +1.80                  0.029    <0.0001
IR (log)                         -7.30                  0.158    <0.0001
HPI              21              -3.72                  0.372    <0.0001
GDP              15              -6.35                  0.900    <0.0001
UR                               +0.245                 0.0093   <0.0001
UR               12              -0.209                 0.0149   <0.0001
Age effect
Vintage effect
Seasonality

* Choice of lag by finding best fit in a series of univariate studies.
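A lag search of this kind can be sketched as a loop over candidate lags, scoring each one and keeping the best. The presentation does not state the fit criterion used, so the example below substitutes the absolute Pearson correlation between the lagged MEV and an aggregate default rate series as a stand-in; the data are synthetic and built so that the true lag is 3 months.

```python
import math

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_lag(mev, default_rate, max_lag):
    """Score each candidate lag k by |corr(mev at t-k, default rate at t)|."""
    scores = {}
    for k in range(max_lag + 1):
        lagged_mev = mev[: len(mev) - k]   # MEV observed k months earlier
        outcome = default_rate[k:]         # default rate in the current month
        scores[k] = abs(pearson(lagged_mev, outcome))
    return max(scores, key=scores.get), scores

# Synthetic example: default rate responds to the MEV with a 3-month delay.
mev = [math.sin(0.5 * t) for t in range(60)]
default_rate = [0.02] * 3 + [0.02 - 0.005 * m for m in mev[:-3]]
chosen, scores = best_lag(mev, default_rate, max_lag=6)
```

In practice each candidate lag would be scored by refitting the univariate survival model itself (for example by likelihood), not by a simple correlation.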

Try removing UR (lag 12):

Variable         Lag (months)    Coefficient estimate   SE       P-value    Expected sign
IR                               +1.82                  0.029    <0.0001
IR (log)                         -7.37                  0.157    <0.0001
HPI              21              -3.68                  0.369    <0.0001
GDP              15              +4.65                  0.438    <0.0001
UR                               +0.186                 0.0083   <0.0001
Age effect
Vintage effect
Seasonality

But now we have a problem with the estimate for GDP: its coefficient has changed sign, from -6.35 to +4.65.
What is going on?

Challenge 4: Correlations between MEVs

         HPI      GDP      UR       UR
HPI               0.564   -0.835   -0.431
GDP      0.564            -0.728   -0.842
UR      -0.835   -0.728             0.543
UR      -0.431   -0.842    0.543

This correlation matrix demonstrates some very high correlations amongst the MEVs.
Solution #1: Variable selection, but this may remove some variables that are required in stress testing.
Solution #2: Factor analysis to determine macroeconomic factors (MFs) to include in the model.
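A simple screen for this problem is to list the pairs whose absolute correlation exceeds some cutoff. The sketch below uses the six off-diagonal values from the matrix above; the labels `UR_a` and `UR_b` for the two UR series and the 0.8 cutoff are assumptions made for illustration.

```python
# Off-diagonal correlations as reported; the UR_a / UR_b suffixes are hypothetical
# labels for the two UR series, in the order they appear in the matrix.
corr = {
    ("HPI", "GDP"): 0.564,
    ("HPI", "UR_a"): -0.835,
    ("HPI", "UR_b"): -0.431,
    ("GDP", "UR_a"): -0.728,
    ("GDP", "UR_b"): -0.842,
    ("UR_a", "UR_b"): 0.543,
}

def highly_correlated(pairs, cutoff):
    """Return the variable pairs whose absolute correlation is at least the cutoff."""
    return sorted(pair for pair, r in pairs.items() if abs(r) >= cutoff)

flagged = highly_correlated(corr, cutoff=0.8)
```

With this cutoff two pairs are flagged, and every variable appears in at least one pair with absolute correlation above 0.7, so dropping variables one at a time is unlikely to remove the problem cleanly.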

Principal Component Analysis on MEVs


Variable                 MF1      MF2      MF3
HPI                      +0.474   -0.596   -0.562
GDP                      +0.528    0.359    0.431
UR                       -0.524    0.375   -0.512
UR                       -0.471   -0.613    0.486
Proportion of variance   74.5%    18.5%    4.5%

The first component (MF1) represents much of the economic effect among the MEVs.
MF1 also has an unambiguous interpretation as a measure of economic health.
The remaining components do not account for much of the variance and do not have a natural interpretation, hence only MF1 will be included in the model.
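The first component can be approximated directly from the reported correlations. The sketch below assumes the PCA was carried out on the 4 x 4 correlation matrix of the MEVs (order HPI, GDP, UR, UR, with a unit diagonal added) and uses plain power iteration instead of a linear algebra library; because the inputs are rounded, the result only approximately matches the reported loadings.

```python
import math

# Correlations as reported (order: HPI, GDP, UR, UR), with a unit diagonal added.
R = [
    [1.0, 0.564, -0.835, -0.431],
    [0.564, 1.0, -0.728, -0.842],
    [-0.835, -0.728, 1.0, 0.543],
    [-0.431, -0.842, 0.543, 1.0],
]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def first_component(m, iterations=200):
    """Power iteration: dominant eigenvector (unit length) and its eigenvalue."""
    v = [1.0] + [0.0] * (len(m) - 1)
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    if v[0] < 0:                      # fix the sign so HPI loads positively
        v = [-x for x in v]
    return v, eigenvalue

loadings, eigenvalue = first_component(R)
# The trace of a correlation matrix equals the number of variables,
# so the share of variance is eigenvalue / 4.
share = eigenvalue / len(R)
```

The share comes out at roughly three quarters of the total variance, in line with the 74.5% reported for MF1, with HPI and GDP loading positively and both UR series negatively.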

Structure: Relationship between MF1 and default

[Figure: score (Bad to Good) against MF1 (Bad to Good).]

There is a distinct breakpoint in the risk profile of MF1.
This can be modelled with an interaction term.

Model with MF1


Variable             Coefficient estimate   SE       P-value    Expected sign
IR                   +1.83                  0.030    <0.0001
IR (log)             -7.41                  0.158    <0.0001
MF1                  -0.059                 0.0056   <0.0001
(High MF1)           -2.17                  0.0801   <0.0001
(High MF1) × MF1     -0.331                 0.0119   <0.0001
Age effect
Vintage effect
Seasonality

(High MF1) is an indicator variable with value 0 or 1.
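The three MF1 terms together define a piecewise-linear effect. The sketch below combines the reported coefficients; the breakpoint value is not given in the presentation, so `threshold` is a hypothetical parameter, as is the rule that (High MF1) is 1 when MF1 exceeds it.

```python
def mf1_effect(mf1, threshold, b_mf1=-0.059, b_high=-2.17, b_interaction=-0.331):
    """Contribution of MF1 to the linear predictor, with a break at `threshold`.

    Below the threshold only the MF1 slope applies; above it the indicator adds
    a level shift and the interaction term steepens the slope.
    """
    high = 1.0 if mf1 > threshold else 0.0
    return b_mf1 * mf1 + high * (b_high + b_interaction * mf1)

threshold = 0.0  # hypothetical breakpoint
slope_low = mf1_effect(-1.0, threshold) - mf1_effect(-2.0, threshold)
slope_high = mf1_effect(2.0, threshold) - mf1_effect(1.0, threshold)
```

Under these assumptions the slope is -0.059 in the low regime and -0.059 - 0.331 = -0.390 in the high regime, so default risk falls much faster with MF1 once MF1 is high.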


Challenge 5: Residual systematic effect

Allow a calendar fixed effect to model residual systematic risk not modelled by MEVs (or by seasonality).
Compute the standard deviation (s.d.) of this effect to estimate the size of the unexplained systematic effect.

Model                             Unexplained effect (s.d.)
Age, vintage but no MEVs          0.705
Age, vintage and linear MF1       0.223
Age, vintage and nonlinear MF1    0.131

Including MF1 explains much of the systematic effect, but a residual remains unexplained (19%).
Including MF1 with the interaction term improves the fit.
The residual estimate is important to quantify conservatism if using the model for forecasting or stress testing.

Challenge 6: Economic model breakpoints

How stable are MEV risk factors?

One question we may have is whether the effects of MEVs on default risk are stable over time, in particular after a change in economic regime.
Use a breakpoint model to test for this.
Breakpoint model
Variable             Coefficient estimate   SE       P-value    Expected sign
MF1                  -0.076                 0.0059   <0.0001
(High MF1)           -1.82                  0.0984   <0.0001
(unlabelled)         +0.679                 0.122    <0.0001
(High MF1) × MF1     -0.227                 0.0152   <0.0001
D × MF1              +0.0511                0.0260   0.0498
Other time effects

D = indicator variable: 1 if calendar date is before or during Feb 2006, 0 otherwise.

The D × MF1 effect is only marginally significant (p = 0.0498, compared with p < 0.0001 for the other terms), indicating no evident difference in economic effect in the two different time periods.
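The D × MF1 row can be checked from its estimate and standard error. The sketch assumes the reported p-value comes from a two-sided Wald z-test, which the presentation does not state; since the inputs are rounded, the result is close to the reported 0.0498 without matching it exactly.

```python
import math

def wald_test(estimate, standard_error):
    """z statistic and two-sided p-value under a standard normal reference."""
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# D x MF1 row of the breakpoint model table.
z, p_value = wald_test(0.0511, 0.0260)
```

The z statistic is just under 2, so the term sits right at the conventional 5% boundary. With a data set of this size, where every other term has p < 0.0001, that is weak evidence of a change in the MF1 effect across the breakpoint.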

Further challenges
7. Segmentations / interactions
It is plausible that different segments of the population will react to
different MEVs in different ways.
Eg high LTV accounts may be more sensitive to changes in HPI.
Therefore, explore different model segments or variable interactions.
8. Confounding economic effects with behavioural variables (BVs)
We may want to include BVs as TVCs; however, the MEVs may be
confounders for these effects.
Eg economic conditions may affect repayments, in general.
Therefore, test for confounding and build the model in stages (eg
include MEVs in first stage, then introduce BVs).

Validate model with back-testing

[Figure: monthly default rate over calendar time (Jan 1999 to Jan 2013).]

Use Default Rate (DR) during that period to measure performance.
Use conservatism (as 2 s.d. of unknown systematic effect).

Result 2: Stress test results


Scenario   UR                                             GDP                                          HPI                     IR
Baseline   -2% over 2 years                               2.5% growth per annum                        7% increase per annum   No change
Stress     Rise from 7.4% to peak of 10.6% over 2 years   Reduction from +2% to -2% growth per annum   Zero increase           No change
IR rise    -2% over 2 years                               2.5% growth per annum                        7% increase per annum   Average +2% increase over 2 years

Projection of annual default rate:

Scenario   Conservatism   Year 1   Year 2
Baseline   No             1.21%    0.83%
Stress     No             1.67%    2.34%
Stress     Yes            2.21%    3.20%
IR rise    No             1.73%    2.76%
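One way an annual projection with conservatism could be assembled from monthly model output is sketched below. It assumes a complementary log-log hazard, a single account followed for 12 months, and conservatism applied by adding 2 × 0.131 (twice the residual s.d. of the nonlinear MF1 model) to each monthly linear predictor. The presentation does not say how its projections were aggregated or how conservatism was applied, and the monthly predictor values here are invented.

```python
import math

def cloglog_hazard(eta):
    """Monthly default hazard h = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))

def annual_default_rate(monthly_etas, conservatism_shift=0.0):
    """Probability of default within the year: 1 - product of (1 - h_t)."""
    survival = 1.0
    for eta in monthly_etas:
        survival *= 1.0 - cloglog_hazard(eta + conservatism_shift)
    return 1.0 - survival

# Hypothetical scenario: the same linear predictor in each of 12 months.
monthly_etas = [-7.0] * 12
shift = 2 * 0.131  # two residual standard deviations (assumed way of applying conservatism)

base_rate = annual_default_rate(monthly_etas)
conservative_rate = annual_default_rate(monthly_etas, conservatism_shift=shift)
```

Under these assumptions the shift multiplies the cumulative hazard by exp(0.262), roughly a 30% uplift, which is the same order as the gap between the stress rows with and without conservatism in the table above.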

Conclusion

1. Including MEVs in credit risk models is valuable to enable more accurate risk management, sensitive to economic changes, as well as stress testing.
2. We have illustrated several challenges when including MEVs in credit risk models.
3. We have suggested several approaches to handle these challenges, demonstrating results on a large US mortgage data set.