
# Challenges arising when including macroeconomic variables in survival models of default

Dr Tony Bellotti
Department of Mathematics, Imperial College London
a.bellotti@imperial.ac.uk

## Discrete Survival Models for Retail Credit Scoring

Outline of presentation
1. Background: credit scoring models
2. Including macroeconomic variables (MEVs) using a discrete survival model.
3. Challenges


Almost universally, logistic regression is the model of choice in retail banks:

$$P(Y = 0 \mid X = x) = F(s(x)) \quad \text{where} \quad s(x) = \beta_0 + \beta^\top x$$

where $Y \in \{0,1\}$ is a default indicator, $F$ is the logit link function and $s$ is the scorecard.

Default models are used for various functions including application decisions, behavioural scoring and loss provisioning.
For example, for application scoring, the bank sets a threshold $c$, depending on risk appetite, then accepts a new application $x_{\text{new}}$ iff its score $s(x_{\text{new}}) > c$.
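The acceptance rule can be sketched in a few lines of Python. This is only an illustration: the variable names, coefficient values and threshold are hypothetical and are not taken from the presentation.

```python
import math

# Hypothetical scorecard coefficients, for illustration only.
BETA_0 = 1.5
BETA = {"income_k": 0.02, "prior_delinquencies": -0.8}

def score(x):
    """Linear scorecard: intercept plus weighted sum of the characteristics."""
    return BETA_0 + sum(BETA[name] * x[name] for name in BETA)

def prob_non_default(x):
    """P(Y = 0 | X = x) under a logistic model, so a higher score means lower risk."""
    return 1.0 / (1.0 + math.exp(-score(x)))

def accept(x, threshold):
    """Accept the application iff its score exceeds the threshold."""
    return score(x) > threshold

applicant = {"income_k": 40, "prior_delinquencies": 1}
s_new = score(applicant)
p_good = prob_non_default(applicant)
decision = accept(applicant, threshold=1.0)
```

Raising the threshold corresponds to a lower risk appetite: the same applicant would be rejected with `threshold=2.0`.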

conditions.

There is pressure from regulators (Basel Accord internationally and the Prudential Regulation Authority specifically in the UK) to develop models that calibrate economic conditions against credit risk, enabling forecasts of risk during recession periods (stress tests).

Logistic regression does not naturally allow inclusion of macroeconomic time series, but survival models do through time-varying covariates (TVCs).

A discrete survival model is a good option since:
(1) The default (failure) event is observed at discrete intervals (i.e. monthly accounting data).
(2) It is computationally efficient.
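Because the event is observed monthly, the data for a discrete survival model are usually arranged with one row per account per month at risk, which is what lets time-varying covariates be attached by calendar month. The sketch below shows one way to build that layout; the field names (`orig_month`, `last_age`, `defaulted`) are hypothetical.

```python
def expand_to_account_months(accounts):
    """Turn one row per account into one row per account per month at risk.

    Each account dict has hypothetical keys: 'id', 'orig_month' (calendar
    month index of origination), 'last_age' (last observed age in months)
    and 'defaulted' (True if default was observed at 'last_age').
    """
    rows = []
    for acc in accounts:
        for age in range(1, acc["last_age"] + 1):
            rows.append({
                "id": acc["id"],
                "age": age,                                 # duration of the account
                "calendar_month": acc["orig_month"] + age,  # where MEVs are looked up
                "default": int(acc["defaulted"] and age == acc["last_age"]),
            })
    return rows

accounts = [
    {"id": "A", "orig_month": 0, "last_age": 3, "defaulted": True},
    {"id": "B", "orig_month": 2, "last_age": 2, "defaulted": False},  # censored
]
panel = expand_to_account_months(accounts)
```

Each row can then be treated as a binary observation, so standard binary regression software can be used for estimation.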

List of Challenges
1. Selection of MEVs
2. Structural form of MEVs (i.e. lag and transformations)
3. Time trend in MEVs
4. Correlations amongst MEVs
5. Systematic effects not fully explained by MEVs
6. Change in MEV risk factors over time
7. Segmentation and interactions
8. Confounding economic effects with behavioural variables

We will illustrate these challenges in this presentation using an example of US mortgage credit risk modelling.

## Discrete survival model structure for credit risk

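$$P(Y_{it} = 1 \mid Y_{is} = 0 \text{ for } s < t,\ w_i,\ x_{i,t-l},\ z_{a_i+t-k}) = F\left(\beta_0 + \alpha^\top \phi(t) + \beta_1^\top w_i + \beta_2^\top x_{i,t-l} + u_i + \beta_3^\top z_{a_i+t-k} + \gamma_{a_i+t}\right)$$

where

$Y_{it}$: Outcome on account $i$ after some discrete duration $t > 0$: 1 = default, 0 = non-default. Typically, duration is age of the account.
$\phi(t)$: Non-linear transformation of duration; baseline hazard, e.g. $\phi(t) = (t, t^2, \log t, (\log t)^2)$.
$w_i$: Static variables; e.g. application variables and cohort effect.
$x_{i,t-l}$: Behavioural variables over time (with some lag $l$).
$a_i$: Date of origination of account $i$.
$u_i$: Frailty term on account $i$.
$z_{a_i+t-k}$: MEVs over calendar time $a_i + t$ (with some lag $k$).
$\gamma_{a_i+t}$: Unknown systematic (calendar time) effect.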

Model estimation

Need to specify a link function $F$. This could be logit or probit.
Taking $F$ to be complementary log-log, i.e. $F(\eta) = 1 - \exp(-\exp(\eta))$, yields a discrete version of the Cox proportional hazard model.
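The proportional hazards property of the complementary log-log link can be checked numerically. In the sketch below (hypothetical coefficient and baseline values), raising a covariate by one unit multiplies the per-period cumulative hazard, -log(1 - h), by the same factor exp(beta) whatever the baseline level.

```python
import math

def cloglog_inverse(eta):
    """Hazard probability implied by linear predictor eta under the cloglog link."""
    return 1.0 - math.exp(-math.exp(eta))

def cumulative_hazard_increment(h):
    """-log(1 - h): per-period cumulative hazard implied by hazard probability h."""
    return -math.log(1.0 - h)

beta, eta_base = 0.4, -5.0   # hypothetical values
ratios = []
for shift in (0.0, 1.0, 2.0):   # three different baseline levels
    h0 = cloglog_inverse(eta_base + shift)
    h1 = cloglog_inverse(eta_base + shift + beta)   # covariate raised by one unit
    ratios.append(cumulative_hazard_increment(h1) / cumulative_hazard_increment(h0))
# Every entry of `ratios` equals exp(beta), up to floating-point error.
```

With a logit or probit link the corresponding ratio would vary with the baseline level, which is why the complementary log-log choice is the one that mirrors the Cox model.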

heterogeneity.

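Maximum marginal likelihood can be used to estimate coefficients on fixed effects ($\beta_0$, $\alpha$, $\beta_1$, $\beta_2$, $\beta_3$: the intercept and the coefficients on the duration terms, static variables, behavioural variables and MEVs) and variance of the random effects.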

US Mortgage data
We will use a large data set of account-level mortgage data for illustration.
Freddie Mac loan-level mortgage data set.
Origination: 1999 to 2012.
181,000 loans (random sample, stratified by origination).
Default event: D180 (180 days delinquency), short sale or short payoff prior to D180 or deed-in-lieu of foreclosure prior to D180.

Heat map: Default rate by calendar month and account age

[Figure: account age (months) against calendar month, Jan 1999 to Jan 2013.]

## Age of mortgage effect = Hazard probability

[Figure: hazard probability against loan age (months).]

## Calendar time effect

[Figure: default risk over calendar time, Jan 1999 to Jan 2011.]

This is the estimated risk over calendar time with (full line) and without (dashed line) age, vintage and seasonality included in the model.

We see that not including other time components would lead to inaccurate coefficient estimates for the MEV effects.

## Challenge 1: Selection of MEVs

1. Consider MEVs that we would expect to have a direct effect on default.
2. National versus local MEVs.
3. Consider MEVs that are required for stress testing, as specified by regulators or the business.

For this exercise: US GDP, unemployment rate (UR), house price index (HPI) and interest rate (IR).

## Challenge 2: Structural form of MEVs

How should we include the MEVs in our model?
Things to consider:
Choice of lag structure (including possibly geometric lag);
Whether to use difference in MEV;
Whether to smooth the MEV time series prior to inclusion in the model;
Whether to include seasonally adjusted or real values;
Cumulative effects (e.g. we may expect high unemployment to have a greater effect, the longer it continues);
Whether the MEV needs to be transformed prior to inclusion in the model (e.g. log transform for price index variables).
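Several of these choices (lag, difference, log transform) are simple operations on a monthly series. The following sketch uses a made-up house price index purely to show the mechanics; the growth rate and lag length are hypothetical and not values from the presentation.

```python
import math

def lagged(series, lag):
    """Value of the series `lag` periods earlier; None where unavailable."""
    return [series[t - lag] if t >= lag else None for t in range(len(series))]

def difference(series, period):
    """Change over `period` periods (period=12 is an annual difference on monthly data)."""
    return [series[t] - series[t - period] if t >= period else None
            for t in range(len(series))]

def log_annual_growth(series):
    """Annual difference of the log series, i.e. an approximate annual growth rate."""
    logs = [math.log(v) for v in series]
    return difference(logs, 12)

# Hypothetical monthly house price index growing 0.5% a month.
hpi = [100.0 * 1.005 ** t for t in range(36)]
hpi_growth = log_annual_growth(hpi)        # first 12 entries are None
hpi_growth_lag3 = lagged(hpi_growth, 3)    # the same series, lagged 3 months
```

In this artificial example the index itself trends upwards while its annual log difference is constant, which illustrates how differencing removes a steady trend from a series.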

## Challenge 3: Time trends in MEVs

Time trends in the MEVs are problematic since they could lead to spurious correlation with default risk.
Simulation studies in the survival model setting show time trends in MEVs can lead to errors in coefficient estimates.
Therefore do not include MEVs with time trends. This affects GDP and HPI, in particular.

Solution #1: First difference? But this will only fit default risk against short-term changes in MEVs (is this just noise?).
Solution #2: Annual difference? This is better since this gives
information regarding change in economy over a period of time;
eg GDP growth. Also, do not need to worry whether or not to

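| Variable | Lag* (months) | Coefficient estimate | SE | P-value | Expected sign |
|---|---|---|---|---|---|
| IR | | +1.80 | 0.029 | <0.0001 | |
| IR (log) | | -7.30 | 0.158 | <0.0001 | |
| HPI | 21 | -3.72 | 0.372 | <0.0001 | |
| GDP | 15 | -6.35 | 0.900 | <0.0001 | |
| UR | | +0.245 | 0.0093 | <0.0001 | |
| UR | 12 | -0.209 | 0.0149 | <0.0001 | |
| Age effect | | | | | |
| Vintage effect | | | | | |
| Seasonality | | | | | |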

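Try removing UR (lag 12):

| Variable | Lag (months) | Coefficient estimate | SE | P-value | Expected sign |
|---|---|---|---|---|---|
| IR | | +1.82 | 0.029 | <0.0001 | |
| IR (log) | | -7.37 | 0.157 | <0.0001 | |
| HPI | 21 | -3.68 | 0.369 | <0.0001 | |
| GDP | 15 | +4.65 | 0.438 | <0.0001 | |
| UR | | +0.186 | 0.0083 | <0.0001 | |
| Age effect | | | | | |
| Vintage effect | | | | | |
| Seasonality | | | | | |

...but now we have a problem with the estimate for GDP.
What is going on?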

| | HPI | GDP | UR | UR |
|---|---|---|---|---|
| HPI | | 0.564 | -0.835 | -0.431 |
| GDP | 0.564 | | -0.728 | -0.842 |
| UR | -0.835 | -0.728 | | 0.543 |
| UR | -0.431 | -0.842 | 0.543 | |

This correlation matrix demonstrates some very high correlations amongst the MEVs.
Solution #1: Variable selection, but this may remove some variables that are required in stress testing.
Solution #2: Factor analysis to determine macroeconomic factors (MFs) to include in the model.

| Variable | MF1 | MF2 | MF3 |
|---|---|---|---|
| HPI | +0.474 | -0.596 | -0.562 |
| GDP | +0.528 | 0.359 | 0.431 |
| UR | -0.524 | 0.375 | -0.512 |
| UR | -0.471 | -0.613 | 0.486 |
| Proportion of variance | 74.5% | 18.5% | 4.5% |

The first component (MF1) represents much of the economic effect among the MEVs.
MF1 also has an unambiguous interpretation as a measure of economic health.
The remaining components do not account for much of the variance and do not have a natural interpretation, hence only MF1 will be included in the model.
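The MF1 loadings and its share of variance can be approximately recovered from the reported correlation matrix. The sketch below assumes the factors are principal components of that matrix (the presentation does not state the exact method) and uses plain power iteration to find the dominant eigenvector; the unit diagonal is filled in, as for any correlation matrix.

```python
import math

# Correlation matrix as reported (order: HPI, GDP, UR, UR), with unit diagonal.
R = [
    [1.0,    0.564, -0.835, -0.431],
    [0.564,  1.0,   -0.728, -0.842],
    [-0.835, -0.728, 1.0,    0.543],
    [-0.431, -0.842, 0.543,  1.0],
]

def first_component(matrix, iterations=500):
    """Dominant eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)   # starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return v, eigenvalue

loadings, eigenvalue = first_component(R)
share = eigenvalue / len(R)   # proportion of total variance explained
```

Under this assumption, `loadings` comes out close to the MF1 column and `share` close to the 74.5% reported, with small differences expected because the published correlations are rounded.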

## Structure: Relationship between MF1 and default

[Figure: score against MF1.]

There is a distinct breakpoint in the risk profile of MF1.
This can be modelled with an interaction term.
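One way to encode such a breakpoint is with an indicator for MF1 lying above the breakpoint plus its product with MF1, so that both the level and the slope can differ on each side. The breakpoint location and coefficients below are hypothetical and serve only to show how the terms combine.

```python
def mf1_terms(mf1, breakpoint):
    """Design-matrix columns for a risk profile in MF1 that changes at `breakpoint`."""
    high = 1.0 if mf1 > breakpoint else 0.0
    return {"MF1": mf1, "High_MF1": high, "High_MF1_x_MF1": high * mf1}

def mf1_contribution(mf1, breakpoint, coefs):
    """Contribution of the MF1 terms to the linear predictor."""
    terms = mf1_terms(mf1, breakpoint)
    return sum(coefs[name] * value for name, value in terms.items())

# Hypothetical breakpoint and coefficients, chosen only to show the mechanics.
coefs = {"MF1": -0.1, "High_MF1": -0.5, "High_MF1_x_MF1": -0.3}
below = mf1_contribution(-1.0, breakpoint=0.0, coefs=coefs)
above = mf1_contribution(1.0, breakpoint=0.0, coefs=coefs)
```

Below the breakpoint the slope in MF1 is the `MF1` coefficient alone; above it the slope is the sum of the `MF1` and interaction coefficients, and the indicator coefficient shifts the level.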

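| Variable | Coefficient estimate | SE | P-value | Expected sign |
|---|---|---|---|---|
| IR | +1.83 | 0.030 | <0.0001 | |
| IR (log) | -7.41 | 0.158 | <0.0001 | |
| MF1 | -0.059 | 0.0056 | <0.0001 | |
| (High MF1) | -2.17 | 0.0801 | <0.0001 | |
| (High MF1) × MF1 | -0.331 | 0.0119 | <0.0001 | |
| Age effect | | | | |
| Vintage effect | | | | |
| Seasonality | | | | |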

## Challenge 5: Residual systematic effect

Allow a calendar fixed effect to model residual systematic risk, not modelled by MEVs (or by seasonality).
Compute standard deviation (s.d) of this effect to estimate size of the unexplained systematic effect.

| Model | Unexplained effect (s.d) |
|---|---|
| | 0.705 |
| | 0.223 |
| | 0.131 |

Including MF1 explains much of the systematic effect, but a residual remains unexplained (19%).
Including the MF1 with the interaction term improves the fit.
The residual estimate is important to quantify conservatism if using the model for forecasting or stress testing.
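As a rough check on the 19% figure, and assuming (the model labels are not shown) that the largest s.d. belongs to the model without macroeconomic terms and the smallest to the model with MF1 and its interaction term, the ratios of the reported values are

$$\frac{0.131}{0.705} \approx 0.186 \approx 19\%, \qquad \frac{0.223}{0.705} \approx 0.316 .$$

On this reading, the 19% is the share of the original systematic s.d. that remains once MF1 and the interaction term are included.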

## Challenge 6: Economic model breakpoints

How stable are MEV risk factors?
One question we may have is whether the effects of MEVs on default risk are stable over time, in particular after an economic regime changes.
Use a breakpoint model to test for this.

Breakpoint model
| Variable | Coefficient estimate | SE | P-value | Expected sign |
|---|---|---|---|---|
| MF1 | -0.076 | 0.0059 | <0.0001 | |
| (High MF1) | -1.82 | 0.0984 | <0.0001 | |
| | +0.679 | 0.122 | <0.0001 | |
| (High MF1) × MF1 | -0.227 | 0.0152 | <0.0001 | |
| D × MF1 | +0.0511 | 0.0260 | 0.0498 | |
| Other time effects | | | | |

D = indicator variable:

0 otherwise

The D × MF1 effect is only marginally significant (P-value 0.0498, compared with <0.0001 for the other terms), indicating no evident difference in economic effect in the two different time periods.
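The P-value for the D × MF1 term can be roughly checked from the reported estimate and standard error. The sketch assumes a two-sided Wald test with a normal approximation, which the presentation does not state explicitly.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided P-value for H0: coefficient = 0, using a normal approximation."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

z_stat = 0.0511 / 0.0260               # reported estimate divided by reported SE
p_value = wald_p_value(0.0511, 0.0260)
```

The result sits just below 0.05, in line with the reported 0.0498; a small discrepancy is expected because the published estimate and SE are rounded.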

Further challenges
7. Segmentations / interactions
It is plausible that different segments of the population will react to
different MEVs in different ways.
Eg high LTV accounts may be more sensitive to changes in HPI.
Therefore, explore different model segments or variable interactions.
8. Confounding economic effects with behavioural variables (BVs)
We may want to include BVs as TVCs; however, the MEVs may be
confounders for these effects.
Eg economic conditions may affect repayments, in general.
Therefore, test for confounding and build the model in stages (eg
include MEVs in first stage, then introduce BVs).


Use Default Rate (DR) during that period to measure performance.
Use conservatism (as 2 s.d of unknown systematic effect).

| Scenario | UR | GDP | HPI | IR |
|---|---|---|---|---|
| Baseline | | … annum | 7% increase per annum | No change |
| Stress | Peak of 10.6% over 2 years | Reduction from +2% to -2% growth per annum | Zero increase | No change |
| IR rise | | … annum | 7% increase per annum | Average +2% increase over 2 years |

Projection of annual default rate:

| Scenario | Conservatism | Year 1 | Year 2 |
|---|---|---|---|
| Baseline | No | 1.21% | 0.83% |
| Stress | No | 1.67% | 2.34% |
| Stress | Yes | 2.21% | 3.20% |
| IR rise | No | 1.73% | 2.76% |
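One possible reading of the conservatism adjustment is to shift each monthly linear predictor upwards by 2 s.d. of the unexplained systematic effect before converting to an annual default rate. The sketch below illustrates that reading for a single account under a complementary log-log link. The monthly linear predictors are hypothetical, the choice of 0.131 as the s.d. is an assumption, and the code is not intended to reproduce the figures in the table.

```python
import math

def hazard(eta):
    """Monthly hazard probability under the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def annual_default_rate(monthly_etas, uplift=0.0):
    """Probability of default within the year: 1 minus the product of monthly survival."""
    survival = 1.0
    for eta in monthly_etas:
        survival *= 1.0 - hazard(eta + uplift)
    return 1.0 - survival

residual_sd = 0.131      # assumed: smallest unexplained s.d. reported earlier
etas = [-6.5] * 12       # hypothetical monthly linear predictors
dr_plain = annual_default_rate(etas)
dr_conservative = annual_default_rate(etas, uplift=2 * residual_sd)
```

Under this link the uplift multiplies each monthly cumulative hazard by exp(2 × s.d.), so for small default rates the conservative projection is roughly that multiple of the unadjusted one.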

Conclusion

1. Including MEVs in credit risk models is valuable to enable more accurate risk management, sensitive to economic changes, as well as stress testing.
2. We have illustrated several challenges when including MEVs in credit risk models.
3. We have suggested several approaches to handle these challenges, demonstrating results on a large US mortgage data set.