Using Machine Learning For Probability of Default

Supervised learning mathematics of PD calibration
Theodor Munteanu
21 January 2021
Contents
1 Introduction
2 Default time models and credit spreads
2.1 Exponential model
2.2 Piecewise exponential model
2.3 Weibull and piecewise Weibull
2.4 Other distributions explained
2.5 Numerical example
2.6 Conclusion
3 References
1 Introduction
Machine learning has become an ever larger area of quantitative modeling for all kinds of phenomena, yet in traditional practice it has remained a tool used only for regressions and a handful of statistical tests.
In mathematical finance, machine learning is equivalent to reverse engineering, or reverse modeling: given a non-empty set of models $\mathcal{M}$ and a non-empty set of performance measures $\mathcal{P}$, the analyst is asked to find the model $M \in \mathcal{M}$ that is best according to $\mathcal{P}$ at transforming an output set $Y$ into the input set $X$.
A proper (though not necessarily complete) definition of a supervised learning problem is the following: given a set of input data $X$, a set of models $\mathcal{M}$ and the output data $Y$, we look for the best model $M \in \mathcal{M}$ that transforms $X$ into $Y$ according to a set of performance measures $\mathcal{P}$.
Credit risk is no exception, and in what follows I show the mathematics behind probability of default (PD) calibration from a given set of bond market prices.
2 Default time models and credit spreads

Problem
Consider a company from a certain sector that has issued $n$ bonds with maturities $T_1 \le T_2 \le \dots \le T_n$ and coupon rates $c_1, c_2, \dots, c_n$, not necessarily distinct and not necessarily $\neq 0$. Create a meta-model object $M$ that returns the default curve
$$PD : \mathbb{R}_+ \times \mathbb{R}_+ \to (0, 1), \qquad PD(\cdot, \cdot) = PD(t_1, t_2), \quad 0 < t_1 < t_2,$$
according to several default time distributions $M \in \mathcal{M}$ of your choice. The inputs are $B_i(c_i, T_i, \text{mktprice}_i)_{i = 1, \dots, n}$.
The recovery rate R is assumed constant and the coupon payments are
assumed continuous (if these exist).
Recall the theoretical credit spread formula for a risky bond or a CDS:
$$s_t(T) = \frac{(1 - R) \int_t^T B_t(u)\, f_t(u)\, du}{\int_t^T B_t(u)\, S_t(u)\, du},$$
where $\tau$ is the default time, $S_t(u) = P(\tau > u \mid \tau > t)$, $f_t(u) = -\frac{d}{du} S_t(u)$ and $B_t(u)$ is the discount factor.
In the following sections I assume that the interest rate curve is flat at level $r$, so that $B_0(u) = e^{-ru}$.
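To make the spread formula concrete, here is a minimal Python sketch (not the author's implementation) that evaluates $s_0(T)$ numerically for any survival function, assuming valuation at $t = 0$, the flat rate $r$ and $S(0) = 1$. The helper names `simpson` and `credit_spread` are hypothetical; the protection leg is rewritten by integration by parts so that only $S$, not the density $f$, has to be integrated.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for g on [a, b] with an even number n of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def credit_spread(survival, T, r, R):
    """Model spread s_0(T) with flat rate r, constant recovery R and S(u) = P(tau > u).

    The protection leg int_0^T e^{-ru} f(u) du is rewritten by integration by parts as
    1 - e^{-rT} S(T) - r * int_0^T e^{-ru} S(u) du, so only S has to be integrated.
    """
    risky_annuity = simpson(lambda u: math.exp(-r * u) * survival(u), 0.0, T)
    protection = 1.0 - math.exp(-r * T) * survival(T) - r * risky_annuity
    return (1.0 - R) * protection / risky_annuity


# Sanity check: an exponential default time should give s(T) = (1 - R) * lambda
lam, R = 0.02, 0.4
s = credit_spread(lambda u: math.exp(-lam * u), T=5.0, r=0.03, R=R)
```

For an exponential default time the result reproduces $(1 - R)\lambda$ whatever the value of $r$, which is the closed form used for the exponential model in section 2.1. Avoiding $f$ also helps when the density is singular at $0$, as for a Weibull distribution with $\gamma < 1$.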
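In what follows I consider the universe of models
$$\mathcal{M} = \{\mathcal{E}(\lambda),\ \mathcal{PE}(\lambda_1, \dots, \lambda_{n+1} \mid t_1, \dots, t_n),\ \text{Weibull}(\lambda, \gamma),\ \text{Gompertz}(\lambda, \gamma),\ \text{Log-Log}(\lambda, \gamma)\}$$
and the performance measure $\mathcal{P} = \|\cdot\|_2$ applied to the calibration residuals (the $R^2$ column of the results in section 2.5 reports the sum of squared residuals).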
2.1 Exponential model

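If $\tau \sim \mathcal{E}(\lambda)$, then $f(u) = \lambda S(u)$, so the discounted integrals in the spread formula cancel and $s(T) = (1 - R)\lambda$ for every maturity $T$. Given a set of expiries $t_1 < t_2 < \dots < t_n$ and market spreads $s(t_i)$, we look for
$$\hat{\lambda} = \arg\min_{\lambda} f(\lambda), \qquad f(\lambda) = \sum_{i=1}^{n} \big(s(t_i) - (1 - R)\lambda\big)^2.$$
Setting $f'(\lambda) = 0$ gives the least squares estimator
$$\hat{\lambda} = \frac{1}{n(1 - R)} \sum_{i=1}^{n} s(t_i).$$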
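A minimal sketch of this least squares estimator; the function name `fit_exponential` is hypothetical and the spreads are made up for illustration (they are not the market data of section 2.5):

```python
def fit_exponential(spreads, R):
    """Least-squares hazard rate for tau ~ E(lambda), where s(t_i) = (1 - R) * lambda for every t_i.

    Minimising f(lambda) = sum_i (s(t_i) - (1 - R) * lambda)^2 gives
    lambda_hat = mean(s(t_i)) / (1 - R).
    """
    if not spreads:
        raise ValueError("need at least one market spread")
    return sum(spreads) / (len(spreads) * (1.0 - R))


# Hypothetical market spreads (decimal, not basis points), for illustration only
spreads = [0.010, 0.012, 0.014, 0.016]
lam = fit_exponential(spreads, R=0.4)
residuals = [s - (1 - 0.4) * lam for s in spreads]
```

At the optimum the residuals sum to zero, which is just the first-order condition $f'(\hat{\lambda}) = 0$; a single parameter cannot follow an upward-sloping spread curve, which is why the residuals themselves stay large.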
2.2 Piecewise exponential model

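In this case, assume the knots are $0 = t_0 < t_1 < \dots < t_m$ and the hazard rate function is $\lambda(t) = \lambda_i$ for $t \in [t_{i-1}, t_i]$, $i = 1, \dots, m$, and $\lambda(t) = \lambda_{m+1}$ for $t > t_m$. Then
$$f_0(t) = \begin{cases}
\lambda_1 e^{-\lambda_1 t}, & t \in [0, t_1],\\
\lambda_2 e^{-\lambda_1 t_1 - \lambda_2 (t - t_1)}, & t \in [t_1, t_2],\\
\lambda_3 e^{-\lambda_1 t_1 - \lambda_2 (t_2 - t_1) - \lambda_3 (t - t_2)}, & t \in [t_2, t_3],\\
\quad\vdots & \\
\lambda_{m+1} e^{-\lambda_1 t_1 - \lambda_2 (t_2 - t_1) - \lambda_3 (t_3 - t_2) - \dots - \lambda_{m+1} (t - t_m)}, & t \in [t_m, +\infty),
\end{cases}$$
and
$$S_0(t) = \begin{cases}
e^{-\lambda_1 t}, & t \in [0, t_1],\\
e^{-\lambda_1 t_1 - \lambda_2 (t - t_1)}, & t \in [t_1, t_2],\\
e^{-\lambda_1 t_1 - \lambda_2 (t_2 - t_1) - \lambda_3 (t - t_2)}, & t \in [t_2, t_3],\\
\quad\vdots & \\
e^{-\lambda_1 t_1 - \lambda_2 (t_2 - t_1) - \lambda_3 (t_3 - t_2) - \dots - \lambda_{m+1} (t - t_m)}, & t \in [t_m, +\infty).
\end{cases}$$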
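From these formulas we observe that, writing $t^*_k$ for the knots, for all $t \in [t^*_{m-1}, t^*_m]$:
$$S(t) = S(t^*_{m-1})\, e^{-\lambda_m (t - t^*_{m-1})}, \qquad f(t) = \lambda_m S(t^*_{m-1})\, e^{-\lambda_m (t - t^*_{m-1})} = \lambda_m S(t).$$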
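So, for $T \in [t^*_{m-1}, t^*_m]$,
$$s(T) = \frac{(1 - R) \int_0^T e^{-rt} f(t)\, dt}{\int_0^T e^{-rt} S(t)\, dt} = \frac{(1 - R) \left( \int_0^{t^*_{m-1}} e^{-rt} f(t)\, dt + \int_{t^*_{m-1}}^T e^{-rt} f(t)\, dt \right)}{\int_0^{t^*_{m-1}} e^{-rt} S(t)\, dt + \int_{t^*_{m-1}}^T e^{-rt} S(t)\, dt}.$$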
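Denote
$$I(t^*_{m-1}, T) := \int_{t^*_{m-1}}^T e^{-rt} S(t)\, dt.$$
Then
$$I(t^*_{m-1}, T) = \int_{t^*_{m-1}}^T e^{-rt} S(t^*_{m-1})\, e^{-\lambda_m (t - t^*_{m-1})}\, dt = S(t^*_{m-1})\, e^{\lambda_m t^*_{m-1}} \int_{t^*_{m-1}}^T e^{-(\lambda_m + r)t}\, dt = S(t^*_{m-1}) \cdot \frac{e^{-r t^*_{m-1}} - e^{-(r + \lambda_m)T + \lambda_m t^*_{m-1}}}{r + \lambda_m}.$$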
For $r = 0$ this gives
$$I(t^*_{m-1}, T) = S(t^*_{m-1}) \cdot \frac{1 - e^{-\lambda_m (T - t^*_{m-1})}}{\lambda_m} \approx S(t^*_{m-1}) \cdot (T - t^*_{m-1}) \approx T - t^*_{m-1},$$
since for small default intensities $1 - e^{-x} \approx x$ and the survival function can be approximated by 1.
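The closed form for $I(t^*_{m-1}, T)$ and its $r = 0$ limit can be cross-checked numerically. The sketch below uses hypothetical helper names and illustrative parameter values; it compares the closed form with Simpson quadrature on a single piece where the hazard rate is constant:

```python
import math


def I_closed(S_a, a, T, lam, r):
    """I(a, T) = int_a^T e^{-rt} S(t) dt on a piece with S(t) = S(a) * exp(-lam * (t - a))."""
    k = r + lam
    if k == 0.0:
        return S_a * (T - a)  # no discounting and no default on this piece
    return S_a * (math.exp(-r * a) - math.exp(-k * T + lam * a)) / k


def I_numeric(S_a, a, T, lam, r, n=2000):
    """Same integral by composite Simpson (n even), used only as a cross-check."""
    h = (T - a) / n
    g = lambda t: math.exp(-r * t) * S_a * math.exp(-lam * (t - a))
    total = g(a) + g(T) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return total * h / 3


# Illustrative piece [1, 2] with S(1) = 0.98 and a 3% hazard rate
exact = I_closed(0.98, 1.0, 2.0, 0.03, 0.02)
check = I_numeric(0.98, 1.0, 2.0, 0.03, 0.02)
# With r = 0 the closed form is S(a)(1 - e^{-lam (T - a)}) / lam, close to T - a = 1
zero_rate = I_closed(0.98, 1.0, 2.0, 0.03, 0.0)
```

On this piece the $r = 0$ value is about $0.965$, so the approximation $I \approx T - t^*_{m-1} = 1$ is off by roughly 3.5%, which is the price paid for a linear bootstrap.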


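Since $f = \lambda_k S$ on each piece, $\int_{t_{k-1}}^{t_k} e^{-rt} f(t)\, dt = \lambda_k I(t_{k-1}, t_k)$. Hence:
$$\begin{aligned}
s(t_1) &= (1 - R)\lambda_1,\\
s(t_2) &= (1 - R) \cdot \frac{\lambda_1 I(0, t_1) + \lambda_2 I(t_1, t_2)}{I(0, t_1) + I(t_1, t_2)},\\
s(t_3) &= (1 - R) \cdot \frac{\lambda_1 I(0, t_1) + \lambda_2 I(t_1, t_2) + \lambda_3 I(t_2, t_3)}{I(0, t_1) + I(t_1, t_2) + I(t_2, t_3)},\\
&\;\;\vdots\\
s(t_n) &= (1 - R) \cdot \frac{\lambda_1 I(0, t_1) + \lambda_2 I(t_1, t_2) + \lambda_3 I(t_2, t_3) + \dots + \lambda_n I(t_{n-1}, t_n)}{I(0, t_1) + I(t_1, t_2) + \dots + I(t_{n-1}, t_n)}.
\end{aligned}$$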
etc.so we obtain the following system of equations (S)
s(t1 )

 1−R
= λ1
s(t2 ) λ1 I(0,t1 )+λ2 I(t1 ,t2 )

 1−R = I(0,t1 )+I(t1 ,t2 )



s(t3 ) λ1 I(0,t1 )+λ2 I(t1 ,t2 )+λ3 I(t2 ,t3 )
=
 1−R I(0,t1 )+I(t1 ,t2 )+I(t2 ,t3 )
...




 s(tn ) λ1 I(0,t1 )+λ2 I(t1 ,t2 )+λ3 I(t2 ,t3 )+...+λn I(tn−1 ,tn )

1−R
= I(0,t1 )+I(t1 ,t2 )+I(t2 ,t3 )+...+I(tn−1 ,tn )
Denoting $\hat{s}(T) = \frac{s(T)}{1 - R}$ and using the approximation $I(t_{j-1}, t_j) \approx t_j - t_{j-1}$ (so that $I(0, t_k) = I(0, t_1) + \dots + I(t_{k-1}, t_k) \approx t_k$), the coefficients of (S) no longer depend on the unknown $\lambda$'s and (S) becomes the lower-triangular linear system
$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
\frac{I(0, t_1)}{I(0, t_2)} & \frac{I(t_1, t_2)}{I(0, t_2)} & 0 & \cdots & 0\\
\frac{I(0, t_1)}{I(0, t_3)} & \frac{I(t_1, t_2)}{I(0, t_3)} & \frac{I(t_2, t_3)}{I(0, t_3)} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\frac{I(0, t_1)}{I(0, t_n)} & \frac{I(t_1, t_2)}{I(0, t_n)} & \frac{I(t_2, t_3)}{I(0, t_n)} & \cdots & \frac{I(t_{n-1}, t_n)}{I(0, t_n)}
\end{pmatrix}
\cdot
\begin{pmatrix} \lambda_1\\ \lambda_2\\ \lambda_3\\ \vdots\\ \lambda_n \end{pmatrix}
=
\begin{pmatrix} \hat{s}(t_1)\\ \hat{s}(t_2)\\ \hat{s}(t_3)\\ \vdots\\ \hat{s}(t_n) \end{pmatrix}.$$
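Thus calibrating a piecewise exponential model with the knots equal to the bond expiries (except the longest one) reduces to solving a lower-triangular system by forward substitution, one hazard rate per maturity (bootstrapping).
Because it has one free parameter per maturity, this model fits the data much better than the single-parameter exponential model, especially for highly variable credit curves.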
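Under the approximation $I(t_{j-1}, t_j) \approx t_j - t_{j-1}$, row $k$ of the matrix has entries $(t_j - t_{j-1})/t_k$, and forward substitution yields the hazard rates one maturity at a time. A minimal sketch, with a hypothetical function name and illustrative spreads (not the section 2.5 data):

```python
def bootstrap_piecewise_exponential(maturities, spreads, R):
    """Forward substitution on the lower-triangular system A @ lam = s_hat.

    Assumes I(t_{j-1}, t_j) ~ t_j - t_{j-1} (small intensities, r ~ 0), so row k of A
    is (t_j - t_{j-1}) / t_k for j <= k, with t_0 = 0.
    """
    s_hat = [s / (1.0 - R) for s in spreads]
    knots = [0.0] + list(maturities)
    lams = []
    for k in range(1, len(knots)):
        t_k = knots[k]
        # Contribution of the hazard rates already bootstrapped (columns j < k)
        known = sum(lams[j - 1] * (knots[j] - knots[j - 1]) for j in range(1, k)) / t_k
        diag = (knots[k] - knots[k - 1]) / t_k  # diagonal entry of row k
        lams.append((s_hat[k - 1] - known) / diag)
    return lams


# Illustrative spreads for expiries 0.5, 1, 2 and 3 years with R = 40%
lams = bootstrap_piecewise_exponential([0.5, 1.0, 2.0, 3.0], [0.006, 0.009, 0.012, 0.012], R=0.4)
```

Multiplying row $k$ by $t_k$ shows that the same recursion reads $\lambda_k = \frac{\hat{s}(t_k)\, t_k - \hat{s}(t_{k-1})\, t_{k-1}}{t_k - t_{k-1}}$, so under this approximation a hazard rate turns negative whenever $\hat{s}(t)\, t$ decreases between two consecutive maturities.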
2.3 Weibull and piecewise Weibull

Weibull model
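Because $S(t) = e^{-\lambda t^\gamma}$ and $f(t) = e^{-\lambda t^\gamma} \lambda \gamma t^{\gamma - 1}$, the spread function is
$$s(T) = (1 - R)\, \frac{\int_0^T e^{-ru - \lambda u^\gamma} \lambda \gamma u^{\gamma - 1}\, du}{\int_0^T e^{-ru - \lambda u^\gamma}\, du} = (1 - R)\, \frac{I_1(T)}{I_2(T)}.$$
Using a second-order Taylor expansion of $e^{-ru - \lambda u^\gamma}$ and integrating term by term, I obtain
$$I_1(T) \approx \lambda T^\gamma - \frac{r \gamma \lambda}{\gamma + 1} T^{\gamma + 1} - \frac{\lambda^2}{2} T^{2\gamma} + \frac{r^2 \gamma \lambda}{2(\gamma + 2)} T^{\gamma + 2} + \frac{r \gamma \lambda^2}{2\gamma + 1} T^{2\gamma + 1} + \frac{\lambda^3}{6} T^{3\gamma} \tag{1}$$
Similarly,
$$I_2(T) = \int_0^T e^{-ru - \lambda u^\gamma}\, du \approx T - \frac{r T^2}{2} - \frac{\lambda T^{\gamma + 1}}{\gamma + 1} + \frac{r^2 T^3}{6} + \frac{\lambda r}{\gamma + 2} T^{\gamma + 2} + \frac{\lambda^2}{2(2\gamma + 1)} T^{2\gamma + 1} \tag{2}$$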
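Approximations (1) and (2) can be checked against quadrature. A minimal sketch, assuming the Weibull parameters from the results in section 2.5 and an illustrative flat rate $r = 2\%$; $I_1$ is computed through the identity $I_1(T) = 1 - e^{-rT} S(T) - r I_2(T)$ (integration by parts), which avoids the singular density at $u = 0$ when $\gamma < 1$. Function names are hypothetical.

```python
import math


def weibull_I_taylor(T, lam, gamma, r):
    """Second-order Taylor approximations (1) and (2) of I1(T) and I2(T)."""
    g = gamma
    I1 = (lam * T ** g
          - r * g * lam / (g + 1) * T ** (g + 1)
          - lam ** 2 / 2 * T ** (2 * g)
          + r ** 2 * g * lam / (2 * (g + 2)) * T ** (g + 2)
          + r * g * lam ** 2 / (2 * g + 1) * T ** (2 * g + 1)
          + lam ** 3 / 6 * T ** (3 * g))
    I2 = (T - r * T ** 2 / 2 - lam * T ** (g + 1) / (g + 1)
          + r ** 2 * T ** 3 / 6 + lam * r / (g + 2) * T ** (g + 2)
          + lam ** 2 / (2 * (2 * g + 1)) * T ** (2 * g + 1))
    return I1, I2


def weibull_I_numeric(T, lam, gamma, r, n=4000):
    """Reference values: I2 by composite Simpson, I1 via I1 = 1 - e^{-rT} S(T) - r * I2."""
    h = T / n
    g = lambda u: math.exp(-r * u - lam * u ** gamma)  # e^{-ru} S(u)
    I2 = (g(0.0) + g(T) + sum((4 if k % 2 else 2) * g(k * h) for k in range(1, n))) * h / 3
    I1 = 1.0 - g(T) - r * I2
    return I1, I2


# Weibull parameters from the results table; the flat rate r = 2% is an assumption
T, lam, gamma, r = 3.0, 0.0211, 0.8051, 0.02
I1_t, I2_t = weibull_I_taylor(T, lam, gamma, r)
I1_n, I2_n = weibull_I_numeric(T, lam, gamma, r)
rel_err = abs(I1_t / I2_t - I1_n / I2_n) / (I1_n / I2_n)
```

The truncation error grows with $rT + \lambda T^\gamma$, so the Taylor form is best suited to short maturities and low intensities; for long maturities the quadrature route is safer.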
Piecewise Weibull model
Similarly to the piecewise exponential model, the survival function in the piecewise Weibull model is $S(t) = S(t_i)\, e^{-\lambda_{i+1} (t - t_i)^\gamma}$ for $t \in [t_i, t_{i+1}]$.
In this case there is no closed-form formula: to approximate the $\lambda$ parameters we solve a non-linear system of equations with an increasing number of parameters, one new unknown per maturity. The best approach is a modified Newton-Kantorovich method for finding the roots.
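A minimal sketch of the sequential (bootstrapped) calibration for the piecewise Weibull model, under stated assumptions: the survival function takes the form $S(t) = S(t_i)\, e^{-\lambda_{i+1} (t - t_i)^\gamma}$ on each piece, $\gamma$ is fixed and shared by all pieces, the knots equal the maturities, and plain bisection stands in for the modified Newton-Kantorovich method. Bisection is a deliberate swap: the spread at $T_k$ is increasing in the newest $\lambda$, which makes bracketing safe. All names and parameter values are hypothetical.

```python
import math


def pw_weibull_survival(t, knots, lams, gamma):
    """Assumed piecewise Weibull survival: on [t_i, t_{i+1}],
    S(t) = S(t_i) * exp(-lam_{i+1} * (t - t_i) ** gamma); the last piece extends to +inf."""
    log_s = 0.0
    for i, lam in enumerate(lams):
        left, right = knots[i], knots[i + 1]
        if t <= right or i == len(lams) - 1:
            return math.exp(log_s - lam * max(t - left, 0.0) ** gamma)
        log_s -= lam * (right - left) ** gamma
    return math.exp(log_s)


def simpson(g, a, b, n=400):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return total * h / 3


def model_spread(T, knots, lams, gamma, r, R):
    """Spread (1 - R) * protection / annuity, integrating piece by piece up to T."""
    S = lambda u: pw_weibull_survival(u, knots, lams, gamma)
    annuity = 0.0
    for i in range(len(knots) - 1):
        a, b = knots[i], min(knots[i + 1], T)
        if b <= a:
            break
        annuity += simpson(lambda u: math.exp(-r * u) * S(u), a, b)
    # Integration by parts: int_0^T e^{-ru} f(u) du = 1 - e^{-rT} S(T) - r * annuity
    protection = 1.0 - math.exp(-r * T) * S(T) - r * annuity
    return (1.0 - R) * protection / annuity


def bootstrap_pw_weibull(maturities, spreads, gamma, r, R, lam_max=1.0, tol=1e-12):
    """Solve for one lambda per maturity, shortest first, holding earlier lambdas fixed.

    The spread at T_k increases with the last lambda, so [0, lam_max] brackets the root
    whenever the market spread is attainable.
    """
    knots = [0.0] + list(maturities)
    lams = []
    for k, (T, s_mkt) in enumerate(zip(maturities, spreads), start=1):
        lo, hi = 0.0, lam_max
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if model_spread(T, knots[:k + 1], lams + [mid], gamma, r, R) < s_mkt:
                lo = mid
            else:
                hi = mid
        lams.append(0.5 * (lo + hi))
    return lams


# Round trip with hypothetical parameters: price with known lambdas, then recover them
true_lams = [0.01, 0.02, 0.015]
mats, gamma, r, R = [1.0, 2.0, 3.0], 1.2, 0.02, 0.4
knots = [0.0] + mats
mkt = [model_spread(T, knots[:k + 1], true_lams[:k], gamma, r, R)
       for k, T in enumerate(mats, start=1)]
est = bootstrap_pw_weibull(mats, mkt, gamma, r, R)
```

A Newton-type step would need fewer iterations per maturity; bisection trades that speed for a guaranteed bracket and needs no derivative of the spread.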
2.4 Other distributions explained

Gompertz model
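The survival function is $S(t) = e^{\lambda (1 - e^{\gamma t})}$. Note that the first-order Taylor approximation $e^{\gamma t} \approx 1 + \gamma t$ in the survival function gives $S(t) \approx e^{-\gamma \lambda t}$, i.e. the $\mathcal{E}(\gamma \lambda)$ distribution.
Making the substitution $e^{\gamma t} = u$ in both the numerator and the denominator, after some computations we obtain the spread
$$s(T) = (1 - R) \cdot \gamma \lambda\, \frac{\int_1^{e^{\gamma T}} u^{-r/\gamma} e^{-\lambda u}\, du}{\int_1^{e^{\gamma T}} u^{-r/\gamma - 1} e^{-\lambda u}\, du}.$$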
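The substitution can be verified numerically by computing the Gompertz spread both in calendar time and in the variable $u = e^{\gamma t}$. A minimal sketch, assuming the Gompertz parameters from the results in section 2.5 and illustrative values $r = 2\%$, $R = 40\%$; the function names are hypothetical.

```python
import math


def simpson(g, a, b, n=4000):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return total * h / 3


def gompertz_spread_direct(T, lam, gamma, r, R):
    """Calendar-time spread for S(t) = exp(lam * (1 - e^{gamma t})), f(t) = lam * gamma * e^{gamma t} * S(t)."""
    S = lambda t: math.exp(lam * (1.0 - math.exp(gamma * t)))
    f = lambda t: lam * gamma * math.exp(gamma * t) * S(t)
    num = simpson(lambda t: math.exp(-r * t) * f(t), 0.0, T)
    den = simpson(lambda t: math.exp(-r * t) * S(t), 0.0, T)
    return (1.0 - R) * num / den


def gompertz_spread_substituted(T, lam, gamma, r, R):
    """Same spread after u = e^{gamma t}, integrated over u in [1, e^{gamma T}]."""
    upper = math.exp(gamma * T)
    num = simpson(lambda u: u ** (-r / gamma) * math.exp(-lam * u), 1.0, upper)
    den = simpson(lambda u: u ** (-r / gamma - 1.0) * math.exp(-lam * u), 1.0, upper)
    return (1.0 - R) * gamma * lam * num / den


# Gompertz parameters from the results table; r and R are illustrative assumptions
args = dict(T=3.0, lam=0.001, gamma=1.0135, r=0.02, R=0.4)
s_direct = gompertz_spread_direct(**args)
s_subst = gompertz_spread_substituted(**args)
```

Both routes integrate smooth functions here, so agreement to many digits is a reasonable check that no factor was lost in the substitution.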
Log-logistic model
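The survival function is $S(t) = \frac{1}{1 + \lambda t^{1/\gamma}}$.
Using the substitution $t^{1/\gamma} = u$, I obtain the credit spread in this case:
$$s(T) = \frac{(1 - R)\lambda \int_0^{T^{1/\gamma}} e^{-r u^\gamma} (1 + \lambda u)^{-2}\, du}{\gamma \int_0^{T^{1/\gamma}} e^{-r u^\gamma} \frac{u^{\gamma - 1}}{1 + \lambda u}\, du}.$$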
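The log-logistic formula can be checked the same way, by comparing the substituted form with a calendar-time computation. A minimal sketch, assuming the Log-Log parameters from the results in section 2.5, illustrative values $r = 2\%$, $R = 40\%$, and $\gamma \ge 1$ so that $u^{\gamma - 1}$ is finite at $u = 0$; the function names are hypothetical.

```python
import math


def simpson(g, a, b, n=4000):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return total * h / 3


def loglogistic_spread_direct(T, lam, gamma, r, R):
    """Calendar-time spread for S(t) = 1 / (1 + lam * t^(1/gamma)).

    The protection leg uses int_0^T e^{-rt} f(t) dt = 1 - e^{-rT} S(T) - r * int_0^T e^{-rt} S(t) dt,
    which avoids integrating the density f, singular at t = 0 when gamma > 1.
    """
    S = lambda t: 1.0 / (1.0 + lam * t ** (1.0 / gamma))
    annuity = simpson(lambda t: math.exp(-r * t) * S(t), 0.0, T)
    protection = 1.0 - math.exp(-r * T) * S(T) - r * annuity
    return (1.0 - R) * protection / annuity


def loglogistic_spread_substituted(T, lam, gamma, r, R):
    """Same spread after u = t^(1/gamma); assumes gamma >= 1 so u^(gamma - 1) is finite at u = 0."""
    upper = T ** (1.0 / gamma)
    num = simpson(lambda u: math.exp(-r * u ** gamma) / (1.0 + lam * u) ** 2, 0.0, upper)
    den = simpson(lambda u: math.exp(-r * u ** gamma) * u ** (gamma - 1.0) / (1.0 + lam * u),
                  0.0, upper)
    return (1.0 - R) * lam * num / (gamma * den)


# Log-Log parameters from the results table; r and R are illustrative assumptions
args = dict(T=3.0, lam=0.0117, gamma=1.1924, r=0.02, R=0.4)
s_direct = loglogistic_spread_direct(**args)
s_subst = loglogistic_spread_substituted(**args)
```

The factor $u^{\gamma - 1}$ has an unbounded derivative at $0$, which slows Simpson's convergence slightly, so the two values agree to a few significant digits rather than to machine precision.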
2.5 Numerical example

Suppose 6 bonds issued by a company in the agricultural sector are traded on the secondary market. These bonds have the following features:

Bond   Coupon rate   Expiry (years)   Coupon frequency (per year)   Market price
1      2%            0.5              4                             100.1
2      3%            1                2                             99.7
3      4%            1                1                             100.3
4      3%            2                1                             102.7
5      4%            2                1                             103.1
6      3.5%          3                1                             105.1

The calibrated parameters and the corresponding residuals are:

Model                          Params                            R²
E(λ)                           0.02065                           0.57
Weibull(λ, γ)                  (0.0211, 0.8051)                  0.00083
PE(λ1, ..., λ4 | 0.5, 1, 2)    [0.0132, 0.0687, -0.01, 0.0034]   2.24 · 10^-28
Gompertz(λ, γ)                 (0.001, 1.0135)                   0.00065
Log-Log(λ, γ)                  (0.0117, 1.1924)                  0.00023
2.6 Conclusion
Obviously, one can devise as many distributions as one can imagine, for instance through conditioning or by taking convex combinations (mixtures) of existing distributions. These techniques are, for the moment, beyond the scope of this mini-article, whose aim is to showcase a machine learning problem that does not involve regression techniques.
3 References
For more detailed numerical examples, as well as the Markov chain probability transition matrix approach, see also:
1. https://www.scribd.com/document/479327889/PD-rating-estimation-for-credit-related-income-instruments-valuation
2. Thierry Roncalli, Handbook of Financial Risk Management (Chapman and Hall)
