Extreme Value Theory, Financial Regulation and Financial Stability

Chen Zhou
De Nederlandsche Bank
March 17-18, 2009, Norges Bank

Outline
Day 1: One-dimensional EVT and VaR calculation
A simple VaR exercise under Basel II revision
Tail inference and EVT
History of EVT: law of sample maxima
Further discussion on VaR and financial regulation
Day 2: Multi-dimensional EVT and financial stability
Review of one-dimensional EVT
The analogue in the two-dimensional case: VaR approach
Tail dependence
Financial stability analysis and modeling
A simple VaR exercise
Setup of the exercise:
Only an index fund (S&P 500) is on the trading book.
Total investment is normalized to 1.
The goal is to calculate the 10-day 99% VaR.
Assume daily returns are i.i.d.
Data: Jan 2008 to March 11, 2009 (312 observations).
We start with the 1-day 99% VaR.

Model
Denote the return of the S&P 500 as $R$. We consider the loss $X = -R$. The VaR at the 99% level is defined by $P(X > \mathrm{VaR}) = 0.01$.
We consider:
Non-parametric (NP) approach: rank the 312 observations of $X$, then take the $[312 \times 0.01] = 3$rd order statistic from the top.
Normal approach: model $X$ as $N(\mu, \sigma^2)$.
Student-t approach: model $X$ as $a + bT$, where $T$ follows a Student-t distribution with $\nu$ degrees of freedom. We choose $\nu = 3$ and $\nu = 4$.
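The NP and normal approaches can be sketched in a few lines. This is a minimal illustration using only the Python standard library and synthetic losses, not the S&P 500 sample; the function names `np_var` and `normal_var` are made up for this sketch, and the Student-t fit is left out because it needs a quantile function that the standard library does not provide.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def np_var(losses, level=0.99):
    """Non-parametric VaR: the [n * (1 - level)]-th largest loss."""
    n = len(losses)
    # Small epsilon guards against floating-point error in n * (1 - level).
    rank = math.floor(n * (1 - level) + 1e-9)   # e.g. [312 * 0.01] = 3
    if rank < 1:
        raise ValueError("too few observations for this level")
    return sorted(losses, reverse=True)[rank - 1]

def normal_var(losses, level=0.99):
    """Normal VaR: the `level` quantile of a fitted N(mu, sigma^2)."""
    return NormalDist(fmean(losses), stdev(losses)).inv_cdf(level)

# Synthetic stand-in for 312 daily losses (in percent); NOT the slide data.
rng = random.Random(0)
losses = [rng.gauss(0.0, 1.2) for _ in range(312)]
print(np_var(losses), normal_var(losses))
```

With 312 observations, `np_var(losses, 0.999)` raises an error, since $[312 \times 0.001] = 0$ leaves no order statistic to pick.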
Data overview
Results
Model   NP     Normal   St. (df=3)   St. (df=4)
VaR     4      2.65     2.98         3.01
The VaR values are in percent.

Back testing
Backtesting in Basel II:
In a sample of 250 days, count the number of exceptions (losses exceeding the estimated VaR).
Green zone: 0-4.
Yellow zone: 5-9.
Red zone: 10 and above.
We use the first (last) 250 days in our sample to test.
Model             NP     Normal   St. (df=3)   St. (df=4)
VaR               4      2.65     2.98         3.01
BT (exceptions)   2      7        5            5
Normal is a failure!
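The counting rule is simple enough to write down directly. The sketch below is illustrative only: the function names are invented here, and the zone boundaries are the ones listed on this slide.

```python
def count_exceptions(losses, var):
    """Number of days on which the loss exceeds the VaR estimate."""
    return sum(1 for x in losses if x > var)

def basel_zone(exceptions):
    """Traffic-light zone for a 250-day backtest (0-4, 5-9, 10 and above)."""
    if exceptions <= 4:
        return "green"
    if exceptions <= 9:
        return "yellow"
    return "red"

# Exception counts as reported in the table on this slide.
reported = {"NP": 2, "Normal": 7, "St. (df=3)": 5, "St. (df=4)": 5}
for model, n_exc in reported.items():
    print(model, n_exc, basel_zone(n_exc))
```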
What's wrong with the normal distribution?
[Figure: histogram of X. Horizontal axis: X, from -4 to 4; vertical axis: density.]
Some investigation
[Figure, two panels. Left: empirical distribution, p against the ordered data. Right: P-P plot (normal), p against the transformed data.]

Zooming in on the P-P plot
[Figure: P-P plot (zoom in), p[290:312] against the transformed data, both axes from 0.92 to 1.00.]
Alternative?
We take the log transform of the observations for the x-axis, and $\log(1 - p)$ for the y-axis.
[Figure: alternative transform, $\log(1 - p)$ against the log-transformed data.]
Linear fit: $\log(1 - P(X \le x)) = -1.08 - 2.81 \log x$.
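A line of this kind can be obtained by ordinary least squares on the log-log points of the largest observations. The sketch below is one plausible way to do it and is not necessarily how the slide's numbers were produced: the function name, the plotting position $p_j = j/(n+1)$, and the number of tail points are all assumptions, and the data are synthetic.

```python
import math
import random

def fit_log_tail(losses, n_tail):
    """Least-squares line of log(1 - p) on log(x) over the n_tail largest
    (positive) losses; returns (intercept, slope)."""
    xs = sorted(losses)
    n = len(xs)
    pts = []
    for j in range(n - n_tail, n):        # 0-based positions in the sorted sample
        p = (j + 1) / (n + 1)             # plotting position, keeps 1 - p > 0
        pts.append((math.log(xs[j]), math.log(1.0 - p)))
    mx = sum(u for u, _ in pts) / n_tail
    my = sum(v for _, v in pts) / n_tail
    slope = (sum((u - mx) * (v - my) for u, v in pts)
             / sum((u - mx) ** 2 for u, _ in pts))
    return my - slope * mx, slope

# Synthetic Pareto losses with tail index 3 (NOT the slide data).
rng = random.Random(1)
sample = [rng.paretovariate(3.0) for _ in range(5000)]
intercept, slope = fit_log_tail(sample, 500)
print(intercept, slope)   # the slope should come out roughly around -3
```

The negative of the slope plays the role of the exponent in the power-law tail, and the exponential of the intercept gives the scale constant.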

VaR recalculation
$\log(1 - P(X \le x)) = -1.08 - 2.81 \log x$.
Equivalently, $P(X \le x) = 1 - A x^{-\alpha}$, with $A = e^{-1.08}$ and $\alpha = 2.81$.
Taking the left-side probability as 99%, we get a new VaR:
$$\mathrm{VaR} = \left(\frac{0.01}{A}\right)^{-1/\alpha} = 3.51.$$
Model             NP     Normal   St. (df=3)   St. (df=4)   New method
VaR               4      2.65     2.98         3.01         3.51
BT (exceptions)   2      7        5            5            3
The new method only considers the tail region!
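The arithmetic behind the new VaR is easy to check. The snippet below is a small sketch (the function name is made up here) that inverts the fitted power-law tail for a given confidence level, using the intercept and slope reported on this slide.

```python
import math

A = math.exp(-1.08)   # from the fitted intercept
ALPHA = 2.81          # from the fitted slope

def power_law_var(level, a=A, alpha=ALPHA):
    """Solve a * x**(-alpha) = 1 - level for x."""
    return ((1.0 - level) / a) ** (-1.0 / alpha)

print(round(power_law_var(0.99), 2))    # 3.51
print(round(power_law_var(0.999), 2))   # 7.96
```

The same fitted tail evaluated at the 99.9% level gives the "New method" figure that appears in the Basel revision comparison.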

Basel revision
We now need to calculate the 1-day 99.9% VaR.
The NP approach is not feasible, as we have only 312 observations. We consider all the other methods.
Model   Normal   St. (df=3)   St. (df=4)   New method
VaR     3.49     6.57         5.66         7.96
Now the differences are very significant. Which result are you willing to buy?
Summary of our journey
What we learnt:
VaR calculation is about tail modeling.
The normal distribution does not work well for our exercise.
NP is not always feasible.
Why does the normal fail?
It is not the normal distribution's fault.
The mistake is to make tail inference based on information from the moderate region.
What we need:
A method that makes tail inference possible.
It should be able to calculate VaR at a high probability level.
It should work when observations are scarce.
It should only use tail information.
Let the tail speak for itself!

Extreme Value Theory: tail inference assumption
To make tail inference, it is necessary to assume some properties of the tail of the distribution function. The assumption should concern only the tail, and should be general enough.
We assume:
It is possible to make inference on the far tail by looking at an intermediate level.
Mathematically:
$$\lim_{t \to x^*} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = g(x),$$
for some positive functions $a$ and $g$. Here $x^*$ is the right endpoint of the original distribution function (it can be infinity).
Remark: the tail exhibits a self-similar property.

A special case
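Special case: $x^* = +\infty$ and $a(t) \sim ct$ as $t \to \infty$.
From
$$\lim_{t \to \infty} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = g(x),$$
we get
$$\lim_{t \to \infty} P\left(\frac{X}{t} > x \,\middle|\, X > t\right) = g_1(x).$$
Denote $F(x) = P(X \le x)$ and $\bar{F}(x) = 1 - F(x)$. Then
$$\frac{\bar{F}(tx)}{\bar{F}(t)} \to g_1(x) \quad \text{as } t \to \infty.$$
The only possible $g_1$ is $g_1(x) = x^{-\alpha}$ for some $\alpha > 0$!
This is a derivation, not an assumption!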
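The step from the limit relation to the power function is not spelled out on the slide. A standard argument, sketched here under the assumption that the limit $g_1$ exists for every $x \ge 1$ and is not degenerate, goes as follows for $x, y \ge 1$:

```latex
\frac{\bar F(txy)}{\bar F(t)}
  = \frac{\bar F(txy)}{\bar F(tx)} \cdot \frac{\bar F(tx)}{\bar F(t)}
  \;\longrightarrow\; g_1(y)\, g_1(x)
  \quad (t \to \infty),
\qquad\text{so}\qquad
g_1(xy) = g_1(x)\, g_1(y).
```

Since $g_1$ is non-increasing with $g_1(1) = 1$, the only solutions of this multiplicative functional equation are power functions with a non-positive exponent, which is the Pareto form stated above.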
Summary of the special case

$$\lim_{t \to \infty} P\left(\frac{X}{t} > x \,\middle|\, X > t\right) = x^{-\alpha}.$$
In words, for sufficiently large $t$:
Given $X > t$, the excess ratio $X/t$ approximately follows a Pareto distribution.
For sufficiently large $t$ and all $x > t$,
$$P(X > x) \approx P(X > t) \left(\frac{x}{t}\right)^{-\alpha} =: A x^{-\alpha}.$$
Tail observations follow a scaled Pareto distribution.

General case: Extreme Value Theory
$$\lim_{t \to x^*} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = g(x).$$
A similar derivation shows that the only possible functions $g$ (up to a shift and scale transform) are
$$g(x) = (1 + \gamma x)^{-1/\gamma}, \quad \text{for } 1 + \gamma x > 0,$$
where $\gamma \in \mathbb{R}$ is called the extreme value index (for $\gamma = 0$ the expression is read as $e^{-x}$).
When $\gamma > 0$, $g(x)$ is essentially the same as $x^{-\alpha}$, where $\alpha = 1/\gamma$ is called the tail index. This is the special case we studied!
$1 - g(x)$ is the generalized Pareto distribution.
Given $X > t$, the excess $X - t$ approximately follows a scaled generalized Pareto distribution.
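To see how the extreme value index shapes the tail, the survival function $g$ can be evaluated for a positive, zero, and negative index. The helper below is an illustrative sketch; its name and its handling of points outside the support are choices made here, not part of the slides.

```python
import math

def gpd_survival(x, gamma):
    """g(x) = (1 + gamma*x)**(-1/gamma); exp(-x) in the limit gamma -> 0."""
    if gamma == 0.0:
        return math.exp(-x)
    base = 1.0 + gamma * x
    if base <= 0.0:
        # Outside the support: below it if gamma > 0, beyond the finite
        # right endpoint -1/gamma if gamma < 0.
        return 1.0 if gamma > 0 else 0.0
    return base ** (-1.0 / gamma)

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, gpd_survival(x, 0.5), gpd_survival(x, 0.0), gpd_survival(x, -0.5))
```

For a positive index the survival probability decays like a power of $x$, for a zero index it decays exponentially, and for a negative index it hits zero at a finite endpoint.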
Domain of attraction
$$\lim_{t \to x^*} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = (1 + \gamma x)^{-1/\gamma}.$$
A distribution function $F(x) = P(X \le x)$ satisfying this condition is said to be in the domain of attraction. How large is this domain?
If $\gamma > 0$, $F$ must have no finite right endpoint.
If $\gamma < 0$, $F$ must have a finite right endpoint.
If $\gamma = 0$, both cases are possible.
In the domain of attraction with $\gamma > 0$:
Pareto, Student-t, stable (index below 2), log-gamma, Fréchet, etc.
In the domain of attraction with $\gamma = 0$:
exponential, normal, log-normal, etc.
In the domain of attraction with $\gamma < 0$:
uniform distribution, reversed Burr, etc.
Summary of EVT framework
What we assume:
The far tail can be captured from an intermediate level.
What we get:
Above a high threshold, the excess must follow a scaled generalized Pareto distribution.
This is a derivation, not the initial assumption!
How restrictive is our framework?
An assumption is necessary!
Our assumption is intuitive and simple.
The domain of attraction is broad.
When studying the extremal behavior of a variable $X$, assuming that its distribution function belongs to the domain of attraction is a very general assumption!
Statistics based on EVT
EVT is a semi-parametric model:
Tail region: parametric model, scaled generalized Pareto.
Moderate region: no assumption, i.e. non-parametric.
Parameters to estimate: the extreme value index (EVI) $\gamma$ and the scale function $a$.
From now on, we only consider $\gamma > 0$. This is the case for most financial returns. In this case, it is not necessary to consider the scale function:
$$\lim_{t \to \infty} \frac{\bar{F}(tx)}{\bar{F}(t)} = x^{-\alpha},$$
where $\alpha = 1/\gamma > 0$.
We first try to estimate $\alpha$ from data.

Estimation of tail index
$$\lim_{t \to \infty} \frac{\bar{F}(tx)}{\bar{F}(t)} = x^{-\alpha}.$$
Intuitively, for sufficiently large $t$, given $X > t$ the excess ratio $X/t$ approximately follows a Pareto distribution.
Q1: How do we choose a "sufficiently large" $t$?
Q2: How do we get observations on excess ratios?
Q3: How do we deal with "approximately"?
A1: We take a high order statistic $X_{n,n-k}$, where $X_{n,1} \le \dots \le X_{n,n}$ are the order statistics of the sample.
A2: We take $X_{n,n-i+1}/X_{n,n-k}$, $1 \le i \le k$.
A3: We treat the approximation as exact.
By fitting these $k$ ratios to a Pareto distribution by maximum likelihood, we can estimate $\alpha$.
Note: we have $n$ observations, but we use only the top $k + 1$.

More about index estimation
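$$\frac{1}{\hat{\alpha}} = \frac{1}{k} \sum_{i=1}^{k} \log\left(\frac{X_{n,n-i+1}}{X_{n,n-k}}\right).$$
This is called the Hill estimator.
Theoretical property (consistency): suppose $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$; then $\hat{\alpha} \to \alpha$ in probability.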
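The Hill estimator translates directly into code. The sketch below is a minimal version with an invented function name, tried on synthetic Pareto data rather than on the S&P 500 losses; it assumes the threshold is positive so that the logarithms are defined.

```python
import math
import random

def hill_alpha(losses, k):
    """Hill estimate of the tail index alpha from the top k + 1 losses."""
    xs = sorted(losses)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    threshold = xs[n - k - 1]                 # X_{n,n-k}
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Mean log excess ratio of the k largest observations over the threshold.
    inv_alpha = sum(math.log(x / threshold) for x in xs[n - k:]) / k
    return 1.0 / inv_alpha

# Synthetic Pareto sample with true tail index 3 (NOT the slide data).
rng = random.Random(42)
sample = [rng.paretovariate(3.0) for _ in range(10000)]
print(hill_alpha(sample, 500))   # should land near 3
```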
Question from practitioners: how to choose $k$?
Difficulty: a tradeoff between $k$ too small and $k$ too large.
$k$ too small: too few observations, hence a large variance.
$k$ too large: non-extreme observations are involved, which introduces bias.
Solution: make a Hill plot, i.e. calculate $\hat{\alpha}$ for a series of $k$.
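The values behind a Hill plot can be computed in one pass with a running sum of the largest log-losses. This is an illustrative sketch (the function name is made up here) that assumes positive losses without ties at the top of the sample; plotting the resulting pairs is left to whichever plotting tool is at hand.

```python
import math
import random

def hill_path(losses, k_max):
    """Hill estimates of alpha for k = 1, ..., k_max, as (k, alpha_hat) pairs."""
    top = sorted(losses, reverse=True)[:k_max + 1]   # top[0] is the maximum
    logs = [math.log(x) for x in top]
    path, running = [], 0.0
    for k in range(1, k_max + 1):
        running += logs[k - 1]                 # sum of the k largest log-losses
        inv_alpha = running / k - logs[k]      # mean log excess over X_{n,n-k}
        path.append((k, 1.0 / inv_alpha))
    return path

# Synthetic Pareto sample with true tail index 3 (NOT the slide data).
rng = random.Random(7)
sample = [rng.paretovariate(3.0) for _ in range(2000)]
for k, alpha_hat in hill_path(sample, 70)[9::10]:    # k = 10, 20, ..., 70
    print(k, round(alpha_hat, 2))
```

One then looks for a range of $k$ over which the estimates are reasonably stable: small $k$ gives erratic values, large $k$ drifts as non-tail observations enter.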

Hill plot for our case
[Figure: Hill plot, $\hat{\alpha}$ against $k$, for $k$ from 10 to 70.]
The optimal choice of $k$ is around 30.
Taking $k = 30$, we get $\hat{\alpha} = 2.26$.
VaR estimation
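$$\lim_{t \to \infty} \frac{\bar{F}(tx)}{\bar{F}(t)} = x^{-\alpha}$$
Our purpose is to estimate $\mathrm{VaR}(p)$ such that $\bar{F}(\mathrm{VaR}(p)) = 1 - p$, where $1 - p$ is very low (it may even be lower than $1/n$, in which case the non-parametric approach is not feasible).
We again take $X_{n,n-k}$ as the high threshold; $\bar{F}(X_{n,n-k})$ can be estimated as $k/n$.
$$\frac{1 - p}{k/n} \approx \frac{\bar{F}(\mathrm{VaR}(p))}{\bar{F}(X_{n,n-k})} \approx \left(\frac{\mathrm{VaR}(p)}{X_{n,n-k}}\right)^{-\alpha}.$$
We get the estimator
$$\widehat{\mathrm{VaR}}(p) = X_{n,n-k} \left(\frac{1 - p}{k/n}\right)^{-1/\hat{\alpha}}.$$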
Inverse question: tail probability
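Question: in our data, the maximum loss is 4.11% (Oct 15, 2008). Suppose one wants to know the probability of a loss of more than 5%. How do we calculate it?
Solution:
$$\lim_{t \to \infty} \frac{\bar{F}(tx)}{\bar{F}(t)} = x^{-\alpha}$$
We again take $X_{n,n-k}$ as the high threshold:
$$\frac{\bar{F}(5)}{\bar{F}(X_{n,n-k})} \approx \left(\frac{5}{X_{n,n-k}}\right)^{-\alpha}.$$
We get the estimator
$$\hat{P}(X > 5) = \hat{\bar{F}}(5) = \frac{k}{n} \left(\frac{5}{X_{n,n-k}}\right)^{-\hat{\alpha}}.$$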
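The inverse question uses the same ingredients. The following sketch (function name invented here, toy data) returns the estimated probability of exceeding a level at or above the threshold, given a tail index estimate.

```python
def evt_tail_prob(losses, k, x, alpha_hat):
    """(k / n) * (x / X_{n,n-k}) ** (-alpha_hat), for x at or above the threshold."""
    xs = sorted(losses)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    threshold = xs[n - k - 1]                 # X_{n,n-k}
    if x < threshold:
        raise ValueError("x should not lie below the threshold X_{n,n-k}")
    return (k / n) * (x / threshold) ** (-alpha_hat)

toy = [float(i) for i in range(1, 101)]       # toy data, NOT the slide sample
print(evt_tail_prob(toy, k=10, x=180.0, alpha_hat=2.0))
```

Here 180 lies above the largest toy observation, so the empirical exceedance frequency would be zero while the tail model still returns a positive probability.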
Summary of EVT Statistics
We start from:
Assuming the domain of attraction condition as in EVT.
What we can do:
Estimate VaR with a small tail probability.
Evaluate the probability of a rare event (possibly an event that has never happened).
How we achieve that:
Use only the top $k + 1$ observations.
Fit a Pareto distribution to estimate the tail index $\alpha$.
Use the domain of attraction condition for VaR.
Technical difficulty and solution:
The choice of $k$ is made with the Hill plot.
History of EVT
The origin of EVT is not about VaR, nor about statistics. It is purely mathematical. Mathematicians asked the following question, analogous to the Central Limit Theorem.
Suppose $X_1, X_2, \dots$ are i.i.d. with distribution function $F$. Denote $M_n = \max_{1 \le i \le n} X_i$ as the sample maximum. Suppose there exist $a_n > 0$ and $b_n$ such that, as $n \to \infty$,
$$\frac{M_n - b_n}{a_n} \xrightarrow{d} G.$$
Q1: What are the possible limit distributions $G$?
Q2: What is the necessary and sufficient condition on $F$ to have such a limit?
A1: $G(x) = \exp\left(-(1 + \gamma x)^{-1/\gamma}\right)$, for $1 + \gamma x > 0$. It is called the generalized extreme value distribution.
A2: $F$ is in the domain of attraction.
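For a concrete case the convergence can be checked without simulation. The sketch below takes a standard Pareto distribution, $F(x) = 1 - x^{-\alpha}$ on $[1, \infty)$, for which the distribution of the maximum is known in closed form. The normalizing constants $b_n = n^{1/\alpha}$ and $a_n = b_n/\alpha$ are one valid choice made for this illustration, and the function names are invented here.

```python
import math

def gev_cdf(x, gamma):
    """G(x) = exp(-(1 + gamma*x)**(-1/gamma)), for gamma > 0 and 1 + gamma*x > 0."""
    return math.exp(-((1.0 + gamma * x) ** (-1.0 / gamma)))

def pareto_max_cdf(x, n, alpha):
    """Exact P((M_n - b_n) / a_n <= x) for n i.i.d. standard Pareto(alpha)
    variables, with b_n = n**(1/alpha) and a_n = b_n / alpha."""
    gamma = 1.0 / alpha
    b_n = n ** gamma
    a_n = gamma * b_n
    level = a_n * x + b_n                 # un-normalized level for M_n
    if level <= 1.0:
        return 0.0
    return (1.0 - level ** (-alpha)) ** n  # F(level)**n

for n in (10, 100, 10000):
    print(n, pareto_max_cdf(1.0, n, 3.0), gev_cdf(1.0, 1.0 / 3.0))
```

As $n$ grows, the exact distribution of the normalized maximum approaches the limit $G$ with $\gamma = 1/\alpha$.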
What else can EVT do?
We have only discussed the $\gamma > 0$ case so far ($\alpha = 1/\gamma$).
If we do not know whether $\gamma > 0$, $\alpha$ is not defined. We should estimate $\gamma$ and the scale function $a$. There exist estimators for them.
Then we can still estimate VaR with a small tail probability, or evaluate the probability of a rare event.
When $\gamma < 0$, we can estimate the finite endpoint.
EVT is a general framework. We call the case $\gamma > 0$ the "heavy tail" case. But EVT can also deal with "thin tail", or even "no tail" cases.
Further discussion on VaR
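$$\lim_{t \to \infty} \frac{\bar{F}(tx)}{\bar{F}(t)} = x^{-\alpha}$$
Replace $t$ by $\mathrm{VaR}(1 - p)$ where $p \to 0$. Consider $\mathrm{VaR}(1 - px)$ where $x < 1$. Then
$$\frac{px}{p} = \frac{\bar{F}(\mathrm{VaR}(1 - px))}{\bar{F}(\mathrm{VaR}(1 - p))} \approx \left(\frac{\mathrm{VaR}(1 - px)}{\mathrm{VaR}(1 - p)}\right)^{-\alpha}.$$
Hence, the domain of attraction condition can be rewritten as a condition based on VaR:
$$\lim_{p \to 0} \frac{\mathrm{VaR}(1 - px)}{\mathrm{VaR}(1 - p)} = x^{-1/\alpha}.$$
One can make inference on a high-level VaR from the information in an intermediate-level VaR.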
Aggregation of VaR
Feller's theorem: suppose $X$ and $Y$ follow heavy-tailed distributions with the same tail index and are independent. Then $P(X + Y > t) \sim P(X > t) + P(Y > t)$ as $t \to \infty$.
Feller's theorem provides a way to calculate VaR for aggregated risk factors.
10-day VaR calculation:
Suppose $X_1, \dots, X_{10}$ are i.i.d. daily losses following a heavy-tailed distribution, and $X = X_1 + \dots + X_{10}$.
Then $P(X > t) \approx 10\, P(X_1 > t)$ for sufficiently large $t$.
$\mathrm{VaR}_X(99\%) \approx \mathrm{VaR}_{X_1}(99.9\%) \approx \mathrm{VaR}_{X_1}(99\%) \cdot 10^{1/\alpha}$.
Whether the $\sqrt{10}$ rule holds depends on $\alpha$!
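The dependence on the tail index is easy to tabulate. The sketch below (function name invented here) compares the heavy-tail scaling factor with the square-root-of-time factor; the values 2.26 and 2.81 are the two tail index estimates obtained earlier in the exercise, while 2.0 and 4.0 are included only for comparison.

```python
import math

def ten_day_factor(alpha):
    """Multiplier from 1-day to 10-day 99% VaR under the heavy-tail approximation."""
    return 10.0 ** (1.0 / alpha)

for alpha in (2.0, 2.26, 2.81, 4.0):
    print(alpha, round(ten_day_factor(alpha), 3), round(math.sqrt(10.0), 3))
```

The two factors coincide at a tail index of 2; for a tail index above 2 the heavy-tail factor is smaller than the square-root-of-time factor, and for a tail index below 2 it is larger.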
Summary of today
What we have done:
A VaR calculation exercise.
Built up EVT from the tail inference viewpoint.
Calculated VaR under the EVT framework.
Reviewed EVT history: the law of sample maxima.
VaR of aggregated risks.
What we learnt:
Modeling the tail should use only tail information.
EVT provides such a model.
The setup of EVT is general; the limit derived is specific.
The domain of attraction is still broad enough.
The model is estimated using only tail observations.
We can calculate VaR with a very low tail probability.
We can calculate the tail probability of an extremal event.
Let the tail speak for itself!
