
Extreme Value Theory, Financial Regulation and Financial Stability

Chen Zhou

De Nederlandsche Bank

March 17-18, 2009, Norges Bank


Outline
Day 1: One-dimensional EVT and VaR calculation
A simple VaR exercise under Basel II revision
Tail inference and EVT
History of EVT: law of sample maxima
Further discussion on VaR and financial regulation

Day 2: Multi-dimensional EVT and financial stability


Review of one-dimensional EVT
The analogue in the two-dimensional case: VaR approach
Tail dependence
Financial stability analysis and modeling
A simple VaR exercise
Setup of the exercise:

Only an index fund (S&P 500) on the trading book.


Total investment is normalized to 1.
We should calculate the 10-day 99% VaR.
Assume daily returns are i.i.d.
Data: Jan 2008 to March 11, 2009 (312 observations).

We start with 1-day 99% VaR.


Model
Denote the return of the S&P 500 as $R$. We consider the
loss $X = -R$. The VaR at the 99% level is defined by
$P(X > \mathrm{VaR}) = 0.01$.

We consider:
Non-parametric (NP) approach: rank the 312 observations
of $X$, then take the $[312 \times 0.01] = 3$rd order statistic from
the top.
Normal approach: model $X$ as $N(\mu, \sigma^2)$.
Student-t approach: model $X$ as $a + bT$, where $T$ follows
a Student-t distribution with $\nu$ degrees of freedom. We
choose $\nu = 3$ and $\nu = 4$.
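
The first two approaches can be sketched in a few lines of Python. This is only an illustration: the function names `np_var` and `normal_var` are hypothetical, the loss series is synthetic (it is not the S&P 500 sample), and the Student-t fit is left out because the slides do not say how $a$ and $b$ are chosen.

```python
import math
import random
import statistics


def np_var(losses, tail_prob):
    """Non-parametric VaR: the [n * tail_prob]-th largest loss."""
    rank = max(1, math.floor(len(losses) * tail_prob))
    return sorted(losses, reverse=True)[rank - 1]


def normal_var(losses, tail_prob):
    """Normal VaR: fit N(mu, sigma^2) and take its (1 - tail_prob) quantile."""
    mu = statistics.fmean(losses)
    sigma = statistics.stdev(losses)
    return statistics.NormalDist(mu, sigma).inv_cdf(1.0 - tail_prob)


# Synthetic stand-in for 312 daily losses in percent (an assumption of this sketch).
random.seed(0)
losses = [random.gauss(0.0, 1.2) for _ in range(312)]

print("NP VaR (99%):    ", round(np_var(losses, 0.01), 2))
print("Normal VaR (99%):", round(normal_var(losses, 0.01), 2))
```

With 312 observations and a 1% tail probability, `np_var` picks the 3rd largest loss, as in the setup above.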
Data overview
Results

Model      NP     Normal   St. (df=3)   St. (df=4)
VaR        4      2.65     2.98         3.01

Here the VaR values are in percent.


Back testing
Back testing in Basel II:
In a 250-day sample, count the number of exceptions (losses
exceeding the estimated VaR).
Green area: 0-4.
Yellow area: 5-9.
Red area: 10 and above.
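
The counting rule and the three zones can be written down directly. The sketch below uses hypothetical function names and a made-up 250-day loss series; only the zone thresholds come from the list above.

```python
def count_exceptions(losses, var):
    """Number of days on which the loss exceeded the estimated VaR."""
    return sum(1 for x in losses if x > var)


def basel_zone(exceptions):
    """Traffic-light zone for a 250-day backtest, using the thresholds quoted above."""
    if exceptions <= 4:
        return "green"
    if exceptions <= 9:
        return "yellow"
    return "red"


# Hypothetical 250-day loss series (in percent) with three large losses.
window = [0.5] * 247 + [3.0, 3.2, 4.1]
n_exc = count_exceptions(window, 2.65)
print(n_exc, basel_zone(n_exc))  # 3 green
```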

We use the first (last) 250 days in our sample to test.

Model      NP     Normal   St. (df=3)   St. (df=4)
VaR (%)    4      2.65     2.98         3.01
BT         2      7        5            5

Normal is a failure!
What's wrong with the normal distribution?
[Figure: Histogram of X; x-axis from -4 to 4, y-axis: density]
Some investigation
[Figure, two panels: "Empirical distribution" (p against ordered data) and "PP Plot (normal)" (p against transformed data)]


Zoom in the P-P plot
[Figure: "PP Plot (zoom in)"; p[290:312] against transformed data, both axes from 0.92 to 1.00]
Alternative?
We take the log transform of the observations, and plot $\log(1 - p)$
on the y-axis.

[Figure: "Alternative Transform"; log(1 - p) against log transformed data]

$$\log(1 - P(X \le x)) = -1.08 - 2.81 \log x.$$
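
A straight line of this kind can be obtained by ordinary least squares on the largest observations. The sketch below is one possible way to do it, not necessarily the procedure behind the slide: the function name `fit_tail_line`, the plotting positions $(i - 0.5)/n$ and the number of top observations `m` are all assumptions, and the data are synthetic.

```python
import math


def fit_tail_line(losses, m):
    """Least-squares fit of log(1 - p) = intercept + slope * log(x) on the top m losses.

    Plotting positions p_i = (i - 0.5) / n are an assumption of this sketch.
    The top m losses must be positive so that log(x) is defined.
    """
    n = len(losses)
    xs = sorted(losses)
    pts = []
    for i in range(n - m + 1, n + 1):  # ranks of the m largest losses
        p = (i - 0.5) / n
        pts.append((math.log(xs[i - 1]), math.log(1.0 - p)))
    mean_u = sum(u for u, _ in pts) / m
    mean_v = sum(v for _, v in pts) / m
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_u
    return intercept, slope


# Synthetic check: exact Pareto quantiles with tail index 3 (hypothetical data).
n = 1000
sample = [(1.0 - (i - 0.5) / n) ** (-1.0 / 3.0) for i in range(1, n + 1)]
intercept, slope = fit_tail_line(sample, 50)
print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
```

On this synthetic sample the points lie exactly on a line, so the fitted slope is the negative of the tail index.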


VaR recalculation

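$$\log(1 - P(X \le x)) = -1.08 - 2.81 \log x.$$

$$P(X \le x) = 1 - A x^{-\alpha}, \quad A = e^{-1.08}, \quad \alpha = 2.81.$$

Setting the left-hand side probability to 99%, we get a new VaR:

$$\mathrm{VaR} = \left(\frac{0.01}{A}\right)^{-1/\alpha} = 3.51.$$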
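
The arithmetic can be checked numerically. The short sketch below plugs the fitted constants into the formula; the function name `pareto_tail_var` is hypothetical.

```python
import math

# Fitted tail line from the text: log(1 - P(X <= x)) = -1.08 - 2.81 * log(x)
A = math.exp(-1.08)
ALPHA = 2.81


def pareto_tail_var(tail_prob, a=A, alpha=ALPHA):
    """Solve a * x ** (-alpha) = tail_prob for x."""
    return (tail_prob / a) ** (-1.0 / alpha)


print(round(pareto_tail_var(0.01), 2))  # 3.51
```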

Model      NP     Normal   St. (df=3)   St. (df=4)   New method
VaR (%)    4      2.65     2.98         3.01         3.51
BT         2      7        5            5            3

The new method only considers the tail region!


Basel revision
We now need to calculate 1-day 99.9% VaR.

The NP approach is not feasible, as we have only 312 observations. We consider all the other methods.

Model      Normal   St. (df=3)   St. (df=4)   New method
VaR (%)    3.49     6.57         5.66         7.96

Now the differences are very significant. Which result are you willing to buy?
Summary of our journey
What we learnt:
VaR calculation is about tail modeling.
Normal distribution does not work well for our exercise.
NP is not always feasible.

Why does the normal fail?

It is not the normal's fault.
The mistake is to make tail inference based on information
from the moderate part of the distribution.

What we need:
A method that makes tail inference possible.
It should be able to calculate VaR at a high probability level.
It should work when observations are scarce.
It should only use tail information.

Let the tail speak for itself!


Extreme Value Theory: tail inference assumption
To make tail inference, it is necessary to assume some
properties of the tail of the distribution function. The
assumption should concern only the tail, and should be general
enough.

We assume:
It is possible to make inference on the far tail by looking
at an intermediate level.

Mathematically:
$$\lim_{t \to x^*} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = g(x),$$
for some positive functions $a$ and $g$. Here $x^*$ is the right endpoint
of the original distribution function (it can be infinity).

Remark: The tail exhibits a self-similar property.


A special case
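Special case: $x^* = +\infty$ and $a(t) \sim ct$ as $t \to \infty$.

From
$$\lim_{t \to \infty} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = g(x),$$
we get
$$\lim_{t \to \infty} P\left(\frac{X}{t} > x \,\middle|\, X > t\right) = g_1(x).$$

Denote $F(x) = P(X \le x)$ and $\bar F(x) = 1 - F(x)$. Then

$$\frac{\bar F(tx)}{\bar F(t)} \to g_1(x) \quad \text{as } t \to \infty.$$

The only possible $g_1$ is $g_1(x) = x^{-\alpha}$ for some $\alpha > 0$!

This is a derivation, not an assumption!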
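
Why a power function is the only candidate can be sketched in a few lines. The argument below is the standard one and is not spelled out on the slides; it assumes the limit exists for all $x > 1$ and that $g_1$ is monotone.

```latex
\begin{align*}
\frac{\bar F(txy)}{\bar F(t)}
  &= \frac{\bar F(txy)}{\bar F(tx)} \cdot \frac{\bar F(tx)}{\bar F(t)}
  \;\longrightarrow\; g_1(y)\, g_1(x) \quad (t \to \infty), \\
\text{so}\quad g_1(xy) &= g_1(x)\, g_1(y) \quad \text{for all } x, y > 1 .
\end{align*}
```

A monotone function satisfying this multiplicative equation is a power function, and since $g_1$ is non-increasing with values in $[0, 1]$, the exponent is non-positive. Excluding the degenerate limits leaves $x^{-\alpha}$ with $\alpha > 0$.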
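Summary of the special case
$$\lim_{t \to \infty} P\left(\frac{X}{t} > x \,\middle|\, X > t\right) = x^{-\alpha}.$$

In words, for sufficiently large $t$:

Given $X > t$, the excess ratio $X/t$ approximately follows a
Pareto distribution.

For a sufficiently large $t$ and all $x > t$,

$$P(X > x) \approx P(X > t) \left(\frac{x}{t}\right)^{-\alpha} =: A x^{-\alpha}.$$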

Tail observations follow a scaled Pareto distribution.


General case: Extreme Value Theory
$$\lim_{t \to x^*} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = g(x).$$
A similar derivation shows that the only possible $g$ functions
(up to a shift and scale transform) are

$$g(x) = (1 + \gamma x)^{-1/\gamma}, \quad \text{for } 1 + \gamma x > 0,$$

where $\gamma \in \mathbb{R}$ is called the extreme value index.

When $\gamma > 0$, $g(x)$ is essentially the same as $x^{-\alpha}$, where
$\alpha = 1/\gamma$ is called the tail index. This is the special case we
studied!

$1 - g(x)$ is the generalized Pareto distribution.

Given $X > t$, the excess $X - t$ approximately follows
a scaled generalized Pareto distribution.
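
To get a feel for how the extreme value index shapes the tail, the function $g$ can be evaluated for a positive, zero and negative $\gamma$. This is a minimal sketch: the name `gpd_survival` is hypothetical, and the $\gamma = 0$ branch uses the limiting form $e^{-x}$.

```python
import math


def gpd_survival(x, gamma):
    """g(x) = (1 + gamma * x) ** (-1 / gamma); the gamma = 0 case is the limit exp(-x)."""
    if gamma == 0.0:
        return math.exp(-x)
    base = 1.0 + gamma * x
    if base <= 0.0:
        # Outside the support: beyond the finite endpoint when gamma < 0,
        # or below the lower bound -1/gamma when gamma > 0.
        return 0.0 if gamma < 0.0 else 1.0
    return base ** (-1.0 / gamma)


for gamma in (0.5, 0.0, -0.5):
    print(gamma, [round(gpd_survival(x, gamma), 4) for x in (0.5, 1.0, 1.9)])
```

For $\gamma = -0.5$ the support ends at $x = 2$, which is the finite right endpoint discussed for negative $\gamma$.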
Domain of attraction
$$\lim_{t \to x^*} P\left(\frac{X - t}{a(t)} > x \,\middle|\, X > t\right) = (1 + \gamma x)^{-1/\gamma}.$$
The distribution function $F(x) = P(X \le x)$ is then said to be in the
domain of attraction. How large is this domain?

If $\gamma > 0$, $F$ must have no finite right endpoint.
If $\gamma < 0$, $F$ must have a finite right endpoint.
If $\gamma = 0$, both cases are possible.

In the domain of attraction with $\gamma > 0$:
Pareto, Student-t, stable, log-gamma, Fréchet, etc.
In the domain of attraction with $\gamma = 0$:
exponential, normal, log-normal, etc.
In the domain of attraction with $\gamma < 0$:
uniform, reversed Burr, etc.
Summary of EVT framework
What we assume:
Far-tail properties can be captured at an intermediate level.

What we get:
Above a high threshold, the excess must follow a scaled
generalized Pareto distribution.
This is a derivation, not the initial assumption!

How restrictive is our framework?

An assumption is necessary!
Our assumption is intuitive and simple.
The domain of attraction is broad.

When studying the extremal behavior of a variable $X$, assuming
that its distribution function belongs to the domain
of attraction is a very general assumption!
Statistics based on EVT
EVT is a semi-parametric model:
Tail region: Parametric model: scaled generalized Pareto
Moderate region: No assumption, i.e. non-parametric.

Parameter estimation: EVI $\gamma$, scale function $a$.

From now on, we only consider $\gamma > 0$. This is the case for
most financial returns. In this case, it is not necessary
to consider the scale function:
$$\lim_{t \to \infty} \frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha},$$
where $\alpha = 1/\gamma > 0$.

We first try to estimate $\alpha$ from data.


Estimation of tail index
$$\lim_{t \to \infty} \frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha}.$$
Intuitively, for sufficiently large $t$, given $X > t$ the excess
ratio $X/t$ approximately follows a Pareto distribution.

Q1: How do we choose a "sufficiently large" $t$?
Q2: How do we get observations on excess ratios?
Q3: How do we deal with "approximately"?

A1: We take a high order statistic: $X_{n,n-k}$.
A2: We take $X_{n,n-i+1}/X_{n,n-k}$, $1 \le i \le k$.
A3: We treat them as exactly Pareto distributed.

By fitting these $k$ ratios to a Pareto distribution by maximum
likelihood, we can estimate $\alpha$.

Note: we have $n$ observations, but we use only the top $k + 1$.


More about index estimation
$$\frac{1}{\hat\alpha} = \frac{1}{k} \sum_{i=1}^{k} \log\left(\frac{X_{n,n-i+1}}{X_{n,n-k}}\right).$$
This is called the Hill estimator.
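
A direct transcription of the estimator into Python might look as follows. The function name `hill_alpha` is hypothetical, and the data are synthetic Pareto draws with a known tail index, used only to show that the estimate lands in the right region.

```python
import math
import random


def hill_alpha(losses, k):
    """Hill estimate of the tail index alpha from the top k + 1 order statistics.

    The top k + 1 losses must be positive so that the log ratios are defined.
    """
    top = sorted(losses, reverse=True)[: k + 1]
    threshold = top[k]  # X_{n,n-k}
    inv_alpha = sum(math.log(x / threshold) for x in top[:k]) / k
    return 1.0 / inv_alpha


# Synthetic Pareto losses with true tail index 3 (hypothetical data, not the S&P 500 sample).
random.seed(1)
losses = [random.paretovariate(3.0) for _ in range(5000)]
print("Hill estimate with k = 200:", round(hill_alpha(losses, 200), 2))
```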

Theoretical property (consistency): Suppose $k(n) \to \infty$
and $k(n)/n \to 0$ as $n \to \infty$; then $\hat\alpha \to \alpha$ in probability.

Question from practitioners: how to choose $k$?

Difficulty: trade-off between $k$ too small and $k$ too large.
$k$ too small:
Too few observations $\Rightarrow$ large variance
$k$ too large:
Involves non-extreme observations $\Rightarrow$ bias

Solution: make a Hill plot, i.e. calculate $\hat\alpha$ for a series of $k$.
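
The numbers behind a Hill plot are simply the Hill estimates over a grid of $k$. The sketch below prints them as a small table instead of drawing a figure; the function name `hill_path`, the grid of $k$ values and the synthetic sample of size 312 are assumptions made for illustration.

```python
import math
import random


def hill_path(losses, k_values):
    """Return (k, Hill estimate of alpha) pairs for each k in k_values."""
    xs = sorted(losses, reverse=True)
    logs = [math.log(x) for x in xs[: max(k_values) + 1]]
    path = []
    for k in k_values:
        # Mean of the top k log losses minus the log threshold log X_{n,n-k}.
        inv_alpha = sum(logs[:k]) / k - logs[k]
        path.append((k, 1.0 / inv_alpha))
    return path


# Synthetic Pareto sample with tail index 3 (hypothetical, same size as the exercise).
random.seed(2)
losses = [random.paretovariate(3.0) for _ in range(312)]
for k, alpha_hat in hill_path(losses, range(10, 71, 10)):
    print(f"k = {k:2d}   alpha_hat = {alpha_hat:.2f}")
```

In practice one looks for a range of $k$ over which the estimates are roughly stable.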


Hill plot for our case

[Figure: Hill plot; x-axis: k from 10 to 70, y-axis from 1.5 to 3.5]

The optimal choice of k is around 30.


Taking $k = 30$, we get $\hat\alpha = 2.26$.
VaR estimation
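$$\lim_{t \to \infty} \frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha}$$

Our purpose is to estimate $\mathrm{VaR}(p)$ such that $\bar F(\mathrm{VaR}(p)) = 1 - p$ is very low (it may even be lower than $1/n$, in which case the non-parametric approach is not feasible).

We again take $X_{n,n-k}$ as a high threshold; $\bar F(X_{n,n-k})$ can be estimated as $k/n$.
$$\frac{1 - p}{k/n} \approx \frac{\bar F(\mathrm{VaR}(p))}{\bar F(X_{n,n-k})} \approx \left(\frac{\mathrm{VaR}(p)}{X_{n,n-k}}\right)^{-\alpha}.$$

We get the estimator
$$\widehat{\mathrm{VaR}}(p) = X_{n,n-k} \left(\frac{1 - p}{k/n}\right)^{-1/\hat\alpha}.$$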
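
The estimator combines the threshold, the empirical tail fraction $k/n$ and the Hill estimate. A minimal sketch, with the hypothetical name `evt_var` and a synthetic sample of exact Pareto quantiles so that the true answer is known:

```python
import math


def evt_var(losses, k, p):
    """EVT VaR at confidence level p: X_{n,n-k} * ((1 - p) / (k / n)) ** (-1 / alpha_hat)."""
    n = len(losses)
    top = sorted(losses, reverse=True)[: k + 1]
    threshold = top[k]  # X_{n,n-k}
    # Hill estimate of 1 / alpha from the top k excess ratios.
    inv_alpha = sum(math.log(x / threshold) for x in top[:k]) / k
    return threshold * ((1.0 - p) / (k / n)) ** (-inv_alpha)


# Hypothetical check on exact Pareto quantiles with tail index 3,
# for which the true 99.9% quantile is 0.001 ** (-1/3) = 10.
n = 2000
sample = [((i - 0.5) / n) ** (-1.0 / 3.0) for i in range(1, n + 1)]
print(round(evt_var(sample, 200, 0.999), 2))
```

The estimate should land close to 10 on this idealized sample; on real data the result depends on the choice of $k$.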
Inverse question: tail probability
Question: In our data, the maximum loss is 4.11%
(Oct 15, 2008). Suppose one wants to know the
probability of a loss of more than 5%. How can it be calculated?

Solution:
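$$\lim_{t \to \infty} \frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha}$$
We again take $X_{n,n-k}$ as a high threshold,
$$\frac{\bar F(5)}{\bar F(X_{n,n-k})} \approx \left(\frac{5}{X_{n,n-k}}\right)^{-\alpha}.$$

We get the estimator
$$\hat P(X > 5) = \hat{\bar F}(5) = \frac{k}{n} \left(\frac{5}{X_{n,n-k}}\right)^{-\hat\alpha}.$$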
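
The same ingredients give the tail probability of a level beyond the sample. The sketch below uses the hypothetical name `evt_tail_prob` and, again, exact Pareto quantiles as stand-in data; the level is assumed to lie above the threshold.

```python
import math


def evt_tail_prob(losses, k, level):
    """EVT estimate of P(X > level): (k / n) * (level / X_{n,n-k}) ** (-alpha_hat)."""
    n = len(losses)
    top = sorted(losses, reverse=True)[: k + 1]
    threshold = top[k]  # X_{n,n-k}
    # Hill estimate of the tail index alpha.
    alpha_hat = k / sum(math.log(x / threshold) for x in top[:k])
    return (k / n) * (level / threshold) ** (-alpha_hat)


# Hypothetical check: for a Pareto tail with index 3, P(X > 10) = 10 ** -3.
n = 2000
sample = [((i - 0.5) / n) ** (-1.0 / 3.0) for i in range(1, n + 1)]
print(evt_tail_prob(sample, 200, 10.0))
```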
Summary of EVT Statistics
We start from:
Assuming the domain of attraction condition of EVT.

What we can do:


Estimate VaR with small tail probability.
Evaluate the probability of a rare event (possibly an event that
has never happened).

How we achieve that:


Only use the top $k + 1$ observations.
Fit a Pareto distribution to estimate the tail index $\alpha$.
Use the domain of attraction condition for VaR.

Technical difficulty and solution:

The choice of $k$ is made using a Hill plot.
History of EVT
The origin of EVT is not about VaR, nor about statistics.
It is purely mathematical. Mathematicians asked the following
question, analogous to the Central Limit Theorem.

Suppose $X_1, X_2, \ldots$ are i.i.d. with distribution function $F$.

Denote $M_n = \max_{1 \le i \le n} X_i$ as the sample maximum. Suppose
there exist $a_n$ and $b_n$ such that, as $n \to \infty$,
$$\frac{M_n - b_n}{a_n} \xrightarrow{d} G.$$
Q1: What are the possible limit distributions $G$?
Q2: What is the necessary and sufficient condition on $F$ to have such a limit?
A1: $G(x) = \exp\left(-(1 + \gamma x)^{-1/\gamma}\right)$.
It is called the generalized extreme value distribution.
A2: $F$ is in the domain of attraction.
What else can EVT do?
We have only discussed the $\gamma > 0$ case so far ($\alpha = 1/\gamma$).

If we do not know whether $\gamma > 0$, $\alpha$ is not defined. We should then estimate $\gamma$ and the scale function $a$. There exist estimators
for them.

Then, we could still estimate VaR with small tail probability, or evaluate the probability of a rare event.

When $\gamma < 0$, we could estimate the finite endpoint.

EVT is a general framework. We call the case $\gamma > 0$ the "heavy tail" case. But EVT can also deal with "thin
tail", or even "no tail" cases.
Further discussion on VaR
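$$\lim_{t \to \infty} \frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha}$$

Replace $t$ by $\mathrm{VaR}(1 - p)$ where $p \to 0$. Consider $\mathrm{VaR}(1 - px)$ where $x < 1$. Then
$$\frac{px}{p} = \frac{\bar F(\mathrm{VaR}(1 - px))}{\bar F(\mathrm{VaR}(1 - p))} \approx \left(\frac{\mathrm{VaR}(1 - px)}{\mathrm{VaR}(1 - p)}\right)^{-\alpha}.$$

Hence, the domain of attraction condition can be rewritten as a condition based on VaR:
$$\lim_{p \to 0} \frac{\mathrm{VaR}(1 - px)}{\mathrm{VaR}(1 - p)} = x^{-1/\alpha}.$$

One could make inference on a high-level VaR from
the information in an intermediate-level VaR.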
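
As a rough numerical illustration of this extrapolation, the sketch below scales an intermediate-level VaR to a higher level. The function name `extrapolate_var` is hypothetical; the inputs (a 99% VaR of 3.51 and a tail index of 2.81) are taken from the earlier exercise.

```python
def extrapolate_var(var_base, tail_prob_base, tail_prob_target, alpha):
    """VaR(1 - p*x) ~ VaR(1 - p) * x ** (-1 / alpha), with x = target / base tail probability."""
    x = tail_prob_target / tail_prob_base
    return var_base * x ** (-1.0 / alpha)


# From the 99% level (tail probability 0.01) to the 99.9% level (tail probability 0.001).
print(round(extrapolate_var(3.51, 0.01, 0.001, 2.81), 2))
```

The result is close to the 7.96 reported for the new method at the 99.9% level, as it should be, since both rest on the same Pareto tail.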
Aggregation of VaR
Feller theorem: Suppose $X$ and $Y$ follow heavy-tailed
distributions with the same tail index $\alpha$ and they are independent. Then $P(X + Y > t) \sim P(X > t) + P(Y > t)$ as
$t \to \infty$.

The Feller theorem provides us with a solution for calculating
VaR from aggregated risk factors.

10-day VaR calculation:

Suppose $X_1, \ldots, X_{10}$ are i.i.d. daily returns following a heavy-tailed distribution, and $X = X_1 + \cdots + X_{10}$.

Then $P(X > t) \approx 10 P(X_1 > t)$ for sufficiently large $t$.

$$\mathrm{VaR}_X(99\%) = \mathrm{VaR}_{X_1}(99.9\%) = \mathrm{VaR}_{X_1}(99\%) \cdot 10^{1/\alpha}.$$

Whether the $\sqrt{10}$ rule holds depends on $\alpha$!
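
The dependence on the tail index is easy to tabulate: the heavy-tail scaling factor is $10^{1/\alpha}$, which coincides with $\sqrt{10}$ only when $\alpha = 2$. A small sketch, with the hypothetical function name `ten_day_scaling` and a few illustrative values of $\alpha$ (2.26 and 2.81 are the estimates that appeared earlier):

```python
import math


def ten_day_scaling(alpha, days=10):
    """Heavy-tail scaling factor days ** (1 / alpha) implied by the Feller approximation."""
    return days ** (1.0 / alpha)


for alpha in (2.0, 2.26, 2.81, 4.0):
    factor = ten_day_scaling(alpha)
    print(f"alpha = {alpha:4.2f}   factor = {factor:.3f}   sqrt(10) = {math.sqrt(10):.3f}")
```

For $\alpha > 2$ the factor is below $\sqrt{10}$, and for $\alpha < 2$ it is above.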
Summary of today
What we have done:
A VaR calculation exercise.
Built up EVT from the tail inference viewpoint.
Calculated VaR under the EVT framework.
Reviewed EVT history: the law of sample maxima.
VaR of aggregated risks.

What we learnt:
Modeling the tail should use only tail information.
EVT provides such a model.
The setup of EVT is general; the limit derivation is specific.
The domain of attraction is still broad enough.
The model is estimated using only tail observations.
We can calculate VaR with very low tail probability.
We can calculate the tail probability of an extremal event.

Let the tail speak for itself!
