Lecture 3
Statistical Tools for Market Risk Management
Riccardo Rebonato
1 Plan of the Lecture
In this lecture we are going to review the fundamental statistical and
probabilistic tools that we will use throughout the course — and that you will
use if you want to measure and manage risk.
2 Random Variables and Stochastic Processes
Suppose that we have a random event, such as the outcome of the next presi-
dential election. In itself, this event is not a random variable. It becomes a
random variable when we associate a numerical value with the outcome.
So, the outcome of the next US Presidential election is not, by itself, a random
variable; it becomes one if, say, we assign +1 to a Republican win and −1 to a
Democratic win. If we then sum all these random variables from 1900 to today,
the sign of the sum immediately tells us whether there have been more Republican
or Democratic presidents.
Or take a market portfolio. The P&L at the end of each trading day is a random
variable.
By taking its average, we can try to tell whether the trader is any good (or is
he just lucky?).
By looking at the variance of the returns we have one measure of how much
risk the trader takes.
What can we know about a random variable? First of all, we know which values
it can assume — perhaps it’s the whole real line; perhaps it is just +1 and −1,
as in the case of the US presidential election.
The range of values is (somewhat confusingly) called the domain.
Next we can try to discover the probability of each possible event occurring.
If we have found this (as we shall see, this information is encoded in the prob-
ability distribution associated with a random variable), then we know all that
there is to know (or, rather, all that it is possible to discover) about the random
variable.
What about a stochastic process?
Let’s go back to the example of the US presidential election, and the random
variable associated with it. Suppose that we stick a time label to each outcome
(+1 or -1). Now we have a stochastic process.
We said that the probability distribution is the function that ‘contains’ all the
information we can have about a random variable.
For simplicity, I am going to assume in the following that the domain of the
random variable is the real axis (or a portion of it).
The cumulative probability distribution takes in any value, say, x, in the
domain of the random variable.
It returns the probability that a random draw of X will yield a value smaller
than or equal to x. So
CDFX (x) = Prob [X ≤ x]    (1)
As x tends towards lower and lower values (in the domain), it becomes more
and more unlikely that a draw of X will yield a value even lower, so the CDF
tends to 0:
lim x→−∞ CDFX (x) = 0    (2)
Similarly, as x becomes larger and larger, the CDF tends to 1 (because we can
be virtually certain that the next random draw will be smaller than x):
lim x→+∞ CDFX (x) = 1    (3)
CDFs contain all that there is to know about a random variable, but they are
not very easy to read.
This is where probability density functions (PDFs) come in. When the CDF is
differentiable, the density is defined as the derivative of the CDF with respect
to the input variable, x: if we denote the density function with a lower-case
Greek symbol and the CDF with an upper-case Greek symbol, we have
ϕ (x) = dΦ (x) / dx    (4)
The density function tells us how the CDF changes. Fig (2) shows what it looks
like for the same Normal (0, 1) variable that gave rise to Fig (1):
Figure 1: The cumulative distribution function for a normal random variable
(mean = 0, variance = 1).
PDFs are handy because at a glance they tell us the mode and give us a
perception of the skewness and the spread of the distribution.
ϕ (x) dx gives the probability of drawing a value of the random variable X between
x and x + dx. Remember: you have to multiply the density by dx in order to
turn it into a probability — after all, that is why it is called a density.
Figure 2: The probability density function for a normal random variable (mean
= 0, variance = 1).
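As a minimal sketch (assuming MatLab with the Statistics Toolbox, so that normcdf and normpdf are available), this is how plots like Figs (1) and (2) can be produced:

% Grid of values over which to evaluate the standard normal
x = -4:0.01:4;

% CDF: Prob[X <= x] for a Normal(0,1) variable (cf. Fig 1)
Phi = normcdf(x, 0, 1);

% PDF: the derivative of the CDF with respect to x (cf. Fig 2)
phi = normpdf(x, 0, 1);

subplot(2,1,1); plot(x, Phi); title('CDF of a Normal(0,1)');
subplot(2,1,2); plot(x, phi); title('PDF of a Normal(0,1)');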
5 The Inverse of a CDF
The inverse function takes in each possible value in the range of f (x) (one of
the ‘y’ values), and returns the associated value in the domain (one of the ‘x’
values).
Draw from the distribution Φ (x) many random draws. Call them x1, x2, ...,
xn.
For each random draw, calculate the CDF: u1 = Φ (x1) , u2 = Φ (x2) , ...,
un = Φ (xn).
Then the random variables u1, u2, ..., un (they are random variables, right?)
are uniformly distributed over the interval [0, 1].
In symbols
Φ (xi) ∼ U (0, 1)    (5)
Now take a second distribution, say, Ψ (·).
Feed the uniform random variables, u1, u2, ..., un, into the inverse of Ψ (·),
Ψ−1 (u).
Then what I get out is a set of random draws (call them y1, y2, ..., yn) from
the distribution Ψ (·).
So, we have found a way to associate to the random draws from a given
distribution, Φ (x), the ‘associated’ random draws from a different distribution,
Ψ (y).
It means that if, for instance, x1 was an exceptionally large move for distribution
Φ (x), then the associated value y1 will also be equally exceptionally large for
the distribution Ψ (y).
So, in formulae
y1 = Ψ−1 [Φ (x1)]    (6)
This is important.
We will prove the result in a later lecture, but, for the moment, please make
sure you understand what it says well, because we will make use of it repeatedly
when we look at Monte Carlo Simulations.
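A sketch of equation (6) in MatLab (assuming the Statistics Toolbox; here Φ is taken to be the standard Normal CDF and Ψ the CDF of a Student-t with 4 degrees of freedom, purely for illustration):

n = 10000;            % number of draws
x = randn(n, 1);      % draws x1, ..., xn from Phi (standard Normal)
u = normcdf(x);       % ui = Phi(xi): uniformly distributed on [0, 1]
y = tinv(u, 4);       % yi = Psi^(-1)(ui): draws from a Student-t with 4 dof

% An exceptionally large xi maps to an equally exceptional yi:
[~, idx] = max(x);
fprintf('Largest Normal draw: %.3f -> associated t draw: %.3f\n', x(idx), y(idx));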
6 Percentiles
For instance, given a distribution, Φ (x), we may want to know what is the
value, x, such that the probability of a draw smaller than x is, say, 10%.
But, by definition, this is just the inverse of Φ, evaluated at 0.1:
Φ−1 (0.1).
This number is called the 10th percentile.
So, for instance, the 10th, 5th and 1st percentiles of the standard normal dis-
tribution are given by
Perc10 = −1.28155...
Perc05 = −1.64485...
Perc01 = −2.32635...
As we shall see, the famous VaR is just a percentile of the loss distribution.
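As a quick check (a sketch, again assuming the Statistics Toolbox function norminv), the percentiles quoted above are just the inverse standard Normal CDF evaluated at 0.10, 0.05 and 0.01:

% Inverse standard Normal CDF at the 10th, 5th and 1st percentiles
percs = norminv([0.10 0.05 0.01], 0, 1)
% returns approximately -1.2816  -1.6449  -2.3263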
7 The Empirical Cumulative Distribution (ECDF)
Suppose that we have lots and lots of data - perhaps the returns from equity
indices (S&P, FTSE100, DAX) and swaps (10y, 5y, 2y US$ swaps).
Then you begin counting. When you meet your first data point (the lowest),
you associate to that value a cumulative probability of 1/N.
Then you move to the second data point (the second lowest). You associate
to it a cumulative probability of 2/N.
Next you go to the ... kth data point. You associate to it a cumulative
probability of k/N.
By the time you have reached the last data point, you associate to it the
cumulative probability of N/N = 1.
Fig (3) shows what you get with this procedure for four of the time series that
we saw in Lecture 2.
Note that, despite looking smooth, if you look very carefully, you see little
‘jumps’ (discontinuities).
Close to the centre of the distribution we have lots of small same-size jumps
next to each other.
In the tails, we can detect the ‘quantum jumps’ more easily ‘by inspection’.
Figure 3: The ECDF functions for S&P, DAX, 10-year and 2-year US$ swaps
returns.
So, given some data, we know how to build an empirical CDF.
When we have lots of data, despite the tiny jumps, the curve looks very smooth.
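Here is a minimal sketch of the k/N construction (using simulated returns as a stand-in for the actual S&P/DAX/swap series):

returns = randn(1000, 1);        % placeholder for a series of daily returns
N       = length(returns);

xs      = sort(returns);         % order the data from lowest to highest
cumprob = (1:N)' / N;            % the kth lowest point gets probability k/N

stairs(xs, cumprob);             % with few data the little 'jumps' are visible
xlabel('return'); ylabel('Cumulated frequency');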
Why bother with ‘named’ CDFs when empirical ones seem to do so well?
Because the ECDF can say nothing about moves larger than those already in our
sample: implicitly, we assume that the past record spans everything the future
may throw at us.
If the record from which we extract is very long, and contains seriously excited
periods, the implicit assumption may not be that bad, but the cautious risk
manager should be aware...
Exercise 1 Look carefully at these four ECDFs. Can you guess which ones
have the fattest tails?
8 ‘Famous’ Distributions
For some distributions we can obtain a closed-form expression for the density
(or for other quantities, such as the characteristic function, from which the
density can be obtained).
These ‘named’ distributions are useful, because we can often obtain analytical
results or confidence intervals with them, but they are rare.
• Normal (Gaussian)
• Student-t
• Poisson
• Gamma
• Chi-Squared
• Stable class (Gaussian, Cauchy, Levy...)
• ....
One distribution is more famous than all the other distributions put together:
it is the Normal or Gaussian distribution.
It is the easiest distribution for obtaining closed-form results, but its popularity
is not just a case of looking for the keys under the street lamp.
There are two deeper reasons. The first has to do with the central limit theorem
(CLT). The second with Maximum Entropy (ME). Here they are.
9 The Central Limit Theorem
In its simplest form, and after skipping all the fine print, the CLT states the
following: the sum of a large number of independent, identically distributed
random variables with finite variance tends to a normal distribution.
Suppose, then, that 1-day returns are (roughly) independent and identically
distributed with finite variance. Then, if I look at n-day returns, which are just
the sum of n 1-day returns, the Central Limit Theorem tells me that their
distribution will approach the normal as n becomes large enough.
Empirically, if we look at 1-day returns, we can reject normality for returns from
almost any asset class. (We will look at this in the following.)
But if we look at monthly returns, for many financial time series we no longer
can reject normality.
And for yearly returns, most time series generate distributions that are indis-
tinguishable from the normal one.
Exercise 2 What do you think stands in the way of the conditions for the
theorem to apply?
10 Maximum Entropy*
Entropy characterizes the level of disorganization, or the unpredictability, of the information content of a system.
We won’t go into why the (differential) entropy is defined this way. The in-
teresting thing is that maximizing the entropy of a distribution given some
knowledge about the distribution, means finding the ‘least’ committal distrib-
ution compatible with what we know.
What does ‘least committal’ mean? Suppose that I only know that my
random variable is distributed between a and b. I have no clues about its
mean, its variance, or anything else.
Then the ‘least committal’ distribution must be the uniform, U [a, b].
One can prove that this is indeed the case using variational calculus.
The very nice thing is that the Gaussian distribution is the least committal (the
maximum entropy) distribution if we only know the mean and variance of
a distribution!
If you want, you can talk to me more about this (I love the topic, and
ME is extremely powerful), but this would take us a bit off course.
11 A Few Important Distributions
11.1 The Normal Distribution
The Normal distribution is fully described by two parameters: one location
parameter (the mean, µ), and one shape parameter (the variance, σ²).
Subtracting the mean and dividing by the standard deviation creates standard-
ized normal draws, which, clearly, are drawn from a standard Normal distribu-
tion, ie, from a Normal distribution with mean zero and variance 1.
11.2 Chi Squared
It is even more relevant for us, because of its links with the Student-t distribu-
tion.
11.3 Student-t Distribution
If Z is a standard Normal variable and V an independent Chi-squared variable
with ν degrees of freedom, then the ratio Z / √(V /ν) has a Student-t distribution
with ν degrees of freedom.
The lower the number of degrees of freedom (look again at the effect of the
degrees of freedom on the Chi-squared distribution), the more peaked the pdf,
and the fatter the tails.
Also, if we need really fat tails, then the Student-t distribution is not going to
take us very far, and Stable distributions can do a better job.
Figure 6: The convergence of the Student-t distribution to the normal, as the
dof increase.
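A sketch of the convergence shown in Fig (6) (assuming tpdf from the Statistics Toolbox; the degrees of freedom chosen are just illustrative):

x   = -5:0.01:5;
dof = [3 10 50];                          % illustrative degrees of freedom

plot(x, normpdf(x), 'k', 'LineWidth', 2); hold on
for nu = dof
    plot(x, tpdf(x, nu));                 % lower dof: more peaked, fatter tails
end
legend('Normal(0,1)', 't(3)', 't(10)', 't(50)');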
Stable distributions have nice properties, because the sum of independent,
identically-distributed stable random variables has the same distribution (up to
location and scale) as the component variables.
More precisely,
The two parameters α and β ‘interact’: when α is small, then β can induce a
lot of skewness; when α is high, the effect of β decreases.
The Cauchy distribution, for instance, is given by
Cauchy (δ, γ) = Stable (1, 0, γ, δ) (13)
The Cauchy is one of the few stable distributions for which the pdf has an
analytic form:
ϕCauchy (x; δ, γ) = (1/π) γ / (γ² + (x − δ)²)    (14)
A stable distribution which has both fat tails and skewness is the Levy distrib-
ution, given by
These are examples of the kind of shapes that can be obtained with the Normal,
Cauchy and Levy distributions.
We are going to use real data (S&P500, FTSE100, DAX, 10y, 5y and 2y
US$ swap rates) from 2005 to 2016, and fit the daily returns to several of the
distributions we have seen above.
Looking at the Stable distribution, we note that the leptokurtosis is not ex-
cessive (α = 1.53 , not too far from 2, which corresponds to the normal
distribution), and that there is a left skew.
These are the fits obtained with the Normal, the Student-t and the Stable
distributions:
13.2
Figure 11:
13.3 FTSE 100
Figure 12:
Figure 13:
Figure 14:
In the case of the FTSE index, the distribution appears less fat-tailed than for the
S&P500 (the number of dof for the Student-t is higher, and the α parameter
for the Stable distribution is closer to 2).
Little skewness.
Strategies and options may give rise to very skewed return distributions.
Figure 19:
Figure 20:
Figure 21:
15 Multivariate Distributions
So far we have looked at one random variable at a time. What if I had two?
Or three? Or seven?
So, for instance, for two random variables, X and Y , we can define the bivariate
cumulative distribution function, Φ2 (x, y), as the probability that the draw of
X will be less than x and the draw of Y will be less than y:
Φ2 (x, y) = Prob [X ≤ x, Y ≤ y]
The density has the property that, when integrated (‘summed over’) over all the
possible values of the X and Y variables, it gives 1:
∫∫ ϕ2 (x, y) dx dy = 1
The quantity ϕ2 (x, y) dxdy gives the probability of drawing for variable X a
value between x and x + dx and for variable Y a value between y and y + dy.
Figure 22: The CDF for the bivariate Normal distribution
16 Dependence Among Variables
Suppose that the variable X is a draw from the S&P returns, and Y is a draw
from the DOW-Jones returns.
If the draw for the S&P was, say, large and positive, the draw for the DOW-
Jones is very unlikely to have been large and negative.
If, instead, the variables X and Y had been the S&P returns and the tem-
perature in Paris, a large/small/middling realization of the S&P return has no
bearing on the draw of the temperature in Paris.
One simple way to describe the strength of this ‘linkage’ between variables is
via the coefficient of correlation, ρ.
For some distributions (the elliptical ones, of which the normal distribution is
a prime example) the coefficient of correlation is all that is needed to describe
the dependence between two (or, for that matter, umpteen) variables. This can
make life very easy.
For the case of bivariate Normal draws with means µx and µy , variances σx²
and σy² and correlation, ρ, the density function is given by
ϕ2−Norm (x, y) = [1 / (2π σx σy √(1 − ρ²))] exp[−z / (2 (1 − ρ²))]    (21)
with
z = ((x − µx)/σx)² + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy) / (σx σy)    (22)
This (Fig (23)) is what the 3-d plot looks like for zero correlation:
3-d plots are stunning, but contour plots are more useful to visualize the degree
of dependence.
Figs (24) and (25) show the contour plots for a bivariate Normal with correlation
of 0.6 and -0.6, respectively.
Figure 23: A bivariate Gaussian density for ρ = 0.
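As a sanity check of equations (21) and (22) (a sketch; the parameters and the evaluation point are arbitrary), the explicit formula can be compared with MatLab's mvnpdf:

mux = 0; muy = 0; sx = 0.1; sy = 0.2; rho = 0.6;    % arbitrary parameters
x = 0.05; y = -0.1;                                 % arbitrary evaluation point

% Equations (22) and (21) written out explicitly
z  = ((x-mux)/sx)^2 + ((y-muy)/sy)^2 - 2*rho*(x-mux)*(y-muy)/(sx*sy);
f1 = exp(-z/(2*(1-rho^2))) / (2*pi*sx*sy*sqrt(1-rho^2));

% The same density via the built-in bivariate Normal pdf
Sigma = [sx^2, rho*sx*sy; rho*sx*sy, sy^2];
f2 = mvnpdf([x y], [mux muy], Sigma);

[f1, f2]                                            % the two numbers coincide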
17 Marginal Distributions
Take any bivariate density. It is a function of, say, x and y: ϕ2 (x, y).
What do we get if we integrate over all the possible values of, say, y? We get
the marginal density of the remaining variable, x:
ϕx (x) = ∫ ϕ2 (x, y) dy
For normal variates, we can also go the other way: given two univariate distri-
butions, we can combine them to form an infinity of bivariate normal densities,
one for each value of the correlation, ρ.
We can see this at work in Figs (24) and (25).
Both the bivariate distributions that produced the contour plots for the bivariate
Normal densities in Figs (24) and (25) were obtained from the same ‘marginals’,
ie, from two univariate Normal distributions, with µ1 = µ2 = 0 and σ1 = 0.1,
σ2 = 0.2, but with different correlation coefficients (0.6 and −0.6).
18 Conditional Distributions
The conditional density, ϕcond (x|y), gives the probability of x, given that we
know that Y has the value y.
So, we have
ϕ2 (x, y) = ϕcond (x|y) ϕy (y) (25)
The joint probability is given by the product of the conditional times the mar-
ginal.
19 Expectations
The (unconditional) expectation of y, for instance, is just given by
E [y] = ∫∫ y ϕ2 (x, y) dx dy    (26)
But let’s assume that we know that X had the value x. What is the expectation
of y now?
It has become
E [y|x] = ∫ y ϕcond (y|x) dy    (27)
Now, if the variables x and y are jointly normally distributed, a beautiful result
follows, namely:
E [y|x] = α + βx (28)
where α and β are the intercept and slope of a linear regression of y on
x.
19.1 Proving the Result*
Suppose that this is true, ie, that the conditional expectation of y given x is a
linear (affine, really) function of x:
E [y|x] = α + βx (29)
Multiply both sides by ϕx (x) and integrate with respect to x. We get
E [y] = α + β E [x]
Go back to E [y|x] = α + βx, and this time multiply both sides by xϕx (x)
and then integrate with respect to x. We have
E [xy] = α E [x] + β E [x²]
Solving these two equations for α and β gives precisely the intercept and slope
of a linear regression of y on x.
It turns out that, if x and y are jointly normally distributed, then one can show
that, indeed, the conditional expectation of y given x is exactly a linear (affine)
function of x.
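A sketch of the result at work (the parameters are arbitrary; mvnrnd requires the Statistics Toolbox): simulate jointly Normal draws and check that the fitted regression slope recovers β = ρ σy/σx.

rho = 0.6; sx = 0.1; sy = 0.2; n = 100000;          % arbitrary parameters

Sigma = [sx^2, rho*sx*sy; rho*sx*sy, sy^2];
xy    = mvnrnd([0 0], Sigma, n);                    % jointly Normal draws
x     = xy(:,1); y = xy(:,2);

coeffs = [ones(n,1) x] \ y;                         % linear regression of y on x
[coeffs(2), rho*sy/sx]                              % fitted slope vs theoretical beta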
20 Conditional Probability
When we ask for a conditional probability, we ask for the probability of some-
thing happening, given that we are told that something else has happened.
Here is one.
Figure 26:
So we can rest assured that the ratio P (A ∩ B) / P (B) is never greater than 1.
We can also write:
P (A ∩ B) = P (A|B) P (B) (41)
which is a chain rule for probabilities: the joint probability of A and B is equal
to the conditional probability of A given B, times the probability of B.
It works for as many variables as you want.
P (A ∩ B) ≡ P (A, B) (42)
and, for many variables,
P (A1, A2, ..., An) = P (A1|A2, ..., An) P (A2|A3, ..., An) ... P (An−1|An) P (An)
Given a joint distribution, I can always associate a unique set of marginals and
a correlation matrix.
However, the converse is not true: very different joint bivariate distributions can
share the same two marginals and the same correlation matrix, so the joint
distribution cannot, in general, be recovered from them.
How can we capture the dependence more fully, then? By introducing copulas.
So what are copulas?
They are those multivariate distributions that have for marginals uniform U [0, 1]
distributions.
Why are they important?
Because of Sklar’s theorem, which says that any multivariate distribution can be
expressed as a function of
• a copula (ie, a special multivariate distribution that has for its own
marginals uniform U [0, 1] distributions), and
• the marginal distributions of the individual variables.
More precisely, let F (x1, x2, ..., xn) be the cumulative distribution for variables
{x1, x2, ..., xn}.
This means that F (x1, x2, ..., xn) is the function such that
F (x1, x2, ..., xn) = Prob [X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn] (47)
Suppose that we have a random vector of realizations for x ({x1, x2, ..., xn})
and a vector of realizations for y ({y1, y2, ..., yn}).
We have learnt that, if we put the realizations {x} into Fx and the realizations
{y} into Fy we get back new random variables, ux and uy , drawn from U [0, 1].
So, let’s do it:
{ux1 , ux2 , ..., uxn} = {Fx (x1) , Fx (x2) , ..., Fx (xn)}
{uy1 , uy2 , ..., uyn} = {Fy (y1) , Fy (y2) , ..., Fy (yn)}
The joint distribution of the pairs (ux, uy ) obtained in this way is a copula.
Why so? Because the marginals of this joint distribution are uniform distribu-
tions — and this is just the definition of a copula.
This decomposition is very useful, because the copula contains all the infor-
mation about the dependence among the variables, and no information about
their marginals.
Suppose that I have determined the copula, C, ie, the joint distribution function
of the uniform variates, ux and uy :
C (ux, uy ) = Prob [U x ≤ ux, U y ≤ uy ] (48)
Then, if we find a way to generate samples {ux, uy } from the copula joint
distribution, we can generate a sample from the arbitrary original distributions,
Fx and Fy , simply by setting
x = Fx−1 (ux)    (49)
y = Fy−1 (uy)    (50)
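Here is a sketch of this recipe in MatLab (assuming a Gaussian copula and, purely for illustration, Student-t marginals with 4 and 6 degrees of freedom):

rho = 0.6; n = 10000;                     % illustrative correlation and sample size

% 1. Sample from the copula: correlated Normals mapped to uniforms
z = mvnrnd([0 0], [1 rho; rho 1], n);
u = normcdf(z);                           % (ux, uy): uniform marginals, Gaussian dependence

% 2. Map each uniform through the inverse of the desired marginal CDF
x = tinv(u(:,1), 4);                      % x = Fx^(-1)(ux)
y = tinv(u(:,2), 6);                      % y = Fy^(-1)(uy)

scatter(x, y, 2);                         % fat-tailed marginals, Gaussian dependence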
cov = Σ = [0.1, −0.05; −0.05, 0.25]    (51)
corr = ρ = [1, −0.1; −0.1, 1]    (52)
Figure 28:
Figure 29:
Suppose that we assume that the copula was a Gaussian copula.
C (ux, uy ) = Φnorm (Φ−1x,norm (ux) , Φ−1y,norm (uy))    (53)
where Φnorm is the joint (bivariate) Normal CDF and Φ−1x,norm, Φ−1y,norm are
the inverse marginal Normal CDFs.
It also means that the joint density created by the Gaussian copula will be a
Bivariate Normal density.
Look back at Fig (27): look at the contours, which bulge in and out, and
which cannot be turned into circles by rescaling the axes (an indication that
the distribution is not elliptical).
Clearly, for a dependence as complex as this, we cannot hope for the simple
correlation coefficient to do an adequate job.
The really important message is that, even with the marginals and the co-
variance matrix and the correlation matrix all perfectly recovered, the resulting
bivariate density can be very different from the true one.
By the way, this is how the data were created in MatLab. Not elegant, but it
does the job.
The complex density is obtained in Fs. You can play with it yourselves.
% Grid on which the densities are evaluated
x1 = -3:.15:3; x2 = -3:.15:3;
[X1,X2] = meshgrid(x1,x2);

% First bivariate Normal density (F1)
mu = [0, 0];
Sigma = [.25, 0.3; 0.3, 1];
F1 = mvnpdf([X1(:) X2(:)],mu,Sigma);
F1 = reshape(F1,length(x2),length(x1));
surf(x1,x2,F1);
caxis([min(F1(:))-.5*range(F1(:)),max(F1(:))]);
axis([-3 3 -3 3 0 .4])
xlabel('x1'); ylabel('x2'); zlabel('Probability Density');

figure
% Second bivariate Normal density (F2), with negative covariance
mu = [0, 0.1];
Sigma = [.25, -0.4; -0.4, 1];
F2 = mvnpdf([X1(:) X2(:)],mu,Sigma);
F2 = reshape(F2,length(x2),length(x1));
surf(x1,x2,F2);
caxis([min(F2(:))-.5*range(F2(:)),max(F2(:))]);
axis([-3 3 -3 3 0 .4])
xlabel('x1'); ylabel('x2'); zlabel('Probability Density');

figure
% Third bivariate Normal density (Fs), used for the figures in the text
mu = [0, 0.05];
Sigma = [.25, -0.05; -0.05, 0.98];
Fs = mvnpdf([X1(:) X2(:)],mu,Sigma);
Fs = reshape(Fs,length(x2),length(x1));
surf(x1,x2,Fs);
caxis([min(Fs(:))-.5*range(Fs(:)),max(Fs(:))]);
axis([-3 3 -3 3 0 .4])
xlabel('x1'); ylabel('x2'); zlabel('Probability Density');
Take two random variables, X and Y . Let F (x, y) be their joint CDF, and
Fx (x) and Fy (y) their marginals.
We know that
F (x, y) = C (Fx (x) , Fy (y))
First of all, the copula function gives a joint cumulative distribution. So, it
must return values between 0 and 1.
Suppose that I have chosen a very large value of x. Then the probability that
a draw of X will give a value smaller than x is very high. Let’s say Fx (x) ≃ 1.
Now pick a very low (very negative) value of y, so that Fy (y) is vanishingly small.
Irrespective of whether X and Y are concordant, the joint probability will also
be very low (because a finite number times a vanishingly small number must
be equal to almost zero). So the copula must tend to 0 as either of its inputs
tends to 0.
Take the standard Gaussian distribution, and Gaussian marginals.
Then
P [X ≤ 5, Y ≤ 5] = F (5, 5) ≃ 1 (57)
and
Fx (5) ≃ 1 (58)
Fy (5) ≃ 1 (59)
But we know
F (5, 5) = C (Fx (5) , Fy (5)) ≃ 1    (60)
So, irrespective of whether X and Y are positively or negatively correlated, as
the cumulative marginals tend to 1 together, the joint distribution must tend
to 1, and so must the copula as its two inputs tend to 1.
24 Building a Copula by Hand (Empirical Copulae)
Φ (u, v) = P (U ≤ u, V ≤ v) (61)
and approximating it as
Φ (u, v) = P (U ≤ u, V ≤ v) ≃ # (U ≤ u, V ≤ v) / N    (62)
This is what we get (the number count in the numerator first, then the whole
calculation), with N = 45:
Figure 33:
Figure 34:
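For completeness, here is a sketch of the count in equation (62) (using simulated data in place of the N = 45 points shown in the figures):

N  = 45;
xy = mvnrnd([0 0], [1 0.5; 0.5 1], N);    % placeholder data

% Turn each series into (approximate) uniforms via its own empirical CDF (ranks/N)
u = tiedrank(xy(:,1)) / N;
v = tiedrank(xy(:,2)) / N;

% Empirical copula: for each grid point (a, b), count the pairs with u <= a and v <= b
gridpts = 0:0.1:1;
C = zeros(length(gridpts));
for i = 1:length(gridpts)
    for j = 1:length(gridpts)
        C(i,j) = sum(u <= gridpts(i) & v <= gridpts(j)) / N;
    end
end
surf(gridpts, gridpts, C');               % the empirical copula surface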
Figure 36: