Lecture 4
Estimating the Joint Distribution of Risk Factors
Riccardo Rebonato
Key Words and Concepts:
• marginal distributions,
• copula,
• inverse function.
Up to now we have assumed that someone had given us the joint distribution
of risk factors and that we simply had to sample from it. We did not explain
where this probability distribution was coming from. We fill in this gap in this
chapter.
3. lastly we put the two sets of information together using a copula approach.
Unless we use the historical-simulation approach, this copula-based procedure
is almost universal.
For instance:
• if, instead, we are interested in the risk profile of the whole portfolio of a
large bank, we will need, and want, to adopt a much more broad-brush
approach.
Not surprisingly, different tools will come to the fore.
So, in this lecture we will look at:
The efficient Monte Carlo procedure alluded to in point 4. will be one of our
preferred techniques to achieve our goal.
2 Transforming the Marginal Distribution of Random Variables
In this part of the course we are going to learn how to transform the marginal
distribution of a random variable to a different marginal in such a way
that something important is preserved (what this ‘something important’ is will
become apparent in the following).
Then, from the Uniform distribution we transform again to the target marginal
distribution of choice.
Pr[x ≤ a] = E[1_{x≤a}], (2)
where 1_{condition} is the indicator function, which is equal to 1 if condition =
TRUE, and equal to zero otherwise.
In what follows the strictly increasing function in (1) will be the inverse function,
φ^{-1}.
We start from the standard normal distribution, N , and then we generalize the
setting.
Let z be drawn from N:
z ~ N(µ, σ^2).
u = N(z). (3)
(Again, a sketch should convince you that this is true: the cumulative distribution
is just a straight line that goes up at 45 degrees.)
Pr[z ≤ N^{-1}(u_0)] = u_0 (4)
where we have made use of the fact that N(·) and N^{-1}(·) are both strictly
increasing functions (see below).
To make sure that we understand the last line, note that the inverse N^{-1}(u_0)
is the Gaussian random variate whose cumulative probability is u_0.
The probability that a random Normal variate, z, is smaller than the Gaussian
random variate, N^{-1}(u_0), whose cumulative probability is u_0, is just the
cumulative probability, u_0.
Let’s do the last step slowly again.
Pr[u ≤ u_0] = u_0 (9)
and therefore u is uniformly distributed between 0 and 1 (because, as we saw
before, the uniform distribution is the distribution for which Pr[u ≤ u_0] = u_0).
By the way, in the derivation we have made use of the fact that, for a strictly
increasing function f, the events {x ≤ a} and {f(x) ≤ f(a)} coincide.
But note that we did not make use of any property of the Normal distribution
to obtain this result.
If I draw from this arbitrary distribution lots and lots of random variates, and I
calculate the cumulative distribution corresponding to all these different random
values, the resulting quantities are uniformly distributed.
This means that, given any distribution, we know how to transform this distri-
bution to the uniform U [0, 1].
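A minimal Python sketch of this fact (the lecture's own code is in MATLAB; the exponential distribution below is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(42)
# draws from an arbitrary distribution (here: exponential)
x = expon.rvs(size=100_000, random_state=rng)
# apply that distribution's own cumulative distribution function
u = expon.cdf(x)
# u should now be uniform on [0, 1]: mean ~ 1/2, variance ~ 1/12
```

Any other continuous distribution would do just as well: the result does not depend on which distribution we start from.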
Now we are going to see how to transform random variables drawn from a
given distribution, φ, to ‘equivalent’ random variables drawn from a different
distribution, χ.
To fix ideas we start from a Student t distribution with, say, 2.9 degrees of
freedom.
We call these variables t1, and we display their histogram in Fig (2).
Figure 2: 10,000 random realizations drawn from a Student-t distribution with
2.9 degrees of freedom.
Now we apply the cumulative Student-t function with 2.9 degrees of freedom
to the variables t1 that we have generated.
So far, so unsurprising.
But uniform variates are uniform variates: they do not keep a label telling
statisticians from which distribution ‘they were originally drawn’.
This means that now we can transform these U [0, 1] variables to variables
distributed according to any distribution for which we know how to calculate
the inverse.
Figure 3: The 10,000 random variables drawn from the 2.9-degrees-of-freedom
Student-t distribution in Fig (2), transformed to the uniform U[0, 1] distribution
as described in the text.
So, for instance, if we know how to calculate the inverse of, say, a gamma
distribution, we can obtain, from the original Student-t variables, the ‘equivalent’
Gamma-distributed variables.
Step by step:
It means that if, in the distribution we started with, one draw was so high
that only, say, 5% of draws are on average higher, then in the new equivalent
distribution we are going to create a draw so high that only 5% of draws (in
the new distribution) are on average higher.
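This percentile-preserving map is easy to check in a few lines of Python (the lecture uses MATLAB; the Gamma target and its shape parameter a = 2 are my illustrative choices, not from the lecture):

```python
import numpy as np
from scipy.stats import t, gamma

rng = np.random.default_rng(0)
nu = 2.9                                        # degrees of freedom, as in the lecture
t1 = t.rvs(df=nu, size=10_000, random_state=rng)
u = t.cdf(t1, df=nu)                            # Student-t draws -> U[0, 1]
g = gamma.ppf(u, a=2.0)                         # U[0, 1] -> Gamma-distributed draws
# because both maps are increasing, the ranking (and hence each draw's
# percentile) is preserved in the new distribution
```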
Figure 5: The 10,000 U[0, 1] random variables which had been obtained from
the 10,000 Student-t variates with 2.9 degrees of freedom, transformed to a
Gaussian (N(0, 1)) distribution.
3 Applications to Monte Carlo Simulation
Suppose that we have a bunch of variables, and we have ascertained that they
are distributed according to different distributions. Using the inverse, we can
first transform all these variables to uniform random variables, and from these
to normal random variables.
Given these normal random variables we can estimate the matrix of correlation
among them.
Once we know that a bunch of jointly normally distributed variables has a known
vector of means, and a known covariance matrix (a vector of variances and a
correlation matrix), then we can very easily carry out a Monte Carlo simulation
for Gaussian variates, as we have learnt to do in Lecture 2.
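A minimal sketch of such a Gaussian Monte Carlo step in Python (the correlation matrix, means and standard deviations below are illustrative, not the lecture's data):

```python
import numpy as np

rng = np.random.default_rng(1)
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])   # illustrative correlation matrix
mu = np.array([0.0, 0.1, -0.1])      # illustrative means
sigma = np.array([1.0, 2.0, 0.5])    # illustrative standard deviations

L = np.linalg.cholesky(corr)         # Cholesky factor, as in Lecture 2
z = rng.standard_normal((200_000, 3))
x = mu + sigma * (z @ L.T)           # correlated Gaussian draws
```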
But, you may say, this is not what we were interested in: we want to simulate
vectors drawn from the complex distribution we started with, not from a multivariate
Gaussian.
But, if a Gaussian copula does a decent job at describing the empirical dependence,
then in the limit the procedure becomes quasi exact.
It gets better.
As we shall see in the next section, with a tiny little bit of care we can use the
empirical marginal distributions, and follow the same procedure.
We are going to use the same data that we used in the baby example in Lecture
2, namely six time series, covering more than 10 years (2736 observations) for
three equity indices (the S&P500, the FTSE100 and the DAX), and three swap
rates (the 10-year, 5-year and 2-year swap rates).
We have the following vector of sensitivities, h_i:

    Asset      Sensitivity h_i
    S&P500         128,000
    FTSE100        −72,000
    DAX            −40,000
    Swap10y        800,000
    Swap5y         600,000
    Swap2y         200,000        (11)
The changes in the underlying time series and the sensitivities are related to
the changes in the P&L by
P&L_i = Σ_k h_k (x_{k,i} − x_{k,i−1}). (12)
To be clear, if from day i − 1 to day i the S&P index changed from 2020 to
2012, then the associated P&L would be:
P&L_i^{S&P500} = 128,000 × (2012 − 2020) = −$1,024,000 (13)
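As a quick arithmetic check of the worked example (in Python; the sensitivities come from the table above, and the changes in the other five series are set to zero purely for illustration):

```python
# sensitivities h_i from the table above
h = {"S&P500": 128_000, "FTSE100": -72_000, "DAX": -40_000,
     "Swap10y": 800_000, "Swap5y": 600_000, "Swap2y": 200_000}

# one day's changes: only the S&P move is from the worked example,
# the other changes are zero for illustration
dx = {"S&P500": 2012 - 2020, "FTSE100": 0.0, "DAX": 0.0,
      "Swap10y": 0.0, "Swap5y": 0.0, "Swap2y": 0.0}

# P&L = sum over assets of sensitivity times change
pnl = sum(h[a] * dx[a] for a in h)
```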
And, before we get started, let’s standardize each data series by subtracting its
mean and dividing by its standard deviation.
Let’s look first at the normalized histogram for the empirical S&P500 returns.
(By the way, from now on I will no longer say that the data have been normalized
— we just have to remember it.)
The empirical cumulative distribution function and its linear interpolation are
shown in Fig (6).
Figure 6:
From the (interpolated) empirical cumulative distribution, φSP , we know how
to obtain the associated uniform variates:
Fig (8) shows how uniform the associated random variables obtained by following
this procedure turned out to be:
Figure 7:
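The forward transform just described (empirical series to uniforms) can be sketched with a rank-based empirical CDF; the series below is a synthetic stand-in for the 2736 standardized returns, not the lecture's data:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(2)
x = rng.standard_normal(2736)     # stand-in for one standardized return series
n = len(x)
# empirical CDF evaluated at each observation;
# dividing by (n + 1) keeps the uniforms strictly inside (0, 1)
u = rankdata(x) / (n + 1)
```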
From the associated uniform variates we can go back to the original distributions.
We just have to do
x_sim^{SP} = φ_SP^{-1}(u^{SP})
Even in this case, however, I could simulate far more variables than in the
original sample (this is where the interpolation becomes useful), which certainly
looks good, and may even be useful.
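A sketch of this interpolated inverse in Python (again on a synthetic stand-in series; the lecture's code is in MATLAB): note that it can produce far more draws than the original sample size, but never a value outside the observed range.

```python
import numpy as np

rng = np.random.default_rng(3)
sample = np.sort(rng.standard_normal(2736))   # stand-in for one data series
n = len(sample)
levels = np.arange(1, n + 1) / (n + 1)        # CDF levels at the order statistics

def phi_inv(u):
    """Linearly interpolated inverse of the empirical CDF."""
    return np.interp(u, levels, sample)

# simulate many more points than there are data
u_new = rng.uniform(levels[0], levels[-1], size=50_000)
x_sim = phi_inv(u_new)
```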
Exercise 1 Under what conditions am I adding information by running more
simulations than the original number of data points?
We can do the same for the other five time series.
We do not show the results because they are very similar, and similarly good.
Let’s go back to the uniform variates obtained for the six time series, and let’s
collate them in a [2736 × 6] matrix.
As we do so, let’s keep careful track of the time ordering, so that the kth
uniform variate associated with, say, the FTSE, is lined up in the matrix with
the kth variate for all the other time series.
Figure 9:
Having done this, let’s transform each vector of uniform variates not back to
the original distribution, or to some exotic distribution such as the gamma that
we saw before, but to a Gaussian.
Why a Gaussian? Because we know how to draw (and hence simulate) easily
from a multivariate Gaussian distribution.
The idea is to simulate a zillion correlated Gaussians, and then to transform
them back to the original variables, while preserving (almost) the original
correlation.
Since we are now in Gaussland we are in the perfect place to calculate correlations,
which we do and show in Fig (10).
Figure 10:
Could we have calculated the correlation matrix also using the original data, or
perhaps even the associated uniform variates?
Of course we could, and Figs (11) and (12) show what we would have obtained.
As a matter of fact, the marginals and the associated scatterplots look extremely
different, but the correlation matrices themselves are remarkably similar.
Figure 11:
Discussion.
Figure 12:
Now that we have a correlation matrix, a vector of standard deviations for each
variable, and a vector of means, we can easily run a Monte Carlo simulation
with Gaussian variates, as we saw in Lecture 2.
At this point there are two avenues open to us, one that leads to perdition and
despair, the other to joy and success.
The bad way to do things is to stop at this stage, and just use the Gaussian
variates (after multiplying each by its standard deviation and adding back in
the mean) to calculate the P&Ls.
We note that there are very big differences between the empirical and the
simulated cumulative distributions.
And, by the way, Fig (14) compares the P&L distribution obtained via Historical
Simulation with the distribution obtained using the convoluted Gaussian
route.
Figure 13:
Figure 14:
So, if this is just an expensive way to obtain garbage, what is the path to joy?
Clearly, once we have obtained the correlated Gaussian draws we have to transform
each set of variates back to the original empirical distribution, by going
via the associated uniforms.
So, first we do, for each time series k,
u^k = N(z^k) (15)
(where the z^k are the associated Gaussian draws for variable k).
Next we do
x^k = φ_k^{-1}(u^k) = φ_k^{-1}(N(z^k)) (16)
The same comparison between the empirical and the simulated cumulative
distribution now looks as shown in Fig (15):
The two curves are now so closely on top of each other that they cannot be
distinguished.
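The whole path to joy — empirical marginals to uniforms to Gaussians, estimate the correlation, simulate, and transform back — can be sketched end to end in Python (synthetic stand-in data; the lecture's code is in MATLAB):

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(4)
n, k = 2736, 3
# synthetic stand-in for three dependent standardized return series
common = rng.standard_normal(n)
data = np.column_stack([0.7 * common + 0.7 * rng.standard_normal(n)
                        for _ in range(k)])

# 1) empirical marginals -> uniforms -> Gaussians
u = rankdata(data, axis=0) / (n + 1)
z = norm.ppf(u)

# 2) estimate the correlation matrix in 'Gaussland'
corr = np.corrcoef(z, rowvar=False)

# 3) simulate many correlated Gaussian vectors
L = np.linalg.cholesky(corr)
zsim = rng.standard_normal((100_000, k)) @ L.T

# 4) back through the uniforms to the empirical marginals
usim = norm.cdf(zsim)
sorted_data = np.sort(data, axis=0)
levels = np.arange(1, n + 1) / (n + 1)
xsim = np.column_stack([np.interp(usim[:, j], levels, sorted_data[:, j])
                        for j in range(k)])
```

The simulated vectors recover both the empirical marginals and (almost) the original correlation.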
Does this mean that we have found a perfect way to carry out a simulation
from an arbitrary (and arbitrarily complex) joint distribution?
Not quite.
To understand why not, let’s continue the exercise one more step, namely
let’s compare the empirical and simulated cumulative distributions of the P&Ls
instead of the variables xk . (We obtained the empirical distribution of P&Ls
using Historical Simulation.)
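One way to quantify such a comparison is the maximum vertical gap between the two cumulative distributions (a Kolmogorov–Smirnov-type statistic); the samples below are synthetic stand-ins for the HS and MC P&L draws:

```python
import numpy as np

rng = np.random.default_rng(5)
emp = rng.standard_normal(2736)       # stand-in for the Historical Simulation P&Ls
sim = rng.standard_normal(100_000)    # stand-in for the Monte Carlo P&Ls

grid = np.sort(emp)
F_emp = np.arange(1, len(emp) + 1) / len(emp)
F_sim = np.searchsorted(np.sort(sim), grid, side="right") / len(sim)
gap = np.max(np.abs(F_emp - F_sim))   # max vertical distance between the CDFs
```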
But why is it not as good as what we obtained when we looked at the univariate
distributions?
Clearly our Achilles’ heel is in the description of the correlation (co-dependence,
really) among the original variables.
For all we know, they may have had a very different type of tail dependence
than what is allowed by the Gaussian multivariate distribution;
or positive changes may have been correlated more or less strongly than negative
changes;
or whatever.
In general, a copula other than the Gaussian could have been the best
description for each individual pair.
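The tail-dependence point can be made concrete in Python: a Gaussian copula and a Student-t copula with the same correlation produce very different joint tail behaviour (the correlation 0.7 and 3 degrees of freedom below are illustrative choices):

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(6)
n, rho, dof = 500_000, 0.7, 3.0
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

# Gaussian copula pair
ug = norm.cdf(rng.standard_normal((n, 2)) @ L.T)

# Student-t copula pair: same correlation, but fatter joint tails
w = rng.chisquare(dof, size=(n, 1)) / dof
ut = t.cdf((rng.standard_normal((n, 2)) @ L.T) / np.sqrt(w), df=dof)

# frequency with which BOTH variables exceed their 99th percentile
q = 0.99
joint_g = np.mean((ug[:, 0] > q) & (ug[:, 1] > q))
joint_t = np.mean((ut[:, 0] > q) & (ut[:, 1] > q))
```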
This does not mean at all that the simulated variables will have Gaussian
tails: the marginals are all very well recovered.
You can check that, apart from numerical noise, it is essentially the same as the
VaR calculated with the analytic method in the previous lecture, and reported
again below in Fig (18) for ease of comparison.
Figure 18:
In case you want to play with this at home, I have written below a simple piece
of code that does the job. Caveat emptor.
5 What Have We Done?
Today we have learnt how to draw from the unconditional joint distribution of
several variables representing risk factors.
We have not fitted the marginals to any named distribution. We have used the
empirical marginals instead.
To carry out the draws of the high-dimensional variates, we have had to impose
a Gaussian dependence among the variables. This is the weakest point of the
procedure, and the most difficult to fix.
What remains to be done?
Remember that I would like to draw not from the unconditional joint distribution
of risk factors, but from the joint distribution of risk factors that applies to
today.
This is the first fix, and it is something that we will learn how to do in a future
lecture.
Why did I say before that using the empirical marginals was good and bad?
Discuss (Suppose that the length of each vector was 4 days, and then I drew
1,000,000,000 joint realizations.)
What could we do to fix the fact that, as we implemented the procedure, we
are never going to draw a value for any risk factor larger or smaller than anything
in our data set?
What happens if I take a pencil and I extend monotonically (right and left) the
empirical cumulative distribution by half a mile? Do I still have a bona fide
cumulative distribution?
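A sketch of the pencil extension in Python (the extension points, ten units beyond the sample, are my arbitrary ‘half a mile’): the result is still non-decreasing, starts at 0 and ends at 1, so it is a bona fide cumulative distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
sample = np.sort(rng.standard_normal(2736))   # stand-in for one data series
n = len(sample)
levels = np.arange(1, n + 1) / (n + 1)

# extend the empirical CDF monotonically beyond the observed range,
# pinning it to 0 and 1 at arbitrary far-out points
xs = np.concatenate(([sample[0] - 10.0], sample, [sample[-1] + 10.0]))
Fs = np.concatenate(([0.0], levels, [1.0]))
```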
Are there principled ways of making this extension?
Yes, there are: they are given by the Extreme Value Theory — soon to appear
on these screens.
When we put these pieces together (i.e., when we go from the unconditional to
the conditional joint distribution of risk factors, and when we allow for occurrences
larger/smaller than anything in our data set), we really have an industrial-strength
(i.e., real-life, not baby-toy) Monte Carlo simulation tool.
If I had a lot of time and money, this would be my favourite tool to calculate
a hypothetical distribution of risk factors (and hence a P&L distribution).
Discuss: How would you check whether the Gaussian copula assumption is
good enough?
Think of comparing HS with the same number of MC simulations.
Discuss: In what respects is Historical Simulation better / worse than the fancy
MC we have just learnt to run?
(Think of altering dependence, exploring tails, making conditional, non-Gaussianity
of the copula)
6 The Code
function [ titlestring ] = ChooseString( indexstring )
% Returns the display name for series number indexstring.
if indexstring == 1
    titlestring = 'S&P500';
end
if indexstring == 2
    titlestring = 'FTSE100';
end
if indexstring == 3
    titlestring = 'DAX';
end
if indexstring == 4
    titlestring = 'Swap10y';
end
if indexstring == 5
    titlestring = 'Swap5y';
end
if indexstring == 6
    titlestring = 'Swap2y';
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ totunif, totgauss, corrorig, corrunif, ...
           corrgauss, meanorig, stdorig, meangauss, ...
           stdgauss, absdeldata, standdata] = DataAnalysis1( )
%
% The function creates equivalent Gaussian variables that
% display the same correlation as the original variables
% read in by the function ReadInData().
[npoints, numseries] = size(standdata);
for kser = 1:numseries       % loop over the series (opener missing in the notes)
    x = standdata(:,kser);
    titlestring = ChooseString( kser );
    hist(x,-6.4:0.2:6.4); % plots the original data
    title(titlestring);
    xlabel('Z (empirical)');
    ylabel('Frequency');
    xlim([-4 4]);
    figure
    utest = rand(10000,1);
    ztest = Finv(utest);  % Finv: interpolated inverse empirical CDF for series kser
    hist(ztest,-8:0.2:8);
    xlim([-5 5]);
    title(titlestring);
    xlabel('Z (simulated)');
    ylabel('Frequency');
    figure
    for k=1:npoints
        unif(k)=Fcum(x(k));  % Fcum: empirical cumulative distribution for series kser
    end
    totunif(1:npoints,kser)=unif(1:npoints);
end
corrorig = corr(standdata);
corrunif = corr(totunif);
corrgauss = corr(totgauss);
corrplot(standdata, 'varnames',{'SP500','FTSE100','DAX','Sw10y','Sw5y','Sw2y'});
title('Correlation (original data)');
meanorig = mean(absdeldata);
stdorig = std(absdeldata);
meangauss = mean(totgauss);
stdgauss = std(totgauss);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[numseries, ~] = size(corrmatrix) ;
[numsimul, ~] = size(absdeldata) ;
sensvector=[128000; -72000; -40000; 800000; 600000; 200000];
%% This is all done with Gaussian simulated data!!
lowertriang = chol(corrmatrix,'lower');
for ksim = 1:numsimul
    draw = normrnd(0,1,numseries,1);
    shockvectorGauss = diag(stdgauss) * lowertriang * draw ;
    shockmatrixGauss(ksim,1:numseries) = shockvectorGauss(1:numseries)';
    detvector = meangauss' ;
    standreturnvector = detvector + shockvectorGauss;
    totPandLMC(ksim,1) = 0;
    totPandLHS(ksim,1) = 0;
    for kser = 1:numseries
        returnmatrix(ksim,kser) = (standreturnvector(kser,1) * stdorig(1,kser)) + meanorig(1,kser);
    end
end
PandLGauss = PandLMC;
%%
% Gaussian simulation is over; now analyze the results of the Gaussian
% experiment
%
%%
for kser = 1:numseries          % loop opener missing in the notes
    x = shockmatrixGauss(:,kser);   % assumed input: the simulated Gaussian draws
    for k=1:numsimul
        unifGauss(k)=Fcum(x(k));
    end
    totunifGauss(1:numsimul,kser)=unifGauss(1:numsimul);
end
%%
% Now I have the uniform variates that correspond to the
% random draws. I have obtained them from the simulated
% Gaussian variates with the right correlation. I must
% transform them to the original empirical distributions.
for kser = 1:numseries          % loop opener missing in the notes
    yunif = totunifGauss(:,kser);
    for k=1:numsimul
        empir(k)=Finv(yunif(k));
    end
    empshocxkmatrix(1:numsimul,kser)= empir(1:numsimul);
end
%%
% Now I can start the simulation again with the appropriate
% variables with the empirical marginal distributions and with the
% correct correlation matrix
for ksim = 1:numsimul           % loop opener missing in the notes
    detvector = meangauss' ;
    empshocxkvector = empshocxkmatrix(ksim,:);
    empreturnvector = detvector + empshocxkvector';
    totPandLMC(ksim,1) = 0;
    totPandLHS(ksim,1) = 0;
    for kser = 1:numseries
        returnmatrix(ksim,kser) = ...
            (empreturnvector(kser,1) * stdorig(1,kser)) + meanorig(1,kser);
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ absdeldata ] = GetAbsDiff( data, recordlength, numseries)
%
for kser=1:numseries
for krec=1:recordlength-1
absdeldata(krec,kser) = data(krec+1,kser)-data(krec,kser);
end
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ data, recordlength, numseries ] = ReadInData()
% The columns contain SOX, UKX, DAX, USSW10, USSW5, USSW2
data = csvread(’C:\Users\r_rebonato\Documents\MATLAB\ProjectVaR\FIEquityDa
[recordlength, numseries] = size(data);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ standdata ] = StandardizeData( data )
%
[recordlength, numseries] = size(data);
for kser=1:numseries
average = mean(data(:,kser));
stdev = std(data(:,kser));
for krec=1:recordlength
standdata(krec,kser)= (data(krec,kser)- average)/stdev;
end
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%