Lecture 2
Obtaining the Distribution of Risk Factors
Riccardo Rebonato
Key Words and Concepts:
• Monte Carlo,
• Historical Simulation,
• Normal Approximation,
• VaR,
• Market Factors.
1 Plan of the Lecture
Let’s work backwards from our ultimate goal, which is to obtain some reasonable measures of the market risk of our portfolio.
The changes in value of these individual positions are given by a mapping from
the changes in the risk factors that affect our portfolio to the P&L of each
position.
The mapping from the changes in risk factors to the P&L in the individual
positions may be complex (and approximate), but it is deterministic.
As we shall see, if we want to use our risk measures for regulatory purposes,
we will have to contend with a very large number of risk factors — where by ‘a
very large number’ I really mean ‘a very large number’: they could be of the
order of a quarter of a million.
This poses practical problems, that we will also discuss in the next lectures.
So, in this lecture we are going to deal with a very small number of risk factors in order to understand the mechanics of what we have to do.
In order to see the whole process soup to nuts, we will carry the treatment all the way to obtaining the most popular risk measure (VaR) with baby implementations of the three methods mentioned above.
The versions of the historical, parametric and Monte Carlo methods we introduce in this lecture mainly have a pedagogical purpose, and will not be good enough to use in real life, but we will refine the three techniques as we proceed with the course.
2 Defining VaR (and Expected Shortfall)
Since we are going to compute Value at Risk (VaR) in three different ways, it
seems a good idea to define first exactly what it is. There are many definitions
of VaR.
(The give-away that this definition cannot be true is that there is no mention
of the percentile level.)
This is better, but it is still not totally clear what ‘being X percent certain’
means.
One senior bank manager I knew thought that the higher the percentile, the
more sure we were.
Does it make sense that we can be more sure about 1-in-10,000-years events than about 1-in-10-days events?
This is certainly not what John Hull means, but someone could get confused. So, here is another definition (which I prefer).
Definition 2 The N-day VaR at the X percentile level is the loss that a portfolio will on average exceed only in (100-X) N-day periods out of 100.
Can we do better?
Let’s build a probability density of N-day profits and losses.
The area under the density curve is clearly equal to 1 (because probability
densities are normalized to 1.)
Look up on the x axis the loss value such that (100-X)/100 of the area under
the density curve is to the left of it.
So, since Equation (1) is the definition of a percentile, VaR itself is just a
percentile.
Definition 4 The N-day VaR at the Xth percentile level is the smallest number, x, (the smallest loss) such that $\Phi_L(x) \ge X$:
$$\mathrm{VaR}^X_{N\text{-day}} = \inf\left\{x : \Phi_L(x) \ge X\right\}. \quad (2)$$
Of course, what we have just given is simply the definition of a percentile.
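To make Definition 4 concrete, here is a minimal numerical sketch; the loss sample is a hypothetical standard normal, not data from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)
losses = rng.normal(0.0, 1.0, 100_000)  # hypothetical 1-day losses

X = 0.99
# Definition 4: VaR is the smallest x with Phi_L(x) >= X, ie simply the
# Xth percentile of the loss distribution
var_99 = np.quantile(losses, X)

# Sanity check: losses exceed VaR in roughly (1 - X) of the cases
print(var_99, np.mean(losses > var_99))
```

The empirical 99th percentile sits close to the theoretical 2.33 of a standard normal, and the loss exceeds it on roughly 1% of days.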
In this section I am not asking whether VaR has been used properly to manage
financial risk.
I am asking the simpler question of whether VaR is a good risk measure, where
‘risk measure’ has the precise meaning we discussed in Lecture 1.
Let’s start from something about which a lot of ink has been spilled.
The success or the failure of each project is independent of the success or failure
of the other.
Each project has a probability of 0.02 of losing $10m and a probability of 0.98
of losing $1m.
(They can’t be great projects, since they only seem to lose, but let’s not worry
about this.)
What is the 97.5-percentile VaR for each project separately?
Given independence, we have the following possible outcomes, with their prob-
abilities:
Prob                             Loss ($m)
0.980 × 0.980      = 0.9604       1
2 × (0.02 × 0.980) = 0.0392      11
0.02 × 0.02        = 0.0004      20
Figure 1: [The cumulative distribution of losses for a single project.]
This is the associated cumulative probability function (Fig (2)). As you can immediately read from the figure, the VaR is now $11m.
So, the 97.5-percentile VaR from putting the two portfolios together is (way) larger than the sum of the VaRs from the two single-project portfolios.
In Fig (1), for instance, the 97.5-percentile VaR would have remained $1m even
if the worst loss had been $1b, instead of $10m.
We could average all the losses in the tail past the VaR level, weighted by their
probability of occurrence.
This means that we could compute a new quantity (called Conditional Expected Shortfall, or, sometimes, CVaR, or Average VaR) as follows
$$\mathrm{CES} = \frac{\int_{\mathrm{VaR}_X}^{+\infty} x\,\varphi_L(x)\,dx}{\int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx} \quad (3)$$
where $\mathrm{VaR}_X$ is the X-percentile VaR and
$$\int_{-\infty}^{x} \varphi_L(u)\,du = \Phi_L(x). \quad (4)$$
So, the Conditional Expected Shortfall is the expected loss, conditional on
the loss being greater or equal to VaR, ie, to the X percentile of the loss
distribution.
From this it follows that another way to write this is the following
$$\mathrm{CES} = \frac{1}{1-X}\int_X^1 \mathrm{VaR}_u\,du. \quad (5)$$
To see the equivalence, start from the denominator (which is easier), and add and subtract the same quantity, $\int_{-\infty}^{\mathrm{VaR}_X} \varphi_L(x)\,dx$.
Then we have:
$$\int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx = \int_{-\infty}^{\mathrm{VaR}_X} \varphi_L(x)\,dx + \int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx - \int_{-\infty}^{\mathrm{VaR}_X} \varphi_L(x)\,dx$$
$$= 1 - \Phi_L(\mathrm{VaR}_X) = 1 - \Phi_L\!\left(\Phi_L^{-1}(X)\right) = 1 - X \quad (6)$$
where we have made use of the fact that V aRX is the Xth percentile of the
cumulative distribution of losses, ΦL (x), and that the father of the son of
Johnny is Johnny.
So we have established that
$$\int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx = 1 - X. \quad (7)$$
This takes care of the denominator, ie of the term $\frac{1}{1-X}$ in the expression
$$\mathrm{CES} = \frac{1}{1-X}\int_X^1 \mathrm{VaR}_u\,du. \quad (8)$$
For the numerator, consider the expression $\int_X^1 \mathrm{VaR}_u\,du$.
Here we are adding up all the quantities $\mathrm{VaR}_u$, which are just the losses at the u confidence level, for all confidence levels from X to 1.
So, we are adding up all the losses in the tail beyond V aRX .
Note that this is an equal-weight average.
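The equivalence between Equation (3) (the probability-weighted average of the tail losses) and Equation (5) (the equal-weight average of the tail VaRs) can be checked numerically. The standard-normal loss sample below is my own illustration, not the lecture's data:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = np.sort(rng.normal(0.0, 1.0, 200_000))  # hypothetical loss sample

X = 0.95
var_x = np.quantile(losses, X)

# Equation (3): average of the losses in the tail beyond VaR_X
es_tail = losses[losses >= var_x].mean()

# Equation (5): equal-weight average of VaR_u for u from X to 1
u = np.linspace(X, 1.0 - 1e-6, 2_000)
es_int = np.quantile(losses, u).mean()

print(es_tail, es_int)  # the two numbers agree (about 2.06 for this case)
```

For a standard normal at X = 0.95 the true Expected Shortfall is $\varphi(1.645)/0.05 \approx 2.06$, and both estimates land there.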
And we shall see in the next lectures that, for any distribution, the values of the cumulative distribution function evaluated at random draws from that distribution are uniformly distributed.
This means that if I draw from any distribution lots and lots of random variates, and I calculate the cumulative distribution corresponding to all these different random values, the resulting quantities are uniformly distributed. [Drawing on whiteboard]
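The whiteboard argument (the probability integral transform) takes three lines to verify; the exponential distribution below is just an arbitrary non-uniform example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw from a decidedly non-uniform distribution...
draws = rng.exponential(scale=2.0, size=100_000)

# ...and evaluate its own cumulative distribution at each draw
u = 1.0 - np.exp(-draws / 2.0)   # CDF of Exponential(scale=2)

# The transformed values are uniform on [0, 1]
print(u.mean(), u.std())  # close to 0.5 and sqrt(1/12) ≈ 0.289
```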
However, when does it make sense in practice to carry out these calculations?
Consider first the 95th-percentile, 1-day VaR. This is the loss that should be exceeded only on 5 business days out of 100.
Since 100 business days, give or take a few days, are approximately 4 calendar months, this means that, if we have one year's worth of data, we should expect between 10 and 15 ‘exceptions’ (ie, losses greater than the VaR) per year.
Suppose we have 4 years' worth of data.
This means approximately 1,000 business days, and therefore about 50 exceptions.
We will look at the precise statistical estimates in a future lecture, but it seems
plausible that estimating a loss that should be exceeded once a month with 4
years’ worth of data should not be an impossible task.
Consider now the 99th percentile, 10-day VaR.
Now we are talking about losses that should be exceeded once every one hundred blocks of 10-day periods. This means 1,000 business days (as we saw, approximately 4 years).
From the statistical point of view things look a bit better: with 20 years' worth of data (some 500 ten-day blocks) we can now expect 5 exceptions.
(By the way, this percentile and this holding period have not been plucked
out of thin air: these are exactly the parameters required by the regulators to
calculate capital.)
In September 1988 (ie, twenty years before), the world looked very different:
men were sporting amazing hair-dos, women had jackets with 10-inch-padded
shoulders, and, more to the point, CDO of ABSs had not been invented yet.
The point here is not just one of data availability, but also one of data relevance:
for a risk measure such as VaR to be useful, it must be extracted from a
conditional distribution of risk factors (ie, the one that applies to today ), not
from a long-run unconditional one.
The further back I look in the past, the less relevant to today's conditions those data points are.
Recent data are relevant but few; ancient data (if they have been kept) are
plentiful, but less and less relevant the further back we look.
And it is not just that some instruments had not been invented in the days of
Saturday Night Fever.
The macrofinancial conditions were radically different: in the 1970s the big
monster to slay was rampant inflation.
And the first mortgage I took out in 1990 had a price tag of 17% interest per
annum.
In the crazy days before the crisis, the VaR inflation was out of control: grown-
ups could speak with a straight face of the 99.975th percentile at the 1-year
horizon. If I had more time, I would tell you how that crazy percentile was
arrived at, but you get my point.
This does not mean that VaR & Co are useless.
If used properly, they are very valuable measures, and it is worthwhile investing a
lot of resources (people, computers, databases) and a lot of time to understand
them, and calculate them.
However, please always ask yourself if what you are calculating makes sense,
and, if it doesn’t quite, what you can do instead.
6 Three Simple VaR Calculations
In this part of the lecture we assume that we have satisfied ourselves that what we are trying to estimate is a meaningful risk measure, and we set our doubts to one side.
We are therefore going to compute VaR, not in one, but in three different ways:
analytically, using historical realizations, and via a simple-minded Monte Carlo
simulation.
We are going to use a simple portfolio, that we describe below.
With such a simple portfolio many of the problems one encounters in real life (when one has to deal with tens of thousands of risk factors) are swept under the carpet.
In the following lectures we will learn how to improve substantially on the crude estimates we learn to perform today.
Why not do it right the first time? Because we have no hope of understanding how to handle these tough problems until we get our heads around solving the simple ones.
Let’s consider the simplest non-trivial portfolio, Π, we can think of, ie, a portfolio made up of a holdings of security x and b holdings of security y:
$$\Pi = ax + by$$
Writing $\overline{\Delta x} \equiv E[\Delta x]$ for the means, we have
$$E[\Delta\Pi]^2 = a^2\,\overline{\Delta x}^2 + 2ab\,\overline{\Delta x}\,\overline{\Delta y} + b^2\,\overline{\Delta y}^2. \quad (13)$$
Subtracting this term from $E[\Delta\Pi^2]$ and rearranging terms we get
$$\mathrm{var}(\Delta\Pi) = \underbrace{a^2 E[\Delta x^2] + 2ab\,E[\Delta x\,\Delta y] + b^2 E[\Delta y^2]}_{E[\Delta\Pi^2]} - \underbrace{\left(a^2\,\overline{\Delta x}^2 + 2ab\,\overline{\Delta x}\,\overline{\Delta y} + b^2\,\overline{\Delta y}^2\right)}_{E[\Delta\Pi]^2}$$
$$= a^2\underbrace{\left(E[\Delta x^2] - E[\Delta x]^2\right)}_{\mathrm{var}(\Delta x)} + 2ab\underbrace{\left(E[\Delta x\,\Delta y] - E[\Delta x]E[\Delta y]\right)}_{\mathrm{cov}(\Delta x,\Delta y)} + b^2\underbrace{\left(E[\Delta y^2] - E[\Delta y]^2\right)}_{\mathrm{var}(\Delta y)}.$$
We can write this in an even more compact manner by defining the vector of holdings, $\vec{\omega}$, and generalizing from two to n securities.
Since we do not want to run out of letters, we are going to denote the n
securities by x1, x2, ..., xn.
To carry out this generalization, note first that the covariance matrix of the
securities can be written as
$$\mathrm{cov}(\Delta x_i, \Delta x_j) = \sigma_i \sigma_j \rho_{ij} \quad (15)$$
$$\mathrm{cov} = \begin{pmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \end{pmatrix} \begin{pmatrix} 1 & \rho_{12} & \dots & \rho_{1n} \\ \rho_{21} & 1 & \dots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \dots & 1 \end{pmatrix} \begin{pmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \end{pmatrix} \quad (16)$$
(If it is not obvious, please do check that the matrix expression does give the
covariance matrix, as it should.)
Then the variance of the portfolio (which, of course, is just a number), is given
by
$$\mathrm{var}(\Delta\Pi) = \vec{\omega}^T\,\mathrm{cov}(\Delta x_i, \Delta x_j)\,\vec{\omega} \quad (18)$$
$$\mathrm{stdev}(\Delta\Pi) = \sqrt{\vec{\omega}^T\,\mathrm{cov}(\Delta x_i, \Delta x_j)\,\vec{\omega}}. \quad (19)$$
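Equations (16), (18) and (19) translate into a few lines of numpy. The holdings, volatilities and correlations below are invented for illustration, not the lecture's numbers:

```python
import numpy as np

# Hypothetical holdings and security statistics (three securities)
omega = np.array([3.0, -1.0, 2.0])          # holdings
sigma = np.array([0.02, 0.015, 0.03])       # vols of the Delta-x's
rho = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])           # correlation matrix

# Equation (16): cov = diag(sigma) @ rho @ diag(sigma)
cov = np.diag(sigma) @ rho @ np.diag(sigma)

# Equations (18)-(19): portfolio variance and standard deviation
var_pi = omega @ cov @ omega
stdev_pi = np.sqrt(var_pi)
print(var_pi, stdev_pi)
```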
The results we have obtained so far are general, in the sense that we have not
invoked any distributional properties for the underlying variables.
Then we have just found the distribution of the portfolio: it is also normal (because the sum of normal variables is a normal variable), with a standard deviation given by $\mathrm{stdev}(\Delta\Pi) = \sqrt{\vec{\omega}^T\,\mathrm{cov}(\Delta x_i, \Delta x_j)\,\vec{\omega}}$.
And why is this interesting?
Because if a variable (in this case the changes in value of our portfolio) is
normally distributed and we know its variance, then we can work out all its
percentiles.
See below.
Note that in arriving at this result we have clearly stated one assumption (that
all the securities should be jointly normally distributed), but we have swept
another under the carpet.
The unstated assumption is that the link between the changes in the quantities {∆x} — which we have called ‘securities’, but which will in general be ‘risk factors’ — and the changes in the positions will be linear.
If the portfolio contains option-like derivatives this will be a poor approximation.
There are some (rather unsatisfactory) approximate ways around the problem,
but we will not go into that because the analytic route to calculating VaR is
useful as a benchmark, but is rarely used in practice.
8.1 Description of the Portfolio
With this result under our belt we can begin the calculations of VaR proper.
First, however, we must describe our portfolio. We stress that this is a baby
portfolio, and that realistic applications will be orders of magnitude larger (if
not more complex).
Asset      Sensitivity $h_i$
S&P500        128,000
FTSE100       −72,000
DAX           −40,000     (20)
Swap10y       800,000
Swap5y        600,000
Swap2y        200,000
The changes in the underlying time series and the sensitivities are related to the changes in the P&L by
$$\Delta \mathrm{P\&L}_i = \sum_k h_k \left(x_{ki} - x_{k,i-1}\right)$$
where $x_{ki}$ is the value of the financial time series k on day i. Note that we are using absolute, not percentage, returns to obtain the changes in the P&L.
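As a sketch of the mapping, here is one day's P&L computed from the sensitivities of Equation (20); the day's factor moves are invented for illustration:

```python
import numpy as np

# Sensitivities from Equation (20)
h = np.array([128_000, -72_000, -40_000, 800_000, 600_000, 200_000])

# One day's hypothetical absolute changes in the six risk factors
dx = np.array([1.2, -0.8, 0.5, 0.003, 0.002, -0.001])

# The day's P&L is the (linear, deterministic) mapping: sum_k h_k * dx_k
pnl = h @ dx
print(f"{pnl:,.0f}")  # → 194,600
```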
8.2 The Portfolio VaR under the Normal Approximation
If we are happy to assume a normal distribution for the changes in the risk factors, then there is no reason not to use the analytical formula for the standard deviation of the changes in value of the portfolio, $\mathrm{stdev}(\Delta\Pi) \equiv \sigma_\Pi$, to calculate the VaR.
For the portfolio described above, this gives the following (rounded) values for
the VaR at percentiles levels from 95 to 99.5:
Percentile        VaR
95.0      −17,509,000
95.5      −18,047,000
96.0      −18,636,000
96.5      −19,288,000
97.0      −20,021,000     (24)
97.5      −20,864,000
98.0      −21,862,000
98.5      −23,101,000
99.0      −24,764,000
99.5      −27,420,000
These values will be profitably used in what follows as a comparison benchmark.
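Under the normal approximation the whole column collapses to one number: assuming a zero mean for ∆Π, $\mathrm{VaR}_p = -\sigma_\Pi \Phi^{-1}(p)$. As a sketch, we can back out an implied $\sigma_\Pi$ from the 95th-percentile entry (so $\sigma_\Pi$ here is inferred, not computed from the data) and check that it reproduces the rest of the table:

```python
from statistics import NormalDist

# Imply sigma_Pi from the 95th-percentile entry of the table above
sigma_pi = 17_509_000 / NormalDist().inv_cdf(0.95)

# VaR at percentile p is just -sigma_Pi * z_p under the normal approximation
for p in (0.95, 0.975, 0.99, 0.995):
    print(f"{p:.3f}  {-sigma_pi * NormalDist().inv_cdf(p):,.0f}")
```

The printed values match the table's entries to within their rounding.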
9 The Historical Simulation Approach
With the naive historical simulation approach, we simply take the returns as
they happened on all of the 2,736 days of our data set, and, from the sensitivities
in Equation (20) we calculate the changes in P&L:
Note that, by doing so, we have obtained a univariate distribution (the distribution of P&Ls) from a multi-variate distribution of risk factors.
In this baby case, ‘multi’ meant six; in real applications it may mean $10^4$ or $10^5$.
Once we have obtained all the 2,736 P&Ls, we sort all these changes from the
largest loss to the largest profit.
Each of these realizations has the same probability of occurring because, in this
simple application, we are giving the same weight to every observation (vector
of changes in risk factors), irrespective of whether it happened yesterday or 10
years ago.
So we are saying that the returns are both independent and identically distributed.
(This, by the way, was the same assumption we were making when we calculated
the analytical Gaussian VaR. Where did both assumptions sneak in?)
Since every sorted P&L realization has the same probability, building the empirical cumulative distribution is super simple:
• the probability of getting anything worse than the worst loss is zero;
• the probability of getting a loss smaller than or equal to the worst loss is 1/n (with n = 2,736 in this case);
• the probability of getting a loss smaller than or equal to the second-worst loss is 2/n; the probability of getting a loss smaller than or equal to the third-worst loss is 3/n; (you see where I am going...); ...;
The associated VaR is just the appropriate percentile, ie, the value of the loss corresponding to the value k/n of choice for the cumulative distribution.
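The sort-and-read-off procedure takes a few lines. The fat-tailed P&L sample below is a made-up stand-in for the 2,736 observed days, so only the mechanics, not the numbers, carry over:

```python
import numpy as np

def historical_var(pnl, percentile):
    """Empirical VaR: sort the P&Ls and read off the k/n point of the CDF."""
    pnl_sorted = np.sort(pnl)                    # largest loss (most negative) first
    k = int((1.0 - percentile) * len(pnl_sorted))
    return pnl_sorted[k]

# Hypothetical fat-tailed daily P&Ls standing in for the 2,736 of the data set
rng = np.random.default_rng(7)
pnl = rng.standard_t(df=4, size=2_736) * 5_000_000

print(f"{historical_var(pnl, 0.99):,.0f}")  # the 99th-percentile 1-day loss
```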
These are the VaR results for the portfolio under consideration obtained from
the Historical Simulation (HS) approach:
Percentile   VaR (Normal)     VaR (HS)
95.0      −17,509,000      −16,698,000
95.5      −18,047,000      −17,536,000
96.0      −18,636,000      −18,457,000
96.5      −19,288,000      −19,166,000
97.0      −20,021,000      −19,770,000     (26)
97.5      −20,864,000      −20,827,000
98.0      −21,862,000      −22,308,000
98.5      −23,101,000      −24,100,000
99.0      −24,764,000      −28,441,000
99.5      −27,420,000      −32,794,000
Note that the HS simulation gives considerably larger losses in the far left tail, and smaller losses for modest percentiles. The cross-over point, for the portfolio at hand, is around the 97.5th percentile.
Can you explain why this cross-over must happen, if the two distributions have different fatness of the tails (but the same variance)?
10 The Monte Carlo Approach
Also in the case of the Monte Carlo Method, for the moment we are going to
look at the simplest and most naive implementation.
This means that we are going to assume that the distribution of the risk factors
is jointly Gaussian.
It should be clear that, if the portfolio sensitivities are exactly linear, then
running an expensive Gaussian Monte Carlo to estimate VaR has absolutely no
advantage over calculating VaR using the analytical approach.
What we are trying to learn here are just the mechanics of a basic Monte
Carlo simulation, so that we can build on this knowledge for more realistic
applications.
Exercise 5 When could it be advantageous to use the Gaussian Monte Carlo
method rather than the analytical method?
Exercise 6 What could the advantage be in using the Monte Carlo method
over the historical simulation method if we managed (as we shall learn how to
do in the next lecture) to sample from the empirical distribution of risk factors?
Running a Monte Carlo simulation means carrying out numerically repeated
draws of a set of random variables from a given distribution.
The numerically produced draws must recover, of course, not only the properties
of the (six) individual marginal distributions, but also the codependence (the
6 × 6 correlation matrix, in our case) amongst the variables.
If a single variable is normally distributed with mean µ and variance σ², then, clearly, a simulation of n draws, $x_i$, from this variable can be obtained by doing n times
$$x_i = \mu + \sigma\varepsilon_i \quad (27)$$
where $\varepsilon_i$ is a numerically produced standard normal variable, returned, for instance, by the MatLab function randn(), or by the Excel invocation Normsinv(Rand()).
(Why, by the way? If you don’t know the answer, see the next lecture.)
However, if we draw independent normal variates for the six risk factors these
will not have the desired correlation structure (they will be independent, to
within numerical noise).
with
$$E[dw_t\,dz_t^1] = 0. \quad (30)$$
The variance of the second process has not been changed by the transformation:
$$E\left[(dx_t^2)^2\right] = (\sigma_2)^2\left(\rho_{1,2}^2 + 1 - \rho_{1,2}^2\right)dt = (\sigma_2)^2\,dt. \quad (31)$$
What about the correlation between the two processes?
Let’s calculate
$$E[dx_t^1\,dx_t^2] = \sigma_1\sigma_2\rho_{1,2}\,dt \quad (32)$$
and therefore
$$\frac{E[dx_t^1\,dx_t^2]}{\sigma_1\sigma_2\,dt} = \rho_{1,2}, \quad (33)$$
which means that the two processes, $dx_t^2 = \mu_2\,dt + \sigma_2\,dz_t^2$ and $dx_t^2 = \mu_2\,dt + \sigma_2\left(\rho_{1,2}\,dz_t^1 + \sqrt{1-\rho_{1,2}^2}\,dw_t\right)$, are exactly equivalent in all respects.
This also means that, to simulate two correlated Gaussian variables, I can draw two independent Gaussian variates, and then conjoin them using the lower-triangular matrix, C,
$$C = \begin{pmatrix} 1 & 0 \\ \rho_{1,2} & \sqrt{1-\rho_{1,2}^2} \end{pmatrix}. \quad (34)$$
So, we have
$$\begin{pmatrix} dx_t^1 \\ dx_t^2 \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} dt + \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \rho_{1,2} & \sqrt{1-\rho_{1,2}^2} \end{pmatrix} \begin{pmatrix} dz_t^1 \\ dw_t \end{pmatrix}. \quad (35)$$
(Please check that it all works out correctly).
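The check can also be run numerically: conjoining two independent standard normals as in Equation (34) gives a pair that is still standard normal and has exactly the target correlation (ρ = 0.7 below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.7
n = 500_000

# Two independent standard normal draws (the dz and dw of the text)
z = rng.normal(size=n)
w = rng.normal(size=n)

# Conjoin them as in Equation (34): e2 = rho*z + sqrt(1 - rho^2)*w
e1 = z
e2 = rho * z + np.sqrt(1.0 - rho**2) * w

# e2 is still standard normal, and corr(e1, e2) = rho
print(e2.std(), np.corrcoef(e1, e2)[0, 1])
```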
How do we generalize this inspired guess to more than two variables?
Generalize.
You have just derived the Cholesky decomposition. If you had been born a century ago, you would now be famous.
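In more than two dimensions the same construction is delivered ready-made by a linear-algebra library. A sketch with a hypothetical 3×3 correlation matrix (a 6×6 one for our portfolio would work identically):

```python
import numpy as np

rng = np.random.default_rng(5)

# A hypothetical 3x3 correlation matrix for the risk factors
rho = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Cholesky decomposition: rho = C @ C.T, with C lower triangular
C = np.linalg.cholesky(rho)

# Correlated draws obtained from independent standard normals
indep = rng.normal(size=(100_000, 3))
corr_draws = indep @ C.T

print(np.round(np.corrcoef(corr_draws.T), 2))
```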
11 Comparisons of VaRs
One of the most popular ways to identify the ‘driving factors’ is Principal Component Analysis. This is what it’s all about.
13 What Are Principal Components?
(What ‘redundancy’ and ‘informative’ mean will become apparent in the following).
The second reason for using principal components is the removal of noise.
To understand both of these aspects of PCA, consider for instance a yield curve, described by a large number, N, of yields, $y_i$, i = 1, ..., N.
Suppose that a modeller records exactly the daily changes in all the yields. She
diligently does so day after day.
By doing so, she will have a perfectly faithful record of the changes that the
(discrete, but very-finely-sampled) yield curve has experienced on those days.
After some time (say, a few months), she cannot help forming the impression
that she has been wasting a lot of her time. Take for instance her birthday, the
3rd April: on that day she recorded the following changes (in basis points) for
the yields of maturities from 24 to 29 years:
y24   3.1
y25   3.2
y26   3.1
y27   3.3     (38)
y28   3.3
y29   3.4
For all her efforts, she can't help feeling that if she had recorded that all the yields of maturity from 24 to 29 years moved by 3.23 basis points (the average of the recorded values) she would have saved herself a lot of time, and lost little information.
Then the researcher decides to look at the data a bit more carefully, and she
plots the changes against yield maturity. See Fig (3).
Perhaps, by squinting hard, one can see an upward trend in the data. The
change in the 26-year yield, however, does not fit well with the trend story.
Perhaps it was a noisy data point.
Being a conscientious modeller, the researcher runs a regression, and finds an
R2 of almost 80%.
On this very small sample, the slope coefficient (ie, the coefficient b in the
regression a + bT ) has a t statistic of 3.75, and therefore it is likely that there
is indeed an upward trend in the data.
Short of transcendental meditation, with one single record there is not much
more that the researcher can do.
However, if she has access to the equivalent changes for many days, she can
do something better: she can construct a matrix of (empirical) covariances in
yield changes and study this quantity.
This is where Principal Component Analysis offers some help. With this technique one creates some clever linear combinations, {x}, of the original data:
$$x_i = \sum_{j=1}^{30} c_{ij}\,y_j. \quad (39)$$
On the two axes, x and y, we have projected the cloud of points, to obtain two marginal distributions. We note that the variance of the changes in yield 2 is a bit less than the variance of the changes in yield 1, but, apart from that, the two marginal distributions look quite similar.
Let’s now rotate the axes in a skillful way, as we have done in Fig (??), and let’s
project the scatter plot again in the direction of the two new axes, to obtain
the new marginal distributions.
We stress that we are still dealing exactly with the same data, and only the
axes have changed (they have been rotated). Note also that we have spoken
of a rotation of axes (no stretching).
Therefore the matrix (the set of numbers cij in Equation (39)) that carries out
the transformation to the new clever variables (the {x}) is a rotation matrix,
with a determinant equal to +1.
Figure 4: The daily changes in the two yields. Every point corresponds to a realization of the change in the first yield (given by the x-coordinate of that point), and a realization of the change in the second yield (given by the y-coordinate).
After this length-preserving transformation the two marginal distributions look
very different:
• the variation along the new first axis, x1, clearly shows that along this new
direction is where ‘all the action is’;
• the variation along the second axis has now become a side show.
What do the transformed variables look like?
In this simple case they may be given by a transformation like the following:
$$dx_1 = \frac{1}{\sqrt{2}}\,dy_1 + \frac{1}{\sqrt{2}}\,dy_2 \quad (40)$$
$$dx_2 = \frac{1}{\sqrt{2}}\,dy_1 - \frac{1}{\sqrt{2}}\,dy_2. \quad (41)$$
Note that the two new variables, dx1 and dx2, are made up of the original variables multiplied by some coefficients.
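The rotation in Equations (40)-(41) can be applied to simulated data to see the reallocation of variance at work; the correlation of 0.9 between the two yield changes below is my own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
s = 1.0 / np.sqrt(2.0)
R = np.array([[s, s],
              [s, -s]])          # the coefficients of Equations (40)-(41)

# R is orthogonal: rows and columns have unit sum of squares
assert np.allclose(R @ R.T, np.eye(2))

# Strongly correlated yield changes (hypothetical numbers)
cov_y = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
dy = rng.multivariate_normal([0.0, 0.0], cov_y, size=100_000)

dx = dy @ R.T                    # dx1 = average move, dx2 = tilt
print(np.round(np.var(dx, axis=0), 2))  # roughly [1.9, 0.1]: the action is in dx1
```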
Figure 5: The same data as in Fig (4), but after rotating the axes. Also in this case, the data have been projected along the two axes to obtain the marginal distributions. Note the very different variances of the two marginal distributions.
Looking at the coefficients, we can see that the first variable turns out to be proportional to the average move in the yield curve.
If, as the modeller indeed observed in the example above, a parallel move in
the yields is the dominant mode of deformation of the yield curve, the change
in the first variable (the first principal component) will convey most of the
information about how all the yield curve moved on any day.
One number (the change in x1) will convey almost as much information as 2
numbers (the changes in y1 and y2).
This is what we meant when we said that the new variables are more informative: if we look at both principal components we get an equivalent transformation of the original data; but if we decide to keep only one principal component, in the case at hand the first new variable contains the ‘essence’ of the yield curve deformation.
Note also that if we take the sum of the squares of the coefficients, and we add either along rows or columns, we always obtain 1: again, this is because the transformation c is just a rotation.
13.2 The Signal and the Noise
What about the noise? Now the modeller has to decide whether the much narrower variation in the second variable conveys (less-important, but still-significant) information, or is just an effect of measurement noise.
If she decides that it is noise, then in this simple example all the signal in the
data is captured by the change in the first principal component. If, beside the
parallel-move signal there is no additional information, by neglecting the second
principal component all she is throwing away is measurement error.
If she decides instead that the second transformed variable (the second principal
component) conveys less-important, but nonetheless-still-valuable, information
she can make a modelling choice as to whether keeping the second variable is
essential for her study or not.
If the variation in the original data is indeed captured well by two or three new
variables, then we have obtained a dramatic reduction in dimensionality.
Now, for two variables only, there are no advantages, other than aesthetic
ones, in using the two original variables, or the two transformed variables (the
principal components.)
On the other hand, if we use only one transformed variable we throw away both
the noise and whatever information may have been contained in the original
data.
But with many variables we can do something more nuanced: we can split
the decision of how many variables really matter for our application from the
decision of what is signal and what is unwanted noise.
For instance, we may ascertain that three principal components do contain information, but, perhaps to keep our model simple, we may decide that we will work only with two.
When we do this drastic culling we do two things at the same time: first, we clean the data from noise; second, we achieve a more synthetic, albeit less-complete, description of the original signal.
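The culling is easy to demonstrate on simulated yield-curve changes. The data-generating process below (a dominant level move, a small slope move, and measurement noise) is entirely made up, but it mimics the modeller's situation:

```python
import numpy as np

rng = np.random.default_rng(11)
n_days, n_yields = 2_000, 10

# Hypothetical yield-curve changes: a common level move, a smaller slope
# move, and measurement noise
t = np.linspace(-1.0, 1.0, n_yields)
level = rng.normal(0.0, 3.0, size=(n_days, 1))         # dominant signal
slope = rng.normal(0.0, 0.8, size=(n_days, 1)) * t     # secondary signal
noise = rng.normal(0.0, 0.2, size=(n_days, n_yields))
dy = level + slope + noise

# PCA via the eigendecomposition of the empirical covariance matrix
cov = np.cov(dy, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)        # eigh returns ascending order
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

# Fraction of total variance carried by each principal component
explained = eigval / eigval.sum()
print(np.round(explained[:3], 3))  # the first two components carry almost everything
```

Keeping two components retains the level and slope signals; the remaining eight directions are essentially the measurement noise.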
(Michelangelo could have handled three variables, but only as a ceiling fresco.)
In practice, when the variance becomes ‘too small’ we say that either we do
not care, or it is noise.
This, however, is not the full story: more about this at the end of the lecture.
14 First Conclusions about Principal Components
First of all, PCA works well when the signal-to-noise ratio is high. Whether
this is true or not depends on the application at hand.
For instance, in his good tutorial on PCA, Shlens (2009) says: ‘Measurement noise in any data set must be low or else, no matter the analysis technique, no information about a signal can be extracted’.
However, wishful thinking should not cloud our appreciation of what is signal
and what is noise.
It is not true that, when the variance of a principal component (of one of the
transformed variables) is small we are safe in assuming that we are dealing with
noise.
Whether we are or not depends on the nature of the phenomenon at hand and on the quality of our data collection, not on the size of the variance. This often-forgotten fact should be engraved on the keyboard of any statistician and econometrician.
Second, PCA works well when there is a lot of redundancy in the original data.
For linear systems, this means that it works well when the original data are
strongly correlated (as yields are). What does ‘redundancy’ mean? With high
redundancy, if you tell me the realization of one variable (say, the change in
the 26-year yield), I have a sporting chance of guessing pretty accurately the
realization of another variable (say, the change in the 27-year yield).
The variable transformation that PCA carries out creates variables with no redundancy, in the sense that knowledge of the realization of one variable (say, the level of the yield curve) makes one none the wiser as to the realization of the second variable (say, its slope). See again Fig (??).
Third, if the second new variable (principal component) is less important than the first (and the third than the second, and so on), then one can ‘throw away’ these higher principal components, even if they are not noise, and lose little in the accuracy of the description.
What does it mean that a transformed variable is ‘less important than another’?
If all variables impact the quantity we are interested in with a similar ‘elasticity’, then a variable that varies less (which has a smaller variance) will be less important — simply by virtue of the fact that, despite having a similar impact as the first ‘per unit shock’, it varies over a more limited range of values.
Look again at Equations (40) and (41): a unit change in either variable pro-
duces a change in the yield curve which is just as large. The ‘elasticity’ of the
quantity we are interested in (the magnitude of the deformation∗ of the yield
curve) is the same for the two principal components.
∗ We can use as a proxy of the deformation of the yield curve the sum of the absolute values
of the changes in the various yields.
However, empirically we find that the second principal component varies a lot
less than the first (see Fig (??)), and therefore it is not reasonable to assign a
unit change to both.
It is only in this sense that, for the application at hand, one variable matters
more than the other.
Despite the fact that the second principal component varies far less than the first, in the case of the market price of risk if we throw the slope away we throw away the signal.
The variance of a variable is a good indicator of its importance only if all the
variables act on what we are interested in on an equal footing.
$$A = V\Lambda V^T, \quad (42)$$
where Λ is a diagonal matrix that contains the r distinct eigenvalues, $\lambda_i$, of A:
$$A\vec{v}_i = \lambda_i\vec{v}_i, \quad i = 1, 2, \dots, r \quad (46)$$
ie, when the original matrix, A, operates on the eigenvector, $\vec{v}_i$, it returns the same eigenvector back, multiplied by a constant. This constant is the ith eigenvalue.
Saying that the eigenvectors are all orthogonal to each other means that
$$\vec{v}_j^T\vec{v}_i = 0 \quad \text{if } i \ne j \quad (47)$$
Saying that they are normalized means that
$$\vec{v}_i^T\vec{v}_i = 1. \quad (48)$$
(Note carefully that the expression $\vec{v}_j^T$ indicates a [1 × r] row vector, which, when multiplied using matrix multiplication rules by the [r × 1] column vector, $\vec{v}_i$, gives a scalar. The operation is also known as a contraction, or an inner product. If this is not clear, please see Chapter 2 again, because this notation will recur over and over again.)
Consider the case now when the matrix A is the covariance matrix between n variables $y_i$. The transformed variables, ie the principal components, $x_i$, are given by
$$x_i = \sum_{j=1}^{n} \left(V^T\right)_{ij} y_j \quad (49)$$
or, in matrix notation,
$$\vec{x} = V^T\vec{y}. \quad (50)$$
Of course, because of orthogonality,
$$\vec{y} = V\vec{x}. \quad (51)$$
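Equations (42) and (49)-(51) can be exercised directly on a small covariance matrix; the 2×2 matrix A below is a hypothetical example of two strongly correlated yields:

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical covariance matrix A of two yield changes
A = np.array([[2.0, 1.6],
              [1.6, 2.0]])

eigval, V = np.linalg.eigh(A)   # columns of V are the orthonormal v_i

# Equation (42): A = V Lambda V^T
assert np.allclose(V @ np.diag(eigval) @ V.T, A)

# Equations (50)-(51): x = V^T y, inverted exactly by y = V x
y = rng.multivariate_normal([0.0, 0.0], A, size=50_000)
x = y @ V                       # each row is V^T applied to that day's y
assert np.allclose(x @ V.T, y)

print(np.round(np.cov(x, rowvar=False), 2))  # nearly diagonal: the PCs are uncorrelated
```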
In the light of the discussion above, why do regulators believe that bigger is
always better, ie, that more and more mappings are an unalloyed good?
Surely they are aware of the objections about overparametrized models that we
raised in the previous section.
The problem is that banks often engage in relative-value trading.
This entails taking very large long and short positions in securities or indices
that are very similar, but that, for some reason or other, seem to have moved
out of their normal ‘trading range’.
Since the securities or indices in which the opposite and large positions are
taken are very similar, any reasonable mapping to risk factors is likely to lump
them together, ie, to assume that they move exactly, and not just approximately,
in lock-step.
But if they did, then the market risk from these positions would be exactly zero,
which it patently isn't. (If it were, why would the large offsetting positions be
put in place to start with?)
So, regulators are keen to capture basis risk, and that is why they insist on the
granularity of the mappings.
Perhaps asking for more and more risk factors to be added to the pot is not
the best way to go about it (what would you do?).
However, if we accept this logic (ie, if we accept that the only way to deal with
basis risk is to capture all the securities that these pesky banks may use for
their relative-value trades), then we have to learn how to deal with a quadrillion
risk factors.
17 How PCs Can Help with Monte Carlo Simulations*
In order to see how Principal Component Analysis can help us with the esti-
mation of the joint distribution of the risk factors, suppose that we want to run
a Monte Carlo simulation similar in principle to the one we saw above when we
had a handful of securities.
The big problem is that now we have to deal with tens of thousands of risk
factors.
The details are different, but the general approach is similar. Our shot at
cleverness is via a hierarchical method that factorizes the dependencies present
in our portfolio. Here is how it is done.
18 Constructing a Hierarchy of Factors
So, the task ahead of us is the following: we want to shock thousands of risk
factors that give us the most granular description we can afford for our portfolio.
All the shocks must reflect what we know about the joint distribution of the
risk factors.
At the top level we want to capture Global Market Risk Factors — these could
be, for instance:
5. Mortgages DM
8. Major Currencies DM
9. Major Currencies EM
10. ...
Take one Global Market Factor, say, the Global Fixed Income DM Risk Factor.
This should in some way tell us at a pretty coarse level about the behaviour of
Treasury bonds of all developed markets. This level of coarseness is not good
enough to shock our portfolio. We will have to drill down.
Let’s zoom in on one of the Global Market Factors: say, the Global Fixed
Income DM factor. This could be made up of
7. ...
Let’s zoom in further, and let’s look at one of the middle level factors, say, the
USD Treasury bonds.
At the bottom level we are going to look at all the important key-rate maturities
that characterize this market.
Note that we call this the ‘bottom’ level, but it is the bottom in the space of
factors. There is an even lower level below, but this is at security level — ie, at
the level of the individual positions in our portfolio.
To fix ideas, suppose that the specific factor in the bottom factor level is USD
Treasury bonds. As we said, these could be 30 yields.
The beauty of PCs is that we can retain a relatively small number of them and
we can ‘reconstruct’ very accurately the changes in all the original variables.
Let’s say that we retain the first 4 principal components. So, we construct time
series of the first 4 principal components (of changes) for USD Treasuries.
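This truncation step can be sketched as follows (the correlation structure of the 30 key rates is simulated here, as an assumption for illustration; in practice the covariance matrix would be estimated from the historical key-rate changes):

```python
import numpy as np

rng = np.random.default_rng(1)
n_rates, n_days = 30, 2000

# Simulated daily changes in 30 key rates with strong cross-maturity
# correlation (an AR(1)-style correlation matrix, illustrative only):
maturities = np.arange(n_rates)
corr = 0.99 ** np.abs(maturities[:, None] - maturities[None, :])
dy = rng.multivariate_normal(np.zeros(n_rates), corr, size=n_days)

w, V = np.linalg.eigh(np.cov(dy, rowvar=False))
order = np.argsort(w)[::-1]          # largest eigenvalues first
w, V = w[order], V[:, order]

k = 4
x = dy @ V[:, :k]                    # time series of the 4 retained PCs
dy_hat = x @ V[:, :k].T              # reconstructed 30 key-rate changes

explained = w[:k].sum() / w.sum()
print(f"variance explained by first {k} PCs: {explained:.1%}")
```

The strong correlation across maturities is exactly what makes the 30-to-4 compression accurate: with typical yield-curve data the first few components explain well above 90% of the variance.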
18.5 Moving back up to the Middle Level
These 4 principal components are sent to the middle level to ‘represent’ the
‘USD Treasury bonds’ middle-level factor. There will also be, say, 3 principal
components received from the bottom level to ‘represent’ the ‘GBP Treasury
bonds’ factor.
And so on for all the national markets that make up the top level Global Fixed
Income DM factor.
If we have, say, 10 regional markets and on average 4 principal components
for each, we do not want to represent the factor Global Fixed Income DM at
the top level by 40 variables.
Now at the top level we are going to receive 5 principal components from Global
Fixed Income DM, 5 from Global Equities DM, etc. We are going to have, at
this top level, some 102 principal components.
Now, we can handle 102 random variables — by which I mean that we can
build a meaningful covariance matrix among them and that, perhaps using the
techniques that we shall discuss in Lecture 3, we may be able to run a
reasonable Monte Carlo simulation to obtain many realizations.
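The top-level simulation can be sketched as follows (the PC histories are random placeholders standing in for the time series actually received from the level below; the dimensions are the illustrative ones used in the text):

```python
import numpy as np

rng = np.random.default_rng(7)
n_pcs, n_obs, n_scenarios = 102, 1000, 10_000

# Stand-in for the historical series of the 102 top-level principal components:
pc_history = rng.normal(size=(n_obs, n_pcs))
cov = np.cov(pc_history, rowvar=False)

# Draw joint scenarios via a Cholesky factorization of the covariance matrix
# (a small jitter on the diagonal guards against numerical non-positive-definiteness):
L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_pcs))
z = rng.standard_normal((n_scenarios, n_pcs))
draws = z @ L.T                      # each row is one joint top-level scenario

print(draws.shape)                   # (10000, 102)
```

Each 102-element row is one joint realization of all the top-level components, ready to be cascaded back down the hierarchy.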
So, for instance, the first 4 were associated with (were the first 4 PCs of)
the Global Fixed Income DM factor; the second 5 were associated with
(were the first 5 PCs of) the Global Equity DM factor; etc;
3. we therefore take the first 4 elements of the 102-element vector created
by the 20-dimensional draw at the top level, and use them to recover
(approximately, but accurately) the, say, 28 elements of the Global Fixed
Income factor; we can then take the next, say, 5 elements of the 102-element
vector created by the 20-dimensional draw at the top level, and use them
to recover (approximately, but accurately) the, say, 30 elements of the
Global Equity factor; and so on and so forth.
4. we now move to the bottom factor level: we know that the first four of
these 28 factors pertained to the USD Treasury market, and with these
four factors we can recover (approximately, but accurately) all the changes
in the 30 key rates that, we decided, describe our portfolio as granularly
as we can afford.
From these we can obtain the changes in the security-level positions in our
portfolio.
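Steps 3 and 4 above can be sketched for a single scenario as follows (the loading matrices `V_fi` and `V_usd` are random placeholders standing in for the V matrices estimated at each level, and the DV01 sensitivities are likewise illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

top_draw = rng.standard_normal(102)    # one joint scenario from the top level

# Global Fixed Income DM: recover 28 middle-level factors from its 4 PCs
# (placeholder loadings; in practice these come from the middle-level PCA):
V_fi = rng.standard_normal((28, 4))
middle_fi = V_fi @ top_draw[:4]

# USD Treasuries: recover the 30 key-rate changes from its 4 PCs
# (again, placeholder loadings for the bottom-level PCA):
V_usd = rng.standard_normal((30, 4))
key_rate_moves = V_usd @ middle_fi[:4]

# Finally, map the key-rate moves to the P&L of the positions via their
# sensitivities (a flat DV01 profile, purely for illustration):
dv01 = np.full(30, 1e4)
pnl = -dv01 @ key_rate_moves
print(key_rate_moves.shape, float(pnl))
```

Repeating this cascade for every top-level draw produces a full distribution of portfolio P&Ls, from which VaR or Expected Shortfall can be read off.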
How can we know if, by our repeated culling, we have neglected important
sources of risk — perhaps those coming from the relative-value trades that
banks like so much?
P&L attribution.