Lecture 2
Obtaining the Distribution of Risk Factors
Riccardo Rebonato
Key Words and Concepts:
• Monte Carlo,
• Historical Simulation,
• Normal Approximation,
• VaR,
• Market Factors.
1 Plan of the Lecture
Let’s work backwards from our ultimate goal, which is to obtain some reasonable measures of the market risk of our portfolio.
The changes in value of these individual positions are given by a mapping from
the changes in the risk factors that affect our portfolio to the P&L of each
position.
The mapping from the changes in risk factors to the P&L in the individual
positions may be complex (and approximate), but it is deterministic.
As we shall see, if we want to use our risk measures for regulatory purposes,
we will have to contend with a very large number of risk factors — where by ‘a
very large number’ I really mean ‘a very large number’: they could be of the
order of a quarter of a million.
This poses practical problems, that we will also discuss in the next lectures.
So, in this lecture we are going to deal with a very small number of risk factors in order to understand the mechanics of what we have to do.
In order to see the whole process soup to nuts, we will carry the treatment all the way to obtaining the most popular risk measure (VaR) with baby implementations of the three methods mentioned above.
The versions of the historical, parametric and Monte Carlo methods we introduce in this lecture mainly have a pedagogical purpose, and will not be good enough to use in real life, but we will refine the three techniques as we proceed with the course.
2 Defining VaR (and Expected Shortfall)
Since we are going to compute Value at Risk (VaR) in three different ways, it
seems a good idea to define first exactly what it is. There are many definitions
of VaR.
(The give-away that this definition cannot be true is that there is no mention
of the percentile level.)
This is better, but it is still not totally clear what ‘being X percent certain’
means.
One senior bank manager I knew thought that the higher the percentile, the
more sure we were.
Does it make sense that we can be more sure about 1-in-10,000-years events than about 1-in-10-days events?
This is certainly not what John Hull means, but someone could get confused. So, here is another definition (which I prefer).
Definition 2 The N-day VaR at the X percentile level is the loss that a portfolio will on average exceed only in (100-X) N-day periods out of 100.
Can we do better?
Let’s build a probability density of N-day profits and losses.
The area under the density curve is clearly equal to 1 (because probability
densities are normalized to 1.)
Look up on the x axis the loss value such that (100-X)/100 of the area under
the density curve is to the left of it.
So, since Equation (1) is the definition of a percentile, VaR itself is just a
percentile.
Definition 4 The N-day VaR at the Xth percentile level is the smallest number, x, (the smallest loss) such that $\Phi_L(x) \ge X$:
$$\mathrm{VaR}^X_{N\text{-day}} = \inf\left\{x : \Phi_L(x) \ge X\right\}. \quad (2)$$
Of course, what we have just given is simply the definition of a percentile.
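To make Definition 4 concrete, here is a minimal numerical sketch; the loss sample is a hypothetical standard normal, not data from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)
losses = rng.normal(0.0, 1.0, 100_000)  # hypothetical 1-day losses

X = 0.99
# Definition 4: VaR is the smallest x with Phi_L(x) >= X, ie simply the
# Xth percentile of the loss distribution
var_99 = np.quantile(losses, X)

# Sanity check: losses exceed VaR in roughly (1 - X) of the cases
print(var_99, np.mean(losses > var_99))
```

The empirical 99th percentile sits close to the theoretical 2.33 of a standard normal, and the loss exceeds it on roughly 1% of days.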
In this section I am not asking whether VaR has been used properly to manage
financial risk.
I am asking the simpler question of whether VaR is a good risk measure, where
‘risk measure’ has the precise meaning we discussed in Lecture 1.
Let’s start from something about which a lot of ink has been spilled.
The success or the failure of each project is independent of the success or failure
of the other.
Each project has a probability of 0.02 of losing $10m and a probability of 0.98
of losing $1m.
(They can’t be great projects, since they only seem to lose, but let’s not worry
about this.)
What is the 97.5-percentile VaR for each project separately?
Given independence, we have the following possible outcomes, with their prob-
abilities:
Prob                             Loss ($m)
0.980 × 0.980      = 0.9604       1
2 × (0.02 × 0.980) = 0.0392      11
0.02 × 0.02        = 0.0004      20
Figure 1: [The cumulative distribution of losses for a single project.]
This is the associated cumulative probability function (Fig (2)). As you can immediately read from the figure, the VaR is now $11m.
So, the 97.5-percentile VaR from putting the two portfolios together is (way) larger than the sum of the VaRs from the two single-project portfolios.
In Fig (1), for instance, the 97.5-percentile VaR would have remained $1m even
if the worst loss had been $1b, instead of $10m.
We could average all the losses in the tail past the VaR level, weighted by their
probability of occurrence.
This means that we could compute a new quantity (called Conditional Expected Shortfall, or, sometimes, CVaR, or Average VaR) as follows
$$\mathrm{CES} = \frac{\int_{\mathrm{VaR}_X}^{+\infty} x\,\varphi_L(x)\,dx}{\int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx} \quad (3)$$
where $\mathrm{VaR}_X$ is the X-percentile VaR and
$$\int_{-\infty}^{x} \varphi_L(u)\,du = \Phi_L(x). \quad (4)$$
So, the Conditional Expected Shortfall is the expected loss, conditional on
the loss being greater or equal to VaR, ie, to the X percentile of the loss
distribution.
From this it follows that another way to write this is the following
$$\mathrm{CES} = \frac{1}{1-X}\int_X^1 \mathrm{VaR}_u\,du. \quad (5)$$
To see the equivalence, start from the denominator (which is easier), and add and subtract the same quantity, $\int_{-\infty}^{\mathrm{VaR}_X} \varphi_L(x)\,dx$.
Then we have:
$$\int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx = \int_{-\infty}^{\mathrm{VaR}_X} \varphi_L(x)\,dx + \int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx - \int_{-\infty}^{\mathrm{VaR}_X} \varphi_L(x)\,dx$$
$$= 1 - \Phi_L(\mathrm{VaR}_X) = 1 - \Phi_L\!\left(\Phi_L^{-1}(X)\right) = 1 - X \quad (6)$$
where we have made use of the fact that V aRX is the Xth percentile of the
cumulative distribution of losses, ΦL (x), and that the father of the son of
Johnny is Johnny.
So we have established that
$$\int_{\mathrm{VaR}_X}^{+\infty} \varphi_L(x)\,dx = 1 - X. \quad (7)$$
This takes care of the denominator, ie of the term $\frac{1}{1-X}$ in the expression
$$\mathrm{CES} = \frac{1}{1-X}\int_X^1 \mathrm{VaR}_u\,du. \quad (8)$$
For the numerator, consider the expression $\int_X^1 \mathrm{VaR}_u\,du$.
Here we are adding up all the quantities $\mathrm{VaR}_u$, which are just the losses at the u confidence level, for all confidence levels from X to 1.
So, we are adding up all the losses in the tail beyond V aRX .
Note that this is an equal-weight average.
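The equivalence between Equation (3) (the probability-weighted average of the tail losses) and Equation (5) (the equal-weight average of the tail VaRs) can be checked numerically. The standard-normal loss sample below is my own illustration, not the lecture's data:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = np.sort(rng.normal(0.0, 1.0, 200_000))  # hypothetical loss sample

X = 0.95
var_x = np.quantile(losses, X)

# Equation (3): average of the losses in the tail beyond VaR_X
es_tail = losses[losses >= var_x].mean()

# Equation (5): equal-weight average of VaR_u for u from X to 1
u = np.linspace(X, 1.0 - 1e-6, 2_000)
es_int = np.quantile(losses, u).mean()

print(es_tail, es_int)  # the two numbers agree (about 2.06 for this case)
```

For a standard normal at X = 0.95 the true Expected Shortfall is $\varphi(1.645)/0.05 \approx 2.06$, and both estimates land there.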
And we shall see in the next lectures that, for any distribution, the values of the cumulative distribution function evaluated at random draws from that distribution are uniformly distributed.
This means that if I draw from any distribution lots and lots of random variates, and I calculate the cumulative distribution corresponding to all these different random values, the resulting quantities are uniformly distributed. [Drawing on whiteboard]
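The whiteboard argument (the probability integral transform) takes three lines to verify; the exponential distribution below is just an arbitrary non-uniform example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw from a decidedly non-uniform distribution...
draws = rng.exponential(scale=2.0, size=100_000)

# ...and evaluate its own cumulative distribution at each draw
u = 1.0 - np.exp(-draws / 2.0)   # CDF of Exponential(scale=2)

# The transformed values are uniform on [0, 1]
print(u.mean(), u.std())  # close to 0.5 and sqrt(1/12) ≈ 0.289
```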
However, when does it make sense in practice to carry out these calculations?
Consider first the 95th-percentile, 1-day VaR. This is the loss that should be exceeded only on 5 business days out of 100.
Since 100 business days, give or take a few days, are approximately 4 calendar months, this means that, if we have one year's worth of data, we should expect between 10 and 15 ‘exceptions’ (ie, losses greater than the VaR) per year.
Suppose we have 4 years' worth of data.
This means approximately 1,000 business days, and therefore about 50 exceptions.
We will look at the precise statistical estimates in a future lecture, but it seems
plausible that estimating a loss that should be exceeded once a month with 4
years’ worth of data should not be an impossible task.
Consider now the 99th percentile, 10-day VaR.
Now we are talking about losses that should be exceeded once every one hundred blocks of 10-day periods. This means 1,000 business days (as we saw, approximately 4 years).
From the statistical point of view things look a bit better: with 20 years' worth of data (some 500 ten-day blocks) we can now expect 5 exceptions.
(By the way, this percentile and this holding period have not been plucked
out of thin air: these are exactly the parameters required by the regulators to
calculate capital.)
In September 1988 (ie, twenty years before), the world looked very different:
men were sporting amazing hair-dos, women had jackets with 10-inch-padded
shoulders, and, more to the point, CDO of ABSs had not been invented yet.
The point here is not just one of data availability, but also one of data relevance:
for a risk measure such as VaR to be useful, it must be extracted from a
conditional distribution of risk factors (ie, the one that applies to today ), not
from a long-run unconditional one.
The further back I look in the past, the less relevant to today's conditions those data points are.
Recent data are relevant but few; ancient data (if they have been kept) are
plentiful, but less and less relevant the further back we look.
And it is not just that some instruments had not been invented in the days of
Saturday Night Fever.
The macrofinancial conditions were radically different: in the 1970s the big
monster to slay was rampant inflation.
And the first mortgage I took out in 1990 had a price tag of 17% interest per
annum.
In the crazy days before the crisis, the VaR inflation was out of control: grown-
ups could speak with a straight face of the 99.975th percentile at the 1-year
horizon. If I had more time, I would tell you how that crazy percentile was
arrived at, but you get my point.
This does not mean that VaR & Co are useless.
If used properly, they are very valuable measures, and it is worthwhile investing a
lot of resources (people, computers, databases) and a lot of time to understand
them, and calculate them.
However, please always ask yourself if what you are calculating makes sense,
and, if it doesn’t quite, what you can do instead.
6 Three Simple VaR Calculations
In this part of the lecture we assume that we have satisfied ourselves that what we are trying to estimate is a meaningful risk measure, and we set our doubts to one side.
We are therefore going to compute VaR, not in one, but in three different ways:
analytically, using historical realizations, and via a simple-minded Monte Carlo
simulation.
We are going to use a simple portfolio, that we describe below.
With such a simple portfolio many of the problems one encounters in real life (when one has to deal with tens of thousands of risk factors) are swept under the carpet.
In the following lectures we will learn how to improve substantially on the crude estimates we learn to perform today.
Why not do it right the first time? Because we have no hope of understanding how to handle these tough problems until we get our heads around solving the simple ones.
Let’s consider the simplest non-trivial portfolio, Π, we can think of, ie, a portfolio made up of a holdings of security x and b holdings of security y:
$$\Pi = ax + by$$
Writing $\overline{\Delta x} \equiv E[\Delta x]$ for the means, we have
$$E[\Delta\Pi]^2 = a^2\,\overline{\Delta x}^2 + 2ab\,\overline{\Delta x}\,\overline{\Delta y} + b^2\,\overline{\Delta y}^2. \quad (13)$$
Subtracting this term from $E[\Delta\Pi^2]$ and rearranging terms we get
$$\mathrm{var}(\Delta\Pi) = \underbrace{a^2 E[\Delta x^2] + 2ab\,E[\Delta x\,\Delta y] + b^2 E[\Delta y^2]}_{E[\Delta\Pi^2]} - \underbrace{\left(a^2\,\overline{\Delta x}^2 + 2ab\,\overline{\Delta x}\,\overline{\Delta y} + b^2\,\overline{\Delta y}^2\right)}_{E[\Delta\Pi]^2}$$
$$= a^2\underbrace{\left(E[\Delta x^2] - E[\Delta x]^2\right)}_{\mathrm{var}(\Delta x)} + 2ab\underbrace{\left(E[\Delta x\,\Delta y] - E[\Delta x]E[\Delta y]\right)}_{\mathrm{cov}(\Delta x,\Delta y)} + b^2\underbrace{\left(E[\Delta y^2] - E[\Delta y]^2\right)}_{\mathrm{var}(\Delta y)}.$$
We can write this in an even more compact manner by defining the vector of holdings, $\vec{\omega}$, and generalizing from two to n securities.
Since we do not want to run out of letters, we are going to denote the n
securities by x1, x2, ..., xn.
To carry out this generalization, note first that the covariance matrix of the
securities can be written as
$$\mathrm{cov}(\Delta x_i, \Delta x_j) = \sigma_i \sigma_j \rho_{ij} \quad (15)$$
$$\mathrm{cov} = \begin{pmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \end{pmatrix} \begin{pmatrix} 1 & \rho_{12} & \dots & \rho_{1n} \\ \rho_{21} & 1 & \dots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \dots & 1 \end{pmatrix} \begin{pmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \end{pmatrix} \quad (16)$$
(If it is not obvious, please do check that the matrix expression does give the
covariance matrix, as it should.)
Then the variance of the portfolio (which, of course, is just a number), is given
by
$$\mathrm{var}(\Delta\Pi) = \vec{\omega}^T\,\mathrm{cov}(\Delta x_i, \Delta x_j)\,\vec{\omega} \quad (18)$$
$$\mathrm{stdev}(\Delta\Pi) = \sqrt{\vec{\omega}^T\,\mathrm{cov}(\Delta x_i, \Delta x_j)\,\vec{\omega}}. \quad (19)$$
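Equations (16), (18) and (19) translate into a few lines of numpy. The holdings, volatilities and correlations below are invented for illustration, not the lecture's numbers:

```python
import numpy as np

# Hypothetical holdings and security statistics (three securities)
omega = np.array([3.0, -1.0, 2.0])          # holdings
sigma = np.array([0.02, 0.015, 0.03])       # vols of the Delta-x's
rho = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])           # correlation matrix

# Equation (16): cov = diag(sigma) @ rho @ diag(sigma)
cov = np.diag(sigma) @ rho @ np.diag(sigma)

# Equations (18)-(19): portfolio variance and standard deviation
var_pi = omega @ cov @ omega
stdev_pi = np.sqrt(var_pi)
print(var_pi, stdev_pi)
```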
The results we have obtained so far are general, in the sense that we have not
invoked any distributional properties for the underlying variables.
Then we have just found the distribution of the portfolio: it is also normal (because the sum of normal variables is a normal variable), with a standard deviation given by $\mathrm{stdev}(\Delta\Pi) = \sqrt{\vec{\omega}^T\,\mathrm{cov}(\Delta x_i, \Delta x_j)\,\vec{\omega}}$.
And why is this interesting?
Because if a variable (in this case the changes in value of our portfolio) is
normally distributed and we know its variance, then we can work out all its
percentiles.
See below.
Note that in arriving at this result we have clearly stated one assumption (that
all the securities should be jointly normally distributed), but we have swept
another under the carpet.
The unstated assumption is that the link between the changes in the quantities {∆x} — which we have called ‘securities’, but which will in general be ‘risk factors’ — and the changes in the positions will be linear.
If the portfolio contains option-like derivatives this will be a poor approximation.
There are some (rather unsatisfactory) approximate ways around the problem,
but we will not go into that because the analytic route to calculating VaR is
useful as a benchmark, but is rarely used in practice.
8.1 Description of the Portfolio
With this result under our belt we can begin the calculations of VaR proper.
First, however, we must describe our portfolio. We stress that this is a baby
portfolio, and that realistic applications will be orders of magnitude larger (if
not more complex).
Asset      Sensitivity $h_i$
S&P500        128,000
FTSE100       −72,000
DAX           −40,000     (20)
Swap10y       800,000
Swap5y        600,000
Swap2y        200,000
The changes in the underlying time series and the sensitivities are related to the changes in the P&L by
$$\Delta \mathrm{P\&L}_i = \sum_k h_k \left(x_{ki} - x_{k,i-1}\right)$$
where $x_{ki}$ is the value of the financial time series k on day i. Note that we are using absolute, not percentage, returns to obtain the changes in the P&L.
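As a sketch of the mapping, here is one day's P&L computed from the sensitivities of Equation (20); the day's factor moves are invented for illustration:

```python
import numpy as np

# Sensitivities from Equation (20)
h = np.array([128_000, -72_000, -40_000, 800_000, 600_000, 200_000])

# One day's hypothetical absolute changes in the six risk factors
dx = np.array([1.2, -0.8, 0.5, 0.003, 0.002, -0.001])

# The day's P&L is the (linear, deterministic) mapping: sum_k h_k * dx_k
pnl = h @ dx
print(f"{pnl:,.0f}")  # → 194,600
```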
8.2 The Portfolio VaR under the Normal Approximation
If we are happy to assume a normal distribution for the changes in the risk factors, then there is no reason not to use the analytical formula for the standard deviation of the changes in value of the portfolio, $\mathrm{stdev}(\Delta\Pi) \equiv \sigma_\Pi$, to calculate the VaR.
For the portfolio described above, this gives the following (rounded) values for
the VaR at percentiles levels from 95 to 99.5:
Percentile        VaR
95.0      −17,509,000
95.5      −18,047,000
96.0      −18,636,000
96.5      −19,288,000
97.0      −20,021,000     (24)
97.5      −20,864,000
98.0      −21,862,000
98.5      −23,101,000
99.0      −24,764,000
99.5      −27,420,000
These values will be profitably used in what follows as a comparison benchmark.
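Under the normal approximation the whole column collapses to one number: assuming a zero mean for ∆Π, $\mathrm{VaR}_p = -\sigma_\Pi \Phi^{-1}(p)$. As a sketch, we can back out an implied $\sigma_\Pi$ from the 95th-percentile entry (so $\sigma_\Pi$ here is inferred, not computed from the data) and check that it reproduces the rest of the table:

```python
from statistics import NormalDist

# Imply sigma_Pi from the 95th-percentile entry of the table above
sigma_pi = 17_509_000 / NormalDist().inv_cdf(0.95)

# VaR at percentile p is just -sigma_Pi * z_p under the normal approximation
for p in (0.95, 0.975, 0.99, 0.995):
    print(f"{p:.3f}  {-sigma_pi * NormalDist().inv_cdf(p):,.0f}")
```

The printed values match the table's entries to within their rounding.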
9 The Historical Simulation Approach
With the naive historical simulation approach, we simply take the returns as
they happened on all of the 2,736 days of our data set, and, from the sensitivities
in Equation (20) we calculate the changes in P&L:
Note that, by doing so, we have obtained a univariate distribution (the distribution of P&Ls) from a multi-variate distribution of risk factors.
In this baby case, ‘multi’ meant six; in real applications it may mean $10^4$ or $10^5$.
Once we have obtained all the 2,736 P&Ls, we sort all these changes from the
largest loss to the largest profit.
Each of these realizations has the same probability of occurring because, in this
simple application, we are giving the same weight to every observation (vector
of changes in risk factors), irrespective of whether it happened yesterday or 10
years ago.
So we are saying that the returns are both independent and identically distributed.
(This, by the way, was the same assumption we were making when we calculated
the analytical Gaussian VaR. Where did both assumptions sneak in?)
Since every sorted P&L realization has the same probability, building the empirical cumulative distribution is super simple:
• the probability of getting anything worse than the worst loss is zero;
• the probability of getting a loss smaller than or equal to the worst loss is 1/n (with n = 2,736 in this case);
• the probability of getting a loss smaller than or equal to the second-worst loss is 2/n; the probability of getting a loss smaller than or equal to the third-worst loss is 3/n; (you see where I am going...); ...;
The associated VaR is just the appropriate percentile, ie, the value of the loss corresponding to the value k/n of choice for the cumulative distribution.
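The sort-and-read-off procedure takes a few lines. The fat-tailed P&L sample below is a made-up stand-in for the 2,736 observed days, so only the mechanics, not the numbers, carry over:

```python
import numpy as np

def historical_var(pnl, percentile):
    """Empirical VaR: sort the P&Ls and read off the k/n point of the CDF."""
    pnl_sorted = np.sort(pnl)                    # largest loss (most negative) first
    k = int((1.0 - percentile) * len(pnl_sorted))
    return pnl_sorted[k]

# Hypothetical fat-tailed daily P&Ls standing in for the 2,736 of the data set
rng = np.random.default_rng(7)
pnl = rng.standard_t(df=4, size=2_736) * 5_000_000

print(f"{historical_var(pnl, 0.99):,.0f}")  # the 99th-percentile 1-day loss
```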
These are the VaR results for the portfolio under consideration obtained from
the Historical Simulation (HS) approach:
Percentile   VaR (Normal)     VaR (HS)
95.0      −17,509,000      −16,698,000
95.5      −18,047,000      −17,536,000
96.0      −18,636,000      −18,457,000
96.5      −19,288,000      −19,166,000
97.0      −20,021,000      −19,770,000     (26)
97.5      −20,864,000      −20,827,000
98.0      −21,862,000      −22,308,000
98.5      −23,101,000      −24,100,000
99.0      −24,764,000      −28,441,000
99.5      −27,420,000      −32,794,000
Note that the HS simulation gives considerably larger losses in the far left tail, and smaller losses for modest percentiles. The cross-over point, for the portfolio at hand, is around the 97.5th percentile.
Can you explain why this cross-over must happen, if the two distributions have different fatness of the tails (but the same variance)?
10 The Monte Carlo Approach
Also in the case of the Monte Carlo Method, for the moment we are going to
look at the simplest and most naive implementation.
This means that we are going to assume that the distribution of the risk factors
is jointly Gaussian.
It should be clear that, if the portfolio sensitivities are exactly linear, then
running an expensive Gaussian Monte Carlo to estimate VaR has absolutely no
advantage over calculating VaR using the analytical approach.
What we are trying to learn here are just the mechanics of a basic Monte
Carlo simulation, so that we can build on this knowledge for more realistic
applications.
Exercise 5 When could it be advantageous to use the Gaussian Monte Carlo
method rather than the analytical method?
Exercise 6 What could the advantage be in using the Monte Carlo method
over the historical simulation method if we managed (as we shall learn how to
do in the next lecture) to sample from the empirical distribution of risk factors?
Running a Monte Carlo simulation means carrying out numerically repeated
draws of a set of random variables from a given distribution.
The numerically produced draws must recover, of course, not only the properties
of the (six) individual marginal distributions, but also the codependence (the
6 × 6 correlation matrix, in our case) amongst the variables.
If a single variable is normally distributed with mean µ and variance σ², then, clearly, a simulation of n draws, $x_i$, from this variable can be obtained by doing n times
$$x_i = \mu + \sigma\varepsilon_i \quad (27)$$
where $\varepsilon_i$ is a numerically produced standard normal variable, returned, for instance, by the MatLab function randn(), or by the Excel invocation Normsinv(Rand()).
(Why, by the way? If you don’t know the answer, see the next lecture.)
However, if we draw independent normal variates for the six risk factors these
will not have the desired correlation structure (they will be independent, to
within numerical noise).
with
$$E[dw_t\,dz_t^1] = 0. \quad (30)$$
The variance of the second process has not been changed by the transformation:
$$E\left[(dx_t^2)^2\right] = (\sigma_2)^2\left(\rho_{1,2}^2 + 1 - \rho_{1,2}^2\right)dt = (\sigma_2)^2\,dt. \quad (31)$$
What about the correlation between the two processes?
Let’s calculate
$$E[dx_t^1\,dx_t^2] = \sigma_1\sigma_2\rho_{1,2}\,dt \quad (32)$$
and therefore
$$\frac{E[dx_t^1\,dx_t^2]}{\sigma_1\sigma_2\,dt} = \rho_{1,2}, \quad (33)$$
which means that the two processes, $dx_t^2 = \mu_2\,dt + \sigma_2\,dz_t^2$ and $dx_t^2 = \mu_2\,dt + \sigma_2\left(\rho_{1,2}\,dz_t^1 + \sqrt{1-\rho_{1,2}^2}\,dw_t\right)$, are exactly equivalent in all respects.
This also means that, to simulate two correlated Gaussian variables, I can draw two independent Gaussian variates, and then conjoin them using the lower-triangular matrix, C,
$$C = \begin{pmatrix} 1 & 0 \\ \rho_{1,2} & \sqrt{1-\rho_{1,2}^2} \end{pmatrix}. \quad (34)$$
So, we have
$$\begin{pmatrix} dx_t^1 \\ dx_t^2 \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} dt + \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \rho_{1,2} & \sqrt{1-\rho_{1,2}^2} \end{pmatrix} \begin{pmatrix} dz_t^1 \\ dw_t \end{pmatrix}. \quad (35)$$
(Please check that it all works out correctly).
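The check can also be run numerically: conjoining two independent standard normals as in Equation (34) gives a pair that is still standard normal and has exactly the target correlation (ρ = 0.7 below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.7
n = 500_000

# Two independent standard normal draws (the dz and dw of the text)
z = rng.normal(size=n)
w = rng.normal(size=n)

# Conjoin them as in Equation (34): e2 = rho*z + sqrt(1 - rho^2)*w
e1 = z
e2 = rho * z + np.sqrt(1.0 - rho**2) * w

# e2 is still standard normal, and corr(e1, e2) = rho
print(e2.std(), np.corrcoef(e1, e2)[0, 1])
```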
How do we generalize this inspired guess to more than two variables?
Generalize.
You have just derived the Cholesky decomposition. If you had been born a century ago, you would now be famous.
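In more than two dimensions the same construction is delivered ready-made by a linear-algebra library. A sketch with a hypothetical 3×3 correlation matrix (a 6×6 one for our portfolio would work identically):

```python
import numpy as np

rng = np.random.default_rng(5)

# A hypothetical 3x3 correlation matrix for the risk factors
rho = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Cholesky decomposition: rho = C @ C.T, with C lower triangular
C = np.linalg.cholesky(rho)

# Correlated draws obtained from independent standard normals
indep = rng.normal(size=(100_000, 3))
corr_draws = indep @ C.T

print(np.round(np.corrcoef(corr_draws.T), 2))
```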
11 Comparisons of VaRs
One of the most popular ways to identify the ‘driving factors’ is Principal Component Analysis. This is what it’s all about.
13 What Are Principal Components?
(What ‘redundancy’ and ‘informative’ mean will become apparent in the following).
The second reason for using principal components is the removal of noise.
To understand both of these aspects of PCA, consider for instance a yield curve, described by a large number, N, of yields, $y_i$, i = 1, ..., N.
Suppose that a modeller records exactly the daily changes in all the yields. She
diligently does so day after day.
By doing so, she will have a perfectly faithful record of the changes that the
(discrete, but very-finely-sampled) yield curve has experienced on those days.
After some time (say, a few months), she cannot help forming the impression
that she has been wasting a lot of her time. Take for instance her birthday, the
3rd April: on that day she recorded the following changes (in basis points) for
the yields of maturities from 24 to 29 years:
y24   3.1
y25   3.2
y26   3.1
y27   3.3     (38)
y28   3.3
y29   3.4
For all her efforts, she can't help feeling that if she had recorded that all the yields of maturity from 24 to 29 years moved by 3.23 basis points (the average of the recorded values) she would have saved herself a lot of time, and lost little information.
Then the researcher decides to look at the data a bit more carefully, and she
plots the changes against yield maturity. See Fig (3).
Perhaps, by squinting hard, one can see an upward trend in the data. The
change in the 26-year yield, however, does not fit well with the trend story.
Perhaps it was a noisy data point.
Being a conscientious modeller, the researcher runs a regression, and finds an
R2 of almost 80%.
On this very small sample, the slope coefficient (ie, the coefficient b in the
regression a + bT ) has a t statistic of 3.75, and therefore it is likely that there
is indeed an upward trend in the data.
Short of transcendental meditation, with one single record there is not much
more that the researcher can do.
However, if she has access to the equivalent changes for many days, she can
do something better: she can construct a matrix of (empirical) covariances in
yield changes and study this quantity.
This is where Principal Component Analysis offers some help. With this technique one creates some clever linear combinations, {x}, of the original data:
$$x_i = \sum_{j=1}^{30} c_{ij}\,y_j. \quad (39)$$
On the two axes, x and y, we have projected the cloud of points, to obtain two marginal distributions. We note that the variance of the changes in yield 2 is a bit less than the variance of the changes in yield 1, but, apart from that, the two marginal distributions look quite similar.
Let’s now rotate the axes in a skillful way, as we have done in Fig (??), and let’s
project the scatter plot again in the direction of the two new axes, to obtain
the new marginal distributions.
We stress that we are still dealing exactly with the same data, and only the
axes have changed (they have been rotated). Note also that we have spoken
of a rotation of axes (no stretching).
Therefore the matrix (the set of numbers cij in Equation (39)) that carries out
the transformation to the new clever variables (the {x}) is a rotation matrix,
with a determinant equal to +1.
Figure 4: The daily changes in the two yields. Every point corresponds to a realization of the change in the first yield (given by the x-coordinate of that point), and a realization of the change in the second yield (given by the y-coordinate).
After this length-preserving transformation the two marginal distributions look
very different:
• the variation along the new first axis, x1, clearly shows that along this new
direction is where ‘all the action is’;
• the variation along the second axis has now become a side show.
What do the transformed variables look like?
In this simple case they may be given by a transformation like the following:
$$dx_1 = \frac{1}{\sqrt{2}}\,dy_1 + \frac{1}{\sqrt{2}}\,dy_2 \quad (40)$$
$$dx_2 = \frac{1}{\sqrt{2}}\,dy_1 - \frac{1}{\sqrt{2}}\,dy_2. \quad (41)$$
Note that the two new variables, dx1 and dx2, are made up of the original variables multiplied by some coefficients.
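The rotation in Equations (40)-(41) can be applied to simulated data to see the reallocation of variance at work; the correlation of 0.9 between the two yield changes below is my own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
s = 1.0 / np.sqrt(2.0)
R = np.array([[s, s],
              [s, -s]])          # the coefficients of Equations (40)-(41)

# R is orthogonal: rows and columns have unit sum of squares
assert np.allclose(R @ R.T, np.eye(2))

# Strongly correlated yield changes (hypothetical numbers)
cov_y = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
dy = rng.multivariate_normal([0.0, 0.0], cov_y, size=100_000)

dx = dy @ R.T                    # dx1 = average move, dx2 = tilt
print(np.round(np.var(dx, axis=0), 2))  # roughly [1.9, 0.1]: the action is in dx1
```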
Figure 5: The same data as in Fig (4), but after rotating the axes. Also in this case, the data have been projected along the two axes to obtain the marginal distributions. Note the very different variances of the two marginal distributions.
Looking at the coefficients, we can see that the first variable turns out to be proportional to the average move in the yield curve.
If, as the modeller indeed observed in the example above, a parallel move in
the yields is the dominant mode of deformation of the yield curve, the change
in the first variable (the first principal component) will convey most of the
information about how all the yield curve moved on any day.
One number (the change in x1) will convey almost as much information as 2
numbers (the changes in y1 and y2).
This is what we meant when we said that the new variables are more informative: if we look at both principal components we get an equivalent transformation of the original data; but if we decide to keep only one principal component, in the case at hand the first new variable contains the ‘essence’ of the yield curve deformation.
Note also that if we take the sum of the squares of the coefficients, and we add either along rows or columns, we always obtain 1: again, this is because the transformation c is just a rotation.
13.2 The Signal and the Noise
What about the noise? Now the modeller has to decide whether the much narrower variation in the second variable conveys (less-important, but still-significant) information, or is just an effect of measurement noise.
If she decides that it is noise, then in this simple example all the signal in the
data is captured by the change in the first principal component. If, beside the
parallel-move signal there is no additional information, by neglecting the second
principal component all she is throwing away is measurement error.
If she decides instead that the second transformed variable (the second principal
component) conveys less-important, but nonetheless-still-valuable, information
she can make a modelling choice as to whether keeping the second variable is
essential for her study or not.
If the variation in the original data is indeed captured well by two or three new
variables, then we have obtained a dramatic reduction in dimensionality.
Now, for two variables only, there are no advantages, other than aesthetic
ones, in using the two original variables, or the two transformed variables (the
principal components.)
On the other hand, if we use only one transformed variable we throw away both
the noise and whatever information may have been contained in the original
data.
But with many variables we can do something more nuanced: we can split
the decision of how many variables really matter for our application from the
decision of what is signal and what is unwanted noise.
For instance, we may ascertain that three principal components do contain information, but, perhaps to keep our model simple, we may decide that we will work only with two.
When we do this drastic culling we do two things at the same time: first, we clean the data from noise; second, we achieve a more synthetic, albeit less-complete, description of the original signal.
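The culling is easy to demonstrate on simulated yield-curve changes. The data-generating process below (a dominant level move, a small slope move, and measurement noise) is entirely made up, but it mimics the modeller's situation:

```python
import numpy as np

rng = np.random.default_rng(11)
n_days, n_yields = 2_000, 10

# Hypothetical yield-curve changes: a common level move, a smaller slope
# move, and measurement noise
t = np.linspace(-1.0, 1.0, n_yields)
level = rng.normal(0.0, 3.0, size=(n_days, 1))         # dominant signal
slope = rng.normal(0.0, 0.8, size=(n_days, 1)) * t     # secondary signal
noise = rng.normal(0.0, 0.2, size=(n_days, n_yields))
dy = level + slope + noise

# PCA via the eigendecomposition of the empirical covariance matrix
cov = np.cov(dy, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)        # eigh returns ascending order
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]

# Fraction of total variance carried by each principal component
explained = eigval / eigval.sum()
print(np.round(explained[:3], 3))  # the first two components carry almost everything
```

Keeping two components retains the level and slope signals; the remaining eight directions are essentially the measurement noise.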
(Michelangelo could have handled three variables, but only as a ceiling fresco.)
In practice, when the variance becomes ‘too small’ we say that either we do
not care, or it is noise.
This, however, is not the full story: more about this at the end of the lecture.
14 First Conclusions about Principal Components
First of all, PCA works well when the signal-to-noise ratio is high. Whether
this is true or not depends on the application at hand.
For instance, in his good tutorial on PCA, Shlens (2009) says: ‘Measurement noise in any data set must be low or else, no matter the analysis technique, no information about a signal can be extracted’.
However, wishful thinking should not cloud our appreciation of what is signal
and what is noise.
It is not true that, when the variance of a principal component (of one of the
transformed variables) is small we are safe in assuming that we are dealing with
noise.
Whether we are or not depends on the nature of the phenomenon at hand and on the quality of our data collection, not on the size of the variance. This often-forgotten fact should be engraved on the keyboard of any statistician and econometrician.
Second, PCA works well when there is a lot of redundancy in the original data.
For linear systems, this means that it works well when the original data are
strongly correlated (as yields are). What does ‘redundancy’ mean? With high
redundancy, if you tell me the realization of one variable (say, the change in
the 26-year yield), I have a sporting chance of guessing pretty accurately the
realization of another variable (say, the change in the 27-year yield).
The variable transformation that PCA carries out creates variables with no redundancy, in the sense that knowledge of the realization of one variable (say, the level of the yield curve) makes one none the wiser as to the realization of the second variable (say, its slope). See again Fig (??).
Third, if the second new variable (principal component) is less important than the first (and the third than the second, and so on), then one can ‘throw away’ these higher principal components, even if they are not noise, and lose little in the accuracy of the description.
What does it mean that a transformed variable is ‘less important than another’?
If all variables impact the quantity we are interested in with a similar ‘elasticity’, then a variable that varies less (which has a smaller variance) will be less important — simply by virtue of the fact that, despite having a similar impact as the first ‘per unit shock’, it varies over a more limited range of values.
Look again at Equations (40) and (41): a unit change in either variable pro-
duces a change in the yield curve which is just as large. The ‘elasticity’ of the
quantity we are interested in (the magnitude of the deformation∗ of the yield
curve) is the same for the two principal components.
∗ We can use as a proxy of the deformation of the yield curve the sum of the absolute values
of the changes in the various yields.
However, empirically we find that the second principal component varies a lot
less than the first (see Fig (??)), and therefore it is not reasonable to assign a
unit change to both.
It is only in this sense that, for the application at hand, one variable matters
more than the other.
Despite the fact that the second principal component varies far less than the first, in the case of the market price of risk if we throw the slope away we throw away the signal.
The variance of a variable is a good indicator of its importance only if all the
variables act on what we are interested in on an equal footing.
$$A = V\Lambda V^T, \quad (42)$$
where Λ is a diagonal matrix that contains the r distinct eigenvalues, $\lambda_i$, of A:
$$A\vec{v}_i = \lambda_i\vec{v}_i, \quad i = 1, 2, \dots, r \quad (46)$$
ie, when the original matrix, A, operates on the eigenvector, $\vec{v}_i$, it returns the same eigenvector back, multiplied by a constant. This constant is the ith eigenvalue.
Saying that the eigenvectors are all orthogonal to each other means that
$$\vec{v}_j^T\vec{v}_i = 0 \quad \text{if } i \ne j \quad (47)$$
Saying that they are normalized means that
$$\vec{v}_i^T\vec{v}_i = 1. \quad (48)$$
(Note carefully that the expression $\vec{v}_j^T$ indicates a [1 × r] row vector, which, when multiplied using matrix multiplication rules by the [r × 1] column vector, $\vec{v}_i$, gives a scalar. The operation is also known as a contraction, or an inner product. If this is not clear, please see Chapter 2 again, because this notation will recur over and over again.)
Consider the case now when the matrix A is the covariance matrix between n variables $y_i$. The transformed variables, ie the principal components, $x_i$, are given by
$$x_i = \sum_{j=1}^{n} \left(V^T\right)_{ij} y_j \quad (49)$$
or, in matrix notation,
$$\vec{x} = V^T\vec{y}. \quad (50)$$
Of course, because of orthogonality,
$$\vec{y} = V\vec{x}. \quad (51)$$
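Equations (42) and (49)-(51) can be exercised directly on a small covariance matrix; the 2×2 matrix A below is a hypothetical example of two strongly correlated yields:

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical covariance matrix A of two yield changes
A = np.array([[2.0, 1.6],
              [1.6, 2.0]])

eigval, V = np.linalg.eigh(A)   # columns of V are the orthonormal v_i

# Equation (42): A = V Lambda V^T
assert np.allclose(V @ np.diag(eigval) @ V.T, A)

# Equations (50)-(51): x = V^T y, inverted exactly by y = V x
y = rng.multivariate_normal([0.0, 0.0], A, size=50_000)
x = y @ V                       # each row is V^T applied to that day's y
assert np.allclose(x @ V.T, y)

print(np.round(np.cov(x, rowvar=False), 2))  # nearly diagonal: the PCs are uncorrelated
```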
In the light of the discussion above, why do regulators believe that bigger is
always better, ie, that more and more mappings are an unalloyed good?
Surely they are aware of the objections about overparametrized models that we
raised in the previous section.
The problem is that banks often engage in relative-value trading.
This entails taking very large long and short positions in securities or indices
that are very similar, but that, for some reason or other, seem to have moved
out of their normal ‘trading range’.
Since the securities or indices in which the opposite and large positions are
taken are very similar, any reasonable mapping to risk factors is likely to lump
them together, ie, to assume that they move exactly, and not just approximately,
in lock-step.
But if they did, then the market risk from these positions would be exactly zero,
which it patently isn't. (If it were, why would the large offsetting positions be
put in place to start with?)
So, regulators are keen to capture basis risk, and that is why they insist on the
granularity of the mappings.
Perhaps asking for more and more risk factors to be added to the pot is not
the best way to go about it (what would you do?).
However, if we accept this logic (ie, if we accept that the only way to deal with
basis risk is to capture all the securities that these pesky banks may use for
their relative-value trades), then we have to learn how to deal with a quadrillion
risk factors.
17 How PCs Can Help with Monte Carlo Simulations*
In order to see how Principal Component Analysis can help us with the esti-
mation of the joint distribution of the risk factors, suppose that we want to run
a Monte Carlo simulation similar in principle to the one we saw above when we
had a handful of securities.
The big problem is that now we have to deal with tens of thousands of risk
factors.
The details are different, but the general approach is similar. Our shot at
cleverness is via a hierarchical method that factorizes the dependencies present
in our portfolio. Here is how it is done.
18 Constructing a Hierarchy of Factors
So, the task ahead of us is the following: we want to shock thousands of risk
factors that give us the most granular description we can afford for our portfolio.
All the shocks must reflect what we know about the joint distribution of the
risk factors.
At the top level we want to capture Global Market Risk Factors — these could
be, for instance:
5. Mortgages DM
8. Major Currencies DM
9. Major Currencies EM
10. ...
Take one Global Market Factor, say, the Global Fixed Income DM Risk Factor.
This should in some way tell us at a pretty coarse level about the behaviour of
Treasury bonds of all developed markets. This level of coarseness is not good
enough to shock our portfolio. We will have to drill down.
Let’s zoom in on one of the Global Market Factors: say, the Global Fixed
Income DM factor. This could be made up of
7. ...
Let’s zoom in further, and let’s look at one of the middle level factors, say, the
USD Treasury bonds.
At the bottom level we are going to look at all the important key-rate maturities
that characterize this market.
Note that we call this the ‘bottom’ level, but it is the bottom in the space of
factors. There is an even lower level below, but this is at security level — ie, at
the level of the individual positions in our portfolio.
To fix ideas, suppose that the specific factor in the bottom factor level is USD
Treasury bonds. As we said, these could be 30 yields.
The beauty of PCs is that we can retain a relatively small number of them and
we can ‘reconstruct’ very accurately the changes in all the original variables.
Let’s say that we retain the first 4 principal components. So, we construct time
series of the first 4 principal components (of changes) for USD Treasuries.
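This truncation step can be sketched as follows (the correlation structure of the 30 key rates is simulated here, as an assumption for illustration; in practice the covariance matrix would be estimated from the historical key-rate changes):

```python
import numpy as np

rng = np.random.default_rng(1)
n_rates, n_days = 30, 2000

# Simulated daily changes in 30 key rates with strong cross-maturity
# correlation (an AR(1)-style correlation matrix, illustrative only):
maturities = np.arange(n_rates)
corr = 0.99 ** np.abs(maturities[:, None] - maturities[None, :])
dy = rng.multivariate_normal(np.zeros(n_rates), corr, size=n_days)

w, V = np.linalg.eigh(np.cov(dy, rowvar=False))
order = np.argsort(w)[::-1]          # largest eigenvalues first
w, V = w[order], V[:, order]

k = 4
x = dy @ V[:, :k]                    # time series of the 4 retained PCs
dy_hat = x @ V[:, :k].T              # reconstructed 30 key-rate changes

explained = w[:k].sum() / w.sum()
print(f"variance explained by first {k} PCs: {explained:.1%}")
```

The strong correlation across maturities is exactly what makes the 30-to-4 compression accurate: with typical yield-curve data the first few components explain well above 90% of the variance.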
18.5 Moving back up to the Middle Level
These 4 principal components are sent to the middle level to ‘represent’ the
‘USD Treasury bonds’ middle-level factor. There will also be, say, 3 principal
components received from the bottom level to ‘represent’ the ‘GBP Treasury
bonds’ factor.
And so on for all the national markets that make up the top level Global Fixed
Income DM factor.
If we have, say, 10 regional markets and on average 4 principal components
for each, we do not want to represent the factor Global Fixed Income DM at
the top level by 40 variables.
Now at the top level we are going to receive 5 principal components from Global
Fixed Income DM, 5 from Global Equities DM, etc. We are going to have, at
this top level, some 102 principal components.
Now, we can handle 102 random variables — by which I mean that we can
build a meaningful covariance matrix among them and that, perhaps using the
techniques that we shall discuss in Lecture 3, we may be able to run a
reasonable Monte Carlo simulation to obtain many realizations.
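The top-level simulation can be sketched as follows (the PC histories are random placeholders standing in for the time series actually received from the level below; the dimensions are the illustrative ones used in the text):

```python
import numpy as np

rng = np.random.default_rng(7)
n_pcs, n_obs, n_scenarios = 102, 1000, 10_000

# Stand-in for the historical series of the 102 top-level principal components:
pc_history = rng.normal(size=(n_obs, n_pcs))
cov = np.cov(pc_history, rowvar=False)

# Draw joint scenarios via a Cholesky factorization of the covariance matrix
# (a small jitter on the diagonal guards against numerical non-positive-definiteness):
L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_pcs))
z = rng.standard_normal((n_scenarios, n_pcs))
draws = z @ L.T                      # each row is one joint top-level scenario

print(draws.shape)                   # (10000, 102)
```

Each 102-element row is one joint realization of all the top-level components, ready to be cascaded back down the hierarchy.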
So, for instance, the first 4 were associated with (were the first 4 PCs of)
the Global Fixed Income DM factor; the second 5 were associated with
(were the first 5 PCs of) the Global Equity DM factor; etc;
3. we therefore take the first 4 elements of the 102-element vector created
by the 20-dimensional draw at the top level, and use them to recover
(approximately, but accurately) the, say, 28 elements of the Global Fixed
Income factor; we can then take the next, say, 5 elements of the 102-element
vector created by the 20-dimensional draw at the top level, and use them
to recover (approximately, but accurately) the, say, 30 elements of the
Global Equity factor; and so on and so forth.
4. we now move to the bottom factor level: we know that the first four of
these 28 factors pertained to the USD Treasury market, and with these
four factors we can recover (approximately, but accurately) all the changes
in the 30 key rates that, we decided, describe our portfolio as granularly
as we can afford.
From these we can obtain the changes in the security-level positions in our
portfolio.
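Steps 3 and 4 above can be sketched for a single scenario as follows (the loading matrices `V_fi` and `V_usd` are random placeholders standing in for the V matrices estimated at each level, and the DV01 sensitivities are likewise illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

top_draw = rng.standard_normal(102)    # one joint scenario from the top level

# Global Fixed Income DM: recover 28 middle-level factors from its 4 PCs
# (placeholder loadings; in practice these come from the middle-level PCA):
V_fi = rng.standard_normal((28, 4))
middle_fi = V_fi @ top_draw[:4]

# USD Treasuries: recover the 30 key-rate changes from its 4 PCs
# (again, placeholder loadings for the bottom-level PCA):
V_usd = rng.standard_normal((30, 4))
key_rate_moves = V_usd @ middle_fi[:4]

# Finally, map the key-rate moves to the P&L of the positions via their
# sensitivities (a flat DV01 profile, purely for illustration):
dv01 = np.full(30, 1e4)
pnl = -dv01 @ key_rate_moves
print(key_rate_moves.shape, float(pnl))
```

Repeating this cascade for every top-level draw produces a full distribution of portfolio P&Ls, from which VaR or Expected Shortfall can be read off.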
How can we know if, by our repeated culling, we have neglected important
sources of risk — perhaps those coming from the relative-value trades that
banks like so much?
P&L attribution.