Mathematical Modeling and Statistical Methods
for Risk Management
Lecture Notes
© Henrik Hult and Filip Lindskog
Contents

1 Some background to financial risk management
  1.1 A preliminary example
  1.2 Why risk management?
  1.3 Regulators and supervisors
  1.4 Why the government cares about the buffer capital
  1.5 Types of risk
  1.6 Financial derivatives
2 Loss operators and financial portfolios
  2.1 Portfolios and the loss operator
  2.2 The general case
3 Risk measurement
  3.1 Elementary measures of risk
  3.2 Risk measures based on the loss distribution
4 Methods for computing VaR and ES
  4.1 Empirical VaR and ES
  4.2 Confidence intervals
    4.2.1 Exact confidence intervals for Value-at-Risk
    4.2.2 Using the bootstrap to obtain confidence intervals
  4.3 Historical simulation
  4.4 Variance-Covariance method
  4.5 Monte Carlo methods
5 Extreme value theory for random variables with heavy tails
  5.1 Quantile-quantile plots
  5.2 Regular variation
6 Hill estimation
  6.1 Selecting the number of upper order statistics
7 The Peaks Over Threshold (POT) method
  7.1 How to choose a high threshold
  7.2 Mean-excess plot
  7.3 Parameter estimation
  7.4 Estimation of Value-at-Risk and Expected shortfall
8 Multivariate distributions and dependence
  8.1 Basic properties of random vectors
  8.2 Joint log return distributions
  8.3 Comonotonicity and countermonotonicity
  8.4 Covariance and linear correlation
  8.5 Rank correlation
  8.6 Tail dependence
9 Multivariate elliptical distributions
  9.1 The multivariate normal distribution
  9.2 Normal mixtures
  9.3 Spherical distributions
  9.4 Elliptical distributions
  9.5 Properties of elliptical distributions
  9.6 Elliptical distributions and risk management
10 Copulas
  10.1 Basic properties
  10.2 Dependence measures
  10.3 Elliptical copulas
  10.4 Simulation from Gaussian and t copulas
  10.5 Archimedean copulas
  10.6 Simulation from Gumbel and Clayton copulas
  10.7 Fitting copulas to data
  10.8 Gaussian and t copulas
11 Portfolio credit risk modeling
  11.1 A simple model
  11.2 Latent variable models
  11.3 Mixture models
  11.4 One-factor Bernoulli mixture models
  11.5 Probit normal mixture models
  11.6 Beta mixture models
12 Popular portfolio credit risk models
  12.1 The KMV model
  12.2 CreditRisk+ – a Poisson mixture model
A A few probability facts
  A.1 Convergence concepts
  A.2 Limit theorems and inequalities
B Conditional expectations
  B.1 Definition and properties
  B.2 An expression in terms of the density of (X, Z)
  B.3 Orthogonality and projections in Hilbert spaces
Preface

These lecture notes aim at giving an introduction to Quantitative Risk Management. We will introduce statistical techniques used for deriving the profit-and-loss distribution for a portfolio of financial instruments and to compute risk measures associated with this distribution. The focus lies on the mathematical/statistical modeling of market and credit risk. Operational risk and the use of financial time series for risk modeling are not treated in these lecture notes.
Financial institutions typically hold portfolios consisting of a large number of financial instruments. A careful modeling of the dependence between these instruments is crucial for good risk management in these situations. A large part of these lecture notes is therefore devoted to the issue of dependence modeling. The reader is assumed to have a mathematical/statistical knowledge corresponding to basic courses in linear algebra, analysis, statistics and an intermediate course in probability. The lecture notes are written with the aim of presenting the material in a fairly rigorous way without any use of measure theory.
Chapters 1–4 of these lecture notes are based on the book [12], which we strongly recommend. More material on the topics presented in the remaining chapters can be found in [8] (chapters 5–7), [12] (chapters 8–12) and articles found in the list of references at the end of these lecture notes.

Henrik Hult and Filip Lindskog, 2007
1 Some background to financial risk management

We will now give a brief introduction to the topic of risk management and explain why this may be of importance for a bank or financial institution. We will start with a preliminary example illustrating in a simple way some of the issues encountered when dealing with risks and risk measurements.
1.1 A preliminary example

A player (investor/speculator) is entering a casino with an initial capital of V_0 = 1 million Swedish Kroner. All initial capital is used to place bets according to a predetermined gambling strategy. After the game the capital is V_1. We denote the profit (loss) by a random variable X = V_1 − V_0. The distribution of X is called the profit-and-loss distribution (P&L) and the distribution of L = −X = V_0 − V_1 is simply called the loss distribution. As the loss may be positive this is a risky position, i.e. there is a risk of losing some of the initial capital.
Suppose a game is constructed so that it gives 1.6 million Swedish Kroner with probability p and 0.6 million Swedish Kroner with probability 1 − p. Hence,

    X = {  0.6  with probability p,
        { −0.4  with probability 1 − p.        (1.1)

The fair price for this game, corresponding to E(X) = 0, is p = 0.4. However, even if p > 0.4 the player might choose not to participate in the game with the view that not participating is more attractive than playing a game with a small expected profit together with a risk of losing 0.4 million Swedish Kroner. This attitude is called risk-averseness.
Clearly, the choice of whether to participate or not depends on the P&L distribution. However, in most cases (think of investing in instruments on the financial market) the P&L distribution is not known. Then you need to evaluate some aspects of the distribution to decide whether to play or not. For this purpose it is natural to use a risk measure. A risk measure ρ is a mapping from the random variables to the real numbers; to every loss random variable L there is a real number ρ(L) representing the riskiness of L. To evaluate the loss distribution in terms of a single number is of course a huge simplification of the world, but the hope is that it can give us sufficient indication whether to play the game or not.
Consider the game (1.1) described above and suppose that the mean E(L) = −0.1 (i.e. a positive expected profit) and standard deviation std(L) = 0.5 of the loss L are known. In this case the game had only two known possible outcomes, so the information about the mean and standard deviation uniquely specifies the P&L distribution, yielding p = 0.5. However, the possible outcomes of a typical real-world game are typically not known, and mean and standard deviation do not specify the P&L distribution. A simple example is the following:

    X = {  0.35  with probability 0.8,
        { −0.9   with probability 0.2.        (1.2)
Here we also have E(L) = −0.1 and std(L) = 0.5. However, most risk-averse players would agree that the game (1.2) is riskier than the game (1.1) with p = 0.5. Using an appropriate quantile of the loss L as a risk measure would classify the game (1.2) as riskier than the game (1.1) with p = 0.5. However, evaluating a single risk measure such as a quantile will in general not provide a lot of information about the loss distribution, although it can provide some relevant information. A key to sound risk management is to look for risk measures that give as much relevant information about the loss distribution as possible.
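The comparison of the two games can be checked numerically. The sketch below (an illustration, not part of the original notes) computes the mean, standard deviation and a 99% loss quantile for both games: the first two moments agree, while the quantile separates them.

```python
import math

def moments_and_quantile(outcomes, alpha=0.99):
    """outcomes: list of (loss, probability) pairs for the loss L.
    Returns (E[L], std(L), q_alpha(L)) for a discrete loss distribution."""
    mean = sum(l * p for l, p in outcomes)
    var = sum((l - mean) ** 2 * p for l, p in outcomes)
    # generalized inverse of the discrete CDF: smallest l with F(l) >= alpha
    cum, quantile = 0.0, None
    for l, p in sorted(outcomes):
        cum += p
        if cum >= alpha and quantile is None:
            quantile = l
    return mean, math.sqrt(var), quantile

# loss L = -X for game (1.1) with p = 0.5, and for game (1.2)
game1 = [(-0.6, 0.5), (0.4, 0.5)]
game2 = [(-0.35, 0.8), (0.9, 0.2)]

m1, s1, q1 = moments_and_quantile(game1)
m2, s2, q2 = moments_and_quantile(game2)
print(round(m1, 6), round(s1, 6), q1)  # -0.1 0.5 0.4
print(round(m2, 6), round(s2, 6), q2)  # -0.1 0.5 0.9
```

Both games have E(L) = −0.1 and std(L) = 0.5, but the 99% loss quantile of game (1.2) is 0.9 against 0.4 for game (1.1), matching the discussion above.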
A risk manager at a financial institution with responsibility for a portfolio consisting of a few up to hundreds or thousands of financial assets and contracts faces a problem similar to that of the player entering the casino. Management or investors have also imposed risk preferences that the risk manager is trying to meet. To evaluate the position the risk manager tries to assess the loss distribution to make sure that the current position is in accordance with the imposed risk preferences. If it is not, then the risk manager must rebalance the portfolio until a desirable loss distribution is obtained. We may view a financial investor as a player participating in the game at the financial market, and the loss distribution must be evaluated in order to know which game the investor is participating in.
1.2 Why risk management?

The trading volumes on the financial markets have increased tremendously over the last decades. In 1970 the average daily trading volume at the New York Stock Exchange was 3.5 million shares. In 2002 it was 1.4 billion shares. In the last few years we have seen a significant increase in the derivatives markets. There are a huge number of actors on the financial markets taking risky positions

    Contracts       1995   1998   2002
    FOREX             13     18     18
    Interest rate     26     50    102
    Total             47     80    142

    Table 1: Global market in OTC derivatives (nominal value) in trillion US dollars (1 trillion = 10^12).

and to evaluate their positions properly they need quantitative tools from risk management. Recent history also shows several examples where large losses on the financial market are mainly due to the absence of proper risk control.
Example 1.1 (Orange County) On December 6, 1994, Orange County, a prosperous district in California, declared bankruptcy after suffering losses of around $1.6 billion from a wrong-way bet on interest rates in one of its principal investment pools. (Source: www.erisk.com)
Example 1.2 (Barings bank) Barings bank had a long history of success and was much respected as the UK's oldest merchant bank. But in February 1995, this highly regarded bank, with $900 million in capital, was bankrupted by $1 billion of unauthorized trading losses. (Source: www.erisk.com)
Example 1.3 (LTCM) In 1994 a hedge fund called Long-Term Capital Management (LTCM) was founded and assembled a star team of traders and academics. Investors and investment banks invested $1.3 billion in the fund and after two years returns were running close to 40%. In early 1998 the net asset value stood at $4 billion, but at the end of the year the fund had lost substantial amounts of the investors' equity capital and was on the brink of default. The US Federal Reserve managed a $3.5 billion rescue package to avoid the threat of a systemic crisis in the world financial system. (Source: www.erisk.com)
1.3 Regulators and supervisors

To be able to cover most financial losses most banks and financial institutions put aside a buffer capital, also called regulatory capital. The amount of buffer capital needed is of course related to the amount of risk the bank is taking, i.e. to the overall P&L distribution. The amount is regulated by law and the national supervisory authority makes sure that the banks and financial institutions follow the rules.
There is also an effort to develop international standards and methods for computing regulatory capital. This is the main task of the so-called Basel Committee. The Basel Committee, established in 1974, does not possess any formal supranational supervising authority and its conclusions do not have legal force. It formulates supervisory standards and guidelines and recommends statements of best practice. In this way the Basel Committee has a large impact on the national supervisory authorities.
• In 1988 the first Basel Accord on Banking Supervision [2] initiated an important step toward an international minimal capital standard. Emphasis was on credit risk.
• In 1996 an amendment to Basel I prescribes a so-called standardized model for market risk, with an option for larger banks to use internal Value-at-Risk (VaR) models.
• In 2001 a new consultative process for the new Basel Accord (Basel II) is initiated. The main theme concerns advanced internal approaches to credit risk and also new capital requirements for operational risk. The new Accord aims at an implementation date of 2006–2007. Details of Basel II are still hotly debated.
1.4 Why the government cares about the buffer capital

The following motivation is given in [6].
"Banks collect deposits and play a key role in the payment system. National governments have a very direct interest in ensuring that banks remain capable of meeting their obligations; in effect they act as a guarantor, sometimes also as a lender of last resort. They therefore wish to limit the cost of the safety net in case of a bank failure. By acting as a buffer against unanticipated losses, regulatory capital helps to privatize a burden that would otherwise be borne by national governments."
1.5 Types of risk

Here is a general definition of risk for an organization: any event or action that may adversely affect an organization's ability to achieve its objectives and execute its strategies. In financial risk management we try to be a bit more specific and divide most risks into three categories.
• Market risk – risks due to changing markets, market prices, interest rate fluctuations, foreign exchange rate changes, commodity price changes, etc.
• Credit risk – the risk carried by the lender that a debtor will not be able to repay his/her debt or that a counterparty in a financial agreement cannot fulfill his/her commitments.
• Operational risk – the risk of losses resulting from inadequate or failed internal processes, people and systems, or from external events. This includes people risks such as incompetence and fraud, process risks such as transaction and operational control risks, and technology risks such as system failure, programming errors, etc.
There are also other types of risk, such as liquidity risk, which concerns the need for well functioning financial markets where one can buy or sell contracts at fair prices. Other types of risk are for instance legal risk and reputational risk.
1.6 Financial derivatives

Financial derivatives are financial products or contracts derived from some fundamental underlying; a stock price, stock index, interest rate or commodity price, to name a few. The key example is the European Call option written on a particular stock. It gives the holder the right but not the obligation at a given date T to buy the stock S for the price K. For this the buyer pays a premium at time zero. The value of the European Call at time T is then

    C(T) = max(S_T − K, 0).

Financial derivatives are traded not only for the purpose of speculation but are actively used as a risk management tool, as they are tailor-made for exchanging risks between actors on the financial market. Although they are of great importance in risk management we will not discuss financial derivatives much in this course but put emphasis on the statistical models and methods for modeling financial data.
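As a minimal illustration of the payoff formula above, the sketch below evaluates C(T) = max(S_T − K, 0) for a few terminal stock prices (the numbers are hypothetical):

```python
def call_payoff(S_T, K):
    """Payoff of a European call at maturity: max(S_T - K, 0)."""
    return max(S_T - K, 0.0)

# hypothetical terminal prices for a strike K = 100
for S_T in (80.0, 100.0, 120.0):
    print(S_T, call_payoff(S_T, 100.0))
```

The payoff is zero whenever the option finishes out of the money, which is what makes the contract an insurance-like instrument rather than a linear position in the stock.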
2 Loss operators and financial portfolios

Here we follow [12] to introduce the loss operator and give some examples of financial portfolios that fit into this framework.

2.1 Portfolios and the loss operator

Consider a given portfolio such as for instance a collection of stocks, bonds or risky loans, or the overall position of a financial institution.
The value of the portfolio at time t is denoted V(t). Given a time horizon ∆t the profit over the interval [t, t + ∆t] is given by V(t + ∆t) − V(t), and the distribution of V(t + ∆t) − V(t) is called the profit-and-loss distribution (P&L). The loss over the interval is then

    L_{[t,t+∆t]} = −(V(t + ∆t) − V(t))

and the distribution of L_{[t,t+∆t]} is called the loss distribution. Typical values of ∆t are one day (or 1/250 years as we have approximately 250 trading days in one year), ten days, one month or one year.
We may introduce a discrete parameter n = 0, 1, 2, . . . and use t_n = n∆t as the actual time. We will sometimes use the notation

    L_{n+1} = L_{[t_n, t_{n+1}]} = L_{[n∆t, (n+1)∆t]} = −(V((n + 1)∆t) − V(n∆t)).

Often we also write V_n for V(n∆t).
Example 2.1 Consider a portfolio of d stocks with α_i units of stock number i, i = 1, . . . , d. The stock prices at time n are given by S_{n,i}, i = 1, . . . , d, and the value of the portfolio is

    V_n = Σ_{i=1}^d α_i S_{n,i}.

In financial statistics one often tries to find a statistical model for the evolution of the stock prices, e.g. a model for S_{n+1,i} − S_{n,i}, to be able to compute the loss distribution L_{n+1}. However, it is often the case that the so-called log returns X_{n+1,i} = ln S_{n+1,i} − ln S_{n,i} are easier to model than the differences S_{n+1,i} − S_{n,i}. With Z_{n,i} = ln S_{n,i} we have S_{n,i} = exp{Z_{n,i}}, so the portfolio loss L_{n+1} = −(V_{n+1} − V_n) may be written as

    L_{n+1} = −Σ_{i=1}^d α_i (exp{Z_{n+1,i}} − exp{Z_{n,i}})
            = −Σ_{i=1}^d α_i exp{Z_{n,i}}(exp{X_{n+1,i}} − 1)
            = −Σ_{i=1}^d α_i S_{n,i}(exp{X_{n+1,i}} − 1).
The relation between the modeled variables X_{n+1,i} and the loss L_{n+1} is nonlinear and it is sometimes useful to linearize this relation. In this case this is done by replacing e^x by 1 + x; recall the Taylor expansion e^x = 1 + x + O(x^2). Then the linearized loss is given by

    L^∆_{n+1} = −Σ_{i=1}^d α_i S_{n,i} X_{n+1,i}.
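A short numerical sketch of Example 2.1 (with made-up holdings, prices and log returns, not taken from the notes) compares the exact loss with its linearization; for small returns the two are close.

```python
import math

def portfolio_loss(alpha, S, X):
    """Exact loss L_{n+1} = -sum_i alpha_i * S_{n,i} * (exp(X_{n+1,i}) - 1)."""
    return -sum(a * s * (math.exp(x) - 1.0) for a, s, x in zip(alpha, S, X))

def linearized_loss(alpha, S, X):
    """Linearized loss L^Delta_{n+1} = -sum_i alpha_i * S_{n,i} * X_{n+1,i}."""
    return -sum(a * s * x for a, s, x in zip(alpha, S, X))

alpha = [10.0, 5.0]   # units held (hypothetical)
S = [100.0, 50.0]     # today's prices S_{n,i}
X = [0.01, -0.02]     # log returns X_{n+1,i}

L = portfolio_loss(alpha, S, X)
L_lin = linearized_loss(alpha, S, X)

# sanity check against direct revaluation -(V_{n+1} - V_n)
V_n = sum(a * s for a, s in zip(alpha, S))
V_np1 = sum(a * s * math.exp(x) for a, s, x in zip(alpha, S, X))
assert abs(L - (-(V_np1 - V_n))) < 1e-9
print(L, L_lin)
```

With these numbers the exact loss is about −5.10 (a profit) and the linearized loss is exactly −5.0, illustrating that the linearization error is second order in the returns.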
2.2 The general case

A general portfolio with value V_n is often modeled using a d-dimensional random vector Z_n = (Z_{n,1}, . . . , Z_{n,d}) of risk-factors. The value of the portfolio is then expressed as

    V_n = f(t_n, Z_n)

for some known function f, where t_n is the actual calendar time. As in the example above it is often convenient to model the risk-factor changes X_{n+1} = Z_{n+1} − Z_n. Then the loss is given by

    L_{n+1} = −(V_{n+1} − V_n) = −( f(t_{n+1}, Z_n + X_{n+1}) − f(t_n, Z_n) ).

The loss may be viewed as the result of applying an operator l_{[n]}(·) to the risk-factor changes X_{n+1} so that

    L_{n+1} = l_{[n]}(X_{n+1}),

where

    l_{[n]}(x) = −( f(t_{n+1}, Z_n + x) − f(t_n, Z_n) ).
The operator l_{[n]}(·) is called the loss operator. If we want to linearize the relation between L_{n+1} and X_{n+1} then we have to differentiate f to get the linearized loss

    L^∆_{n+1} = −( f_t(t_n, Z_n)∆t + Σ_{i=1}^d f_{z_i}(t_n, Z_n) X_{n+1,i} ).

Here f_t(t, z) = ∂f(t, z)/∂t and f_{z_i}(t, z) = ∂f(t, z)/∂z_i. The corresponding operator given by

    l^∆_{[n]}(x) = −( f_t(t_n, Z_n)∆t + Σ_{i=1}^d f_{z_i}(t_n, Z_n) x_i )

is called the linearized loss operator.
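The loss operator and its linearization can be coded generically for any portfolio value function f(t, z). The sketch below (hypothetical helper names, and numerical partial derivatives in place of analytic ones) applies them to the stock-portfolio f of Example 2.1.

```python
import math

def make_loss_operator(f, t_n, t_np1, Z_n):
    """Return l_[n](x) = -(f(t_{n+1}, Z_n + x) - f(t_n, Z_n))."""
    def l(x):
        Z_shifted = [z + xi for z, xi in zip(Z_n, x)]
        return -(f(t_np1, Z_shifted) - f(t_n, Z_n))
    return l

def make_linearized_loss_operator(f, t_n, t_np1, Z_n, h=1e-6):
    """Return l^Delta_[n](x) = -(f_t * dt + sum_i f_{z_i} * x_i),
    with the partial derivatives approximated by central differences."""
    dt = t_np1 - t_n
    f_t = (f(t_n + h, Z_n) - f(t_n - h, Z_n)) / (2 * h)
    f_z = []
    for i in range(len(Z_n)):
        up = [z + (h if j == i else 0.0) for j, z in enumerate(Z_n)]
        dn = [z - (h if j == i else 0.0) for j, z in enumerate(Z_n)]
        f_z.append((f(t_n, up) - f(t_n, dn)) / (2 * h))
    def l_lin(x):
        return -(f_t * dt + sum(fz * xi for fz, xi in zip(f_z, x)))
    return l_lin

# Example 2.1: f(t, z) = sum_i alpha_i * exp(z_i), with z_i = ln S_{n,i}
alpha = [10.0, 5.0]
f = lambda t, z: sum(a * math.exp(zi) for a, zi in zip(alpha, z))
Z_n = [math.log(100.0), math.log(50.0)]
l = make_loss_operator(f, 0.0, 1.0 / 250, Z_n)
l_lin = make_linearized_loss_operator(f, 0.0, 1.0 / 250, Z_n)
print(l([0.01, -0.02]), l_lin([0.01, -0.02]))
```

Since this f does not depend on t, the f_t term vanishes and the linearized operator reduces to the expression derived in Example 2.1.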
Example 2.2 (Portfolio of stocks continued) In the previous example of a portfolio of stocks the risk-factors are the log-prices of the stocks, Z_{n,i} = ln S_{n,i}, and the risk-factor changes are the log returns X_{n+1,i} = ln S_{n+1,i} − ln S_{n,i}. The loss is L_{n+1} = l_{[n]}(X_{n+1}) and the linearized loss is L^∆_{n+1} = l^∆_{[n]}(X_{n+1}), where

    l_{[n]}(x) = −Σ_{i=1}^d α_i S_{n,i}(exp{x_i} − 1)   and   l^∆_{[n]}(x) = −Σ_{i=1}^d α_i S_{n,i} x_i

are the loss operator and linearized loss operator, respectively.
The following examples may convince you that there are many relevant examples from finance that fit into this general framework of risk-factors and loss operators.
Example 2.3 (A bond portfolio) A zero-coupon bond with maturity T is a contract which gives the holder of the contract $1 at time T. The price of the contract at time t < T is denoted B(t, T) and by definition B(T, T) = 1. To a zero-coupon bond we associate the continuously compounded yield

    y(t, T) = −(1/(T − t)) ln B(t, T),

i.e.

    B(t, T) = exp{−(T − t) y(t, T)}.

To understand the notion of the yield, consider a bank account where we get a constant interest rate r. The bank account evolves according to the differential equation

    dS_t/dt = r S_t,   S_0 = 1,

which has the solution S_t = exp{rt}. This means that in every infinitesimal time interval dt we get the interest rate r. Every dollar put into the bank account at time t is then worth exp{r(T − t)} dollars at time T. Hence, if we have exp{−r(T − t)} dollars on the account at time t then we have exactly $1 at time T. The yield y(t, T) can be identified with r and is interpreted as the constant interest rate contracted at t that we get over the period [t, T]. However, the yield may be different for different maturity times T. The function T → y(t, T) for fixed t is called the yield curve at t. Consider now a portfolio consisting of d different (default free) zero-coupon bonds with maturity times T_i and prices B(t, T_i). We suppose that we have α_i units of the bond with maturity T_i so that the value of the portfolio is

    V_n = Σ_{i=1}^d α_i B(t_n, T_i).

It is natural to use the yields Z_{n,i} = y(t_n, T_i) as risk-factors. Then

    V_n = Σ_{i=1}^d α_i exp{−(T_i − t_n) Z_{n,i}} = f(t_n, Z_n)

and the loss is given by (with X_{n+1,i} = Z_{n+1,i} − Z_{n,i} and ∆t = t_{n+1} − t_n)

    L_{n+1} = −Σ_{i=1}^d α_i ( exp{−(T_i − t_{n+1})(Z_{n,i} + X_{n+1,i})} − exp{−(T_i − t_n) Z_{n,i}} )
            = −Σ_{i=1}^d α_i B(t_n, T_i)( exp{Z_{n,i}∆t − (T_i − t_{n+1}) X_{n+1,i}} − 1 ).

The corresponding loss operator is then

    l_{[n]}(x) = −Σ_{i=1}^d α_i B(t_n, T_i)( exp{Z_{n,i}∆t − (T_i − t_{n+1}) x_i} − 1 )

and the linearized loss is given by

    L^∆_{n+1} = −Σ_{i=1}^d α_i B(t_n, T_i)( Z_{n,i}∆t − (T_i − t_{n+1}) X_{n+1,i} ).
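A numerical sketch of the bond-portfolio loss (with hypothetical holdings, maturities and yield changes) checks the loss-operator formula above against direct revaluation of the portfolio from the shifted yields.

```python
import math

def bond_loss_operator(alpha, Z_n, T, t_n, t_np1):
    """Loss operator for a zero-coupon bond portfolio:
    l_[n](x) = -sum_i alpha_i B(t_n,T_i) (exp{Z_{n,i} dt - (T_i - t_{n+1}) x_i} - 1)."""
    dt = t_np1 - t_n
    B = [math.exp(-(Ti - t_n) * z) for Ti, z in zip(T, Z_n)]
    def l(x):
        return -sum(a * b * (math.exp(z * dt - (Ti - t_np1) * xi) - 1.0)
                    for a, b, z, Ti, xi in zip(alpha, B, Z_n, T, x))
    return l

alpha = [100.0, 50.0]        # units of each bond (hypothetical)
T = [2.0, 5.0]               # maturities in years
Z_n = [0.03, 0.04]           # current yields y(t_n, T_i)
t_n, t_np1 = 0.0, 1.0 / 250  # one-day horizon
x = [0.001, -0.002]          # hypothetical yield changes X_{n+1,i}

l = bond_loss_operator(alpha, Z_n, T, t_n, t_np1)

# direct revaluation: V_n and V_{n+1} computed from the yields themselves
V_n = sum(a * math.exp(-(Ti - t_n) * z) for a, Ti, z in zip(alpha, T, Z_n))
V_np1 = sum(a * math.exp(-(Ti - t_np1) * (z + xi))
            for a, Ti, z, xi in zip(alpha, T, Z_n, x))
print(l(x), -(V_np1 - V_n))  # the two numbers agree
```

The agreement of the two numbers is just the algebraic identity used above: dividing the revalued bond price by B(t_n, T_i) leaves the exponent Z_{n,i}∆t − (T_i − t_{n+1})x_i.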
Example 2.4 (A European call option) In this example we will consider a portfolio consisting of one European call option on a non-dividend paying stock S with maturity date T and strike price K. A European call option is a contract which pays max(S_T − K, 0) to the holder of the contract at time T. The price at time t < T for the contract is evaluated using a function C depending on some parameters. In the Black-Scholes model C depends on four parameters, C = C(t, S, r, σ), where t is the current time, S is the stock price today at time t, r is the interest rate and σ is called the volatility, which is a measure of the standard deviation of the return of the stock. In this case we may put

    Z_n = (ln S_n, r_n, σ_n)^T,
    X_{n+1} = (ln S_{n+1} − ln S_n, r_{n+1} − r_n, σ_{n+1} − σ_n)^T.

The value of the portfolio is then V_n = C(t_n, S_n, r_n, σ_n) and the linearized loss is given by

    L^∆_{n+1} = −(C_t ∆t + C_S X_{n+1,1} + C_r X_{n+1,2} + C_σ X_{n+1,3}).

The partial derivatives are usually called the "Greeks". C_t is called theta; C_S is called delta; C_r is called rho; C_σ is called vega (although this is not a Greek letter).
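For concreteness, here is a sketch of the call price C(t, S, r, σ) together with finite-difference approximations of the Greeks named above. The implementation and its parameter values are illustrative, not taken from the notes; it assumes the standard Black-Scholes formula with time to maturity τ = T − t.

```python
import math

def norm_cdf(x):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, S, r, sigma, T=1.0, K=100.0):
    """Black-Scholes price of a European call with maturity T and strike K."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def greeks(t, S, r, sigma, h=1e-5):
    """Theta, delta, rho and vega by central finite differences."""
    theta = (bs_call(t + h, S, r, sigma) - bs_call(t - h, S, r, sigma)) / (2 * h)
    delta = (bs_call(t, S + h, r, sigma) - bs_call(t, S - h, r, sigma)) / (2 * h)
    rho   = (bs_call(t, S, r + h, sigma) - bs_call(t, S, r - h, sigma)) / (2 * h)
    vega  = (bs_call(t, S, r, sigma + h) - bs_call(t, S, r, sigma - h)) / (2 * h)
    return theta, delta, rho, vega

price = bs_call(0.0, 100.0, 0.02, 0.2)
theta, delta, rho, vega = greeks(0.0, 100.0, 0.02, 0.2)
print(price, delta, vega)
```

For an at-the-money call the delta is close to Φ(d1) ≈ 0.58 here, and theta is negative: the option loses time value as t approaches T.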
3 Risk measurement

What is the purpose of risk measurement?
• Determination of risk capital – determine the amount of capital a financial institution needs to cover unexpected losses.
• Management tool – risk measures are used by management to limit the amount of risk a unit within the firm may take.
Next we present some common measures of risk.
3.1 Elementary measures of risk

Notional amount

Risk of a portfolio is measured as the sum of notional values of individual securities, weighted by a factor for each asset class. The notional amount approach is used for instance in the standard approach of the Basel Committee, where risk weights 0%, 10%, 20%, 50% and 100% are used (see [2]). Then the regulatory capital should be such that

    regulatory capital / risk-weighted sum ≥ 8%.

Example 3.1 Suppose we have a portfolio with three claims, each of notional amount $1 million. The first claim is on an OECD central bank, the second on a multilateral development bank and the third on the private sector. According to the risk weights used by the Basel document the first claim is weighted by 0%, the second by 20% and the third by 100%. Thus the risk-weighted sum is

    0 · 10^6 + 0.20 · 10^6 + 1 · 10^6 = 1 200 000,

and the regulatory capital should be at least 8% of this amount, i.e. $96 000.

Advantage: easy to use.
Disadvantage: does not differentiate between long and short positions. There are no diversification effects; a portfolio with loans to m independent obligors is considered as risky as the same amount lent to a single obligor.
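The arithmetic of Example 3.1 can be sketched as:

```python
def regulatory_capital(notionals, risk_weights, ratio=0.08):
    """Minimum capital = ratio * sum of risk-weighted notional amounts."""
    weighted_sum = sum(n * w for n, w in zip(notionals, risk_weights))
    return weighted_sum, ratio * weighted_sum

# three claims of $1 million with Basel risk weights 0%, 20% and 100%
weighted_sum, capital = regulatory_capital([1e6, 1e6, 1e6], [0.0, 0.20, 1.0])
print(round(weighted_sum), round(capital))  # 1200000 96000
```

This reproduces the $1 200 000 risk-weighted sum and the $96 000 capital requirement of the example.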
Factor sensitivity measures

Factor sensitivity measures give the change in portfolio value for a predetermined change in one of the underlying risk-factors. If the value of the portfolio is given by

    V_n = f(t_n, Z_n)

then factor sensitivity measures are given by the partial derivatives

    f_{z_i}(t_n, Z_n) = ∂f/∂z_i (t_n, Z_n).

The "Greeks" of a derivative portfolio may be considered as factor sensitivity measures.
Advantage: factor sensitivity measures provide information about the robustness of the portfolio value with respect to certain events (risk-factor changes).
Disadvantage: it is difficult to aggregate the sensitivity with respect to changes in different risk-factors or aggregate across markets to get an understanding of the overall riskiness of a position.
Scenario based risk measures

In this approach we consider a number of possible scenarios, i.e. a number of possible risk-factor changes. A scenario may be for instance a 10% rise in a relevant exchange rate and a simultaneous 20% drop in a relevant stock index. The risk is then measured as the maximum loss over all possible (predetermined) scenarios. To assess the maximum loss, extreme scenarios may be down-weighted in a suitable way.
Formally, this approach may be formulated as follows. Fix a number N of possible risk-factor changes, X = {x_1, x_2, . . . , x_N}. Each scenario is given a weight w_i and we write w = (w_1, . . . , w_N). We consider a portfolio with loss operator l_{[n]}(·). The risk of the portfolio is then measured as

    ψ_{[X,w]} = max{w_1 l_{[n]}(x_1), . . . , w_N l_{[n]}(x_N)}.

These risk measures are frequently used in practice (example: Chicago Mercantile Exchange).
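As a sketch (with a hypothetical stock-portfolio loss operator and made-up scenarios and weights), the scenario-based measure is just a weighted maximum over the scenario losses:

```python
import math

def stock_loss_operator(alpha, S):
    """Loss operator from Example 2.2: l(x) = -sum_i alpha_i S_i (exp(x_i) - 1)."""
    def l(x):
        return -sum(a * s * (math.exp(xi) - 1.0) for a, s, xi in zip(alpha, S, x))
    return l

def scenario_risk(loss_operator, scenarios, weights):
    """psi_[X,w] = max{ w_1 l(x_1), ..., w_N l(x_N) }."""
    return max(w * loss_operator(x) for w, x in zip(weights, scenarios))

l = stock_loss_operator(alpha=[10.0, 5.0], S=[100.0, 50.0])
scenarios = [[-0.10, -0.20], [0.10, 0.20], [-0.30, -0.30]]  # risk-factor changes
weights = [1.0, 1.0, 0.35]  # the extreme scenario is down-weighted, as in SPAN
print(scenario_risk(l, scenarios, weights))
```

Here the extreme joint drop produces the largest raw loss, but after down-weighting by 0.35 the moderate drop scenario determines the risk figure.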
Example 3.2 (The SPAN rules) As an example of a scenario based risk measure we consider the SPAN rules used at the Chicago Mercantile Exchange [1]. We describe how the initial margin is calculated for a simple portfolio consisting of units of a futures contract and of several puts and calls with a common expiration date on this futures contract. The SPAN margin for such a portfolio is computed as follows: first, fourteen scenarios are considered. Each scenario is specified by an up or down move of volatility combined with no move, or an up move, or a down move of the futures price by 1/3, 2/3 or 3/3 of a specific "range". Next, two additional scenarios relate to "extreme" up or down moves of the futures price. The measure of risk is the maximum loss incurred, using the full loss of the first fourteen scenarios and only 35% of the loss for the last two "extreme" scenarios. A specified model, typically the Black model, is used to generate the corresponding prices for the options under each scenario.
The account of the investor holding a portfolio is required to have sufficient current net worth to support the maximum expected loss. If it does not, then extra cash is required as a margin call, an amount equal to the "measure of risk" involved.
Loss distribution approach

This is the approach a statistician would use to compute the risk of a portfolio. Here we try to model the loss L_{n+1} using a probability distribution F_L. The parameters of the loss distribution are estimated using historical data. One can either try to model the loss distribution directly, or model the risk-factors or risk-factor changes as a d-dimensional random vector or a multivariate time series. Risk measures are based on the distribution function F_L. The next section studies this approach to risk measurement.
3.2 Risk measures based on the loss distribution

Standard deviation

The standard deviation of the loss distribution as a measure of risk is frequently used, in particular in portfolio theory. There are however some disadvantages to using the standard deviation as a measure of risk. For instance, the standard deviation is only defined for distributions with E(L^2) < ∞, so it is undefined for random variables with very heavy tails. More importantly, profits and losses have equal impact on the standard deviation as a risk measure, and it does not discriminate between distributions with apparently different probabilities of potentially large losses. In fact, the standard deviation does not provide any information on how large potential losses may be.
Example 3.3 Consider for instance the two loss distributions L_1 ∼ N(0, 2) and L_2 ∼ t_4 (standard Student's t-distribution with 4 degrees of freedom). Both L_1 and L_2 have standard deviation equal to √2. The probability density is illustrated in Figure 1 for the two distributions. Clearly the probability of large losses is much higher for the t_4 than for the normal distribution.
Figure 1: Left/Middle: The density function for a N(0, 2) and a t_4 distribution. The t_4 is highly peaked around zero and has much heavier tails. Right: The "log-ratio" ln[P(L_2 > x)/P(L_1 > x)] is plotted.
Value-at-Risk

We now introduce the widely used risk measure known as Value-at-Risk.

Definition 3.1 Given a loss L and a confidence level α ∈ (0, 1), VaR_α(L) is given by the smallest number l such that the probability that the loss L exceeds l is no larger than 1 − α, i.e.

    VaR_α(L) = inf{l ∈ R : P(L > l) ≤ 1 − α}
             = inf{l ∈ R : 1 − F_L(l) ≤ 1 − α}
             = inf{l ∈ R : F_L(l) ≥ α}.
Figure 2: Illustration of VaR_0.95.
We might think of L as the (potential) loss resulting from holding a portfolio over some fixed time horizon. In market risk the time horizon is typically one day or ten days. In credit risk the portfolio may consist of loans and the time horizon is often one year. Common confidence levels are 95%, 99% and 99.9% depending on the application. The BIS (Bank for International Settlements) proposes, for market risk, to compute ten-day VaR with confidence level α = 99%. The time horizon of ten days reflects the fact that markets are not perfectly liquid.
Definition 3.2 Given a nondecreasing function F : R → R the generalized
inverse of F is given by

    F^←(y) = inf{x ∈ R : F(x) ≥ y}

with the convention inf ∅ = ∞.
If F is strictly increasing then F^← = F^{−1}, i.e. the usual inverse. Using the
generalized inverse we define the α-quantile of F by

    q_α(F) = F^←(α) = inf{x ∈ R : F(x) ≥ α},  α ∈ (0, 1).
We note also that VaR_α(F) = q_α(F), where F is the loss distribution. Notice
that for a > 0 and b ∈ R,

    VaR_α(aL + b) = inf{l ∈ R : P(aL + b ≤ l) ≥ α}
                  = inf{l ∈ R : P(L ≤ (l − b)/a) ≥ α}
                    {let l′ = (l − b)/a}
                  = inf{al′ + b ∈ R : P(L ≤ l′) ≥ α}
                  = a inf{l′ ∈ R : P(L ≤ l′) ≥ α} + b
                  = a VaR_α(L) + b.

Hence, the risk measured in VaR for a shares of a portfolio is a times the risk of
one share of this portfolio. Moreover, adding (b < 0) or withdrawing (b > 0) an
amount |b| of money from the portfolio changes this risk by the same amount.
Example 3.4 Suppose the loss distribution is normal, L ∼ N(µ, σ²). This
means that L =_d µ + σL′, where L′ ∼ N(0, 1). Since VaR_α(L′) = Φ^{−1}(α), where
Φ is the distribution function of a standard normal random variable, we may
compute the Value-at-Risk as VaR_α(L) = µ + σΦ^{−1}(α).
Example 3.5 Suppose that the distribution function F is given by

    F(x) = 0 for x < 0,  F(x) = 1/2 for x ∈ [0, 1),  F(x) = 1 for x ≥ 1.

Then F^←(u) = 0 on (0, 1/2] and F^←(u) = 1 on (1/2, 1). (F is the distribution
function of a random variable X with P(X = 0) = P(X = 1) = 1/2.)
Example 3.6 You hold a portfolio consisting of a long position of α = 5 shares
of stock A. The stock price today is S_0 = 100. The daily log returns

    X_1 = ln(S_1/S_0),  X_2 = ln(S_2/S_1),  ...

of stock A are assumed to be normally distributed with zero mean and standard
deviation σ = 0.1. Let L_1 be the portfolio loss from today until tomorrow. For
a standard normal random variable Z ∼ N(0, 1) we have F_Z^{−1}(0.99) ≈ 2.3.

(a) Compute VaR_0.99(L_1).
We have shown that L_1 = −αS_0(e^{X_1} − 1) = −500(e^{X_1} − 1). We have
VaR_u(L_1) = F_{L_1}^{−1}(u) and to compute F_{L_1}^{−1}(u) we use that
F_{L_1}(F_{L_1}^{−1}(u)) = u.

    F_{L_1}(l) = P(−500(e^{X_1} − 1) ≤ l)
               = P(e^{X_1} ≥ 1 − l/500)
               = P(X_1 ≥ ln(1 − l/500))
               = 1 − F_{X_1}(ln(1 − l/500)).
14
−0.4 −0.2 0.0 0.2 0.4
0
.
6
0
.
8
1
.
0
1
.
2
1
.
4
1
.
6
Figure 3: Plot of the function e
x
for x ∈ [−0.5, 0.5].
−3 −2 −1 0 1 2 3
0
5
1
0
1
5
2
0
Figure 4: Plot of the function e
x
for x ∈ [−3, 3].
15
Hence,

    1 − F_{X_1}(ln(1 − F_{L_1}^{−1}(u)/500)) = u
    ⇔ ln(1 − F_{L_1}^{−1}(u)/500) = F_{X_1}^{−1}(1 − u)
    ⇔ 1 − F_{L_1}^{−1}(u)/500 = e^{F_{X_1}^{−1}(1−u)}
    ⇔ F_{L_1}^{−1}(u) = 500(1 − e^{F_{X_1}^{−1}(1−u)}).

Since X_1 is symmetric about 0 we have F_{X_1}^{−1}(1 − u) = −F_{X_1}^{−1}(u). Hence,
F_{L_1}^{−1}(u) = 500(1 − e^{−F_{X_1}^{−1}(u)}). Using that F_{X_1}^{−1}(0.99) = 0.1·F_Z^{−1}(0.99) ≈ 0.23
and with the help of Figure 3,

    F_{L_1}^{−1}(0.99) = 500(1 − e^{−F_{X_1}^{−1}(0.99)})
                       ≈ 500(1 − e^{−0.23}) ≈ 500(1 − 0.8) = 100.

Hence, VaR_0.99(L_1) ≈ 100.
You decide to keep your portfolio for 100 (trading) days before deciding what
to do with the portfolio.

(b) Compute VaR_0.99(L_100) and VaR_0.99(L^Δ_100), where L_100 denotes the loss
from today until 100 days from today and L^Δ_100 denotes the corresponding
linearized 100-day loss.
We have

    L_100 = −αS_0(e^{X_100} − 1),

where X_100 is the 100-day log return. Notice that

    X_100 = ln(S_100/S_0) = ln S_100 − ln S_0 = ln(S_1/S_0) + ··· + ln(S_100/S_99),

i.e. X_100 is a sum of 100 independent normally distributed random variables
with mean zero and standard deviation 0.1. Hence, X_100 =_d Z, where Z is
normally distributed with zero mean and standard deviation one. We have

    VaR_0.99(L_100) = 500(1 − e^{−F_Z^{−1}(0.99)}) ≈ 500(1 − e^{−2.3}).

Using Figure 4 we find that e^{−2.3} = 1/e^{2.3} ≈ 0.1. Hence, VaR_0.99(L_100) ≈
500(1 − e^{−2.3}) ≈ 450.

We have L^Δ_100 = −500Z. Hence, VaR_0.99(L^Δ_100) = 500·F_Z^{−1}(0.99) ≈ 500·2.3 =
1150. One sees that using the linearized loss here gives a very bad risk estimate.
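Both answers are easy to check by simulation; the following is our own sketch
(not part of the notes), which uses the exact normal quantile 2.326 rather than
the rounded 2.3, so the one-day answer comes out slightly above 100:

```python
import math
import random

random.seed(1)
N = 100_000
S0, shares, sigma = 100.0, 5, 0.1

# One-day losses L_1 = -5*S0*(exp(X_1) - 1) with X_1 ~ N(0, 0.1^2),
# and 100-day losses with X_100 ~ N(0, 1) (sum of 100 daily returns).
one_day = sorted(-shares * S0 * (math.exp(random.gauss(0, sigma)) - 1)
                 for _ in range(N))
hundred_day = sorted(-shares * S0 * (math.exp(random.gauss(0, 1.0)) - 1)
                     for _ in range(N))

# Empirical VaR_0.99: the [N(1-alpha)]+1 largest loss.
k = int(N * 0.01)
var1 = one_day[-(k + 1)]
var100 = hundred_day[-(k + 1)]
print(round(var1), round(var100))  # close to 100 and 450, as in the example
```

Note that the simulated losses are bounded by the portfolio value 500, which is
why the linearized answer 1150 in (b) is so far off.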
Expected shortfall

Although Value-at-Risk has become a very popular risk measure among
practitioners it has several limitations. For instance, it does not give any
information about how bad losses may be when things go wrong. In other words,
what is the size of an "average loss" given that the loss exceeds the
99%-Value-at-Risk?

Definition 3.3 For a loss L with continuous loss distribution function F_L the
expected shortfall at confidence level α ∈ (0, 1) is given by

    ES_α(L) = E(L | L ≥ VaR_α(L)).
We can rewrite this as follows:

    ES_α(L) = E(L | L ≥ VaR_α(L))
            = E(L·I_{[q_α(L),∞)}(L)) / P(L ≥ q_α(L))
            = (1/(1 − α)) E(L·I_{[q_α(L),∞)}(L))
            = (1/(1 − α)) ∫_{q_α(L)}^∞ l dF_L(l),

where I_A is the indicator function: I_A(x) = 1 if x ∈ A and 0 otherwise.
Lemma 3.1 For a loss L with continuous distribution function F_L the expected
shortfall is given by

    ES_α(L) = (1/(1 − α)) ∫_α^1 VaR_p(L) dp.
For a discrete distribution there are different possibilities to define expected
shortfall. A useful definition called generalized expected shortfall, which is a
so-called coherent risk measure, is given by

    GES_α(L) = (1/(1 − α)) [ E(L·I_{[q_α(L),∞)}(L)) + q_α(L)(1 − α − P(L ≥ q_α(L))) ].

If the distribution of L is continuous, then the second term vanishes and
GES_α = ES_α.
Exercise 3.1 (a) Let L ∼ Exp(λ) and calculate ES_α(L).
(b) Let L have distribution function F(x) = 1 − (1 + γx)^{−1/γ}, x ≥ 0, γ ∈ (0, 1),
and calculate ES_α(L).
Answer: (a) λ^{−1}(1 − ln(1 − α)). (b) γ^{−1}[(1 − α)^{−γ}(1 − γ)^{−1} − 1].
Example 3.7 Suppose that L ∼ N(0, 1). Let φ and Φ be the density and
distribution function of L. Then

    ES_α(L) = (1/(1 − α)) ∫_{Φ^{−1}(α)}^∞ l dΦ(l)
            = (1/(1 − α)) ∫_{Φ^{−1}(α)}^∞ l φ(l) dl
            = (1/(1 − α)) ∫_{Φ^{−1}(α)}^∞ l (1/√(2π)) e^{−l²/2} dl
            = (1/(1 − α)) [ −(1/√(2π)) e^{−l²/2} ]_{Φ^{−1}(α)}^∞
            = (1/(1 − α)) [ −φ(l) ]_{Φ^{−1}(α)}^∞
            = φ(Φ^{−1}(α)) / (1 − α).
Suppose that L′ ∼ N(µ, σ²). Then

    ES_α(L′) = E(L′ | L′ ≥ VaR_α(L′))
             = E(µ + σL | µ + σL ≥ VaR_α(µ + σL))
             = E(µ + σL | L ≥ VaR_α(L))
             = µ + σ ES_α(L)
             = µ + σ φ(Φ^{−1}(α)) / (1 − α).
Exercise 3.2 Let L have a standard Student's t-distribution with ν > 1 degrees
of freedom. Then L has density function

    g_ν(x) = (Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}.

Show that

    ES_α(L) = (g_ν(t_ν^{−1}(α)) / (1 − α)) · ((ν + (t_ν^{−1}(α))²) / (ν − 1)),

where t_ν denotes the distribution function of L.
4 Methods for computing VaR and ES

We will now introduce some standard methods for computing Value-at-Risk and
expected shortfall for a portfolio of risky assets. The setup is as follows. We
consider a portfolio with value

    V_m = f(t_m, Z_m),

where f is a known function and Z_m is a vector of risk-factors. The loss L_{m+1}
is then given by

    L_{m+1} = l_{[m]}(X_{m+1}).

Suppose we have observed the risk factors Z_{m−n+1}, ..., Z_m. These observations
will be called historical data. How can we use these observations to compute
Value-at-Risk or expected shortfall for the loss L_{m+1}?
4.1 Empirical VaR and ES

Suppose we have observations x_1, ..., x_n of iid random variables X_1, ..., X_n
with distribution F. The empirical distribution function is then given by

    F_n(x) = (1/n) Σ_{k=1}^n I_{[X_k,∞)}(x).

The empirical quantile is then given by

    q_α(F_n) = inf{x ∈ R : F_n(x) ≥ α} = F_n^←(α).

If we order the sample X_1, ..., X_n such that X_{1,n} ≥ ··· ≥ X_{n,n} (if F is
continuous, then X_j ≠ X_k a.s. for j ≠ k), then the empirical quantile is given
by

    q_α(F_n) = X_{[n(1−α)]+1,n},  α ∈ (0, 1),

where [y] is the integer part of y, [y] = sup{n ∈ N : n ≤ y} (the largest integer
less than or equal to y). If F is strictly increasing, then q_α(F_n) → q_α(F) a.s. as
n → ∞ for every α ∈ (0, 1). Thus, based on the observations x_1, ..., x_n we may
estimate the quantile q_α(F) by the empirical estimate q̂_α(F) = x_{[n(1−α)]+1,n}.

The empirical estimator for expected shortfall is given by

    ÊS_α(F) = (1/([n(1 − α)] + 1)) Σ_{k=1}^{[n(1−α)]+1} x_{k,n},

which is the average of the [n(1 − α)] + 1 largest observations.
The reliability of these estimates is of course related to α and to the number
of observations. As the true distribution is unknown explicit conﬁdence bounds
can in general not be obtained. However, approximate conﬁdence bounds can be
obtained using nonparametric techniques. This is described in the next section.
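The empirical estimators above translate directly into code; a minimal sketch
(the function names are ours, not from the notes):

```python
def empirical_var(losses, alpha):
    """Empirical VaR_alpha: the [n(1-alpha)]+1 largest observation."""
    ordered = sorted(losses, reverse=True)  # l_{1,n} >= ... >= l_{n,n}
    k = int(len(losses) * (1 - alpha))      # [n(1-alpha)]
    return ordered[k]                       # index k holds the (k+1)-th largest

def empirical_es(losses, alpha):
    """Empirical ES_alpha: average of the [n(1-alpha)]+1 largest observations."""
    ordered = sorted(losses, reverse=True)
    k = int(len(losses) * (1 - alpha)) + 1
    return sum(ordered[:k]) / k

losses = list(range(1, 1001))  # toy "losses" 1, 2, ..., 1000
print(empirical_var(losses, 0.99))  # 990, the 11th largest
print(empirical_es(losses, 0.99))   # 995.0, average of the 11 largest
```

On the toy sample the quantile index is [1000·0.01] + 1 = 11, so VaR is the
11th largest value and ES the mean of the top 11.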
4.2 Confidence intervals

Suppose we have observations x_1, ..., x_n of iid random variables X_1, ..., X_n
from an unknown distribution F and that we want to construct a confidence
interval for the risk measure ρ(F). That is, given p ∈ (0, 1) we want to find a
stochastic interval (A, B), where A = f_A(X_1, ..., X_n) and B = f_B(X_1, ..., X_n)
for some functions f_A, f_B, such that

    P(A < ρ(F) < B) = p.

The interval (a, b), where a = f_A(x_1, ..., x_n) and b = f_B(x_1, ..., x_n), is a
confidence interval for ρ(F) with confidence level p. Typically we want a
double-sided and centered interval so that

    P(A < ρ(F) < B) = p,  P(A ≥ ρ(F)) = P(B ≤ ρ(F)) = (1 − p)/2.

Unfortunately, F is unknown so we cannot find suitable functions f_A, f_B.
However, we can construct approximate confidence intervals. Moreover, if ρ(F)
is a quantile of F (Value-at-Risk), then we can actually find exact confidence
intervals for ρ(F), but not for arbitrary choices of confidence levels p.
4.2.1 Exact confidence intervals for Value-at-Risk

Suppose we have observations x_1, ..., x_n from iid random variables X_1, ..., X_n
with common unknown continuous distribution function F. Suppose further
that we want to construct a confidence interval (a, b) for the quantile q_α(F),
where a = f_A(x_1, ..., x_n) and b = f_B(x_1, ..., x_n) such that

    P(A < q_α(F) < B) = p,  P(A ≥ q_α(F)) = P(B ≤ q_α(F)) = (1 − p)/2,

where p is a confidence level and A = f_A(X_1, ..., X_n) and B = f_B(X_1, ..., X_n).
Since F is unknown we cannot find a and b. However, we can look for i > j and
the smallest p′ ≥ p such that

    P(X_{i,n} < q_α(F) < X_{j,n}) = p′,
    P(X_{i,n} ≥ q_α(F)) ≤ (1 − p)/2,  P(X_{j,n} ≤ q_α(F)) ≤ (1 − p)/2.  (4.1)

Let Y_α = #{X_k > q_α(F)}, i.e. the number of sample points exceeding q_α(F).
It is easily seen that Y_α is Binomial(n, 1 − α)-distributed. Notice that

    P(X_{1,n} ≤ q_α(F)) = P(Y_α = 0),
    P(X_{2,n} ≤ q_α(F)) = P(Y_α ≤ 1),
    ...
    P(X_{j,n} ≤ q_α(F)) = P(Y_α ≤ j − 1).

Similarly, P(X_{i,n} ≥ q_α(F)) = 1 − P(Y_α ≤ i − 1). Hence, we can compute
P(X_{j,n} ≤ q_α(F)) and P(X_{i,n} ≥ q_α(F)) for different i and j until we find
indices that satisfy (4.1).
Figure 5: Upper: Empirical estimates of VaR_0.99 for samples of different sizes
and from different distributions with VaR_0.99 = 10 (panels: Pareto 250,
Pareto 1000, Exponential 250, Exponential 1000). Lower: Simulated 97.6%
confidence intervals (x_{18,1000}, x_{4,1000}) for VaR_0.99(X) = 10 based on
samples of size 1000 from a Pareto distribution.
Example 4.1 Suppose we have an iid sample X_1, ..., X_10 with common unknown
continuous distribution function F and that we want a confidence interval
for q_0.8(F) with confidence level p′ ≥ p = 0.75. Since Y_0.8 is Binomial(10, 0.2)-
distributed and P(X_{j,10} ≤ q_0.8(F)) = P(Y_0.8 ≤ j − 1) we find that

    P(X_{1,10} ≤ q_0.8(F)) ≈ 0.11 and P(X_{4,10} ≥ q_0.8(F)) ≈ 0.12.

Notice that max{0.11, 0.12} ≤ (1 − p)/2 = 0.125 and P(X_{4,10} < q_0.8(F) <
X_{1,10}) ≈ 0.77, so (x_{4,10}, x_{1,10}) is a confidence interval for q_0.8(F) with
confidence level 77%.
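The search for order-statistic indices satisfying (4.1) can be automated with
binomial probabilities; a sketch (our own function names) that reproduces the
indices of Example 4.1:

```python
from math import comb

def binom_cdf(m, n, p):
    """P(Y <= m) for Y ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

def exact_var_ci_indices(n, alpha, p):
    """Order-statistic indices (i, j), i > j, for an exact VaR interval:
    P(X_{j,n} <= q_alpha) <= (1-p)/2 and P(X_{i,n} >= q_alpha) <= (1-p)/2,
    where X_{1,n} >= ... >= X_{n,n} and Y_alpha ~ Binomial(n, 1 - alpha)."""
    tail = (1 - p) / 2
    # P(X_{j,n} <= q) = P(Y <= j-1): largest j keeping this below the tail mass
    j = max(jj for jj in range(1, n + 1)
            if binom_cdf(jj - 1, n, 1 - alpha) <= tail)
    # P(X_{i,n} >= q) = 1 - P(Y <= i-1): smallest such i
    i = min(ii for ii in range(1, n + 1)
            if 1 - binom_cdf(ii - 1, n, 1 - alpha) <= tail)
    return i, j

print(exact_var_ci_indices(10, 0.8, 0.75))  # (4, 1), as in Example 4.1
```

The two probabilities 0.11 and 0.12 in the example are binom_cdf(0, 10, 0.2)
and 1 − binom_cdf(3, 10, 0.2), respectively.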
4.2.2 Using the bootstrap to obtain confidence intervals

Using the so-called nonparametric bootstrap method we can obtain confidence
intervals for e.g. risk measures such as Value-at-Risk and expected shortfall.
The nonparametric bootstrap works as follows.

Suppose we have observations x_1, ..., x_n of iid random variables X_1, ..., X_n
and we want to estimate some parameter θ which depends on the unknown
distribution F of the X's. For instance θ could be the α-quantile θ = q_α(F). First
an estimator θ̂(x_1, ..., x_n) of θ is constructed, e.g. θ̂(x_1, ..., x_n) = x_{[n(1−α)]+1,n}.

Now we want to construct a confidence interval for θ with confidence level p
(for instance p = 0.95). To construct a confidence interval we need to know
the distribution of θ̂(X_1, ..., X_n). If F was known this distribution could be
approximated arbitrarily well by simulating from F many (N large) times to
construct new samples X̃_1^{(i)}, ..., X̃_n^{(i)}, i = 1, ..., N, and compute the estimator
for each of these samples to get θ̃_i = θ̂(X̃_1^{(i)}, ..., X̃_n^{(i)}), i = 1, ..., N. As N → ∞
the empirical distribution

    (1/N) Σ_{i=1}^N I_{[θ̃_i,∞)}(x)

of θ̂(X_1, ..., X_n) will converge to the true distribution of θ̂(X_1, ..., X_n). The
problem is that F is not known.
What is known is the empirical distribution F_n which puts point masses 1/n
at the points X_1, ..., X_n. If n is relatively large we expect that F_n is a good
approximation of F. Moreover, we can resample from F_n simply by drawing
with replacement among X_1, ..., X_n. We denote such a resample by X*_1, ..., X*_n.
Then we may compute θ* = θ̂(X*_1, ..., X*_n). Since F_n approximates the true
distribution F we expect the distribution of θ̂(X*_1, ..., X*_n) to approximate the
true distribution of θ̂(X_1, ..., X_n). To obtain the distribution of θ̂(X*_1, ..., X*_n)
we resample many (N large) times to create new samples X*_1^{(i)}, ..., X*_n^{(i)}, i =
1, ..., N. For each of these samples we compute the corresponding estimate of
θ, i.e. θ*_i = θ̂(X*_1^{(i)}, ..., X*_n^{(i)}). The empirical distribution F^θ_N, given by

    F^θ_N(x) = (1/N) Σ_{i=1}^N I_{[θ*_i,∞)}(x),

is then an approximation of the true distribution of θ̂(X_1, ..., X_n), denoted F^θ.
A confidence interval is then constructed using A = q_{(1−p)/2}(F^θ_N) and B =
q_{(1+p)/2}(F^θ_N). This means that A = θ*_{[N(1+p)/2]+1,N} and B = θ*_{[N(1−p)/2]+1,N},
where θ*_{1,N} ≥ ··· ≥ θ*_{N,N} is the ordered sample of θ*_1, ..., θ*_N.
The nonparametric bootstrap to obtain confidence intervals can be
summarized in the following steps: Suppose we have observations x_1, ..., x_n of
the iid random variables X_1, ..., X_n with distribution F and we have an
estimator θ̂(x_1, ..., x_n) of an unknown parameter θ.

• Resample N times among x_1, ..., x_n to obtain new samples x*_1^{(i)}, ..., x*_n^{(i)},
  i = 1, ..., N.

• Compute the estimator for each of the new samples to get

      θ*_i = θ̂(x*_1^{(i)}, ..., x*_n^{(i)}),  i = 1, ..., N.

• Construct a confidence interval I_p with confidence level p as

      I_p = (θ*_{[N(1+p)/2]+1,N}, θ*_{[N(1−p)/2]+1,N}),

  where θ*_{1,N} ≥ ··· ≥ θ*_{N,N} is the ordered sample of θ*_1, ..., θ*_N.
4.3 Historical simulation

In the historical simulation approach we suppose we have observations of risk-
factors Z_{m−n}, ..., Z_m and hence also the risk-factor changes X_{m−n+1}, ..., X_m.
We denote these observations by x_{m−n+1}, ..., x_m. Using the loss operator we
can compute the corresponding observations of losses l_k = l_{[m]}(x_{m−k+1}), k =
1, ..., n. Note that l_k is the loss that we will experience if we have the risk-
factor change x_{m−k+1} over the next period. This gives us a sample from the
loss distribution. It is assumed that the losses during the different time intervals
are iid. Then the empirical VaR and ES can be estimated using

    VâR_α(L) = q̂_α(F^L_n) = l_{[n(1−α)]+1,n},
    ÊS_α(L) = (1/([n(1 − α)] + 1)) Σ_{i=1}^{[n(1−α)]+1} l_{i,n},

where l_{1,n} ≥ ··· ≥ l_{n,n} is the ordered sample.

Similarly we can also aggregate over several days. Say, for instance, that we
are interested in the Value-at-Risk for the aggregate loss over ten days. Then
we simply use the historical observations given by

    l^{(10)}_k = l_{[m]}( Σ_{j=1}^{10} x_{m−n+10(k−1)+j} ),  k = 1, ..., [n/10],

to compute the empirical VaR and ES.

Advantage: This approach is easy to implement and keeps the dependence
structure between the components of the vectors of risk-factor changes X_{m−k}.
Disadvantage: The worst case is never worse than what has happened in history.
We need a very large sample of relevant historical data to get reliable estimates
of Value-at-Risk and expected shortfall.
4.4 Variance–Covariance method

The basic idea of the variance–covariance method is to study the linearized loss

    L^Δ_{m+1} = l^Δ_{[m]}(X_{m+1}) = −(c + Σ_{i=1}^d w_i X_{m+1,i}) = −(c + w^T X_{m+1}),

where c = c_m, w_i = w_{m,i} (i.e. known at time m), w = (w_1, ..., w_d)^T are weights
and X_{m+1} = (X_{m+1,1}, ..., X_{m+1,d})^T the risk-factor changes.

It is then assumed that X_{m+1} ∼ N_d(µ, Σ), i.e. that the risk-factor changes
follow a multivariate (d-dimensional) normal distribution. Using the properties
of the multivariate normal distribution we have

    w^T X_{m+1} ∼ N(w^T µ, w^T Σ w).

Hence, the loss distribution is normal with mean −c − w^T µ and variance w^T Σ w.
Suppose that we have n + 1 historical observations of the risk-factors and the
risk-factor changes X_{m−n+1}, ..., X_m. Then (assuming that the risk-factor
changes are iid or at least weakly dependent) the mean vector µ and the
covariance matrix Σ can be estimated as usual by

    µ̂_i = (1/n) Σ_{k=1}^n X_{m−k+1,i},  i = 1, ..., d,
    Σ̂_{ij} = (1/(n − 1)) Σ_{k=1}^n (X_{m−k+1,i} − µ̂_i)(X_{m−k+1,j} − µ̂_j),  i, j = 1, ..., d.

The estimated VaR is then given analytically by

    VâR_α(L) = −c − w^T µ̂ + √(w^T Σ̂ w) Φ^{−1}(α).

Advantage: Analytic solutions can be obtained: no simulations required. Easy
to implement.
Disadvantage: Linearization is not always appropriate. We need a short time
horizon to justify linearization (see e.g. Example 3.6). The normal distribution
may considerably underestimate the risk. We need proper justification that the
normal distribution is appropriate before using this approach. In later chapters
we will introduce elliptical distributions, distributions that share many of the
nice properties of the multivariate normal distribution. The Variance–Covariance
method works well if we replace the assumption of multivariate normality with
the weaker assumption of ellipticality. This may provide a model that fits data
better.
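A sketch of the analytic VaR formula for a small number of risk factors, using
only the standard library (the function name and the toy data below are ours):

```python
import math
import random
from statistics import NormalDist

def varcov_var(X, w, c, alpha):
    """Variance-covariance VaR: -c - w'mu_hat + sqrt(w'Sigma_hat w)*Phi^{-1}(alpha).
    X is a list of n observed risk-factor-change vectors, each of length d."""
    n, d = len(X), len(w)
    mu = [sum(x[i] for x in X) / n for i in range(d)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / (n - 1)
            for j in range(d)] for i in range(d)]
    wmu = sum(w[i] * mu[i] for i in range(d))
    wSw = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
    return -c - wmu + math.sqrt(wSw) * NormalDist().inv_cdf(alpha)

# Toy check: one risk factor with daily changes ~ N(0, 0.01^2), weight 1, c = 0;
# the 99% VaR should be close to 0.01 * 2.326.
rng = random.Random(0)
X = [[rng.gauss(0, 0.01)] for _ in range(10_000)]
print(varcov_var(X, [1.0], 0.0, 0.99))
```

For d larger than a handful of factors one would of course use a linear algebra
library instead of the explicit double sum.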
4.5 Monte-Carlo methods

Suppose we have observed the risk-factors Z_{m−n}, ..., Z_m and risk-factor changes
X_{m−n+1}, ..., X_m. We suggest a parametric model for X_{m+1}. For instance, that
X_{m+1} has distribution function F and is independent of X_{m−n+1}, ..., X_m.

When an appropriate model for X_{m+1} is chosen and the parameters
estimated we simulate a large number N of outcomes from this distribution to
get x̃_1, ..., x̃_N. For each outcome we can compute the corresponding losses
l_1, ..., l_N, where l_k = l_{[m]}(x̃_k). The true loss distribution of L_{m+1} is then
approximated by the empirical distribution F^L_N given by

    F^L_N(x) = (1/N) Σ_{k=1}^N I_{[l_k,∞)}(x).

Value-at-Risk and expected shortfall can then be estimated by

    VâR_α(L) = q_α(F^L_N) = l_{[N(1−α)]+1,N},
    ÊS_α(L) = (1/([N(1 − α)] + 1)) Σ_{k=1}^{[N(1−α)]+1} l_{k,N}.

Advantage: Very flexible. Practically any model that you can simulate from is
possible to use. You can also use time series in the Monte-Carlo method which
enables you to model time dependence between risk-factor changes.
Disadvantage: Computationally intensive. You need to run a large number of
simulations in order to get good estimates. This may take a long time (hours,
days) depending on the complexity of the model.
Example 4.2 Consider a portfolio consisting of one share of a stock with stock
price S_k and assume that the log returns X_{k+1} = ln S_{k+1} − ln S_k are iid with
distribution function F_θ, where θ is an unknown parameter. The parameter θ
can be estimated from historical data using for instance maximum likelihood,
and given the information about the stock price S_0 today, the Value-at-Risk for
our portfolio loss over the time period today-until-tomorrow is

    VaR_α(L_1) = S_0(1 − exp{F^←_θ(1 − α)}),

i.e. the α-quantile of the distribution of the loss L_1 = −S_0(exp{X_1} − 1).
However, the expected shortfall may be difficult to compute explicitly if F^←_θ
has a complicated expression. Instead of performing numerical integration we may
use the Monte-Carlo approach to compute expected shortfall.
Example 4.3 Consider the situation in Example 4.2 with the exception that
the log return distribution is given by a GARCH(1, 1) model:

    X_{k+1} = σ_{k+1} Z_{k+1},  σ²_{k+1} = a_0 + a_1 X²_k + b_1 σ²_k,

where the Z_k's are independent and standard normally distributed and a_0, a_1
and b_1 are parameters to be estimated. Because of the recursive structure it is
very easy to simulate from this model. However, analytic computation of the
ten-day Value-at-Risk or expected shortfall is very difficult.
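Simulating from the recursion and reading off the empirical ten-day VaR and ES
is indeed straightforward; a sketch with illustrative (not estimated) parameter
values a_0 = 10^{-5}, a_1 = 0.08, b_1 = 0.9:

```python
import math
import random

def simulate_10day_losses(S0, a0, a1, b1, N, rng):
    """Simulate N ten-day losses L = -S0*(exp(X_1+...+X_10) - 1), where the
    daily log returns follow sigma^2_{k+1} = a0 + a1*X_k^2 + b1*sigma_k^2."""
    losses = []
    for _ in range(N):
        x, sigma2 = 0.0, a0 / (1 - a1 - b1)  # start at the stationary variance
        ten_day = 0.0
        for _ in range(10):
            sigma2 = a0 + a1 * x * x + b1 * sigma2
            x = math.sqrt(sigma2) * rng.gauss(0, 1)
            ten_day += x
        losses.append(-S0 * (math.exp(ten_day) - 1))
    return losses

rng = random.Random(0)
losses = sorted(simulate_10day_losses(100.0, 1e-5, 0.08, 0.9, 20_000, rng),
                reverse=True)
k = int(len(losses) * 0.01)
var99 = losses[k]                      # empirical VaR_0.99
es99 = sum(losses[:k + 1]) / (k + 1)   # empirical ES_0.99
print(var99, es99)
```

The empirical quantile and tail average here are exactly the Monte-Carlo
estimators of Section 4.5 applied to the simulated losses.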
5 Extreme value theory for random variables with heavy tails

Given historical loss data a risk manager typically wants to estimate the
probability of future large losses to assess the risk of holding a certain
portfolio. Extreme Value Theory (EVT) provides the tools for an optimal use of
the loss data to obtain accurate estimates of probabilities of large losses or,
more generally, of extreme events. Extreme events are particularly frightening
because although they are by definition rare, they may cause severe losses to a
financial institution or insurance company. Empirical estimates of probabilities
of losses larger than what has been observed so far are useless: such an event
will be assigned zero probability. Even if the loss lies just within the range of
the loss data set, empirical estimates have poor accuracy. However, under
certain conditions, EVT methods can extrapolate information from the loss data
to obtain meaningful estimates of the probability of rare events such as large
losses. This also means that accurate estimates of Value-at-Risk (VaR) and
Expected Shortfall (ES) can be obtained.

In this and the following two chapters we will present aspects of and
estimators provided by EVT. In order to present the material and derive the
expressions of the estimators without a lot of technical details we focus on EVT
for distributions with "heavy tails" (see below). Moreover, empirical
investigations often support the use of heavy-tailed distributions.
Empirical investigations have shown that daily and higher-frequency returns
from financial assets typically have distributions with heavy tails. Although
there is no definition of the meaning of "heavy tails" it is common to consider
the right tail F̄(x) = 1 − F(x), x large, of the distribution function F heavy if

    lim_{x→∞} F̄(x)/e^{−λx} = ∞ for every λ > 0,

i.e. if it is heavier than the right tail of every exponential distribution. It is
also not unusual to consider a random variable heavy-tailed if not all moments
are finite. We will now study the useful class of heavy-tailed distributions with
regularly varying tails.
5.1 Quantile-quantile plots

In this section we will consider some useful practical methods to study the
extremal properties of a data set. To illustrate the methods we will consider
a dataset consisting of claims in million Danish Kroner from fire insurance in
Denmark. We may observe that there are a few claims much larger than the
'everyday' claim. This suggests that the claims have a heavy-tailed distribution.
To get an indication of the heaviness of the tails it is useful to use so-called
quantile-quantile plots (qq-plots).

Suppose we have a sample X_1, ..., X_n of iid random variables but we don't
know the distribution of X_1. One would typically suggest a reference
distribution F and want to test whether it is reasonable to assume that the data is
Figure 6: Claims from fire insurance in Denmark in million Danish Kroner.
distributed according to this distribution. If we define the ordered sample as
X_{n,n} ≤ X_{n−1,n} ≤ ··· ≤ X_{1,n}, then the qq-plot consists of the points

    {(X_{k,n}, F^←((n − k + 1)/(n + 1))) : k = 1, ..., n}.

If the data has a similar distribution as the reference distribution then the qq-
plot is approximately linear. An important property is that the plot remains
approximately linear if the data has a distribution which is a linear
transformation of the reference distribution, i.e. from the associated
location-scale family F_{µ,σ}(x) = F((x − µ)/σ). Thus, the qq-plot enables us to
come up with a suitable class of distributions that fits the data and then we may
estimate the location and scale parameters.
The qqplot is particularly useful for studying the tails of the distribution.
Given a reference distribution F, if F has heavier tails than the data then the
plot will curve down at the left and/or up at the right and the opposite if the
reference distribution has too light tails.
Exercise 5.1 Consider the distribution functions F and G given by F(x) =
1 − e^{−x} (x > 0) and G(x) = 1 − x^{−2} (x > 1). Plot

    {(F^←((n − k + 1)/(n + 1)), G^←((n − k + 1)/(n + 1))) : k = 1, ..., n}

and interpret the result.
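The points in Exercise 5.1 can be computed explicitly, since both quantile
functions have closed forms; a sketch (our own function name):

```python
import math

def qq_points(n):
    """Points (F_inv(q), G_inv(q)) for q = (n-k+1)/(n+1), k = 1..n,
    with F(x) = 1 - exp(-x) and G(x) = 1 - x^{-2} (x > 1)."""
    pts = []
    for k in range(1, n + 1):
        q = (n - k + 1) / (n + 1)
        f_inv = -math.log(1 - q)       # exponential quantile
        g_inv = (1 - q) ** (-1 / 2)    # Pareto(2) quantile
        pts.append((f_inv, g_inv))
    return pts

pts = qq_points(100)
print(pts[0])  # the pair at the highest quantile level q = 100/101
```

Since G has the heavier (regularly varying) tail, its quantiles grow much
faster than the exponential ones at high levels, so the plot curves up at the
right, consistent with the discussion above.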
Figure 7: Quantile-quantile plots for the Danish fire insurance claims. Upper
left: qq-plot with standard exponential reference distribution. The plot curves
down which indicates that data has tails heavier than exponential. Upper right,
lower left, lower right: qq-plot against a Pareto(α)-distribution for α = 1, 1.5, 2.
The plots are approximately linear which indicates that the data may have a
Pareto(α)-distribution with α ∈ (1, 2).
5.2 Regular variation

We start by introducing regularly varying functions.

Definition 5.1 A function h : (0, ∞) → (0, ∞) is regularly varying at ∞ with
index ρ ∈ R (written h ∈ RV_ρ) if

    lim_{t→∞} h(tx)/h(t) = x^ρ for every x > 0. (5.1)

If ρ = 0, then we call h slowly varying (at ∞). Slowly varying functions are
generically denoted by L. If h ∈ RV_ρ, then h(x)/x^ρ ∈ RV_0. Hence, setting
L(x) = h(x)/x^ρ we see that a function h ∈ RV_ρ can always be represented as
h(x) = L(x)x^ρ. If ρ < 0, then the convergence in (5.1) is uniform on intervals
Figure 8: Quantile-quantile plots for 1949 simulated data points from a
distribution F with F as reference distribution. The upper four plots:
F = Pareto(2). The lower four plots: F = Exp(1).
[b, ∞) for b > 0, i.e.

    lim_{t→∞} sup_{x∈[b,∞)} |h(tx)/h(t) − x^ρ| = 0.
The most natural examples of slowly varying functions are positive constants
and functions converging to positive constants. Other examples are logarithms
and iterated logarithms. The following functions L are slowly varying.

(a) lim_{x→∞} L(x) = c ∈ (0, ∞).
(b) L(x) = ln(1 + x).
(c) L(x) = ln(1 + ln(1 + x)).
(d) L(x) = ln(e + x) + sin x.

Note however that a slowly varying function can have infinite oscillation in the
sense that liminf_{x→∞} L(x) = 0 and limsup_{x→∞} L(x) = ∞.

Example 5.1 (1) Let F(x) = 1 − x^{−α}, for x ≥ 1 and α > 0. Then F̄(tx)/F̄(t) =
x^{−α} for t > 0. Hence F̄ ∈ RV_{−α}.
(2) Let
Definition 5.2 A nonnegative random variable is said to be regularly varying
if its distribution function F satisfies F̄ ∈ RV_{−α} for some α ≥ 0.

Remark 5.1 If X is a nonnegative random variable with distribution function
F satisfying F̄ ∈ RV_{−α} for some α > 0, then

    E(X^β) < ∞ if β < α,
    E(X^β) = ∞ if β > α.

Although the converse does not hold in general, it is useful to think of regularly
varying random variables as those random variables for which the β-moment
does not exist for β larger than some α > 0.
Example 5.2 Consider two risks X_1 and X_2 which are assumed to be
nonnegative and iid with common distribution function F. Assume further that F
has a regularly varying right tail, i.e. F̄ ∈ RV_{−α}. An investor has bought two
shares of the first risky asset and the probability of a portfolio loss greater
than l is thus given by P(2X_1 > l). Can the loss probability be made smaller by
changing to the well diversified portfolio with one share of each risky asset?
To answer this question we study the following ratio of loss probabilities:

    P(X_1 + X_2 > l) / P(2X_1 > l)

for large l. We have, for ε ∈ (0, 1/2),

    P(X_1 + X_2 > l) ≤ 2 P(X_1 + X_2 > l, X_1 ≤ εl) + P(X_1 + X_2 > l, X_1 > εl, X_2 > εl)
                     ≤ 2 P(X_2 > (1 − ε)l) + P(X_1 > εl)²
and

    P(X_1 + X_2 > l) ≥ P(X_1 > l or X_2 > l) = 2 P(X_1 > l) − P(X_1 > l)².

Hence,

    g(α, ε, l) := (2 P(X_1 > l) − P(X_1 > l)²) / P(2X_1 > l)
                ≤ P(X_1 + X_2 > l) / P(2X_1 > l)
                ≤ (2 P(X_2 > (1 − ε)l) + P(X_1 > εl)²) / P(2X_1 > l) =: h(α, ε, l).
We have

    lim_{l→∞} g(α, ε, l) = 2 lim_{l→∞} P(X_1 > l)/P(X_1 > l/2) = 2^{1−α}

and similarly lim_{l→∞} h(α, ε, l) = 2^{1−α}(1 − ε)^{−α}. Since ε > 0 can be chosen
arbitrarily small we conclude that

    lim_{l→∞} P(X_1 + X_2 > l)/P(2X_1 > l) = 2^{1−α}.

This means that for α < 1 (very heavy tails) diversification does not give us a
portfolio with smaller probability of large losses. However, for α > 1 (the risks
have finite means) diversification reduces the probability of large losses.
Example 5.3 Let X_1 and X_2 be as in the previous example, and let α ∈ (0, 1).
We saw that for l sufficiently large we have P(X_1 + X_2 > l) > P(2X_1 > l).
Hence, for p ∈ (0, 1) sufficiently large

    VaR_p(X_1) + VaR_p(X_2) = 2 VaR_p(X_1) = VaR_p(2X_1)
        = inf{l ∈ R : P(2X_1 ≤ l) ≥ p}
        ≤ inf{l ∈ R : P(X_1 + X_2 ≤ l) ≥ p}
        = VaR_p(X_1 + X_2).
Example 5.4 Let X and Y be positive random variables representing losses
in two lines of business (losses due to fire and car accidents) of an insurance
company. Suppose that X has distribution function F which satisfies F̄ ∈ RV_{−α}
for α > 0. Moreover, suppose that E(Y^k) < ∞ for every k > 0, i.e. Y has finite
moments of all orders.

The insurance company wants to compute lim_{x→∞} P(X > x | X + Y > x)
to know the probability of a large loss in the fire insurance line given a large
total loss.

We have, for every ε ∈ (0, 1) and x > 0,

    P(X + Y > x) = P(X + Y > x, X > (1 − ε)x) + P(X + Y > x, X ≤ (1 − ε)x)
                 ≤ P(X + Y > x, X > (1 − ε)x) + P(X + Y > x, Y > εx)
                 ≤ P(X > (1 − ε)x) + P(Y > εx).

Hence,

    1 ≤ P(X + Y > x)/P(X > x)
      ≤ P(X > (1 − ε)x)/P(X > x) + P(Y > εx)/P(X > x)
      ≤ P(X > (1 − ε)x)/P(X > x) + E(Y^{2α})/((εx)^{2α} P(X > x))
      → (1 − ε)^{−α} + 0

as x → ∞. At the second to last step above, Markov's inequality was used.
Since this is true for every ε ∈ (0, 1), choosing ε arbitrarily small gives

    lim_{x→∞} P(X + Y > x)/P(X > x) = 1.

Hence,

    lim_{x→∞} P(X > x | X + Y > x) = lim_{x→∞} P(X > x, X + Y > x)/P(X + Y > x)
                                   = lim_{x→∞} P(X > x)/P(X + Y > x) = 1.

We have found that if the insurance company suffers a large loss, it is likely that
this is due to a large loss in the fire insurance line only.
6 Hill estimation

Suppose we have an iid sample of positive random variables X_1, ..., X_n from
an unknown distribution function F with a regularly varying right tail. That
is, F̄(x) = x^{−α}L(x) for some α > 0 and a slowly varying function L. In this
section we will describe Hill's method for estimating α.

We will use a result known as the Karamata Theorem which says that for
β < −1

    ∫_u^∞ x^β L(x) dx ∼ −(β + 1)^{−1} u^{β+1} L(u) as u → ∞,

where ∼ means that the ratio of the left and right sides tends to 1. Using
integration by parts we find that

    (1/F̄(u)) ∫_u^∞ (ln x − ln u) dF(x)
        = (1/F̄(u)) ( [−(ln x − ln u) F̄(x)]_u^∞ + ∫_u^∞ (F̄(x)/x) dx )
        = (1/(u^{−α} L(u))) ∫_u^∞ x^{−α−1} L(x) dx.

Hence, by the Karamata Theorem,

    (1/F̄(u)) ∫_u^∞ (ln x − ln u) dF(x) → 1/α as u → ∞. (6.1)

To turn this into an estimator we replace F by the empirical distribution
function

    F_n(x) = (1/n) Σ_{k=1}^n I_{[X_k,∞)}(x)

and replace u by a high data-dependent level X_{k,n}. Then

    (1/F̄_n(X_{k,n})) ∫_{X_{k,n}}^∞ (ln x − ln X_{k,n}) dF_n(x)
        = (1/(k − 1)) Σ_{j=1}^{k−1} (ln X_{j,n} − ln X_{k,n}).

If k = k(n) → ∞ and k/n → 0 as n → ∞, then X_{k,n} → ∞ a.s. as n → ∞ and
by (6.1)

    (1/(k − 1)) Σ_{j=1}^{k−1} (ln X_{j,n} − ln X_{k,n}) →_P 1/α as n → ∞.

The same result holds if we replace k − 1 by k. This gives us the Hill estimator

    α̂_{k,n}^{(H)} = ( (1/k) Σ_{j=1}^k (ln X_{j,n} − ln X_{k,n}) )^{−1}.
33
6.1 Selecting the number of upper order statistics
We have seen that if k = k(n) →∞ and k/n →0 as n →∞, then
´ α
(H)
k,n
=
_
1
k
k
j=1
(ln X
j,n
−ln X
k,n
)
_
−1
P
→α as n →∞.
In practice, however, we have a sample of fixed size $n$ and we need to find a
suitable $k$ such that $\widehat{\alpha}^{(H)}_{k,n}$ is a good estimate of $\alpha$. The plot of the pairs
\[
\{ (k, \widehat{\alpha}^{(H)}_{k,n}) : k = 2, \dots, n \}
\]
is called the Hill plot. An estimate of $\alpha$ is obtained by graphical inspection
of the Hill plot and should be taken for values of $k$ where the plot is stable.
On the one hand, $k$ should not be chosen too small, since the small number of
data points would lead to a high variance for the estimator. On the other hand,
$k$ should not be chosen too large, since then the estimate is based on sample
points from the center of the distribution, which introduces a bias. This is
illustrated graphically in Figure 9. We now construct an estimator for the
Figure 9: Hill plot of the Danish fire insurance data. The plot looks stable for
all $k$ (typically this is NOT the case for heavy-tailed data). We may estimate
$\alpha$ by $\widehat{\alpha}^{(H)}_{k,n} = 1.9$ ($k = 50$) or $\widehat{\alpha}^{(H)}_{k,n} = 1.5$ ($k = 250$).
tail probability $\overline{F}(x)$, for $x$ large, based on the sample points $X_1, \dots, X_n$ and
the Hill estimate $\widehat{\alpha}^{(H)}_{k,n}$. Notice that
\[
\overline{F}(x) = \overline{F}\Big( \frac{x}{X_{k,n}} X_{k,n} \Big)
\approx \Big( \frac{x}{X_{k,n}} \Big)^{-\alpha} \overline{F}(X_{k,n})
\approx \Big( \frac{x}{X_{k,n}} \Big)^{-\widehat{\alpha}^{(H)}_{k,n}} \overline{F}_n(X_{k,n})
\approx \frac{k}{n} \Big( \frac{x}{X_{k,n}} \Big)^{-\widehat{\alpha}^{(H)}_{k,n}}.
\]
This argument can be made rigorous. Hence, the following estimator
seems reasonable:
\[
\widehat{\overline{F}}(x) = \frac{k}{n} \Big( \frac{x}{X_{k,n}} \Big)^{-\widehat{\alpha}^{(H)}_{k,n}}.
\]
This leads to an estimator of the quantile $q_p(F) = F^{\leftarrow}(p)$:
\[
\widehat{q}_p(F) = \inf\{ x \in \mathbb{R} : \widehat{\overline{F}}(x) \leq 1-p \}
= \inf\Big\{ x \in \mathbb{R} : \frac{k}{n} \Big( \frac{x}{X_{k,n}} \Big)^{-\widehat{\alpha}^{(H)}_{k,n}} \leq 1-p \Big\}
= \Big( \frac{n}{k} (1-p) \Big)^{-1/\widehat{\alpha}^{(H)}_{k,n}} X_{k,n}.
\]
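Both the tail estimator and the quantile estimator can be sketched in code in the same way (again an illustrative NumPy implementation, not from the notes; the sample size, $k$ and $p$ are arbitrary choices):

```python
import numpy as np

def hill_tail_and_quantile(x, k, p):
    """Hill-based estimates of the tail P(X > t) and of the p-quantile."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]
    x_k = x[k - 1]                                   # X_{k,n}
    alpha = 1.0 / np.mean(np.log(x[:k]) - np.log(x_k))
    q_hat = ((len(x) / k) * (1.0 - p)) ** (-1.0 / alpha) * x_k
    def tail(t):                                     # estimate of P(X > t), t >= X_{k,n}
        return (k / len(x)) * (t / x_k) ** (-alpha)
    return tail, q_hat

rng = np.random.default_rng(1)
sample = rng.pareto(2.0, size=10000) + 1.0           # P(X > x) = x**(-2), true q_0.99 = 10
tail, q99 = hill_tail_and_quantile(sample, k=300, p=0.99)
```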
More information about Hill estimation can be found in [8] and [12].
7 The Peaks Over Threshold (POT) method
Suppose we have an iid sample of random variables $X_1, \dots, X_n$ from an unknown
distribution function $F$ with a regularly varying right tail. It turns out that the
distribution of appropriately scaled excesses $X_k - u$ over a high threshold $u$
is typically well approximated by a distribution called the generalized Pareto
distribution. This fact can be used to construct estimates of tail probabilities
and quantiles.
For $\gamma > 0$ and $\beta > 0$, the generalized Pareto distribution (GPD) function
$G_{\gamma,\beta}$ is given by
\[
G_{\gamma,\beta}(x) = 1 - (1 + \gamma x/\beta)^{-1/\gamma} \quad \text{for } x \geq 0.
\]
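Since $G_{\gamma,\beta}$ has an explicit inverse, sampling from the GPD by inversion is straightforward, which is convenient for experimenting with the approximations below. A small sketch (not from the notes; the parameter values are arbitrary):

```python
import numpy as np

def rgpd(n, gamma, beta, rng):
    """Draw n GPD variates by inverting G_{gamma,beta} (assumes gamma, beta > 0)."""
    u = rng.uniform(size=n)
    # Solve G(x) = u  =>  x = (beta/gamma) * ((1 - u)**(-gamma) - 1)
    return (beta / gamma) * ((1.0 - u) ** (-gamma) - 1.0)

rng = np.random.default_rng(2)
x = rgpd(100000, gamma=0.5, beta=1.0, rng=rng)
emp = np.mean(x <= 1.0)          # compare with G_{0.5,1}(1) = 1 - 1.5**(-2)
```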
Suppose that $X$ is a random variable with distribution function $F$ that has a
regularly varying right tail, so that $\lim_{u\to\infty} \overline{F}(\lambda u)/\overline{F}(u) = \lambda^{-\alpha}$ for all $\lambda > 0$ and
some $\alpha > 0$. Then
\[
\lim_{u\to\infty} P\Big( \frac{X-u}{u/\alpha} > x \;\Big|\; X > u \Big)
= \lim_{u\to\infty} \frac{P(X > u(1+x/\alpha))}{P(X > u)}
= (1 + x/\alpha)^{-\alpha} = \overline{G}_{1/\alpha,1}(x).
\]
The excess distribution function of $X$ over the threshold $u$ is given by
\[
F_u(x) = P(X - u \leq x \mid X > u) \quad \text{for } x \geq 0.
\]
Notice that
\[
\overline{F}_u(x) = \frac{\overline{F}(u+x)}{\overline{F}(u)} = \frac{\overline{F}(u(1+x/u))}{\overline{F}(u)}. \quad (7.1)
\]
Since $\overline{F}$ is regularly varying with index $-\alpha < 0$ it holds that $\overline{F}(\lambda u)/\overline{F}(u) \to \lambda^{-\alpha}$
uniformly in $\lambda \geq 1$ as $u \to \infty$, i.e.
\[
\lim_{u\to\infty} \sup_{\lambda \geq 1} \big| \overline{F}(\lambda u)/\overline{F}(u) - \lambda^{-\alpha} \big| = 0.
\]
Hence, from expression (7.1) we see that
\[
\lim_{u\to\infty} \sup_{x > 0} \big| \overline{F}_u(x) - \overline{G}_{\gamma,\beta(u)}(x) \big| = 0, \quad (7.2)
\]
where $\gamma = 1/\alpha$ and $\beta(u) \sim u/\alpha$ as $u \to \infty$.
We now demonstrate how these findings lead to natural tail and quantile
estimators based on the sample points $X_1, \dots, X_n$. Choose a high threshold $u$
and let
\[
N_u = \#\{ i \in \{1, \dots, n\} : X_i > u \}
\]
be the number of exceedances of $u$ by $X_1, \dots, X_n$. Recall from (7.1) that
\[
\overline{F}(u+x) = \overline{F}(u)\, \overline{F}_u(x). \quad (7.3)
\]
If $u$ is not too far out into the tail, then the empirical approximation
$\overline{F}(u) \approx \overline{F}_n(u) = N_u/n$ is accurate. Moreover, (7.2) shows that the approximation
\[
\overline{F}_u(x) \approx \overline{G}_{\gamma,\beta(u)}(x) \approx \overline{G}_{\widehat{\gamma},\widehat{\beta}}(x)
= \Big( 1 + \widehat{\gamma}\,\frac{x}{\widehat{\beta}} \Big)^{-1/\widehat{\gamma}},
\]
where $\widehat{\gamma}$ and $\widehat{\beta}$ are the estimated parameters, makes sense. Relation (7.3) then
suggests a method for estimating the tail of $F$ by estimating $\overline{F}_u(x)$ and $\overline{F}(u)$
separately. Hence, a natural estimator for $\overline{F}(u+x)$ is
\[
\widehat{\overline{F}}(u+x) = \frac{N_u}{n} \Big( 1 + \widehat{\gamma}\,\frac{x}{\widehat{\beta}} \Big)^{-1/\widehat{\gamma}}. \quad (7.4)
\]
Expression (7.4) immediately leads to the following estimator of the quantile
$q_p(F) = F^{\leftarrow}(p)$:
\[
\widehat{q}_p(F) = \inf\{ x \in \mathbb{R} : \widehat{\overline{F}}(x) \leq 1-p \}
= \inf\{ u + x \in \mathbb{R} : \widehat{\overline{F}}(u+x) \leq 1-p \}
= u + \inf\Big\{ x \in \mathbb{R} : \frac{N_u}{n} \Big( 1 + \widehat{\gamma}\,\frac{x}{\widehat{\beta}} \Big)^{-1/\widehat{\gamma}} \leq 1-p \Big\}
= u + \frac{\widehat{\beta}}{\widehat{\gamma}} \Big( \Big( \frac{n}{N_u}(1-p) \Big)^{-\widehat{\gamma}} - 1 \Big). \quad (7.5)
\]
The POT method for estimating tail probabilities and quantiles can be
summarized in the following recipe. Each step will be discussed further below.

(i) Choose a high threshold $u$ using some statistical method and count the
number of exceedances $N_u$.

(ii) Given the sample $Y_1, \dots, Y_{N_u}$ of excesses, estimate the parameters $\gamma$ and $\beta$.

(iii) Combine steps (i) and (ii) to get estimates of the form (7.4) and (7.5).

The rest of this section will be devoted to steps (i) and (ii): How do we choose a
high threshold $u$ in a suitable way? And how can one estimate the parameters
$\gamma$ and $\beta$?
7.1 How to choose a high threshold.
The choice of a suitable high threshold $u$ is crucial but difficult. If we choose
$u$ too large then we will have few observations to use for parameter estimation,
resulting in poor estimates with large variance. If the threshold is too low, then
we have more data, but on the other hand the approximation $\overline{F}_u(x) \approx \overline{G}_{\gamma,\beta(u)}(x)$
will be questionable. The main idea when choosing the threshold $u$ is to look at
the mean-excess plot (see below) and choose $u$ such that the sample mean-excess
function is approximately linear above $u$.
In the example with the Danish fire insurance claims it seems reasonable to
choose $u$ between 4.5 and 10. With $u = 4.5$ we get $N_{4.5} = 101$ exceedances,
which is about 5% of the data. With $u = 10$ we get $N_{10} = 24$ exceedances,
which is about 1.2% of the data. We have selected the threshold $u = 6$: the
shape of the mean-excess plot does not change much with $u \in [4.5, 10]$, and
with this choice we get a reasonable amount of data.
7.2 Mean-excess plot
If $E(X) < \infty$, then we can define the mean excess function as
\[
e(u) = E(X - u \mid X > u) = \frac{E((X-u)\, I_{(u,\infty)}(X))}{E(I_{(u,\infty)}(X))}.
\]
For the GPD $G_{\gamma,\beta}$, with $\gamma < 1$, integration by parts gives
\[
E((X-u)\, I_{(u,\infty)}(X))
= \big[ -(x-u)(1+\gamma x/\beta)^{-1/\gamma} \big]_u^\infty + \int_u^\infty (1+\gamma x/\beta)^{-1/\gamma}\,dx
= \int_u^\infty (1+\gamma x/\beta)^{-1/\gamma}\,dx
= \frac{\beta}{1-\gamma} (1+\gamma u/\beta)^{-1/\gamma+1}.
\]
Moreover, $E(I_{(u,\infty)}(X)) = P(X > u) = (1+\gamma u/\beta)^{-1/\gamma}$. Hence,
\[
e(u) = \frac{\beta + \gamma u}{1 - \gamma}.
\]
In particular, the mean excess function is linear for the GPD. A graphical test
for assessing the tail behavior may be performed by studying the sample mean-excess
function based on the sample $X_1, \dots, X_n$. With $N_u$ being the number of
exceedances of $u$ by $X_1, \dots, X_n$, as above, the sample mean-excess function is
given by
\[
e_n(u) = \frac{1}{N_u} \sum_{k=1}^n (X_k - u)\, I_{(u,\infty)}(X_k).
\]
The mean-excess plot is the plot of the points
\[
\{ (X_{k,n}, e_n(X_{k,n})) : k = 2, \dots, n \}.
\]
If the mean-excess plot is approximately linear with positive slope, then $X_1$ may
be assumed to have a heavy, Pareto-like tail.
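A sample mean-excess function takes only a few lines of code. The sketch below (illustrative only, not from the notes; the simulated samples and their sizes are arbitrary) reproduces the qualitative behavior seen in Figures 10 to 12: roughly flat for exponential data, increasing for Pareto-like data.

```python
import numpy as np

def sample_mean_excess(x):
    """Sample mean-excess function evaluated at the thresholds X_{k,n}, k = 2..n."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]    # decreasing order statistics
    thresholds = x[1:]                               # X_{2,n}, ..., X_{n,n}
    e = np.array([np.mean(x[x > u] - u) for u in thresholds])
    return thresholds, e

rng = np.random.default_rng(3)
u_exp, e_exp = sample_mean_excess(rng.exponential(1.0, size=2000))
u_par, e_par = sample_mean_excess(rng.pareto(2.0, size=2000) + 1.0)
# e_exp hovers around 1 (memoryless property); e_par grows with the threshold
```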
Figure 10: Mean-excess plot of data simulated from a Gaussian distribution.

Figure 11: Mean-excess plot of data simulated from an exponential distribution.

Figure 12: Mean-excess plot of data simulated from a Pareto(1) distribution.
7.3 Parameter estimation
Given the threshold $u$ we may estimate the parameters $\gamma$ and $\beta$ in the GPD
based on the observations of excesses $Y_1, \dots, Y_{N_u}$ over $u$. We assume that
the excesses have distribution function $G_{\gamma,\beta}$, and hence the likelihood function
becomes
\[
L(\gamma, \beta; Y_1, \dots, Y_{N_u}) = \prod_{i=1}^{N_u} g_{\gamma,\beta}(Y_i),
\qquad g_{\gamma,\beta}(y) = \frac{1}{\beta} \Big( 1 + \gamma\,\frac{y}{\beta} \Big)^{-1/\gamma - 1}.
\]
Instead of maximizing the likelihood function we can maximize the log-likelihood
function given by
\[
\ln L(\gamma, \beta; Y_1, \dots, Y_{N_u})
= -N_u \ln \beta - \Big( \frac{1}{\gamma} + 1 \Big) \sum_{i=1}^{N_u} \ln\Big( 1 + \frac{\gamma}{\beta} Y_i \Big).
\]
Maximizing the log-likelihood numerically gives estimates $\widehat{\gamma}$ and $\widehat{\beta}$. The MLE
is approximately normal (for large $N_u$):
\[
\Big( \widehat{\gamma} - \gamma,\; \frac{\widehat{\beta}}{\beta} - 1 \Big) \approx N_2(0, \Sigma^{-1}/N_u),
\qquad
\Sigma^{-1} = (1+\gamma) \begin{pmatrix} 1+\gamma & -1 \\ -1 & 2 \end{pmatrix}.
\]
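As a rough illustration of such a numerical fit (a crude grid search rather than a proper optimizer; the grid ranges and the simulated excesses are arbitrary choices, and $\gamma, \beta > 0$ is assumed):

```python
import numpy as np

def gpd_loglik(gamma, beta, y):
    """GPD log-likelihood of the excesses y (assumes gamma > 0, beta > 0)."""
    y = np.asarray(y, dtype=float)
    return -len(y) * np.log(beta) - (1.0 / gamma + 1.0) * np.sum(np.log1p(gamma * y / beta))

def gpd_fit(y):
    """Crude maximum-likelihood fit by grid search; adequate for a sketch."""
    best = (None, None, -np.inf)
    for g in np.linspace(0.05, 2.0, 100):
        for b in np.linspace(0.1, 5.0, 100):
            ll = gpd_loglik(g, b, y)
            if ll > best[2]:
                best = (g, b, ll)
    return best[0], best[1]

# Simulated excesses from the GPD with gamma = 0.5, beta = 1 (drawn by inversion)
rng = np.random.default_rng(4)
u = rng.uniform(size=2000)
excesses = (1.0 / 0.5) * ((1.0 - u) ** (-0.5) - 1.0)
g_hat, b_hat = gpd_fit(excesses)
```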
Using the threshold $u = 6$ (which gives $N_u = 56$) we obtain the following
estimates for the Danish fire insurance data:
\[
\widehat{\gamma} = 0.58, \qquad \widehat{\beta} = 3.60.
\]
Having estimated the parameters we can now visually inspect how well the
approximation (7.4) works; see Figure 14.
7.4 Estimation of Value-at-Risk and Expected shortfall
Recall that Value-at-Risk at confidence level $p$ for a risk $X$ with distribution
function $F$ is, by definition, the quantile $q_p(F)$. Hence, the POT method gives
the Value-at-Risk estimator
\[
\widehat{\mathrm{VaR}}_{p,\mathrm{POT}} = u + \frac{\widehat{\beta}}{\widehat{\gamma}}
\Big( \Big( \frac{n}{N_u}(1-p) \Big)^{-\widehat{\gamma}} - 1 \Big).
\]
Similarly, the POT method leads to an estimator for Expected shortfall. Recall
that
\[
\mathrm{ES}_p(X) = E(X \mid X > \mathrm{VaR}_p(X))
= \frac{E(X\, I_{(q_p,\infty)}(X))}{E(I_{(q_p,\infty)}(X))},
\]
where $q_p = \mathrm{VaR}_p(X)$, and we have $E(I_{(q_p,\infty)}(X)) = \overline{F}(q_p)$. For a nonnegative
random variable $Z$ with distribution function $F$, integration by parts shows that
\[
E(Z) = \int_0^\infty z\,dF(z) = -\int_0^\infty z\,d\overline{F}(z)
= \big[ -z\overline{F}(z) \big]_0^\infty + \int_0^\infty \overline{F}(z)\,dz
= \int_0^\infty P(Z > z)\,dz.
\]
Figure 13: Mean-excess plot of the Danish fire insurance data. The plot looks
approximately linear, indicating Pareto-like tails.
Figure 14: The empirical tail of the Danish data and the POT approximation.
If $p$ is sufficiently large so that $q_p > 0$, then this can be applied to the nonnegative
random variable $X I_{(q_p,\infty)}(X)$. This gives
\[
E(X\, I_{(q_p,\infty)}(X)) = q_p \overline{F}(q_p) + \int_{q_p}^\infty \overline{F}(t)\,dt
= q_p \overline{F}(q_p) + \int_{q_p}^\infty \overline{F}(u)\, \overline{F}_u(t-u)\,dt.
\]
Hence,
\[
\mathrm{ES}_p(X) = q_p + \frac{\overline{F}(u)}{\overline{F}(q_p)} \int_{q_p}^\infty \overline{F}_u(t-u)\,dt
= q_p + \frac{\int_{q_p}^\infty \overline{F}_u(t-u)\,dt}{\overline{F}_u(q_p - u)}.
\]
We may now use the estimator $\widehat{\overline{F}}_u(t-u) = \overline{G}_{\widehat{\gamma},\widehat{\beta}}(t-u)$ to obtain, with
$\widehat{q}_p = \widehat{\mathrm{VaR}}_{p,\mathrm{POT}}$,
\[
\widehat{\mathrm{ES}}_{p,\mathrm{POT}} = \widehat{q}_p
+ \frac{\int_{\widehat{q}_p}^\infty (1 + \widehat{\gamma}(t-u)/\widehat{\beta})^{-1/\widehat{\gamma}}\,dt}
{(1 + \widehat{\gamma}(\widehat{q}_p - u)/\widehat{\beta})^{-1/\widehat{\gamma}}}
= \widehat{q}_p + \frac{\widehat{\beta} + \widehat{\gamma}(\widehat{q}_p - u)}{1 - \widehat{\gamma}}.
\]
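Putting the pieces together, the two POT estimators can be sketched as follows (illustrative NumPy code, not from the notes; the data, the threshold and the plugged-in GPD parameters are arbitrary choices, and $\gamma < 1$ is assumed for the ES formula):

```python
import numpy as np

def pot_var_es(x, u, gamma, beta, p):
    """POT estimates of Value-at-Risk and Expected shortfall at confidence level p.

    gamma and beta are GPD parameters fitted to the excesses over the threshold u;
    the Expected shortfall formula requires gamma < 1.
    """
    x = np.asarray(x, dtype=float)
    n, n_u = len(x), np.sum(x > u)
    var_p = u + (beta / gamma) * (((n / n_u) * (1.0 - p)) ** (-gamma) - 1.0)
    es_p = var_p + (beta + gamma * (var_p - u)) / (1.0 - gamma)
    return var_p, es_p

# Exact Pareto(2) data: the excess distribution over u is exactly GPD with
# gamma = 1/2 and beta = u/2, and VaR_0.99 = 10, ES_0.99 = 20.
rng = np.random.default_rng(5)
x = rng.pareto(2.0, size=20000) + 1.0
u = 3.0
var99, es99 = pot_var_es(x, u, gamma=0.5, beta=u / 2.0, p=0.99)
```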
More information about the POT method can be found in [8] and [12].
8 Multivariate distributions and dependence
We will now introduce some techniques and concepts for modeling multivariate
random vectors. We aim at constructing useful models for the vector of risk
factor changes $\mathbf{X}_n$. At this stage we will assume that all vectors of risk factor
changes $(\mathbf{X}_n)_{n \in \mathbb{Z}}$ are iid, but we allow for dependence between the components
of $\mathbf{X}_n$. That is, typically we assume that $X_{n,i}$ and $X_{n,j}$ are dependent, whereas
$X_{n,i}$ and $X_{n+k,j}$ are independent (for $k \neq 0$).
8.1 Basic properties of random vectors
The probability distribution of a $d$-dimensional random vector $\mathbf{X} = (X_1, \dots, X_d)$
is completely determined by its joint distribution function $F$,
\[
F(\mathbf{x}) = F(x_1, \dots, x_d) = P(X_1 \leq x_1, \dots, X_d \leq x_d) = P(\mathbf{X} \leq \mathbf{x}).
\]
The $i$th marginal distribution $F_i$ of $F$ is the distribution of $X_i$ and is given by
\[
F_i(x_i) = P(X_i \leq x_i) = F(\infty, \dots, \infty, x_i, \infty, \dots, \infty).
\]
The distribution $F$ is said to be absolutely continuous if there is a function
$f \geq 0$, integrating to one, such that
\[
F(x_1, \dots, x_d) = \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_d} f(u_1, \dots, u_d)\,du_1 \dots du_d,
\]
and then $f$ is called the density of $F$. The components of $\mathbf{X}$ are independent if
and only if
\[
F(\mathbf{x}) = \prod_{i=1}^d F_i(x_i)
\]
or, equivalently, if and only if the joint density $f$ (if the density exists) satisfies
\[
f(\mathbf{x}) = \prod_{i=1}^d f_i(x_i).
\]
Recall that the distribution of a random vector $\mathbf{X}$ is completely determined by
its characteristic function, given by
\[
\phi_{\mathbf{X}}(\mathbf{t}) = E(\exp\{ i\,\mathbf{t}^{\mathrm{T}} \mathbf{X} \}), \quad \mathbf{t} \in \mathbb{R}^d.
\]
Example 8.1 The multivariate normal distribution with mean $\mu$ and covariance
matrix $\Sigma$ has the density (with $|\Sigma|$ being the absolute value of the determinant
of $\Sigma$)
\[
f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}}
\exp\Big( -\frac{1}{2} (\mathbf{x}-\mu)^{\mathrm{T}} \Sigma^{-1} (\mathbf{x}-\mu) \Big),
\quad \mathbf{x} \in \mathbb{R}^d.
\]
Its characteristic function is given by
\[
\phi_{\mathbf{X}}(\mathbf{t}) = \exp\Big( i\,\mathbf{t}^{\mathrm{T}} \mu - \frac{1}{2} \mathbf{t}^{\mathrm{T}} \Sigma \mathbf{t} \Big),
\quad \mathbf{t} \in \mathbb{R}^d.
\]
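For simulation, the standard route (not spelled out in the notes) is the Cholesky factorization: if $\Sigma = AA^{\mathrm{T}}$ and $\mathbf{Z}$ has iid $N(0,1)$ components, then $\mu + A\mathbf{Z}$ has the density above. A small NumPy sketch with arbitrary $\mu$ and $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
a = np.linalg.cholesky(sigma)            # sigma = a @ a.T
z = rng.standard_normal((2, 100000))     # iid N(0,1) components
x = mu[:, None] + a @ z                  # columns are N(mu, sigma) draws
emp_cov = np.cov(x)                      # should be close to sigma
```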
8.2 Joint log return distributions
Most classical multivariate models in ﬁnancial mathematics assume that the
joint distribution of log returns is a multivariate normal distribution. How
ever, the main reason for this assumption is mathematical convenience. This
assumption is not well supported by empirical ﬁndings. To illustrate this fact
we study daily log returns of exchange rates. Pairwise log returns of the Swiss
franc (chf), German mark (dem), British pound (gbp) and Japanese yen (jpy)
quoted against the US dollar are illustrated in Figure 15. Now we ask whether
a multivariate normal model would be suitable for the log return data. We
estimate the mean and the covariance matrix of each data set (pairwise) and
simulate from a bivariate normal distribution with this mean and covariance
matrix, see Figure 16. By comparing the two figures we see that although the
simulated data resemble the true observations, they have too few points far
away from the mean. That is, the tails of the log return data are heavier than
those of the simulated multivariate normal data.
Another example shows that not only does the multivariate normal distri
bution have too light tails but also the dependence structure in the normal
distribution may be inappropriate when modeling the joint distribution of log
returns. Consider for instance the data set consisting of log returns from BMW
and Siemens stocks, Figure 17. Notice the strong dependence between large
drops in the BMW and Siemens stock prices. The dependence of large drops
seems stronger than for ordinary returns. This is something that cannot be
modeled by a multivariate normal distribution. To find a good model for the
BMW and Siemens data we need to be able to handle more advanced dependence
structures than those offered by the multivariate normal distribution.
8.3 Comonotonicity and countermonotonicity
Let $(X_1, X_2)$ be a bivariate random vector and suppose there exist two monotone
functions $\alpha, \beta : \mathbb{R} \to \mathbb{R}$ and a random variable $Z$ such that
\[
(X_1, X_2) \stackrel{d}{=} (\alpha(Z), \beta(Z)).
\]
If both $\alpha$ and $\beta$ are increasing, then $X_1$ and $X_2$ are said to be comonotonic. If
the distribution functions $F_1$ and $F_2$ are continuous, then $X_2 = T(X_1)$ a.s. with
$T = F_2^{\leftarrow} \circ F_1$.

If $\alpha$ is increasing and $\beta$ is decreasing, then $X_1$ and $X_2$ are said to be
countermonotonic. If the distribution functions $F_1$ and $F_2$ are continuous, then
$X_2 = T(X_1)$ a.s. with $T = F_2^{\leftarrow} \circ (1 - F_1)$.
8.4 Covariance and linear correlation
Let $\mathbf{X} = (X_1, \dots, X_d)^{\mathrm{T}}$ be a random (column) vector with $E(X_k^2) < \infty$ for
every $k$. The mean vector of $\mathbf{X}$ is $\mu = E(\mathbf{X})$ and the covariance matrix of $\mathbf{X}$
is $\mathrm{Cov}(\mathbf{X}) = E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\mathrm{T}}]$. Here $\mathrm{Cov}(\mathbf{X})$ is a $d \times d$ matrix whose $(i,j)$
entry ($i$th row, $j$th column) is $\mathrm{Cov}(X_i, X_j)$. Notice that $\mathrm{Cov}(X_i, X_i) = \mathrm{var}(X_i)$,
i.e. the variance of $X_i$.
Figure 15: Log returns of foreign exchange rates (jpy, gbp, chf, dem) quoted
against the US dollar, plotted pairwise.
Covariance is easily manipulated under linear (affine) transformations. If $B$
is a constant $k \times d$ matrix and $\mathbf{b}$ is a constant $k$-vector (column vector), then
$\mathbf{Y} = B\mathbf{X} + \mathbf{b}$ has mean $E(\mathbf{Y}) = B\mu + \mathbf{b}$ and covariance matrix
\[
\mathrm{Cov}(\mathbf{Y}) = E[(\mathbf{Y}-E(\mathbf{Y}))(\mathbf{Y}-E(\mathbf{Y}))^{\mathrm{T}}]
= E[B(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\mathrm{T}} B^{\mathrm{T}}]
= B\,E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\mathrm{T}}]\,B^{\mathrm{T}}
= B\,\mathrm{Cov}(\mathbf{X})\,B^{\mathrm{T}}.
\]
Figure 16: Simulated log returns of foreign exchange rates using a bivariate
normal distribution with estimated mean and covariance matrix.
If $\mathrm{var}(X_1), \mathrm{var}(X_2) \in (0, \infty)$, then the linear correlation coefficient $\rho_L(X_1, X_2)$
is
\[
\rho_L(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{var}(X_1)\,\mathrm{var}(X_2)}}.
\]
If $X_1$ and $X_2$ are independent, then $\rho_L(X_1, X_2) = 0$, but the converse is false;
Figure 17: Log returns from BMW and Siemens stocks.
$\rho_L(X_1, X_2) = 0$ need not imply that $X_1$ and $X_2$ are independent.
Example 8.2 If $X_1 \sim N(0,1)$ and $X_2 = X_1^2$, then
\[
\mathrm{Cov}(X_1, X_2) = E[(X_1 - 0)(X_2 - 1)] = E(X_1 X_2) - 0 \cdot 1
= E(X_1^3) = \int_{-\infty}^\infty x^3 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = 0.
\]
Hence, $\rho_L(X_1, X_2) = 0$, but clearly $X_1$ and $X_2$ are strongly dependent.
Moreover, $|\rho_L(X_1, X_2)| = 1$ if and only if $X_1$ and $X_2$ are perfectly linearly
dependent, that is, if and only if there exist $a \in \mathbb{R}$ and $b \neq 0$ such that
$X_2 = a + bX_1$. The linear correlation coefficient is invariant under strictly
increasing linear transformations. In particular, for $a_1, a_2 \in \mathbb{R}$ and $b_1, b_2 \neq 0$
we have
\[
\rho_L(a_1 + b_1 X_1, a_2 + b_2 X_2) = \mathrm{sign}(b_1 b_2)\,\rho_L(X_1, X_2),
\]
where $\mathrm{sign}(b_1 b_2) = b_1 b_2 / |b_1 b_2|$ if $b_1 b_2 \neq 0$ and $0$ otherwise. However, linear
correlation is not invariant under nonlinear strictly increasing transformations
$T : \mathbb{R} \to \mathbb{R}$. That is, for two random variables we have in general
\[
\rho_L(T(X_1), T(X_2)) \neq \rho_L(X_1, X_2).
\]
This is a weakness of linear correlation as a measure of dependence. If we
transform $X_1$ and $X_2$ by a strictly increasing transformation we have only
rescaled the marginal distributions; we have not changed the dependence between
$X_1$ and $X_2$. However, the linear correlation between $T(X_1)$ and $T(X_2)$ may
have changed, (falsely) indicating that the dependence has changed.
Proposition 8.1 Let $(X_1, X_2)$ be a random vector with marginal distribution
functions $F_1$ and $F_2$, and an unspecified dependence structure. Assume further
that $\mathrm{var}(X_1), \mathrm{var}(X_2) \in (0, \infty)$. Then

(1) The set of possible linear correlations is a closed interval $[\rho_{L,\min}, \rho_{L,\max}]$
with $0 \in [\rho_{L,\min}, \rho_{L,\max}]$.

(2) The minimum linear correlation $\rho_{L,\min}$ is obtained if and only if $X_1$ and
$X_2$ are countermonotonic; the maximum $\rho_{L,\max}$ if and only if $X_1$ and $X_2$
are comonotonic.
The following example illustrates Proposition 8.1.
Example 8.3 Let $X_1 \sim \mathrm{Lognormal}(0,1)$ and $X_2 \sim \mathrm{Lognormal}(0, \sigma^2)$ with
$\sigma > 0$. Let $Z \sim N(0,1)$ and note that
\[
X_1 \stackrel{d}{=} e^Z, \qquad X_2 \stackrel{d}{=} e^{\sigma Z} \stackrel{d}{=} e^{-\sigma Z}.
\]
Note that $e^Z$ and $e^{\sigma Z}$ are comonotonic and that $e^Z$ and $e^{-\sigma Z}$ are
countermonotonic. Hence, by Proposition 8.1,
\[
\rho_{L,\min} = \rho_L(e^Z, e^{-\sigma Z}) = \frac{e^{-\sigma} - 1}{\sqrt{(e-1)(e^{\sigma^2}-1)}},
\qquad
\rho_{L,\max} = \rho_L(e^Z, e^{\sigma Z}) = \frac{e^{\sigma} - 1}{\sqrt{(e-1)(e^{\sigma^2}-1)}}.
\]
In particular, $\rho_{L,\min} \uparrow 0$ and $\rho_{L,\max} \downarrow 0$ as $\sigma \to \infty$. See Figure 18 for a
graphical illustration of these bounds as functions of $\sigma$.
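The bounds are elementary to evaluate numerically (illustrative code, not from the notes; the value $\sigma = 4$ is an arbitrary choice):

```python
import numpy as np

def lognormal_corr_bounds(sigma):
    """Attainable min/max linear correlation for Lognormal(0,1) vs Lognormal(0, sigma^2)."""
    denom = np.sqrt((np.e - 1.0) * (np.exp(sigma ** 2) - 1.0))
    return (np.exp(-sigma) - 1.0) / denom, (np.exp(sigma) - 1.0) / denom

lo, hi = lognormal_corr_bounds(4.0)
# For sigma = 1 the maximum is 1 (the marginals are then equal in distribution);
# for large sigma both bounds are nearly zero despite perfect dependence.
```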
8.5 Rank correlation
Rank correlations are measures of concordance for bivariate random vectors.
Given two points in $\mathbb{R}^2$, $(x_1, x_2)$ and $(\tilde{x}_1, \tilde{x}_2)$, we say that the two points are
concordant if $(x_1 - \tilde{x}_1)(x_2 - \tilde{x}_2) > 0$ and discordant if $(x_1 - \tilde{x}_1)(x_2 - \tilde{x}_2) < 0$.
Hence, concordance (discordance) means that the line connecting the two points
has a positive (negative) slope. Now consider two independent random vectors
$(X_1, X_2)$ and $(\widetilde{X}_1, \widetilde{X}_2)$ with the same bivariate distribution. The Kendall's tau
rank correlation is given by
\[
\rho_\tau(X_1, X_2) = P\big( (X_1 - \widetilde{X}_1)(X_2 - \widetilde{X}_2) > 0 \big)
- P\big( (X_1 - \widetilde{X}_1)(X_2 - \widetilde{X}_2) < 0 \big).
\]
If $X_2$ tends to increase with $X_1$ we expect the probability of concordance to be
high relative to the probability of discordance, giving a high value of $\rho_\tau(X_1, X_2)$.

Another measure of concordance/discordance is Spearman's rho rank correlation,
where one introduces a third independent copy of $(X_1, X_2)$ denoted by
Figure 18: Bounds for the linear correlation coefficient.
$(\widehat{X}_1, \widehat{X}_2)$ and considers the concordance/discordance of the pairs $(X_1, X_2)$ and
$(\widetilde{X}_1, \widehat{X}_2)$. Spearman's rho is defined by
\[
\rho_S(X_1, X_2) = 3\Big( P\big( (X_1 - \widetilde{X}_1)(X_2 - \widehat{X}_2) > 0 \big)
- P\big( (X_1 - \widetilde{X}_1)(X_2 - \widehat{X}_2) < 0 \big) \Big).
\]
Kendall's tau and Spearman's rho have many properties in common, listed below.
They also illustrate important differences between Kendall's tau and Spearman's
rho on the one hand and the linear correlation coefficient on the other hand.

• $\rho_\tau(X_1, X_2) \in [-1, 1]$ and $\rho_S(X_1, X_2) \in [-1, 1]$. All values can be obtained
regardless of the marginal distribution functions, if they are continuous.

• If the marginal distribution functions are continuous, then
$\rho_\tau(X_1, X_2) = \rho_S(X_1, X_2) = 1$ if and only if $X_1$ and $X_2$ are comonotonic;
$\rho_\tau(X_1, X_2) = \rho_S(X_1, X_2) = -1$ if and only if $X_1$ and $X_2$ are countermonotonic.

• If $X_1$ and $X_2$ are independent then $\rho_\tau(X_1, X_2) = 0$ and $\rho_S(X_1, X_2) = 0$,
but the converse is not true in general.

• If $T_1, T_2$ are strictly increasing, then
$\rho_\tau(T_1(X_1), T_2(X_2)) = \rho_\tau(X_1, X_2)$ and $\rho_S(T_1(X_1), T_2(X_2)) = \rho_S(X_1, X_2)$.
Estimation of Kendall's tau based on iid bivariate random vectors X_1, . . . , X_n
is easy. Note that

  τ = τ(X_{k,1}, X_{k,2}) = P((X_{k,1} − X_{l,1})(X_{k,2} − X_{l,2}) > 0)
                          − P((X_{k,1} − X_{l,1})(X_{k,2} − X_{l,2}) < 0)
                          = E(sign((X_{k,1} − X_{l,1})(X_{k,2} − X_{l,2}))).

Hence, we obtain the following estimator of Kendall's tau:

  τ̂ = (n choose 2)^{−1} Σ_{1≤k<l≤n} sign((X_{k,1} − X_{l,1})(X_{k,2} − X_{l,2}))
     = (n choose 2)^{−1} Σ_{k=1}^{n−1} Σ_{l=k+1}^{n} sign((X_{k,1} − X_{l,1})(X_{k,2} − X_{l,2})).
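The estimator above is straightforward to implement. A minimal sketch in pure Python (the name `kendall_tau` is ours), averaging the pairwise concordance signs:

```python
import math
from itertools import combinations

def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau(sample):
    # sample: list of pairs (x_k1, x_k2); returns
    # (n choose 2)^{-1} * sum over k < l of sign((x_k1 - x_l1)(x_k2 - x_l2))
    n = len(sample)
    s = sum(sign((x1 - y1) * (x2 - y2))
            for (x1, x2), (y1, y2) in combinations(sample, 2))
    return s / math.comb(n, 2)
```

For comonotone data the estimator returns 1 and for countermonotone data −1, in line with the properties listed above.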
We will return to the rank correlations when we discuss copulas in Section 10.
8.6 Tail dependence
Motivated by the scatter plot of joint BMW and Siemens log returns above we
introduce a notion of dependence of extreme values, called tail dependence.
Let (X_1, X_2) be a random vector with marginal distribution functions F_1
and F_2. The coefficient of upper tail dependence of (X_1, X_2) is defined as

  λ_U(X_1, X_2) = lim_{u↑1} P(X_2 > F_2^←(u) | X_1 > F_1^←(u)),

provided that the limit λ_U ∈ [0, 1] exists. The coefficient of lower tail dependence
is defined as

  λ_L(X_1, X_2) = lim_{u↓0} P(X_2 ≤ F_2^←(u) | X_1 ≤ F_1^←(u)),

provided that the limit λ_L ∈ [0, 1] exists. If λ_U > 0 (λ_L > 0), then we say that
(X_1, X_2) has upper (lower) tail dependence.
See Figure 19 for an illustration of tail dependence.
Figure 19: Illustration of lower tail dependence.
Figure 20: Illustration of two bivariate distributions with linear correlation
ρ_L = 0.8 and standard normal marginal distributions. The one to the left
has tail dependence λ_U = λ_L = 0.62 and the one to the right has λ_U = λ_L = 0.
9 Multivariate elliptical distributions
9.1 The multivariate normal distribution
Recall the following three equivalent definitions of a multivariate normal distribution.
(1) A random vector X = (X_1, . . . , X_d)^T has a normal distribution if for every
vector a = (a_1, . . . , a_d)^T the random variable a^T X has a normal distribution.
The notation X ∼ N_d(µ, Σ) is used to denote that X has a d-dimensional
normal distribution with mean vector µ and covariance matrix Σ.
(2) X ∼ N_d(µ, Σ) if and only if its characteristic function is given by

  φ_X(t) = E(exp{i t^T X}) = exp( i t^T µ − (1/2) t^T Σ t ).
(3) A random vector X with E(X) = µ and Cov(X) = Σ, such that |Σ| > 0,
satisfies X ∼ N_d(µ, Σ) if and only if it has the density

  f_X(x) = (1 / √((2π)^d |Σ|)) exp( −(x − µ)^T Σ^{−1} (x − µ) / 2 ).
Next we list some useful properties of the multivariate normal distribution.
Let X ∼ N_d(µ, Σ).
• Linear transformations. For B ∈ R^{k×d} and b ∈ R^k we have

    BX + b ∼ N_k(Bµ + b, BΣB^T).
• Marginal distributions. Write X^T = (X_1^T, X_2^T) with X_1 = (X_1, . . . , X_k)^T,
  X_2 = (X_{k+1}, . . . , X_d)^T and write

    µ^T = (µ_1^T, µ_2^T),   Σ = ( Σ_11  Σ_12
                                  Σ_21  Σ_22 ).

  Then X_1 ∼ N_k(µ_1, Σ_11) and X_2 ∼ N_{d−k}(µ_2, Σ_22).
• Conditional distributions. If Σ is nonsingular (|Σ| > 0), then
  X_2 | X_1 = x_1 ∼ N_{d−k}(µ_{2.1}, Σ_{22.1}), where

    µ_{2.1} = µ_2 + Σ_21 Σ_11^{−1} (x_1 − µ_1)  and  Σ_{22.1} = Σ_22 − Σ_21 Σ_11^{−1} Σ_12.
• Quadratic forms. If Σ is nonsingular, then

    D² = (X − µ)^T Σ^{−1} (X − µ) ∼ χ²_d.

  The variable D is called the Mahalanobis distance.
• Convolutions. If X ∼ N_d(µ, Σ) and Y ∼ N_d(µ̄, Σ̄) are independent, then

    X + Y ∼ N_d(µ + µ̄, Σ + Σ̄).
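The conditional-distribution formulas above are easy to apply directly. A minimal sketch for the bivariate case (k = 1, d = 2; the function name is ours):

```python
def conditional_normal_2d(mu, Sigma, x1):
    # X = (X1, X2) ~ N_2(mu, Sigma); returns mean and variance of X2 | X1 = x1,
    # i.e. mu_{2.1} = mu2 + S21 S11^{-1} (x1 - mu1) and S_{22.1} = S22 - S21 S11^{-1} S12
    mu1, mu2 = mu
    s11, s12 = Sigma[0]
    s21, s22 = Sigma[1]
    cond_mean = mu2 + s21 / s11 * (x1 - mu1)
    cond_var = s22 - s21 / s11 * s12
    return cond_mean, cond_var
```

With µ = (0, 0), Σ = [[1, 0.5], [0.5, 1]] and x_1 = 2 this gives conditional mean 1 and conditional variance 0.75.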
9.2 Normal mixtures
Note that if X ∼ N_d(µ, Σ), then X has the representation X =_d µ + AZ (where
=_d denotes equality in distribution), with Z ∼ N_k(0, I) (I denotes the identity
matrix) and A ∈ R^{d×k} such that AA^T = Σ. If Σ is nonsingular, then we can
take k = d.
Definition 9.1 An R^d-valued random vector X is said to have a multivariate
normal variance mixture distribution if X =_d µ + WAZ, where Z ∼ N_k(0, I),
W ≥ 0 is a positive random variable independent of Z, and A ∈ R^{d×k} and
µ ∈ R^d are a matrix and a vector of constants respectively.

Note that conditioning on W = w we have that X | W = w ∼ N_d(µ, w²Σ),
where Σ = AA^T.
Example 9.1 If we take W² =_d ν/S, where S ∼ χ²_ν (chi-square distribution
with ν degrees of freedom), then X has the multivariate t-distribution with ν
degrees of freedom. We use the notation X ∼ t_d(ν, µ, Σ). Note that Σ is not
the covariance matrix of X. Since E(W²) = ν/(ν − 2) (if ν > 2) we have
Cov(X) = [ν/(ν − 2)]Σ.
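A minimal sketch of sampling from this mixture representation in the one-dimensional case (µ = 0, Σ = 1), checking that the sample variance is close to ν/(ν − 2):

```python
import math, random

def sample_t_mixture(nu, n, rng):
    # X = W * Z with W^2 = nu / S, S ~ chi^2_nu (sum of nu squared normals), Z ~ N(0, 1)
    out = []
    for _ in range(n):
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        w = math.sqrt(nu / s)
        out.append(w * rng.gauss(0.0, 1.0))
    return out

rng = random.Random(0)
xs = sample_t_mixture(6, 20000, rng)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# for nu = 6 the variance should be near nu/(nu - 2) = 1.5
```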
9.3 Spherical distributions
Many results on spherical (and elliptical) distributions can be found in the book
[9].
Definition 9.2 A random vector X = (X_1, . . . , X_d)^T has a spherical distribution
if there exists a function ψ of a scalar variable such that the characteristic
function of X satisfies φ_X(t) = ψ(t^T t) = ψ(t_1² + · · · + t_d²).

We write X ∼ S_d(ψ) to denote that X has a spherical distribution with
characteristic function ψ(t^T t).
Proposition 9.1 The following statements are equivalent.

(1) X has a spherical distribution.

(2) For every vector a ∈ R^d, a^T X =_d ‖a‖ X_1, where ‖a‖² = a_1² + · · · + a_d².

(3) X has the stochastic representation X =_d RS, where S is uniformly distributed
on the unit sphere S^{d−1} = {x ∈ R^d : ‖x‖ = 1} and R ≥ 0 is
independent of S.
The implication (1) ⇒ (2) can be shown as follows. Recall that two random
variables (and vectors) have the same distribution if and only if their characteristic
functions are the same. Let X ∼ S_d(ψ). Then

  φ_{a^T X}(s) = E(exp{i s a^T X}) = E(exp{i (sa)^T X}) = ψ(s² a^T a).

Note that ‖a‖ X_1 = t^T X with t^T = (‖a‖, 0, . . . , 0). Hence,

  φ_{‖a‖X_1}(s) = E(exp{i s t^T X}) = E(exp{i (st)^T X}) = ψ(s² t^T t)
               = ψ(s² ‖a‖²) = ψ(s² a^T a).
Example 9.2 Let X ∼ N_d(0, I). Then X ∼ S_d(ψ) with ψ(x) = exp{−x/2}.
The characteristic function of the standard normal distribution is

  φ_X(t) = exp( i t^T 0 − (1/2) t^T I t ) = exp{−t^T t/2} = ψ(t^T t).

From the stochastic representation X =_d RS we conclude that ‖X‖² =_d R² and,
since the sum of squares of d independent standard normal random variables
has a chi-squared distribution with d degrees of freedom, we have R² ∼ χ²_d.
The stochastic representation in (3) is very useful for simulating from spherical
distributions. Simply apply the following algorithm. Suppose the spherically
distributed random vector X has the stochastic representation X =_d RS.

(i) Simulate s from the distribution of S.

(ii) Simulate r from the distribution of R.

(iii) Put x = rs.

This procedure can then be repeated n times to get a sample of size n from
the spherical distribution of X. We illustrate this in Figure 21. Note that a
convenient way to simulate a random element S that has uniform distribution on
the unit sphere in R^d is to simulate a d-dimensional random vector Y ∼ N_d(0, I)
and then put S = Y/‖Y‖.
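The algorithm can be sketched as follows in pure Python, using S = Y/‖Y‖ for the uniform direction (the function names are ours):

```python
import math, random

def uniform_sphere(d, rng):
    # simulate S uniformly on the unit sphere by normalizing Y ~ N_d(0, I)
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]

def sample_spherical(d, n, simulate_r, rng):
    # X = R * S with R >= 0 independent of S
    out = []
    for _ in range(n):
        s = uniform_sphere(d, rng)
        r = simulate_r(rng)
        out.append([r * si for si in s])
    return out
```

Every simulated direction has norm 1, so each sample point lies at distance r from the origin.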
9.4 Elliptical distributions
Definition 9.3 An R^d-valued random vector X has an elliptical distribution if
X =_d µ + AY, where Y ∼ S_k(ψ), A ∈ R^{d×k} and µ ∈ R^d.
When d = 1 the elliptical distributions coincide with the 1-dimensional symmetric
distributions. The characteristic function of an elliptically distributed
random vector X can be written as

  φ_X(t) = E(exp{i t^T X})
         = E(exp{i t^T (µ + AY)})
         = exp{i t^T µ} E(exp{i (A^T t)^T Y})
         = exp{i t^T µ} ψ(t^T Σ t),

where Σ = AA^T. We write X ∼ E_d(µ, Σ, ψ). Here µ is called the location
parameter, Σ is called the dispersion matrix and ψ is called the characteristic
generator of the elliptical distribution. If E(X_k) < ∞, then E(X) = µ. If
E(X_k²) < ∞, then Cov(X) = cΣ for some c > 0. Note that elliptically
distributed random vectors are radially symmetric: if X ∼ E_d(µ, Σ, ψ), then
X − µ =_d µ − X. If A ∈ R^{d×d} is nonsingular with AA^T = Σ, then we have the
following relation between elliptical and spherical distributions:

  X ∼ E_d(µ, Σ, ψ) ⟺ A^{−1}(X − µ) ∼ S_d(ψ).
Figure 21: Simulation from a spherical distribution using the stochastic representation.
First we simulate independently n times from the uniform distribution
on the unit sphere to obtain s_1, . . . , s_n (above). Then, we simulate
r_1, . . . , r_n from the distribution of R. Finally we put x_k = r_k s_k for k = 1, . . . , n
(below).
It follows immediately from the definition that elliptically distributed random
vectors have the following stochastic representation: X ∼ E_d(µ, Σ, ψ) if and only
if there exist S, R, and A such that X =_d µ + RAS with S uniformly distributed
on the unit sphere, R ≥ 0 a random variable independent of S, A ∈ R^{d×k} a
matrix with AA^T = Σ and µ ∈ R^d. The stochastic representation is useful when
simulating from elliptical distributions. Suppose X has the stochastic representation
X =_d µ + RAS.

(i) Simulate s from the distribution of S.

(ii) Multiply s by the matrix A to get As.

(iii) Simulate r from the distribution of R and form rAs.

(iv) Put x = µ + rAs.

This procedure can then be repeated n times to get a sample of size n from the
elliptical distribution of X. We illustrate this in Figure 22.
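Steps (i)-(iv) can be sketched as follows for d = 2, with A obtained from a Cholesky factorization of Σ (a minimal sketch; the function names are ours):

```python
import math, random

def chol_2x2(Sigma):
    # lower-triangular A with A A^T = Sigma, for a 2x2 dispersion matrix
    a = math.sqrt(Sigma[0][0])
    b = Sigma[1][0] / a
    c = math.sqrt(Sigma[1][1] - b * b)
    return [[a, 0.0], [b, c]]

def sample_elliptical_2d(mu, Sigma, simulate_r, n, rng):
    A = chol_2x2(Sigma)
    out = []
    for _ in range(n):
        # (i) s uniform on the unit circle
        y1, y2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.hypot(y1, y2)
        s1, s2 = y1 / norm, y2 / norm
        # (ii) multiply by A, (iii) simulate r, (iv) shift by mu
        r = simulate_r(rng)
        out.append((mu[0] + r * A[0][0] * s1,
                    mu[1] + r * (A[1][0] * s1 + A[1][1] * s2)))
    return out
```

Choosing R² ∼ χ²_2 here recovers the bivariate normal case of Example 9.3 below.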
Example 9.3 An R^d-valued normally distributed random vector with mean µ
and covariance matrix Σ has the representation X =_d µ + AZ, where AA^T = Σ
Figure 22: Simulation from an elliptical distribution using the stochastic representation.
First we simulate independently n times from the uniform distribution
on the unit sphere to obtain s_1, . . . , s_n (above, left). Then, we multiply each
sample by A to get the points As_1, . . . , As_n (above, right). Next, we simulate
r_1, . . . , r_n from the distribution of R to obtain r_k As_k for k = 1, . . . , n (below,
left). Finally we add µ to obtain x_k = µ + r_k As_k for k = 1, . . . , n (below, right).
and Z ∼ N_k(0, I). Since Z has a spherical distribution it has the representation
Z =_d RS, where R² ∼ χ²_k. Hence, X has the representation X =_d µ + RAS and
we conclude that X ∼ E_d(µ, Σ, ψ) with ψ(x) = exp{−x/2}.
Example 9.4 A random vector Z ∼ N_d(0, I) has a spherical distribution with
stochastic representation Z =_d V S. If X is a normal variance mixture, then
we see from the definition that it has the representation X =_d µ + V WAS, with
V² ∼ χ²_d. Hence, it has an elliptical distribution with R =_d V W. An example
given earlier is the multivariate t_ν-distribution, where W² =_d ν/S with S ∼ χ²_ν.
This means that for the multivariate t-distribution R²/d =_d V²W²/d has an
F(d, ν)-distribution (see e.g. Problem 10, Chapter 1 in [11]).
9.5 Properties of elliptical distributions
Next we list some useful properties of elliptical distributions. Many of them
coincide with the properties of the multivariate normal distribution. This is
perhaps the most important argument for using the elliptical distributions in
applications. Let X ∼ E_d(µ, Σ, ψ).
• Linear transformations. For B ∈ R^{k×d} and b ∈ R^k we have

    BX + b ∼ E_k(Bµ + b, BΣB^T, ψ).

• Marginal distributions. Write X^T = (X_1^T, X_2^T) with X_1 = (X_1, . . . , X_k)^T,
  X_2 = (X_{k+1}, . . . , X_d)^T and write

    µ^T = (µ_1^T, µ_2^T),   Σ = ( Σ_11  Σ_12
                                  Σ_21  Σ_22 ).

  Then X_1 ∼ E_k(µ_1, Σ_11, ψ) and X_2 ∼ E_{d−k}(µ_2, Σ_22, ψ).
• Conditional distributions. Assuming that Σ is nonsingular,
  X_2 | X_1 = x_1 ∼ E_{d−k}(µ_{2.1}, Σ_{22.1}, ψ̃), where

    µ_{2.1} = µ_2 + Σ_21 Σ_11^{−1} (x_1 − µ_1)  and  Σ_{22.1} = Σ_22 − Σ_21 Σ_11^{−1} Σ_12.

  Typically ψ̃ is a different characteristic generator than the original ψ.
• Quadratic forms. If Σ is nonsingular, then

    D² = (X − µ)^T Σ^{−1} (X − µ) =_d R².

  The variable D is called the Mahalanobis distance.

• Convolutions. If X ∼ E_d(µ, Σ, ψ) and Y ∼ E_d(µ̄, Σ, ψ̄) are independent,
  then X + Y ∼ E_d(µ + µ̄, Σ, ψ*), with ψ*(x) = ψ(x)ψ̄(x). Note that the
  dispersion matrix Σ must (in general) be the same for X and Y.
IMPORTANT: Contrary to the multivariate normal distribution it is not
true that the components of a spherically distributed random vector X ∼
E_d(0, I, ψ) are independent. In fact, the components of X are independent
only if X has a multivariate normal distribution. For instance, assume X =
(X_1, X_2)^T ∼ N_2(µ, I). Then the linear correlation coefficient ρ_L(X_1, X_2) = 0
and X_1 and X_2 are independent. However, if X = (X_1, X_2)^T ∼ E_2(µ, I, ψ)
is not normal, then ρ_L(X_1, X_2) = 0 (if ρ_L(X_1, X_2) exists) but X_1 and X_2 are
dependent.
9.6 Elliptical distributions and risk management
Suppose we have the possibility today at time 0 to invest in d risky assets by
taking long or short positions. We consider a fixed holding period of length T
and let X = (X_1, . . . , X_d) be the future asset returns at time T. Suppose that
X has a (finite) nonsingular covariance matrix Σ and mean vector E(X) = µ.
Let P be the set of all linear portfolios w^T X for w ∈ R^d and let

  W_r = {w ∈ R^d : w^T µ = r, Σ_{i=1}^d w_i = 1}  and  P_r = {Z = w^T X : w ∈ W_r}.

Hence, P_r is the set of all linear portfolios with expected return r under the
normalization constraint Σ_{i=1}^d w_i = 1.
How can we find the least risky portfolio with expected portfolio return r?
The mean-variance (Markowitz) approach to portfolio optimization solves this
problem if we accept variance (or standard deviation) as our measure of risk.
The minimum-variance portfolio is the portfolio that minimizes the variance
var(Z) = w^T Σ w over all Z ∈ P_r. Hence, the portfolio weights of the
minimum-variance portfolio solve the following optimization problem:

  min_{w ∈ W_r} w^T Σ w.

This optimization problem has a unique solution, see e.g. p. 185 in [3]. As
already mentioned, the variance is in general not a suitable risk measure. We
would typically prefer a risk measure based on the appropriate tail of our
portfolio return. It was shown in [7] that the "minimum-risk" portfolio, where risk
could be e.g. VaR or ES, coincides with the minimum-variance portfolio for
elliptically distributed return vectors X. Hence, the mean-variance approach
(Markowitz) to portfolio optimization makes sense in the elliptical world. In
fact we can take any risk measure ϱ with the following properties and replace
the variance in the Markowitz approach by ϱ. Suppose ϱ is a risk measure that
is translation invariant and positively homogeneous, i.e.

  ϱ(X + a) = ϱ(X) + a for a ∈ R and ϱ(λX) = λϱ(X) for λ ≥ 0.

Moreover, suppose that ϱ(X) depends on X only through its distribution. The
risk measure ϱ could be for instance VaR or ES.
Since X ∼ E_d(µ, Σ, ψ) we have X =_d µ + AY, where AA^T = Σ and Y ∼
S_d(ψ). Hence, it follows from Proposition 9.1 that

  w^T X = w^T µ + w^T AY = w^T µ + (A^T w)^T Y =_d w^T µ + ‖A^T w‖ Y_1.  (9.1)

Hence, for any Z ∈ P_r it holds that

  ϱ(Z) = r + ‖A^T w‖ ϱ(Y_1)  and  var(Z) = ‖A^T w‖² var(Y_1) = w^T Σ w var(Y_1).

Hence, minimization with respect to ϱ and minimization with respect to the
variance give the same optimal portfolio weights w:

  argmin_{Z ∈ P_r} ϱ(Z) = argmin_{w ∈ W_r} ‖A^T w‖ = argmin_{w ∈ W_r} ‖A^T w‖²
                        = argmin_{Z ∈ P_r} var(Z).
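For a nonsingular Σ the minimum-variance weights have a closed form via standard Markowitz Lagrange-multiplier algebra: w = Σ^{−1}(λµ + γ1), where λ and γ solve a 2×2 linear system built from a = µ^TΣ^{−1}µ, b = µ^TΣ^{−1}1 and c = 1^TΣ^{−1}1. A minimal sketch for a diagonal Σ, so that Σ^{−1} is trivial and the sketch stays dependency-free (the function name is ours):

```python
def min_variance_weights(mu, sigma_diag, r):
    # minimize w' Sigma w subject to w'mu = r and sum(w) = 1, Sigma = diag(sigma_diag);
    # stationarity gives w = Sigma^{-1} (lam * mu + gam * 1)
    inv = [1.0 / s for s in sigma_diag]
    a = sum(i * m * m for i, m in zip(inv, mu))   # mu' Sigma^{-1} mu
    b = sum(i * m for i, m in zip(inv, mu))       # mu' Sigma^{-1} 1
    c = sum(inv)                                  # 1' Sigma^{-1} 1
    det = a * c - b * b
    lam = (r * c - b) / det
    gam = (a - r * b) / det
    return [i * (lam * m + gam) for i, m in zip(inv, mu)]
```

Both constraints hold by construction, and for elliptical returns these same weights minimize VaR and ES, as argued above.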
Example 9.5 Note also that (9.1) implies that for w_1, w_2 ∈ R^d we have

  VaR_p(w_1^T X + w_2^T X) = w_1^T µ + w_2^T µ + ‖A^T w_1 + A^T w_2‖ VaR_p(Y_1)
                           ≤ w_1^T µ + w_2^T µ + ‖A^T w_1‖ VaR_p(Y_1) + ‖A^T w_2‖ VaR_p(Y_1)
                           = VaR_p(w_1^T X) + VaR_p(w_2^T X).

Hence, Value-at-Risk is subadditive for elliptical distributions.
Example 9.6 Let X = (X_1, X_2)^T be a vector of log returns, for the time
period today-until-tomorrow, of two stocks with stock prices S_1 = S_2 = 1 today.
Suppose that X ∼ E_2(µ, Σ, ψ) (elliptically distributed) with linear correlation
coefficient ρ and that

  µ = 0,   Σ = ( σ²    σ²ρ
                 σ²ρ   σ²  ).

Your total capital is 1 which you want to invest fully in the two stocks, giving
you a linearized portfolio loss L^∆ = L^∆(w_1, w_2), where w_1 and w_2 are portfolio
weights. Two investment strategies are available (long positions):

(A) invest your money in equal shares in the two stocks: w_A1 = w_A2 = 1/2;

(B) invest all your money in the first stock: w_B1 = 1, w_B2 = 0.
How can we compute the ratio VaR_0.99(L^∆_A)/VaR_0.99(L^∆_B), where L^∆_A and
L^∆_B are the linearized losses for investment strategies A and B, respectively?

We have

  L^∆ = −w^T X =_d w^T X

since X ∼ E_2(0, Σ, ψ). Moreover, by (9.1) it holds that

  w^T X =_d √(w^T Σ w) Z,

where Z ∼ E_1(0, 1, ψ). Hence, VaR_p(w^T X) = √(w^T Σ w) VaR_p(Z). This yields

  VaR_0.99(w_A^T X) / VaR_0.99(w_B^T X) = √(w_A^T Σ w_A) / √(w_B^T Σ w_B).

We have w_A^T Σ w_A = σ²(1 + ρ)/2 and w_B^T Σ w_B = σ². Hence

  VaR_0.99(L^∆_A) / VaR_0.99(L^∆_B) = VaR_0.99(w_A^T X) / VaR_0.99(w_B^T X)
                                    = √((1 + ρ)/2).
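The closed-form ratio can be checked by Monte Carlo in the Gaussian special case of the elliptical family (a minimal sketch; sample size and seed are arbitrary choices of ours):

```python
import math, random

def empirical_var(losses, p):
    # empirical p-quantile of a loss sample
    xs = sorted(losses)
    return xs[math.ceil(p * len(xs)) - 1]

rho, n = 0.5, 100000
rng = random.Random(1)
loss_a, loss_b = [], []
for _ in range(n):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    loss_a.append(-(0.5 * z1 + 0.5 * z2))   # strategy A: w = (1/2, 1/2)
    loss_b.append(-z1)                      # strategy B: w = (1, 0)
ratio = empirical_var(loss_a, 0.99) / empirical_var(loss_b, 0.99)
# theory: sqrt((1 + rho)/2) = sqrt(0.75), about 0.866 for rho = 0.5
```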
10 Copulas
We will now introduce the notion of copula. The main reasons for introducing
copulas are: 1) copulas are very useful for building multivariate models
with nonstandard (non-Gaussian) dependence structures, 2) copulas provide an
understanding of dependence beyond linear correlation.

The reader seeking more information about copulas is recommended to consult
the book [13].
10.1 Basic properties
Definition 10.1 A d-dimensional copula is a distribution function on [0, 1]^d
with standard uniform marginal distributions.

This means that a copula is the distribution function P(U_1 ≤ u_1, . . . , U_d ≤ u_d)
of a random vector (U_1, . . . , U_d) with the property that for all k it holds that
P(U_k ≤ u) = u for u ∈ [0, 1].
A bivariate distribution function F with marginal distribution functions F_1 and
F_2 is a function F that satisfies

(A1) F(x_1, x_2) is nondecreasing in each argument x_k.

(A2) F(x_1, ∞) = F_1(x_1) and F(∞, x_2) = F_2(x_2).

(A3) For all (a_1, a_2), (b_1, b_2) ∈ R² with a_k ≤ b_k we have:

  F(b_1, b_2) − F(a_1, b_2) − F(b_1, a_2) + F(a_1, a_2) ≥ 0.
Notice that (A3) says that probabilities are always nonnegative. Hence a copula
is a function C that satisfies

(B1) C(u_1, u_2) is nondecreasing in each argument u_k.

(B2) C(u_1, 1) = u_1 and C(1, u_2) = u_2.

(B3) For all (a_1, a_2), (b_1, b_2) ∈ [0, 1]² with a_k ≤ b_k we have:

  C(b_1, b_2) − C(a_1, b_2) − C(b_1, a_2) + C(a_1, a_2) ≥ 0.
Let h : R → R be nondecreasing. Then the following properties hold for the
generalized inverse h^← of h.

(C1) h is continuous if and only if h^← is strictly increasing.

(C2) h is strictly increasing if and only if h^← is continuous.

(C3) If h is continuous, then h(h^←(y)) = y.

(C4) If h is strictly increasing, then h^←(h(x)) = x.
Recall the following important facts for a distribution function G on R.

(D1) Quantile transform. If U ∼ U(0, 1) (standard uniform distribution), then
P(G^←(U) ≤ x) = G(x).

(D2) Probability transform. If Y has distribution function G, where G is
continuous, then G(Y) ∼ U(0, 1).
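Fact (D1) is the standard way to simulate from a given distribution function. A small illustration with G the Exp(1) distribution, G(x) = 1 − e^{−x}, so G^←(u) = −ln(1 − u):

```python
import math, random

rng = random.Random(2)
# quantile transform: G^{<-}(U) has distribution function G when U ~ U(0, 1)
sample = [-math.log(1.0 - rng.random()) for _ in range(100000)]
empirical = sum(x <= 1.0 for x in sample) / len(sample)
exact = 1.0 - math.exp(-1.0)   # G(1), about 0.632
```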
Let X be a bivariate random vector with distribution function F that has
continuous marginal distribution functions F_1, F_2. Consider the function C given
by

  C(u_1, u_2) = F(F_1^←(u_1), F_2^←(u_2)).

It is clear that C is nondecreasing in each argument so (B1) holds. Moreover,

  C(u_1, 1) = F(F_1^←(u_1), ∞) = P(X_1 ≤ F_1^←(u_1))
            = P(F_1(X_1) ≤ F_1(F_1^←(u_1))) = P(F_1(X_1) ≤ u_1)
            = u_1

and similarly C(1, u_2) = u_2. Hence, (B2) holds. Since F is a bivariate
distribution function (B3) holds. Hence, C is a copula.
The following result, known as Sklar's Theorem, is central to the theory of
copulas. It also explains the name "copula": a function that "couples" the joint
distribution function to its (univariate) marginal distribution functions.

Theorem 10.1 (Sklar's Theorem) Let F be a joint distribution function
with marginal distribution functions F_1, . . . , F_d. Then there exists a copula C
such that for all x_1, . . . , x_d ∈ R̄ = [−∞, ∞],

  F(x_1, . . . , x_d) = C(F_1(x_1), . . . , F_d(x_d)).  (10.1)

If F_1, . . . , F_d are continuous, then C is unique. Conversely, if C is a copula
and F_1, . . . , F_d are distribution functions, then F defined by (10.1) is a joint
distribution function with marginal distribution functions F_1, . . . , F_d.
Definition 10.2 Let F be a joint distribution function with continuous marginal
distribution functions F_1, . . . , F_d. Then the copula C in (10.1) is called the
copula of F. If X is a random vector with distribution function F, then we also
call C the copula of X.
By (C3) above, if F_k is continuous, then F_k(F_k^←(u)) = u. Hence, if F is
a joint distribution function with continuous marginal distribution functions
F_1, . . . , F_d, then the unique copula of F is given by

  C(u_1, . . . , u_d) = F(F_1^←(u_1), . . . , F_d^←(u_d)).

Much of the usefulness of copulas is due to the fact that the copula of a
random vector with continuous marginal distribution functions is invariant under
strictly increasing transformations of the components of the random vector.
Proposition 10.1 Suppose (X_1, . . . , X_d) has continuous marginal distribution
functions and copula C and let T_1, . . . , T_d be strictly increasing. Then also the
random vector (T_1(X_1), . . . , T_d(X_d)) has copula C.

Proof. Let F_k denote the distribution function of X_k and let F̃_k denote the
distribution function of T_k(X_k). Then

  F̃_k(x) = P(T_k(X_k) ≤ x) = P(X_k ≤ T_k^←(x)) = F_k(T_k^←(x)),

i.e. F̃_k = F_k ∘ T_k^← is continuous. Since F̃_1, . . . , F̃_d are continuous the random
vector (T_1(X_1), . . . , T_d(X_d)) has a unique copula C̃. Moreover, for any x ∈ R^d,

  C̃(F̃_1(x_1), . . . , F̃_d(x_d)) = P(T_1(X_1) ≤ x_1, . . . , T_d(X_d) ≤ x_d)
                               = P(X_1 ≤ T_1^←(x_1), . . . , X_d ≤ T_d^←(x_d))
                               = C(F_1 ∘ T_1^←(x_1), . . . , F_d ∘ T_d^←(x_d))
                               = C(F̃_1(x_1), . . . , F̃_d(x_d)).

Since, for k = 1, . . . , d, F̃_k is continuous, F̃_k(R) = [0, 1]. Hence C̃ = C on
[0, 1]^d.
Example 10.1 Let C be a d-dimensional copula and let U_1, . . . , U_d be random
variables that are uniformly distributed on [0, 1] with joint distribution function
C. Let F_1, . . . , F_d be univariate continuous distribution functions. Then, for
each k, F_k^←(U_k) has distribution function F_k (the quantile transform). Moreover,
for each k, F_k^← is strictly increasing (C1). Hence, the random vector
(F_1^←(U_1), . . . , F_d^←(U_d)) has marginal distribution functions F_1, . . . , F_d and, by
Proposition 10.1, copula C.
Example 10.2 Let (X_1, X_2) be a random vector with continuous marginal
distribution functions F_1, F_2 and copula C. Let g_1 and g_2 be strictly decreasing
functions. Determine the copula C̃ of (g_1(X_1), g_2(X_2)).

The distribution function and quantile function of g_k(X_k) are given by

  F̃_k(x) = P(g_k(X_k) ≤ x) = P(X_k ≥ g_k^{−1}(x)) = 1 − F_k(g_k^{−1}(x)),
  F̃_k^←(p) = g_k(F_k^←(1 − p)).

Hence,

  C̃(u_1, u_2) = P(g_1(X_1) ≤ F̃_1^←(u_1), g_2(X_2) ≤ F̃_2^←(u_2))
             = P(X_1 ≥ F_1^←(1 − u_1), X_2 ≥ F_2^←(1 − u_2))
             = P(F_1(X_1) ≥ 1 − u_1, F_2(X_2) ≥ 1 − u_2)
             = 1 − (1 − u_1) − (1 − u_2) + C(1 − u_1, 1 − u_2)
             = C(1 − u_1, 1 − u_2) + u_1 + u_2 − 1.
Notice that

  C(u_1, u_2) = P(F_1(X_1) ≤ u_1, F_2(X_2) ≤ u_2),
  C̃(u_1, u_2) = P(1 − F_1(X_1) ≤ u_1, 1 − F_2(X_2) ≤ u_2).

See Figure 23 for an illustration with g_1(x) = g_2(x) = −x and F_1(x) = F_2(x) =
Φ(x) (the standard normal distribution function).
Figure 23: (U_1, U_2) has the copula C as its distribution function. (X_1, X_2) has
standard normal marginal distributions and copula C. Samples of size 3000
from (X_1, X_2), (−X_1, −X_2), (U_1, U_2) and (1 − U_1, 1 − U_2).
Example 10.3 Let X ∼ N_d(0, R), where R is a correlation matrix. Denote by
Φ_R and Φ the distribution functions of X and X_1 respectively (the d-dimensional
standard normal distribution function and the 1-dimensional standard normal
distribution function). Then X has the so-called Gaussian copula C^Ga_R given by

  C^Ga_R(u) = P(Φ(X_1) ≤ u_1, . . . , Φ(X_d) ≤ u_d)
            = Φ_R(Φ^{−1}(u_1), . . . , Φ^{−1}(u_d)).  (10.2)
Let Y = µ + ΣX, where, for some σ_1, . . . , σ_d > 0,

  Σ = ( σ_1  0    . . .  0
        0    σ_2  . . .  0
        . . .
        0    . . .  0    σ_d ).

Then Y ∼ N_d(µ, ΣRΣ). Note that Y has linear correlation matrix R. Note
also that T_k(x) = µ_k + σ_k x is strictly increasing and that

  Y = (T_1(X_1), . . . , T_d(X_d)).

Hence, if C_Y denotes the copula of Y, then by Proposition 10.1 we have C_Y =
C^Ga_R. We conclude that the copula of a nondegenerate d-dimensional normal
distribution depends only on the linear correlation matrix. For d = 2 we see
from (10.2) that C^Ga_R can be written as

  C^Ga_R(u_1, u_2) = ∫_{−∞}^{Φ^{−1}(u_1)} ∫_{−∞}^{Φ^{−1}(u_2)} 1/(2π(1 − ρ²)^{1/2})
                     exp( −(x_1² − 2ρ x_1 x_2 + x_2²) / (2(1 − ρ²)) ) dx_1 dx_2,

if ρ = R_12 ∈ (−1, 1).
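Combining (10.2) with the quantile transform of Example 10.1 gives a simple sampler for the bivariate Gaussian copula (a minimal sketch; Φ is evaluated with the error function, and the function names are ours):

```python
import math, random

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_gaussian_copula(rho, n, rng):
    # (U1, U2) = (Phi(X1), Phi(X2)) with (X1, X2) standard bivariate normal with
    # correlation rho; the pair (U1, U2) has distribution function C^Ga_R
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((std_normal_cdf(z1), std_normal_cdf(z2)))
    return out
```

Applying F_k^← componentwise (Example 10.1) then produces a vector with arbitrary continuous margins and this copula.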
The following result provides us with universal bounds for copulas. See also
Example 10.5 below.

Proposition 10.2 (Fréchet bounds) For every copula C we have the bounds

  max( Σ_{k=1}^d u_k − d + 1, 0 ) ≤ C(u_1, . . . , u_d) ≤ min{u_1, . . . , u_d}.

For d ≥ 2 we denote by W_d the Fréchet lower bound and by M_d the Fréchet
upper bound. For d = 2 we drop the subscript of W_2 and M_2, i.e. W = W_2 and
M = M_2.
Example 10.4 Let W_d be the Fréchet lower bound and consider the set function
Q given by

  Q([a_1, b_1] × · · · × [a_d, b_d]) = Σ_{k_1=1}^2 · · · Σ_{k_d=1}^2 (−1)^{k_1+···+k_d} W_d(u_{1k_1}, . . . , u_{dk_d}),

for all (a_1, . . . , a_d), (b_1, . . . , b_d) ∈ [0, 1]^d with a_k ≤ b_k, where u_{j1} = a_j and
u_{j2} = b_j for j ∈ {1, . . . , d}. W_d is a copula (distribution function) if and only
if Q is its probability distribution. However,

  Q([1/2, 1]^d) = max(1 + · · · + 1 − d + 1, 0)
                − d max(1/2 + 1 + · · · + 1 − d + 1, 0)
                + (d choose 2) max(1/2 + 1/2 + 1 + · · · + 1 − d + 1, 0)
                − · · ·
                + (−1)^d max(1/2 + · · · + 1/2 − d + 1, 0)
              = 1 − d/2.
Hence, Q is not a probability distribution for d ≥ 3, so W_d is not a copula for
d ≥ 3.

The following result shows that the Fréchet lower bound is the best possible.

Proposition 10.3 For any d ≥ 3 and any u ∈ [0, 1]^d, there exists a copula C
such that C(u) = W_d(u).

Remark 10.1 For any d ≥ 2, the Fréchet upper bound M_d is a copula.
For d = 2 the following example shows which random vectors have the
Fréchet bounds as their copulas.

Example 10.5 Let X be a random variable with continuous distribution
function F_X. Let Y = T(X) for some strictly increasing function T and denote by
F_Y the distribution function of Y. Note that

  F_Y(x) = P(T(X) ≤ x) = F_X(T^{−1}(x))

and that F_Y is continuous. Hence the copula of (X, T(X)) is

  P(F_X(X) ≤ u, F_Y(Y) ≤ v) = P(F_X(X) ≤ u, F_X(T^{−1}(T(X))) ≤ v)
                            = P(F_X(X) ≤ min{u, v})
                            = min{u, v}.

By Proposition 10.1 the copula of (X, T(X)) is the copula of (U, U), where
U ∼ U(0, 1). Let Z = S(X) for some strictly decreasing function S and denote
by F_Z the distribution function of Z. Note that

  F_Z(x) = P(S(X) ≤ x) = P(X > S^{−1}(x)) = 1 − F_X(S^{−1}(x))

and that F_Z is continuous. Hence the copula of (X, S(X)) is

  P(F_X(X) ≤ u, F_Z(Z) ≤ v) = P(F_X(X) ≤ u, 1 − F_X(S^{−1}(S(X))) ≤ v)
                            = P(F_X(X) ≤ u, F_X(X) > 1 − v)
                            = P(F_X(X) ≤ u) − P(F_X(X) ≤ min{u, 1 − v})
                            = u − min{u, 1 − v}
                            = max{u + v − 1, 0}.

By (a modified version of) Proposition 10.1 the copula of (X, S(X)) is the copula
of (U, 1 − U), where U ∼ U(0, 1).
10.2 Dependence measures

Comonotonicity and countermonotonicity revisited

Proposition 10.4 Let (X_1, X_2) have one of the copulas W or M (as a possible
copula). Then there exist two monotone functions α, β : R → R and a random
variable Z so that

  (X_1, X_2) =_d (α(Z), β(Z)),
Hence, if (X_1, X_2) has the copula M (as a possible copula), then X_1 and X_2
are comonotonic; if it has the copula W (as a possible copula), then they are
countermonotonic. Note that if any of F_1 and F_2 (the distribution functions of
X_1 and X_2, respectively) have discontinuities, so that the copula is not unique,
then W and M are possible copulas. Recall also that if F_1 and F_2 are continuous,
then

  C = W ⟺ X_2 = T(X_1) a.s., T = F_2^← ∘ (1 − F_1) decreasing,
  C = M ⟺ X_2 = T(X_1) a.s., T = F_2^← ∘ F_1 increasing.
Kendall’s tau and Spearman’s rho revisited
To begin with we recall the deﬁnitions of the concordance measures Kendall’s
tau and Spearman’s rho.
Deﬁnition 10.3 Kendall’s tau for the random vector (X
1
, X
2
) is deﬁned as
τ
(X
1
, X
2
) = P((X
1
−X
t
1
)(X
2
−X
t
2
) > 0) −P((X
1
−X
t
1
)(X
2
−X
t
2
) < 0),
where (X
t
1
, X
t
2
) is an independent copy of (X
1
, X
2
). Spearman’s rho for the
random vector (X
1
, X
2
) is deﬁned as
S
(X
1
, X
2
) = 3 (P((X
1
−X
t
1
)(X
2
−X
tt
2
) > 0) −P((X
1
−X
t
1
)(X
2
−X
tt
2
) < 0)) ,
where (X
t
1
, X
t
2
) and (X
tt
1
, X
tt
2
) are independent copies of (X
1
, X
2
).
An important property of Kendall's tau and Spearman's rho is that they are
invariant under strictly increasing transformations of the underlying random
variables. If (X_1, X_2) is a random vector with continuous marginal distribution
functions and T_1 and T_2 are strictly increasing transformations on the range
of X_1 and X_2 respectively, then τ(T_1(X_1), T_2(X_2)) = τ(X_1, X_2). The same
property holds for Spearman's rho. Note that this implies that Kendall's tau
and Spearman's rho do not depend on the (marginal) distributions of X_1 and
X_2. This is made clear in the following two results.
Proposition 10.5 Let (X_1, X_2) be a random vector with continuous marginal
distribution functions and with copula C. Then

  τ(X_1, X_2) = 4 ∫_{[0,1]²} C(u_1, u_2) dC(u_1, u_2) − 1 = 4 E(C(U_1, U_2)) − 1,

  ρ_S(X_1, X_2) = 12 ∫_{[0,1]²} u_1 u_2 dC(u_1, u_2) − 3 = 12 ∫_{[0,1]²} C(u_1, u_2) du_1 du_2 − 3
                = 12 E(U_1 U_2) − 3,

where (U_1, U_2) has distribution function C.
Remark 10.2 Note that by Proposition 10.5, if F_1 and F_2 denote the
distribution functions of X_1 and X_2 respectively,

  ρ_S(X_1, X_2) = 12 ∫_{[0,1]²} u_1 u_2 dC(u_1, u_2) − 3
               = 12 E(F_1(X_1) F_2(X_2)) − 3
               = ( E(F_1(X_1) F_2(X_2)) − 1/4 ) / (1/12)
               = ( E(F_1(X_1) F_2(X_2)) − E(F_1(X_1)) E(F_2(X_2)) )
                 / ( √var(F_1(X_1)) √var(F_2(X_2)) )
               = ρ_L(F_1(X_1), F_2(X_2)).

Hence Spearman's rho is simply the linear correlation coefficient of the
probability-transformed random variables.
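This identity is why Spearman's rho is estimated as the linear correlation of ranks: ranks divided by n are exactly the empirical probability transforms. A minimal sketch (the function names are ours; ties are ignored for simplicity):

```python
import math

def pearson(xs, ys):
    # linear (Pearson) correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    # linear correlation of the ranks, i.e. of the empirical probability transforms
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for k, i in enumerate(order):
            r[i] = k + 1
        return r
    return pearson(ranks(xs), ranks(ys))
```

Because only ranks enter, the estimate is invariant under strictly increasing transformations of either coordinate, in line with the properties above.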
Tail dependence revisited
We now return to the dependence concept called tail dependence. The concept
of tail dependence is important for the modeling of joint extremes, particularly
in portfolio Risk Management. We recall the deﬁnition of the coeﬃcient of tail
dependence.
Let (X_1, X_2) be a random vector with marginal distribution functions F_1
and F_2. The coefficient of upper tail dependence of (X_1, X_2) is defined as

  λ_U(X_1, X_2) = lim_{u↑1} P(X_2 > F_2^←(u) | X_1 > F_1^←(u)),

provided that the limit λ_U ∈ [0, 1] exists. The coefficient of lower tail dependence
is defined as

  λ_L(X_1, X_2) = lim_{u↓0} P(X_2 ≤ F_2^←(u) | X_1 ≤ F_1^←(u)),

provided that the limit λ_L ∈ [0, 1] exists. If λ_U > 0 (λ_L > 0), then we say that
(X_1, X_2) has upper (lower) tail dependence.
Proposition 10.6 Let (X_1, X_2) be a random vector with continuous marginal
distribution functions and copula C. Then

  λ_U(X_1, X_2) = lim_{u↑1} (1 − 2u + C(u, u)) / (1 − u),

provided that the limit exists, and

  λ_L(X_1, X_2) = lim_{u↓0} C(u, u) / u,

provided that the limit exists.
That the limit need not exist is shown by the following example.
Example 10.6 Let Q be a probability measure on [0, 1]² such that for every
integer n ≥ 1, Q assigns mass 2^{−n}, uniformly distributed, to the line segment
between (1 − 2^{−n}, 1 − 2^{−n+1}) and (1 − 2^{−n+1}, 1 − 2^{−n}). Define the distribution
function C by C(u_1, u_2) = Q([0, u_1] × [0, u_2]) for u_1, u_2 ∈ [0, 1]. Note that
C(u_1, 0) = 0 = C(0, u_2), C(u_1, 1) = u_1 and C(1, u_2) = u_2, i.e. C is a copula.
Note also that for every n ≥ 1 (with C̄(u, u) = 1 − 2u + C(u, u))

  C̄(1 − 2^{−n+1}, 1 − 2^{−n+1}) / 2^{−n+1} = 1

and

  C̄(1 − 3/2^{n+1}, 1 − 3/2^{n+1}) / (3/2^{n+1}) = 2/3.

In particular lim_{u↑1} C̄(u, u)/(1 − u) does not exist.
Example 10.7 Consider the so-called Gumbel family of copulas given by

    C_θ(u_1, u_2) = exp(−[(−ln u_1)^θ + (−ln u_2)^θ]^{1/θ}),

for θ ≥ 1. Then

    (1 − 2u + C_θ(u, u))/(1 − u) = (1 − 2u + exp(2^{1/θ} ln u))/(1 − u) = (1 − 2u + u^{2^{1/θ}})/(1 − u),

and hence by l'Hospital's rule

    lim_{u↑1} (1 − 2u + C_θ(u, u))/(1 − u) = 2 − lim_{u↑1} 2^{1/θ} u^{2^{1/θ}−1} = 2 − 2^{1/θ}.

Thus for θ > 1, C_θ has upper tail dependence: λ_U = 2 − 2^{1/θ}. Moreover, again by using l'Hospital's rule,

    lim_{u↓0} C_θ(u, u)/u = lim_{u↓0} 2^{1/θ} u^{2^{1/θ}−1} = 0,

i.e. λ_L = 0. See Figure 24 for a graphical illustration.
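Because the Gumbel diagonal has the simple form C_θ(u, u) = u^{2^{1/θ}}, both limits can be checked by evaluating the ratios of Proposition 10.6 close to the endpoints. A small numerical sketch (the choice θ = 2 and the evaluation points are arbitrary):

```python
import numpy as np

def gumbel_diag(u, theta):
    # Diagonal of the Gumbel copula: C_theta(u, u) = u**(2**(1/theta))
    return u ** (2.0 ** (1.0 / theta))

theta = 2.0
lam_u_closed = 2.0 - 2.0 ** (1.0 / theta)   # closed form: 2 - 2^(1/theta)

# Ratio (1 - 2u + C(u,u)) / (1 - u) from Proposition 10.6, evaluated near u = 1
u = 1.0 - 1e-6
lam_u_numeric = (1.0 - 2.0 * u + gumbel_diag(u, theta)) / (1.0 - u)

# Lower tail: C(u, u)/u near u = 0 should be close to 0
lam_l_numeric = gumbel_diag(1e-8, theta) / 1e-8

print(lam_u_closed, lam_u_numeric, lam_l_numeric)
```

For θ = 2 the closed form gives λ_U = 2 − √2 ≈ 0.586, and the numerical ratio agrees to several decimals.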
Example 10.8 Consider the so-called Clayton family of copulas given by

    C_θ(u_1, u_2) = (u_1^{−θ} + u_2^{−θ} − 1)^{−1/θ},

for θ > 0. Then C_θ(u, u) = (2u^{−θ} − 1)^{−1/θ} and hence, by l'Hospital's rule,

    lim_{u↓0} C_θ(u, u)/u = lim_{u↓0} (2u^{−θ} − 1)^{−1/θ}/u
                          = lim_{u↓0} (−1/θ)(2u^{−θ} − 1)^{−1/θ−1}(−2θ u^{−θ−1})
                          = lim_{u↓0} 2(2 − u^θ)^{−1/θ−1}
                          = 2^{−1/θ},

i.e. λ_L = 2^{−1/θ}. Similarly one shows that λ_U = 0. See Figure 24 for a graphical illustration.
[Scatter plots omitted: left panel "Gumbel" (X1 against Y1), right panel "Clayton" (X2 against Y2), both axes running from 0 to 14.]
Figure 24: Samples from two distributions with Gamma(3, 1) marginal distribution functions, linear correlation 0.5 but different dependence structures. (X1, Y1) has a Gumbel copula and (X2, Y2) has a Clayton copula.
10.3 Elliptical copulas
Definition 10.4 Let X ∼ E_d(µ, Σ, ψ) with distribution function F and with continuous marginal distribution functions F_1, …, F_d. Then the copula C given by

    C(u) = F(F_1^←(u_1), …, F_d^←(u_d))

is said to be an elliptical copula.
Note that an elliptical copula is not the distribution function of an elliptical
distribution, but rather the copula of an elliptical distribution.
The copula of the d-dimensional normal distribution with linear correlation matrix R is

    C_R^Ga(u) = Φ_R^d(Φ^{−1}(u_1), …, Φ^{−1}(u_d)),

where Φ_R^d denotes the joint distribution function of the d-dimensional standard normal distribution with linear correlation matrix R, and Φ^{−1} denotes the inverse of the distribution function of the univariate standard normal distribution. Copulas of the above form are called Gaussian copulas. In the bivariate case the copula expression can be written as

    C_R^Ga(u_1, u_2) = ∫_{−∞}^{Φ^{−1}(u_1)} ∫_{−∞}^{Φ^{−1}(u_2)} 1/(2π(1 − ρ^2)^{1/2}) exp(−(x_1^2 − 2ρ x_1 x_2 + x_2^2)/(2(1 − ρ^2))) dx_1 dx_2,

if ρ = R_{12} ∈ (−1, 1).
Proposition 10.7 (i) If (X_1, X_2) is a normally distributed random vector, then

    λ_U(X_1, X_2) = λ_L(X_1, X_2) = 0.

(ii) If (X_1, X_2) has continuous marginal distribution functions and a Gaussian copula, then λ_U(X_1, X_2) = λ_L(X_1, X_2) = 0.
If X has the stochastic representation

    X =_d µ + (√ν/√S) AZ,    (10.3)

where µ ∈ R^d (column vector), S ∼ χ_ν^2 and Z ∼ N_d(0, I) (column vector) are independent, then X has a d-dimensional t_ν distribution with mean µ (for ν > 1) and covariance matrix (ν/(ν − 2)) AA^T (for ν > 2). If ν ≤ 2, then Cov(X) is not defined. In this case we just interpret Σ = AA^T as being the shape parameter of the distribution of X. The copula of X given by (10.3) can be written as

    C_{ν,R}^t(u) = t_{ν,R}^d(t_ν^{−1}(u_1), …, t_ν^{−1}(u_d)),

where R_{ij} = Σ_{ij}/√(Σ_{ii} Σ_{jj}) for i, j ∈ {1, …, d} and where t_{ν,R}^d denotes the distribution function of √ν AZ/√S, where AA^T = R. Here t_ν denotes the (equal) margins of t_{ν,R}^d, i.e. the distribution function of √ν Z_1/√S. In the bivariate case the copula expression can be written as

    C_{ν,R}^t(u_1, u_2) = ∫_{−∞}^{t_ν^{−1}(u_1)} ∫_{−∞}^{t_ν^{−1}(u_2)} 1/(2π(1 − ρ^2)^{1/2}) (1 + (x_1^2 − 2ρ x_1 x_2 + x_2^2)/(ν(1 − ρ^2)))^{−(ν+2)/2} dx_1 dx_2,

if ρ = R_{12} ∈ (−1, 1). Note that R_{12} is simply the usual linear correlation coefficient of the corresponding bivariate t_ν distribution if ν > 2.
Proposition 10.8 (i) If (X_1, X_2) has a t-distribution with ν degrees of freedom and linear correlation matrix R, then

    λ_U(X_1, X_2) = λ_L(X_1, X_2) = 2 t̄_{ν+1}(√(ν + 1) √(1 − R_{12})/√(1 + R_{12})),    (10.4)

where t̄_{ν+1} = 1 − t_{ν+1} denotes the survival function of a univariate t-distribution with ν + 1 degrees of freedom.
(ii) If (X_1, X_2) has continuous marginal distribution functions and a t-copula with parameters ν and R, then λ_U(X_1, X_2) and λ_L(X_1, X_2) are as in (10.4).
From this it is also seen that the coefficient of upper tail dependence is increasing in R_{12} and decreasing in ν, as one would expect. Furthermore, the coefficient of upper (lower) tail dependence tends to zero as the number of degrees of freedom tends to infinity for R_{12} < 1.
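Formula (10.4) is straightforward to evaluate numerically. In the sketch below (NumPy assumed), the t survival function t̄_{ν+1} is obtained by simple trapezoidal integration of the t density rather than from a statistics library, and the grid parameters are ad hoc choices:

```python
import numpy as np
from math import gamma, sqrt, pi

def t_sf(x, nu, n_grid=200_000, upper=60.0):
    # Survival function of the univariate t_nu distribution, 1 - t_nu(x),
    # by trapezoidal integration of the density on [x, upper]
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    xs = np.linspace(x, upper, n_grid)
    pdf = c * (1.0 + xs ** 2 / nu) ** (-(nu + 1) / 2)
    dx = xs[1] - xs[0]
    return dx * (pdf.sum() - 0.5 * (pdf[0] + pdf[-1]))

def t_copula_tail_dep(nu, r12):
    # Formula (10.4): lambda = 2 * tbar_{nu+1}(sqrt(nu+1) sqrt(1-r)/sqrt(1+r))
    arg = sqrt(nu + 1) * sqrt((1.0 - r12) / (1.0 + r12))
    return 2.0 * t_sf(arg, nu + 1)

print(t_copula_tail_dep(4, 0.5))    # moderate tail dependence
print(t_copula_tail_dep(50, 0.5))   # nearly zero for large nu
```

Evaluating over a grid of ν and R_{12} confirms the monotonicity remarks above: λ increases in R_{12} and decreases towards 0 in ν.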
The following result will play an important role in parameter estimation in models with elliptical copulas.
Proposition 10.9 (i) If (X_1, X_2) ∼ E_2(µ, Σ, ψ) with continuous marginal distribution functions, then

    τ(X_1, X_2) = (2/π) arcsin R_{12},    (10.5)

where R_{12} = Σ_{12}/√(Σ_{11} Σ_{22}).
(ii) If (X_1, X_2) has continuous marginal distribution functions and the copula of E_2(µ, Σ, ψ), then relation (10.5) holds.
10.4 Simulation from Gaussian and t-copulas
We now address the question of random variate generation from the Gaussian copula C_R^Ga. For our purpose, it is sufficient to consider only strictly positive definite matrices R. Write R = AA^T for some d × d matrix A, and if Z_1, …, Z_d ∼ N(0, 1) are independent, then

    µ + AZ ∼ N_d(µ, R).

One natural choice of A is the Cholesky decomposition of R. The Cholesky decomposition of R is the unique lower-triangular matrix L with LL^T = R. Furthermore, Cholesky decomposition routines are implemented in most mathematical software. This provides an easy algorithm for random variate generation from the d-dimensional Gaussian copula C_R^Ga.

Algorithm 10.1
• Find the Cholesky decomposition A of R: R = AA^T.
• Simulate d independent random variates Z_1, …, Z_d from N(0, 1).
• Set X = AZ.
• Set U_k = Φ(X_k) for k = 1, …, d.
• U = (U_1, …, U_d) has distribution function C_R^Ga.

As usual Φ denotes the univariate standard normal distribution function.
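Algorithm 10.1 can be sketched in a few lines of Python (NumPy assumed; Φ is implemented via the error function, and the correlation matrix below is just an example):

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    # Univariate standard normal distribution function Phi, via erf
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def rgauss_copula(n, R, rng):
    # Algorithm 10.1: n samples from the d-dimensional Gaussian copula C^Ga_R
    A = np.linalg.cholesky(R)                 # R = A A^T, A lower triangular
    Z = rng.standard_normal((R.shape[0], n))  # d independent N(0,1) variates
    X = A @ Z                                 # X ~ N_d(0, R)
    return norm_cdf(X).T                      # U_k = Phi(X_k); one row per sample

rng = np.random.default_rng(1)
R = np.array([[1.0, 0.7], [0.7, 1.0]])
U = rgauss_copula(5000, R, rng)
print(U.shape, float(U.min()), float(U.max()))
```

Each output row lies in (0, 1)^d, and any desired margins can then be imposed by applying quantile functions componentwise.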
Equation (10.3) provides an easy algorithm for random variate generation from the t-copula, C_{ν,R}^t.

Algorithm 10.2
• Find the Cholesky decomposition A of R: R = AA^T.
• Simulate d independent random variates Z_1, …, Z_d from N(0, 1).
• Simulate a random variate S from χ_ν^2 independent of Z_1, …, Z_d.
• Set Y = AZ.
• Set X = (√ν/√S) Y.
• Set U_k = t_ν(X_k) for k = 1, …, d.
• U = (U_1, …, U_d) has distribution function C_{ν,R}^t.
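Algorithm 10.2 differs from Algorithm 10.1 only in the extra χ²_ν variate and the t_ν probability transform. A sketch, assuming SciPy's `scipy.stats.t.cdf` is available for the margin transform (ν = 4 and R are arbitrary illustrative choices):

```python
import numpy as np
from scipy.stats import t as student_t

def rt_copula(n, nu, R, rng):
    # Algorithm 10.2: n samples from the d-dimensional t-copula C^t_{nu,R}
    A = np.linalg.cholesky(R)                 # R = A A^T
    Z = rng.standard_normal((R.shape[0], n))  # d independent N(0,1) variates
    S = rng.chisquare(nu, size=n)             # S ~ chi^2_nu, independent of Z
    Y = A @ Z
    X = np.sqrt(nu / S) * Y                   # X follows the t_nu distribution
    return student_t.cdf(X, df=nu).T          # U_k = t_nu(X_k)

rng = np.random.default_rng(2)
R = np.array([[1.0, 0.7], [0.7, 1.0]])
U = rt_copula(4000, nu=4, R=R, rng=rng)
print(U.shape, float(U.min()), float(U.max()))
```

Note that the same S divides every component, which is what produces the joint extremes (tail dependence) absent from the Gaussian copula.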
10.5 Archimedean copulas
As we have seen, elliptical copulas are derived from distribution functions of elliptical distributions using Sklar's Theorem. Since simulation from elliptical distributions is easy, so is simulation from elliptical copulas. There are however drawbacks: elliptical copulas do not have closed form expressions and are restricted to have radial symmetry. In many finance and insurance applications it seems reasonable that there is a stronger dependence between big losses (e.g. a stock market crash) than between big gains. Such asymmetries cannot be modeled with elliptical copulas.
In this section we discuss an important class of copulas called Archimedean copulas. This class of copulas is worth studying for a number of reasons. Many interesting parametric families of copulas are Archimedean and the class of Archimedean copulas allows for a great variety of different dependence structures. Furthermore, in contrast to elliptical copulas, all commonly encountered Archimedean copulas have closed form expressions. Unlike the copulas discussed so far these copulas are not derived from multivariate distribution functions using Sklar's Theorem. A consequence of this is that we need somewhat technical conditions to assert that multivariate extensions of bivariate Archimedean copulas are indeed copulas. A further disadvantage is that multivariate extensions of Archimedean copulas in general suffer from lack of free parameter choice in the sense that some of the entries in the resulting rank correlation matrix are forced to be equal. We begin with a general definition of Archimedean copulas.
Proposition 10.10 Let ϕ be a continuous, strictly decreasing function from [0, 1] to [0, ∞] such that ϕ(0) = ∞ and ϕ(1) = 0. Let C : [0, 1]^2 → [0, 1] be given by

    C(u_1, u_2) = ϕ^{−1}(ϕ(u_1) + ϕ(u_2)).    (10.6)

Then C is a copula if and only if ϕ is convex.

Copulas of the form (10.6) are called Archimedean copulas. The function ϕ is called a generator of the copula.
Example 10.9 Let ϕ(t) = (−ln t)^θ, where θ ≥ 1. Clearly ϕ(t) is continuous and ϕ(1) = 0. ϕ′(t) = −θ(−ln t)^{θ−1}(1/t), so ϕ is a strictly decreasing function from [0, 1] to [0, ∞]. ϕ″(t) ≥ 0 on [0, 1], so ϕ is convex. Moreover ϕ(0) = ∞. From (10.6) we get

    C_θ^Gu(u_1, u_2) = ϕ^{−1}(ϕ(u_1) + ϕ(u_2)) = exp(−[(−ln u_1)^θ + (−ln u_2)^θ]^{1/θ}).

Furthermore C_1 = Π (where Π(u_1, u_2) = u_1 u_2) and lim_{θ→∞} C_θ = M (where M(u_1, u_2) = min(u_1, u_2)). This copula family is called the Gumbel family. As shown in Example 10.7 this copula family has upper tail dependence.
Example 10.10 Let ϕ(t) = t^{−θ} − 1, where θ > 0. This gives the Clayton family

    C_θ^Cl(u_1, u_2) = ((u_1^{−θ} − 1) + (u_2^{−θ} − 1) + 1)^{−1/θ} = (u_1^{−θ} + u_2^{−θ} − 1)^{−1/θ}.    (10.7)

Moreover, lim_{θ→0} C_θ = Π and lim_{θ→∞} C_θ = M. As shown in Example 10.8 this copula family has lower tail dependence.
Recall that Kendall's tau for a copula C can be expressed as a double integral of C. This double integral is in most cases not straightforward to evaluate. However, for an Archimedean copula, Kendall's tau can be expressed as a one-dimensional integral of the generator and its derivative.

Proposition 10.11 Let (X_1, X_2) be a random vector with continuous marginal distribution functions and with an Archimedean copula C generated by ϕ. Then Kendall's tau of (X_1, X_2) is given by

    τ(X_1, X_2) = 1 + 4 ∫_0^1 ϕ(t)/ϕ′(t) dt.
Example 10.11 Consider the Gumbel family with generator ϕ(t) = (−ln t)^θ, for θ ≥ 1. Then ϕ(t)/ϕ′(t) = (t ln t)/θ. Using Proposition 10.11 we can calculate Kendall's tau for the Gumbel family:

    τ(θ) = 1 + 4 ∫_0^1 (t ln t)/θ dt = 1 + (4/θ)([ (t^2/2) ln t ]_0^1 − ∫_0^1 t/2 dt) = 1 + (4/θ)(0 − 1/4) = 1 − 1/θ.

As a consequence, in order to have Kendall's tau equal to 0.5 in Figure 25 (the Gumbel case), we put θ = 2.
Example 10.12 Consider the Clayton family with generator ϕ(t) = t^{−θ} − 1, for θ > 0. Then ϕ(t)/ϕ′(t) = (t^{θ+1} − t)/θ. Using Proposition 10.11 we can calculate Kendall's tau for the Clayton family:

    τ(θ) = 1 + 4 ∫_0^1 (t^{θ+1} − t)/θ dt = 1 + (4/θ)(1/(θ + 2) − 1/2) = θ/(θ + 2).
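The two closed forms invert easily, which gives a quick way to calibrate θ to an estimated Kendall's tau. A small sketch (the function names are illustrative, not from the notes):

```python
import numpy as np

def gumbel_theta(tau):
    # Invert tau = 1 - 1/theta (Example 10.11); valid for 0 <= tau < 1
    return 1.0 / (1.0 - tau)

def clayton_theta(tau):
    # Invert tau = theta / (theta + 2) (Example 10.12); valid for 0 < tau < 1
    return 2.0 * tau / (1.0 - tau)

# tau = 0.5 gives theta = 2 in both families, as used for the Gumbel
# sample in Figure 25
print(gumbel_theta(0.5), clayton_theta(0.5))  # -> 2.0 2.0
```

Note that both maps send τ → 1 to θ → ∞, consistent with C_θ → M (comonotonicity) in Examples 10.9 and 10.10.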
A natural question is under which additional conditions on ϕ we have that the most simple multivariate extension of bivariate Archimedean copulas,

    ϕ^{−1}(ϕ(u_1) + ⋯ + ϕ(u_d)),

is a copula for d ≥ 3. The following results address this question and show why inverses of Laplace transforms are natural choices for generators of Archimedean copulas.

Definition 10.5 A function g : [0, ∞) → [0, ∞) is completely monotonic if it is continuous and if for any t ∈ (0, ∞) and k = 0, 1, 2, …,

    (−1)^k (d^k/ds^k) g(s) |_{s=t} ≥ 0.
[Scatter plots omitted: Figure 25 panels labeled "BMW–Siemens", "t2" and "Gaussian", axes running from −15 to 10; the figure's caption follows in the next section of the notes.]
•
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
••
•
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
•
• •
•
•
•
•
••
•
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
• •
• •
• •
•
•
•
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
• •
•
•
•
•
•
•
• •
• • •
•
•• ••
•
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
••
•
• •
•
•
•
•
•
•
• •
••
•
•
•
•
•
•
•
•
•
•
•
•
• •
• • •
•
•
•
•
•
•
•
• •
•
• •
•
•
•
•
•• •
• • •
•
• •
•
•
•
• • • •
•
•
•
•
•
•
•
• •
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
••
••
•
•
•
•
•
•
•
•
• •
•
•
•
•
• •
•
•
•
•
•
•
• •
• •
• •
•
•
•
••
•
•
•
•
•• •
•
•
•
•
•
•
• • •
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
•
• •
•
•
••
•
•
•
•
•
•
•
•
•
•
••
•
•
•
•
•
••
•
• •
•
•
•
• •
•
• •
•
•
•
•
•
• •
•
•
••
•
•
•
• • •
•••
•
•
•
• •
•
•
•
•
•
•
•
•
•
•
••
•
•
•
• •
•
•
• •
•
•
•
•
•
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
•
••
• • •
•
Gumbel
15 10 5 0 5 10

1
5

1
0

5
0
5
1
0
Figure 25: The upper left plot shows BMW-Siemens daily log returns from 1989 to 1996. The other plots show samples from bivariate distributions with t_4 margins and Kendall's tau 0.5.
Proposition 10.12 Let ϕ : [0, 1] → [0, ∞] be continuous and strictly decreasing such that ϕ(0) = ∞ and ϕ(1) = 0. Then, for any d ≥ 2, the function C : [0, 1]^d → [0, 1] given by

C(u) = ϕ^{-1}(ϕ(u_1) + ··· + ϕ(u_d))

is a copula if and only if ϕ^{-1} is completely monotonic on [0, ∞).
The following result tells us where to look for generators satisfying the conditions
of Proposition 10.12.
Lemma 10.1 A function Ψ : [0, ∞) → [0, ∞) is the Laplace transform of a
distribution function G on [0, ∞) if and only if Ψ is completely monotonic and
Ψ(0) = 1.
Proposition 10.13 Let G be a distribution function on [0, ∞) with G(0) = 0 and Laplace transform

Ψ(s) = ∫_0^∞ e^{-sx} dG(x), s ≥ 0.

Consider a random variable X with distribution function G and a set of [0, 1]-valued random variables U_1, ..., U_d which are conditionally independent given X, with conditional distribution function given by F_{U_k | X = x}(u) = exp(-xΨ^{-1}(u)) for u ∈ [0, 1]. Then

P(U_1 ≤ u_1, ..., U_d ≤ u_d) = Ψ(Ψ^{-1}(u_1) + ··· + Ψ^{-1}(u_d)),

so that the distribution function of U is an Archimedean copula with generator Ψ^{-1}.
Proof.

P(U_1 ≤ u_1, ..., U_d ≤ u_d)
= ∫_0^∞ P(U_1 ≤ u_1, ..., U_d ≤ u_d | X = x) dG(x)
= ∫_0^∞ P(U_1 ≤ u_1 | X = x) ··· P(U_d ≤ u_d | X = x) dG(x)
= ∫_0^∞ exp{-x(Ψ^{-1}(u_1) + ··· + Ψ^{-1}(u_d))} dG(x)
= Ψ(Ψ^{-1}(u_1) + ··· + Ψ^{-1}(u_d)).
10.6 Simulation from Gumbel and Clayton copulas

As seen from Proposition 10.13, the following algorithm shows how to simulate from an Archimedean copula C of the form

C(u) = ϕ^{-1}(ϕ(u_1) + ··· + ϕ(u_d)),

where ϕ^{-1} is the Laplace transform of a distribution function G on [0, ∞) with G(0) = 0.

Algorithm 10.3
• Simulate a variate X with distribution function G such that the Laplace transform Ψ of G is the inverse of the generator ϕ of the required copula C.
• Simulate independent standard uniform variates V_1, ..., V_d.
• U = (Ψ(-ln(V_1)/X), ..., Ψ(-ln(V_d)/X)) has distribution function C.
To verify that this is correct, notice that with U_k = Ψ(-ln(V_k)/X) we have

P(U_k ≤ u_k | X = x) = P(Ψ(-ln(V_k)/x) ≤ u_k)
= P(-ln V_k ≥ xΨ^{-1}(u_k))
= P(V_k ≤ exp{-xΨ^{-1}(u_k)})
= exp{-xΨ^{-1}(u_k)}.
As we have seen, the generator ϕ(t) = t^{-θ} - 1, θ > 0, generates the Clayton copula C^Cl_θ(u) = (u_1^{-θ} + ··· + u_d^{-θ} - d + 1)^{-1/θ}. Let X ∼ Gamma(1/θ, 1), i.e. let X be Gamma-distributed with density function f_X(x) = x^{1/θ-1} e^{-x}/Γ(1/θ). Then X has Laplace transform

E(e^{-sX}) = ∫_0^∞ e^{-sx} (1/Γ(1/θ)) x^{1/θ-1} e^{-x} dx = (s + 1)^{-1/θ} = ϕ^{-1}(s).
Hence, the following algorithm can be used for simulation from a Clayton copula.

Algorithm 10.4
• Simulate a variate X ∼ Gamma(1/θ, 1).
• Simulate independent standard uniform variates V_1, ..., V_d.
• With Ψ(s) = (s + 1)^{-1/θ}, U = (Ψ(-ln(V_1)/X), ..., Ψ(-ln(V_d)/X)) has distribution function C^Cl_θ.
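Algorithm 10.4 is easy to check numerically. The sketch below (NumPy; the function name and seed are ours, not from the notes) uses Ψ(s) = (s + 1)^{-1/θ}, so U_k = (1 - ln(V_k)/X)^{-1/θ}:

```python
import numpy as np

def rclayton(n, d, theta, seed=None):
    """Simulate n draws from a d-dimensional Clayton copula (Algorithm 10.4).

    Psi(s) = (s + 1)**(-1/theta), so U_k = (1 - ln(V_k)/X)**(-1/theta).
    """
    rng = np.random.default_rng(seed)
    x = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))  # frailty X ~ Gamma(1/theta, 1)
    v = rng.uniform(size=(n, d))                              # iid standard uniforms
    return (1.0 - np.log(v) / x) ** (-1.0 / theta)            # Psi(-ln(V_k)/X)
```

For θ = 2 the Clayton copula has Kendall's tau θ/(θ + 2) = 0.5, so the two coordinates of a large sample should show clear positive rank correlation.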
This approach can also be used for simulation from Gumbel copulas. In this case, however, X has a nonstandard distribution which is not available in most statistical software, but one can simulate from bivariate Gumbel copulas using a different method. Take θ ≥ 1 and let F̄(x) = 1 - F(x) = exp(-x^{1/θ}) for x ≥ 0. If (Z_1, Z_2) = (V S^θ, (1 - V)S^θ), where V and S are independent, V is standard uniformly distributed and S has density h(s) = (1 - 1/θ + (1/θ)s) exp(-s), then (F̄(Z_1), F̄(Z_2)) has distribution function C^Gu_θ, where

C^Gu_θ(u_1, u_2) = exp(-[(-ln u_1)^θ + (-ln u_2)^θ]^{1/θ}).
This leads to the following algorithm for simulation from bivariate Gumbel copulas.

Algorithm 10.5
• Simulate independent random variates V_1, V_2 ∼ U(0, 1).
• Simulate independent random variates W_1, W_2, where W_k ∼ Gamma(k, 1).
• Set S = 1{V_2 ≤ 1/θ} W_1 + 1{V_2 > 1/θ} W_2.
• Set (Z_1, Z_2) = (V_1 S^θ, (1 - V_1)S^θ).
• U = (exp(-Z_1^{1/θ}), exp(-Z_2^{1/θ})) has distribution function C^Gu_θ.
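The five steps of Algorithm 10.5 can be sketched as follows (the function name and seed are ours); note that the density h(s) is exactly a mixture of Gamma(1, 1) and Gamma(2, 1) densities with weights 1/θ and 1 - 1/θ:

```python
import numpy as np

def rgumbel2(n, theta, seed=None):
    """Simulate n draws from a bivariate Gumbel copula, theta >= 1 (Algorithm 10.5)."""
    rng = np.random.default_rng(seed)
    v1 = rng.uniform(size=n)
    v2 = rng.uniform(size=n)
    w1 = rng.gamma(shape=1.0, scale=1.0, size=n)   # W_1 ~ Gamma(1, 1)
    w2 = rng.gamma(shape=2.0, scale=1.0, size=n)   # W_2 ~ Gamma(2, 1)
    s = np.where(v2 <= 1.0 / theta, w1, w2)        # S has density (1 - 1/theta + s/theta) e^{-s}
    z1 = v1 * s ** theta
    z2 = (1.0 - v1) * s ** theta
    # (Fbar(Z1), Fbar(Z2)) with Fbar(x) = exp(-x^{1/theta})
    return np.column_stack([np.exp(-z1 ** (1.0 / theta)),
                            np.exp(-z2 ** (1.0 / theta))])
```

For θ = 2 the resulting sample has uniform margins and Kendall's tau 1 - 1/θ = 0.5.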
10.7 Fitting copulas to data

We now turn to the problem of fitting a parametric copula to data. Clearly, one first has to decide which parametric family to fit. This is perhaps best done by graphical means: if one sees clear signs of asymmetry in the dependence structure, as illustrated in Figure 24, then an Archimedean copula with this kind of asymmetry might be a reasonable choice; otherwise (see e.g. Figure 26) one might go for an elliptical copula. It is also useful to check whether the data shows signs of tail dependence, and choose a copula with or without this property accordingly.
Figure 26: Samples from two distributions with standard normal margins, R_12 = 0.8 but different dependence structures. (X1, Y1) has a Gaussian copula and (X2, Y2) has a t_2 copula.
We will consider the problem of estimating the parameter vector θ of a copula C_θ given an iid sample {X_1, ..., X_n}, where X_k ∼ F for some distribution function F with continuous marginal distribution functions F_1, ..., F_d and hence a unique representation F(x) = C_θ(F_1(x_1), ..., F_d(x_d)).
We have seen that for Gaussian, t, Gumbel and Clayton copulas there are simple relations between Kendall's tau and certain copula parameters:

C^Ga_R(u) = Φ^d_R(Φ^{-1}(u_1), ..., Φ^{-1}(u_d)), R_ij = sin(π τ_ij/2),
C^t_R(u) = t^d_{ν,R}(t^{-1}_ν(u_1), ..., t^{-1}_ν(u_d)), R_ij = sin(π τ_ij/2),
C^Gu_θ(u) = exp(-[(-ln u_1)^θ + ··· + (-ln u_d)^θ]^{1/θ}), θ = 1/(1 - τ_ij),
C^Cl_θ(u) = (u_1^{-θ} + ··· + u_d^{-θ} - d + 1)^{-1/θ}, θ = 2τ_ij/(1 - τ_ij),

where τ_ij = τ(X_{k,i}, X_{k,j}). Hence parameter estimates for the copulas above are obtained by simply replacing τ_ij by its estimate τ̂_ij presented in Section 8.5.
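These inversion formulas are straightforward to apply in practice; the sketch below (bivariate case, using SciPy's Kendall's tau; the function name is ours) turns a sample tau into parameter estimates for each family:

```python
import numpy as np
from scipy.stats import kendalltau

def fit_by_kendall(x):
    """Copula parameter estimates by inverting Kendall's tau relations.

    x is bivariate data of shape (n, 2); returns estimates for the
    Gaussian/t correlation entry and the Gumbel and Clayton thetas
    (the latter two require tau in [0, 1)).
    """
    tau = kendalltau(x[:, 0], x[:, 1])[0]
    return {
        "tau": tau,
        "gauss_t_Rij": np.sin(np.pi * tau / 2.0),  # R_ij = sin(pi * tau / 2)
        "gumbel_theta": 1.0 / (1.0 - tau),          # theta = 1 / (1 - tau)
        "clayton_theta": 2.0 * tau / (1.0 - tau),   # theta = 2 tau / (1 - tau)
    }
```

For a sample from a bivariate normal distribution with correlation 0.6, the "gauss_t_Rij" estimate should be close to 0.6, since sin(π τ/2) inverts the relation τ = (2/π) arcsin(ρ).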
10.8 Gaussian and t-copulas

For Gaussian and t-copulas of high dimensions, it might happen that R̂, with R̂_ij = sin(π τ̂_ij/2), is not positive definite. If this is the case, then one has to replace R̂ by a linear correlation matrix R* which is in some sense close to R̂. This can be achieved by the so-called eigenvalue method.

Algorithm 10.6
• Calculate the spectral decomposition R̂ = ΓΛΓ^T, where Λ is a diagonal matrix of eigenvalues of R̂ and Γ is an orthogonal matrix whose columns are eigenvectors of R̂.
• Replace the negative eigenvalues in Λ by some small value δ > 0 to obtain Λ̄.
• Calculate R̄ = ΓΛ̄Γ^T, which will be symmetric and positive definite but not necessarily a linear correlation matrix since its diagonal elements might differ from one.
• Set R̂ = DR̄D, where D is a diagonal matrix with D_{k,k} = 1/√(R̄_{k,k}).
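Algorithm 10.6 translates directly into a few lines of NumPy; the function below is a sketch (the name and the default value of δ are ours):

```python
import numpy as np

def eigenvalue_fix(R_hat, delta=1e-6):
    """Adjust a symmetric matrix to a positive definite correlation matrix
    via the eigenvalue method of Algorithm 10.6."""
    lam, gamma = np.linalg.eigh(R_hat)            # spectral decomposition R = G L G^T
    lam = np.where(lam <= 0.0, delta, lam)        # replace non-positive eigenvalues by delta
    R_bar = gamma @ np.diag(lam) @ gamma.T        # positive definite, diagonal may differ from 1
    d = 1.0 / np.sqrt(np.diag(R_bar))
    return R_bar * np.outer(d, d)                 # rescale to unit diagonal: D R_bar D
```

Applied to a symmetric matrix with unit diagonal that fails to be positive definite, the output is symmetric, has unit diagonal, and has strictly positive eigenvalues.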
After having estimated R (with R̂ possibly modified to assure positive definiteness) it remains to estimate the degrees of freedom parameter. We construct a pseudo-sample {Û_1, ..., Û_n} of observations from the copula by componentwise transformation with the estimated marginal distribution functions F̂_1, ..., F̂_d as follows:

Û_k = (F̂_1(X_{k,1}), ..., F̂_d(X_{k,d})), k = 1, ..., n.

Either F̂_k can be taken as a fitted parametric distribution function or as a version of the empirical distribution function:

F̂_k(x) = F̂_k^{(β)}(x) = (1/(n + β)) Σ_{j=1}^n 1{X_{j,k} ≤ x},

where β ∈ (0, 1], which guarantees that the pseudo-sample data lies within the unit cube, i.e. that Û_k ∈ (0, 1)^d. Given a pseudo-sample from the t-copula, the degrees of freedom parameter ν can be estimated by maximum likelihood estimation (MLE). An ML estimate of ν is obtained by maximizing

ln L(ξ, Û_1, ..., Û_n) = Σ_{k=1}^n ln c_{ξ,R̂}(Û_k)

with respect to ξ, where c_{ξ,R} denotes the density of a t-copula with ξ as degrees of freedom parameter. The log-likelihood function for the t-copula is given by

ln L(ξ, R, Û_1, ..., Û_n) = Σ_{k=1}^n ln g_{ξ,R}(t_ξ^{-1}(Û_{k,1}), ..., t_ξ^{-1}(Û_{k,d})) - Σ_{k=1}^n Σ_{j=1}^d ln g_ξ(t_ξ^{-1}(Û_{k,j})),

where g_{ξ,R} denotes the joint density of a standard t-distribution with distribution function t^d_{ξ,R} and g_ξ denotes the density of a univariate standard t-distribution with distribution function t_ξ. Hence an estimate of the degrees of freedom parameter ν is obtained as the ξ ∈ (0, ∞) that maximizes the log-likelihood function ln L(ξ, R, Û_1, ..., Û_n).
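The pseudo-sample construction with the empirical margins F̂^{(β)} can be sketched as follows (assuming continuous data, so there are no ties; the function name is ours). Maximizing the log-likelihood over ξ would then be a one-dimensional numerical optimization over this pseudo-sample.

```python
import numpy as np

def pseudo_sample(X, beta=1.0):
    """Componentwise transform data X (shape (n, d)) with the empirical
    distribution functions F_hat^{(beta)}, giving pseudo-observations in (0, 1)^d.

    For continuous data, F_hat^{(beta)}(X_{k,j}) is the within-column rank of
    X_{k,j} divided by n + beta.
    """
    n = X.shape[0]
    # rank of each observation within its column = sum_j 1{X_{j,k} <= x}
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + beta)
```

With β = 1 this is the familiar rank/(n + 1) transform, so every pseudo-observation lies strictly inside the unit cube.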
11 Portfolio credit risk modeling
What is credit/default risk? The following explanation is given by Peter Crosbie
and Jeﬀrey R. Bohn in [5]:
Default risk is the uncertainty surrounding a ﬁrm’s ability to service its debts
and obligations. Prior to default, there is no way to discriminate unambiguously
between ﬁrms that will default and those that won’t. At best we can only make
probabilistic assessments of the likelihood of default. As a result, ﬁrms generally
pay a spread over the default-free rate of interest that is proportional to their default probability to compensate lenders for this uncertainty.
If a firm (obligor) cannot fulfill its commitments towards a lender or counterparty in a financial agreement, then we say that the firm is in default. Credit risk also includes the risk related to events other than default, such as up- or down-moves in credit rating.
The loss suﬀered by a lender or counterparty in the event of default is usually
signiﬁcant and is determined largely by the details of the particular contract or
obligation. In most cases the obligor is able to repay a substantial amount of
the loan, so only a certain fraction of the entire loan is lost. For example, typical
loss rates in the event of default for senior secured bonds, subordinated bonds
and zero coupon bonds are 49%, 68%, and 81%, respectively.
In this chapter we will introduce a general framework for modeling portfolios
subject to credit risk.
11.1 A simple model

Consider a portfolio consisting of n loans (or bonds) subject to default. That the loan is subject to default means that with some probability p_i, obligor i will not be able to repay his debt. Each loan has a certain loan size L_i. If there is a default then the lender does not lose the entire amount L_i but rather a proportion 1 - λ_i of the loan size. We call λ_i ∈ [0, 1] the recovery rate of loan i. The loss-given-default for loan number i, which is the amount lost by the lender in the case of default, is given by

LGD_i = (1 - λ_i)L_i.

At some time T, say one year from now, each obligor can be in either of two states, default or non-default. We model the state of each obligor at time T by a Bernoulli random variable

X_i = 1 if obligor i is in default, 0 otherwise.

The default probability of obligor i is then given by p_i = P(X_i = 1). The total loss at time T due to obligors defaulting is then given by

L = Σ_{i=1}^n X_i LGD_i = Σ_{i=1}^n X_i (1 - λ_i)L_i.
An important issue in quantitative credit risk management is to understand the distribution of the random variable L. Given that we know the size L_i of each loan we need to model the multivariate random vector (X_1, ..., X_n, λ_1, ..., λ_n) in order to derive the loss distribution of L. Most commercial models in use today assume the recovery rates λ_i to be independent of X = (X_1, ..., X_n) and independent of each other. This leaves essentially the joint distribution of the default indicators X to be modeled.

The simplest model we may think of is when all loan sizes are equal, L_i = L_1, all recovery rates are deterministic and equal, λ_i = λ_1, and all default indicators X_i are iid with default probability p. Then the loss is given by L = LGD_1 N, where N = Σ_{i=1}^n X_i is Binomial(n, p)-distributed. Below we will study some more sophisticated models for the default indicators X.
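In this simplest model the loss distribution is just a scaled binomial, which can be tabulated directly; the loan size, recovery rate and default probability below are made-up illustrative values, not from the notes:

```python
import numpy as np
from scipy.stats import binom

# Simplest model: equal loan sizes, deterministic equal recovery rates and
# iid default indicators, so L = LGD1 * N with N ~ Binomial(n, p).
n, p = 100, 0.02
L1, lam1 = 1_000_000.0, 0.6
LGD1 = (1.0 - lam1) * L1                 # loss given default per loan

k = np.arange(n + 1)
loss_support = LGD1 * k                  # possible portfolio losses
loss_probs = binom.pmf(k, n, p)          # P(L = LGD1 * k) = P(N = k)
expected_loss = LGD1 * n * p             # E(L) = LGD1 * E(N)
```

Here E(L) = 400 000 · 100 · 0.02 = 800 000, and quantiles of L are read off the binomial distribution of N.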
11.2 Latent variable models

Since it is practically impossible to obtain historical observations of the default indicator X_i for a given obligor i (it is rather unusual that the firm has defaulted many times before), it is a good idea to divide all obligors into m homogeneous groups. Within each group all obligors (firms) have the same default probability. Estimation of default probabilities can then be based upon how many obligors have defaulted within each group, leading to larger sample sizes. To this end we may introduce a state vector S = (S_1, ..., S_n), where S_i represents the state of obligor i. We suppose that the state is an integer in the set {0, ..., m} with S_i = 0 indicating that obligor i is in the default state. The other states may be thought of as the obligor being in different rating classes. We let X_i denote the default indicator of obligor i, i.e.

X_i = 0 if S_i ≠ 0, and X_i = 1 if S_i = 0.

The vector X = (X_1, ..., X_n) is the vector of default indicators and the default probability is p_i = P(X_i = 1).
Often the state variables S = (S_1, ..., S_n) are modeled using a vector of so-called latent variables Y = (Y_1, ..., Y_n); Y_i representing for instance the value of the assets, or asset returns, of obligor i. Typically we have a number of thresholds d_{ij}, i = 1, ..., n, j = 0, ..., m + 1, with d_{i0} = -∞ and d_{i(m+1)} = ∞. The state of S_i is then given through Y_i by

S_i = j if Y_i ∈ (d_{ij}, d_{i(j+1)}].

Let F_i denote the distribution of Y_i. Default occurs if Y_i ≤ d_{i1} and hence the default probability is given by p_i = F_i(d_{i1}). The probability that the first k obligors, say, default is then given by (the F_i's are assumed to be continuous)

p_{1...k} = P(Y_1 ≤ d_{11}, ..., Y_k ≤ d_{k1})
= C(F_1(d_{11}), ..., F_k(d_{k1}), 1, ..., 1)
= C(p_1, ..., p_k, 1, ..., 1),

where C denotes the copula of Y. As the marginal default probabilities F_i(d_{i1}) are small, the joint default probability will depend heavily on the choice of copula C.
Example 11.1 Consider a loan portfolio with n = 100 obligors where the credit risk is modeled using a latent variable model with copula C. Suppose that C is an exchangeable copula, i.e. that

C(u_1, ..., u_n) = C(u_{π(1)}, ..., u_{π(n)}),

where π is an arbitrary permutation of {1, ..., n}. Suppose further that the individual default probability of each obligor is equal to p = 0.15, i.e. p_i = p = 0.15. Let N denote the number of defaults and let ρ_τ(Y_i, Y_j) = τ, i ≠ j, denote Kendall's tau between any two latent variables (which are assumed to have continuous distribution functions). We assume that τ = 0, simulate the number of defaults 10^5 times, and illustrate the distribution of the number of defaults in a histogram when (a) C is a Gaussian copula and (b) C is a t_4 copula. The histograms are shown in Figure 27. One clearly sees that zero correlation is far from independence if the dependence structure is non-Gaussian.
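A simulation along the lines of Example 11.1 can be sketched as follows (variable names and the smaller number of simulations are ours). With τ = 0 the Gaussian latent variables are independent, while the t_4 latent variables share a chi-square mixing variable and therefore produce clustered defaults:

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)
n_obl, p, nu, n_sim = 100, 0.15, 4, 20_000

# (a) Gaussian copula with tau = 0: latent variables are independent normals.
y_ga = rng.standard_normal((n_sim, n_obl))
N_ga = (y_ga <= norm.ppf(p)).sum(axis=1)

# (b) t4 copula with tau = 0: latent variables are uncorrelated but dependent
# through a common chi-square mixing variable, so defaults cluster.
w = np.sqrt(nu / rng.chisquare(nu, size=(n_sim, 1)))
N_t = ((rng.standard_normal((n_sim, n_obl)) * w) <= t.ppf(p, nu)).sum(axis=1)
```

Both models give E(N) = 15, but the spread of N is much larger under the t_4 copula, which is exactly what the histograms in Figure 27 show.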
11.3 Mixture models

The random vector X = (X_1, ..., X_n) follows a Bernoulli mixture model if there is a random vector Z = (Z_1, ..., Z_m), m < n, and functions f_i : R^m → [0, 1], i ∈ {1, ..., n}, such that conditional on Z, X is a vector of independent Bernoulli random variables with

P(X_i = 1 | Z) = f_i(Z), P(X_i = 0 | Z) = 1 - f_i(Z).

For x = (x_1, ..., x_n) ∈ {0, 1}^n we then have

P(X = x | Z) = Π_{i=1}^n f_i(Z)^{x_i} (1 - f_i(Z))^{1-x_i}.

The unconditional distribution is then given by

P(X = x) = E(P(X = x | Z)) = E(Π_{i=1}^n f_i(Z)^{x_i} (1 - f_i(Z))^{1-x_i}).

If all the functions f_i are equal, f_i = f, then, conditional on Z, the number of defaults N = Σ_{i=1}^n X_i is Bin(n, f(Z))-distributed.
The random vector X = (X_1, ..., X_n) follows a Poisson mixture model if there is a random vector Z = (Z_1, ..., Z_m), m < n, and functions λ_i : R^m → (0, ∞), i ∈ {1, ..., n}, such that conditional on Z, X is a vector of independent Po(λ_i(Z))-distributed random variables. In this case we have

P(X_i = x_i | Z) = (λ_i(Z)^{x_i}/x_i!) e^{-λ_i(Z)}, x_i ∈ N.
Figure 27: Histograms of the number of defaults: (a) Gaussian copula (upper) and (b) t_4 copula (lower).
For x = (x_1, ..., x_n) ∈ N^n we then have

P(X = x | Z) = Π_{i=1}^n (λ_i(Z)^{x_i}/x_i!) e^{-λ_i(Z)}.

The unconditional distribution is then given by

P(X = x) = E(P(X = x | Z)) = E(Π_{i=1}^n (λ_i(Z)^{x_i}/x_i!) e^{-λ_i(Z)}).
The use of Poisson mixture models for modeling defaults can be motivated as follows. Suppose that X̄ = (X̄_1, ..., X̄_n) follows a Poisson mixture model with factors Z. Put X_i = I_{[1,∞)}(X̄_i). Then X = (X_1, ..., X_n) follows a Bernoulli mixture model with

f_i(Z) = 1 - e^{-λ_i(Z)}.

If the Poisson parameters λ_i(Z) are small, then N̄ = Σ_{i=1}^n X̄_i is approximately equal to the number of defaulting obligors and, conditional on Z, N̄ is Poisson(λ(Z))-distributed with λ(Z) = Σ_{i=1}^n λ_i(Z).
Example 11.2 A bank has a loan portfolio of 100 loans. Let X_k be the default indicator for loan k such that X_k = 1 in case of default and 0 otherwise. The total number of defaults is N = X_1 + ··· + X_100.
(a) Suppose that X_1, ..., X_100 are independent and identically distributed with P(X_1 = 1) = 0.01. Compute E(N) and P(N = k) for k ∈ {0, ..., 100}.
(b) Consider the risk factor Z which reflects the state of the economy. Suppose that conditional on Z, the default indicators are independent and identically distributed with P(X_1 = 1 | Z) = Z, where

P(Z = 0.01) = 0.9 and P(Z = 0.11) = 0.1.

Compute E(N).
(c) Consider the risk factor Z which reflects the state of the economy. Suppose that conditional on Z, the default indicators are independent and identically distributed with P(X_1 = 1 | Z) = Z^9, where Z is uniformly distributed on (0, 1). Compute E(N).

Solution (a): We have N ∼ Binomial(100, 0.01). Hence E(N) = 100 · 0.01 = 1 and

P(N = k) = (100 choose k) 0.01^k 0.99^{100-k}.

Solution (b): We have N | Z ∼ Binomial(100, Z). Hence

E(N) = E(E(N | Z)) = E(100Z) = 100 E(Z) = 100(0.01 · 0.9 + 0.11 · 0.1) = 0.9 + 1.1 = 2.

Solution (c): We have N | Z ∼ Binomial(100, Z^9). Hence

E(N) = E(E(N | Z)) = E(100Z^9) = 100 E(Z^9) = 100 · 0.1 = 10.
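The three answers can be restated as a short numerical check (variable names are ours):

```python
import numpy as np
from scipy.stats import binom

n = 100

# (a) iid defaults with p = 0.01: N ~ Binomial(100, 0.01)
E_a = n * 0.01                           # E(N) = 1
pmf_a = binom.pmf(np.arange(n + 1), n, 0.01)

# (b) two-state economy factor Z: E(N) = 100 * E(Z)
E_b = n * (0.01 * 0.9 + 0.11 * 0.1)      # = 2

# (c) P(X = 1 | Z) = Z^9 with Z ~ U(0, 1): E(Z^9) = 1/10
E_c = n * (1.0 / 10.0)                   # = 10
```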
11.4 One-factor Bernoulli mixture models

In this section we will consider the Bernoulli mixture model where Z is univariate, Z = Z, i.e. we only have one factor, and all the functions f_i are equal, f_i = f. This means that all marginal default probabilities are equal and the number of defaults N satisfies N | Z ∼ Binomial(n, f(Z)). Moreover, the unconditional probability that only the first k obligors default is given by

P(X_1 = 1, ..., X_k = 1, X_{k+1} = 0, ..., X_n = 0)
= E(P(X_1 = 1, ..., X_k = 1, X_{k+1} = 0, ..., X_n = 0 | Z))
= E(f(Z)^k (1 - f(Z))^{n-k}).
To determine the unconditional default probabilities, the distribution of the number of defaults, etc., we need to specify the distribution function G of Z. Given G, the unconditional probability that the first k obligors default is given by

P(X_1 = 1, ..., X_k = 1, X_{k+1} = 0, ..., X_n = 0) = ∫_{-∞}^∞ f(z)^k (1 - f(z))^{n-k} G(dz)

and the number of defaulting obligors N has unconditional distribution

P(N = k) = (n choose k) ∫_{-∞}^∞ f(z)^k (1 - f(z))^{n-k} G(dz).
Notice also that

Cov(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j)
= E(E(X_i X_j | Z)) - E(E(X_i | Z)) E(E(X_j | Z))
= E(f(Z)^2) - E(f(Z))^2 = var(f(Z)).

We have N = E(N | Z) + N - E(N | Z) and

E(N) = E(E(N | Z)) = n E(f(Z)) = np_1,
var(N) = E(var(N | Z)) + var(E(N | Z))
= E(nf(Z)(1 - f(Z))) + var(nf(Z))
= n E(f(Z)(1 - f(Z))) + n^2 var(f(Z)).
Notice that by Markov’s inequality
P([N/n −f(Z)[ > ε [ Z) ≤
var(N/n [ Z)
ε
2
=
f(Z)(1 −f(Z))
nε
2
.
Hence, for every ε > 0,
P([N/n −f(Z)[ > ε) = E(P([N/n −f(Z)[ > ε [ Z)) ≤
E(f(Z)(1 −f(Z)))
nε
2
Hence, N/n
P
→ f(Z) as n → ∞ which justiﬁes the approximation N/n ≈ f(Z)
for n large. (In fact it hold that N/n →f(Z) a.s. as n →∞.)
11.5 Probit normal mixture models

In several portfolio credit risk models (see the section about the KMV model below) the default indicators X_i, i = 1, ..., n, have the representation X_i = 1 if and only if √ϱ Z + √(1 - ϱ) W_i ≤ d_{i1}, where ϱ ∈ [0, 1] and Z, W_1, ..., W_n are iid and standard normally distributed. Assuming equal individual default probabilities p = P(X_i = 1) we have d_{i1} = Φ^{-1}(p) and hence

X_i = I_{(-∞, Φ^{-1}(p)]}(√ϱ Z + √(1 - ϱ) W_i).

This gives

f(Z) = P(X_i = 1 | Z) = P(√ϱ Z + √(1 - ϱ) W_i ≤ Φ^{-1}(p) | Z)
= Φ(Φ^{-1}(p)/√(1 - ϱ) + √ϱ Z/√(1 - ϱ)).

This leads to

VaR_q(f(Z)) = Φ((√ϱ/√(1 - ϱ)) Φ^{-1}(q) + (1/√(1 - ϱ)) Φ^{-1}(p)).

Setting q = 0.999 and using the approximation N/n ≈ f(Z) motivated above, we arrive at the "Basel formula" for capital requirement as a fraction of the total exposure for a homogeneous portfolio with individual default probabilities p:

Capital requirement = c_1 c_2 (Φ((√ϱ/√(1 - ϱ)) Φ^{-1}(0.999) + (1/√(1 - ϱ)) Φ^{-1}(p)) - p),

where c_1 is the fraction of the exposure lost in case of default and c_2 is a constant for maturity adjustments. The asset return correlation coefficient ϱ is assigned a value that depends on the asset type and also the size and default probability of the borrowers.
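The Basel formula is a one-liner to evaluate; in the sketch below the function name and the default values of c_1 and c_2 are illustrative placeholders, not the regulatory values:

```python
from math import sqrt
from scipy.stats import norm

def basel_capital(p, rho, c1=0.45, c2=1.0, q=0.999):
    """Capital requirement per unit exposure for a homogeneous portfolio.

    p:   individual default probability
    rho: asset return correlation
    c1, c2: loss-given-default fraction and maturity adjustment (placeholders)
    """
    var_q = norm.cdf(sqrt(rho) / sqrt(1.0 - rho) * norm.ppf(q)
                     + norm.ppf(p) / sqrt(1.0 - rho))
    return c1 * c2 * (var_q - p)
```

The requirement grows with both the default probability p and the correlation ϱ, as one would expect: higher correlation makes the factor Z dominate and fattens the upper tail of the default fraction.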
11.6 Beta mixture models

For the Beta mixing distribution we assume that Z ∼ Beta(a, b) and f(z) = z. It has density

g(z) = (1/β(a, b)) z^{a-1}(1 - z)^{b-1}, a, b > 0, z ∈ (0, 1),

where

β(a, b) = ∫_0^1 z^{a-1}(1 - z)^{b-1} dz = Γ(a)Γ(b)/Γ(a + b),

and hence, using that Γ(z + 1) = zΓ(z),

E(Z) = (1/β(a, b)) ∫_0^1 z^a (1 - z)^{b-1} dz = β(a + 1, b)/β(a, b)
= (Γ(a + 1)Γ(b)/Γ(a + b + 1)) · (Γ(a + b)/(Γ(a)Γ(b))) = a/(a + b),
E(Z^2) = a(a + 1)/((a + b)(a + b + 1)).
We immediately get that the number of defaults N has distribution

    P(N = k) = \binom{n}{k} ∫_0^1 z^k (1 − z)^{n−k} g(z) dz
             = \binom{n}{k} (1/β(a, b)) ∫_0^1 z^{a+k−1} (1 − z)^{n−k+b−1} dz
             = \binom{n}{k} β(a + k, b + n − k)/β(a, b),

which is called the beta-binomial distribution. This probability function is illustrated in Figure 28. The expected number of defaults is easily computed:

    E(N) = E(E(N | Z)) = n E(E(X_1 | Z)) = n E(Z) = n a/(a + b).
If we have estimated the default probabilities P(X_i = 1) and P(X_i = X_j = 1), i ≠ j, then the parameters a and b can be determined from the relations

    P(X_i = 1) = E(Z) = a/(a + b),
    P(X_i = X_j = 1) = E(Z^2) = a(a + 1)/((a + b)(a + b + 1)).
Example 11.3 Notice that if we only specify the individual default probability p = P(X_i = 1), then we know very little about the model. For example, p = 0.01 = a/(a + b) for (a, b) = (0.01, 0.99), (0.1, 9.9), (1, 99), but the different choices of (a, b) lead to quite different models. This is shown in the table below, which considers a portfolio with n = 1000 obligors.
Figure 28: The probability function for the number of defaults in a Beta mixture
model with n = 100 obligors and a = 0.5, b = 10 (left), a = 2, b = 20 (right).
    (a, b)         p      corr(X_i, X_j)   VaR_0.99(N) [VaR_0.999(N)]   VaR_0.99(nZ) [VaR_0.999(nZ)]
    (1, 99)        0.01   0.01             47  [70]                     45  [67]
    (0.1, 9.9)     0.01   0.09             155 [300]                    155 [299]
    (0.01, 0.99)   0.01   0.5              371 [908]                    371 [908]
Notice also how accurate the approximation N ≈ nZ is!
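The beta-binomial probabilities, and hence quantiles like those in the table, can be computed directly via the log-gamma function. In the sketch below the quantile is taken as the smallest k with P(N ≤ k) ≥ q, an assumed convention that reproduces the order of magnitude of the table entries.

```python
from math import lgamma, exp

def log_beta(a, b):
    """log of the beta function: beta(a, b) = Gamma(a)Gamma(b)/Gamma(a+b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(n, a, b):
    """P(N = k) = binom(n, k) * beta(a+k, b+n-k)/beta(a, b), k = 0, ..., n."""
    lb0 = log_beta(a, b)
    pmf = []
    for k in range(n + 1):
        log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        pmf.append(exp(log_binom + log_beta(a + k, b + n - k) - lb0))
    return pmf

def quantile(pmf, q):
    """Smallest k with P(N <= k) >= q."""
    acc = 0.0
    for k, p in enumerate(pmf):
        acc += p
        if acc >= q:
            return k
    return len(pmf) - 1
```

For (a, b) = (1, 99) and n = 1000, quantile(beta_binomial_pmf(1000, 1, 99), 0.99) lands in the region of the table's entry 47, and the mean of the pmf equals n a/(a + b) = 10.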
12 Popular portfolio credit risk models
In this chapter we will present the commercial models currently used by practitioners, such as the KMV model and CreditRisk+. Interesting comparisons of these models are given in [6] and [10].
12.1 The KMV model
A popular commercial model for credit risk is the so-called KMV model provided by Moody's KMV (www.moodyskmv.com). It is an example of a latent variable model where the state variables S = (S_1, . . . , S_n) can only have two states (m = 1). The latent variables Y = (Y_1, . . . , Y_n) are related to the value of the assets of each firm in the following way.
The Merton model
It is assumed that the balance sheet of each firm consists of assets and liabilities. The liabilities are divided into debt and equity. The value of the assets of the ith firm at time T is denoted by V_{A,i}(T), the value of the debt by K_i and the value of the equity of the firm at time T by V_{E,i}(T). It is assumed that the future asset value is modeled by a geometric Brownian motion

    V_{A,i}(T) = V_{A,i}(t) exp( (μ_{A,i} − σ_{A,i}^2/2)(T − t) + σ_{A,i}(W_i(T) − W_i(t)) ),    (12.1)

where μ_{A,i} is the drift, σ_{A,i} the volatility and (W_i(t); 0 ≤ t ≤ T) a Brownian motion. In particular this means that W_i(T) − W_i(t) ∼ N(0, T − t) and hence that ln V_{A,i}(T) is normal with mean ln V_{A,i}(t) + (μ_{A,i} − σ_{A,i}^2/2)(T − t) and variance σ_{A,i}^2 (T − t). The firm defaults if at time T the value of the assets is less than the value of the debt. That is, the default indicator X_i is given by

    X_i = I_{(−∞, K_i)}(V_{A,i}(T)).
Writing

    Y_i = (W_i(T) − W_i(t))/√(T − t)

we get Y_i ∼ N(0, 1) and

    X_i = I_{(−∞, K_i)}(V_{A,i}(T)) = I_{(−∞, −DD_i)}(Y_i)

with

    −DD_i = ( ln K_i − ln V_{A,i}(t) + (σ_{A,i}^2/2 − μ_{A,i})(T − t) ) / ( σ_{A,i} √(T − t) ).
The quantity DD_i is called the distance-to-default. In principle the default probability can then be computed as P(V_{A,i}(T) < K_i) = P(Y_i < −DD_i). Hence, in the general setup of a two-state latent variable model we have Y_i ∼ N(0, 1) and default thresholds d_{i1} = −DD_i.
Computing the distance-to-default

To compute the distance-to-default we need to find V_{A,i}(t), σ_{A,i} and μ_{A,i}. A problem here is that the value of the firm's assets V_{A,i}(t) cannot be observed directly. However, the value of the firm's equity can be observed by looking at the market stock prices. KMV therefore takes the following viewpoint: the equity holders have the right but not the obligation, at time T, to pay off the holders of the other liabilities and take over the remaining assets of the firm. That is, the debt holders own the firm until the debt is paid off in full by the equity holders. This can be viewed as a call option on the firm's assets with a strike price equal to the debt. That is, at time T we have the relation

    V_{E,i}(T) = max(V_{A,i}(T) − K_i, 0).

The value of equity at time t, V_{E,i}(t), can then be thought of as the price of a call option with the value of assets as underlying and strike price K_i. Under some simplifying assumptions the price of such an option can be computed using the Black-Scholes option pricing formula. This gives
    V_{E,i}(t) = C(V_{A,i}(t), σ_{A,i}, r),

where

    C(V_{A,i}(t), σ_{A,i}, r) = V_{A,i}(t) Φ(e_1) − K_i e^{−r(T−t)} Φ(e_2),

    e_1 = ( ln V_{A,i}(t) − ln K_i + (r + σ_{A,i}^2/2)(T − t) ) / ( σ_{A,i} √(T − t) ),
    e_2 = e_1 − σ_{A,i} √(T − t),
Φ is the distribution function of the standard normal and r is the risk-free interest rate (investors use e.g. the interest rate on a three-month U.S. Treasury bill as a proxy for the risk-free rate, since short-term government-issued securities have virtually zero risk of default). KMV also introduces a relation between the volatility σ_{E,i} of V_{E,i} and the volatility σ_{A,i} of V_{A,i},

    σ_{E,i} = g(V_{A,i}(t), σ_{A,i}, r),

where g is some function. Using observed/estimated values of V_{E,i}(t) and σ_{E,i}, the system of relations

    V_{E,i}(t) = C(V_{A,i}(t), σ_{A,i}, r),
    σ_{E,i} = g(V_{A,i}(t), σ_{A,i}, r)

is inverted to obtain V_{A,i}(t) and σ_{A,i}, which enables computation of the distance-to-default DD_i.
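The text leaves the function g unspecified. A common concrete choice, obtained by applying Itô's lemma to V_{E,i}(t) = C(V_{A,i}(t), σ_{A,i}, r), is σ_{E,i} = (V_{A,i}(t)/V_{E,i}(t)) Φ(e_1) σ_{A,i}; that form is an assumption here, as is the iteration scheme below (a Newton step on the price relation alternated with a fixed-point update of the volatility relation). This is a sketch of the inversion, not KMV's actual algorithm.

```python
from math import log, sqrt, exp, erf

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def e1(V_A, sigma_A, K, r, tau):
    return (log(V_A / K) + (r + 0.5 * sigma_A ** 2) * tau) / (sigma_A * sqrt(tau))

def bs_call(V_A, sigma_A, K, r, tau):
    """C(V_A, sigma_A, r): equity as a Black-Scholes call on the firm's assets."""
    d1 = e1(V_A, sigma_A, K, r, tau)
    d2 = d1 - sigma_A * sqrt(tau)
    return V_A * Phi(d1) - K * exp(-r * tau) * Phi(d2)

def equity_vol(V_A, sigma_A, K, r, tau):
    """Assumed form of g: sigma_E = (V_A / V_E) * Phi(e_1) * sigma_A."""
    return V_A * Phi(e1(V_A, sigma_A, K, r, tau)) * sigma_A / bs_call(V_A, sigma_A, K, r, tau)

def invert(V_E_obs, sigma_E_obs, K, r, tau, n_iter=200):
    """Recover (V_A, sigma_A) from observed equity value and volatility."""
    V_A, sigma_A = V_E_obs + K, sigma_E_obs   # crude starting guess
    for _ in range(n_iter):
        # fixed-point update of sigma_A from the volatility relation
        sigma_A = sigma_E_obs * bs_call(V_A, sigma_A, K, r, tau) / (
            V_A * Phi(e1(V_A, sigma_A, K, r, tau)))
        # Newton step on the price relation, using dC/dV_A = Phi(e_1)
        V_A -= (bs_call(V_A, sigma_A, K, r, tau) - V_E_obs) / Phi(
            e1(V_A, sigma_A, K, r, tau))
    return V_A, sigma_A
```

As a sanity check, generating (V_E, σ_E) from known asset parameters and then inverting recovers those parameters.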
The expected default frequency
To find the default probability corresponding to the distance-to-default DD_i, KMV do not actually use the probability P(Y_i < −DD_i). Instead they use historical data to search for all companies which at some stage in their history had approximately the same distance-to-default. Then the observed default frequency is converted into an actual probability. In the terminology of KMV this estimated default probability p_i is called Expected Default Frequency (EDF). To summarize, in order to compute the probability of default with the KMV model the following steps are required:

(i) Estimate asset value and volatility. The asset value and asset volatility of the firm are estimated from the market value and volatility of equity and the book value of liabilities.

(ii) Calculate the distance-to-default.

(iii) Calculate the default probability using the empirical distribution relating distance-to-default to a default probability.
The multivariate KMV model
In the Merton model above we did not introduce any dependence between the value of the assets for different firms; we only considered each firm separately. To compute joint default probabilities and the distribution of the total credit loss it is natural to introduce dependence between the default indicators by making the asset value processes V_{A,i} dependent. The following methodology is used by KMV. Let (W_j(t) : 0 ≤ t ≤ T, j = 1, . . . , m) be m independent standard Brownian motions. The evolution (12.1) of asset i is then replaced by

    V_{A,i}(T) = V_{A,i}(t) exp( (μ_{A,i} − σ_{A,i}^2/2)(T − t) + Σ_{j=1}^m σ_{A,i,j}(W_j(T) − W_j(t)) ),

where

    σ_{A,i}^2 = Σ_{j=1}^m σ_{A,i,j}^2.
Here, σ_{A,i,j} gives the magnitude by which asset i is influenced by the jth Brownian motion. The event V_{A,i}(T) < K_i that company i defaults is equivalent to

    Σ_{j=1}^m σ_{A,i,j}(W_j(T) − W_j(t)) < ln K_i − ln V_{A,i}(t) + (σ_{A,i}^2/2 − μ_{A,i})(T − t).
If we let

    Y_i = Σ_{j=1}^m σ_{A,i,j}(W_j(T) − W_j(t)) / (σ_{A,i} √(T − t)),

then Y = (Y_1, . . . , Y_n) ∼ N_n(0, Σ) with

    Σ_{ij} = Σ_{k=1}^m σ_{A,i,k} σ_{A,j,k} / (σ_{A,i} σ_{A,j})
and the above inequality can be written as

    Y_i < ( ln K_i − ln V_{A,i}(t) + (σ_{A,i}^2/2 − μ_{A,i})(T − t) ) / ( σ_{A,i} √(T − t) ) = −DD_i.
Hence, in the language of a general latent variable model the probability that the first k firms default is given by

    P(X_1 = 1, . . . , X_k = 1) = P(Y_1 < −DD_1, . . . , Y_k < −DD_k)
                                = C^{Ga}_Σ(Φ(−DD_1), . . . , Φ(−DD_k), 1, . . . , 1),

where C^{Ga}_Σ is the copula of a multivariate normal distribution with covariance matrix Σ. As in the univariate case, KMV do not use the default probability resulting from the latent variable model but instead use the expected default frequencies EDF_i. In a similar way KMV use the joint default frequency

    JDF_{1...k} = C^{Ga}_Σ(EDF_1, . . . , EDF_k, 1, . . . , 1)

as the default probability of the first k firms.
Estimating the correlations
Estimating the correlations of the latent variables in Y is not particularly easy, as the dimension n is typically very large and there is limited available historical data. Moreover, estimating pairwise correlations will rarely give a positive definite correlation matrix if the dimension is large. A way around these problems is to use a factor model where the asset value, or more precisely the latent variable vector Y, is driven by k key factors and one firm-specific factor. The key factors are typically macroeconomic factors such as

• Global economic effects
• Regional economic effects
• Sector effects
• Country specific effects
• Industry specific effects
If we write

    Y_i =d Σ_{j=1}^k a_{ij} Z_j + b_i U_i,   i = 1, . . . , n,

where Z = (Z_1, . . . , Z_k) ∼ N_k(0, Λ) is independent of U = (U_1, . . . , U_n) ∼ N_n(0, I), then the covariance matrix of the right-hand side is given by AΛA^T + D, where A_{ij} = a_{ij} and D is a diagonal (n × n) matrix with entries D_{ii} = b_i^2.
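As a sketch of how the factor structure is used in practice, the snippet below builds the covariance AΛA^T + D for illustrative loadings A and Λ = I (assumed values, not calibrated ones), choosing b_i^2 = 1 − (AΛA^T)_{ii} so that each Y_i has unit variance, and samples Y via the factor representation.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 8, 3                      # obligors and key factors (illustrative sizes)
A = 0.5 * rng.random((n, k))     # factor loadings a_ij (hypothetical values)
Lam = np.eye(k)                  # factor covariance matrix Lambda

common = A @ Lam @ A.T           # covariance contributed by the key factors
b2 = 1.0 - np.diag(common)       # choose b_i^2 so that var(Y_i) = 1
Sigma = common + np.diag(b2)     # cov(Y) = A Lam A^T + D

# Sigma is then a valid correlation matrix: unit diagonal and positive definite
assert np.allclose(np.diag(Sigma), 1.0)
assert np.linalg.eigvalsh(Sigma).min() > 0

# sampling Y directly from the factor representation (no Cholesky of Sigma needed)
Z = rng.standard_normal(k)
U = rng.standard_normal(n)
Y = A @ Z + np.sqrt(b2) * U
```

The point of the factor representation is exactly this: a full n × n correlation matrix never has to be estimated or factorized, only the n × k loading matrix A and the small k × k matrix Λ.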
12.2 CreditRisk+ – a Poisson mixture model

The material presented here on the CreditRisk+ model is based on [6], [10] and [4].
CreditRisk+ is a commercial model for credit risk developed by Credit Suisse First Boston and is an example of a Poisson mixture model. The risk factors Z_1, . . . , Z_m are assumed to be independent with Z_j ∼ Gamma(α_j, β_j), and we have

    λ_i(Z) = λ_i Σ_{j=1}^m a_{ij} Z_j,   Σ_{j=1}^m a_{ij} = 1,   a_{ij} ≥ 0,

for i = 1, . . . , n. Here λ_i > 0 are constants. The density of Z_j is given by

    f_j(z) = z^{α_j−1} exp{−z/β_j} / (β_j^{α_j} Γ(α_j)).
The parameters α_j, β_j are chosen so that α_j β_j = 1, and then E(Z_j) = 1 and E(λ_i(Z)) = λ_i. Notice that the expected number of defaults, E(N), is given by

    E(N) = E(E(N | Z)) = Σ_{i=1}^n E(E(X_i | Z))
         = Σ_{i=1}^n E(λ_i(Z)) = Σ_{i=1}^n λ_i Σ_{j=1}^m a_{ij} E(Z_j) = Σ_{i=1}^n λ_i.
The loss-given-default LGD_i of obligor i is modeled as a constant fraction 1 − λ_i of the loan size L_i,

    LGD_i = (1 − λ_i) L_i,   i = 1, . . . , n.
Here λ_i is the (deterministic) expected recovery rate. Each loss amount is then expressed as an integer multiple v_i of a fixed base unit of loss (e.g. one million dollars) denoted L_0. Then we have

    LGD_i = (1 − λ_i) L_i ≈ [ (1 − λ_i) L_i / L_0 ] L_0 = v_i L_0,   i = 1, . . . , n,

where [x] denotes the nearest integer to x (x − [x] ∈ (−1/2, 1/2]). In this way every LGD_i can be expressed as a fixed integer multiple v_i of a predefined base unit of loss L_0. The main idea here is to approximate the total loss distribution by a discrete distribution. For this discrete distribution it is possible to compute its probability generating function (pgf) g.
Recall the definition of the pgf for a discrete random variable Y with values in {y_1, . . . , y_m}:

    g_Y(t) = E(t^Y) = Σ_{i=1}^m t^{y_i} P(Y = y_i).

Recall the following formulas for probability generating functions.
(i) If Y ∼ Bernoulli(p) then g_Y(t) = 1 + p(t − 1).

(ii) If Y ∼ Poisson(λ) then g_Y(t) = exp{λ(t − 1)}.

(iii) If X_1, . . . , X_n are independent random variables then

    g_{X_1 + ··· + X_n}(t) = Π_{i=1}^n g_{X_i}(t).

(iv) Let Y have density f and let g_{X|Y=y}(t) be the pgf of X|Y = y. Then

    g_X(t) = ∫ g_{X|Y=y}(t) f(y) dy.

(v) If X has pgf g_X(t) then

    P(X = k) = g^{(k)}(0)/k!,   with g^{(k)}(t) = d^k g(t)/dt^k.
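Fact (iii) is easy to check numerically for integer-valued random variables: the pgf is then a polynomial in t whose coefficients are the probabilities, and multiplying pgfs is the same as convolving probability functions. A small sketch:

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def bernoulli_pgf(p):
    """g_Y(t) = 1 + p(t - 1) = (1 - p) + p t, as coefficients [1 - p, p]."""
    return [1.0 - p, p]

# fact (iii): the pgf of an independent sum is the product of the pgfs, so the
# coefficients of the product are the probability function of the sum; for three
# independent Bernoulli(1/2) variables this gives the Binomial(3, 1/2) probabilities
g = [1.0]
for _ in range(3):
    g = poly_mul(g, bernoulli_pgf(0.5))
```

Here g comes out as [0.125, 0.375, 0.375, 0.125], the Binomial(3, 1/2) probability function.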
The pgf of the loss distribution
Let us derive the pgf of the loss distribution

    L = Σ_{i=1}^n X_i v_i L_0.

First we determine the conditional pgf of the number of defaults N = X_1 + ··· + X_n given Z = (Z_1, . . . , Z_m). Given Z the default intensities λ_1(Z), . . . , λ_n(Z) are known, so conditional on Z the default indicators are independent and Poisson(λ_i(Z))-distributed. Hence

    g_{X_i|Z}(t) = exp{λ_i(Z)(t − 1)},   i = 1, . . . , n.
For N we now obtain

    g_{N|Z}(t) = Π_{i=1}^n g_{X_i|Z}(t) = Π_{i=1}^n exp{λ_i(Z)(t − 1)} = exp{μ(t − 1)},   μ = Σ_{i=1}^n λ_i(Z).
Next we use (iv) to derive the unconditional distribution of the number of defaults N:

    g_N(t) = ∫_0^∞ ··· ∫_0^∞ g_{N|Z=(z_1,...,z_m)}(t) f_1(z_1) ··· f_m(z_m) dz_1 ··· dz_m
           = ∫_0^∞ ··· ∫_0^∞ exp{ (t − 1) Σ_{i=1}^n ( λ_i Σ_{j=1}^m a_{ij} z_j ) } f_1(z_1) ··· f_m(z_m) dz_1 ··· dz_m
           = ∫_0^∞ ··· ∫_0^∞ exp{ (t − 1) Σ_{j=1}^m μ_j z_j } f_1(z_1) ··· f_m(z_m) dz_1 ··· dz_m   (here μ_j = Σ_{i=1}^n λ_i a_{ij})
           = ∫_0^∞ exp{(t − 1) μ_1 z_1} f_1(z_1) dz_1 ··· ∫_0^∞ exp{(t − 1) μ_m z_m} f_m(z_m) dz_m
           = Π_{j=1}^m ∫_0^∞ exp{z μ_j (t − 1)} (1/(β_j^{α_j} Γ(α_j))) z^{α_j−1} exp{−z/β_j} dz.
Each of the integrals in the product can be computed as

    ∫_0^∞ (1/(β_j^{α_j} Γ(α_j))) exp{z μ_j (t − 1)} z^{α_j−1} exp{−z/β_j} dz
    = (1/(β_j^{α_j} Γ(α_j))) ∫_0^∞ z^{α_j−1} exp{−z(β_j^{−1} − μ_j(t − 1))} dz
    = [ substituting u = z(β_j^{−1} − μ_j(t − 1)) ]
    = ( Γ(α_j) / (β_j^{α_j} Γ(α_j)(β_j^{−1} − μ_j(t − 1))^{α_j}) ) ∫_0^∞ (1/Γ(α_j)) u^{α_j−1} exp{−u} du
    = 1/(1 − β_j μ_j(t − 1))^{α_j}
    = ( (1 − δ_j)/(1 − δ_j t) )^{α_j},

where δ_j = β_j μ_j/(1 + β_j μ_j). Finally we obtain

    g_N(t) = Π_{j=1}^m ( (1 − δ_j)/(1 − δ_j t) )^{α_j}.    (12.2)
Similar computations will lead us to the pgf of the loss distribution. Conditional on Z, the loss of obligor i is given by

    L_i|Z = v_i (X_i|Z).

Since the variables X_i|Z, i = 1, . . . , n, are independent, so are the variables L_i|Z, i = 1, . . . , n. The pgf of L_i|Z is

    g_{L_i|Z}(t) = E(t^{L_i} | Z) = E(t^{v_i X_i} | Z) = g_{X_i|Z}(t^{v_i}).
Hence, the pgf of the total loss conditional on Z is

    g_{L|Z}(t) = g_{L_1 + ··· + L_n|Z}(t) = Π_{i=1}^n g_{L_i|Z}(t) = Π_{i=1}^n g_{X_i|Z}(t^{v_i})
               = exp{ Σ_{j=1}^m Z_j Σ_{i=1}^n λ_i a_{ij}(t^{v_i} − 1) }.
Similar to the previous computation we obtain

    g_L(t) = Π_{j=1}^m ( (1 − δ_j)/(1 − δ_j Λ_j(t)) )^{α_j},   Λ_j(t) = (1/μ_j) Σ_{i=1}^n λ_i a_{ij} t^{v_i},

with μ_j and δ_j as above. The loss distribution is then obtained by inverting the pgf. In particular, the pgf g_L(t) uniquely determines the loss distribution.
Example 12.1 Consider a portfolio that consists of n = 100 obligors. Recall the probability generating function (12.2) for the number of defaults N in the CreditRisk+ model. In order to compute the probabilities P(N = k), k ≥ 0, we need to compute the derivatives

    g_N^{(k)}(0) = (d^k g_N/dt^k)(0).

It can be shown that g_N^{(k)}(0) satisfies the recursion formula

    g_N^{(k)}(0) = Σ_{l=0}^{k−1} \binom{k−1}{l} g_N^{(k−1−l)}(0) Σ_{j=1}^m l! α_j δ_j^{l+1},   k ≥ 1

(show this!). Assume that λ_i = λ = 0.15, α_j = α = 1, β_j = β = 1 and a_{ij} = a = 1/m. To better understand the model we plot the function P(N = k) for k = 0, . . . , 100 when m = 1 and when m = 5. The result is shown in Figure 29.

One can interpret the plot as follows. With only one risk factor, m = 1, to which all default indicators are linked, we either have many defaults or we have few defaults. Having approximately E(N) = 15 defaults is unlikely. With m = 5 independent risk factors there is a diversification effect. In this case it is most likely that we have approximately E(N) = 15 defaults.
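The recursion is straightforward to implement. In the sketch below it is rewritten (an equivalent reformulation, not the notes' own statement) in terms of the probabilities p_k = g_N^{(k)}(0)/k! to avoid the huge factorials: p_k = (1/k) Σ_{l=0}^{k−1} p_{k−1−l} t_l with t_l = Σ_j α_j δ_j^{l+1}.

```python
def defaults_pmf(kmax, alphas, deltas):
    """P(N = k), k = 0, ..., kmax, from the recursion for g_N^{(k)}(0),
    run directly on the probabilities p_k = g_N^{(k)}(0)/k!."""
    m = len(alphas)
    t = [sum(alphas[j] * deltas[j] ** (l + 1) for j in range(m))
         for l in range(kmax)]
    p0 = 1.0
    for j in range(m):
        p0 *= (1.0 - deltas[j]) ** alphas[j]   # p_0 = g_N(0)
    p = [p0]
    for k in range(1, kmax + 1):
        p.append(sum(p[k - 1 - l] * t[l] for l in range(k)) / k)
    return p

def example_pmf(m, kmax=100):
    """Parameters of Example 12.1: n = 100, lambda = 0.15, alpha = beta = 1, a = 1/m."""
    mu = 100 * 0.15 / m              # mu_j = sum_i lambda_i a_ij = n*lambda/m
    delta = mu / (1.0 + mu)          # delta_j = beta_j mu_j / (1 + beta_j mu_j)
    return defaults_pmf(kmax, [1.0] * m, [delta] * m)

pmf1 = example_pmf(1)   # decreasing in k: many defaults or few defaults
pmf5 = example_pmf(5)   # unimodal: the diversification effect
```

Running this reproduces the qualitative shapes in Figure 29: pmf1 is monotonically decreasing in k, while pmf5 first increases and then decreases.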
Figure 29: P(N = k) for k = 0, . . . , 100 for m = 1 and m = 5. For m = 1 the probability P(N = k) is decreasing in k; for m = 5 the probability P(N = k) is first increasing in k and then decreasing.
References
[1] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D. (1999) Coherent Measures of Risk, Mathematical Finance 9, 203–228.

[2] Basel Committee on Banking Supervision (1988) International Convergence of Capital Measurement and Capital Standards. Available at http://www.bis.org/publ/bcbs04A.pdf.

[3] Campbell, J.Y., Lo, A.W. and MacKinlay, A. (1997) The Econometrics of Financial Markets, Princeton University Press, Princeton.

[4] Credit Suisse First Boston (1997) CreditRisk+: A Credit Risk Management Framework. Available at http://www.csfb.com/institutional/research/assets/creditrisk.pdf.

[5] Crosbie, P. and Bohn, J. (2001) Modelling Default Risk. K.M.V. Corporation. Available at http://www.moodyskmv.com/research/whitepaper/Modeling Default Risk.pdf.

[6] Crouhy, M., Galai, D. and Mark, R. (2000) A comparative analysis of current credit risk models, Journal of Banking & Finance 24, 59–117.

[7] Embrechts, P., McNeil, A. and Straumann, D. (2002) Correlation and dependence in risk management: properties and pitfalls. In: Risk Management: Value at Risk and Beyond, edited by Dempster, M., Cambridge University Press, Cambridge.

[8] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997) Modelling Extremal Events for Insurance and Finance, Springer Verlag, Berlin.

[9] Fang, K.-T., Kotz, S. and Ng, K.-W. (1987) Symmetric Multivariate and Related Distributions, Chapman & Hall, London.

[10] Gordy, M.B. (2000) A comparative anatomy of credit risk models, Journal of Banking & Finance 24, 119–149.

[11] Gut, A. (1995) An Intermediate Course in Probability, Springer Verlag, New York.

[12] McNeil, A., Frey, R., Embrechts, P. (2005) Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University Press.

[13] Nelsen, R.B. (1999) An Introduction to Copulas, Springer Verlag, New York.
A A few probability facts
A.1 Convergence concepts
Let X and Y be random variables (or vectors) with distribution functions F_X and F_Y. We say that X = Y almost surely (a.s.) if P({ω ∈ Ω : X(ω) = Y(ω)}) = 1. We say that they are equal in distribution, written X =d Y, if F_X = F_Y. Notice that X = Y a.s. implies X =d Y. However, taking X to be standard normally distributed and Y = −X we see that the converse is false.

Let X, X_1, X_2, . . . be a sequence of random variables. We say that X_n converges to X almost surely, X_n → X a.s., if

    P({ω ∈ Ω : X_n(ω) → X(ω)}) = 1.

We say that X_n converges to X in probability, X_n →P X, if for all ε > 0 it holds that

    P(|X_n − X| > ε) → 0.

We say that X_n converges to X in distribution, X_n →d X, if for all continuity points x of F_X it holds that

    F_{X_n}(x) → F_X(x).

The following implications between the convergence concepts hold:

    X_n → X a.s.  ⇒  X_n →P X  ⇒  X_n →d X.
A.2 Limit theorems and inequalities
Let X_1, X_2, . . . be iid random variables with finite mean E(X_1) = μ, and let S_n = X_1 + ··· + X_n. The (strong) law of large numbers says that

    S_n/n → μ a.s. as n → ∞.

If furthermore var(X_1) = σ^2 ∈ (0, ∞), then the central limit theorem says that

    (S_n − nμ)/(σ√n) →d Z as n → ∞,

where the random variable Z has a standard normal distribution.

For a nonnegative random variable V with E(V^r) < ∞, Markov's inequality says that

    P(V > ε) ≤ E(V^r)/ε^r   for every ε > 0.

For a random variable X with finite variance var(X) this leads to

    P(|X − E(X)| > ε) ≤ E[(X − E(X))^2]/ε^2 = var(X)/ε^2   for every ε > 0.

This inequality is called Chebyshev's inequality.
B Conditional expectations
Suppose that one holds a portfolio or financial contract and let X be the payoff (or loss) one year from today. Let Z be a random variable which represents some information about the state of the economy or relevant interest rate during the next year. We think of Z as a future (and hence unknown) economic scenario which takes values in a set of all possible scenarios. The conditional expectation of X given Z, E(X | Z), represents the best guess of the payoff X given the scenario Z. Notice that Z is unknown today and that E(X | Z) is a function of the future random scenario Z and hence a random variable. If we knew Z then we would also know g(Z) for any given function g. Therefore the property

    g(Z) E(X | Z) = E(g(Z) X | Z)

seems natural (whenever the expectations exist finitely). If there were only a finite set of possible scenarios z_k, i.e. values that Z may take, then it is clear that the expected payoff E(X) may be computed by computing the expected payoff for each scenario z_k and then obtaining E(X) as the weighted sum of these values with the weights P(Z = z_k). Therefore the property

    E(X) = E(E(X | Z))

seems natural.
B.1 Deﬁnition and properties
If X is a random variable with E(X^2) < ∞, then the conditional expectation E(X | Z) is most naturally defined geometrically, as an orthogonal projection of X onto a subspace.

Let L^2 be the space of random variables X with E(X^2) < ∞. Let Z be a random variable and let L^2(Z) be the space of random variables Y = g(Z) for some function g such that E(Y^2) < ∞. Notice that the expected value E(X) is the number μ that minimizes the expression E((X − μ)^2). The conditional expectation E(X | Z) can be defined similarly.

Definition For X ∈ L^2, the conditional expectation E(X | Z) is the random variable Y ∈ L^2(Z) that minimizes E((X − Y)^2).

We say that X, Y ∈ L^2 are orthogonal if E(XY) = 0. Then E(X | Z) is the orthogonal projection of X onto L^2(Z), i.e. the point in the subspace L^2(Z) that is closest to X. Moreover, X − E(X | Z) is orthogonal to all Y ∈ L^2(Z), i.e. E(Y(X − E(X | Z))) = 0 for all Y ∈ L^2(Z). Equivalently, by linearity of the ordinary expectation,

    E(Y E(X | Z)) = E(Y X)   for all Y ∈ L^2(Z).    (B.1)

This relation implies the following three properties:
(i) If X ∈ L^2, then E(E(X | Z)) = E(X).

(ii) If Y ∈ L^2(Z), then Y E(X | Z) = E(Y X | Z).

(iii) If X ∈ L^2 and we set var(X | Z) = E(X^2 | Z) − E(X | Z)^2, then var(X) = E(var(X | Z)) + var(E(X | Z)).
This can be shown as follows:

(i) Choosing Y = 1 in (B.1) yields E(E(X | Z)) = E(X).

(ii) With Y replaced by Ȳ Y (with Ȳ ∈ L^2(Z)), (B.1) says that E(Ȳ Y X) = E(Ȳ Y E(X | Z)). With Y replaced by Ȳ and X replaced by Y X, (B.1) says that E(Ȳ Y X) = E(Ȳ E(Y X | Z)). Hence, for X ∈ L^2 and Y ∈ L^2(Z),

    E(Ȳ Y E(X | Z)) = E(Ȳ E(Y X | Z))   for all Ȳ ∈ L^2(Z).

Equivalently, E(Ȳ [Y E(X | Z) − E(Y X | Z)]) = 0 for all Ȳ ∈ L^2(Z); in particular for Ȳ = Y E(X | Z) − E(Y X | Z), which gives E((Y E(X | Z) − E(Y X | Z))^2) = 0. This implies that Y E(X | Z) = E(Y X | Z).
(iii) Starting with the right-hand side we obtain

    E(var(X | Z)) + var(E(X | Z))
    = E( E(X^2 | Z) − E(X | Z)^2 ) + ( E(E(X | Z)^2) − E(E(X | Z))^2 )
    = E(E(X^2 | Z)) − E(E(X | Z))^2
    = E(X^2) − E(X)^2 = var(X).

Hence, we have shown the properties (i)-(iii) above.
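The properties (i) and (iii) can be illustrated with a small simulation. The model X = 2Z + ε below, with Z and ε independent standard normals, is an illustrative choice (not from the text) for which E(X | Z) = 2Z and var(X | Z) = 1.

```python
import random

random.seed(1)

# illustrative model: Z ~ N(0,1), X = 2Z + eps with eps ~ N(0,1) independent,
# so that E(X | Z) = 2Z and var(X | Z) = 1
n = 200_000
zs = [random.gauss(0.0, 1.0) for _ in range(n)]
xs = [2.0 * z + random.gauss(0.0, 1.0) for z in zs]

mean_X = sum(xs) / n
var_X = sum((x - mean_X) ** 2 for x in xs) / n

# property (i): E(E(X | Z)) = E(2Z) should match E(X)
mean_cond = sum(2.0 * z for z in zs) / n

# property (iii): var(X) = E(var(X | Z)) + var(E(X | Z)) = 1 + 4 var(Z) = 5
var_decomp = 1.0 + 4.0 * (sum(z * z for z in zs) / n - (sum(zs) / n) ** 2)
```

With this many samples, mean_X agrees with mean_cond and var_X agrees with var_decomp (both approximately 5) up to Monte Carlo error.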
As already mentioned, there are other ways to introduce the conditional expectation E(X | Z) so that the properties (i) and (ii) hold. In that case the statement in the definition above follows from properties (i) and (ii). This is seen from the following argument. If W ∈ L^2(Z), then W E(X | Z) = E(WX | Z) and hence E(W E(X | Z)) = E(E(WX | Z)) = E(WX). Hence,

    E(W(X − E(X | Z))) = 0   for all W ∈ L^2(Z).    (B.2)

If Y ∈ L^2(Z) and W = Y − E(X | Z), then

    E((X − Y)^2) = E((X − E(X | Z) − W)^2)
                 = E((X − E(X | Z))^2) − 2 E(W(X − E(X | Z))) + E(W^2)
                 = E((X − E(X | Z))^2) + E(W^2).

Hence, E((X − Y)^2) is minimized when W = 0, i.e., when Y = E(X | Z).
B.2 An expression in terms of the density of (X, Z)

It is common in introductory texts to assume that X and Z have a joint density and derive the conditional expectation E(X | Z) in terms of this joint density. Suppose that the random vector (X, Z) has a density f(x, z). Write h(Z) for the conditional expectation E(X | Z). From (B.2) we know that E(g(Z)(X − h(Z))) = 0 for all g(Z) ∈ L^2(Z), i.e.

    0 = ∫∫ g(z)(x − h(z)) f(x, z) dx dz = ∫ g(z) ( ∫ (x − h(z)) f(x, z) dx ) dz.

Hence,

    0 = ∫ x f(x, z) dx − ∫ h(z) f(x, z) dx = ∫ x f(x, z) dx − h(z) ∫ f(x, z) dx,

or equivalently h(z) = ∫ x f(x, z) dx / ∫ f(x, z) dx. Hence,

    E(X | Z) = ∫ x f(x, Z) dx / ∫ f(x, Z) dx.
B.3 Orthogonality and projections in Hilbert spaces
Orthogonality and orthogonal projections onto a subspace of the Euclidean space R^d are well known from linear algebra. However, these concepts are meaningful also in more general spaces, called Hilbert spaces. The canonical example of a Hilbert space is R^3, and our intuition for orthogonality and projections in R^3 works fine in general Hilbert spaces.

A nonempty set H is called a (real) Hilbert space if H is a linear vector space, so that elements in H may be added and multiplied by real numbers, and there exists a function (x, y) → ⟨x, y⟩ from H × H to R with the properties:

(i) ⟨x, x⟩ ≥ 0 and ⟨x, x⟩ = 0 if and only if x = 0;

(ii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ H;

(iii) ⟨λx, y⟩ = λ⟨x, y⟩ for all x, y ∈ H and λ ∈ R;

(iv) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ H;

(v) If {x_n} ⊂ H and lim_{m,n→∞} ⟨x_n − x_m, x_n − x_m⟩ = 0, then there exists x ∈ H such that lim_{n→∞} ⟨x_n − x, x_n − x⟩ = 0.

The function (x, y) → ⟨x, y⟩ is called an inner product and |x|_H = ⟨x, x⟩^{1/2} is the norm of x ∈ H. For all x, y ∈ H it holds that |⟨x, y⟩| ≤ |x|_H |y|_H. If x, y ∈ H and ⟨x, y⟩ = 0, then x and y are said to be orthogonal and we write x ⊥ y.
The projection theorem Let M be a closed linear subspace of a Hilbert space H. For every x_0 ∈ H there exists a unique element y_0 ∈ M such that |x_0 − y_0|_H ≤ |x_0 − y|_H for all y ∈ M. The element y_0 is called the orthogonal projection of x_0 onto the subspace M, and x_0 − y_0 ⊥ y for all y ∈ M.
Let L^2 be the Hilbert space of random variables X with E(X^2) < ∞, equipped with the inner product (X, Y) → E(XY). Let Z be a random variable and consider the set of random variables Y = g(Z) for continuous functions g such that E(g(Z)^2) < ∞. We denote the closure of this set by L^2(Z) and note that L^2(Z) is a closed subspace of L^2. The closure is obtained by including elements X ∈ L^2 which satisfy lim_{n→∞} E((g_n(Z) − X)^2) = 0 for some sequence {g_n} of continuous functions such that E(g_n(Z)^2) < ∞.
Contents
1 Some background to ﬁnancial risk management 1.1 A preliminary example . . . . . . . . . . . . . . . . 1.2 Why risk management? . . . . . . . . . . . . . . . 1.3 Regulators and supervisors . . . . . . . . . . . . . 1.4 Why the government cares about the buﬀer capital 1.5 Types of risk . . . . . . . . . . . . . . . . . . . . . 1.6 Financial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 2 3 4 4 4 6 6 7
2 Loss operators and ﬁnancial portfolios 2.1 Portfolios and the loss operator . . . . . . . . . . . . . . . . . . . 2.2 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 Risk measurement 10 3.1 Elementary measures of risk . . . . . . . . . . . . . . . . . . . . . 10 3.2 Risk measures based on the loss distribution . . . . . . . . . . . . 12 4 Methods for computing VaR and ES 4.1 Empirical VaR and ES . . . . . . . . . . . . . . . . . . . . 4.2 Conﬁdence intervals . . . . . . . . . . . . . . . . . . . . . 4.2.1 Exact conﬁdence intervals for ValueatRisk . . . . 4.2.2 Using the bootstrap to obtain conﬁdence intervals 4.3 Historical simulation . . . . . . . . . . . . . . . . . . . . . 4.4 Variance–Covariance method . . . . . . . . . . . . . . . . 4.5 MonteCarlo methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 19 20 20 22 23 24 24
5 Extreme value theory for random variables with heavy tails 26 5.1 Quantilequantile plots . . . . . . . . . . . . . . . . . . . . . . . . 26 5.2 Regular variation . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 6 Hill estimation 33 6.1 Selecting the number of upper order statistics . . . . . . . . . . . 34 7 The 7.1 7.2 7.3 7.4 Peaks Over Threshold (POT) method How to choose a high threshold. . . . . . . . . . . . Meanexcess plot . . . . . . . . . . . . . . . . . . . Parameter estimation . . . . . . . . . . . . . . . . Estimation of ValueatRisk and Expected shortfall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 37 38 39 40 43 43 44 44 44 48 50
8 Multivariate distributions and dependence 8.1 Basic properties of random vectors . . . . . 8.2 Joint log return distributions . . . . . . . . 8.3 Comonotonicity and countermonotonicity . 8.4 Covariance and linear correlation . . . . . . 8.5 Rank correlation . . . . . . . . . . . . . . . 8.6 Tail dependence . . . . . . . . . . . . . . . . i
9 Multivariate elliptical distributions 9.1 The multivariate normal distribution . . . . . 9.2 Normal mixtures . . . . . . . . . . . . . . . . 9.3 Spherical distributions . . . . . . . . . . . . . 9.4 Elliptical distributions . . . . . . . . . . . . . 9.5 Properties of elliptical distributions . . . . . . 9.6 Elliptical distributions and risk management 10 Copulas 10.1 Basic properties . . . . . . . . . . . . . . . . . 10.2 Dependence measures . . . . . . . . . . . . . 10.3 Elliptical copulas . . . . . . . . . . . . . . . . 10.4 Simulation from Gaussian and tcopulas . . . 10.5 Archimedean copulas . . . . . . . . . . . . . . 10.6 Simulation from Gumbel and Clayton copulas 10.7 Fitting copulas to data . . . . . . . . . . . . . 10.8 Gaussian and tcopulas . . . . . . . . . . . . . 11 Portfolio credit risk modeling 11.1 A simple model . . . . . . . . . . . . 11.2 Latent variable models . . . . . . . . 11.3 Mixture models . . . . . . . . . . . . 11.4 Onefactor Bernoulli mixture models 11.5 Probit normal mixture models . . . . 11.6 Beta mixture models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
53 53 54 54 55 57 58 61 61 66 70 72 73 76 78 79 81 81 82 83 86 87 88
12 Popular portfolio credit risk models 90 12.1 The KMV model . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 12.2 CreditRisk+ – a Poisson mixture model . . . . . . . . . . . . . . 94 A A few probability facts 100 A.1 Convergence concepts . . . . . . . . . . . . . . . . . . . . . . . . 100 A.2 Limit theorems and inequalities . . . . . . . . . . . . . . . . . . . 100 B Conditional expectations 101 B.1 Deﬁnition and properties . . . . . . . . . . . . . . . . . . . . . . . 101 B.2 An expression in terms the density of (X, Z) . . . . . . . . . . . 102 B.3 Orthogonality and projections in Hilbert spaces . . . . . . . . . . 103
ii
We will introduce statistical techniques used for deriving the proﬁtandloss distribution for a portfolio of ﬁnancial instruments and to compute risk measures associated with this distribution. Financial institutions typically hold portfolios consisting on large number of ﬁnancial instruments. statistics and an intermediate course in probability. A careful modeling of the dependence between these instruments is crucial for good risk management in these situations.Preface These lecture notes aims at giving an introduction to Quantitative Risk Management. More material on the topics presented in remaining chapters can be found in [8] (chapters 57). Henrik Hult and Filip Lindskog. A large part of these lecture notes is therefore devoted to the issue of dependence modeling. The reader is assumed to have a mathematical/statistical knowledge corresponding to basic courses in linear algebra. The lecture notes are written with the aim of presenting the material in a fairly rigorous way without any use of measure theory. [12] (chapters 812) and articles found in the list of references at the end of these lecture notes. The chapters 14 in these lecture notes are based on the book [12] which we strongly recommend. The focus lies on the mathematical/statistical modeling of market. 2007 iii .and credit risk. Operational risk and the use of ﬁnancial time series for risk modeling is not treated in these lecture notes. analysis.
1 Some background to financial risk management

We will now give a brief introduction to the topic of risk management and explain why it may be of importance for a bank or financial institution. We will start with a preliminary example illustrating in a simple way some of the issues encountered when dealing with risks and risk measurements.

1.1 A preliminary example

A player (investor/speculator) is entering a casino with an initial capital of V0 = 1 million Swedish Kroner. All initial capital is used to place bets according to a predetermined gambling strategy. After the game the capital is V1. We denote the profit (loss) by a random variable X = V1 − V0. The distribution of X is called the profit-and-loss distribution (P&L), and the distribution of L = −X = V0 − V1 is simply called the loss distribution. As the loss may be positive this is a risky position, i.e. there is a risk of losing some of the initial capital.

Suppose a game is constructed so that it gives 1.6 million Swedish Kroner with probability p and 0.6 million Swedish Kroner with probability 1 − p. Hence,

    X = 0.6 with probability p,
    X = −0.4 with probability 1 − p.                              (1.1)

The fair price for this game, corresponding to E(X) = 0, is p = 0.4. However, even if p > 0.4 the player might choose not to participate in the game, with the view that not participating is more attractive than playing a game with a small expected profit together with a risk of losing 0.4 million Swedish Kroner. This attitude is called risk-averseness.

Clearly, the choice of whether to participate or not depends on the P&L distribution. However, in most cases (think of investing in instruments on the financial market) the P&L distribution is not known. Then you need to evaluate some aspects of the distribution to decide whether to play or not. For this purpose it is natural to use a risk measure. A risk measure is a mapping from the random variables to the real numbers: to every loss random variable L there is a real number ρ(L) representing the riskiness of L. To evaluate the loss distribution in terms of a single number is of course a huge simplification of the world, but the hope is that it can give us sufficient indication whether to play the game or not.

Consider the game (1.1) described above and suppose that the mean E(L) = −0.1 (i.e. a positive expected profit) and the standard deviation std(L) = 0.5 of the loss L are known. In this case the game has only two known possible outcomes, so the information about the mean and standard deviation uniquely specifies the P&L distribution, yielding p = 0.5. However, the possible outcomes of a typical real-world game are typically not known, and the mean and standard deviation do not specify the P&L distribution.
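The two-point game above is small enough to check numerically. The following Python sketch (our own illustration, not part of the original notes; the helper name is ours) computes E(L) and std(L) for the loss L = −X of the game (1.1):

```python
# Moments of the loss L = -X for the two-point game (1.1),
# where X = 0.6 w.p. p and X = -0.4 w.p. 1 - p (millions of SEK).
def loss_moments(p):
    losses = [-0.6, 0.4]              # L = -X
    probs = [p, 1.0 - p]
    mean = sum(l * q for l, q in zip(losses, probs))
    var = sum((l - mean) ** 2 * q for l, q in zip(losses, probs))
    return mean, var ** 0.5

# With p = 0.5 we recover E(L) = -0.1 and std(L) = 0.5 from the text,
# up to floating-point rounding.
mean, std = loss_moments(0.5)
print(mean, std)
```

With p = 0.4 the same helper gives E(L) = 0, the fair-price case mentioned above.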
A key to a sound risk management is to look for risk measures that give as much relevant information about the loss distribution as possible. However, evaluating a single risk measure such as a quantile will in general not provide a lot of information about the loss distribution, although it can provide some relevant information. A simple example is the following:

    X = 0.35 with probability 0.8,
    X = −0.9 with probability 0.2.                                (1.2)

Here we also have E(L) = −0.1 and std(L) = 0.5. However, most risk-averse players would agree that the game (1.2) is riskier than the game (1.1) with p = 0.5. Using an appropriate quantile of the loss L as a risk measure would classify the game (1.2) as riskier than the game (1.1) with p = 0.5.

A risk manager at a financial institution with responsibility for a portfolio consisting of a few up to hundreds or thousands of financial assets and contracts faces a similar problem as the player above entering the casino. We may view a financial investor as a player participating in the game at the financial market, and the loss distribution must be evaluated in order to know which game the investor is participating in. To evaluate the position the risk manager tries to assess the loss distribution to make sure that the current positions are in accordance with risk preferences imposed by management or investors. If they are not, then the risk manager must rebalance the portfolio until a desirable loss distribution is obtained.

1.2 Why risk management?

The trading volumes on the financial markets have increased tremendously over the last decades. In 1970 the average daily trading volume at the New York Stock Exchange was 3.5 million shares. In 2002 it was 1.4 billion shares. In the last few years we have also seen a significant increase in the derivatives markets. There are a huge number of actors on the financial markets taking risky positions, and to evaluate their positions properly they need quantitative tools from risk management.

    Contracts        1995   1998   2002
    FOREX              13     18     18
    Interest rate      26     50    102
    Total              47     80    142

Table 1: Global market in OTC derivatives (nominal value) in trillion US dollars (1 trillion = 10^12).

Recent history also shows several examples where large losses on the financial market are mainly due to the absence of proper risk control.
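Returning for a moment to the games (1.1) and (1.2): the claim that a quantile-based risk measure separates them can be verified directly. The sketch below (our illustration; the helper name `quantile` is ours) evaluates the generalized inverse of the two discrete loss distributions at level 0.9:

```python
# Compare a quantile risk measure for the losses L = -X of game (1.1)
# with p = 0.5 and game (1.2).  Both have mean -0.1 and std 0.5, yet
# the 90% quantile ranks game (1.2) as riskier.
def quantile(points, alpha):
    """Generalized inverse of a discrete loss distribution;
    points is a list of (loss, probability) pairs."""
    cum = 0.0
    for loss, prob in sorted(points):
        cum += prob
        if cum >= alpha:
            return loss
    return sorted(points)[-1][0]

game1 = [(-0.6, 0.5), (0.4, 0.5)]    # L = -X for game (1.1), p = 0.5
game2 = [(-0.35, 0.8), (0.9, 0.2)]   # L = -X for game (1.2)

print(quantile(game1, 0.9), quantile(game2, 0.9))  # 0.4 versus 0.9
```

The 90% quantile of the loss is 0.4 for game (1.1) but 0.9 for game (1.2), matching the intuition above.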
around $1.6 billion from a wrong-way bet on interest rates in one of its principal investment pools. (Source: www.erisk.com)

Example 1.2 (Barings bank) Barings bank had a long history of success and was much respected as the UK's oldest merchant bank. But in February 1995, this highly regarded bank, with $900 million in capital, was bankrupted by $1 billion of unauthorized trading losses. (Source: www.erisk.com)

Example 1.3 (LTCM) In 1994 a hedge-fund called Long-Term Capital Management (LTCM) was founded and assembled a star team of traders and academics. Investors and investment banks invested $1.3 billion in the fund, and after two years returns were running close to 40%. Early 1998 the net asset value stood at $4 billion, but at the end of the year the fund had lost substantial amounts of the investors' equity capital and was at the brink of default. The US Federal Reserve managed a $3.5 billion rescue package to avoid the threat of a systemic crisis in the world financial system. (Source: www.erisk.com)

1.3 Regulators and supervisors

To be able to cover most financial losses, most banks and financial institutions put aside a buffer capital, also called regulatory capital. The amount of buffer capital needed is of course related to the amount of risk the bank is taking, i.e. to the overall P&L distribution. The amount is regulated by law, and the national supervisory authority makes sure that the banks and financial institutions follow the rules.

There is also a strive to develop international standards and methods for computing regulatory capital. This is the main task of the so-called Basel Committee. The Basel Committee, established in 1974, does not possess any formal supranational supervising authority and its conclusions do not have legal force. It formulates supervisory standards and guidelines and recommends statements of best practice. In this way the Basel Committee has large impact on the national supervisory authorities.

• In 1988 the first Basel Accord on Banking Supervision [2] initiated an important step toward an international minimal capital standard. Emphasis was on credit risk.

• In 1996 an amendment to Basel I prescribed a so-called standardized model for market risk, with an option for larger banks to use internal Value-at-Risk (VaR) models.

• In 2001 a new consultative process for the new Basel Accord (Basel II) was initiated. The main theme concerns advanced internal approaches to credit risk and also new capital requirements for operational risk. The new Accord aims at an implementation date of 2006-2007. Details of Basel II are still hotly debated.
1.4 Why the government cares about the buffer capital

The following motivation is given in [6]:

"Banks collect deposits and play a key role in the payment system. National governments have a very direct interest in ensuring that banks remain capable of meeting their obligations; in effect they act as a guarantor, sometimes also as a lender of last resort. They therefore wish to limit the cost of the safety net in case of a bank failure. By acting as a buffer against unanticipated losses, the regulatory capital helps to privatize a burden that would otherwise be borne by national governments."

1.5 Types of risk

Here is a general definition of risk for an organization: any event or action that may adversely affect an organization's ability to achieve its obligations and execute its strategies. In financial risk management we try to be a bit more specific and divide most risks into three categories.

• Market risk – risks due to changing markets, i.e. market prices, interest rate fluctuations, foreign exchange rate changes, commodity price changes etc.

• Credit risk – the risk carried by the lender that a debtor will not be able to repay his/her debt or that a counterparty in a financial agreement can not fulfill his/her commitments.

• Operational risk – the risk of losses resulting from inadequate or failed internal processes, people and systems, or from external events. This includes people risks such as incompetence and fraud, process risk such as transaction and operational control risk, and technology risk such as system failure, programming errors etc.

There are also other types of risks, such as liquidity risk, which concerns the need for well functioning financial markets where one can buy or sell contracts at fair prices. Other types of risks are for instance legal risk and reputational risk.

1.6 Financial derivatives

Financial derivatives are financial products or contracts derived from some fundamental underlying: a stock price, stock index, interest rate, commodity price, to name a few. The key example is the European call option written on a particular stock. It gives the holder the right but not the obligation at a given date T to buy the stock S for the price K. For this the buyer pays a premium at time zero. The value of the European call at time T is then

    C(T) = max(ST − K, 0).

Financial derivatives are traded not only for the purpose of speculation but are actively used as a risk management tool, as they are tailor made for exchanging risks between actors on the financial market. Although they are of great importance in risk management, we will not discuss financial derivatives much in this course but put emphasis on the statistical models and methods for modeling financial data.
2 Loss operators and financial portfolios

Here we follow [12] to introduce the loss operator and give some examples of financial portfolios that fit into this framework.

2.1 Portfolios and the loss operator

Consider a given portfolio such as for instance a collection of stocks, bonds or risky loans, or the overall position of a financial institution. The value of the portfolio at time t is denoted V(t). Given a time horizon ∆t, the profit over the interval [t, t + ∆t] is given by V(t + ∆t) − V(t), and the distribution of V(t + ∆t) − V(t) is called the profit-and-loss distribution (P&L). The loss over the interval is then

    L[t,t+∆t] = −(V(t + ∆t) − V(t))

and the distribution of L[t,t+∆t] is called the loss distribution. Typical values of ∆t are one day (or 1/250 years, as we have approximately 250 trading days in one year), ten days, one month or one year.

We may introduce a discrete parameter n = 0, 1, 2, ... and use tn = n∆t as the actual time. We will sometimes use the notation

    Ln+1 = L[tn,tn+1] = L[n∆t,(n+1)∆t] = −(V((n + 1)∆t) − V(n∆t)).

Often we also write Vn for V(n∆t).

Example 2.1 Consider a portfolio of d stocks with αi units of stock number i, i = 1, ..., d. The stock prices at time n are given by Sn,i, i = 1, ..., d, and the value of the portfolio is

    Vn = sum_{i=1}^d αi Sn,i.

In financial statistics one often tries to find a statistical model for the evolution of the stock prices, e.g. a model for Sn+1,i − Sn,i, to be able to compute the loss distribution of Ln+1. However, it is often the case that the so-called log returns Xn+1,i = ln Sn+1,i − ln Sn,i are easier to model than the differences Sn+1,i − Sn,i. With Zn,i = ln Sn,i we have Sn,i = exp{Zn,i}, so the portfolio loss Ln+1 = −(Vn+1 − Vn) may be written as

    Ln+1 = − sum_{i=1}^d αi (exp{Zn+1,i} − exp{Zn,i})
         = − sum_{i=1}^d αi exp{Zn,i}(exp{Xn+1,i} − 1)
         = − sum_{i=1}^d αi Sn,i (exp{Xn+1,i} − 1).

The relation between the modeled variables Xn+1,i and the loss Ln+1 is nonlinear, and it is sometimes useful to linearize this relation. In this case this is done by replacing e^x by 1 + x; recall the Taylor expansion e^x = 1 + x + O(x^2). Then the linearized loss is given by

    L∆_{n+1} = − sum_{i=1}^d αi Sn,i Xn+1,i.

2.2 The general case

A general portfolio with value Vn is often modeled using a d-dimensional random vector Zn = (Zn,1, ..., Zn,d) of risk-factors. The value of the portfolio is then expressed as Vn = f(tn, Zn) for some known function f, where tn is the actual calendar time. As in the example above it is often convenient to model the risk-factor changes Xn+1 = Zn+1 − Zn. Then the loss is given by

    Ln+1 = −(Vn+1 − Vn) = −(f(tn+1, Zn + Xn+1) − f(tn, Zn)).

The loss may be viewed as the result of applying an operator l[n](·) to the risk-factor changes Xn+1, so that Ln+1 = l[n](Xn+1), where

    l[n](x) = −(f(tn+1, Zn + x) − f(tn, Zn)).

The operator l[n](·) is called the loss-operator. If we want to linearize the relation between Ln+1 and Xn+1, then we have to differentiate f to get the linearized loss

    L∆_{n+1} = −(ft(tn, Zn)∆t + sum_{i=1}^d fzi(tn, Zn) Xn+1,i).

Here ft(t, z) = ∂f(t, z)/∂t and fzi(t, z) = ∂f(t, z)/∂zi. The corresponding operator, given by

    l∆_[n](x) = −(ft(tn, Zn)∆t + sum_{i=1}^d fzi(tn, Zn) xi),

is called the linearized loss-operator.
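As a small illustration of the formulas in Example 2.1 (ours, not from the notes; names and numbers are hypothetical), the following sketch evaluates the loss operator and its linearization for a two-stock portfolio. For small log returns the two agree closely:

```python
import math

# Loss operator and linearized loss operator for a stock portfolio,
# following Example 2.1:
#   L_{n+1}       = -sum_i alpha_i * S_{n,i} * (exp(x_i) - 1)
#   L^Delta_{n+1} = -sum_i alpha_i * S_{n,i} * x_i
def loss_operator(alpha, S, x):
    return -sum(a * s * (math.exp(xi) - 1.0) for a, s, xi in zip(alpha, S, x))

def linearized_loss_operator(alpha, S, x):
    return -sum(a * s * xi for a, s, xi in zip(alpha, S, x))

alpha = [5.0, 10.0]       # units held in each stock
S = [100.0, 50.0]         # today's prices S_{n,i}
x = [0.01, -0.02]         # one-day log returns x_i

print(loss_operator(alpha, S, x), linearized_loss_operator(alpha, S, x))
```

The approximation error is of order O(x^2) per position, consistent with the Taylor expansion above.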
Example 2.2 (Portfolio of stocks continued) In the previous example of a portfolio of stocks, the risk-factors are the log-prices of the stocks, Zn,i = ln Sn,i, and the risk-factor changes are the log returns Xn+1,i = ln Sn+1,i − ln Sn,i. The loss is Ln+1 = l[n](Xn+1) and the linearized loss is L∆_{n+1} = l∆_[n](Xn+1), where

    l[n](x) = − sum_{i=1}^d αi Sn,i (exp{xi} − 1)   and   l∆_[n](x) = − sum_{i=1}^d αi Sn,i xi

are the loss operator and linearized loss operator, respectively.

The following examples may convince you that there are many relevant examples from finance that fit into this general framework of risk-factors and loss-operators.

Example 2.3 (A bond portfolio) A zero-coupon bond with maturity T is a contract which gives the holder of the contract $1 at time T. The price of the contract at time t < T is denoted B(t, T), and by definition B(T, T) = 1. To a zero-coupon bond we associate the continuously compounded yield

    y(t, T) = − (1/(T − t)) ln B(t, T),

i.e. B(t, T) = exp{−(T − t) y(t, T)}. To understand the notion of the yield, consider a bank account where we get a constant interest rate r. This means that in every infinitesimal time interval dt we get the interest rate r, i.e. the bank account evolves according to the differential equation

    dSt/dt = r St,   S0 = 1,

which has the solution St = exp{rt}. Every dollar put into the bank account at time t is then worth exp{r(T − t)} dollars at time T. Hence, if we have exp{−r(T − t)} dollars on the account at time t, then we have exactly $1 at time T. The yield y(t, T) can be identified with r and is interpreted as the constant interest rate contracted at t that we get over the period [t, T]. However, the yield may be different for different maturity times T. The function T → y(t, T), for fixed t, is called the yield-curve at t.

Consider now a portfolio consisting of d different (default free) zero-coupon bonds with maturity times Ti and prices B(t, Ti). We suppose that we have αi units of the bond with maturity Ti, so that the value of the portfolio is

    Vn = sum_{i=1}^d αi B(tn, Ti).

It is natural to use the yields Zn,i = y(tn, Ti) as risk-factors. Then

    Vn = sum_{i=1}^d αi exp{−(Ti − tn) Zn,i} = f(tn, Zn)

and the loss is given by (with Xn+1,i = Zn+1,i − Zn,i and ∆t = tn+1 − tn)

    Ln+1 = − sum_{i=1}^d αi (exp{−(Ti − tn+1)(Zn,i + Xn+1,i)} − exp{−(Ti − tn) Zn,i})
         = − sum_{i=1}^d αi B(tn, Ti)(exp{Zn,i ∆t − (Ti − tn+1) Xn+1,i} − 1).

The corresponding loss-operator is then

    l[n](x) = − sum_{i=1}^d αi B(tn, Ti)(exp{Zn,i ∆t − (Ti − tn+1) xi} − 1)

and the linearized loss is given by

    L∆_{n+1} = − sum_{i=1}^d αi B(tn, Ti)(Zn,i ∆t − (Ti − tn+1) Xn+1,i).

Example 2.4 (A European call option) In this example we consider a portfolio consisting of one European call option on a non-dividend paying stock S, with maturity date T and strike price K. A European call option is a contract which pays max(ST − K, 0) to the holder of the contract at time T. The price at time t < T for the contract is evaluated using a function C depending on some parameters. In the Black-Scholes model C depends on four parameters, C = C(t, S, r, σ), where t is the current time, S is the stock price today at time t, r is the interest rate and σ is called the volatility, which is a measure of the standard deviation of the return of the stock. In this case we may put

    Zn = (ln Sn, rn, σn)^T,
    Xn+1 = (ln Sn+1 − ln Sn, rn+1 − rn, σn+1 − σn)^T.

The value of the portfolio is then Vn = C(tn, Sn, rn, σn) and the linearized loss is given by

    L∆_{n+1} = −(Ct ∆t + CS Sn Xn+1,1 + Cr Xn+1,2 + Cσ Xn+1,3).

(The factor Sn appears in the first term since the first risk-factor is the log-price ln Sn, so ∂C/∂z1 = CS Sn.) The partial derivatives are usually called the "Greeks": Ct is called theta; CS is called delta; Cr is called rho; Cσ is called vega (although this is not a Greek letter).
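The bond-portfolio loss-operator of Example 2.3 can be evaluated directly. The sketch below (our illustration; function names and the numerical inputs are hypothetical) prices a single zero-coupon bond off its yield and applies the loss-operator to a yield change:

```python
import math

# Loss operator for the zero-coupon bond portfolio of Example 2.3,
# with yields z_i = y(t_n, T_i) as risk-factors and x_i the yield
# changes over [t_n, t_n + dt].
def bond_price(t, T, y):
    # B(t, T) = exp(-(T - t) * y(t, T))
    return math.exp(-(T - t) * y)

def bond_loss_operator(alpha, t, dt, T, z, x):
    loss = 0.0
    for a, Ti, zi, xi in zip(alpha, T, z, x):
        B = bond_price(t, Ti, zi)
        loss -= a * B * (math.exp(zi * dt - (Ti - (t + dt)) * xi) - 1.0)
    return loss

alpha = [100.0]            # 100 units of a single bond
t, dt = 0.0, 1.0 / 250.0   # one trading day
T = [5.0]                  # five-year maturity
z = [0.04]                 # current yield 4%
x = [0.001]                # yield rises by 10 basis points

print(bond_loss_operator(alpha, t, dt, T, z, x))
```

As expected, a rise in yields produces a positive loss on a long bond position, since bond prices fall when yields increase.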
3 Risk measurement

What is the purpose of risk measurement?

• Determination of risk capital – determine the amount of capital a financial institution needs to cover unexpected losses.

• Management tool – risk measures are used by management to limit the amount of risk a unit within the firm may take.

Next we present some common measures of risk.

3.1 Elementary measures of risk

Notional amount

The risk of a portfolio is measured as the sum of notional values of the individual securities, weighted by a factor for each asset class. The notional amount approach is used for instance in the standard approach of the Basel Committee, where risk weights 0%, 10%, 20%, 50% and 100% are used (see [2]). The regulatory capital should then be such that

    regulatory capital ≥ 8% · risk-weighted sum.

Example 3.1 Suppose we have a portfolio with three claims, each of notional amount $1 million. The first claim is on an OECD central bank, the second on a multilateral developed bank and the third on the private sector. According to the risk-weights used by the Basel document, the first claim is weighted by 0%, the second by 20% and the third by 100%. Thus the risk-weighted sum is

    0 × 10^6 + 0.20 × 10^6 + 1 × 10^6 = 1 200 000,

and the regulatory capital should be at least 8% of this amount, i.e. $96 000.

Advantage: easy to use. Disadvantage: does not differentiate between long and short positions, and there are no diversification effects.

Factor sensitivity measures

Factor sensitivity measures give the change in portfolio value for a predetermined change in one of the underlying risk factors. If the value of the portfolio is given by Vn = f(tn, Zn), then factor sensitivity measures are given by the partial derivatives

    fzi(tn, Zn) = ∂f(tn, Zn)/∂zi.

The "Greeks" of a derivative portfolio may be considered as factor sensitivity measures.

Advantage: factor sensitivity measures provide information about the robustness of the portfolio value with respect to certain events (risk-factor changes). Disadvantage: it is difficult to aggregate the sensitivity with respect to changes in different risk-factors, or to aggregate across markets, to get an understanding of the overall riskiness of a position.

Scenario based risk measures

In this approach we consider a number of possible scenarios. A scenario may be for instance a 10% rise in a relevant exchange rate and a simultaneous 20% drop in a relevant stock index. The risk is then measured as the maximum loss over all possible (predetermined) scenarios. To assess the maximum loss, extreme scenarios may be down-weighted in a suitable way.

Formally, this approach may be formulated as follows. Fix a number N of possible risk-factor changes, X = {x1, x2, ..., xN}. Each scenario is given a weight wi, and we write w = (w1, ..., wN). We consider a portfolio with loss-operator l[n](·). The risk of the portfolio is then measured as

    ψ[X,w] = max{w1 l[n](x1), ..., wN l[n](xN)}.

These risk measures are frequently used in practice (example: Chicago Mercantile Exchange).

Example 3.2 (The SPAN rules) As an example of a scenario based risk measure we consider the SPAN rules used at the Chicago Mercantile Exchange [1]. We describe how the initial margin is calculated for a simple portfolio consisting of units of a futures contract and of several puts and calls with a common expiration date on this futures contract. The SPAN margin for such a portfolio is computed as follows. First, fourteen scenarios are considered. Each scenario is specified by an up or down move of volatility combined with no move, or an up move, or a down move of the futures price by 1/3, 2/3 or 3/3 of a specific "range". Next, two additional scenarios relate to "extreme" up or down moves of the futures price. The measure of risk is the maximum loss incurred, using the full loss for the first fourteen scenarios and only 35% of the loss for the last two "extreme" scenarios. A specified model, typically the Black model, is used to generate the corresponding prices for the options under each scenario. The account of the investor holding a portfolio is required to have sufficient current net worth to support the maximum expected loss. If it does not, then extra cash is required as margin call, an amount equal to the "measure of risk" involved.

Loss distribution approach

This is the approach a statistician would use to compute the risk of a portfolio. Here we try to model the loss Ln+1 using a probability distribution FL.
The parameters of the loss distribution are estimated using historical data. One can either try to model the loss distribution directly, or model the risk-factors or risk-factor changes as a d-dimensional random vector or a multivariate time series. Risk measures are based on the distribution function FL. The next section studies this approach to risk measurement.

3.2 Risk measures based on the loss distribution

Standard deviation

The standard deviation of the loss distribution as a measure of risk is frequently used, in particular in portfolio theory. There are however some disadvantages to using the standard deviation as a measure of risk. For instance, the standard deviation is only defined for distributions with E(L^2) < ∞, so it is undefined for random variables with very heavy tails. More important, profits and losses have equal impact on the standard deviation as risk measure, and it does not discriminate between distributions with apparently different probability of potentially large losses. In fact, the standard deviation does not provide any information on how large potential losses may be.

Example 3.3 Consider for instance the two loss distributions L1 ∼ N(0, 2) and L2 ∼ t4 (standard Student's t-distribution with 4 degrees of freedom). Both L1 and L2 have standard deviation equal to √2. The probability densities are illustrated in Figure 1. The t4 is highly peaked around zero and has much heavier tails. Clearly the probability of large losses is much higher for the t4 than for the normal distribution.

Figure 1: Left/Middle: The density functions of the N(0, 2) and t4 distributions. Right: The log-ratio ln[P(L2 > x)/P(L1 > x)] is plotted.

Value-at-Risk

We now introduce the widely used risk measure known as Value-at-Risk.
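Before turning to Value-at-Risk, the tail comparison in Example 3.3 is easy to check numerically. The sketch below (our illustration, not from the notes) uses the standard closed form of the t4 tail, P(T4 > x) = 1/2 − (3/4)s + (1/4)s^3 with s = x/√(x^2 + 4), and the complementary error function for the normal tail:

```python
import math

# Tail probabilities P(L > x) for L1 ~ N(0, 2) (variance 2) and
# L2 ~ t4, as in Example 3.3.  Both have standard deviation sqrt(2).
def normal_tail(x, var):
    # P(N(0, var) > x) = 0.5 * erfc(x / sqrt(2 * var))
    return 0.5 * math.erfc(x / math.sqrt(2.0 * var))

def t4_tail(x):
    # closed-form survival function of the Student t with 4 d.o.f.
    s = x / math.sqrt(x * x + 4.0)
    return 0.5 - 0.75 * s + 0.25 * s ** 3

x = 5.0
print(normal_tail(x, 2.0), t4_tail(x))
```

At x = 5 the t4 tail probability exceeds the normal one by more than an order of magnitude, which is exactly the heavy-tail effect the standard deviation fails to capture.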
Definition 3.1 Given a loss L and a confidence level α ∈ (0, 1), VaRα(L) is given by the smallest number l such that the probability that the loss L exceeds l is no larger than 1 − α, i.e.

    VaRα(L) = inf{l ∈ R : P(L > l) ≤ 1 − α}
            = inf{l ∈ R : 1 − FL(l) ≤ 1 − α}
            = inf{l ∈ R : FL(l) ≥ α}.

We might think of L as the (potential) loss resulting from holding a portfolio over some fixed time horizon. In market risk the time horizon is typically one day or ten days. In credit risk the portfolio may consist of loans and the time horizon is often one year. Common confidence levels are 95%, 99% and 99.9%, depending on the application. The BIS (Bank of International Settlements) proposes, for market risk, to compute ten-day VaR with confidence level α = 99%. The time horizon of ten days reflects the fact that markets are not perfectly liquid.

Figure 2: Illustration of VaR0.95.

Definition 3.2 Given a non-decreasing function F : R → R, the generalized inverse of F is given by

    F←(y) = inf{x ∈ R : F(x) ≥ y},

with the convention inf ∅ = ∞. If F is strictly increasing, then F← = F−1, the usual inverse. Using the generalized inverse we define the α-quantile of F by

    qα(F) = F←(α) = inf{x ∈ R : F(x) ≥ α},   α ∈ (0, 1).
We note also that VaRα(F) = qα(F), where F is the loss distribution. Notice that for a > 0 and b ∈ R,

    VaRα(aL + b) = inf{l ∈ R : P(aL + b ≤ l) ≥ α}
                 = inf{l ∈ R : P(L ≤ (l − b)/a) ≥ α}      {let l' = (l − b)/a}
                 = inf{al' + b ∈ R : P(L ≤ l') ≥ α}
                 = a inf{l' ∈ R : P(L ≤ l') ≥ α} + b
                 = a VaRα(L) + b.

Hence, the risk measured in VaR for a shares of a portfolio is a times the risk of one share of this portfolio. Moreover, adding (b < 0) or withdrawing (b > 0) an amount of money from the portfolio changes this risk by the same amount.

Example 3.4 Suppose the loss distribution is normal, L ∼ N(µ, σ^2). This means that L =d µ + σL', where L' ∼ N(0, 1). Since VaRα(L') = Φ−1(α), where Φ is the distribution function of a standard normal random variable, we may compute the Value-at-Risk as VaRα(L) = µ + σΦ−1(α).

Example 3.5 Suppose that the distribution function F is given by

    F(x) = 0 for x < 0,   1/2 for x ∈ [0, 1),   1 for x ≥ 1.

Then F←(u) = 0 on (0, 1/2] and F←(u) = 1 on (1/2, 1). (F is the distribution function of a random variable X with P(X = 0) = P(X = 1) = 1/2.)

Example 3.6 You hold a portfolio consisting of a long position of α = 5 shares of stock A. The stock price today is S0 = 100. The daily log returns X1 = ln(S1/S0), X2 = ln(S2/S1), ... of stock A are assumed to be normally distributed with zero mean and standard deviation σ = 0.1. Let L1 be the portfolio loss from today until tomorrow. We have shown that L1 = −αS0(e^{X1} − 1) = −500(e^{X1} − 1).

(a) Compute VaR0.99(L1).

We have VaRu(L1) = FL1^{−1}(u), and to compute FL1^{−1}(u) we use that FL1(FL1^{−1}(u)) = u, where

    FL1(l) = P(−500(e^{X1} − 1) ≤ l) = P(e^{X1} ≥ 1 − l/500)
           = P(X1 ≥ ln(1 − l/500)) = 1 − FX1(ln(1 − l/500)).

Hence,

    1 − FX1(ln(1 − FL1^{−1}(u)/500)) = u
    ⇔ ln(1 − FL1^{−1}(u)/500) = FX1^{−1}(1 − u)
    ⇔ FL1^{−1}(u) = 500(1 − e^{FX1^{−1}(1−u)}).

Since X1 is symmetric about 0 we have FX1^{−1}(1 − u) = −FX1^{−1}(u). Hence, FL1^{−1}(u) = 500(1 − e^{−FX1^{−1}(u)}). Using that FX1^{−1}(0.99) = 0.1 · FZ^{−1}(0.99) ≈ 0.23, where Z is normally distributed with zero mean and standard deviation one, we get with the help of Figure 3

    VaR0.99(L1) = FL1^{−1}(0.99) = 500(1 − e^{−FX1^{−1}(0.99)}) ≈ 500(1 − e^{−0.23}) ≈ 500(1 − 0.8) = 100.

(b) You decide to keep your portfolio for 100 (trading) days before deciding what to do with it. Compute VaR0.99(L100) and VaR0.99(L∆_100), where L100 denotes the loss from today until 100 days from today and L∆_100 denotes the corresponding linearized 100-day loss.

We have L100 = −αS0(e^{X100} − 1), where X100 is the 100-day log return. Notice that

    X100 = ln S100/S0 = ln S1/S0 + · · · + ln S100/S99,

i.e. X100 is a sum of 100 independent normally distributed random variables with mean zero and standard deviation 0.1. Hence X100 =d Z. As above,

    VaR0.99(L100) = 500(1 − e^{−FZ^{−1}(0.99)}) ≈ 500(1 − e^{−2.3}).

Using Figure 4 we find that e^{−2.3} = 1/e^{2.3} ≈ 0.1. Hence,

    VaR0.99(L100) ≈ 500(1 − e^{−2.3}) ≈ 450.

For the linearized loss we have L∆_100 = −500Z. Hence,

    VaR0.99(L∆_100) = 500 FZ^{−1}(0.99) ≈ 500 · 2.3 = 1150.

One sees that using the linearized loss here gives a very bad risk estimate.

Figure 3: Plot of the function e^x for x ∈ [−0.5, 0.5].

Figure 4: Plot of the function e^x for x ∈ [−3, 3].
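The computations in Example 3.6 can be reproduced without the graphical approximations. The sketch below (ours, not from the notes) uses the exact standard normal quantile instead of the rounded value 2.3, so the numbers come out slightly larger than in the text:

```python
import math
from statistics import NormalDist

# Numerical check of Example 3.6:
# L1 = -500(exp(X1) - 1) with X1 ~ N(0, 0.1^2), and the 100-day analogues.
z99 = NormalDist().inv_cdf(0.99)                 # roughly 2.33

var_1day = 500.0 * (1.0 - math.exp(-0.1 * z99))  # VaR_0.99(L1), about 100
var_100day = 500.0 * (1.0 - math.exp(-z99))      # VaR_0.99(L100), about 450
var_100day_lin = 500.0 * z99                     # VaR_0.99(L^Delta_100) = 500 * z99

print(var_1day, var_100day, var_100day_lin)
```

The linearized 100-day figure is more than twice the exact one, confirming that the linearization is a very bad approximation over long horizons.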
Expected shortfall

Although Value-at-Risk has become a very popular risk measure among practitioners, it has several limitations. For instance, it does not give any information about how bad losses may be when things go wrong. In other words, what is the size of an "average loss" given that the loss exceeds the 99%-Value-at-Risk?

Definition 3.3 For a loss L with continuous loss distribution function FL, the expected shortfall at confidence level α ∈ (0, 1) is given by

    ESα(L) = E(L | L ≥ VaRα(L)).

We can rewrite this as follows:

    ESα(L) = E(L | L ≥ VaRα(L))
           = E(L I[qα(L),∞)(L)) / P(L ≥ qα(L))
           = (1/(1 − α)) E(L I[qα(L),∞)(L))
           = (1/(1 − α)) ∫_{qα(L)}^∞ l dFL(l),

where IA is the indicator function: IA(x) = 1 if x ∈ A and 0 otherwise.

For a discrete distribution there are different possibilities to define expected shortfall. A useful definition, called generalized expected shortfall, which is a so-called coherent risk measure, is given by

    GESα(L) = (1/(1 − α)) E(L I[qα(L),∞)(L)) + qα(L) (1 − α − P(L ≥ qα(L)))/(1 − α).

If the distribution of L is continuous, then the second term vanishes and GESα = ESα.

Lemma 3.1 For a loss L with continuous distribution function FL, the expected shortfall is given by

    ESα(L) = (1/(1 − α)) ∫_α^1 VaRp(L) dp.

Exercise 3.1 (a) Let L ∼ Exp(λ) and calculate ESα(L). (b) Let L have distribution function F(x) = 1 − (1 + γx)^{−1/γ}, x ≥ 0, γ ∈ (0, 1), and calculate ESα(L).

Answer: (a) λ^{−1}(1 − ln(1 − α)). (b) γ^{−1}[(1 − α)^{−γ}(1 − γ)^{−1} − 1].
Example 3. σ 2 ). 18 .7 Suppose that L ∼ N(0. Then 1 ESα (L) = 1−α = = 1 1−α ∞ Φ−1 (α) ∞ Φ−1 (α) ∞ ldΦ(l) lφ(l)dl 2 1 1 l √ e−l /2 dl 1 − α Φ−1 (α) 2π ∞ 2 1 1 = − √ e−l /2 1−α Φ−1 (α) 2π ∞ 1 − φ(l) = 1−α Φ−1 (α) φ(Φ−1 (α)) = . 1−α ν−1 −(ν+1)/2 .2 Let L have a standard Student’s tdistribution with ν > 1 degrees of freedom. 1−α = µ + σ ESα (L) Exercise 3. 1−α Suppose that L ∼ N(µ. Let φ and Φ be the density and distribution function of L. where tν denotes the distribution function of L. Then ESα (L ) = E(L  L ≥ VaRα (L )) = E(µ + σL  µ + σL ≥ VaRα (µ + σL)) = E(µ + σL  L ≥ VaRα (L)) = µ+σ φ(Φ−1 (α)) . 1). Then L has density function Γ((ν + 1)/2) x2 gν (x) = √ 1+ νπΓ(ν/2) ν Show that ESα (L) = gν (t−1 (α)) ν + (t−1 (α))2 ν ν .
4 Methods for computing VaR and ES

We will now introduce some standard methods for computing Value-at-Risk and expected shortfall for a portfolio of risky assets. The setup is as follows. We consider a portfolio with value Vm = f(tm, Zm), where f is a known function and Zm is a vector of risk-factors. The loss Lm+1 is then given by Lm+1 = l[m](Xm+1). Suppose we have observed the risk-factors Zm−n+1, ..., Zm. These observations will be called historical data. How can we use these observations to compute Value-at-Risk or expected shortfall for the loss Lm+1?

4.1 Empirical VaR and ES

Suppose we have observations x1, ..., xn of iid random variables X1, ..., Xn with distribution F. The empirical distribution function is then given by

    Fn(x) = (1/n) sum_{k=1}^n I[Xk,∞)(x).

The empirical quantile is then given by

    qα(Fn) = inf{x ∈ R : Fn(x) ≥ α} = Fn←(α).

If we order the sample X1, ..., Xn such that X1,n ≥ · · · ≥ Xn,n (if F is continuous, then Xj ≠ Xk a.s. for j ≠ k), then the empirical quantile is given by

    qα(Fn) = X[n(1−α)]+1,n,   α ∈ (0, 1),

where [y] is the integer part of y, [y] = sup{m ∈ N : m ≤ y} (the largest integer less than or equal to y). If F is strictly increasing, then qα(Fn) → qα(F) a.s. as n → ∞ for every α ∈ (0, 1). Thus, based on the observations x1, ..., xn, we may estimate the quantile qα(F) by the empirical estimate

    q̂α(F) = x[n(1−α)]+1,n.

The empirical estimator for expected shortfall is given by

    ÊSα(F) = (sum_{k=1}^{[n(1−α)]+1} xk,n) / ([n(1−α)] + 1),

which is the average of the [n(1−α)] + 1 largest observations. The reliability of these estimates is of course related to α and to the number of observations. As the true distribution is unknown, explicit confidence bounds can in general not be obtained. However, approximate confidence bounds can be obtained using nonparametric techniques. This is described in the next section.
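The empirical estimators above can be sketched in a few lines (our illustration; helper names are ours). Note the decreasing ordering of the sample, matching the convention X1,n ≥ · · · ≥ Xn,n used in the text:

```python
# Empirical VaR and ES estimators from Section 4.1: order the sample
# decreasingly and take x_{[n(1-alpha)]+1,n}, respectively the average
# of the [n(1-alpha)] + 1 largest observations.
def empirical_var(losses, alpha):
    ordered = sorted(losses, reverse=True)    # X_{1,n} >= ... >= X_{n,n}
    k = int(len(losses) * (1.0 - alpha)) + 1  # [n(1-alpha)] + 1
    return ordered[k - 1]

def empirical_es(losses, alpha):
    ordered = sorted(losses, reverse=True)
    k = int(len(losses) * (1.0 - alpha)) + 1
    return sum(ordered[:k]) / k

losses = list(range(1, 101))                  # toy sample 1, 2, ..., 100
print(empirical_var(losses, 0.95), empirical_es(losses, 0.95))
```

For this toy sample of size n = 100 and α = 0.95, the estimator picks the 6th largest observation as VaR and averages the 6 largest observations for ES.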
4.2 Confidence intervals

Suppose we have observations x1, ..., xn of iid random variables X1, ..., Xn from an unknown distribution F and that we want to construct a confidence interval for a risk measure ρ(F). That is, given a confidence level p ∈ (0, 1) we want to find a stochastic interval (A, B), where A = fA(X1, ..., Xn) and B = fB(X1, ..., Xn) for some functions fA, fB, such that

P(A < ρ(F) < B) = p.

The interval (a, b), where a = fA(x1, ..., xn) and b = fB(x1, ..., xn), is then a confidence interval for ρ(F) with confidence level p. Typically we want a double-sided, centered interval, i.e. one that also satisfies

P(A ≥ ρ(F)) = P(B ≤ ρ(F)) = (1 − p)/2.

Unfortunately, F is unknown, so in general we cannot find suitable functions fA, fB. However, if ρ(F) is a quantile of F (Value-at-Risk), then we can actually find exact confidence intervals, although not for arbitrary choices of confidence level p. Moreover, we can construct approximate confidence intervals.

4.2.1 Exact confidence intervals for Value-at-Risk

Suppose we have observations x1, ..., xn from iid random variables X1, ..., Xn with common unknown continuous distribution function F, and that we want to construct a confidence interval (a, b) for the quantile qα(F) such that

P(A < qα(F) < B) = p and P(A ≥ qα(F)) = P(B ≤ qα(F)) = (1 − p)/2.   (4.1)

Let Yα = #{k : Xk > qα(F)} be the number of sample points exceeding qα(F). It is easily seen that Yα is Binomial(n, 1 − α)-distributed. With the sample ordered as X1,n ≥ · · · ≥ Xn,n we have

P(X1,n ≤ qα(F)) = P(Yα = 0),  P(X2,n ≤ qα(F)) = P(Yα ≤ 1),

and in general

P(Xj,n ≤ qα(F)) = P(Yα ≤ j − 1),  P(Xi,n ≥ qα(F)) = 1 − P(Yα ≤ i − 1).

Hence, we can compute P(Xj,n ≤ qα(F)) and P(Xi,n ≥ qα(F)) for different i and j until we find indices i > j that satisfy (4.1), so that P(Xi,n < qα(F) < Xj,n) = p. Since Yα is discrete, an exactly centered interval with level p need not exist; in that case we look for i > j and the smallest p′ ≥ p such that

P(Xi,n ≥ qα(F)) ≤ (1 − p′)/2 and P(Xj,n ≤ qα(F)) ≤ (1 − p′)/2.
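The index search described above can be sketched directly from the Binomial(n, 1 − α) distribution of Yα (standard library only; the search assumes suitable indices exist, which holds for the inputs used here):

```python
import math

def binom_cdf(m, n, p):
    """P(Y <= m) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m + 1))

def exact_var_ci_indices(n, alpha, p):
    """Order-statistic indices i > j (sample ordered decreasingly) with
    P(X_{i,n} >= q_alpha) <= (1-p)/2 and P(X_{j,n} <= q_alpha) <= (1-p)/2,
    together with the achieved confidence level."""
    tail = (1 - p) / 2
    # P(X_{j,n} <= q_alpha) = P(Y <= j-1) grows with j: take the largest valid j
    j = max(j for j in range(1, n + 1) if binom_cdf(j - 1, n, 1 - alpha) <= tail)
    # P(X_{i,n} >= q_alpha) = 1 - P(Y <= i-1) shrinks with i: take the smallest valid i
    i = min(i for i in range(1, n + 1) if 1 - binom_cdf(i - 1, n, 1 - alpha) <= tail)
    level = binom_cdf(i - 1, n, 1 - alpha) - binom_cdf(j - 1, n, 1 - alpha)
    return i, j, level

i, j, level = exact_var_ci_indices(10, 0.8, 0.75)
print(i, j)          # -> 4 1
```

For n = 10, α = 0.8 and p = 0.75 this yields the interval (x4,10, x1,10) with achieved level ≈ 0.77, exactly the numbers of Example 4.1 below.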
Example 4.1 Suppose we have an iid sample X1, ..., X10 with common unknown continuous distribution function F and that we want a confidence interval for q0.8(F) with confidence level p′ ≥ p = 0.75. Here Y0.8, the number of sample points exceeding q0.8(F), is Binomial(10, 0.2)-distributed.

[Figure 5: Upper: empirical estimates of VaR0.99 for samples of different sizes (250 and 1000) from different distributions (Pareto and exponential) with VaR0.99 = 10. Lower: simulated 97.6% confidence intervals (x18,1000, x4,1000) for VaR0.99(X) = 10 based on samples of size 1000 from a Pareto distribution.]
Using P(Xj,n ≤ qα(F)) = P(Yα ≤ j − 1) and P(Xi,n ≥ qα(F)) = 1 − P(Yα ≤ i − 1) we find that

P(X1,10 ≤ q0.8(F)) ≈ 0.11 and P(X4,10 ≥ q0.8(F)) ≈ 0.12.

Notice that max{0.11, 0.12} ≤ (1 − p)/2 = 0.125 and that P(X4,10 < q0.8(F) < X1,10) ≈ 0.77, so (x4,10, x1,10) is a confidence interval for q0.8(F) with confidence level approximately 77%.

4.2.2 Using the bootstrap to obtain confidence intervals

Using the so-called nonparametric bootstrap method we can obtain confidence intervals for, e.g., risk measures such as Value-at-Risk and expected shortfall. The nonparametric bootstrap works as follows. Suppose we have observations x1, ..., xn of iid random variables X1, ..., Xn and we want to estimate some parameter θ which depends on the unknown distribution F of the X's; for instance, θ could be the α-quantile, θ = qα(F). First an estimator θ̂(x1, ..., xn) of θ is constructed, e.g. θ̂(x1, ..., xn) = x[n(1−α)]+1,n. Now we want to construct a confidence interval for θ with confidence level p (for instance p = 0.95). To do so we need to know the distribution of θ̂(X1, ..., Xn).
If F were known, this distribution could be approximated arbitrarily well by simulating from F many (N large) times to construct new samples X1(i), ..., Xn(i), i = 1, ..., N, and computing the estimate θi = θ̂(X1(i), ..., Xn(i)) for each of these samples. The empirical distribution

(1/N) Σ_{i=1}^N I_{[θi,∞)}(x)

is then an approximation of the true distribution of θ̂(X1, ..., Xn). The problem is that F is not known. What is known is the empirical distribution Fn, which puts point mass 1/n at each of the points X1, ..., Xn. If n is relatively large we expect that Fn is a good approximation of F. Moreover, we can resample from Fn simply by drawing with replacement among X1, ..., Xn. We denote such a resample by X1*, ..., Xn* and compute θ* = θ̂(X1*, ..., Xn*). Since Fn approximates the true distribution F, we expect the distribution of θ̂(X1*, ..., Xn*) to approximate the true distribution of θ̂(X1, ..., Xn). To obtain the distribution of θ̂(X1*, ..., Xn*) we resample many (N large) times to create new samples X1*(i), ..., Xn*(i), i = 1, ..., N, and compute the estimate θi* = θ̂(X1*(i), ..., Xn*(i)) for each of these samples. As N → ∞, the empirical distribution

FNθ*(x) = (1/N) Σ_{i=1}^N I_{[θi*,∞)}(x)

converges to the true distribution of θ̂(X1*, ..., Xn*). A confidence interval is then constructed using A = q(1−p)/2(FNθ*) and B = q(1+p)/2(FNθ*).
This means that A = θ*[N(1+p)/2]+1,N and B = θ*[N(1−p)/2]+1,N, where θ*1,N ≥ · · · ≥ θ*N,N is the ordered sample of θ1*, ..., θN*. The nonparametric bootstrap procedure for obtaining confidence intervals can be summarized in the following steps. Suppose we have observations x1, ..., xn of the iid random variables X1, ..., Xn with distribution F and an estimator θ̂(x1, ..., xn) of an unknown parameter θ.
• Resample N times among x1, ..., xn to obtain new samples x1*(i), ..., xn*(i), i = 1, ..., N.

• Compute the estimator for each of the new samples to get

θi* = θ̂(x1*(i), ..., xn*(i)), i = 1, ..., N.

• Construct a confidence interval Ip with confidence level p as

Ip = (θ*[N(1+p)/2]+1,N, θ*[N(1−p)/2]+1,N),

where θ*1,N ≥ · · · ≥ θ*N,N is the ordered sample of θ1*, ..., θN*.
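The three steps above can be sketched as follows (standard library only; the Pareto loss sample and the VaR0.9 estimator are illustrative choices, not part of the original presentation):

```python
import random

def bootstrap_ci(sample, estimator, p=0.95, n_resamples=2000, seed=0):
    """Nonparametric bootstrap CI: resample with replacement, re-estimate,
    and take the empirical (1-p)/2 and (1+p)/2 quantiles of the estimates."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(estimator([rng.choice(sample) for _ in range(n)])
                   for _ in range(n_resamples))
    lo = stats[int(n_resamples * (1 - p) / 2)]
    hi = stats[min(int(n_resamples * (1 + p) / 2), n_resamples - 1)]
    return lo, hi

def var90(xs):
    """Empirical VaR_0.9: the [n * 0.1] + 1 largest observation."""
    ys = sorted(xs, reverse=True)
    return ys[int(len(ys) * 0.1)]

rng = random.Random(1)
data = [rng.paretovariate(2.0) for _ in range(200)]   # hypothetical loss sample
lo, hi = bootstrap_ci(data, var90, p=0.95)
```

The interval (lo, hi) is then an approximate 95% confidence interval for q0.9(F).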
4.3 Historical simulation

In the historical simulation approach we suppose we have observations of risk factors Zm−n, ..., Zm and hence also the risk-factor changes Xm−n+1, ..., Xm. We denote these observations by xm−n+1, ..., xm. Using the loss operator we can compute the corresponding observed losses lk = l[m](xm−k+1), k = 1, ..., n. Note that lk is the loss that we would experience if the risk-factor change xm−k+1 occurred over the next period. This gives us a sample from the loss distribution, and it is assumed that the losses during the different time intervals are iid. The empirical VaR and ES can then be estimated by

VaRα(L) = qα(FnL) = l[n(1−α)]+1,n,
ESα(L) = (Σ_{k=1}^{[n(1−α)]+1} lk,n) / ([n(1 − α)] + 1),

where l1,n ≥ · · · ≥ ln,n is the ordered sample. Similarly we can aggregate over several days. Say, for instance, that we are interested in the Value-at-Risk for the aggregate loss over ten days. Then we simply use the historical observations given by

lk(10) = l[m](Σ_{j=1}^{10} xm−n+10(k−1)+j), k = 1, ..., [n/10],

to compute the empirical VaR and ES.

Advantage: this approach is easy to implement and preserves the dependence structure between the components of the vectors of risk-factor changes Xm−k.
Disadvantage: the worst case in the sample is never worse than what has already happened in history. We need a very large sample of relevant historical data to get reliable estimates of Value-at-Risk and expected shortfall.
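A minimal sketch of the approach, assuming (purely for illustration) a single stock position with a linearized loss operator l[m](x) = −Vm·x applied to made-up historical log returns:

```python
import random

V_m = 100.0                       # hypothetical current portfolio value

def loss_operator(x):
    """Linearized loss operator for a single stock position (an assumption)."""
    return -V_m * x

rng = random.Random(42)
x_hist = [rng.gauss(0.0, 0.01) for _ in range(1000)]  # made-up risk-factor changes

losses_1d = [loss_operator(x) for x in x_hist]        # one-day loss sample

# ten-day losses: apply the loss operator to sums of 10 consecutive changes
losses_10d = [loss_operator(sum(x_hist[10 * k:10 * k + 10]))
              for k in range(len(x_hist) // 10)]

def empirical_var(sample, alpha):
    """Empirical VaR: the [n(1-alpha)]+1 largest observation."""
    ordered = sorted(sample, reverse=True)
    return ordered[int(len(ordered) * (1 - alpha))]
```

Note how the ten-day aggregation shrinks the sample from n to [n/10] points, which illustrates the disadvantage mentioned above: reliable estimates require a very long history.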
4.4 Variance–Covariance method

The basic idea of the variance–covariance method is to study the linearized loss

LΔm+1 = lΔ[m](Xm+1) = −(c + Σ_{i=1}^d wi Xm+1,i) = −(c + wT Xm+1),

where c = cm and wi = wm,i (i.e. known at time m), w = (w1, ..., wd)T are weights and Xm+1 = (Xm+1,1, ..., Xm+1,d)T are the risk-factor changes. It is then assumed that Xm+1 ∼ Nd(µ, Σ), i.e. that the risk-factor changes follow a multivariate (d-dimensional) normal distribution. Using the properties of the multivariate normal distribution we have wT Xm+1 ∼ N(wT µ, wT Σw). Hence, the loss distribution is normal with mean −c − wT µ and variance wT Σw. Suppose that we have n + 1 historical observations of the risk factors and hence the risk-factor changes Xm−n+1, ..., Xm. Then (assuming that the risk-factor changes are iid or at least weakly dependent) the mean vector µ and the covariance matrix Σ can be estimated as usual by

µ̂i = (1/n) Σ_{k=1}^n Xm−k+1,i, i = 1, ..., d,
Σ̂ij = (1/(n − 1)) Σ_{k=1}^n (Xm−k+1,i − µ̂i)(Xm−k+1,j − µ̂j), i, j = 1, ..., d.

The estimated VaR is then given analytically by

VaRα(L) = −c − wT µ̂ + √(wT Σ̂w) Φ⁻¹(α).
Advantage: analytic solutions can be obtained, so no simulations are required, and the method is easy to implement.
Disadvantage: linearization is not always appropriate; we need a short time horizon to justify it (see e.g. Example 3.6). The normal distribution may considerably underestimate the risk, so proper justification that the normal distribution is appropriate is needed before using this approach. In later chapters we will introduce elliptical distributions, which share many of the nice properties of the multivariate normal distribution. The variance–covariance method works equally well under the weaker assumption that the risk-factor changes are elliptically distributed, which may provide a model that fits data better.
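The analytic VaR formula above is short enough to write out without a linear-algebra library; the two-asset portfolio values below are made up for illustration:

```python
from statistics import NormalDist

def var_cov_var(c, w, mu, sigma, alpha=0.99):
    """VaR_alpha(L) = -c - w'mu + sqrt(w' Sigma w) * Phi^{-1}(alpha),
    with the quadratic form written out in plain loops."""
    d = len(w)
    w_mu = sum(w[i] * mu[i] for i in range(d))
    w_sig_w = sum(w[i] * sigma[i][j] * w[j] for i in range(d) for j in range(d))
    return -c - w_mu + (w_sig_w ** 0.5) * NormalDist().inv_cdf(alpha)

# hypothetical two-asset portfolio: weights in currency units, daily changes
w = [50.0, 50.0]
mu = [0.0, 0.0]
Sigma = [[0.0004, 0.0001], [0.0001, 0.0009]]
var99 = var_cov_var(0.0, w, mu, Sigma, 0.99)
```

With c = 0, µ = 0 and a unit variance the formula reduces to Φ⁻¹(α), which is a convenient sanity check.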
4.5 Monte Carlo methods

Suppose we have observed the risk factors Zm−n, ..., Zm and risk-factor changes Xm−n+1, ..., Xm. We suggest a parametric model for Xm+1; for instance, that Xm+1 has distribution function F and is independent of Xm−n+1, ..., Xm. When an appropriate model for Xm+1 has been chosen and its parameters estimated, we simulate a large number N of outcomes x̃1, ..., x̃N from this distribution. For each outcome we can compute the corresponding loss lk = l[m](x̃k), k = 1, ..., N. The true loss distribution of Lm+1 is then approximated by the empirical distribution

FNL(x) = (1/N) Σ_{k=1}^N I_{[lk,∞)}(x).

Value-at-Risk and expected shortfall can then be estimated by

VaRα(L) = qα(FNL) = l[N(1−α)]+1,N,
ESα(L) = (Σ_{k=1}^{[N(1−α)]+1} lk,N) / ([N(1 − α)] + 1).
Advantage: very flexible. Practically any model that can be simulated from can be used. Time-series models can also be used in the Monte Carlo method, which makes it possible to model time dependence between risk-factor changes.
Disadvantage: computationally intensive. A large number of simulations is needed to get good estimates, and this may take a long time (hours, days) depending on the complexity of the model.

Example 4.2 Consider a portfolio consisting of one share of a stock with stock price Sk and assume that the log returns Xk+1 = ln Sk+1 − ln Sk are iid with distribution function Fθ, where θ is an unknown parameter. The parameter θ can be estimated from historical data, for instance by maximum likelihood. Given the information about the stock price S0 today, the Value-at-Risk for our portfolio loss over the period from today until tomorrow is

VaRα(L1) = S0(1 − exp{Fθ←(1 − α)}),

i.e. the α-quantile of the distribution of the loss L1 = −S0(exp{X1} − 1). However, the expected shortfall may be difficult to compute explicitly if Fθ← has a complicated expression. Instead of performing numerical integration we may use the Monte Carlo approach to compute expected shortfall.

Example 4.3 Consider the situation in Example 4.2 with the exception that the log-return distribution is given by a GARCH(1,1) model:

Xk+1 = σk+1 Zk+1,  σ²k+1 = a0 + a1 Xk² + b1 σk²,

where the Zk's are independent and standard normally distributed and a0, a1 and b1 are parameters to be estimated. Because of the recursive structure it is very easy to simulate from this model. However, analytic computation of the ten-day Value-at-Risk or expected shortfall is very difficult.
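A Monte Carlo sketch of Example 4.3 (all parameter values and the initial state are made up for illustration, not estimates from data):

```python
import math
import random

def simulate_10day_losses(s0, a0, a1, b1, x_last, sig2_last, n_sims=5000, seed=0):
    """Simulate ten-day losses L = -S0*(exp(X_1 + ... + X_10) - 1) under a
    GARCH(1,1) model for the log returns."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        x, sig2 = x_last, sig2_last
        log_ret = 0.0
        for _ in range(10):
            sig2 = a0 + a1 * x * x + b1 * sig2      # GARCH(1,1) recursion
            x = math.sqrt(sig2) * rng.gauss(0.0, 1.0)
            log_ret += x
        losses.append(-s0 * (math.exp(log_ret) - 1.0))
    return losses

losses = sorted(simulate_10day_losses(100.0, 1e-6, 0.08, 0.90, 0.01, 0.012**2),
                reverse=True)
k = int(len(losses) * (1 - 0.99)) + 1
var99 = losses[k - 1]           # empirical ten-day VaR_0.99
es99 = sum(losses[:k]) / k      # empirical ten-day ES_0.99
```

The empirical VaR and ES of the simulated losses then estimate the ten-day quantities that are analytically intractable.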
5 Extreme value theory for random variables with heavy tails

Given historical loss data, a risk manager typically wants to estimate the probability of future large losses to assess the risk of holding a certain portfolio. Extreme events are particularly frightening because, although they are by definition rare, they may cause severe losses to a financial institution or insurance company. Empirical estimates of probabilities of losses larger than what has been observed so far are useless: such an event will be assigned zero probability. Even if the loss lies just within the range of the loss data set, empirical estimates have poor accuracy. Extreme Value Theory (EVT) provides the tools for an optimal use of the loss data to obtain accurate estimates of probabilities of large losses or, more generally, of extreme events. Under certain conditions, EVT methods can extrapolate information from the loss data to obtain meaningful estimates of the probability of rare events such as large losses. This also means that accurate estimates of Value-at-Risk (VaR) and Expected Shortfall (ES) can be obtained.

In order to present the material and derive the expressions of the estimators without a lot of technical details we focus on EVT for distributions with "heavy tails" (see below). Although there is no single definition of the meaning of "heavy tails", it is common to call the right tail F̄(x) = 1 − F(x) of the distribution function F heavy if

lim_{x→∞} F̄(x)/e^{−λx} = ∞ for every λ > 0,

i.e. if it is heavier than the right tail of every exponential distribution. It is also not unusual to consider a random variable heavy-tailed if not all its moments are finite. Empirical investigations have shown that daily and higher-frequency returns from financial assets typically have distributions with heavy tails, and empirical investigations often support the use of heavy-tailed distributions. To illustrate the methods we will consider a dataset consisting of claims in million Danish Kroner from fire insurance in Denmark. We may observe that there are a few claims much larger than the 'everyday' claim, which suggests that the claims have a heavy-tailed distribution.
In this and the following two chapters we will present aspects of, and estimators provided by, EVT. We will study in detail the useful class of heavy-tailed distributions with regularly varying tails.

5.1 Quantile-quantile plots

In this section we will consider some useful practical methods to study the extremal properties of a data set. Suppose we have a sample X1, ..., Xn of iid random variables but we do not know the distribution of X1. To get an indication of the heaviness of the tails it is useful to use so-called quantile-quantile plots (qq-plots). One would typically suggest a reference distribution F and want to test whether it is reasonable to assume that the data is
distributed according to this distribution.

[Figure 6: Claims from fire insurance in Denmark in million Danish Kroner.]

If we define the ordered sample as Xn,n ≤ · · · ≤ X1,n, then the qq-plot consists of the points

{(Xk,n, F←((n − k + 1)/(n + 1))) : k = 1, ..., n}.

If the data has a similar distribution as the reference distribution, then the qq-plot is approximately linear. An important property is that the plot remains approximately linear if the data has a distribution which is a linear transformation of the reference distribution, i.e. from the associated location-scale family Fµ,σ(x) = F((x − µ)/σ). Thus, the qq-plot enables us to come up with a suitable class of distributions that fits the data, and then we may estimate the location and scale parameters. The qq-plot is particularly useful for studying the tails of the distribution: if the reference distribution F has heavier tails than the data, then the plot will curve down at the left and/or up at the right, and the opposite if the reference distribution has too light tails.

Exercise 5.1 Consider the distribution functions F and G given by F(x) = 1 − e^{−x} (x > 0) and G(x) = 1 − x^{−2} (x > 1). Plot

{(F←((n − k + 1)/(n + 1)), G←((n − k + 1)/(n + 1))) : k = 1, ..., n}

and interpret the result.
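Constructing the qq-plot points is straightforward; a sketch with simulated exponential data against the standard exponential reference (the data is simulated here purely for illustration):

```python
import math
import random

def qq_points(sample, ref_quantile):
    """Points (X_{k,n}, F^<-((n-k+1)/(n+1))), k = 1..n, with the sample
    ordered decreasingly, X_{1,n} >= ... >= X_{n,n}."""
    n = len(sample)
    ordered = sorted(sample, reverse=True)
    return [(ordered[k - 1], ref_quantile((n - k + 1) / (n + 1)))
            for k in range(1, n + 1)]

def exp_quantile(q):
    return -math.log(1.0 - q)        # F^<- for the standard exponential

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(500)]
pts = qq_points(data, exp_quantile)
# exponential data against an exponential reference: pts is roughly linear
```

The list `pts` would then be passed to any plotting tool; heavier-tailed data would make the largest empirical points rise above the reference quantiles.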
[Figure 7: Quantile-quantile plots for the Danish fire insurance claims. Upper left: qq-plot with standard exponential reference distribution. The plot curves down, which indicates that the data has tails heavier than exponential. Upper right, lower left, lower right: qq-plots against a Pareto(α)-distribution for α = 1, 1.5, 2. These plots are approximately linear, which indicates that the data may have a Pareto(α)-distribution with α ∈ (1, 2).]

5.2 Regular variation

We start by introducing regularly varying functions.

Definition 5.1 A function h : (0, ∞) → (0, ∞) is regularly varying at ∞ with index ρ ∈ R (written h ∈ RVρ) if

lim_{t→∞} h(tx)/h(t) = x^ρ for every x > 0.   (5.1)

If ρ = 0, then we call h slowly varying (at ∞). Slowly varying functions are generically denoted by L. If h ∈ RVρ, then h(x)/x^ρ ∈ RV0. Hence, setting L(x) = h(x)/x^ρ we see that a function h ∈ RVρ can always be represented as h(x) = L(x)x^ρ. If ρ < 0, then the convergence in (5.1) is uniform on intervals
29 .40 30 20 10 0 0 10 20 30 40 50 60 0 0 10 20 30 40 5 10 15 20 25 30 35 40 30 20 10 0 0 200 400 600 800 1200 0 0 10 20 30 40 20 40 60 80 100 6 4 2 0 0 1 2 3 4 5 6 0 0 2 4 6 1 2 3 4 5 6 6 4 2 0 0 1 2 3 4 5 6 0 0 2 4 6 2 4 6 8 Figure 8: Quantilequantile plots for 1949 simulated data points from a distribution F with F as reference distribution. The upper four plots: F = Pareto(2). The lower four plots: F = Exp(1).
[b, ∞) for b > 0, i.e.

lim_{t→∞} sup_{x∈[b,∞)} |h(tx)/h(t) − x^ρ| = 0.

The most natural examples of slowly varying functions are positive constants and functions converging to positive constants. Other examples are logarithms and iterated logarithms. The following functions L are slowly varying: (a) any L with lim_{x→∞} L(x) = c ∈ (0, ∞); (b) L(x) = ln(1 + x); (c) L(x) = ln(1 + ln(1 + x)); (d) L(x) = ln(e + x) + sin x. Note however that a slowly varying function can have infinite oscillation, in the sense that lim inf_{x→∞} L(x) = 0 and lim sup_{x→∞} L(x) = ∞.

Example 5.1 Let F(x) = 1 − x^{−α}, for x ≥ 1 and α > 0. Then F̄(tx)/F̄(t) = x^{−α} for t > 0, hence F̄ ∈ RV−α.

Definition 5.2 A nonnegative random variable is said to be regularly varying if its distribution function F satisfies F̄ ∈ RV−α for some α ≥ 0.

Remark 5.1 If X is a nonnegative random variable with distribution function F satisfying F̄ ∈ RV−α for some α > 0, then E(X^β) < ∞ if β < α, and E(X^β) = ∞ if β > α. Although the converse does not hold in general, it is useful to think of regularly varying random variables as those random variables for which the β-moment does not exist for β larger than some α > 0.

Example 5.2 Consider two risks X1 and X2 which are assumed to be nonnegative and iid with common distribution function F. Assume further that F has a regularly varying right tail, i.e. F̄ ∈ RV−α. An investor has bought two shares of the first risky asset, and the probability of a portfolio loss greater than l is thus given by P(2X1 > l). Can the loss probability be made smaller by changing to the well diversified portfolio with one share of each risky asset? To answer this question we study the following ratio of loss probabilities:

P(X1 + X2 > l) / P(2X1 > l)
for large l. We have, for ε ∈ (0, 1/2),

P(X1 + X2 > l) = P(X1 + X2 > l, X1 ≤ εl) + P(X1 + X2 > l, X1 > εl)
             ≤ P(X2 > (1 − ε)l) + P(X1 > (1 − ε)l) + P(X1 > εl, X2 > εl)
             ≤ 2 P(X2 > (1 − ε)l) + P(X1 > εl)²,

and

P(X1 + X2 > l) ≥ P(X1 > l or X2 > l) = 2 P(X1 > l) − P(X1 > l)².

Hence,

g(α, l) ≤ P(X1 + X2 > l)/P(2X1 > l) ≤ h(α, l),

where

g(α, l) = (2 P(X1 > l) − P(X1 > l)²)/P(2X1 > l),
h(α, l) = (2 P(X2 > (1 − ε)l) + P(X1 > εl)²)/P(2X1 > l).

Since F̄ ∈ RV−α and P(2X1 > l) = P(X1 > l/2), we have

lim_{l→∞} g(α, l) = 2 lim_{l→∞} P(X1 > l)/P(X1 > l/2) = 2^{1−α},

and similarly lim_{l→∞} h(α, l) = 2^{1−α}(1 − ε)^{−α}. Since ε > 0 can be chosen arbitrarily small, we conclude that

lim_{l→∞} P(X1 + X2 > l)/P(2X1 > l) = 2^{1−α}.

This means that for α < 1 (very heavy tails) diversification does not give us a portfolio with smaller probability of large losses, whereas for α > 1 (the risks have finite means) diversification reduces the probability of large losses.

Example 5.3 Let X1 and X2 be as in the previous example, and let α ∈ (0, 1). We saw that for l sufficiently large we have P(X1 + X2 > l) > P(2X1 > l). Hence, for p ∈ (0, 1) sufficiently large,

VaRp(X1) + VaRp(X2) = 2 VaRp(X1) = VaRp(2X1)
                    = inf{l ∈ R : P(2X1 > l) ≤ 1 − p}
                    ≤ inf{l ∈ R : P(X1 + X2 > l) ≤ 1 − p} = VaRp(X1 + X2).

That is, for very heavy-tailed risks Value-at-Risk need not be subadditive: the diversified portfolio may have the larger Value-at-Risk.

Example 5.4 Let X and Y be positive random variables representing losses in two lines of business (losses due to fire and car accidents) of an insurance
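The limit 2^{1−α} in Example 5.2 can be illustrated by a small Monte Carlo experiment (a sketch only; the threshold l and sample size are arbitrary choices, and the finite-l ratio only approximates the limit):

```python
import random

def tail_ratio(alpha, l, n=100000, seed=0):
    """Monte Carlo estimate of P(X1 + X2 > l) / P(2 X1 > l) for iid Pareto
    risks with survival function x^(-alpha) on [1, infinity)."""
    rng = random.Random(seed)
    num = den = 0
    for _ in range(n):
        x1, x2 = rng.paretovariate(alpha), rng.paretovariate(alpha)
        num += x1 + x2 > l
        den += 2 * x1 > l
    return num / den

r_heavy = tail_ratio(0.5, 50.0)   # limit 2^(1-0.5) = sqrt(2) > 1
r_light = tail_ratio(2.0, 50.0)   # limit 2^(1-2) = 0.5 < 1
```

For α = 0.5 the estimated ratio lies above 1 (diversification hurts in the tail), while for α = 2 it lies below 1, in line with the limit 2^{1−α}.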
company. Suppose that X has distribution function F which satisfies F̄ ∈ RV−α for some α > 0. Moreover, suppose that E(Y^k) < ∞ for every k > 0, i.e. Y has finite moments of all orders. The insurance company wants to compute lim_{x→∞} P(X > x | X + Y > x) to know the probability of a large loss in the fire insurance line given a large total loss. We have, for every ε ∈ (0, 1) and x > 0,

P(X + Y > x) = P(X + Y > x, X > (1 − ε)x) + P(X + Y > x, X ≤ (1 − ε)x)
            ≤ P(X > (1 − ε)x) + P(Y > εx).

Hence,

1 ≤ P(X + Y > x)/P(X > x)
  ≤ P(X > (1 − ε)x)/P(X > x) + P(Y > εx)/P(X > x)
  ≤ P(X > (1 − ε)x)/P(X > x) + E(Y^{2α})/((εx)^{2α} P(X > x))
  → (1 − ε)^{−α} + 0 as x → ∞.

At the second-to-last step above, Markov's inequality was used; the last term tends to 0 since P(X > x) = x^{−α}L(x), so that x^{−2α}/P(X > x) = x^{−α}/L(x) → 0. Since this is true for every ε ∈ (0, 1), choosing ε arbitrarily small gives

lim_{x→∞} P(X + Y > x)/P(X > x) = 1.

Hence,

lim_{x→∞} P(X > x | X + Y > x) = lim_{x→∞} P(X > x, X + Y > x)/P(X + Y > x)
                               = lim_{x→∞} P(X > x)/P(X + Y > x) = 1.

We have found that if the insurance company suffers a large loss, it is likely that this is due to a large loss in the fire insurance line only.
6 Hill estimation

Suppose we have an iid sample of positive random variables X1, ..., Xn from an unknown distribution function F with a regularly varying right tail. That is, F̄(x) = x^{−α}L(x) for some α > 0 and a slowly varying function L. In this section we will describe Hill's method for estimating α. We will use a result known as the Karamata Theorem, which says that for β < −1,

∫_u^∞ x^β L(x) dx ∼ −(β + 1)^{−1} u^{β+1} L(u) as u → ∞,

where ∼ means that the ratio of the left and right sides tends to 1. Using integration by parts we find that

(1/F̄(u)) ∫_u^∞ (ln x − ln u) dF(x)
 = (1/F̄(u)) [−(ln x − ln u) F̄(x)]_u^∞ + (1/F̄(u)) ∫_u^∞ (F̄(x)/x) dx
 = (1/(u^{−α}L(u))) ∫_u^∞ x^{−α−1} L(x) dx.

Hence, by the Karamata Theorem,

(1/F̄(u)) ∫_u^∞ (ln x − ln u) dF(x) → 1/α as u → ∞.   (6.1)

To turn this into an estimator we replace F by the empirical distribution function

Fn(x) = (1/n) Σ_{k=1}^n I_{[Xk,∞)}(x)

and replace u by a high data-dependent level Xk,n. Then F̄n(Xk,n) = (k − 1)/n and

(1/F̄n(Xk,n)) ∫_{Xk,n}^∞ (ln x − ln Xk,n) dFn(x) = (1/(k − 1)) Σ_{j=1}^{k−1} (ln Xj,n − ln Xk,n).

If k = k(n) → ∞ and k/n → 0 as n → ∞, then Xk,n → ∞ a.s. as n → ∞, and by (6.1)

(1/(k − 1)) Σ_{j=1}^{k−1} (ln Xj,n − ln Xk,n) →P 1/α as n → ∞.

The same result holds if we replace k − 1 by k. This gives us the Hill estimator
αk. then (H) αk. The plot of the pairs (k.1 Selecting the number of upper order statistics k j=1 We have seen that if k = k(n) → ∞ and k/n → 0 as n → ∞.5 3 2.n = 1 k (ln Xj. We now construct an estimator for the 5 (H) 4.n − ln Xk. .n ) ≈ n 34 . based on the sample points X1 .n (H) ≈ x Xk.n α (H) k Fn (Xk. . .5 4 3.n −bk. We may estimate (H) (H) α by αk. Notice that F (x) = F ≈ x Xk. .n ) −1 P → α as n → ∞.9 (k = 50) or αk.n ) x Xk.5 1 0 50 100 150 200 250 300 350 400 450 500 Figure 9: Hill plot of the Danish ﬁre insurance data.n . tail probability F (x). . k should not be chosen too small since the small number of data points would lead to a high variance for the estimator. On the other hand.n ) : k = 2. This is illustrated graphically in Figure 9.5 2 1.n is a good estimate of α.n −αk. n is called the Hill plot. . The plot looks stable for all k (typically this is NOT the case for heavytailed data).n x Xk. In practice however we have a sample of ﬁxed size n and we need to ﬁnd a (H) suitable k such that αk.n = 1. Xn and (H) the the Hill estimate αk.5 (k = 250). .6.n Xk. k should not be chosen too large so the estimate is based on sample points from the center of the distribution (this introduces a bias).n −α F (Xk. . . On the one hand. for x large.n = 1. An estimator of α is obtained by graphical inspection of the Hill plot and an estimate of α should be taken for values of k where the plot is stable.
This argument can be made more rigorous. Hence, the following estimator seems reasonable:

F̄̂(x) = (k/n)(x/Xk,n)^{−α̂(H)k,n}.

This leads to an estimator of the quantile qp(F) = F←(p):

q̂p(F) = inf{x ∈ R : F̄̂(x) ≤ 1 − p} = Xk,n ((n/k)(1 − p))^{−1/α̂(H)k,n}.

More information about Hill estimation can be found in [8] and [12].
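The Hill estimator and the resulting tail-quantile estimator fit in a few lines; the Pareto sample below is simulated only to illustrate the estimators on data with a known α:

```python
import math
import random

def hill_estimate(sample, k):
    """Hill estimator of alpha based on the k upper order statistics."""
    ordered = sorted(sample, reverse=True)      # X_{1,n} >= ... >= X_{n,n}
    log_xk = math.log(ordered[k - 1])
    return k / sum(math.log(ordered[j]) - log_xk for j in range(k))

def hill_quantile(sample, k, p):
    """Tail-quantile estimate q_p = X_{k,n} * ((n/k)(1-p))^(-1/alpha_hat)."""
    n = len(sample)
    a_hat = hill_estimate(sample, k)
    x_k = sorted(sample, reverse=True)[k - 1]
    return x_k * ((n / k) * (1.0 - p)) ** (-1.0 / a_hat)

rng = random.Random(0)
data = [rng.paretovariate(2.0) for _ in range(5000)]   # true alpha = 2
a_hat = hill_estimate(data, 200)
q999 = hill_quantile(data, 200, 0.999)   # true 0.999-quantile is 1000**0.5 ~ 31.6
```

The choice k = 200 out of n = 5000 is itself a judgment call, which is exactly the issue discussed in Section 6.1.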
7 The Peaks Over Threshold (POT) method

Suppose we have an iid sample of random variables X1, ..., Xn from an unknown distribution function F with a regularly varying right tail. It turns out that the distribution of appropriately scaled excesses Xk − u over a high threshold u is typically well approximated by a distribution called the generalized Pareto distribution. This fact can be used to construct estimates of tail probabilities and quantiles.

Suppose that X is a random variable with distribution function F that has a regularly varying right tail, so that lim_{u→∞} F̄(λu)/F̄(u) = λ^{−α} for all λ > 0 and some α > 0. The excess distribution function of X over the threshold u is given by

Fu(x) = P(X − u ≤ x | X > u), x ≥ 0.

Notice that

F̄u(x) = F̄(u + x)/F̄(u) = F̄(u(1 + x/u))/F̄(u).   (7.1)

Since F̄ is regularly varying with index −α < 0, it holds that F̄(λu)/F̄(u) → λ^{−α} uniformly in λ ≥ 1 as u → ∞, i.e.

lim_{u→∞} sup_{λ≥1} |F̄(λu)/F̄(u) − λ^{−α}| = 0.

Hence,

lim_{u→∞} P((X − u)/(u/α) > x | X > u) = lim_{u→∞} P(X > u(1 + x/α))/P(X > u) = (1 + x/α)^{−α} = Ḡ1/α,1(x)

for x ≥ 0, where, for γ > 0 and β > 0, the generalized Pareto distribution (GPD) function Gγ,β is given by

Gγ,β(x) = 1 − (1 + γx/β)^{−1/γ} for x ≥ 0.

It turns out that, with γ = 1/α and β(u) ∼ u/α as u → ∞,

lim_{u→∞} sup_{x>0} |Fu(x) − Gγ,β(u)(x)| = 0.   (7.2)

We now demonstrate how these findings lead to natural tail and quantile estimators based on the sample points X1, ..., Xn. Choose a high threshold u and let Nu = #{i ∈ {1, ..., n} : Xi > u} be the number of exceedances of u by X1, ..., Xn. Recall from (7.1) that

F̄(u + x) = F̄(u) F̄u(x).   (7.3)
β(u) (x) 37 . . Each step will be discussed further below. (ii) Combine steps (i) and (ii) to get estimates of the form (7. (i) Choose a high threshold u using some statistical method and count the number of exceedances Nu .β(u) (x) ≈ Gγ . If the threshold is too low then we have more data but on the other hand the approximation F u (x) ≈ Gγ.3) then suggests a method for estimating the tail of F by estimating F u (x) and F (u) separately.5) −1 .If u is not too far out into the tail. (7. estimate the parameters γ and β. qp (F ) = inf{x ∈ R : F (x) ≤ 1 − p} = inf{u + x ∈ R : F (u + x) ≤ 1 − p} = u + inf =u+ β γ Nu x∈R: n n (1 − p) Nu 1+γ γ −b x β −1/b γ ≤1−p (7. YNu of excesses. Relation (7. makes sense.4) immediately leads to the following estimator of the quantile qp (F ) = F ← (p). The POT method for estimating tail probabilities and quantiles can be summarized in the following recipe. . The choice of a suitable high threshold u is crucial but diﬃcult. a natural estimator for F (u + x) is Nu F (u + x) = n 1+γ x β −1/b γ .1 How to choose a high threshold. The rest of this section will be devoted to step (i) and (ii): How do we choose a high threshold u in a suitable way? and How can one estimate the parameters γ and β? 7. Hence.β (x) b b = 1+γ x β −1/b γ . If we choose u too large then we will have few observations to use for parameter estimation resulting in poor estimates with large variance. (7. .4) and (7. . Moreover.2) shows that the approximation F u (x) ≈ Gγ. then the empirical approximation F (u) ≈ Fn (u) = Nu /n is accurate. where γ and β are the estimated parameters.5).4) Expression (7. (ii) Given the sample Y1 .
The main idea when choosing the threshold u is to look at the mean-excess plot (see below) and choose u such that the sample mean-excess function is approximately linear above u.

If E(X) < ∞, then we can define the mean excess function as

e(u) = E(X − u | X > u) = E((X − u)I(u,∞)(X)) / E(I(u,∞)(X)).

For the GPD Gγ,β, with γ < 1, integration by parts gives

E((X − u)I(u,∞)(X)) = [−(x − u)(1 + γx/β)^{−1/γ}]_u^∞ + ∫_u^∞ (1 + γx/β)^{−1/γ} dx
                   = ∫_u^∞ (1 + γx/β)^{−1/γ} dx
                   = (β/(1 − γ))(1 + γu/β)^{−1/γ+1}.

Moreover, E(I(u,∞)(X)) = P(X > u) = (1 + γu/β)^{−1/γ}. Hence,

e(u) = (β + γu)/(1 − γ).

In particular, the mean excess function is linear for the GPD.

7.2 Mean-excess plot

A graphical test for assessing the tail behavior may be performed by studying the sample mean-excess function based on the sample X1, ..., Xn. With Nu being the number of exceedances of u by X1, ..., Xn, as above, the sample mean-excess function is given by

en(u) = (1/Nu) Σ_{k=1}^n (Xk − u) I(u,∞)(Xk).

The mean-excess plot is the plot of the points {(Xk,n, en(Xk,n)) : k = 2, ..., n}. If the mean-excess plot is approximately linear with positive slope, then X1 may be assumed to have a heavy-tailed Pareto-like tail.

In the example with the Danish fire insurance claims it seems reasonable to choose u between 4.5 and 10. With u = 4.5 we get N4.5 = 101 exceedances, which is about 5% of the data. With u = 10 we get N10 = 24 exceedances, which is about 1.2% of the data. Given the shape of the mean-excess plot we have selected the threshold u = 6, since the shape of the mean-excess plot does not change much for u ∈ [4, 10], and with this choice we get a reasonable amount of data.
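The sample mean-excess function is easy to compute; the sketch below checks its linearity on data simulated from a GPD (the parameters are arbitrary, with γ < 1/2 so that the sample mean of the excesses is well behaved):

```python
import random

def sample_mean_excess(sample, u):
    """e_n(u): average excess over u among the observations exceeding u."""
    exc = [x - u for x in sample if x > u]
    return sum(exc) / len(exc) if exc else float("nan")

# GPD(gamma, beta) sample via inverse transform: G^<-(q) = (beta/gamma)((1-q)^(-gamma) - 1)
rng = random.Random(0)
gamma, beta = 0.25, 1.0
data = [(beta / gamma) * ((1.0 - rng.random()) ** -gamma - 1.0)
        for _ in range(100000)]

# theory: e(u) = (beta + gamma*u)/(1 - gamma), i.e. linear in u
e1 = sample_mean_excess(data, 1.0)   # ~ 1.25/0.75 ~ 1.67
e3 = sample_mean_excess(data, 3.0)   # ~ 1.75/0.75 ~ 2.33
```

Evaluating `sample_mean_excess` at each order statistic and plotting the pairs gives exactly the mean-excess plot described above.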
[Figure 10: Mean-excess plot of data simulated from a Gaussian distribution.]

[Figure 11: Mean-excess plot of data simulated from an exponential distribution.]

[Figure 12: Mean-excess plot of data simulated from a Pareto(1) distribution.]

7.3 Parameter estimation

Given the threshold u, we may estimate the parameters γ and β in the GPD based on the observations of excesses Y1, ..., YNu over u. We assume that
. Having estimated the parameters we can now visually observe how the approximation (7. integration by parts show that E(Z) = 0 ∞ zdF (z) = 0 ∞ 0 ∞ zd(1 − F )(z) = − F (z)dz = 0 ∞ ∞ 0 zdF (z) = − zF (z) + 0 ∞ P(Z > z)dz. . β Σ−1 = (1 + γ) 1+γ −1 −1 2 Using the threshold u = 6 (gives Nu = 56) we obtain the following estimates for the Danish ﬁre insurance data: γ = 0. β. . β Maximizing the loglikelihood numerically gives estimates γ and β. Hence.β and hence the likelihood function becomes Nu L(γ. .β (y) = β y 1+γ β −1/γ−1 . YNu ) = −Nu ln β − 1 +1 γ Nu ln 1 + i=1 γ Yi . Y1. The MLE is approximately normal (for large Nu ) γ − γ. . E(I(qp . Recall that ESp (X) = E(X  X > VaRp (X)) = E(XI(qp . by deﬁnition.β (Yi ).POT β =u+ γ n (1 − p) Nu −b γ −1 . β.58. β − 1 ≈ N2 (0. For a nonnegative random variable Z with distribution function F .∞) (X)) where qp = VaRp (X)) and we have E(I(qp . Instead of maximizing the likelihood function we can maximize the loglikelihood function given by ln L(γ. β = 3.∞) (X)) .4 Estimation of ValueatRisk and Expected shortfall Recall that ValueatRisk at conﬁdence level p for a risk X with distribution function F is.60. 40 . see Figure 14. YNu ) = i=1 gγ. . the quantile qp (F ). Σ−1 /Nu ).4) works. the POT method gives the ValueatRisk estimator VaRp. the POT method leads to an estimator for Expected shortfall. Y1.the excesses have distribution function Gγ.∞) (X)) = F (qp ). 1 gγ. . . 7. Similarly.
[Figure 13: Mean-excess plot of the Danish fire insurance data. The plot looks approximately linear, indicating Pareto-like tails.]

[Figure 14: The empirical tail of the Danish data and the POT approximation.]

If p is sufficiently large so that qp > 0, then this identity can be applied to the nonnegative random variable X I(qp,∞)(X). This gives

E(X I(qp,∞)(X)) = qp F̄(qp) + ∫_{qp}^∞ F̄(t) dt = qp F̄(qp) + ∫_{qp}^∞ F̄(u) F̄u(t − u) dt.
POT .POT = qp + γ (1 + γ(qp − u)/β)−1/b = qp + β + γ(qp − u) . We may now use the estimator F u (t − u) = Gγ . 1−γ More information about the POT method can be found in [8] and [12]. F (u) ESp (X) = qp + F (qp ) ∞ qp F u (t − u)dt = qp + ∞ qp F u (qp − u) F u (t − u)dt . 42 . with qp = b b VaRp.Hence. ∞ (1 qp b γ + γ(t − u)/β)−1/b dt ESp.β (t − u) to obtain.
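The closed-form Expected shortfall estimator above is a one-liner; a hedged sketch (the function name is illustrative, and the guard reflects the fact that the GPD has infinite mean for γ ≥ 1):

```python
def es_pot(q_p, u, gamma, beta):
    """POT Expected shortfall estimator: q_p + (beta + gamma*(q_p - u)) / (1 - gamma).

    Requires gamma < 1; for gamma >= 1 the GPD has infinite mean and ES diverges.
    """
    if gamma >= 1.0:
        raise ValueError("expected shortfall is infinite for gamma >= 1")
    return q_p + (beta + gamma * (q_p - u)) / (1.0 - gamma)
```

For γ = 0 the excesses are exponential with mean β and the formula reduces to q_p + β, which is a quick sanity check on the implementation.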
8 Multivariate distributions and dependence

We will now introduce some techniques and concepts for modeling multivariate random vectors. We aim at constructing useful models for the vector of risk factor changes X_n. At this stage we will assume that all vectors of risk factor changes (X_n)_{n∈Z} are iid, but we allow for dependence between the components of X_n. That is, typically we assume that X_{n,i} and X_{n,j} are dependent, whereas X_{n,i} and X_{n+k,j} are independent (for k ≠ 0).

8.1 Basic properties of random vectors

The probability distribution of a d-dimensional random vector X = (X_1, ..., X_d) is completely determined by its joint distribution function F,

    F(x) = F(x_1, \dots, x_d) = P(X_1 \le x_1, \dots, X_d \le x_d) = P(X \le x).

The ith marginal distribution F_i of F is the distribution of X_i and is given by

    F_i(x_i) = P(X_i \le x_i) = F(\infty, \dots, \infty, x_i, \infty, \dots, \infty).

The distribution F is said to be absolutely continuous if there is a function f ≥ 0, integrating to one, such that

    F(x_1, \dots, x_d) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(u_1, \dots, u_d)\, du_1 \dots du_d,

and then f is called the density of F. The components of X are independent if and only if

    F(x) = \prod_{i=1}^{d} F_i(x_i)

or, equivalently, if and only if the joint density f (if the density exists) satisfies f(x) = \prod_{i=1}^{d} f_i(x_i).

Recall that the distribution of a random vector X is completely determined by its characteristic function, given by

    \phi_X(t) = E(\exp\{i\, t^T X\}), \qquad t \in \mathbb{R}^d.

Example 8.1 The multivariate normal distribution with mean µ and covariance matrix Σ has the density (with |Σ| being the absolute value of the determinant of Σ)

    f(x) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\Bigl\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Bigr\}, \qquad x \in \mathbb{R}^d.

Its characteristic function is given by

    \phi_X(t) = \exp\Bigl\{i\, t^T \mu - \frac{1}{2} t^T \Sigma t\Bigr\}, \qquad t \in \mathbb{R}^d.
8.2 Joint log return distributions

Most classical multivariate models in financial mathematics assume that the joint distribution of log returns is a multivariate normal distribution. However, the main reason for this assumption is mathematical convenience, and it is not well supported by empirical findings. To illustrate this fact we study daily log returns of exchange rates. Pairwise log returns of the Swiss franc (chf), German mark (dem), British pound (gbp) and Japanese yen (jpy) quoted against the US dollar are illustrated in Figure 15. Now we ask whether a multivariate normal model would be suitable for the log return data. We estimate the mean and the covariance matrix of each data set (pairwise) and simulate from a bivariate normal distribution with this mean and covariance matrix, see Figure 16. By comparing the two figures we see that although the simulated data resembles the true observations, the simulated data have too few points far away from the mean. That is, the tails of the log return data are heavier than those of the simulated multivariate normal data.

Another example shows that not only does the multivariate normal distribution have too light tails, but also the dependence structure in the normal distribution may be inappropriate when modeling the joint distribution of log returns. Consider for instance the data set consisting of log returns from BMW and Siemens stocks, Figure 17. Notice the strong dependence between large drops in the BMW and Siemens stock prices. The dependence of large drops seems stronger than for ordinary returns. This is something that cannot be modeled by a multivariate normal distribution. To find a good model for the BMW and Siemens data we need to be able to handle more advanced dependence structures than that offered by the multivariate normal distribution.

8.3 Comonotonicity and countermonotonicity

Let (X_1, X_2) be a bivariate random vector and suppose there exist two monotone functions α, β : R → R and a random variable Z such that

    (X_1, X_2) \overset{d}{=} (\alpha(Z), \beta(Z)).

If both α and β are increasing, then X_1 and X_2 are said to be comonotonic. If the distribution functions F_1 and F_2 are continuous, then X_2 = T(X_1) a.s. with T = F_2^← ∘ F_1. If α is increasing and β is decreasing, then X_1 and X_2 are said to be countermonotonic. If the distribution functions F_1 and F_2 are continuous, then X_2 = T(X_1) a.s. with T = F_2^← ∘ (1 − F_1).

8.4 Covariance and linear correlation

Let X = (X_1, ..., X_d)^T be a random (column) vector with E(X_k^2) < ∞ for every k. The mean vector of X is µ = E(X) and the covariance matrix of X is Cov(X) = E[(X − µ)(X − µ)^T]. Here Cov(X) is a d × d matrix whose (i, j)-entry (ith row, jth column) is Cov(X_i, X_j). Notice that Cov(X_i, X_i) = var(X_i), the variance of X_i.
Figure 15: Log returns of foreign exchange rates quoted against the US dollar.

Covariance is easily manipulated under linear (affine) transformations. If B is a constant k × d matrix and b is a constant k-vector (column vector), then Y = BX + b has mean E(Y) = Bµ + b and covariance matrix

    \mathrm{Cov}(Y) = E[(Y - E(Y))(Y - E(Y))^T]
    = E[B(X - \mu)(X - \mu)^T B^T]
    = B\, E[(X - \mu)(X - \mu)^T]\, B^T
    = B\, \mathrm{Cov}(X)\, B^T.
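The affine-transformation rules E(Y) = Bµ + b and Cov(Y) = B Cov(X) Bᵀ are easy to check empirically. A minimal sketch; the particular µ, Σ, B and b below are arbitrary choices for the demo, not taken from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([4.0, 5.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ B.T + b  # row-wise Y = BX + b

# Sample moments of Y should match the affine-transformation rules.
mean_err = np.abs(Y.mean(axis=0) - (B @ mu + b)).max()
cov_err = np.abs(np.cov(Y.T) - B @ Sigma @ B.T).max()
```

With 200,000 samples both discrepancies are on the order of the Monte Carlo error, i.e. well below the size of the entries of Bµ + b and BΣBᵀ.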
Figure 16: Simulated log returns of foreign exchange rates using a bivariate normal distribution with estimated mean and covariance matrix.

If var(X_1), var(X_2) ∈ (0, ∞), then the linear correlation coefficient is

    \rho_L(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{var}(X_1)\, \mathrm{var}(X_2)}}.

If X_1 and X_2 are independent, then ρ_L(X_1, X_2) = 0, but the converse is false.
Moreover, |ρ_L(X_1, X_2)| ≤ 1, with |ρ_L(X_1, X_2)| = 1 if and only if X_1 and X_2 are perfectly linearly dependent, i.e. if and only if there exist a ∈ R and b ≠ 0 such that X_2 = a + bX_1. The linear correlation coefficient is invariant under strictly increasing linear transformations. In particular, for a_1, a_2 ∈ R and b_1, b_2 ≠ 0 we have

    \rho_L(a_1 + b_1 X_1,\ a_2 + b_2 X_2) = \mathrm{sign}(b_1 b_2)\, \rho_L(X_1, X_2),

where sign(b_1 b_2) = b_1 b_2 / |b_1 b_2| if b_1 b_2 ≠ 0 and 0 otherwise. However, linear correlation is not invariant under nonlinear strictly increasing transformations T : R → R; for two random variables we have in general ρ_L(T(X_1), T(X_2)) ≠ ρ_L(X_1, X_2). If we transform X_1 and X_2 by a strictly increasing transformation we have only rescaled the marginal distributions; we have not changed the dependence between X_1 and X_2. Yet the linear correlation between T(X_1) and T(X_2) may have changed, (falsely) indicating that the dependence has changed.

Moreover, ρ_L(X_1, X_2) = 0 need not imply that X_1 and X_2 are independent.

Example 8.2 If X_1 ∼ N(0, 1) and X_2 = X_1^2, then

    \mathrm{Cov}(X_1, X_2) = E[(X_1 - 0)(X_2 - 1)] = E(X_1 X_2) - 0 \cdot 1
    = E(X_1^3) = \int_{-\infty}^{\infty} x^3 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = 0.

Hence ρ_L(X_1, X_2) = 0, but clearly X_1 and X_2 are strongly dependent. This is a weakness of linear correlation as a measure of dependence.

Figure 17: Log returns from BMW and Siemens stocks.
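A Monte Carlo version of Example 8.2 makes the weakness concrete: X_2 = X_1² is completely determined by X_1, yet the sample linear correlation is near zero. A short sketch (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(1_000_000)
x2 = x1 ** 2

rho = np.corrcoef(x1, x2)[0, 1]          # near 0 despite full dependence
rho_sq = np.corrcoef(x1 ** 2, x2)[0, 1]  # equals 1: x2 is a deterministic function of x1
```

The first correlation estimates E(X_1³) = 0 up to sampling error, while the second shows that an appropriate (here nonlinear) transformation reveals the dependence completely.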
Proposition 8.1 Let (X_1, X_2) be a random vector with marginal distribution functions F_1 and F_2 and an unspecified dependence structure. Assume further that var(X_1), var(X_2) ∈ (0, ∞). Then:
(1) The set of possible linear correlations is a closed interval [ρ_{L,min}, ρ_{L,max}] with 0 ∈ [ρ_{L,min}, ρ_{L,max}].
(2) The minimum linear correlation ρ_{L,min} is obtained if and only if X_1 and X_2 are countermonotonic; the maximum ρ_{L,max} if and only if X_1 and X_2 are comonotonic.

The following example illustrates Proposition 8.1.

Example 8.3 Let X_1 ∼ Lognormal(0, 1) and X_2 ∼ Lognormal(0, σ²) with σ > 0. Let Z ∼ N(0, 1) and note that

    X_1 \overset{d}{=} e^Z, \qquad X_2 \overset{d}{=} e^{\sigma Z} \overset{d}{=} e^{-\sigma Z}.

Note that e^Z and e^{σZ} are comonotonic and that e^Z and e^{−σZ} are countermonotonic. Hence, by Proposition 8.1,

    \rho_{L,\min} = \rho_L(e^Z, e^{-\sigma Z}) = \frac{e^{-\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}},
    \qquad
    \rho_{L,\max} = \rho_L(e^Z, e^{\sigma Z}) = \frac{e^{\sigma} - 1}{\sqrt{(e - 1)(e^{\sigma^2} - 1)}}.

In particular, ρ_{L,min} ↗ 0 and ρ_{L,max} ↘ 0 as σ → ∞. See Figure 18 for a graphical illustration of these bounds as functions of σ.

8.5 Rank correlation

Rank correlations are measures of concordance for bivariate random vectors. Given two points (x_1, x_2) and (x_1′, x_2′) in R², we say that the two points are concordant if (x_1 − x_1′)(x_2 − x_2′) > 0 and discordant if (x_1 − x_1′)(x_2 − x_2′) < 0. Hence, concordance (discordance) means that the line connecting the two points has a positive (negative) slope. Now consider two independent random vectors (X_1, X_2) and (X_1′, X_2′) with the same bivariate distribution. Kendall's tau rank correlation is given by

    \tau(X_1, X_2) = P\bigl((X_1 - X_1')(X_2 - X_2') > 0\bigr) - P\bigl((X_1 - X_1')(X_2 - X_2') < 0\bigr).

If X_2 tends to increase with X_1, we expect the probability of concordance to be high relative to the probability of discordance, giving a high value of τ(X_1, X_2). Another measure of concordance/discordance is Spearman's rho rank correlation, where one introduces a third independent copy (X_1″, X_2″) of (X_1, X_2).
Figure 18: Bounds for the linear correlation coefficient as functions of σ.

Spearman's rho is defined by

    \rho_S(X_1, X_2) = 3\Bigl(P\bigl((X_1 - X_1')(X_2 - X_2'') > 0\bigr) - P\bigl((X_1 - X_1')(X_2 - X_2'') < 0\bigr)\Bigr).

Kendall's tau and Spearman's rho have many properties in common, listed below. They also illustrate important differences between Kendall's tau and Spearman's rho on the one hand and the linear correlation coefficient on the other hand.
• τ(X_1, X_2) ∈ [−1, 1] and ρ_S(X_1, X_2) ∈ [−1, 1]. All values can be obtained regardless of the marginal distribution functions, if they are continuous.
• If the marginal distribution functions are continuous, then τ(X_1, X_2) = ρ_S(X_1, X_2) = 1 if and only if X_1 and X_2 are comonotonic, and τ(X_1, X_2) = ρ_S(X_1, X_2) = −1 if and only if X_1 and X_2 are countermonotonic.
• If X_1 and X_2 are independent, then τ(X_1, X_2) = 0 and ρ_S(X_1, X_2) = 0, but the converse is not true in general.
• If T_1, T_2 are strictly increasing, then τ(T_1(X_1), T_2(X_2)) = τ(X_1, X_2) and ρ_S(T_1(X_1), T_2(X_2)) = ρ_S(X_1, X_2).

Estimation of Kendall's tau based on iid bivariate random vectors X_1, ..., X_n is easy.
Note that, for k ≠ l,

    \tau(X_1, X_2)
    = P\bigl((X_{k,1} - X_{l,1})(X_{k,2} - X_{l,2}) > 0\bigr) - P\bigl((X_{k,1} - X_{l,1})(X_{k,2} - X_{l,2}) < 0\bigr)
    = E\bigl(\mathrm{sign}((X_{k,1} - X_{l,1})(X_{k,2} - X_{l,2}))\bigr).

Hence, we obtain the following estimator of Kendall's tau:

    \hat\tau = \binom{n}{2}^{-1} \sum_{1 \le k < l \le n} \mathrm{sign}\bigl((X_{k,1} - X_{l,1})(X_{k,2} - X_{l,2})\bigr)
    = \binom{n}{2}^{-1} \sum_{k=1}^{n-1} \sum_{l=k+1}^{n} \mathrm{sign}\bigl((X_{k,1} - X_{l,1})(X_{k,2} - X_{l,2})\bigr).

We will return to the rank correlations when we discuss copulas in Section 10.

8.6 Tail dependence

Motivated by the scatter plot of joint BMW and Siemens log returns above, we introduce a notion of dependence of extreme values, called tail dependence. Let (X_1, X_2) be a random vector with marginal distribution functions F_1 and F_2. The coefficient of upper tail dependence of (X_1, X_2) is defined as

    \lambda_U(X_1, X_2) = \lim_{u \nearrow 1} P\bigl(X_2 > F_2^{\leftarrow}(u) \mid X_1 > F_1^{\leftarrow}(u)\bigr),

provided that the limit λ_U ∈ [0, 1] exists. The coefficient of lower tail dependence is defined as

    \lambda_L(X_1, X_2) = \lim_{u \searrow 0} P\bigl(X_2 \le F_2^{\leftarrow}(u) \mid X_1 \le F_1^{\leftarrow}(u)\bigr),

provided that the limit λ_L ∈ [0, 1] exists. If λ_U > 0 (λ_L > 0), then we say that (X_1, X_2) has upper (lower) tail dependence. See Figure 19 for an illustration of tail dependence.
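The pairwise-sign estimator of Kendall's tau above translates directly into code. A minimal O(n²) sketch (the function name is illustrative; for large n one would use an O(n log n) implementation such as the one in SciPy):

```python
import numpy as np
from itertools import combinations

def kendalls_tau(x, y):
    """Kendall's tau estimate: average of sign((x_k - x_l)(y_k - y_l)) over all pairs k < l."""
    n = len(x)
    s = sum(np.sign((x[k] - x[l]) * (y[k] - y[l])) for k, l in combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))
```

The estimator attains ±1 on comonotonic/countermonotonic samples and, being a function of the pairwise signs only, is invariant under strictly increasing transformations of each coordinate.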
Figure 19: Illustration of lower tail dependence.
Figure 20: Illustration of two bivariate distributions with linear correlation ρ_L = 0.8 and standard normal marginal distributions. The one to the left has tail dependence λ_U = λ_L = 0.62 and the one to the right has λ_U = λ_L = 0.
9 Multivariate elliptical distributions

9.1 The multivariate normal distribution

Recall the following three equivalent definitions of a multivariate normal distribution.

(1) A random vector X = (X_1, ..., X_d)^T has a normal distribution if for every vector a = (a_1, ..., a_d)^T the random variable a^T X has a normal distribution.

(2) X ∼ N_d(µ, Σ) if and only if its characteristic function is given by

    \phi_X(t) = E(\exp\{i\, t^T X\}) = \exp\Bigl\{i\, t^T \mu - \frac{1}{2} t^T \Sigma t\Bigr\}.

(3) A random vector X with E(X) = µ and Cov(X) = Σ, such that |Σ| > 0, satisfies X ∼ N_d(µ, Σ) if and only if it has the density

    f(x) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\Bigl\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Bigr\}.

The notation X ∼ N_d(µ, Σ) is used to denote that X has a d-dimensional normal distribution with mean vector µ and covariance matrix Σ. Next we list some useful properties of the multivariate normal distribution. Let X ∼ N_d(µ, Σ).

• Linear transformations. For B ∈ R^{k×d} and b ∈ R^k we have BX + b ∼ N_k(Bµ + b, BΣB^T).

• Marginal distributions. Write X^T = (X_1^T, X_2^T) with X_1 = (X_1, ..., X_k)^T, X_2 = (X_{k+1}, ..., X_d)^T, and write µ^T = (µ_1^T, µ_2^T),

    \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.

Then X_1 ∼ N_k(µ_1, Σ_{11}) and X_2 ∼ N_{d−k}(µ_2, Σ_{22}).

• Conditional distributions. If Σ is nonsingular, then X_2 | X_1 = x_1 ∼ N_{d−k}(µ_{2.1}, Σ_{22.1}), where

    \mu_{2.1} = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1)
    \quad \text{and} \quad
    \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}.

• Quadratic forms. If Σ is nonsingular (Σ > 0), then D² = (X − µ)^T Σ^{−1} (X − µ) ∼ χ²_d. The variable D is called the Mahalanobis distance.

• Convolutions. If X ∼ N_d(µ, Σ) and Y ∼ N_d(µ̃, Σ̃) are independent, then X + Y ∼ N_d(µ + µ̃, Σ + Σ̃).
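The conditional-distribution formulas µ_{2.1} and Σ_{22.1} are easy to implement and check. A minimal sketch (the function name and the 2-dimensional test case are choices of this illustration):

```python
import numpy as np

def mvn_conditional(mu, Sigma, k, x1):
    """Parameters of X2 | X1 = x1 when X ~ Nd(mu, Sigma), split after the first k components."""
    mu1, mu2 = mu[:k], mu[k:]
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    mu_cond = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)          # mu_2 + S21 S11^{-1} (x1 - mu1)
    Sigma_cond = S22 - S21 @ np.linalg.solve(S11, S12)            # S22 - S21 S11^{-1} S12
    return mu_cond, Sigma_cond
```

In the bivariate standard-normal case with correlation ρ this reduces to the familiar X_2 | X_1 = x_1 ∼ N(ρ x_1, 1 − ρ²).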
9.2 Normal mixtures

Note that if X ∼ N_d(µ, Σ), then X has the representation X =_d µ + AZ, where Z ∼ N_k(0, I) (I denotes the identity matrix) and A ∈ R^{d×k} is such that AA^T = Σ. If Σ is nonsingular, then we can take k = d.

Definition 9.1 An R^d-valued random vector X is said to have a multivariate normal variance mixture distribution if X =_d µ + W AZ, where Z ∼ N_k(0, I), W ≥ 0 is a nonnegative random variable independent of Z, and A ∈ R^{d×k} and µ ∈ R^d are a matrix and a vector of constants, respectively.

Note that conditioning on W = w we have X | W = w ∼ N_d(µ, w²Σ), where Σ = AA^T.

Example 9.1 If we take W² =_d ν/S, where S ∼ χ²_ν (chi-square distribution with ν degrees of freedom), then X has the multivariate t-distribution with ν degrees of freedom. We use the notation X ∼ t_d(ν, µ, Σ). Note that Σ is not the covariance matrix of X. Since E(W²) = ν/(ν − 2) (if ν > 2), we have Cov(X) = [ν/(ν − 2)]Σ.

9.3 Spherical distributions

Many results on spherical (and elliptical) distributions can be found in the book [9].

Definition 9.2 A random vector X = (X_1, ..., X_d)^T has a spherical distribution if there exists a function ψ of a scalar variable such that the characteristic function of X satisfies

    \phi_X(t) = \psi(t^T t) = \psi(t_1^2 + \dots + t_d^2).

We write X ∼ S_d(ψ) to denote that X has a spherical distribution with characteristic function ψ(t^T t).

Proposition 9.1 The following statements are equivalent.
(1) X has a spherical distribution.
(2) For every vector a ∈ R^d, a^T X =_d ‖a‖ X_1, where ‖a‖² = a_1² + ··· + a_d².
(3) X has the stochastic representation X =_d R S, where S is uniformly distributed on the unit sphere S^{d−1} = {x ∈ R^d : ‖x‖ = 1} and R ≥ 0 is independent of S.

The implication (1) ⇒ (2) can be shown as follows. Recall that two random variables (and vectors) have the same distribution if and only if their characteristic functions are the same. Let X ∼ S_d(ψ). Then

    \phi_{a^T X}(s) = E(\exp\{i s\, a^T X\}) = E(\exp\{i (sa)^T X\}) = \psi(s^2 a^T a).

Note that ‖a‖X_1 = t^T X with t^T = (‖a‖, 0, ..., 0). Hence,

    \phi_{\|a\| X_1}(s) = E(\exp\{i s\, t^T X\}) = \psi(s^2 t^T t) = \psi(s^2 \|a\|^2) = \psi(s^2 a^T a).
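The stochastic representation X =_d RS suggests a direct simulation recipe: draw S uniformly on the unit sphere (by normalizing a standard normal vector) and an independent radius R. A minimal sketch for the case R² ∼ χ²_d, which, as noted in these notes, reproduces N_d(0, I); the function names are illustrative:

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Uniform points on the unit sphere in R^d: normalize standard normal vectors."""
    y = rng.standard_normal((n, d))
    return y / np.linalg.norm(y, axis=1, keepdims=True)

def sample_spherical(n, d, rng):
    """X = R*S with R^2 ~ chi^2_d, independent of S; this reproduces Nd(0, I)."""
    s = sample_sphere(n, d, rng)
    r = np.sqrt(rng.chisquare(df=d, size=(n, 1)))
    return r * s
```

Other radial distributions for R give other spherical laws with the same "directionally uniform" structure; only the radial part changes.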
Example 9.2 Let X ∼ N_d(0, I). The characteristic function of the standard normal distribution is

    \phi_X(t) = \exp\Bigl\{i\, t^T 0 - \frac{1}{2} t^T I t\Bigr\} = \exp\{-t^T t / 2\} = \psi(t^T t).

Then X ∼ S_d(ψ) with ψ(x) = exp{−x/2}. From the stochastic representation X =_d RS we conclude that ‖X‖² =_d R², and since the sum of squares of d independent standard normal random variables has a chi-squared distribution with d degrees of freedom, we have R² ∼ χ²_d.

The stochastic representation in (3) is very useful for simulating from spherical distributions. Simply apply the following algorithm.
(i) Simulate s from the distribution of S.
(ii) Simulate r from the distribution of R.
(iii) Put x = rs.
This procedure can then be repeated n times to get a sample of size n from the spherical distribution of X. Note that a convenient way to simulate a random element S that has uniform distribution on the unit sphere in R^d is to simulate a d-dimensional random vector Y ∼ N_d(0, I) and then put S = Y/‖Y‖. We illustrate this in Figure 21.

9.4 Elliptical distributions

Definition 9.3 An R^d-valued random vector X has an elliptical distribution if

    X \overset{d}{=} \mu + A Y,

where Y ∼ S_k(ψ), A ∈ R^{d×k} and µ ∈ R^d.

The characteristic function of an elliptically distributed random vector X can be written as

    \phi_X(t) = E(\exp\{i\, t^T X\}) = E(\exp\{i\, t^T (\mu + A Y)\})
    = \exp\{i\, t^T \mu\}\, E(\exp\{i\, (A^T t)^T Y\})
    = \exp\{i\, t^T \mu\}\, \psi(t^T \Sigma t),

where Σ = AA^T. We write X ∼ E_d(µ, Σ, ψ). Here µ is called the location parameter, Σ is called the dispersion matrix and ψ is called the characteristic generator of the elliptical distribution. If E(X_k) < ∞, then E(X) = µ. If E(X_k²) < ∞, then Cov(X) = cΣ for some c > 0. If A ∈ R^{d×d} is nonsingular with AA^T = Σ, then we have the following relation between elliptical and spherical distributions:

    X \sim E_d(\mu, \Sigma, \psi) \iff A^{-1}(X - \mu) \sim S_d(\psi).

Note that elliptically distributed random vectors are radially symmetric: if X ∼ E_d(µ, Σ, ψ), then X − µ =_d µ − X. When d = 1 the elliptical distributions coincide with the 1-dimensional symmetric distributions.
It follows immediately from the definition that elliptically distributed random vectors have the following stochastic representation: X ∼ E_d(µ, Σ, ψ) if and only if there exist S, R, and A such that X =_d µ + R A S, with S uniformly distributed on the unit sphere, R ≥ 0 a random variable independent of S, A ∈ R^{d×k} a matrix with AA^T = Σ, and µ ∈ R^d.

The stochastic representation is useful when simulating from elliptical distributions. Suppose X has the stochastic representation X =_d µ + R A S. Simply apply the following algorithm.
(i) Simulate s from the distribution of S.
(ii) Multiply s by the matrix A to get As.
(iii) Simulate r from the distribution of R and form rAs.
(iv) Put x = µ + rAs.
This procedure can then be repeated n times to get a sample of size n from the elliptical distribution of X. We illustrate this in Figure 22.

Figure 21: Simulation from a spherical distribution using the stochastic representation. First we simulate independently n times from the uniform distribution on the unit sphere to obtain s_1, ..., s_n (above). Then we simulate r_1, ..., r_n from the distribution of R. Finally we put x_k = r_k s_k for k = 1, ..., n (below).
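Steps (i)–(iv) can be sketched for the multivariate t-distribution, using the fact from these notes that for t_d(ν, µ, Σ) the radial part satisfies R²/d ∼ F(d, ν). The Cholesky factor is one possible choice of A with AAᵀ = Σ, and the function name is illustrative:

```python
import numpy as np

def sample_mvt(n, mu, Sigma, nu, rng):
    """Simulate t_d(nu, mu, Sigma) via X = mu + R*A*S with R^2/d ~ F(d, nu)."""
    mu = np.asarray(mu, dtype=float)
    d = len(mu)
    A = np.linalg.cholesky(Sigma)                     # any A with A A^T = Sigma works
    y = rng.standard_normal((n, d))
    s = y / np.linalg.norm(y, axis=1, keepdims=True)  # (i) uniform on the unit sphere
    r = np.sqrt(d * rng.f(d, nu, size=(n, 1)))        # (iii) radius from R^2/d ~ F(d, nu)
    return mu + r * (s @ A.T)                         # (ii) + (iv)
```

For ν > 2 the sample covariance should be close to [ν/(ν − 2)]Σ, in line with the normal-mixture computation in Example 9.1.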
Example 9.3 An R^d-valued normally distributed random vector X with mean µ and covariance matrix Σ has the representation X =_d µ + AZ, where AA^T = Σ and Z ∼ N_k(0, I). Since Z has a spherical distribution, it has the representation Z =_d V S with V² ∼ χ²_k. Hence X has the representation X =_d µ + V A S, and we conclude that X ∼ E_d(µ, Σ, ψ) with ψ(x) = exp{−x/2}.

Example 9.4 A random vector Z ∼ N_d(0, I) has a spherical distribution with stochastic representation Z =_d V S, with V² ∼ χ²_d. If X is a normal variance mixture, then we see from the definition that it has representation X =_d µ + V W A S. Hence, it has an elliptical distribution with R = V W. An example given earlier is the multivariate t_ν-distribution, where W² =_d ν/S with S ∼ χ²_ν. This means that for the multivariate t-distribution R²/d = V²W²/d has an F(d, ν)-distribution (see e.g. Problem 10, Chapter 1 in [11]).

Figure 22: Simulation from an elliptical distribution using the stochastic representation. First we simulate independently n times from the uniform distribution on the unit sphere to obtain s_1, ..., s_n (above, left). Next, we multiply each sample by A to get the points As_1, ..., As_n (above, right). Then we simulate r_1, ..., r_n from the distribution of R to obtain r_k As_k for k = 1, ..., n (below, left). Finally we add µ to obtain x_k = µ + r_k As_k for k = 1, ..., n (below, right).

9.5 Properties of elliptical distributions

Next we list some useful properties of elliptical distributions. Many of them coincide with the properties of the multivariate normal distribution. This is perhaps the most important argument for using elliptical distributions in applications. Let X ∼ E_d(µ, Σ, ψ).
• Linear transformations. For B ∈ R^{k×d} and b ∈ R^k we have BX + b ∼ E_k(Bµ + b, BΣB^T, ψ).

• Marginal distributions. Write X^T = (X_1^T, X_2^T) with X_1 = (X_1, ..., X_k)^T, X_2 = (X_{k+1}, ..., X_d)^T, and write µ^T = (µ_1^T, µ_2^T),

    \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.

Then X_1 ∼ E_k(µ_1, Σ_{11}, ψ) and X_2 ∼ E_{d−k}(µ_2, Σ_{22}, ψ).

• Conditional distributions. Assuming that Σ is nonsingular, X_2 | X_1 = x_1 ∼ E_{d−k}(µ_{2.1}, Σ_{22.1}, ψ̃), where

    \mu_{2.1} = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (x_1 - \mu_1)
    \quad \text{and} \quad
    \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}.

Typically ψ̃ is a different characteristic generator than the original ψ.

• Quadratic forms. If Σ is nonsingular, then D² = (X − µ)^T Σ^{−1} (X − µ) =_d R². The variable D is called the Mahalanobis distance.

• Convolutions. If X ∼ E_d(µ, Σ, ψ) and Y ∼ E_d(µ̃, Σ, ψ̃) are independent, then X + Y ∼ E_d(µ + µ̃, Σ, ψ̄), with ψ̄(x) = ψ(x)ψ̃(x). Note that the dispersion matrix Σ must (in general) be the same for X and Y.

IMPORTANT: Contrary to the multivariate normal distribution, it is not true that the components of a spherically distributed random vector X ∼ E_d(0, I, ψ) are independent. In fact, the components of X are independent only if X has a multivariate normal distribution. For instance, assume X = (X_1, X_2)^T ∼ E_2(0, I, ψ) is not normal; then ρ_L(X_1, X_2) = 0 (if ρ_L(X_1, X_2) exists), but X_1 and X_2 are dependent. However, if X = (X_1, X_2)^T ∼ N_2(0, I), then ρ_L(X_1, X_2) = 0 and X_1 and X_2 are independent.

9.6 Elliptical distributions and risk management

Suppose we have the possibility today, at time 0, to invest in d risky assets by taking long or short positions. We consider a fixed holding period of length T and let X = (X_1, ..., X_d) be the future asset returns at time T. Suppose that X has a (finite) nonsingular covariance matrix Σ and mean vector µ = E(X). Let P be the set of all linear portfolios w^T X for w ∈ R^d, and let

    W_r = \Bigl\{w \in \mathbb{R}^d : w^T \mu = r, \ \sum_{i=1}^{d} w_i = 1\Bigr\}
    \quad \text{and} \quad
    P_r = \{Z = w^T X : w \in W_r\}.
Hence, P_r is the set of all linear portfolios with expected return r, with the normalization constraint \sum_{i=1}^{d} w_i = 1. How can we find the least risky portfolio with expected portfolio return r? The mean-variance (Markowitz) approach to portfolio optimization solves this problem if we accept variance (or standard deviation) as our measure of risk. The minimum-variance portfolio is the portfolio that minimizes the variance var(Z) = w^T Σ w over all Z ∈ P_r. Hence, the portfolio weights of the minimum-variance portfolio form the solution to the following optimization problem:

    \min_{w \in W_r} w^T \Sigma w.

This optimization problem has a unique solution, see e.g. p. 185 in [3]. As already mentioned, the variance is in general not a suitable risk measure. We would typically prefer a risk measure based on the appropriate tail of our portfolio return, for instance VaR or ES. It was shown in [7] that the "minimum-risk" portfolio, where risk could be e.g. VaR or ES, coincides with the minimum-variance portfolio for elliptically distributed return vectors X.

Since X ∼ E_d(µ, Σ, ψ) we have X =_d µ + AY, where AA^T = Σ and Y ∼ S_d(ψ). It follows from Proposition 9.1 that

    w^T X \overset{d}{=} w^T \mu + w^T A Y = w^T \mu + (A^T w)^T Y \overset{d}{=} w^T \mu + \|A^T w\|\, Y_1.   (9.1)

Suppose ρ is a risk measure that is translation invariant and positively homogeneous, i.e. ρ(X + a) = ρ(X) + a for a ∈ R and ρ(λX) = λρ(X) for λ ≥ 0. Moreover, suppose that ρ(X) depends on X only through its distribution. Then, for any Z ∈ P_r, it holds that ρ(Z) = r + ‖A^T w‖ ρ(Y_1) and var(Z) = ‖A^T w‖² var(Y_1) = w^T Σ w var(Y_1). Hence, minimization with respect to ρ and minimization with respect to the variance give the same optimal portfolio weights w:

    \operatorname*{argmin}_{Z \in P_r} \rho(Z)
    = \operatorname*{argmin}_{w \in W_r} \|A^T w\|
    = \operatorname*{argmin}_{w \in W_r} \|A^T w\|^2
    = \operatorname*{argmin}_{Z \in P_r} \mathrm{var}(Z).

Hence, the mean-variance (Markowitz) approach to portfolio optimization makes sense in the elliptical world. In fact, we can take any risk measure ρ with the above properties and replace the variance in the Markowitz approach by ρ.

Example 9.5 Value-at-Risk is subadditive for elliptical distributions. Note that (9.1) implies that for w_1, w_2 ∈ R^d we have

    \mathrm{VaR}_p(w_1^T X + w_2^T X)
    = w_1^T \mu + w_2^T \mu + \|A^T w_1 + A^T w_2\|\, \mathrm{VaR}_p(Y_1)
    \le w_1^T \mu + w_2^T \mu + \|A^T w_1\|\, \mathrm{VaR}_p(Y_1) + \|A^T w_2\|\, \mathrm{VaR}_p(Y_1)
    = \mathrm{VaR}_p(w_1^T X) + \mathrm{VaR}_p(w_2^T X).
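The minimum-variance problem min w'Σw subject to w'µ = r and Σw_i = 1 has a closed-form solution via the Lagrange (KKT) conditions: w = Σ⁻¹Mᵀ(MΣ⁻¹Mᵀ)⁻¹(r, 1)ᵀ, where M stacks the two constraint vectors. A minimal sketch of this linear-algebra route (the function name is illustrative, and this is one standard way to solve the problem, not a prescription from the notes):

```python
import numpy as np

def min_variance_weights(mu, Sigma, r):
    """Solve min_w w' Sigma w  s.t.  w' mu = r and sum(w) = 1, via the KKT system."""
    mu = np.asarray(mu, dtype=float)
    M = np.vstack([mu, np.ones(len(mu))])      # the two linear constraints, stacked
    Sinv_Mt = np.linalg.solve(Sigma, M.T)      # Sigma^{-1} M^T
    lam = np.linalg.solve(M @ Sinv_Mt, np.array([r, 1.0]))
    return Sinv_Mt @ lam                       # w = Sigma^{-1} M^T (M Sigma^{-1} M^T)^{-1} (r, 1)'
```

At the optimum Σw lies in the span of the constraint vectors, so moving along any feasible direction strictly increases the variance — which is exactly the uniqueness statement referenced above.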
Example 9.6 Let X = (X_1, X_2)^T be a vector of log returns, for the time period today-until-tomorrow, of two stocks with stock prices S_1 = S_2 = 1 today. Suppose that X ∼ E_2(µ, Σ, ψ) (elliptically distributed) with linear correlation coefficient ρ, and that µ = 0 and

    \Sigma = \begin{pmatrix} \sigma^2 & \sigma^2 \rho \\ \sigma^2 \rho & \sigma^2 \end{pmatrix}.

Your total capital is 1, which you want to invest fully in the two stocks, giving you a linearized portfolio loss L^∆ = L^∆(w_1, w_2), where w_1 and w_2 are portfolio weights. Two investment strategies are available (long positions):
(A) invest your money in equal shares in the two stocks: w_{A1} = w_{A2} = 1/2;
(B) invest all your money in the first stock: w_{B1} = 1, w_{B2} = 0.
How can we compute the ratio VaR_{0.99}(L^∆_A)/VaR_{0.99}(L^∆_B), where L^∆_A and L^∆_B are the linearized losses for investment strategies A and B, respectively? We have L^∆ = −w^T X =_d w^T X, since X ∼ E_2(0, Σ, ψ). Moreover, by (9.1) it holds that

    w^T X \overset{d}{=} \sqrt{w^T \Sigma w}\, Z, \qquad Z \sim E_1(0, 1, \psi),

so that VaR_p(w^T X) = \sqrt{w^T \Sigma w}\, VaR_p(Z). We have w_A^T \Sigma w_A = \sigma^2 (1+\rho)/2 and w_B^T \Sigma w_B = \sigma^2. This yields

    \frac{\mathrm{VaR}_{0.99}(L^\Delta_A)}{\mathrm{VaR}_{0.99}(L^\Delta_B)}
    = \frac{\mathrm{VaR}_{0.99}(w_A^T X)}{\mathrm{VaR}_{0.99}(w_B^T X)}
    = \frac{\sqrt{w_A^T \Sigma w_A}}{\sqrt{w_B^T \Sigma w_B}}
    = \sqrt{(1+\rho)/2}.
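The closed-form ratio √((1 + ρ)/2) can be verified by Monte Carlo in the Gaussian special case of an elliptical distribution. A short sketch (sample size, seed, and the parameter values ρ = 0.3, σ = 0.01 are arbitrary choices for the demo):

```python
import numpy as np

def var_ratio(rho):
    """VaR_0.99(strategy A) / VaR_0.99(strategy B) = sqrt((1 + rho)/2)."""
    return np.sqrt((1.0 + rho) / 2.0)

rng = np.random.default_rng(2)
rho, sigma = 0.3, 0.01
Sigma = sigma ** 2 * np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=1_000_000)
loss_A = -(0.5 * X[:, 0] + 0.5 * X[:, 1])  # equal weights in both stocks
loss_B = -X[:, 0]                          # everything in the first stock
ratio = np.quantile(loss_A, 0.99) / np.quantile(loss_B, 0.99)
```

Note that the ratio does not depend on σ or on the particular elliptical family, only on ρ, in line with the derivation above.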
10 Copulas

We will now introduce the notion of a copula. The main reasons for introducing copulas are: 1) copulas are very useful for building multivariate models with nonstandard (non-Gaussian) dependence structures, and 2) copulas provide an understanding of dependence beyond linear correlation. The reader seeking more information about copulas is recommended to consult the book [13].

10.1 Basic properties

Definition 10.1 A d-dimensional copula is a distribution function on [0, 1]^d with standard uniform marginal distributions.

This means that a copula is the distribution function P(U_1 ≤ u_1, ..., U_d ≤ u_d) of a random vector (U_1, ..., U_d) with the property that for all k it holds that P(U_k ≤ u) = u for u ∈ [0, 1].

A bivariate distribution function F with marginal distribution functions F_1 and F_2 is a function F that satisfies:
(A1) F(x_1, x_2) is nondecreasing in each argument x_k.
(A2) F(x_1, ∞) = F_1(x_1) and F(∞, x_2) = F_2(x_2).
(A3) For all (a_1, a_2), (b_1, b_2) ∈ R² with a_k ≤ b_k: F(b_1, b_2) − F(b_1, a_2) − F(a_1, b_2) + F(a_1, a_2) ≥ 0.
Notice that (A3) says that probabilities are always nonnegative. Hence a copula is a function C that satisfies:
(B1) C(u_1, u_2) is nondecreasing in each argument u_k.
(B2) C(u_1, 1) = u_1 and C(1, u_2) = u_2.
(B3) For all (a_1, a_2), (b_1, b_2) ∈ [0, 1]² with a_k ≤ b_k: C(b_1, b_2) − C(b_1, a_2) − C(a_1, b_2) + C(a_1, a_2) ≥ 0.

Let h : R → R be nondecreasing. Then the following properties hold for the generalized inverse h^← of h:
(C1) h is continuous if and only if h^← is strictly increasing.
(C2) h is strictly increasing if and only if h^← is continuous.
(C3) If h is continuous, then h(h^←(y)) = y.
(C4) If h is strictly increasing, then h^←(h(x)) = x.

Recall the following important facts for a distribution function G on R:
(D1) Quantile transform. If U ∼ U(0, 1) (standard uniform distribution), then P(G^←(U) ≤ x) = G(x).
(D2) Probability transform. If Y has distribution function G, where G is continuous, then G(Y) ∼ U(0, 1).

The following result, known as Sklar's Theorem, is central to the theory of copulas. It also explains the name "copula": a function that "couples" the joint distribution function to its (univariate) marginal distribution functions.

Theorem 10.1 (Sklar's Theorem) Let F be a joint distribution function with marginal distribution functions F_1, ..., F_d. Then there exists a copula C such that for all x_1, ..., x_d ∈ R̄ = [−∞, ∞],

    F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)).   (10.1)

If F_1, ..., F_d are continuous, then C is unique. Conversely, if C is a copula and F_1, ..., F_d are distribution functions, then F defined by (10.1) is a joint distribution function with marginal distribution functions F_1, ..., F_d.

Definition 10.2 Let F be a joint distribution function with continuous marginal distribution functions F_1, ..., F_d. Then the copula C in (10.1) is called the copula of F. If X is a random vector with distribution function F, then we also call C the copula of X.

Let X be a bivariate random vector with distribution function F that has continuous marginal distribution functions F_1, F_2. Consider the function C given by

    C(u_1, u_2) = F(F_1^{\leftarrow}(u_1), F_2^{\leftarrow}(u_2)).

It is clear that C is nondecreasing in each argument, so (B1) holds. Moreover, by (C3),

    C(u_1, 1) = F(F_1^{\leftarrow}(u_1), \infty) = P(X_1 \le F_1^{\leftarrow}(u_1))
    = P(F_1(X_1) \le F_1(F_1^{\leftarrow}(u_1))) = P(F_1(X_1) \le u_1) = u_1,

and similarly C(1, u_2) = u_2. Hence (B2) holds. Since F is a bivariate distribution function, (B3) holds. Hence C is a copula.
if F is a joint distribution function with continuous marginal distribution functions F1 . if C is a copula and F1 .1) is called the copula of F . Fd . . . ∞) = P(X1 ≤ F1 (u1 )) ← = P(F1 (X1 ) ≤ F1 (F1 (u1 ))) = P(F1 (X1 ) ≤ u1 ) = u1 and similarly C(1. Conversely. . If X is a random vector with distribution function F . .
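The quantile and probability transforms (D1)–(D2) underlie essentially all copula algorithms. A minimal simulation check (Python with numpy; an illustration of ours, using the Exp(1) distribution G(x) = 1 − e^{−x}, whose quantile function G^←(u) = −ln(1 − u) is explicit):

```python
import numpy as np

rng = np.random.default_rng(0)

G = lambda x: 1.0 - np.exp(-x)      # Exp(1) distribution function
G_inv = lambda u: -np.log1p(-u)     # its quantile function G^{<-}

# (D1) quantile transform: G_inv(U) has distribution function G.
u = rng.uniform(size=100_000)
x = G_inv(u)
p_hat = (x <= 1.0).mean()           # empirical P(G_inv(U) <= 1)
target = float(1 - np.exp(-1))      # G(1)

# (D2) probability transform: G(Y) is standard uniform for Y ~ Exp(1).
y = rng.exponential(size=100_000)
v = G(y)
mean_v = v.mean()                   # should be close to 1/2
```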
Proposition 10.1 Let C be a d-dimensional copula and let U1, . . . , Ud be random variables that are uniformly distributed on [0, 1] with joint distribution function C. Let F1, . . . , Fd be univariate continuous distribution functions. Then the random vector (F1^←(U1), . . . , Fd^←(Ud)) has marginal distribution functions F1, . . . , Fd and copula C. Indeed, by the quantile transform (D1), each Fk^←(Uk) has distribution function Fk.

Example 10.1 Suppose that (X1, . . . , Xd) has continuous marginal distribution functions and copula C, and let T1, . . . , Td be strictly increasing. Then the random vector (T1(X1), . . . , Td(Xd)) also has copula C. To see this, let Fk denote the distribution function of Xk and let F̃k denote the distribution function of Tk(Xk). Then

F̃k(x) = P(Tk(Xk) ≤ x) = P(Xk ≤ Tk^←(x)) = Fk(Tk^←(x)),

i.e. F̃k = Fk ◦ Tk^← is continuous. Since F1, . . . , Fd are continuous, the random vector (T1(X1), . . . , Td(Xd)) has a unique copula C̃. Moreover, for any x ∈ R̄^d,

C̃(F̃1(x1), . . . , F̃d(xd)) = P(T1(X1) ≤ x1, . . . , Td(Xd) ≤ xd)
= P(X1 ≤ T1^←(x1), . . . , Xd ≤ Td^←(xd))
= C(F1 ◦ T1^←(x1), . . . , Fd ◦ Td^←(xd))
= C(F̃1(x1), . . . , F̃d(xd)).

Since, for each k, F̃k is continuous, F̃k takes all values in [0, 1] as xk ranges over R̄, so the identity above determines C̃ everywhere. Hence C̃ = C on [0, 1]^d.

Example 10.2 Let (X1, X2) be a random vector with continuous marginal distribution functions F1, F2 and copula C, and let g1 and g2 be strictly decreasing functions. Determine the copula C̃ of (g1(X1), g2(X2)).

The distribution function F̃k of gk(Xk) is given by

F̃k(x) = P(gk(Xk) ≤ x) = P(Xk ≥ gk^{-1}(x)) = 1 − Fk(gk^{-1}(x)),

and the quantile function of gk(Xk) is given by F̃k^←(p) = gk(Fk^←(1 − p)). Hence

C̃(u1, u2) = P(g1(X1) ≤ F̃1^←(u1), g2(X2) ≤ F̃2^←(u2))
= P(X1 ≥ F1^←(1 − u1), X2 ≥ F2^←(1 − u2))
= P(F1(X1) ≥ 1 − u1, F2(X2) ≥ 1 − u2)
= 1 − (1 − u1) − (1 − u2) + C(1 − u1, 1 − u2)
= C(1 − u1, 1 − u2) + u1 + u2 − 1.
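The formula of Example 10.2 can be checked by simulation. In the sketch below (numpy; the helper emp_copula is our own rank-based empirical copula estimate, not part of the notes) we take X1, X2 independent, so C = Π and the formula predicts C̃(u1, u2) = (1 − u1)(1 − u2) + u1 + u2 − 1 = u1·u2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)  # independent => copula Pi

def emp_copula(a, b, u1, u2):
    """Empirical copula: ranks scaled into (0,1), then an empirical df."""
    r1 = a.argsort().argsort() / (len(a) + 1)
    r2 = b.argsort().argsort() / (len(b) + 1)
    return np.mean((r1 <= u1) & (r2 <= u2))

u1, u2 = 0.3, 0.7
c_orig = emp_copula(x1, x2, u1, u2)            # copula of (X1, X2)
c_flip = emp_copula(-x1, -x2, u1, u2)          # copula of (g1(X1), g2(X2)), g(x) = -x
# Example 10.2: C~(u1, u2) = C(1 - u1, 1 - u2) + u1 + u2 - 1
c_formula = emp_copula(x1, x2, 1 - u1, 1 - u2) + u1 + u2 - 1
```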
Notice that

C̃(u1, u2) = P(1 − F1(X1) ≤ u1, 1 − F2(X2) ≤ u2),

i.e. if (U1, U2) = (F1(X1), F2(X2)) has distribution function C, then (1 − U1, 1 − U2) has the copula C̃ as its distribution function. See Figure 23 for an illustration with g1(x) = g2(x) = −x and F1(x) = F2(x) = Φ(x) (the standard normal distribution function); here (X1, X2) has standard normal marginal distributions and copula C.

Figure 23: Samples of size 3000 from (X1, X2) and (−X1, −X2), and from (U1, U2) and (1 − U1, 1 − U2). [scatter plots omitted]

Example 10.3 Let X ∼ Nd(0, R), where R is a correlation matrix. Denote by ΦR and Φ the distribution functions of X and X1 respectively (the d-dimensional and the 1-dimensional standard normal distribution functions). Then X has the so-called Gaussian copula C_R^Ga given by

C_R^Ga(u) = P(Φ(X1) ≤ u1, . . . , Φ(Xd) ≤ ud) = ΦR(Φ^{-1}(u1), . . . , Φ^{-1}(ud)). (10.2)
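Formula (10.2) can be evaluated directly with a multivariate normal distribution function. A sketch using scipy (the function name gauss_copula is ours):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gauss_copula(u, R):
    """Gaussian copula (10.2): Phi_R(Phi^{-1}(u_1), ..., Phi^{-1}(u_d))."""
    z = norm.ppf(u)                              # componentwise Phi^{-1}
    return multivariate_normal(mean=np.zeros(len(u)), cov=R).cdf(z)

R = np.array([[1.0, 0.5], [0.5, 1.0]])
val = gauss_copula([0.3, 0.7], R)                # positively dependent case
val_indep = gauss_copula([0.3, 0.7], np.eye(2))  # R = I gives Pi(u1, u2) = u1*u2
```

For R = I the Gaussian copula reduces to the independence copula, so val_indep should be close to 0.3 · 0.7 = 0.21, while positive correlation pushes the value above 0.21 (and never above min(u1, u2)).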
For d = 2 we see from (10.2) that C_R^Ga can be written as

C_R^Ga(u1, u2) = ∫_{−∞}^{Φ^{-1}(u1)} ∫_{−∞}^{Φ^{-1}(u2)} 1/(2π(1 − ρ^2)^{1/2}) exp( −(x1^2 − 2ρ x1 x2 + x2^2) / (2(1 − ρ^2)) ) dx1 dx2,

if ρ = R12 ∈ (−1, 1).

Let Y = µ + ΣX, where Σ = diag(σ1, . . . , σd) for some σ1, . . . , σd > 0. Then Y ∼ Nd(µ, ΣRΣ). Note that Y has linear correlation matrix R, and that Tk(x) = µk + σk x is strictly increasing with Y = (T1(X1), . . . , Td(Xd)). Hence, if CY denotes the copula of Y, then by Example 10.1 we have CY = C_R^Ga. We conclude that the copula of a nondegenerate d-dimensional normal distribution depends only on the linear correlation matrix.

The following result provides us with universal bounds for copulas.

Proposition 10.2 (Fréchet bounds) For every copula C we have the bounds

max{u1 + · · · + ud − d + 1, 0} ≤ C(u1, . . . , ud) ≤ min{u1, . . . , ud}.

For d ≥ 2 we denote by Wd the Fréchet lower bound and by Md the Fréchet upper bound. For d = 2 we drop the subscript of W2 and M2, i.e. W = W2 and M = M2.

Example 10.4 Let Wd be the Fréchet lower bound and consider the set function Q given by

Q([a1, b1] × · · · × [ad, bd]) = Σ_{k1=1}^{2} · · · Σ_{kd=1}^{2} (−1)^{k1+···+kd} Wd(u_{1k1}, . . . , u_{dkd}),

for all (a1, . . . , ad), (b1, . . . , bd) ∈ [0, 1]^d with ak ≤ bk, where u_{j1} = aj and u_{j2} = bj for j ∈ {1, . . . , d}. Wd is a copula (distribution function) if and only if Q is its probability distribution. However,

Q([1/2, 1]^d) = max(1 + · · · + 1 − d + 1, 0) − d max(1/2 + 1 + · · · + 1 − d + 1, 0) + (d choose 2) max(1/2 + 1/2 + 1 + · · · + 1 − d + 1, 0) − · · · = 1 − d/2,

since every vertex with two or more coordinates equal to 1/2 contributes zero. Hence, Q is not a probability distribution for d ≥ 3, so Wd is not a copula for d ≥ 3.

Remark 10.1 For any d ≥ 2, the Fréchet upper bound Md is a copula.

The following result shows that the Fréchet lower bound is nevertheless the best possible pointwise bound.

Proposition 10.3 For any d ≥ 3 and any u ∈ [0, 1]^d, there exists a copula C such that C(u) = Wd(u).

For d = 2 the following example shows which random vectors have the Fréchet bounds as their copulas.

Example 10.5 Let X be a random variable with continuous distribution function FX. Let Y = T(X) for some strictly increasing function T and denote by FY the distribution function of Y. Note that FY(x) = P(T(X) ≤ x) = FX(T^{-1}(x)) and that FY is continuous. By Proposition 10.1, the copula of (X, T(X)) is the distribution function of (U, U), where U ∼ U(0, 1). Hence the copula of (X, T(X)) is

P(FX(X) ≤ u, FY(Y) ≤ v) = P(FX(X) ≤ u, FX(T^{-1}(T(X))) ≤ v) = P(FX(X) ≤ min{u, v}) = min{u, v} = M(u, v).

Let Z = S(X) for some strictly decreasing function S and denote by FZ the distribution function of Z. Note that FZ(x) = P(S(X) ≤ x) = P(X > S^{-1}(x)) = 1 − FX(S^{-1}(x)) and that FZ is continuous. By (a modified version of) Proposition 10.1, the copula of (X, S(X)) is the distribution function of (U, 1 − U), where U ∼ U(0, 1). Hence the copula of (X, S(X)) is

P(FX(X) ≤ u, FZ(Z) ≤ v) = P(FX(X) ≤ u, 1 − FX(S^{-1}(S(X))) ≤ v)
= P(FX(X) ≤ u, FX(X) > 1 − v)
= P(FX(X) ≤ u) − P(FX(X) ≤ min{u, 1 − v})
= u − min{u, 1 − v} = max{u + v − 1, 0} = W(u, v).

10.2 Dependence measures

Comonotonicity and countermonotonicity revisited

Proposition 10.4 Let (X1, X2) have one of the copulas W or M (as a possible copula). Then there exist two monotone functions α, β : R → R and a random variable Z so that

(X1, X2) =d (α(Z), β(Z)),

with α increasing and β decreasing in the former case (W) and both α and β increasing in the latter case (M).
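The inclusion–exclusion computation of Example 10.4 is a finite sum, so it is easy to reproduce. The sketch below (plain Python with itertools; function names are ours) computes the Wd-volume of the box [1/2, 1]^d and recovers 1 − d/2, which is negative for d ≥ 3:

```python
from itertools import product

def W(u):
    """Frechet lower bound W_d(u) = max(u_1 + ... + u_d - d + 1, 0)."""
    return max(sum(u) - len(u) + 1.0, 0.0)

def w_volume(a, b):
    """Mass Q assigned by W_d to [a1,b1] x ... x [ad,bd] via inclusion-exclusion."""
    d = len(a)
    total = 0.0
    for picks in product((0, 1), repeat=d):        # 0 -> a_j, 1 -> b_j
        vertex = [b[j] if picks[j] else a[j] for j in range(d)]
        sign = (-1) ** (d - sum(picks))            # (-1)^{number of lower endpoints}
        total += sign * W(vertex)
    return total

vols = {d: w_volume([0.5] * d, [1.0] * d) for d in (2, 3, 4)}  # 1 - d/2 each
```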
The converse of this result is also true: if (X1, X2) =d (α(Z), β(Z)) with α increasing and β decreasing, then W is a possible copula of (X1, X2), and if both α and β are increasing, then M is. Note that if any of F1 and F2 (the distribution functions of X1 and X2, respectively) have discontinuities, so that the copula is not unique, then W and M are only possible copulas. Recall also that if F1 and F2 are continuous, then

C = W ⇔ X2 = T(X1) a.s., T = F2^← ◦ (1 − F1) decreasing,
C = M ⇔ X2 = T(X1) a.s., T = F2^← ◦ F1 increasing.

Hence, if (X1, X2) has the copula M (as a possible copula), then X1 and X2 are comonotonic; if it has the copula W (as a possible copula), then they are countermonotonic.

Kendall's tau and Spearman's rho revisited

To begin with we recall the definitions of the concordance measures Kendall's tau and Spearman's rho.

Definition 10.3 Kendall's tau for the random vector (X1, X2) is defined as

τ(X1, X2) = P((X1 − X1′)(X2 − X2′) > 0) − P((X1 − X1′)(X2 − X2′) < 0),

where (X1′, X2′) is an independent copy of (X1, X2). Spearman's rho for the random vector (X1, X2) is defined as

ρS(X1, X2) = 3 ( P((X1 − X1′)(X2 − X2″) > 0) − P((X1 − X1′)(X2 − X2″) < 0) ),

where (X1′, X2′) and (X1″, X2″) are independent copies of (X1, X2).

An important property of Kendall's tau and Spearman's rho is that they are invariant under strictly increasing transformations of the underlying random variables: if (X1, X2) is a random vector with continuous marginal distribution functions and T1 and T2 are strictly increasing transformations on the ranges of X1 and X2 respectively, then τ(T1(X1), T2(X2)) = τ(X1, X2), and the same property holds for Spearman's rho. Note that this implies that Kendall's tau and Spearman's rho depend only on the copula and not on the (marginal) distributions of X1 and X2. This is made clear in the following two results.

Proposition 10.5 Let (X1, X2) be a random vector with continuous marginal distribution functions and with copula C. Then

τ(X1, X2) = 4 ∫∫_{[0,1]^2} C(u1, u2) dC(u1, u2) − 1 = 4 E(C(U1, U2)) − 1,

and

ρS(X1, X2) = 12 ∫∫_{[0,1]^2} C(u1, u2) du1 du2 − 3 = 12 ∫∫_{[0,1]^2} u1 u2 dC(u1, u2) − 3 = 12 E(U1 U2) − 3,

where (U1, U2) has distribution function C.
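Both concordance measures can be estimated straight from their definitions and compared with library routines. A sketch of ours (numpy and scipy; the sample and variable names are illustrative):

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(2)
n = 300
x1 = rng.standard_normal(n)
x2 = x1 + 0.5 * rng.standard_normal(n)     # positively dependent sample

# Kendall's tau straight from the definition: sign products over all pairs.
d1 = np.sign(x1[:, None] - x1[None, :])
d2 = np.sign(x2[:, None] - x2[None, :])
iu = np.triu_indices(n, k=1)               # each unordered pair once
tau_def = np.mean((d1 * d2)[iu])           # P(concordant) - P(discordant)
tau_scipy, _ = kendalltau(x1, x2)

# Spearman's rho as the linear correlation of the (rank-transformed) sample.
r1 = x1.argsort().argsort()
r2 = x2.argsort().argsort()
rho_ranks = np.corrcoef(r1, r2)[0, 1]
rho_scipy, _ = spearmanr(x1, x2)
```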
Remark 10.2 Note that by Proposition 10.5, if F1 and F2 denote the distribution functions of X1 and X2 respectively, then

ρS(X1, X2) = 12 ∫∫_{[0,1]^2} u1 u2 dC(u1, u2) − 3
= 12 E(F1(X1) F2(X2)) − 3
= ( E(F1(X1) F2(X2)) − 1/4 ) / (1/12)
= ( E(F1(X1) F2(X2)) − E(F1(X1)) E(F2(X2)) ) / sqrt( var(F1(X1)) var(F2(X2)) )
= ρl(F1(X1), F2(X2)),

since F1(X1) and F2(X2) are standard uniform with mean 1/2 and variance 1/12. Hence Spearman's rho is simply the linear correlation coefficient of the probability-transformed random variables.

Tail dependence revisited

We now return to the dependence concept called tail dependence. The concept of tail dependence is important for the modeling of joint extremes, particularly in portfolio risk management. We recall the definition of the coefficient of tail dependence.

Let (X1, X2) be a random vector with marginal distribution functions F1 and F2. The coefficient of upper tail dependence of (X1, X2) is defined as

λU(X1, X2) = lim_{u↗1} P(X2 > F2^←(u) | X1 > F1^←(u)),

provided that the limit λU ∈ [0, 1] exists. The coefficient of lower tail dependence is defined as

λL(X1, X2) = lim_{u↘0} P(X2 ≤ F2^←(u) | X1 ≤ F1^←(u)),

provided that the limit λL ∈ [0, 1] exists. If λU > 0 (λL > 0), then we say that (X1, X2) has upper (lower) tail dependence.

Proposition 10.6 Let (X1, X2) be a random vector with continuous marginal distribution functions and copula C. Then

λU(X1, X2) = lim_{u↗1} (1 − 2u + C(u, u)) / (1 − u),

provided that the limit exists, and

λL(X1, X2) = lim_{u↘0} C(u, u) / u,

provided that the limit exists.

That the limit need not exist is shown by the following example.
Example 10.6 Let Q be a probability measure on [0, 1]^2 which, for every integer n ≥ 1, assigns mass 2^{−n}, uniformly distributed, to the line segment between (1 − 2^{−n}, 1 − 2^{−n+1}) and (1 − 2^{−n+1}, 1 − 2^{−n}). Define the distribution function C by C(u1, u2) = Q([0, u1] × [0, u2]) for u1, u2 ∈ [0, 1]. Note that C(u1, 1) = u1 and C(1, u2) = u2, i.e. C is a copula. Note also that, with C̄(u, u) = 1 − 2u + C(u, u), for every n ≥ 1,

C̄(1 − 2^{−n+1}, 1 − 2^{−n+1}) / 2^{−n+1} = 1 and C̄(1 − 3/2^{n+1}, 1 − 3/2^{n+1}) / (3/2^{n+1}) = 2/3.

In particular, lim_{u↗1} C̄(u, u)/(1 − u) does not exist, so this copula has no coefficient of upper tail dependence.

Example 10.7 Consider the so-called Gumbel family of copulas given by

Cθ(u1, u2) = exp(−[(− ln u1)^θ + (− ln u2)^θ]^{1/θ}), for θ ≥ 1.

Then Cθ(u, u) = exp(2^{1/θ} ln u) = u^{2^{1/θ}}, and hence, by l'Hospital's rule,

lim_{u↗1} (1 − 2u + Cθ(u, u)) / (1 − u) = lim_{u↗1} (1 − 2u + u^{2^{1/θ}}) / (1 − u) = 2 − lim_{u↗1} 2^{1/θ} u^{2^{1/θ} − 1} = 2 − 2^{1/θ}.

Hence, for θ > 1, Cθ has upper tail dependence: λU = 2 − 2^{1/θ}. Similarly one shows that λL = 0.

Example 10.8 Consider the so-called Clayton family of copulas given by

Cθ(u1, u2) = (u1^{−θ} + u2^{−θ} − 1)^{−1/θ}, for θ > 0.

Then Cθ(u, u) = (2u^{−θ} − 1)^{−1/θ} and hence, again by using l'Hospital's rule,

lim_{u↘0} Cθ(u, u)/u = lim_{u↘0} (2u^{−θ} − 1)^{−1/θ} / u
= lim_{u↘0} (−1/θ)(2u^{−θ} − 1)^{−1/θ − 1} (−2θ u^{−θ−1})
= lim_{u↘0} 2(2 − u^θ)^{−1/θ − 1}
= 2^{−1/θ}.

Hence Cθ has lower tail dependence: λL = 2^{−1/θ}. Similarly one shows that λU = 0. See Figure 24 for a graphical illustration of the two families.
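The limits in Examples 10.7 and 10.8 can be sanity-checked numerically by evaluating the expressions of Proposition 10.6 close to the endpoints (a sketch of ours; numerically one cannot take u arbitrarily close to the endpoint, so we stop at 10^{-6}):

```python
import numpy as np

def gumbel(u1, u2, theta):
    return np.exp(-((-np.log(u1))**theta + (-np.log(u2))**theta)**(1.0 / theta))

def clayton(u1, u2, theta):
    return (u1**-theta + u2**-theta - 1.0)**(-1.0 / theta)

theta = 2.0
u = 1.0 - 1e-6
lam_U_gumbel = (1.0 - 2.0 * u + gumbel(u, u, theta)) / (1.0 - u)  # -> 2 - 2^(1/theta)
u = 1e-6
lam_L_clayton = clayton(u, u, theta) / u                          # -> 2^(-1/theta)
```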
Figure 24: Samples from two distributions with Gamma(3, 1) margins and Kendall's tau 0.5 but different dependence structures. (X1, Y1) has a Gumbel copula and (X2, Y2) has a Clayton copula. [scatter plots omitted]

10.3 Elliptical copulas

The copula of the d-dimensional normal distribution with linear correlation matrix R is

C_R^Ga(u) = Φ_R^d(Φ^{-1}(u1), . . . , Φ^{-1}(ud)),

where Φ_R^d denotes the joint distribution function of the d-dimensional standard normal distribution with linear correlation matrix R, and Φ^{-1} denotes the inverse of the distribution function of the univariate standard normal distribution. Copulas of the above form are called Gaussian copulas. In the bivariate case the copula expression can be written as

C_R^Ga(u1, u2) = ∫_{−∞}^{Φ^{-1}(u1)} ∫_{−∞}^{Φ^{-1}(u2)} 1/(2π(1 − ρ^2)^{1/2}) exp( −(x1^2 − 2ρ x1 x2 + x2^2) / (2(1 − ρ^2)) ) dx1 dx2,

if ρ = R12 ∈ (−1, 1).

Definition 10.4 Let X ∼ Ed(µ, Σ, ψ) with distribution function F and with continuous marginal distribution functions F1, . . . , Fd. Then the copula C given by

C(u) = F(F1^←(u1), . . . , Fd^←(ud))

is said to be an elliptical copula.

Note that an elliptical copula is not the distribution function of an elliptical distribution, but rather the copula of an elliptical distribution.

Proposition 10.7 (i) If (X1, X2) is a normally distributed random vector with ρ = R12 ∈ (−1, 1), then λU(X1, X2) = λL(X1, X2) = 0. (ii) If (X1, X2) has continuous marginal distribution functions and a Gaussian copula, then λU(X1, X2) = λL(X1, X2) = 0.

Hence a Gaussian copula has neither upper nor lower tail dependence.
If X has the stochastic representation

X =d µ + (√ν / √S) AZ, (10.3)

where µ ∈ R^d (column vector), S ∼ χν^2 and Z ∼ Nd(0, I) (column vector) are independent, then X has a d-dimensional tν distribution with mean µ (for ν > 1) and covariance matrix (ν/(ν − 2)) AA^T (for ν > 2). If ν ≤ 2, then Cov(X) is not defined; in this case we just interpret Σ = AA^T as being the shape parameter of the distribution of X. The copula of X given by (10.3) can be written as

C_{ν,R}^t(u) = t_{ν,R}^d(tν^{-1}(u1), . . . , tν^{-1}(ud)), (10.4)

where Rij = Σij / sqrt(Σii Σjj) for i, j ∈ {1, . . . , d}, and where t_{ν,R}^d denotes the distribution function of √ν AZ/√S with AA^T = R. Here tν denotes the (equal) margins of t_{ν,R}^d, i.e. the distribution function of √ν Z1/√S. In the bivariate case the copula expression can be written as

C_{ν,R}^t(u1, u2) = ∫_{−∞}^{tν^{-1}(u1)} ∫_{−∞}^{tν^{-1}(u2)} 1/(2π(1 − ρ^2)^{1/2}) ( 1 + (x1^2 − 2ρ x1 x2 + x2^2) / (ν(1 − ρ^2)) )^{−(ν+2)/2} dx1 dx2,

if ρ = R12 ∈ (−1, 1). Note that R12 is simply the usual linear correlation coefficient of the corresponding bivariate tν distribution if ν > 2.

The following result will play an important role in parameter estimation in models with elliptical copulas.

Proposition 10.8 (i) If (X1, X2) ∼ E2(µ, Σ, ψ) with continuous marginal distribution functions, then

τ(X1, X2) = (2/π) arcsin R12, (10.5)

where R12 = Σ12 / sqrt(Σ11 Σ22). (ii) If (X1, X2) has continuous marginal distribution functions and the copula of E2(µ, Σ, ψ), then relation (10.5) holds.

Proposition 10.9 (i) If (X1, X2) has a t-distribution with ν degrees of freedom and linear correlation matrix R, then

λU(X1, X2) = λL(X1, X2) = 2 t_{ν+1}( −sqrt(ν + 1) sqrt(1 − R12) / sqrt(1 + R12) ).

(ii) If (X1, X2) has continuous marginal distribution functions and a t-copula with parameters ν and R, then the same expression for λU(X1, X2) and λL(X1, X2) holds.

From this it is seen that the coefficient of upper (lower) tail dependence is increasing in R12 and decreasing in ν, and that, as one would expect, it tends to zero as the number of degrees of freedom tends to infinity for R12 < 1.
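Relation (10.5) is easy to check by simulation for the bivariate normal distribution, an elliptical distribution with R12 = ρ (a sketch of ours using numpy and scipy):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(3)
rho, n = 0.6, 20_000
z = rng.standard_normal((n, 2))
x1 = z[:, 0]
x2 = rho * z[:, 0] + np.sqrt(1 - rho**2) * z[:, 1]   # corr(x1, x2) = rho

tau_hat, _ = kendalltau(x1, x2)
tau_theory = 2 / np.pi * np.arcsin(rho)              # (10.5)
```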
10.4 Simulation from Gaussian and t-copulas

We now address the question of random variate generation from the Gaussian copula C_R^Ga and the t-copula C_{ν,R}^t. For our purpose, it is sufficient to consider only strictly positive definite matrices R. Write R = AA^T for some d × d matrix A. If Z1, . . . , Zd ∼ N(0, 1) are independent, then µ + AZ ∼ Nd(µ, R). One natural choice of A is the Cholesky decomposition of R: the unique lower-triangular matrix L with LL^T = R. Furthermore, Cholesky decomposition routines are implemented in most mathematical software. This provides an easy algorithm for random variate generation from the d-dimensional Gaussian copula C_R^Ga. As usual, Φ denotes the univariate standard normal distribution function.

Algorithm 10.1
• Find the Cholesky decomposition A of R: R = AA^T.
• Simulate d independent random variates Z1, . . . , Zd from N(0, 1).
• Set X = AZ.
• Set Uk = Φ(Xk) for k = 1, . . . , d.
• U = (U1, . . . , Ud) has distribution function C_R^Ga.

Equation (10.3) provides an equally easy algorithm for random variate generation from the t-copula C_{ν,R}^t.

Algorithm 10.2
• Find the Cholesky decomposition A of R: R = AA^T.
• Simulate d independent random variates Z1, . . . , Zd from N(0, 1).
• Simulate a random variate S from χν^2, independent of Z1, . . . , Zd.
• Set Y = AZ.
• Set X = (√ν / √S) Y.
• Set Uk = tν(Xk) for k = 1, . . . , d.
• U = (U1, . . . , Ud) has distribution function C_{ν,R}^t.
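Algorithms 10.1 and 10.2 translate directly into vectorized code. A sketch of ours using numpy and scipy (we simulate n variates at once instead of one at a time):

```python
import numpy as np
from scipy.stats import norm, t as student_t

def sim_gauss_copula(R, n, rng):
    """Algorithm 10.1: U = Phi(A Z) with A the Cholesky factor of R."""
    A = np.linalg.cholesky(R)
    Z = rng.standard_normal((n, len(R)))
    X = Z @ A.T                                  # rows ~ N_d(0, R)
    return norm.cdf(X)

def sim_t_copula(R, nu, n, rng):
    """Algorithm 10.2: X = sqrt(nu/S) * A Z, S ~ chi^2_nu, then U_k = t_nu(X_k)."""
    A = np.linalg.cholesky(R)
    Z = rng.standard_normal((n, len(R)))
    S = rng.chisquare(nu, size=(n, 1))
    X = np.sqrt(nu / S) * (Z @ A.T)
    return student_t.cdf(X, df=nu)

rng = np.random.default_rng(4)
R = np.array([[1.0, 0.7], [0.7, 1.0]])
U_ga = sim_gauss_copula(R, 50_000, rng)
U_t = sim_t_copula(R, 4, 50_000, rng)
```

Both outputs have standard uniform margins; they differ only in their dependence structure (the t-copula draws show joint extremes in the corners).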
10.5 Archimedean copulas

As we have seen, elliptical copulas are derived from distribution functions of elliptical distributions using Sklar's Theorem. Since simulation from elliptical distributions is easy, so is simulation from elliptical copulas. There are however drawbacks: elliptical copulas do not have closed form expressions and are restricted to have radial symmetry. In many finance and insurance applications it seems reasonable that there is a stronger dependence between big losses (e.g. a stock market crash) than between big gains. Such asymmetries cannot be modeled with elliptical copulas.

In this section we discuss an important class of copulas called Archimedean copulas. This class of copulas is worth studying for a number of reasons. Many interesting parametric families of copulas are Archimedean, and the class of Archimedean copulas allows for a great variety of different dependence structures. Furthermore, in contrast to elliptical copulas, all commonly encountered Archimedean copulas have closed form expressions. Unlike the copulas discussed so far, these copulas are not derived from multivariate distribution functions using Sklar's Theorem. A consequence of this is that we need somewhat technical conditions to assert that multivariate extensions of bivariate Archimedean copulas are indeed copulas. A further disadvantage is that multivariate extensions of Archimedean copulas in general suffer from lack of free parameter choice, in the sense that some of the entries in the resulting rank correlation matrix are forced to be equal. We begin with a general definition of Archimedean copulas.

Proposition 10.10 Let ϕ be a continuous, strictly decreasing function from [0, 1] to [0, ∞] such that ϕ(0) = ∞ and ϕ(1) = 0. Let C : [0, 1]^2 → [0, 1] be given by

C(u1, u2) = ϕ^{-1}(ϕ(u1) + ϕ(u2)). (10.6)

Then C is a copula if and only if ϕ is convex.

Copulas of the form (10.6) are called Archimedean copulas, and the function ϕ is called a generator of the copula.

Example 10.9 Let ϕ(t) = (− ln t)^θ, where θ ≥ 1. Clearly ϕ(t) is continuous, ϕ(0) = ∞ and ϕ(1) = 0. Moreover, ϕ′(t) = −θ(− ln t)^{θ−1}/t < 0 on (0, 1), so ϕ is a strictly decreasing function from [0, 1] to [0, ∞], and one checks that ϕ is convex. From (10.6) we get

Cθ^Gu(u1, u2) = ϕ^{-1}(ϕ(u1) + ϕ(u2)) = exp(−[(− ln u1)^θ + (− ln u2)^θ]^{1/θ}).

This copula family is called the Gumbel family. Furthermore, C1^Gu = Π (the independence copula, Π(u1, u2) = u1 u2) and limθ→∞ Cθ^Gu = M (M(u1, u2) = min(u1, u2)). As shown in Example 10.7, this copula family has upper tail dependence.

Example 10.10 Let ϕ(t) = t^{−θ} − 1, where θ > 0. This gives the Clayton family

Cθ^Cl(u1, u2) = ( (u1^{−θ} − 1) + (u2^{−θ} − 1) + 1 )^{−1/θ} = (u1^{−θ} + u2^{−θ} − 1)^{−1/θ}. (10.7)

Moreover, limθ→0 Cθ^Cl = Π and limθ→∞ Cθ^Cl = M. As shown in Example 10.8, this copula family has lower tail dependence.
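The generator construction (10.6) can be checked against the closed forms of Examples 10.9 and 10.10 (a small sketch of ours):

```python
import numpy as np

# Gumbel generator and inverse
phi_gu = lambda t, th: (-np.log(t))**th
phi_gu_inv = lambda s, th: np.exp(-s**(1.0 / th))
# Clayton generator and inverse
phi_cl = lambda t, th: t**-th - 1.0
phi_cl_inv = lambda s, th: (s + 1.0)**(-1.0 / th)

def archimedean(u1, u2, phi, phi_inv, th):
    """Archimedean copula (10.6): phi^{-1}(phi(u1) + phi(u2))."""
    return phi_inv(phi(u1, th) + phi(u2, th), th)

u1, u2, th = 0.3, 0.7, 2.0
gu = archimedean(u1, u2, phi_gu, phi_gu_inv, th)
cl = archimedean(u1, u2, phi_cl, phi_cl_inv, th)
gu_closed = np.exp(-((-np.log(u1))**th + (-np.log(u2))**th)**(1.0 / th))
cl_closed = (u1**-th + u2**-th - 1.0)**(-1.0 / th)
```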
As a consequence, for an Archimedean copula, Kendall's tau can be expressed as a one-dimensional integral of the generator and its derivative. (Recall that Kendall's tau for a copula C can be expressed as a double integral of C; that double integral is in most cases not straightforward to evaluate.)

Proposition 10.11 Let (X1, X2) be a random vector with continuous marginal distribution functions and with an Archimedean copula C generated by ϕ. Then Kendall's tau of (X1, X2) is given by

τ(X1, X2) = 1 + 4 ∫_0^1 ϕ(t)/ϕ′(t) dt.

Example 10.11 Consider the Gumbel family with generator ϕ(t) = (− ln t)^θ, for θ ≥ 1. Then ϕ(t)/ϕ′(t) = (t ln t)/θ. Using Proposition 10.11 we can calculate Kendall's tau for the Gumbel family:

τ(θ) = 1 + 4 ∫_0^1 (t ln t)/θ dt = 1 + (4/θ) ( [t^2 (ln t)/2]_0^1 − ∫_0^1 t/2 dt ) = 1 + (4/θ)(0 − 1/4) = 1 − 1/θ.

As a consequence, in order to have Kendall's tau equal to 0.5 in Figure 25 (the Gumbel case), we put θ = 2.

Example 10.12 Consider the Clayton family with generator ϕ(t) = t^{−θ} − 1, for θ > 0. Then ϕ(t)/ϕ′(t) = (t^{θ+1} − t)/θ. Using Proposition 10.11 we can calculate Kendall's tau for the Clayton family:

τ(θ) = 1 + 4 ∫_0^1 (t^{θ+1} − t)/θ dt = 1 + (4/θ) ( 1/(θ + 2) − 1/2 ) = θ/(θ + 2).

A natural question is under which additional conditions on ϕ the most simple multivariate extension of bivariate Archimedean copulas,

ϕ^{-1}(ϕ(u1) + · · · + ϕ(ud)),

is a copula for d ≥ 3. The following results address this question and show why inverses of Laplace transforms are natural choices for generators of Archimedean copulas.

Definition 10.5 A function g : [0, ∞) → [0, ∞) is completely monotonic if it is continuous and if, for any t ∈ (0, ∞) and k = 0, 1, 2, . . . ,

(−1)^k (d^k/ds^k) g(s) |_{s=t} ≥ 0.

Proposition 10.12 Let ϕ : [0, 1] → [0, ∞] be continuous and strictly decreasing such that ϕ(0) = ∞ and ϕ(1) = 0. Then, for any d ≥ 2, the function C : [0, 1]^d → [0, 1] given by

C(u) = ϕ^{-1}(ϕ(u1) + · · · + ϕ(ud))

is a copula if and only if ϕ^{-1} is completely monotonic on [0, ∞).

The following result tells us where to look for generators satisfying the conditions of Proposition 10.12.

Lemma 10.1 A function Ψ : [0, ∞) → [0, ∞) is the Laplace transform of a distribution function G on [0, ∞) if and only if Ψ is completely monotonic and Ψ(0) = 1.

Figure 25: The upper left plot shows BMW-Siemens daily log returns from 1989 to 1996. The other plots show samples from bivariate distributions with t4 margins and Kendall's tau 0.5. [scatter plots omitted]

Proposition 10.13 Let G be a distribution function on [0, ∞) with G(0) = 0 and Laplace transform

Ψ(s) = ∫_0^∞ e^{−sx} dG(x), s ≥ 0.

Consider a random variable X with distribution function G and a set of [0, 1]-valued random variables U1, . . . , Ud which are conditionally independent given X, with conditional distribution function given by F_{Uk|X=x}(u) = exp(−x Ψ^{-1}(u)) for u ∈ [0, 1]. Then

P(U1 ≤ u1, . . . , Ud ≤ ud) = Ψ(Ψ^{-1}(u1) + · · · + Ψ^{-1}(ud)),

so that the distribution function of U is an Archimedean copula with generator Ψ^{-1}.

Proof. By conditional independence,

P(U1 ≤ u1, . . . , Ud ≤ ud) = ∫_0^∞ P(U1 ≤ u1 | X = x) · · · P(Ud ≤ ud | X = x) dG(x)
= ∫_0^∞ exp{−x(Ψ^{-1}(u1) + · · · + Ψ^{-1}(ud))} dG(x)
= Ψ(Ψ^{-1}(u1) + · · · + Ψ^{-1}(ud)).

10.6 Simulation from Gumbel and Clayton copulas

As seen from Proposition 10.13, the following algorithm shows how to simulate from an Archimedean copula C of the form C(u) = ϕ^{-1}(ϕ(u1) + · · · + ϕ(ud)), where ϕ^{-1} is the Laplace transform of a distribution function G on [0, ∞) with G(0) = 0.

Algorithm 10.3
• Simulate a variate X with distribution function G such that the Laplace transform Ψ of G is the inverse of the generator ϕ of the required copula C.
• Simulate independent standard uniform variates V1, . . . , Vd.
• U = (Ψ(− ln(V1)/X), . . . , Ψ(− ln(Vd)/X)) has distribution function C.

To verify that this is correct, notice that with Uk = Ψ(− ln(Vk)/X) we have

P(Uk ≤ uk | X = x) = P(Ψ(− ln(Vk)/x) ≤ uk) = P(− ln Vk ≥ x Ψ^{-1}(uk)) = P(Vk ≤ exp{−x Ψ^{-1}(uk)}) = exp{−x Ψ^{-1}(uk)},

so by Proposition 10.13, P(U1 ≤ u1, . . . , Ud ≤ ud) = Ψ(Ψ^{-1}(u1) + · · · + Ψ^{-1}(ud)) = C(u1, . . . , ud).
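The one-dimensional integral of Proposition 10.11 is easy to evaluate numerically, which gives a quick check of Examples 10.11 and 10.12 (a sketch of ours using a simple midpoint rule):

```python
import numpy as np

def tau_archimedean(phi_over_dphi, theta, m=200_000):
    """tau = 1 + 4 * int_0^1 phi(t)/phi'(t) dt (Proposition 10.11), midpoint rule."""
    t = (np.arange(m) + 0.5) / m
    return 1.0 + 4.0 * np.mean(phi_over_dphi(t, theta))

# Gumbel: phi/phi' = t ln t / theta, so tau(2) = 1 - 1/2 = 0.5
tau_gumbel = tau_archimedean(lambda t, th: t * np.log(t) / th, 2.0)
# Clayton: phi/phi' = (t^{theta+1} - t)/theta, so tau(2) = 2/(2+2) = 0.5
tau_clayton = tau_archimedean(lambda t, th: (t**(th + 1) - t) / th, 2.0)
```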
As we have seen, the Clayton copula has generator ϕ(t) = t^{−θ} − 1, θ > 0. Let X ∼ Gamma(1/θ, 1), i.e. let X be Gamma-distributed with density function f_X(x) = x^{1/θ−1} e^{−x} / Γ(1/θ). Then X has Laplace transform

E(e^{−sX}) = ∫_0^∞ e^{−sx} x^{1/θ−1} e^{−x} / Γ(1/θ) dx = (s + 1)^{−1/θ} = ϕ^{-1}(s).

Hence, the following algorithm can be used for simulation from a Clayton copula.

Algorithm 10.4
• Simulate a variate X ∼ Gamma(1/θ, 1).
• Simulate independent standard uniform variates V1, . . . , Vd.
• If Ψ(s) = (s + 1)^{−1/θ}, then U = (Ψ(− ln(V1)/X), . . . , Ψ(− ln(Vd)/X)) has distribution function Cθ^Cl(u) = (u1^{−θ} + · · · + ud^{−θ} − d + 1)^{−1/θ}.

This approach can in principle also be used for simulation from Gumbel copulas. However, in this case X is a random variable with a nonstandard distribution which is not available in most statistical software. One can instead simulate from bivariate Gumbel copulas using a different method. Take θ ≥ 1 and let F̄(x) = 1 − F(x) = exp(−x^{1/θ}) for x ≥ 0. If

(Z1, Z2) = (V S^θ, (1 − V) S^θ),

where V and S are independent, V is standard uniformly distributed and S has density h(s) = (1 − 1/θ + (1/θ)s) exp(−s), then (F̄(Z1), F̄(Z2)) has distribution function Cθ^Gu, where

Cθ^Gu(u1, u2) = exp(−[(− ln u1)^θ + (− ln u2)^θ]^{1/θ}).

This leads to the following algorithm for simulation from bivariate Gumbel copulas.

Algorithm 10.5
• Simulate independent random variates V1, V2 ∼ U(0, 1).
• Simulate independent random variates W1, W2, where Wk ∼ Gamma(k, 1).
• Set S = 1{V2 ≤ 1/θ} W2 + 1{V2 > 1/θ} W1, so that S has density h(s) = (1 − 1/θ + (1/θ)s) e^{−s}.
• Set (Z1, Z2) = (V1 S^θ, (1 − V1) S^θ).
• U = (exp(−Z1^{1/θ}), exp(−Z2^{1/θ})) has distribution function Cθ^Gu.
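Algorithm 10.4 is a direct application of Algorithm 10.3 with a Gamma(1/θ, 1) mixing variable. A vectorized sketch of ours (numpy and scipy), with the sample Kendall's tau compared against the theoretical value θ/(θ + 2) from Example 10.12:

```python
import numpy as np
from scipy.stats import kendalltau

def sim_clayton(theta, d, n, rng):
    """Algorithm 10.4: n draws from the d-dimensional Clayton copula.

    X ~ Gamma(1/theta, 1) has Laplace transform Psi(s) = (s + 1)^(-1/theta),
    the inverse of the Clayton generator phi(t) = t^(-theta) - 1.
    """
    X = rng.gamma(1.0 / theta, 1.0, size=(n, 1))
    V = rng.uniform(size=(n, d))
    return (1.0 - np.log(V) / X) ** (-1.0 / theta)   # Psi(-ln V / X)

rng = np.random.default_rng(5)
U = sim_clayton(2.0, 2, 5_000, rng)
tau_hat, _ = kendalltau(U[:, 0], U[:, 1])            # theory: 2/(2+2) = 0.5
```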
. θ = 2( τ )ij /(1 − ( τ )ij ). . . . R12 = 0. Y 1) has a Gaussian copula and (X2. then an Archimedean copula with this kind of asymmetry might be a reasonable choice. . It is also useful to check whether the data shows signs of tail dependence. t. Xk. 1 d 4 2 0 2 Rij = sin(π( τ )ij /2). Rij = sin(π( τ )ij /2). R t −1 CR (u) = td (t−1 (u1 ). where ( τ )ij = τ (Xk. Clearly. . Xn } where Xk ∼ F for some distribution function F with continuous marginal distribution functions F1 .and Clayton copulas there are simple relations between Kendall’s tau and certain copula parameters: Ga CR (u) = Φd (Φ−1 (u1 ).8 but diﬀerent dependence structures.5. ﬁrst one has to decide which parametric family to ﬁt. .7 Fitting copulas to data We now turn to the problem of ﬁtting a parametric copula to data. . . . We will consider the problem of estimating the parameter vector θ of a copula Cθ given an iid sample {X1 . (X1. Gumbel. . Cl Cθ (u) = (u−θ + · · · + u−θ − d + 1)−1/θ . . Fd and hence a unique representation F (x) = Cθ (F1 (x1 ). 78 . Figure 26. . . one might go for an elliptical copula. Y 2) has a t2 copula. and depending on whether there are such signs choose a copula with this property.i . This is perhaps best done by graphical means. .g. if one sees clear signs of asymmetry in the dependence structure as illustrated in Figure 24. We have seen that for Gaussian. .R ν Gu Cθ (u) = exp(−[(− ln u1 )θ + · · · + (− ln ud )θ ]1/θ ). 
[Figure 26: Samples from two distributions with standard normal margins, R12 = 0.8, but different dependence structures; (X1, X2) has a Gaussian copula and (Y1, Y2) has a t2-copula.]

Hence parameter estimates for the copulas above are obtained by simply replacing τij by its sample estimate.
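As an illustration of the moment estimation method, the bivariate Gaussian-copula case can be sketched as follows (our own code, not part of the notes; the data here is simulated):

```python
# Moment estimation of a Gaussian copula from Kendall's tau (a sketch):
# plug the sample tau into the relation R_12 = sin(pi * tau_12 / 2).
import math
import random

def kendalls_tau(x, y):
    """Sample Kendall's tau: (concordant - discordant pairs) / (n choose 2)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def gauss_copula_corr(x, y):
    """Estimate R_12 via R = sin(pi * tau / 2)."""
    return math.sin(math.pi * kendalls_tau(x, y) / 2.0)

# Check on a simulated bivariate normal sample with true correlation 0.8
random.seed(1)
rho = 0.8
xs, ys = [], []
for _ in range(1000):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
r_hat = gauss_copula_corr(xs, ys)
print(round(r_hat, 2))
```

Since the estimator only uses ranks, the same estimate is obtained for any strictly increasing transformation of the margins.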
8.8 Gaussian and t-copulas

For Gaussian and t-copulas of high dimensions it might happen that the matrix R, with Rij = sin(π τij / 2), is not positive definite. If this is the case, then one has to replace R by a linear correlation matrix R* which is in some sense close to R. This can be achieved by the so-called eigenvalue method (Algorithm 10.6):

• Calculate the spectral decomposition R = ΓΛΓ^T, where Λ is a diagonal matrix of the eigenvalues of R and Γ is an orthogonal matrix whose columns are eigenvectors of R.

• Replace the negative eigenvalues in Λ by some small value δ > 0 to obtain Λ~.

• Calculate R~ = ΓΛ~Γ^T, which will be symmetric and positive definite but not necessarily a linear correlation matrix since its diagonal elements might differ from one.

• Set R* = D R~ D, where D is a diagonal matrix with Dk,k = 1/√(R~k,k).

After having estimated R (with R possibly modified to assure positive definiteness) it remains to estimate the degrees of freedom parameter. We construct a pseudo-sample {U1, . . . , Un} of observations from the copula by componentwise transformation with the estimated marginal distribution functions F1, . . . , Fd as follows:

Uk = (F1(Xk,1), . . . , Fd(Xk,d)),   k = 1, . . . , n.

Either Fk can be taken as a fitted parametric distribution function or as a version of the empirical distribution function,

F^(β)_k(x) = (1/(n + β)) Σ_{j=1}^n 1{Xj,k ≤ x},

where β ∈ (0, 1] guarantees that the pseudo-sample data lies within the unit cube, i.e. that Uk ∈ (0, 1)^d.

Given a pseudo-sample from the t-copula, the degrees of freedom parameter ν can be estimated by maximum likelihood estimation (MLE). The log-likelihood function for the t-copula is given by

ln L(ξ; U1, . . . , Un) = Σ_{k=1}^n ln c_{ξ,R}(Uk),

where c_{ξ,R} denotes the density of a t-copula with ξ as degrees of freedom parameter. A ML estimate of ν is obtained by maximizing

ln L(ξ; U1, . . . , Un) = Σ_{k=1}^n ( ln g_{ξ,R}(t^{-1}_ξ(Uk,1), . . . , t^{-1}_ξ(Uk,d)) − Σ_{j=1}^d ln g_ξ(t^{-1}_ξ(Uk,j)) ),

where g_{ξ,R} denotes the joint density of a standard t-distribution with distribution function t_{ξ,R} and g_ξ denotes the density of a univariate standard t-distribution with distribution function t_ξ. Hence an estimate of the degrees of freedom parameter ν is obtained as the ξ ∈ (0, ∞) that maximizes the log-likelihood function ln L(ξ; U1, . . . , Un).
11 Portfolio credit risk modeling

What is credit/default risk? The following explanation is given by Peter Crosbie and Jeffrey R. Bohn in [5]:

Default risk is the uncertainty surrounding a firm's ability to service its debts and obligations. Prior to default, there is no way to discriminate unambiguously between firms that will default and those that won't. At best we can only make probabilistic assessments of the likelihood of default. As a result, firms generally pay a spread over the default-free rate of interest that is proportional to their default probability to compensate lenders for this uncertainty.

If a firm (obligor) cannot fulfill its commitments towards a lender, or counterparty in a financial agreement, then we say that the firm is in default. The loss suffered by a lender or counterparty in the event of default is usually significant and is determined largely by the details of the particular contract or obligation. For example, typical loss rates in the event of default for senior secured bonds, subordinated bonds and zero coupon bonds are 49%, 68%, and 81%, respectively. Credit risk also includes the risk related to events other than default, such as up- or down-moves in credit rating.

In this chapter we will introduce a general framework for modeling portfolios subject to credit risk.

11.1 A simple model

Consider a portfolio consisting of n loans (or bonds) subject to default. That the loan is subject to default means that with some probability pi, obligor i will not be able to repay his debt. At some time T, say one year from now, each obligor can be in either of two states, default or nondefault. We model the state of each obligor at time T by a Bernoulli random variable

Xi = 1 if obligor i is in default, 0 otherwise.

The default probability of obligor i is then given by pi = P(Xi = 1). Each loan has a certain loan size Li. If there is a default, then the lender does not lose the entire amount Li but rather a proportion 1 − λi of the loan size. In most cases the obligor is able to repay a substantial amount of the loan, so only a certain fraction of the entire loan is lost. We call λi ∈ [0, 1] the recovery rate of loan i. The loss-given-default for loan number i, which is the amount lost by the lender in the case of default, is given by

LGDi = (1 − λi) Li.

The total loss at time T due to obligors defaulting is then given by

L = Σ_{i=1}^n Xi LGDi = Σ_{i=1}^n Xi (1 − λi) Li.
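As a small illustration (our own code, with made-up loan data), the loss L can be simulated directly once the joint distribution of the default indicators is specified; here the Xi are taken independent:

```python
# A Monte Carlo sketch of the total loss L = sum_i X_i (1 - lambda_i) L_i
# with independent Bernoulli(p_i) default indicators.  Loan data is made up.
import random

random.seed(7)
# (loan size L_i, recovery rate lambda_i, default probability p_i)
loans = [(100.0, 0.4, 0.02), (250.0, 0.5, 0.01), (80.0, 0.3, 0.05)]

def simulate_loss():
    """One draw of L: add the loss-given-default of each defaulted loan."""
    return sum((1 - lam) * size for size, lam, p in loans if random.random() < p)

sims = 200000
mc_mean = sum(simulate_loss() for _ in range(sims)) / sims
exact_mean = sum(p * (1 - lam) * size for size, lam, p in loans)
print(round(exact_mean, 2))   # E(L) = sum_i p_i (1 - lambda_i) L_i = 5.25
```

The Monte Carlo average agrees with the exact expected loss; dependence between the Xi would change the tail of L but not its mean.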
An important issue in quantitative credit risk management is to understand the distribution of the random variable L. Given that we know the size Li of each loan, we need to model the multivariate random vector (X1, . . . , Xn, λ1, . . . , λn) in order to derive the loss distribution of L. Most commercial models in use today assume the recovery rates λi to be independent of X = (X1, . . . , Xn) and independent of each other. This leaves essentially the joint distribution of the default indicators X to be modeled. The most simple model we may think of is when all loan sizes are equal, Li = L1, all recovery rates are deterministic and equal, λi = λ1, and all default indicators Xi are iid with default probability p. Then the loss is given by L = LGD1 N, where N = Σ_{i=1}^n Xi is Binomial(n, p)-distributed. Below we will study some more sophisticated models for the default indicators X.

11.2 Latent variable models

Since it is practically impossible to obtain historical observations of the default indicator Xi for a given obligor i (it is rather unusual that the firm has defaulted many times before), it is a good idea to divide all obligors into m homogeneous groups. Within each group all obligors (firms) have the same default probability. Estimation of default probabilities can then be based upon how many obligors have defaulted within each group, leading to larger sample sizes.

To this end we may introduce a state variable S = (S1, . . . , Sn), where Si represents the state of obligor i. We suppose that the state is an integer in the set {0, 1, . . . , m}, with Si = 0 indicating that obligor i is in the default state. The other states may be thought of as the obligor being in different rating classes. We let Xi denote the default indicator of obligor i, i.e.

Xi = 1 if Si = 0, and Xi = 0 if Si ≠ 0.

Often the state variables S = (S1, . . . , Sn) are modeled using a vector of so-called latent variables Y = (Y1, . . . , Yn), Yi representing for instance the value of the assets, or asset returns, of obligor i. Typically we have a number of thresholds dij, j = 0, . . . , m + 1, with di0 = −∞ and di(m+1) = ∞. The state of Si is then given through Yi by

Si = j if Yi ∈ (dij, di(j+1)].

Let Fi denote the distribution of Yi. Default occurs if Yi ≤ di1 and hence the default probability is given by pi = Fi(di1). The probability that the first k obligors, say, default is then given by (the Fi's are assumed to be continuous)

p1,...,k = P(Y1 ≤ d11, . . . , Yk ≤ dk1) = C(F1(d11), . . . , Fk(dk1), 1, . . . , 1) = C(p1, . . . , pk, 1, . . . , 1),
where C denotes the copula of Y. As the marginal default probabilities Fi(di1) are small, the joint default probability will depend heavily on the choice of copula C.

Example 11.1 Consider a loan portfolio with n = 100 obligors where the credit risk is modeled using a latent variable model with copula C. Suppose that C is an exchangeable copula, i.e. that C(u1, . . . , un) = C(uπ(1), . . . , uπ(n)), where π is an arbitrary permutation of {1, . . . , n}. Suppose further that the individual default probability of each obligor is equal to p = 0.15. Let N denote the number of defaults and let ρτ(Yi, Yj) = τ, i ≠ j, denote Kendall's tau between any two latent variables (which are assumed to have continuous distribution functions). We assume that τ = 0 and we simulate the number of defaults 10^5 times and illustrate the distribution of the number of defaults in a histogram when (a) C is a Gaussian copula and (b) C is a t4-copula. The histograms are shown in Figure 27. One clearly sees that zero correlation is far from independence if the dependence structure is non-Gaussian.

11.3 Mixture models

The random vector X = (X1, . . . , Xn) follows a Bernoulli mixture model if there is a random vector Z = (Z1, . . . , Zm), m < n, and functions fi : R^m → [0, 1], i ∈ {1, . . . , n}, such that conditional on Z, X is a vector of independent Bernoulli random variables with

P(Xi = 1 | Z) = fi(Z),   P(Xi = 0 | Z) = 1 − fi(Z).

For x = (x1, . . . , xn) ∈ {0, 1}^n we then have

P(X = x | Z) = Π_{i=1}^n fi(Z)^{xi} (1 − fi(Z))^{1−xi}.

The unconditional distribution is then given by

P(X = x) = E(P(X = x | Z)) = E( Π_{i=1}^n fi(Z)^{xi} (1 − fi(Z))^{1−xi} ).

If all the functions fi are equal, fi = f, then, conditional on Z, the number of defaults N = Σ_{i=1}^n Xi is Bin(n, f(Z))-distributed.

The random vector X = (X1, . . . , Xn) follows a Poisson mixture model if there is a random vector Z = (Z1, . . . , Zm), m < n, and functions λi : R^m → (0, ∞), i ∈ {1, . . . , n}, such that conditional on Z, X is a vector of independent Po(λi(Z))-distributed random variables. In this case we have

P(Xi = xi | Z) = (λi(Z)^{xi} / xi!) e^{−λi(Z)},   xi ∈ N.
[Figure 27: Histograms of the number of defaults: (a) Gaussian copula (upper) and (b) t4 copula (lower).]
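A rough re-creation of the simulation in Example 11.1 (our own code; the closed-form t4 distribution function is used in place of a library routine):

```python
# n = 100 obligors, p = 0.15, exchangeable latent variables with Kendall's
# tau = 0: a Gaussian copula (zero correlation = independence) versus a
# t4 copula (zero correlation but a common mixing factor).
import numpy as np

def t4_cdf(t):
    """Closed-form distribution function of the standard t4 distribution."""
    s = t / np.sqrt(t * t + 4.0)
    return 0.5 + 0.75 * (s - s**3 / 3.0)

rng = np.random.default_rng(0)
n, p, sims, nu = 100, 0.15, 20000, 4

# (a) Gaussian copula with tau = 0: R = sin(0) = 0, i.e. independent defaults
N_ga = np.sum(rng.random((sims, n)) < p, axis=1)

# (b) t4 copula with tau = 0: Y_i = Z_i / sqrt(W/nu) with one COMMON chi^2
# factor W per scenario, so the Y_i are uncorrelated but not independent
Z = rng.standard_normal((sims, n))
W = rng.chisquare(nu, size=(sims, 1))
N_t = np.sum(t4_cdf(Z / np.sqrt(W / nu)) < p, axis=1)

print(round(N_ga.mean(), 1), round(N_t.mean(), 1))   # both close to n*p = 15
print((N_t >= 30).mean() > (N_ga >= 30).mean())      # heavier tail under t4
```

Both copulas give the same expected number of defaults, but large default counts are far more likely under the t4 copula, which is exactly the point of the example.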
For x = (x1, . . . , xn) ∈ N^n we then have

P(X = x | Z) = Π_{i=1}^n (λi(Z)^{xi} / xi!) e^{−λi(Z)}.

The unconditional distribution is then given by

P(X = x) = E(P(X = x | Z)) = E( Π_{i=1}^n (λi(Z)^{xi} / xi!) e^{−λi(Z)} ).

The use of Poisson mixture models for modeling defaults can be motivated as follows. Suppose that X̃ = (X̃1, . . . , X̃n) follows a Poisson mixture model with factors Z. Put Xi = I[1,∞)(X̃i). Then X = (X1, . . . , Xn) follows a Bernoulli mixture model with fi(Z) = 1 − e^{−λi(Z)}. If the Poisson parameters λi(Z) are small, then N = Σ_{i=1}^n X̃i is approximately equal to the number of defaulting obligors and, conditional on Z, N is Poisson(λ̃(Z))-distributed with λ̃(Z) = Σ_{i=1}^n λi(Z).

Example 11.2 A bank has a loan portfolio of 100 loans. Let Xk be the default indicator for loan k such that Xk = 1 in case of default and 0 otherwise. The total number of defaults is N = X1 + · · · + X100.

(a) Suppose that X1, . . . , X100 are independent and identically distributed with P(X1 = 1) = 0.01. Compute E(N) and P(N = k) for k ∈ {0, 1, . . . , 100}.

(b) Consider the risk factor Z which reflects the state of the economy. Suppose that, conditional on Z, the default indicators are independent and identically distributed with P(X1 = 1 | Z) = Z, where P(Z = 0.01) = 0.9 and P(Z = 0.11) = 0.1. Compute E(N).

(c) Consider the risk factor Z which reflects the state of the economy. Suppose that, conditional on Z, the default indicators are independent and identically distributed with P(X1 = 1 | Z) = Z^9, where Z is uniformly distributed on (0, 1). Compute E(N).

Solution (a): We have N ∼ Binomial(100, 0.01). Hence, E(N) = 100 · 0.01 = 1 and

P(N = k) = (100 choose k) 0.01^k 0.99^{100−k}.

Solution (b): We have N | Z ∼ Binomial(100, Z). Hence,

E(N) = E(E(N | Z)) = E(100 Z) = 100 E(Z) = 100(0.01 · 0.9 + 0.11 · 0.1) = 0.9 + 1.1 = 2.

Solution (c): We have N | Z ∼ Binomial(100, Z^9). Hence,

E(N) = E(E(N | Z)) = E(100 Z^9) = 100 E(Z^9) = 100 · 0.1 = 10.

11.4 One-factor Bernoulli mixture models

In this section we will consider the Bernoulli mixture model where Z is univariate, i.e. we only have one factor, and all the functions fi are equal, fi = f. This means that all marginal default probabilities are equal and the number of defaults N satisfies N | Z ∼ Binomial(n, f(Z)). To determine the unconditional default probabilities, number of defaults, etc., we need to specify the distribution function G of Z. Given G, the unconditional probability that only the first k obligors default is given by

P(X1 = 1, . . . , Xk = 1, Xk+1 = 0, . . . , Xn = 0) = E(P(X1 = 1, . . . , Xk = 1, Xk+1 = 0, . . . , Xn = 0 | Z)) = E(f(Z)^k (1 − f(Z))^{n−k}) = ∫ f(z)^k (1 − f(z))^{n−k} G(dz),

and the number of defaulting obligors N has unconditional distribution

P(N = k) = (n choose k) ∫ f(z)^k (1 − f(z))^{n−k} G(dz).

Notice also that

Cov(Xi, Xj) = E(Xi Xj) − E(Xi) E(Xj) = E(E(Xi Xj | Z)) − E(E(Xi | Z)) E(E(Xj | Z)) = E(f(Z)^2) − E(f(Z))^2 = var(f(Z)).

We have N = E(N | Z) + (N − E(N | Z)) and

E(N) = E(E(N | Z)) = n E(f(Z)) = n p1.

Moreover,

var(N) = E(var(N | Z)) + var(E(N | Z)) = E(n f(Z)(1 − f(Z))) + var(n f(Z)) = n E(f(Z)(1 − f(Z))) + n^2 var(f(Z)).
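As a numerical check (our own code), the unconditional distribution P(N = k) = (n choose k) E(f(Z)^k (1 − f(Z))^{n−k}) can be evaluated exactly for the two-point mixing distribution of Example 11.2(b):

```python
# One-factor Bernoulli mixture with f(Z) = Z and a two-point mixing
# distribution: the unconditional pmf is a mixture of two binomial pmfs.
from math import comb

n = 100
mixing = {0.01: 0.9, 0.11: 0.1}   # P(Z = z)

def pmf(k):
    """Unconditional P(N = k)."""
    return comb(n, k) * sum(pz * z**k * (1 - z)**(n - k)
                            for z, pz in mixing.items())

mean_N = sum(k * pmf(k) for k in range(n + 1))
print(round(mean_N, 6))   # matches E(N) = 100 * E(Z) = 2 from Solution (b)
```

The pmf sums to one and its mean reproduces the answer of Solution (b), as it must.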
Notice that by Markov's inequality, for every ε > 0,

P(|N/n − f(Z)| > ε | Z) ≤ var(N/n | Z)/ε^2 = f(Z)(1 − f(Z))/(n ε^2).

Hence,

P(|N/n − f(Z)| > ε) = E(P(|N/n − f(Z)| > ε | Z)) ≤ E(f(Z)(1 − f(Z)))/(n ε^2).

Hence, N/n →P f(Z) as n → ∞, which justifies the approximation N/n ≈ f(Z) for n large. (In fact it holds that N/n → f(Z) a.s. as n → ∞.)

11.5 Probit normal mixture models

In several portfolio credit risk models (see the section about the KMV model below) the default indicators Xi, i = 1, . . . , n, have the representation

Xi = 1 if and only if √ϱ Z + √(1 − ϱ) Wi ≤ di1,

where ϱ ∈ [0, 1] and Z, W1, . . . , Wn are iid and standard normally distributed. Assuming equal individual default probabilities p = P(Xi = 1), we have di1 = Φ^{-1}(p) and hence

Xi = I(−∞, Φ^{-1}(p)]( √ϱ Z + √(1 − ϱ) Wi ).

This gives

f(Z) = P(Xi = 1 | Z) = P( √ϱ Z + √(1 − ϱ) Wi ≤ Φ^{-1}(p) | Z ) = Φ( Φ^{-1}(p)/√(1 − ϱ) − √ϱ Z/√(1 − ϱ) ),

and since −Z has the same distribution as Z, f(Z) has the same distribution as Φ( √ϱ Z/√(1 − ϱ) + Φ^{-1}(p)/√(1 − ϱ) ). This leads to

VaRq(f(Z)) = Φ( √ϱ Φ^{-1}(q)/√(1 − ϱ) + Φ^{-1}(p)/√(1 − ϱ) ).

Setting q = 0.999 and using the approximation N/n ≈ f(Z) motivated above, we arrive at the "Basel formula" for the capital requirement as a fraction of the total exposure for a homogeneous portfolio with individual default probabilities p:

Capital requirement = c1 c2 ( Φ( √ϱ Φ^{-1}(0.999)/√(1 − ϱ) + Φ^{-1}(p)/√(1 − ϱ) ) − p ),

where c1 is the fraction of the exposure lost in case of default and c2 is a constant for maturity adjustments. The asset return correlation coefficient ϱ is assigned a value that depends on the asset type and also the size and default probability of the borrowers.
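The Basel formula can be evaluated with a few lines of code (our own sketch; the values of p, ϱ, c1 and c2 below are illustrative placeholders, not regulatory values):

```python
# VaR_q(f(Z)) and the capital requirement in the probit-normal model.
# Phi and its inverse come from statistics.NormalDist.
from statistics import NormalDist

Phi = NormalDist().cdf
Phi_inv = NormalDist().inv_cdf

def var_q(p, rho, q=0.999):
    """VaR_q(f(Z)) = Phi((sqrt(rho)*Phi^-1(q) + Phi^-1(p)) / sqrt(1 - rho))."""
    return Phi((rho**0.5 * Phi_inv(q) + Phi_inv(p)) / (1.0 - rho)**0.5)

def capital_requirement(p, rho, c1=0.45, c2=1.0):
    """Capital as a fraction of total exposure: c1 * c2 * (VaR_0.999 - p)."""
    return c1 * c2 * (var_q(p, rho) - p)

# the 99.9% quantile of the mixing probability f(Z) for illustrative inputs
print(round(var_q(p=0.01, rho=0.2), 3))
```

As expected, the quantile grows with the correlation ϱ: stronger dependence on the common factor makes extreme default rates more likely.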
11.6 Beta mixture models

For the Beta mixing distribution we assume that Z ∼ Beta(a, b), a, b > 0, with f(z) = z. It has density

g(z) = (1/β(a, b)) z^{a−1} (1 − z)^{b−1},   z ∈ (0, 1),

where

β(a, b) = ∫_0^1 z^{a−1} (1 − z)^{b−1} dz = Γ(a)Γ(b)/Γ(a + b).

The expected number of defaults is easily computed:

E(N) = E(E(N | Z)) = n E(E(X1 | Z)) = n E(Z) = n a/(a + b).

Here we used, with Γ(z + 1) = z Γ(z),

E(Z) = (1/β(a, b)) ∫_0^1 z^a (1 − z)^{b−1} dz = β(a + 1, b)/β(a, b) = (Γ(a + 1)Γ(b)/Γ(a + b + 1)) (Γ(a + b)/(Γ(a)Γ(b))) = a/(a + b),

and similarly

E(Z^2) = a(a + 1)/((a + b)(a + b + 1)).

We immediately get that the number of defaults N has distribution

P(N = k) = (n choose k) ∫_0^1 z^k (1 − z)^{n−k} g(z) dz = ((n choose k)/β(a, b)) ∫_0^1 z^{a+k−1} (1 − z)^{n−k+b−1} dz = (n choose k) β(a + k, b + n − k)/β(a, b),

which is called the beta-binomial distribution. This probability function is illustrated in Figure 28.

[Figure 28: The probability function for the number of defaults in a Beta mixture model with n = 100 obligors; a = 0.1, b = 10 (left) and a = 2, b = 20 (right).]

Example 11.3 Notice that if we only specify the individual default probability p = P(Xi = 1), then we know very little about the model. If we have estimated the default probabilities P(Xi = 1) and P(Xi = Xj = 1), i ≠ j, then the parameters a and b can be determined from the relations

P(Xi = 1) = E(Z) = a/(a + b),   P(Xi = Xj = 1) = E(Z^2) = a(a + 1)/((a + b)(a + b + 1)).

For example, p = 0.01 = a/(a + b) holds for (a, b) = (1, 99), (0.1, 9.9) and (0.01, 0.99), but the different choices of (a, b) lead to quite different models. This is shown in the table below, which considers a portfolio with n = 1000 obligors; the VaR columns give VaR0.99 with VaR0.999 in brackets.

(a, b)         p      corr(Xi, Xj)   VaR0.99[9](N)   VaR0.99[9](nZ)
(1, 99)        0.01   0.01           47 [70]         45 [67]
(0.1, 9.9)     0.01   0.09           155 [300]       155 [299]
(0.01, 0.99)   0.01   0.5            371 [908]       371 [908]

Notice also how accurate the approximation N ≈ nZ is!
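The beta-binomial probabilities and the quantiles in the table can be computed as follows (our own sketch, using log-gamma functions for numerical stability):

```python
# P(N = k) = C(n,k) * beta(a + k, b + n - k) / beta(a, b), plus the
# q-quantile (VaR) of N obtained by accumulating the pmf.
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(a + k, b + n - k) - log_beta(a, b))

def var_level(n, a, b, q):
    """Smallest m with P(N <= m) >= q, i.e. the q-quantile of N."""
    acc = 0.0
    for m in range(n + 1):
        acc += beta_binomial_pmf(m, n, a, b)
        if acc >= q:
            return m
    return n

n, a, b = 1000, 1.0, 99.0          # first row of the table: p = a/(a+b) = 0.01
print(n * a / (a + b), var_level(n, a, b, 0.99))
```

For (a, b) = (1, 99) this reproduces E(N) = 10 and a VaR0.99(N) in line with the first row of the table above.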
12 Popular portfolio credit risk models

In this chapter we will present the commercial models currently used by practitioners, such as the KMV model and CreditRisk+. Interesting comparisons of these models are given in [6] and [10].

12.1 The KMV model

A popular commercial model for credit risk is the so-called KMV model provided by Moody's KMV (www.moodyskmv.com). It is an example of a latent variable model where the state variables S = (S1, . . . , Sn) can only have two states (m = 1). The latent variables Y = (Y1, . . . , Yn) are related to the value of the assets of each firm in the following way.

The Merton model

It is assumed that the balance sheet of each firm consists of assets and liabilities. The liabilities are divided into debt and equity. The value of the assets of the ith firm at time T is denoted by VA,i(T), the value of the debt by Ki and the value of the equity of the firm at time T by VE,i(T). The firm defaults if at time T the value of the assets is less than the value of the debt. That is, the default indicator Xi is given by

Xi = I(−∞,Ki)(VA,i(T)).

It is assumed that the future asset value is modeled by a geometric Brownian motion,

VA,i(T) = VA,i(t) exp( (μA,i − σ^2_{A,i}/2)(T − t) + σA,i (Wi(T) − Wi(t)) ),   (12.1)

where μA,i is the drift, σA,i the volatility and (Wi(t), 0 ≤ t ≤ T) a Brownian motion. In particular this means that Wi(T) − Wi(t) ∼ N(0, T − t) and hence that ln VA,i(T) is normal with mean ln VA,i(t) + (μA,i − σ^2_{A,i}/2)(T − t) and variance σ^2_{A,i}(T − t). In principle the default probability can then be computed as P(VA,i(T) < Ki). Writing

Yi = (Wi(T) − Wi(t))/√(T − t)

we get Yi ∼ N(0, 1) and

Xi = I(−∞,−DDi)(Yi),   with −DDi = ( ln Ki − ln VA,i(t) + (σ^2_{A,i}/2 − μA,i)(T − t) ) / ( σA,i √(T − t) ),

so that P(VA,i(T) < Ki) = P(Yi < −DDi). Hence, in the general setup of a two-state latent variable model we have Yi ∼ N(0, 1) and default thresholds di1 = −DDi. The quantity DDi is called the distance-to-default.
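With hypothetical balance-sheet numbers, the distance-to-default and the corresponding normal default probability can be computed as follows (our own sketch; all inputs are made up):

```python
# Distance-to-default and default probability in the Merton model above.
from math import log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def distance_to_default(V, K, mu, sigma, T):
    """DD = (ln V - ln K + (mu - sigma^2/2) T) / (sigma sqrt(T))."""
    return (log(V) - log(K) + (mu - 0.5 * sigma**2) * T) / (sigma * sqrt(T))

def merton_default_prob(V, K, mu, sigma, T):
    """P(V_A(T) < K) = Phi(-DD) under the geometric Brownian motion (12.1)."""
    return Phi(-distance_to_default(V, K, mu, sigma, T))

dd = distance_to_default(V=120.0, K=100.0, mu=0.05, sigma=0.25, T=1.0)
pd = merton_default_prob(120.0, 100.0, 0.05, 0.25, 1.0)
print(round(dd, 2), round(pd, 2))
```

Lowering the asset value towards the debt level shrinks the distance-to-default and raises the default probability, as the formula suggests.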
Computing the distance-to-default

To compute the distance-to-default we need to find VA,i(t), σA,i and μA,i. A problem here is that the value of the firm's assets VA,i(t) cannot be observed directly. However, the value of the firm's equity can be observed by looking at the market stock prices. KMV therefore takes the following viewpoint: the equity holders have the right, but not the obligation, at time T, to pay off the holders of the other liabilities and take over the remaining assets of the firm. That is, the debt holders own the firm until the debt is paid off in full by the equity holders. This can be viewed as a call option on the firm's assets with a strike price equal to the debt. That is, at time T we have the relation

VE,i(T) = max(VA,i(T) − Ki, 0).

The value of equity at time t, VE,i(t), can then be thought of as the price of a call option with the value of assets as underlying and strike price Ki. Under some simplifying assumptions the price of such an option can be computed using the Black-Scholes option pricing formula. This gives VE,i(t) = C(VA,i(t), σA,i, r), where

C(VA,i(t), σA,i, r) = VA,i(t) Φ(e1) − Ki e^{−r(T−t)} Φ(e2),

e1 = ( ln VA,i(t) − ln Ki + (r + σ^2_{A,i}/2)(T − t) ) / ( σA,i √(T − t) ),   e2 = e1 − σA,i √(T − t),

Φ is the distribution function of the standard normal and r is the risk-free interest rate (investors use e.g. the interest rate on a three-month U.S. Treasury bill as a proxy for the risk-free rate, since short-term government-issued securities have virtually zero risk of default). KMV also introduces a relation σE,i = g(VA,i(t), σA,i, r) between the volatility σE,i of VE,i and the volatility σA,i of VA,i, where g is some function. Using observed/estimated values of VE,i(t) and σE,i, the relations

VE,i(t) = C(VA,i(t), σA,i, r),   σE,i = g(VA,i(t), σA,i, r)

are inverted to obtain VA,i(t) and σA,i, which enables computation of the distance-to-default DDi.

The expected default frequency

To find the default probability corresponding to the distance-to-default DDi, KMV do not actually use the probability P(Yi < −DDi). Instead they use
j .i T − t m k=1 then Y = (Y1 . σA.i (t) + 2 If we let Yi = m j=1 σA.i (T ) < Ki that company i defaults is equivalent to m j=1 2 σA. σA.i (T ) = VA. . m) be m independent standard Brownian motions.i m 2 σA.historical data to search for all companies which at some stage in their history had approximately the same distancetodefault.i σA.j (Wj (T ) − Wj (t)) < ln Ki − ln VA.i.i. The event VA.i. The asset value and asset volatility of the ﬁrm is estimated from the market value and volatility of equity and the book of liabilities.j gives the magnitude of which asset i is inﬂuenced by the jth Brownian motion.j (Wj (T ) − Wj (t)) √ .i dependent. In the terminology of KMV this estimated default probability pi is called Expected Default Frequency (EDF).i. To summarize: In order to compute the probability of default with the KMV model the following steps are required: (i) Estimate asset value and volatility.i − (T − t) + σA. (ii) Calculate the distancetodefault.j. The evolution (12. . . To compute joint default probabilities and the distribution of the total credit loss it is natural to introduce dependence between the default indicators by making the asset value processes VA. We only considered each ﬁrm separately. = j=1 2 σA.k σA.i.i (t) exp where m 2 σA. Let (Wj (t) : 0 ≤ t ≤ T. Yn ) ∼ Nn (0.i µA. The multivariate KMV model In the Merton model above we did not introduce any dependence between the value of the assets for diﬀerent ﬁrms. Here. The following methodology is used by KMV.i. . . Then the observed default frequency is converted into an actual probability. .i (T − t). .j Wj (T ) − Wj (t) 2 j=1 .k σA. j = 1. (iii) Calculate the default probability using the empirical distribution relating distancetodefault to a default probability.i − µA. Σ) with Σij = σA. σA.1) of asset i is then replaced by VA.j 92 . .
. Ga where CΣ is the copula of a multivariate normal distribution with covariance matrix Σ. σA. . A way around these problems is to use a factor model where the asset value. 1 . Yk < − DDk ) Ga = CΣ (Φ(− DD1 ). As in the univariate case KMV do not use that default probability resulting from the latent variable model but instead use the expected default frequencies EDFi . . estimating pairwise correlations will rarely give a positive deﬁnite correlation matrix if the dimension is large. . . . Un ) ∼ Nn (0. EDFk . 1). . . . 1 . Xk = 1) = P(Y1 < − DD1 . in the language of a general latent variable model the probability that the ﬁrst k ﬁrms default is given by P(X1 = 1. then the covariance matrix of the right hand side is given by AΛAT +D where Aij = aij and D is a diagonal (n × n) matrix with entries Dii = b2 . . . . .k = CΣ (EDF1 . Moreover.i (T − t) √ 2 Yi < .i (t) + A. where Z = (Z1 . . . Φ(− DDk ). . . . .i − µA. . . Estimating the correlations Estimating the correlations of the latent variables in Y is not particularly easy as the dimension n is typically very large and there is limited available historical data. . I). j=1 i = 1. In a similar way KMV use the joint default frequency Ga JDF1. . 1). . .. . . The key factors are typically macroeconomic factors such as • Global economic eﬀects • Regional economic eﬀects • Sector eﬀects • Country speciﬁc eﬀects • Industry speciﬁc eﬀects If we write k d Yi = aij Zj + bi Ui . . . as the default probability of the ﬁrst k ﬁrms. . n.and the above inequality can be written as ln Ki − ln VA. i 93 . . . .i T − t − DDi σ2 Hence. . Zk ) ∼ Nk (0.. Λ) is independent of U = (U1 . or more precisely the latent variables Y is divided into k key factors and one ﬁrm speciﬁc factor.
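A small numerical check (our own code; the loadings aij below are made up) that the factor representation has covariance matrix AΛA^T + D, here with Λ = I:

```python
# Y_i = sum_j a_ij Z_j + b_i U_i with independent standard normal factors:
# compare the sample covariance of simulated Y with A A^T + D.
import numpy as np

rng = np.random.default_rng(42)
n_firms, k_factors, sims = 4, 2, 200000

A = np.array([[0.6, 0.2],
              [0.5, 0.3],
              [0.1, 0.7],
              [0.4, 0.4]])
b = np.sqrt(1.0 - np.sum(A**2, axis=1))   # chosen so that var(Y_i) = 1

Z = rng.standard_normal((sims, k_factors))
U = rng.standard_normal((sims, n_firms))
Y = Z @ A.T + U * b

cov_theory = A @ A.T + np.diag(b**2)
cov_sample = np.cov(Y, rowvar=False)
print(float(np.max(np.abs(cov_sample - cov_theory))) < 0.02)
```

Choosing the firm-specific weights bi so that the diagonal equals one keeps the Yi standard normal, as required of latent variables.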
12.2 CreditRisk+ – a Poisson mixture model

The material presented here on the CreditRisk+ model is based on [6], [10] and [4]. CreditRisk+ is a commercial model for credit risk developed by Credit Suisse First Boston and is an example of a Poisson mixture model. We let Xi denote the default indicator of obligor i, with conditional default intensities

λi(Z) = λi Σ_{j=1}^m aij Zj,   aij ≥ 0,   Σ_{j=1}^m aij = 1,   i = 1, . . . , n,

where λi > 0 are constants. The risk factors Z1, . . . , Zm are assumed to be independent with Zj ∼ Gamma(αj, βj); the density of Zj is given by

fj(z) = z^{αj−1} exp{−z/βj} / (βj^{αj} Γ(αj)).

The parameters αj, βj are chosen so that αj βj = 1, and then E(Zj) = 1 and E(λi(Z)) = λi. Notice that the expected number of defaults, E(N), is given by

E(N) = E(E(N | Z)) = Σ_{i=1}^n E(E(Xi | Z)) = Σ_{i=1}^n E(λi(Z)) = Σ_{i=1}^n λi Σ_{j=1}^m aij E(Zj) = Σ_{i=1}^n λi.

The loss-given-default LGDi of obligor i is modeled as a constant fraction 1 − λ̄i of the loan size Li,

LGDi = (1 − λ̄i) Li,   i = 1, . . . , n,

where λ̄i is the (deterministic) expected recovery rate. The main idea here is to approximate the total loss distribution by a discrete distribution. Each loss amount is then expressed as an integer multiple vi of a fixed base unit of loss (e.g. one million dollars) denoted L0:

LGDi = (1 − λ̄i) Li ≈ [ (1 − λ̄i) Li / L0 ] L0 = vi L0,   i = 1, . . . , n,

where [x] denotes the nearest integer of x (x − [x] ∈ (−1/2, 1/2]). In this way every LGDi can be expressed as a fixed integer multiple vi of a predefined base unit of loss L0. For this discrete distribution it is possible to compute its probability generating function (pgf) g. Recall the definition of the pgf for a discrete random variable Y with values in {y1, . . . , ym}:

gY(t) = E(t^Y) = Σ_i t^{yi} P(Y = yi).

Recall the following formulas for probability generating functions.

(i) If Y ∼ Bernoulli(p) then gY(t) = 1 + p(t − 1).

(ii) If Y ∼ Poisson(λ) then gY(t) = exp{λ(t − 1)}.

(iii) If X1, . . . , Xn are independent random variables then g_{X1+···+Xn}(t) = Π_{i=1}^n g_{Xi}(t).

(iv) Let Y have density f and let g_{X|Y=y}(t) be the pgf of X | Y = y. Then g_X(t) = ∫ g_{X|Y=y}(t) f(y) dy.

(v) If X has pgf g_X(t) then P(X = k) = g^{(k)}(0)/k!, with g^{(k)}(t) = d^k g(t)/dt^k.

The pgf of the loss distribution

Let us derive the pgf of the loss distribution

L = Σ_{i=1}^n Xi vi L0.

First we determine the conditional pgf of the number of defaults N = X1 + · · · + Xn given Z = (Z1, . . . , Zm). Given Z, the default intensities λ1(Z), . . . , λn(Z) are known, so conditional on Z the default indicators are independent and Poisson(λi(Z))-distributed. Hence

g_{Xi|Z}(t) = exp{λi(Z)(t − 1)},   i = 1, . . . , n,

and

g_{N|Z}(t) = Π_{i=1}^n g_{Xi|Z}(t) = exp{μ(t − 1)},   μ = Σ_{i=1}^n λi(Z).
Next we use (iv) to derive the unconditional pgf of the number of defaults N:

gN(t) = ∫_0^∞ · · · ∫_0^∞ g_{N|Z=(z1,...,zm)}(t) f1(z1) · · · fm(zm) dz1 · · · dzm
      = ∫_0^∞ · · · ∫_0^∞ exp{ (t − 1) Σ_{i=1}^n λi Σ_{j=1}^m aij zj } f1(z1) · · · fm(zm) dz1 · · · dzm
      = Π_{j=1}^m ∫_0^∞ exp{ (t − 1) μj zj } fj(zj) dzj,   μj = Σ_{i=1}^n λi aij.

Each of the integrals in the product can be computed as

∫_0^∞ exp{z μj (t − 1)} z^{αj−1} exp{−z/βj} / (βj^{αj} Γ(αj)) dz
  = { substituting u = z(1/βj − μj(t − 1)) }
  = (1 − βj μj (t − 1))^{−αj} (1/Γ(αj)) ∫_0^∞ u^{αj−1} exp{−u} du
  = ( (1 − δj)/(1 − δj t) )^{αj},   δj = βj μj / (1 + βj μj).

Finally we obtain

gN(t) = Π_{j=1}^m ( (1 − δj)/(1 − δj t) )^{αj}.   (12.2)

Similar computations will lead us to the pgf of the loss distribution. Conditional on Z, the loss of obligor i is given by Li | Z = vi (Xi | Z). Since the variables Xi | Z, i = 1, . . . , n, are independent, so are the variables Li | Z. The pgf of Li | Z is

g_{Li|Z}(t) = E(t^{Li} | Z) = E(t^{vi Xi} | Z) = g_{Xi|Z}(t^{vi}).

Hence, the pgf of the total loss conditional on Z is

g_{L|Z}(t) = g_{L1+···+Ln|Z}(t) = Π_{i=1}^n g_{Li|Z}(t) = Π_{i=1}^n g_{Xi|Z}(t^{vi}) = exp{ Σ_{j=1}^m Zj Σ_{i=1}^n λi aij (t^{vi} − 1) }.

Similar to the previous computation we obtain

gL(t) = Π_{j=1}^m ( (1 − δj)/(1 − δj Λj(t)) )^{αj},   Λj(t) = (1/μj) Σ_{i=1}^n λi aij t^{vi},

with μj and δj as above. Hence the pgf gL(t) uniquely determines the loss distribution, which is obtained by inverting the pgf. In order to compute the probabilities P(N = k) we need the derivatives g^{(k)}_N(0). It can be shown that g^{(k)}_N(0) satisfies the recursion formula

g^{(k)}_N(0) = Σ_{l=0}^{k−1} (k−1 choose l) g^{(k−1−l)}_N(0) Σ_{j=1}^m αj δj^{l+1} l!,   k ≥ 1

(show this!).

Example 12.1 Consider a portfolio that consists of n = 100 obligors. Recall the probability generating function (12.2) for the number of defaults N in the CreditRisk+ model. Assume that λi = λ = 0.15, aij = a = 1/m, αj = α = 1 and βj = β = 1. To better understand the model we plot the function P(N = k) for k = 0, . . . , 100 when m = 1 and when m = 5. The result is shown in Figure 29. One can interpret the plot as follows. With only one risk factor, m = 1, to which all default indicators are linked, we either have many defaults or we have few defaults. Having approximately E(N) = 15 defaults is unlikely. With m = 5 independent risk factors there is a diversification effect; in this case it is most likely that we have approximately E(N) = 15 defaults.
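Dividing the recursion by k! gives the equivalent, numerically stabler form p_k = (1/k) Σ_{l=0}^{k−1} (Σ_j αj δj^{l+1}) p_{k−1−l} for p_k = P(N = k), which is easy to implement (our own sketch; checked against the closed form in the single-factor case):

```python
# Invert the pgf (12.2) for the number-of-defaults distribution, working
# with p_k = g_N^(k)(0) / k! directly; p_0 = prod_j (1 - delta_j)^alpha_j.
def defaults_pmf(alphas, deltas, kmax):
    """P(N = k) for k = 0..kmax in the CreditRisk+ model."""
    p0 = 1.0
    for a, d in zip(alphas, deltas):
        p0 *= (1.0 - d) ** a
    p = [p0]
    for k in range(1, kmax + 1):
        s = sum(sum(a * d ** (l + 1) for a, d in zip(alphas, deltas)) * p[k - 1 - l]
                for l in range(k))
        p.append(s / k)
    return p

# Check: one factor with alpha = 1 makes g_N the pgf of a geometric
# distribution, P(N = k) = (1 - delta) * delta^k
delta = 0.6
p = defaults_pmf([1.0], [delta], 50)
direct = [(1 - delta) * delta**k for k in range(51)]
print(max(abs(a - b) for a, b in zip(p, direct)) < 1e-12)
```

The same recursion applied to gL (with the powers vi inserted) yields the loss distribution itself.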
[Figure 29: P(N = k) for k = 0, . . . , 100 for m = 1 and m = 5. For m = 1 the probability P(N = k) is decreasing in k; for m = 5 the probability P(N = k) is first increasing in k and then decreasing.]
References
[1] Artzner, P., Delbaen, F., Eber, J.M., Heath, D. (1999) Coherent Measures of Risk, Mathematical Finance 9, 203–228.

[2] Basel Committee on Banking Supervision (1988) International Convergence of Capital Measurement and Capital Standards. Available at http://www.bis.org/publ/bcbs04A.pdf.

[3] Campbell, J.Y., Lo, A.W. and MacKinlay, A. (1997) The Econometrics of Financial Markets, Princeton University Press, Princeton.

[4] Credit Suisse First Boston (1997) CreditRisk+: a Credit Management Framework. Available at http://www.csfb.com/institutional/research/assets/creditrisk.pdf

[5] Crosbie, P. and Bohn, J. (2001) Modelling Default Risk. K.M.V. Corporation. Available at
http://www.moodyskmv.com/research/whitepaper/Modeling Default Risk.pdf.
[6] Crouhy, M., Galai, D. and Mark, R. (2000) A comparative analysis of current credit risk models, Journal of Banking & Finance 24, 59–117.

[7] Embrechts, P., McNeil, A. and Straumann, D. (2002) Correlation and dependence in risk management: properties and pitfalls. In: Risk Management: Value at Risk and Beyond, edited by Dempster, M., published by Cambridge University Press, Cambridge.

[8] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997) Modelling Extremal Events for Insurance and Finance, Springer Verlag, Berlin.

[9] Fang, K.T., Kotz, S. and Ng, K.W. (1987) Symmetric Multivariate and Related Distributions, Chapman & Hall, London.

[10] Gordy, M.B. (2000) A comparative anatomy of credit risk models, Journal of Banking & Finance 24, 119–149.

[11] Gut, A. (1995) An Intermediate Course in Probability, Springer Verlag, New York.

[12] McNeil, A., Frey, R., Embrechts, P. (2005) Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton University Press.

[13] Nelsen, R.B. (1999) An Introduction to Copulas, Springer Verlag, New York.
99
A A few probability facts

A.1 Convergence concepts
Let X and Y be random variables (or vectors) with distribution functions FX and FY. We say that X = Y almost surely (a.s.) if P({ω ∈ Ω : X(ω) = Y(ω)}) = 1. We say that they are equal in distribution, written X =d Y, if FX = FY. Notice that X = Y a.s. implies X =d Y. However, taking X to be standard normally distributed and Y = −X we see that the converse is false.

Let X, X1, X2, . . . be a sequence of random variables. We say that Xn converges to X almost surely, Xn → X a.s., if

P({ω ∈ Ω : Xn(ω) → X(ω)}) = 1.

We say that Xn converges to X in probability, Xn →P X, if for all ε > 0 it holds that

P(|Xn − X| > ε) → 0.

We say that Xn converges to X in distribution, Xn →d X, if for all continuity points x of FX it holds that

FXn(x) → FX(x).

The following implications between the convergence concepts hold:

Xn → X a.s.  ⇒  Xn →P X  ⇒  Xn →d X.
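Convergence in probability can be illustrated by simulation. The Python sketch below (an illustration, not part of the notes; the sample sizes, ε and trial count are arbitrary choices) estimates P(|X̄n − 1/2| > ε) for the sample mean X̄n of n fair coin flips, and the estimated probabilities shrink toward 0 as n grows.

```python
import random

random.seed(1)

def sample_mean(n):
    """Mean of n fair coin flips (each 0 or 1); tends to 0.5."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

def prob_deviation(n, eps=0.05, trials=2000):
    """Monte Carlo estimate of P(|sample_mean(n) - 0.5| > eps)."""
    hits = sum(abs(sample_mean(n) - 0.5) > eps for _ in range(trials))
    return hits / trials

# The estimated probabilities decrease toward 0 as n grows, in line with
# convergence in probability of the sample mean to 0.5.
probs = [prob_deviation(n) for n in (10, 100, 1000)]
print(probs)
```

Note that such a plot of shrinking deviation probabilities says nothing by itself about almost sure convergence, which is the stronger of the two concepts.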
A.2 Limit theorems and inequalities
Let X1, X2, . . . be iid random variables with finite mean E(X1) = µ, and let Sn = X1 + · · · + Xn. The (strong) law of large numbers says that

Sn/n → µ a.s. as n → ∞.

If furthermore var(X1) = σ² ∈ (0, ∞), then the central limit theorem says that

(Sn − nµ)/(σ√n) →d Z as n → ∞,

where the random variable Z has a standard normal distribution. For a nonnegative random variable V with E(V^r) < ∞, Markov's inequality says that

P(V > ε) ≤ E(V^r)/ε^r for every ε > 0.
For a random variable X with finite variance var(X) this leads to

P(|X − E(X)| > ε) ≤ E[(X − E(X))²]/ε² = var(X)/ε² for every ε > 0.

This inequality is called Chebyshev's inequality.
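Chebyshev's inequality is easy to check empirically. The sketch below (illustrative Python, not from the notes; the die, ε and sample size are arbitrary choices) compares the empirical frequency of |X − E(X)| > ε for a fair die with the bound var(X)/ε².

```python
import random

random.seed(3)

# X uniform on {1, ..., 6} (a fair die): E(X) = 3.5, var(X) = 35/12.
MEAN, VAR = 3.5, 35.0 / 12.0

def chebyshev_check(eps, n=100_000):
    """Compare the empirical P(|X - E(X)| > eps) with the bound var(X)/eps^2."""
    hits = sum(abs(random.randint(1, 6) - MEAN) > eps for _ in range(n))
    return hits / n, VAR / eps ** 2

empirical, bound = chebyshev_check(eps=2.0)
print(empirical, bound)  # the empirical probability stays below the bound
```

For ε = 2 the event is {X ∈ {1, 6}}, with true probability 1/3 ≈ 0.33, while the Chebyshev bound is (35/12)/4 ≈ 0.73: the bound holds but, as is typical, is far from tight.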
B Conditional expectations
Suppose that one holds a portfolio or financial contract and let X be the payoff (or loss) one year from today. Let Z be a random variable which represents some information about the state of the economy or relevant interest rates during the next year. We think of Z as a future (and hence unknown) economic scenario which takes values in a set of all possible scenarios. The conditional expectation of X given Z, E(X | Z), represents the best guess of the payoff X given the scenario Z. Notice that Z is unknown today and that E(X | Z) is a function of the future random scenario Z and hence a random variable. If we knew Z then we would also know g(Z) for any given function g. Therefore the property g(Z) E(X | Z) = E(g(Z)X | Z) seems natural (whenever the expectations exist finitely). If there were only a finite set of possible scenarios zk, i.e. values that Z may take, then it is clear that the expected payoff E(X) may be computed by computing the expected payoff for each scenario zk and then obtaining E(X) as the weighted sum of these values with the weights P(Z = zk). Therefore the property E(X) = E(E(X | Z)) seems natural.
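The finite-scenario computation of E(X) = E(E(X | Z)) can be written out directly. In the snippet below the scenario names, probabilities and conditional expected payoffs are invented purely for illustration: each scenario zk carries a weight P(Z = zk) and a conditional mean E(X | Z = zk), and the tower property gives the overall expected payoff.

```python
# Hypothetical scenarios z_k (names and numbers invented for illustration),
# each with probability P(Z = z_k) and conditional mean E(X | Z = z_k).
scenarios = {
    "recession": {"prob": 0.2, "cond_mean": -5.0},
    "normal":    {"prob": 0.7, "cond_mean":  2.0},
    "boom":      {"prob": 0.1, "cond_mean":  8.0},
}

# Sanity check: the scenario probabilities sum to 1.
assert abs(sum(s["prob"] for s in scenarios.values()) - 1.0) < 1e-12

# Tower property: E(X) = E(E(X | Z)) = sum_k E(X | Z = z_k) * P(Z = z_k).
expected_payoff = sum(s["prob"] * s["cond_mean"] for s in scenarios.values())
print(expected_payoff)  # 0.2*(-5) + 0.7*2 + 0.1*8 = 1.2
```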
B.1 Definition and properties
If X is a random variable with E(X²) < ∞, then the conditional expectation E(X | Z) is most naturally defined geometrically, as an orthogonal projection of X onto a subspace. Let L2 be the space of random variables X with E(X²) < ∞. Let Z be a random variable and let L2(Z) be the space of random variables Y = g(Z) for some function g such that E(Y²) < ∞. Notice that the expected value E(X) is the number µ that minimizes the expression E((X − µ)²). The conditional expectation E(X | Z) can be defined similarly.

Definition For X ∈ L2, the conditional expectation E(X | Z) is the random variable Y ∈ L2(Z) that minimizes E((X − Y)²).

We say that X, Y ∈ L2 are orthogonal if E(XY) = 0. Then E(X | Z) is the orthogonal projection of X onto L2(Z), i.e. the point in the subspace L2(Z) that is closest to X. Moreover, X − E(X | Z) is orthogonal to all Y ∈ L2(Z), i.e. E(Y(X − E(X | Z))) = 0 for all Y ∈ L2(Z). Equivalently, by linearity of the ordinary expectation,

E(Y E(X | Z)) = E(Y X) for all Y ∈ L2(Z).   (B.1)

This relation implies the following three properties:
(i) If X ∈ L2, then E(E(X | Z)) = E(X).

(ii) If Y ∈ L2(Z), then Y E(X | Z) = E(Y X | Z).

(iii) If X ∈ L2 and we set var(X | Z) = E(X² | Z) − E(X | Z)², then var(X) = E(var(X | Z)) + var(E(X | Z)).

This can be shown as follows:

(i) Choosing Y = 1 in (B.1) yields E(E(X | Z)) = E(X).

(ii) With Y replaced by Ỹ Y (with Ỹ ∈ L2(Z)), (B.1) says that E(Ỹ Y X) = E(Ỹ Y E(X | Z)). With Y replaced by Ỹ and X replaced by Y X, (B.1) says that E(Ỹ Y X) = E(Ỹ E(Y X | Z)). Hence, for X ∈ L2 and Y ∈ L2(Z),

E(Ỹ Y E(X | Z)) = E(Ỹ E(Y X | Z)) for all Ỹ ∈ L2(Z).

Equivalently, E(Ỹ [Y E(X | Z) − E(Y X | Z)]) = 0 for all Ỹ ∈ L2(Z), in particular for Ỹ = Y E(X | Z) − E(Y X | Z), which gives E((Y E(X | Z) − E(Y X | Z))²) = 0. This implies that Y E(X | Z) = E(Y X | Z).

(iii) Starting with the right-hand side we obtain

E(var(X | Z)) + var(E(X | Z))
= E[E(X² | Z) − E(X | Z)²] + E(E(X | Z)²) − E(E(X | Z))²
= E(E(X² | Z)) − E(E(X | Z))²
= E(X²) − E(X)²
= var(X).

Hence, we have shown the properties (i)–(iii) above.

As already mentioned, there are other ways to introduce the conditional expectation E(X | Z) so that the properties (i) and (ii) hold. In that case the statement in the definition above follows from properties (i) and (ii). This is seen from the following argument. If W ∈ L2(Z), then W E(X | Z) = E(W X | Z) and hence E(W E(X | Z)) = E(E(W X | Z)) = E(W X), i.e.

E(W(X − E(X | Z))) = 0 for all W ∈ L2(Z).   (B.2)

If Y ∈ L2(Z) and W = Y − E(X | Z), then

E((X − Y)²) = E((X − E(X | Z) − W)²)
= E((X − E(X | Z))²) − 2 E(W(X − E(X | Z))) + E(W²)
= E((X − E(X | Z))²) + E(W²).

Hence, E((X − Y)²) is minimized when W = 0, i.e. when Y = E(X | Z).

B.2 An expression in terms of the density of (X, Z)

It is common in introductory texts to assume that X and Z have a joint density and to derive the conditional expectation E(X | Z) in terms of this joint density.
Suppose that the random vector (X, Z) has a density f(x, z). Write h(Z) for the conditional expectation E(X | Z). From (B.2) we know that E(g(Z)(X − h(Z))) = 0 for all g(Z) ∈ L2(Z). Hence,

0 = ∫∫ g(z)(x − h(z)) f(x, z) dx dz = ∫ g(z) ( ∫ x f(x, z) dx − h(z) ∫ f(x, z) dx ) dz.

Since this holds for all g(Z) ∈ L2(Z), it must be that

∫ x f(x, z) dx − h(z) ∫ f(x, z) dx = 0,

or equivalently h(z) = ∫ x f(x, z) dx / ∫ f(x, z) dx. Hence,

E(X | Z) = ∫ x f(x, Z) dx / ∫ f(x, Z) dx.

B.3 Orthogonality and projections in Hilbert spaces

Orthogonality and orthogonal projections onto a subspace of the Euclidean space R^d are well known from linear algebra. However, these concepts are meaningful also in more general spaces, called Hilbert spaces. A nonempty set H is called a (real) Hilbert space if H is a linear vector space, so that elements in H may be added and multiplied by real numbers, and there exists a function (x, y) → ⟨x, y⟩ from H × H to R with the properties:

(i) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ H.

(ii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ H.

(iii) ⟨λx, y⟩ = λ⟨x, y⟩ for all x, y ∈ H and λ ∈ R.

(iv) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.

(v) If {xn} ⊂ H and ⟨xn − xm, xn − xm⟩ → 0 as m, n → ∞, then there exists x ∈ H such that ⟨xn − x, xn − x⟩ → 0 as n → ∞.

The function (x, y) → ⟨x, y⟩ is called an inner product and ‖x‖H = ⟨x, x⟩^(1/2) is the norm of x ∈ H. For all x, y ∈ H it holds that |⟨x, y⟩| ≤ ‖x‖H ‖y‖H. If x, y ∈ H and ⟨x, y⟩ = 0, then x and y are said to be orthogonal and we write x ⊥ y. The canonical example of a Hilbert space is R^3, and our intuition for orthogonality and projections in R^3 works fine in general Hilbert spaces.

The projection theorem Let M be a closed linear subspace of a Hilbert space H. For every x0 ∈ H there exists a unique element y0 ∈ M such that ‖x0 − y0‖H ≤ ‖x0 − y‖H for all y ∈ M, and x0 − y0 ⊥ y for all y ∈ M. The element y0 is called the orthogonal projection of x0 onto the subspace M.

Let L2 be the Hilbert space of random variables X with E(X²) < ∞ equipped with the inner product ⟨X, Y⟩ = E(XY). Let Z be a random variable and consider the set of random variables Y = g(Z) for continuous functions g such that E(g(Z)²) < ∞. We denote the closure of this set by L2(Z); the closure is obtained by including elements X ∈ L2 which satisfy E((gn(Z) − X)²) → 0 as n → ∞ for some sequence {gn} of continuous functions such that E(gn(Z)²) < ∞. Note that L2(Z) is a closed subspace of L2.
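The projection point of view can be tried out numerically. In the sketch below (an illustration under stated assumptions, not part of the notes) we take X = ρZ + √(1 − ρ²) W with Z, W independent standard normals, so that E(X | Z) = ρZ. Projecting X onto the span of Z in L2, i.e. minimizing E((X − bZ)²) over b, gives b = E(XZ)/E(Z²) = ρ, and the sample version of this projection coefficient recovers ρ from simulated data.

```python
import math
import random

random.seed(4)

RHO = 0.6
N = 50_000

# X = rho*Z + sqrt(1 - rho^2)*W with Z, W independent N(0, 1),
# so the conditional expectation E(X | Z) equals rho*Z.
zs = [random.gauss(0.0, 1.0) for _ in range(N)]
xs = [RHO * z + math.sqrt(1.0 - RHO ** 2) * random.gauss(0.0, 1.0) for z in zs]

# Orthogonal projection of X onto span{Z} in L2: the coefficient
# b = E(XZ)/E(Z^2) minimizes E((X - b*Z)^2); here b should be close to rho.
b = sum(x * z for x, z in zip(xs, zs)) / sum(z * z for z in zs)
print(round(b, 3))
```

Here the projection is onto the one-dimensional subspace spanned by Z; in general E(X | Z) is the projection onto the much larger subspace L2(Z) of all square-integrable functions of Z, which for this Gaussian example happens to coincide with the linear projection.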