
Portfolio Theory and Risk Management

Contents

1 Two assets
1.1 Expected return
1.2 Variance as a risk measure
1.3 Semi–variance
1.4 Portfolios consisting of two assets
1.5 Return
1.6 Feasible set
1.7 Special cases
1.8 Minimum variance portfolio
1.9 Adding a risk–free security
1.10 Indifference curves

2 Many assets
2.1 Lagrange multipliers
2.2 Risk and return
2.3 Three risky securities
2.4 Minimum variance portfolio
2.5 Minimum variance line
2.6 Market portfolio
2.7 CAPM
2.8 Derivation of CAPM
2.9 Security Market Line

3 Utility functions
3.1 Basic notions and axioms
3.2 Utility maximisation
3.3 Relation to mean variance analysis
3.4 Risk aversion
3.5 Utility functions and indifference curves

4 Value at Risk
4.1 Quantiles
4.2 Measuring downside risk
4.3 Examples of computing VaR
4.4 VaR in the Black–Scholes model

Introduction
These lecture notes cover the material of the module Portfolio Theory and Risk Management,
one of the three components of the certificate stage of the online MSc in Mathematical
Finance offered by the University of York.
They are accompanied by eight lectures in the form of presentations with recorded voice
explanations.
The module is divided into four parts, each with its own exercises and coursework
assignment, as follows:
Chapter 1, Lectures 1, 2, 3 – Exercises 1, Coursework Assignment 1
Chapter 2, Lectures 4, 5 – Exercises 2, Coursework Assignment 2
Chapter 3, Lectures 6, 7 – Exercises 3, Coursework Assignment 3
Chapter 4, Lecture 8 – Exercises 4, Coursework Assignment 4
The notes contain links to some Excel files, which should be placed in the Excel subfolder.
The following textbooks are recommended as auxiliary sources:
[CK] M. J. Capinski, E. Kopp, Portfolio Theory and Risk Management, to appear in
Cambridge University Press.
[CZ] M. Capinski, T. Zastawniak, Mathematics for Finance, 2nd ed., Springer, 2010.

1 Two assets
In this chapter we first analyse various ways of introducing the two fundamental concepts
of finance: return and risk. In brief, return reflects the efficiency of an investment, risk is
concerned with uncertainty. The balance between these two is at the heart of portfolio theory.

1.1 Expected return


We are concerned with just two time instants: the present t = 0 and the future t = 1, where
1 may stand for any unit of time. Suppose we make a single period investment in some stock
with the current price S (0) known, and the future price S (1) unknown, hence assumed to be
represented by a random variable

S (1) : Ω → [0, +∞),

where Ω is the sample space of some probability space (Ω, F, P) .


When Ω is finite, Ω = {ω1 , . . . , ωN }, we shall adopt the notation

S (1, ωi ) = S (1)(ωi ) for i = 1, . . . , N,

for the possible values of $S(1)$. In this setting it is natural to equip $\Omega$ with the
σ-field $\mathcal{F} = 2^{\Omega}$ of all its subsets. To define a probability measure
$P : \mathcal{F} \to [0, 1]$ it is sufficient to give its values on single element sets,
$P(\{\omega_i\}) = p_i$, by choosing $p_i \in [0, 1]$ such that $\sum_{i=1}^{N} p_i = 1$.
We can then compute the expected price at the end of the period

$$E(S(1)) = \sum_{i=1}^{N} S(1, \omega_i)\, p_i,$$

and the variance of the price

$$\mathrm{Var}(S(1)) = \sum_{i=1}^{N} \bigl(S(1, \omega_i) - E(S(1))\bigr)^2 p_i.$$

Example 1.1
Assume that $S(0) = 100$ and

$$S(1) = \begin{cases} 120 & \text{with probability } \tfrac12, \\ 90 & \text{with probability } \tfrac12. \end{cases}$$

Then $E(S(1)) = \tfrac12 \cdot 120 + \tfrac12 \cdot 90 = 105$ and
$\mathrm{Var}(S(1)) = (120 - 105)^2 \cdot \tfrac12 + (90 - 105)^2 \cdot \tfrac12 = 15^2$.
Observe also that the standard deviation, which is the square root of the variance,
is equal to $\sqrt{\mathrm{Var}(S(1))} = 15$. To open a file with this example click here: Excel.
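
The computation in Example 1.1 can also be reproduced in a few lines of Python. This is only a minimal sketch: the helper name `mean_var` is an illustrative choice and does not come from the notes or the accompanying Excel file.

```python
import math

def mean_var(values, probs):
    """Expectation and variance of a finite discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var

# Data of Example 1.1: S(1) is 120 or 90, each with probability 1/2.
mean, var = mean_var([120, 90], [0.5, 0.5])
std = math.sqrt(var)
print(mean, var, std)
```

The same helper works for any finite sample space, provided the probabilities sum to one.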

When $S(1)$ has a continuous distribution with a density function $f : \mathbb{R} \to \mathbb{R}$, then

$$E(S(1)) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

and

$$\mathrm{Var}(S(1)) = \int_{-\infty}^{\infty} \bigl(x - E(S(1))\bigr)^2 f(x)\,dx.$$

Example 1.2
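Assume that $S(1) = S(0)\exp(m + sZ)$, where $Z$ is a random variable with standard normal
distribution $N(0, 1)$. This means that $S(1)$ has lognormal distribution. The density
function of $S(1)$ is equal to

$$f(x) = \frac{1}{x s \sqrt{2\pi}}\, e^{-\frac{\left(\ln\frac{x}{S(0)} - m\right)^2}{2s^2}} \quad \text{for } x > 0,$$

and 0 for $x \le 0$. We can compute the expected price as

$$\begin{aligned}
E(S(1)) &= \int_0^{\infty} x f(x)\,dx \\
&= \int_0^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{\left(\ln\frac{x}{S(0)} - m\right)^2}{2s^2}}\,dx \\
&= \int_{-\infty}^{\infty} S(0)\, e^{sy + m}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy
\quad \Bigl(\text{taking } y = \tfrac{1}{s}\bigl(\ln\tfrac{x}{S(0)} - m\bigr)\Bigr) \\
&= S(0)\, e^{m + \frac{s^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y - s)^2}{2}}\,dy \\
&= S(0)\, e^{m + \frac{s^2}{2}}.
\end{aligned}$$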
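
As a sanity check of the closed formula in Example 1.2, one can approximate the integral over $y$ numerically. The sketch below uses a plain midpoint rule; the parameter values `s0`, `m`, `s`, the truncation level `y_max` and the function name are illustrative assumptions, not taken from the notes.

```python
import math

def lognormal_mean_numeric(s0, m, s, n=20000, y_max=10.0):
    """Midpoint-rule approximation of the integral of S(0) e^{sy+m} phi(y) dy,
    where phi is the standard normal density and y ranges over [-y_max, y_max]."""
    h = 2 * y_max / n
    total = 0.0
    for i in range(n):
        y = -y_max + (i + 0.5) * h
        phi = math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)
        total += s0 * math.exp(s * y + m) * phi * h
    return total

s0, m, s = 100.0, 0.05, 0.2          # hypothetical parameters
numeric = lognormal_mean_numeric(s0, m, s)
closed_form = s0 * math.exp(m + s * s / 2)
print(numeric, closed_form)
```

The two numbers should agree to many decimal places, since the integrand decays very fast outside the truncated range.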

We may allow any probability space. However, we must make sure that negative values of
S (1) are excluded since these make no sense from the point of view of economics. This means
that the distribution of S (1) has to be supported on [0, +∞) (meaning that P(S (1) ≥ 0) = 1).
The return on the investment $S$ is a random variable $K : \Omega \to \mathbb{R}$, defined as

$$K = \frac{S(1) - S(0)}{S(0)}.$$

By the linearity of mathematical expectation

$$E(K) = \frac{E(S(1)) - S(0)}{S(0)}.$$
We introduce the convention of using the Greek letter µ for expectations of various random
returns
µ = E(K),
with various subscripts indicating the context, if necessary.
The relationships between the prices and returns can be written as
S (1) = S (0)(1 + K),
E(S (1)) = S (0)(1 + µ),
which illustrates the possibility of reversing the approach: given the returns we can find the
prices.
The requirement that $S(1)$ is nonnegative implies that we must have $K \ge -1$. This in
particular excludes the possibility of considering $K$ with Gaussian (normal) distribution.
At time $t = 1$ a dividend may be paid. In practice, after the dividend is paid, the stock
price drops by this amount, which is logical. Thus we have to distinguish between the price
that includes the right to receive the dividend (the cum dividend price) and the price after
the dividend is paid (the ex dividend price). We assume that $S(1)$ denotes the latter, hence
the definition of the return has to be modified to account for dividends:

$$K = \frac{S(1) + \mathrm{Div}(1) - S(0)}{S(0)}.$$

A bond is a special security that pays a certain sum of money, known in advance, at maturity;
this sum is the same in each state. The return on a bond is not random if the bond is
held to maturity. Consider a bond paying a unit of home currency at time $t = 1$, $B(1) = 1$,
purchased for $B(0) < 1$. The return

$$r = \frac{1 - B(0)}{B(0)}$$

defines a risk-free interest rate (provided that the length of the period is one year; otherwise
some technical adjustment is necessary). The bond price can be expressed by means of $r$ as

$$B(0) = \frac{1}{1 + r},$$

giving the present value of a unit at time 1.

1.2 Variance as a risk measure


The concept of risk in finance is captured in many ways. The basic and most widely used one
is concerned with risk as uncertainty of the unknown future value of some quantity in question
(here we are concerned with return). This uncertainty is understood as the scatter around some
reference point. A natural candidate for the reference value is the mathematical expectation
(though some other numbers can be also considered). The extent of scatter is conveniently
measured by the variance. This notion takes care of two aspects of risk:

(i). The distances between possible values and the expectation,

(ii). The probabilities of attaining these values.

Definition 1.3
By (the measure of) risk we mean the variance of the return

$$\mathrm{Var}(K) = E\bigl[(K - \mu)^2\bigr] = E(K^2) - \mu^2$$

or the standard deviation, denoted by

$$\sigma_K = \sqrt{\mathrm{Var}(K)}.$$

The variance of the return can be computed from the variance of $S(1)$,

$$\begin{aligned}
\mathrm{Var}(K) &= \mathrm{Var}\left(\frac{S(1) - S(0)}{S(0)}\right) \\
&= \frac{1}{S(0)^2}\,\mathrm{Var}\bigl(S(1) - S(0)\bigr) \\
&= \frac{1}{S(0)^2}\,\mathrm{Var}\bigl(S(1)\bigr).
\end{aligned}$$

We introduce the convention of using the Greek letter σ for standard deviations of various
random returns

$$\sigma = \sqrt{\mathrm{Var}(K)},$$

with various subscripts indicating the context, if necessary.


Standard deviation alone does not fully capture the risk of an investment. We illustrate this
with a simple example.

Example 1.4
Consider three assets with today’s prices $S_i(0) = 100$ for $i = 1, 2, 3$ and time 1 prices
with the following distributions:

$$S_1(1) = \begin{cases} 120 & \text{with probability } \tfrac12, \\ 90 & \text{with probability } \tfrac12, \end{cases}$$

$$S_2(1) = \begin{cases} 140 & \text{with probability } \tfrac12, \\ 90 & \text{with probability } \tfrac12, \end{cases}$$

$$S_3(1) = \begin{cases} 130 & \text{with probability } \tfrac12, \\ 100 & \text{with probability } \tfrac12. \end{cases}$$

We can see that

$$\sigma_1 = \sqrt{\mathrm{Var}(K_1)} = 0.15, \quad
\sigma_2 = \sqrt{\mathrm{Var}(K_2)} = 0.25, \quad
\sigma_3 = \sqrt{\mathrm{Var}(K_3)} = 0.15.$$

Here $\sigma_2 > \sigma_1$ and $\sigma_3 = \sigma_1$, yet both the second and third assets are
preferable to the first: both have higher expected return than the first (0.15 against 0.05),
while the worst outcomes are the same for the first two assets. We shall return to this in the
next section. (see Excel file: Excel)
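
The figures of Example 1.4 can be checked with a short script. This is a sketch only; `return_stats` is a hypothetical helper, and it works with returns rather than prices, so the standard deviations come out as 0.15, 0.25 and 0.15 (that is, 15%, 25% and 15%).

```python
import math

def return_stats(s0, prices, probs):
    """Expected return and standard deviation of the return K = (S(1) - S(0)) / S(0)."""
    returns = [(s1 - s0) / s0 for s1 in prices]
    mu = sum(k * p for k, p in zip(returns, probs))
    var = sum((k - mu) ** 2 * p for k, p in zip(returns, probs))
    return mu, math.sqrt(var)

# Time-1 prices of the three assets; both scenarios have probability 1/2.
assets = {1: [120, 90], 2: [140, 90], 3: [130, 100]}
stats = {i: return_stats(100, prices, [0.5, 0.5]) for i, prices in assets.items()}
for i, (mu, sigma) in stats.items():
    print(i, round(mu, 4), round(sigma, 4))
```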

When considering the risk of an investment we should take into account both the expected
return and its standard deviation. Given the choice between two securities a rational investor
will, if possible, choose that with the higher expected return and lower standard deviation, that
is, lower risk. This motivates the following definition.
Definition 1.5
We say that a security with expected return µ1 and standard deviation σ1 dominates
another security with expected return µ2 and standard deviation σ2 whenever

µ1 ≥ µ2 and σ1 ≤ σ2 .

The meaning of the word “dominates” is that we assume the investors to be risk averse.
One can imagine an investor whose personal goal is just the excitement. This person will not
pay any attention to return and will prefer higher risk. However, it is not our intention to cover
such individuals by our theory.
The playground for portfolio theory will be the (σ, µ)−plane, in fact the right half-plane
since the standard deviation is non-negative. Each security is represented by a dot on this
plane. This means that we are making a simplification by assuming that the expectation and
variance are all that matters when investment decisions are made.

Figure 1.1 Efficient subset.

We assume that the dominating securities are preferred, which geometrically (geograph-
ically) means that the north-west moves are preferable. This ordering is only partial, since
looking at Figure 1.1 we see for instance that the pairs (σ1 , µ1 ) and (σ2 , µ2 ) are not compara-
ble.
Given a set A of securities, we consider the subset of all maximal elements with respect
to the dominance relation and call it the efficient subset. If the set A is finite, finding the
efficient subsets reduces to eliminating the dominated securities. Figure 1.1 shows the set of
five securities with efficient subset consisting of just three, numbered 1, 2, and 4.

1.3 Semi–variance
Consider the three assets described in Example 1.4. Although $\sigma_1 = \sigma_3$, the third
asset carries no ‘downside risk’, since neither outcome for $S_3(1)$ involves a loss for the
investor. Similarly, although $\sigma_2 > \sigma_1$, the downside risk for the second asset is
the same as that for the first (a 50% chance of incurring a loss of 10), but the expected
return for the second asset is 0.15, making it the more attractive investment even though, as
measured by variance, it is more risky. Since investors regard risk as concerned with failure
(i.e. downside risk), the following modification of variance is sometimes used. It is called
semi-variance and is computed by a formula that takes into account only the unfavourable
outcomes, where the return is below the expected value:

$$E\bigl[(\min\{0, K - \mu\})^2\bigr]. \tag{1.1}$$

The square root of semi-variance is denoted by semi-σ. However, this notion still does not
agree fully with intuition.

Example 1.6

Assume that $\Omega = \{\omega_1, \omega_2\}$, $P(\{\omega_1\}) = P(\{\omega_2\}) = \tfrac12$ and

$$K(\omega_1) = 10\%, \quad K(\omega_2) = 20\%.$$

Consider a modification $K'$ with

$$K'(\omega_1) = 10\%, \quad K'(\omega_2) = 30\%.$$

Then $K'$ is definitely better than $K$, but the semi-variance and the variance for $K'$ are
both higher than for $K$ (see Excel).

If variance or semi-variance are to represent risk, it is illogical that a better version should
be regarded as more risky. This defect can be rectified by replacing the expectation by some
other reference point, for instance the risk free rate $r$, with the following modification of (1.1),

$$E\bigl[(\min\{0, K - r\})^2\bigr],$$

which eliminates the above unwanted feature. Instead of the risk free rate, one can also
consider the cost of capital, i.e. the return required by the investor.
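
The effect described above can be seen numerically for the returns of Example 1.6. The following is a minimal sketch: the function `semi_variance` and the reference rate `r = 0.05` are assumptions made for illustration only.

```python
def semi_variance(returns, probs, reference=None):
    """E[(min{0, K - reference})^2]; the reference defaults to the mean of K."""
    if reference is None:
        reference = sum(k * p for k, p in zip(returns, probs))
    return sum(min(0.0, k - reference) ** 2 * p for k, p in zip(returns, probs))

probs = [0.5, 0.5]
K = [0.10, 0.20]        # returns of Example 1.6
K_mod = [0.10, 0.30]    # the improved version K'
r = 0.05                # hypothetical risk free rate

sv_K, sv_K_mod = semi_variance(K, probs), semi_variance(K_mod, probs)
sv_K_r, sv_K_mod_r = semi_variance(K, probs, r), semi_variance(K_mod, probs, r)
print(sv_K, sv_K_mod, sv_K_r, sv_K_mod_r)
```

With the mean as reference point the better return $K'$ has the larger semi-variance; with the fixed reference $r$ both values are zero here, because neither return ever falls below $r$.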
These versions are not very popular in the financial world, the variance being the basic
measure of risk. This is explained by its simplicity, in particular differentiability, which gives
it the advantage that one can use calculus to solve minimisation problems. In addition, if the
probability distribution of a quantity is Gaussian (normal), the variance and the expectation
determine the whole distribution.
The Gaussian distributions are important due to the central limit theorem, since they
naturally emerge as limits of some discrete cases. In Example 1.2 we have seen a typical
application of Gaussian distributions for stocks with lognormal distribution. For such
distributions it is convenient to use the logarithmic return

$$k = \ln\frac{S(1)}{S(0)},$$

and consider its expectation and standard deviation for portfolio analysis.
In our presentation of portfolio theory we follow the historical tradition and take variance as
the risk measure. It is however possible to develop a version of the theory for some alternative
risk measure. In most cases though such theory does not produce neat analytic formulae as is
the case for mean and variance.

1.4 Portfolios consisting of two assets
We begin the discussion of portfolio risk and expected return in the simplest situation of two
risky securities. We denote the prices of the securities by $S_1$ and $S_2$. We start with a
motivating example.
Example 1.7
Let $\Omega = \{\omega_1, \omega_2\}$, $S_1(0) = 200$, $S_2(0) = 300$. Assume that

$$P(\{\omega_1\}) = P(\{\omega_2\}) = \tfrac12,$$

and that

$$S_1(1, \omega_1) = 260, \quad S_2(1, \omega_1) = 270,$$
$$S_1(1, \omega_2) = 180, \quad S_2(1, \omega_2) = 360.$$

The expected returns and standard deviations for the two assets are

$$\mu_1 = 10\%, \quad \mu_2 = 5\%,$$
$$\sigma_1 = 20\%, \quad \sigma_2 = 15\%.$$

Assume that we spend $V(0) = 500$, buying a single share of stock $S_1$ and a single share
of stock $S_2$. At time 1 we will have

$$V(1, \omega_1) = 260 + 270 = 530,$$
$$V(1, \omega_2) = 180 + 360 = 540.$$

The expected return on the investment is 7% and the standard deviation is just 1%. We
can see that by diversifying the investment into two stocks we have considerably reduced
the risk.
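
The numbers quoted in Example 1.7 can be reproduced as follows. The helper `stats` and the variable names are illustrative assumptions; only the prices and probabilities come from the example.

```python
import math

def stats(returns, probs):
    """Expected value and standard deviation of a finite discrete return."""
    mu = sum(k * p for k, p in zip(returns, probs))
    var = sum((k - mu) ** 2 * p for k, p in zip(returns, probs))
    return mu, math.sqrt(var)

probs = [0.5, 0.5]
s1_0, s2_0 = 200, 300
s1_1 = [260, 180]            # S1(1) in scenarios omega_1, omega_2
s2_1 = [270, 360]            # S2(1) in scenarios omega_1, omega_2

k1 = [(s - s1_0) / s1_0 for s in s1_1]
k2 = [(s - s2_0) / s2_0 for s in s2_1]
v0 = s1_0 + s2_0             # one share of each stock
kv = [(a + b - v0) / v0 for a, b in zip(s1_1, s2_1)]

print(stats(k1, probs), stats(k2, probs), stats(kv, probs))
```

The portfolio standard deviation of 1% is far below that of either stock because the two returns move in opposite directions across the two scenarios.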

From the example we see that the risk can be reduced by diversification. In this section we
discuss how to minimize risk when investing in two stocks.

1.5 Return
Suppose that we buy $x_1$ shares of stock $S_1$ and $x_2$ shares of stock $S_2$. The initial
value of this portfolio is

$$V_{x_1,x_2}(0) = x_1 S_1(0) + x_2 S_2(0).$$

When we design a portfolio, usually its initial value is the starting point of our considerations
and it is given. The decision on the number of shares is secondary and follows from the
decision on the percentage division of our wealth. This can be expressed by means of the
weights defined by

$$w_1 = \frac{x_1 S_1(0)}{V_{x_1,x_2}(0)}, \quad w_2 = \frac{x_2 S_2(0)}{V_{x_1,x_2}(0)}. \tag{1.2}$$

If the initial wealth $V(0)$ and the weights $w_1$, $w_2$, $w_1 + w_2 = 1$, are given, the funds
allocated to the two stocks are $w_1 V(0)$, $w_2 V(0)$, respectively, and the numbers of shares
we receive are

$$x_1 = \frac{w_1 V(0)}{S_1(0)}, \quad x_2 = \frac{w_2 V(0)}{S_2(0)}.$$

At the end of the period the securities prices change, which gives the final value of the
portfolio as a random variable

$$V_{x_1,x_2}(1) = x_1 S_1(1) + x_2 S_2(1).$$

To express the return on a portfolio we shall employ the weights rather than the numbers
of shares since this is more convenient.
The return on the investment in two assets depends on the method of allocation of the
funds (the weights) and the corresponding returns. The vector of weights will be denoted by
$\mathbf{w} = (w_1, w_2)$, or in matrix notation

$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},$$

and the return of the corresponding portfolio by $K_w$.
Proposition 1.8
The return $K_w$ on a portfolio consisting of two securities is the weighted average

$$K_w = w_1 K_1 + w_2 K_2, \tag{1.3}$$

where $w_1$ and $w_2$ are the weights and $K_1$ and $K_2$ the returns on the two components.

Proof
With the numbers of shares computed as above, we have the following formula for the
value of the portfolio

$$\begin{aligned}
V_{x_1,x_2}(1) &= x_1 S_1(1) + x_2 S_2(1) \\
&= \frac{w_1 V_{x_1,x_2}(0)}{S_1(0)}\, S_1(0)(1 + K_1) + \frac{w_2 V_{x_1,x_2}(0)}{S_2(0)}\, S_2(0)(1 + K_2) \\
&= V_{x_1,x_2}(0)\bigl(w_1(1 + K_1) + w_2(1 + K_2)\bigr) \\
&= V_{x_1,x_2}(0)(1 + w_1 K_1 + w_2 K_2),
\end{aligned}$$

hence

$$K_w = \frac{V_{x_1,x_2}(1) - V_{x_1,x_2}(0)}{V_{x_1,x_2}(0)} = w_1 K_1 + w_2 K_2.$$

In reality, the numbers of shares have to be integers. This, however, puts a constraint on
possible weights since not all percentage splits of our wealth can be realized. To simplify
matters we make an assumption that our stock position, that is, the number of shares, can be
any real number.
When the number of shares of given stock is positive, then we say that we have a long
position in the stock. We shall assume that we can also hold a negative number of shares of
stock. This is known as short selling. Short selling is a mechanism in which at time zero we
borrow stock and immediately sell it; we then need to buy it back at time one to return it to
the lender. This mechanism gives us additional money at time zero, that can be invested in a
different security.

Example 1.9
Consider the stocks $S_1$ and $S_2$ from Example 1.7. Suppose that at time zero we have
$V(0) = 600$. Suppose also that at time zero we borrow three shares of stock $S_1$, meaning
that we choose $x_1 = -3$. We sell the three shares of stock, which together with $V(0)$
gives us $3 \cdot 200 + 600 = 1200$ to invest in the second asset. We can thus take $x_2 = 4$.
Note that

$$V_{x_1,x_2}(0) = x_1 S_1(0) + x_2 S_2(0) = 600 = V(0).$$

At time one we have the proceeds from holding four shares of $S_2$, but we need to buy
back the three shares of $S_1$ at its market value. Since

$$V_{x_1,x_2}(1) = x_1 S_1(1) + x_2 S_2(1),$$

we see that

$$V_{x_1,x_2}(1, \omega_1) = -3 \cdot 260 + 4 \cdot 270 = 300,$$
$$V_{x_1,x_2}(1, \omega_2) = -3 \cdot 180 + 4 \cdot 360 = 900.$$

We can compute the weights using (1.2):

$$w_1 = \frac{-3 \cdot 200}{600} = -1, \quad w_2 = \frac{4 \cdot 300}{600} = 2.$$

We see that, as expected, $w_1 + w_2 = 1$.
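
The bookkeeping of Example 1.9, together with the weighted-average formula (1.3), can be sketched in Python. The function name `weights` and the scenario labels are hypothetical; the prices are those of Example 1.7.

```python
def weights(x, s0):
    """Initial portfolio value and weights (1.2) for share numbers x and prices s0."""
    v0 = sum(xi * si for xi, si in zip(x, s0))
    return v0, [xi * si / v0 for xi, si in zip(x, s0)]

x = [-3, 4]                      # short three shares of S1, long four shares of S2
s0 = [200, 300]
scenarios = {"omega1": [260, 270], "omega2": [180, 360]}

v0, w = weights(x, s0)
v1 = {name: sum(xi * si for xi, si in zip(x, s1)) for name, s1 in scenarios.items()}

# Portfolio return computed directly and as the weighted average w1*K1 + w2*K2.
direct = {name: (v - v0) / v0 for name, v in v1.items()}
weighted = {
    name: sum(wi * (s1i - s0i) / s0i for wi, s1i, s0i in zip(w, s1, s0))
    for name, s1 in scenarios.items()
}
print(v0, w, v1, direct, weighted)
```

A negative weight corresponds to the short position; the two ways of computing the return agree in each scenario.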

When short selling is allowed, we assume that the weights can be any real numbers whose
sum is one. In real markets short selling comes with restrictions. To take a short position a
trader usually needs to pay a lending fee or to make a deposit. Throughout the discussion we
make a simplifying assumption that short selling is free of such charges. Since not all real
markets allow short selling, sometimes we distinguish a special case, where all the weights are
non-negative.

1.6 Feasible set


Finding the risk of a portfolio requires, apart from the risks of the components and the weights,
some knowledge about their statistical relationship.
Recall from probability theory the notion of covariance of two random variables $X$, $Y$:

$$\mathrm{Cov}(X, Y) = E\bigl[(X - E(X))(Y - E(Y))\bigr] = E(XY) - E(X)E(Y), \tag{1.4}$$

with $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$ in particular. Let us introduce the following notation

$$\sigma_{ij} = \mathrm{Cov}(K_i, K_j),$$

for $i, j = 1, 2$. In particular,

$$\sigma_{11} = \mathrm{Cov}(K_1, K_1) = \mathrm{Var}(K_1) = \sigma_1^2,$$
$$\sigma_{22} = \mathrm{Cov}(K_2, K_2) = \mathrm{Var}(K_2) = \sigma_2^2.$$

From (1.4) we see that

$$\sigma_{12} = \sigma_{21}.$$

If the returns are independent, then we have $\sigma_{12} = 0$.
For convenience, the so-called correlation coefficient is also introduced

$$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}. \tag{1.5}$$
For this to make sense we have to assume that the variances of both returns are non-zero. A
variance is zero in one case only, namely when the random variable is constant (almost surely).
So we assume that the returns on stocks are genuine, non-constant, random variables, unlike
bonds, where the return is the same in each state (scenario).
Since |σi j | ≤ σi σ j (which is a particular case of the Schwarz inequality), the correlation
coefficient satisfies
−1 ≤ ρi j ≤ 1.
This makes correlation a good coefficient to measure dependence. If the correlation coefficient
is close to one or minus one, then there is a strong influence of one variable on the other. It is
more difficult to make such assertions by looking at covariance only. We can have examples
in which the covariance is a small number and yet the dependence is strong, or the other way
around.
Theorem 1.10
The expected return and the variance of the return on a portfolio are given by

$$\mu_w = E(K_w) = w_1 \mu_1 + w_2 \mu_2, \tag{1.6}$$
$$\sigma_w^2 = \mathrm{Var}(K_w) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12}. \tag{1.7}$$

Proof
Equation (1.6) follows directly from (1.3) and linearity of mathematical expectation:

$$\mu_w = E(K_w) = E(w_1 K_1 + w_2 K_2) = w_1 E(K_1) + w_2 E(K_2).$$

We wish to compute the variance of the return on a portfolio of two stocks:

$$\sigma_w^2 = E(K_w^2) - \mu_w^2.$$

Substituting (1.3) and (1.6), and using (1.4) in the last equality, gives

$$\begin{aligned}
\sigma_w^2 &= E(w_1^2 K_1^2 + w_2^2 K_2^2 + 2 w_1 w_2 K_1 K_2) - w_1^2 \mu_1^2 - w_2^2 \mu_2^2 - 2 w_1 w_2 \mu_1 \mu_2 \\
&= w_1^2 [E(K_1^2) - \mu_1^2] + w_2^2 [E(K_2^2) - \mu_2^2] + 2 w_1 w_2 [E(K_1 K_2) - \mu_1 \mu_2] \\
&= w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12}.
\end{aligned}$$

Figure 1.2 Attainable set.

Corollary 1.11
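Using (1.5) we can rewrite the formula for variance of a portfolio as

$$\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho_{12} \sigma_1 \sigma_2. \tag{1.8}$$

Corollary 1.12
Using the following matrix notation

$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad
\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad
C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$$

equations (1.6)–(1.7) can be written as

$$\mu_w = \mathbf{w}^T \boldsymbol{\mu}, \tag{1.9}$$
$$\sigma_w^2 = \mathbf{w}^T C \mathbf{w}. \tag{1.10}$$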
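
The matrix formulas (1.9) and (1.10) translate directly into code. The sketch below uses plain Python lists; the function names are illustrative. The data are those of Example 1.7, where the covariance $\sigma_{12} = -0.03$ follows from the scenario returns via (1.4), and the weights $(0.4, 0.6)$ correspond to holding one share of each stock.

```python
def portfolio_mean(w, mu):
    """mu_w = w^T mu, formula (1.9)."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def portfolio_variance(w, C):
    """sigma_w^2 = w^T C w, formula (1.10)."""
    n = len(w)
    return sum(w[i] * C[i][j] * w[j] for i in range(n) for j in range(n))

mu = [0.10, 0.05]
C = [[0.04, -0.03],      # sigma_1^2 = 0.2^2,   sigma_12 = -0.03
     [-0.03, 0.0225]]    # sigma_12 = -0.03,    sigma_2^2 = 0.15^2
w = [0.4, 0.6]           # 200/500 and 300/500

mu_w = portfolio_mean(w, mu)
var_w = portfolio_variance(w, C)
print(mu_w, var_w, var_w ** 0.5)
```

The result agrees with the 7% expected return and 1% standard deviation found in Example 1.7.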

The collection of all portfolios that can be manufactured by means of two given assets (in
other words, the feasible, or attainable set) can be conveniently depicted in the (σ, µ)-plane.
Assume that $\sigma_1 \le \sigma_2$ and $\mu_1 \ne \mu_2$ (let $\mu_1 < \mu_2$ for instance). Take
the first weight as a parameter, writing $w = w_1$. Hence $w_2 = 1 - w$, $\mathbf{w} = (w, 1 - w)$,
and the expected return and variance of the portfolio as functions of $w$ have the form

$$\mu_w = w \mu_1 + (1 - w) \mu_2, \tag{1.11}$$
$$\sigma_w^2 = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2 w (1 - w) \rho_{12} \sigma_1 \sigma_2.$$

Figure 1.3 Portfolio lines for various values of $\rho_{12}$.

Figure 1.4 Portfolio line with one asset dominating the other.

The attainable set is therefore a curve parameterized by w. An example of such set is depicted
in Figure 1.2 (see Excel). If short selling is not allowed we restrict our attention to the segment
corresponding to w ∈ [0, 1]. This is the thicker part of the curve in Figure 1.2.
The shape of the line depends on the correlation coefficient ρ12 . This is shown in Figure 1.3
(see Excel). We see that for negative ρ12 we can reduce the risk of the portfolio, at the same
time achieving an expected return between the expected returns of the two risky assets.
Suppose that the position of the two basis securities is such as in Figure 1.4, namely one
dominates the other. Depending on the correlation coefficient the portfolios manufactured may
give the investor extra choice, for instance we may obtain the portfolios whose risk is lower
than the risk of any of the individual assets. This shows that rejecting the dominated security
would be a bad decision.
From (1.11) we see that $\mu_w$ is affine, and $\sigma_w^2$ is a quadratic function with respect
to $w$. Since the graph of the square root of a quadratic function is a hyperbola, one can guess
that the attainable set consisting of all points $(\sigma_w, \mu_w)$ is likely to be a hyperbola
(see Excel).

Figure 1.5 A hyperbola $\frac{(x-h)^2}{a^2} - \frac{(y-k)^2}{b^2} = 1$.


Theorem 1.13
If $\mu_1 \ne \mu_2$ and $\rho_{12} \in (-1, 1)$, then the feasible set is a hyperbola with its
center on the vertical axis.

Proof
For better clarity we change the notation, introducing the letters $x$, $y$ for the coordinates,
so that we have the following description of the feasible set:

$$y = w \mu_1 + (1 - w) \mu_2, \tag{1.12}$$
$$x^2 = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2 w (1 - w) \sigma_{12}. \tag{1.13}$$

The goal of further computations is to convert the above system of equations to the form

$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1, \tag{1.14}$$

from which we will be able to read the properties of the hyperbola (see Figure 1.5).
Solving (1.12) for $w$,

$$w = \frac{y - \mu_2}{\mu_1 - \mu_2}$$

(note the relevance of the assumption $\mu_1 \ne \mu_2$), and inserting into (1.13), we get

$$x^2 = \frac{1}{A}\bigl[(y - \mu_2)^2 \sigma_1^2 + (\mu_1 - y)^2 \sigma_2^2 + 2 (y - \mu_2)(\mu_1 - y) \sigma_{12}\bigr],$$

where $A = (\mu_1 - \mu_2)^2 > 0$. Simple computation gives

$$x^2 = \frac{1}{A}\bigl[B y^2 - 2 C y + D\bigr], \tag{1.15}$$

where

$$\begin{aligned}
B &= \sigma_1^2 + \sigma_2^2 - 2 \sigma_{12}, \\
C &= \sigma_1^2 \mu_2 + \sigma_2^2 \mu_1 - \sigma_{12} (\mu_1 + \mu_2), \\
D &= \sigma_1^2 \mu_2^2 + \sigma_2^2 \mu_1^2 - 2 \sigma_{12} \mu_1 \mu_2.
\end{aligned}$$

Observe that $B > 0$ if $\rho_{12} < 1$, since
$\sigma_1^2 + \sigma_2^2 - 2 \sigma_{12} > \sigma_1^2 + \sigma_2^2 - 2 \sigma_1 \sigma_2 \ge 0$.
Let us observe that

$$\begin{aligned}
B y^2 - 2 C y + D &= B \left( y^2 - 2 y \frac{C}{B} + \frac{D}{B} \right) \\
&= B \left[ \left( y - \frac{C}{B} \right)^2 - \frac{C^2}{B^2} + \frac{D}{B} \right] \\
&= B (y - k)^2 + c
\end{aligned}$$

with $k = \frac{C}{B}$ and $c = \frac{1}{B}\left(BD - C^2\right)$. Substituting into (1.15) gives

$$x^2 = \frac{1}{A}\bigl[B (y - k)^2 + c\bigr],$$

hence

$$\frac{x^2}{c/A} - \frac{(y - k)^2}{c/B} = 1. \tag{1.16}$$

We can see that we have obtained the desired hyperbola equation (1.14), with $h = 0$,
meaning that the center of the hyperbola lies on the vertical axis (see Figure 1.5).
One loose end to tie up is to show that $c \ne 0$. Otherwise we would have a division by
zero in (1.16). A simple but tedious computation shows that

$$BD - C^2 = A \sigma_1^2 \sigma_2^2 (1 - \rho_{12}^2).$$

Since $\rho_{12} \in (-1, 1)$, $B > 0$ and $A > 0$,

$$c = \frac{1}{B}\left(BD - C^2\right) = \frac{A}{B}\, \sigma_1^2 \sigma_2^2 (1 - \rho_{12}^2) > 0.$$
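
The algebra in the proof of Theorem 1.13 can be double-checked numerically: for any weight $w$, the point $(x, y) = (\sigma_w, \mu_w)$ should satisfy (1.16). The following sketch does this for a handful of weights; all parameter values are hypothetical and chosen only so that $\mu_1 \ne \mu_2$ and $\rho_{12} \in (-1, 1)$.

```python
mu1, mu2 = 0.05, 0.10            # hypothetical expected returns
s1, s2, rho = 0.15, 0.25, 0.3    # hypothetical standard deviations and correlation
s12 = rho * s1 * s2

A = (mu1 - mu2) ** 2
B = s1 ** 2 + s2 ** 2 - 2 * s12
C = s1 ** 2 * mu2 + s2 ** 2 * mu1 - s12 * (mu1 + mu2)
D = s1 ** 2 * mu2 ** 2 + s2 ** 2 * mu1 ** 2 - 2 * s12 * mu1 * mu2
k = C / B
c = (B * D - C ** 2) / B

def hyperbola_lhs(w):
    """Left-hand side of (1.16) at the portfolio with weights (w, 1 - w)."""
    y = w * mu1 + (1 - w) * mu2
    x2 = w ** 2 * s1 ** 2 + (1 - w) ** 2 * s2 ** 2 + 2 * w * (1 - w) * s12
    return x2 / (c / A) - (y - k) ** 2 / (c / B)

values = [hyperbola_lhs(w) for w in (-1.0, 0.0, 0.3, 1.0, 2.5)]
print(values)
```

Every entry of `values` should equal 1 up to rounding error, including for weights outside $[0, 1]$, which correspond to short selling.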

We shall return to the above discussion later on, when we work with n assets. It will come
as a surprise that from the point of view of technical difficulties, the general case will
be simpler than the particular situation just worked out, where only two assets are involved. It
will also turn out that the case of many assets reduces to the case of just two, and we will be
able to draw valuable conclusions that remain valid in the general case from the discussion of
the present chapter.
In practice we can reject some of these portfolios drawing on the basic preference property,
namely, given two portfolios with the same risk, the one with higher expected return is
preferable. So we may discard the lower part of the curve, restricting our attention to the
upper part, called the efficient set or frontier, as shown in Figure 1.6. More precisely, a
portfolio is called efficient if there is no other portfolio, except itself, that dominates it.
The set of efficient portfolios among all attainable portfolios is called the efficient frontier.

Figure 1.6 Efficient frontier.

1.7 Special cases


Our first special case is when $\rho_{12} = -1$. From (1.8),

$$\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 - 2 w_1 w_2 \sigma_1 \sigma_2 = (w_1 \sigma_1 - w_2 \sigma_2)^2,$$

hence

$$\sigma_w = |w_1 \sigma_1 - w_2 \sigma_2|.$$

Since $\sigma_w$ is non-negative, the smallest possible value is $\sigma_w = 0$. Taking
$w_1 = w$ and $w_2 = 1 - w$ gives

$$\sigma_w = |w \sigma_1 - (1 - w) \sigma_2|, \tag{1.17}$$

and we can solve for $\sigma_w = 0$, obtaining

$$w = \frac{\sigma_2}{\sigma_1 + \sigma_2}, \quad 1 - w = \frac{\sigma_1}{\sigma_1 + \sigma_2}. \tag{1.18}$$

Since $\sigma_1, \sigma_2 \ge 0$, we can see that $w \in [0, 1]$, hence we can reduce our risk to
zero without short–selling.
From (1.17) and (1.11) one can show that the attainable set consists of two half lines,
emanating from the vertical axis (see Figure 1.7).

Figure 1.7 Attainable set for ρ12 = ±1.

Our second case is $\rho_{12} = 1$. Then

$$\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_1 \sigma_2 = (w_1 \sigma_1 + w_2 \sigma_2)^2,$$

and

$$\sigma_w = |w_1 \sigma_1 + w_2 \sigma_2|.$$

Similarly to the previous case, we obtain $\sigma_w = 0$ for

$$w_1 = \frac{-\sigma_2}{\sigma_1 - \sigma_2}, \quad w_2 = \frac{\sigma_1}{\sigma_1 - \sigma_2}. \tag{1.19}$$

This requires that $\sigma_1 \ne \sigma_2$; we exclude the trivial case $\sigma_1 = \sigma_2$.
Since $\sigma_1, \sigma_2 \ge 0$, either $w$ or $1 - w$ has to be negative, hence we cannot reduce
risk to zero without short–selling. Without short–selling the smallest risk is attained either
at $w = 0$ or at $w = 1$.
Finally, consider a particular case where one of the assets is risk free, σ1 = 0, say. The
return on this asset is sure, µ1 = r and a reasonable assumption is that r < µ2 since otherwise
risk averse investors would never invest in the risky asset, its price should fall and so the
expected return should grow above the risk free level. (The preferences of investors will be
discussed in more detail later.) The return and risk for portfolios take a simplified form
$$\mu_w = w_1 r + w_2 \mu_2,$$
$$\sigma_w^2 = w_2^2 \sigma_2^2,$$

giving

$$\sigma_w = |w_2| \sigma_2,$$

and so the set in the (σ, µ)-plane is as shown in Figure 1.8, see Excel (with redundant lower
part according to the preference relation).
The segment between the risk free asset and the asset characterized by (σ2 , µ2 ) corresponds
to positive weights. The line above (σ2 , µ2 ) requires taking short position in the risk free asset,
in other words, borrowing at the riskless rate (which we assume here to be possible). The
rejected lower segment shows portfolios with a short position in the risky asset.

Figure 1.8 Portfolio line for one risky and one risk–free security.

1.8 Minimum variance portfolio


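We wish to minimize the variance $\sigma_w^2$ or, equivalently, the standard deviation
$\sigma_w$. We start with a theorem where the problem is solved in the case when short–selling
is allowed.
Theorem 1.14
If short selling is allowed, then the portfolio with minimum variance has the weights
$\mathbf{w}_{\min} = (w_1, w_2)$ with

$$w_1 = \frac{a}{a + b}, \quad w_2 = \frac{b}{a + b},$$

where

$$a = \sigma_2^2 - \rho_{12} \sigma_1 \sigma_2,$$
$$b = \sigma_1^2 - \rho_{12} \sigma_1 \sigma_2,$$

unless both $\rho_{12} = 1$ and $\sigma_1 = \sigma_2$.

Proof
When $\rho_{12} = -1$, then from (1.18)

$$w_1 = \frac{\sigma_2}{\sigma_1 + \sigma_2} = \frac{\sigma_2 (\sigma_1 + \sigma_2)}{(\sigma_1 + \sigma_2)^2} = \frac{a}{a + b}.$$

Similarly, for $\rho_{12} = 1$, using (1.19)

$$w_1 = \frac{-\sigma_2}{\sigma_1 - \sigma_2} = \frac{-\sigma_2 (\sigma_1 - \sigma_2)}{(\sigma_1 - \sigma_2)^2} = \frac{a}{a + b}.$$

When $\rho_{12} \in (-1, 1)$,

$$\sigma_w^2 = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2 w (1 - w) \rho_{12} \sigma_1 \sigma_2$$

is a quadratic function of $w$. We compute the derivative of $\sigma_w^2$ with respect to $w$
and equate it to 0:

$$2 w \sigma_1^2 - 2 (1 - w) \sigma_2^2 + 2 (1 - w) \rho_{12} \sigma_1 \sigma_2 - 2 w \rho_{12} \sigma_1 \sigma_2 = 0.$$

Solving for $w$ gives the above result. The second derivative is positive,

$$2 \sigma_1^2 + 2 \sigma_2^2 - 4 \rho_{12} \sigma_1 \sigma_2 > 2 \sigma_1^2 + 2 \sigma_2^2 - 4 \sigma_1 \sigma_2 = 2 (\sigma_1 - \sigma_2)^2 \ge 0,$$

which shows that we have a global minimum.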

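In Corollary 1.12 the return and variance of a given portfolio were stated in terms of the
covariance matrix

$$C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$

for the two assets. We now do the same for the weights of the minimum variance portfolio.
By Cramer’s Rule

$$C^{-1} = \frac{1}{\det C} \begin{bmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{bmatrix},$$

so we have, writing $\mathbf{1} = (1, 1)$,

$$C^{-1} \mathbf{1} = \frac{1}{\det C} \begin{bmatrix} \sigma_2^2 - \sigma_{12} \\ \sigma_1^2 - \sigma_{12} \end{bmatrix},$$

$$\mathbf{1}^T C^{-1} \mathbf{1} = \frac{1}{\det C} (\sigma_1^2 + \sigma_2^2 - 2 \sigma_{12}).$$

Since $\sigma_{12} = \rho_{12} \sigma_1 \sigma_2$, the entries of $C^{-1} \mathbf{1}$ are
$a / \det C$ and $b / \det C$, with $a$ and $b$ as in Theorem 1.14. We have proved the following
Corollary 1.15
The vector $\mathbf{w} = (w_1, w_2)$ of weights of the minimum variance portfolio found in
Theorem 1.14 has the form

$$\mathbf{w} = \frac{C^{-1} \mathbf{1}}{\mathbf{1}^T C^{-1} \mathbf{1}},$$

provided that the denominator is non-zero.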
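
To see that Theorem 1.14 and Corollary 1.15 give the same weights, one can compute both for sample data. The sketch below is written for the two-asset case only; the function names and the values of `s1`, `s2`, `rho` are illustrative assumptions.

```python
def min_variance_weights(s1, s2, rho):
    """Weights a/(a+b), b/(a+b) from Theorem 1.14 (requires a + b != 0)."""
    a = s2 ** 2 - rho * s1 * s2
    b = s1 ** 2 - rho * s1 * s2
    return [a / (a + b), b / (a + b)]

def min_variance_weights_matrix(C):
    """Weights C^{-1} 1 / (1^T C^{-1} 1) from Corollary 1.15 for a 2x2 matrix C."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    u = [(C[1][1] - C[0][1]) / det, (C[0][0] - C[1][0]) / det]   # C^{-1} 1
    total = u[0] + u[1]                                          # 1^T C^{-1} 1
    return [u[0] / total, u[1] / total]

def variance(w, s1, s2, rho):
    """Portfolio variance (1.8) for weights (w, 1 - w)."""
    return w ** 2 * s1 ** 2 + (1 - w) ** 2 * s2 ** 2 + 2 * w * (1 - w) * rho * s1 * s2

s1, s2, rho = 0.20, 0.15, 0.5            # hypothetical inputs
C = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
w_formula = min_variance_weights(s1, s2, rho)
w_matrix = min_variance_weights_matrix(C)
print(w_formula, w_matrix)
```

Moving the first weight slightly in either direction should increase `variance`, in line with the second-derivative argument in the proof.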

We now discuss what happens when short–selling is not allowed. We need to find the
minimum of

$$\sigma_w^2 = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2 w (1 - w) \rho_{12} \sigma_1 \sigma_2$$

for restricted values of the weight $0 \le w \le 1$. Let $w_1$ be the coefficient from Theorem 1.14.
The situation is illustrated in Figure 1.9, where the bold parts correspond to portfolios with no
short selling. We can see that the smallest variance is attained at $\mathbf{w}_{\min} = (w, 1 - w)$ with

$$w = \begin{cases} 0 & \text{if } w_1 < 0, \\ w_1 & \text{if } w_1 \in [0, 1], \\ 1 & \text{if } w_1 > 1. \end{cases}$$
Figure 1.9 Smallest variance with short–selling restrictions.

Hence, if the global minimum is outside [0, 1], an embargo on short-selling means that an
investor wishing to minimise his risk should put all his funds into one of the two assets.
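
The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the numerical inputs are hypothetical and are not taken from the text; the formulas are those of Theorem 1.14 and the clipping to [0, 1] described above.

```python
def min_variance_weight(sigma1, sigma2, rho12, allow_short=True):
    """Weight w of asset 1 in the minimum variance portfolio (w, 1 - w)."""
    a = sigma2 ** 2 - rho12 * sigma1 * sigma2
    b = sigma1 ** 2 - rho12 * sigma1 * sigma2
    if a + b == 0:
        # Excluded case of Theorem 1.14: rho12 = 1 and sigma1 = sigma2.
        raise ValueError("every portfolio has the same variance")
    w1 = a / (a + b)
    if allow_short:
        return w1
    # Without short selling the weight is restricted to [0, 1].
    return min(1.0, max(0.0, w1))


def portfolio_variance(w, sigma1, sigma2, rho12):
    """Variance of the two-asset portfolio (w, 1 - w)."""
    return (w ** 2 * sigma1 ** 2 + (1 - w) ** 2 * sigma2 ** 2
            + 2 * w * (1 - w) * rho12 * sigma1 * sigma2)


# Hypothetical data: strongly correlated assets, so the unrestricted
# minimum requires short selling of the second asset.
unrestricted = min_variance_weight(0.1, 0.3, 0.8)
restricted = min_variance_weight(0.1, 0.3, 0.8, allow_short=False)
print(unrestricted, restricted)
```

With these inputs the unrestricted weight exceeds 1, so the restricted investor holds only the first asset, in line with the remark above.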

1.9 Adding a risk–free security


All portfolios built of the risk free asset (with rate of return r) and any other asset are represented by a straight half-line starting from (0, r) and passing through the corresponding point on the (σ, µ) plane. The new feasible region is thus obtained by taking any point on the attainable set and linking it with the risk free asset, as shown in Figure 1.10. To find the new efficient frontier we seek a line with the highest slope according to the preference relation. Note that it is reasonable to make the following restriction: the risk–free rate is smaller than the expected return of the risk minimizing portfolio. Under this assumption there is a unique portfolio on the efficient frontier, called the market portfolio, such that the line with the highest slope passes through it (see Figure 1.11). This optimal line, called the Capital Market Line, is tangent to the efficient frontier (as follows from the elementary geometric properties of hyperbolas). Denoting the expected return of the market portfolio by µm and its risk by σm, the capital market line is given by
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\,\sigma. \tag{1.20} \]

Figure 1.10 Feasible set after adding a risk–free security.

Theorem 1.16
The weights of the market portfolio are m = (w, 1 − w), with
\[ w = \frac{c}{c+d}, \qquad 1 - w = \frac{d}{c+d}, \tag{1.21} \]
where
\[ c = \sigma_2^2(\mu_1 - r) - \sigma_{12}(\mu_2 - r), \qquad d = \sigma_1^2(\mu_2 - r) - \sigma_{12}(\mu_1 - r). \]

Proof
For a portfolio (w, 1 − w), we denote its expected return by µ(w), and standard deviation by σ(w). Optimization is based on maximizing the slope coefficient:
\[ s(w) = \frac{\mu(w) - r}{\sigma(w)}. \]
To this end it is necessary and sufficient to solve
\[ s'(w) = 0 \]
(sufficiency follows from the uniqueness of the solution). We have
\[ s'(w) = \frac{\mu'(w)\sigma(w) - (\mu(w) - r)\sigma'(w)}{\sigma^2(w)}. \]
Since
\[ \sigma'(w) = \left(\sqrt{\sigma^2(w)}\right)' = \frac{1}{2\sqrt{\sigma^2(w)}}\,(\sigma^2(w))' = \frac{1}{2\sigma(w)}\,(\sigma^2(w))', \]
the equation s'(w) = 0 reduces to
\[ 2\mu'(w)\sigma^2(w) - (\mu(w) - r)(\sigma^2(w))' = 0, \]

Figure 1.11 The minimum variance portfolio MVP, the market portfolio MP, and the capital market line CML.

that is
\[ (\mu_1 - \mu_2)\left(w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\sigma_{12}\right) - \left(w\mu_1 + (1-w)\mu_2 - r\right)\left(w\sigma_1^2 - (1-w)\sigma_2^2 + (1-2w)\sigma_{12}\right) = 0. \]
This is in fact a linear equation in w since all terms involving $w^2$ cancel out. Elementary, but tedious computations give
\[ w = \frac{c}{c+d}, \qquad 1 - w = \frac{d}{c+d}. \]

Corollary 1.17
The formulae (1.21) for the weights of the market portfolio can be written in matrix notation as
\[ \mathbf{m} = \frac{C^{-1}(\boldsymbol{\mu} - r\mathbf{1})}{\mathbf{1}^T C^{-1}(\boldsymbol{\mu} - r\mathbf{1})}, \tag{1.22} \]
where C is the covariance matrix, µ is the vector of expected returns, and 1 is the vector with both coordinates equal to 1.
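
As a numerical illustration, the sketch below evaluates (1.21) and (1.22) for two assets and checks that the resulting portfolio has a larger slope s(w) than nearby portfolios. The function names and the input data are assumptions made for this example only; the 2 × 2 inverse is written out by hand.

```python
from math import sqrt


def market_weights_cd(mu1, mu2, var1, var2, cov12, r):
    """Weights (w, 1 - w) of the market portfolio from formula (1.21)."""
    c = var2 * (mu1 - r) - cov12 * (mu2 - r)
    d = var1 * (mu2 - r) - cov12 * (mu1 - r)
    return c / (c + d), d / (c + d)


def market_weights_matrix(mu1, mu2, var1, var2, cov12, r):
    """The same weights from the matrix formula (1.22)."""
    det_c = var1 * var2 - cov12 ** 2
    # x = C^{-1} (mu - r 1), using the explicit inverse of a 2 x 2 matrix.
    x1 = (var2 * (mu1 - r) - cov12 * (mu2 - r)) / det_c
    x2 = (var1 * (mu2 - r) - cov12 * (mu1 - r)) / det_c
    total = x1 + x2  # 1^T C^{-1} (mu - r 1)
    return x1 / total, x2 / total


def slope(w, mu1, mu2, var1, var2, cov12, r):
    """Slope s(w) of the line joining (0, r) with the portfolio (w, 1 - w)."""
    mu_w = w * mu1 + (1 - w) * mu2
    var_w = w ** 2 * var1 + (1 - w) ** 2 * var2 + 2 * w * (1 - w) * cov12
    return (mu_w - r) / sqrt(var_w)


# Hypothetical data, chosen so that r lies below the expected return
# of the minimum variance portfolio.
data = (0.10, 0.20, 0.01, 0.04, 0.004, 0.05)
w_cd = market_weights_cd(*data)
w_matrix = market_weights_matrix(*data)
print(w_cd, w_matrix, slope(w_cd[0], *data))
```

Both functions return the same weights because the factor 1/det C cancels in the quotient (1.22).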

The following argument shows a possible practical relevance of the market portfolio.
Suppose that the market consists of two securities and suppose that the investors make their
decisions on the basis of the expected returns and the covariance matrix, assuming in addition
that they all use the same numerical values (returns, variances and covariance for the assets).
If they all behave rationally, they perform the above computations and all arrive at the same
market portfolio. They may choose different portfolios on the Capital Market Line, but they all
invest in the two given components in the same proportions. The conclusion is that the weights
of the market portfolio are given by the percentage value of all shares of each asset.
To see this consider an example. Asset A is represented by 1000 shares at 20 dollars each, asset B by 500 shares at 40 dollars each, so each asset represents 50% of the market. If the investors hold these assets in any other proportion, this leads to a contradiction with the fact that they all should have the same portfolio. Should any investor hold more than 50% of asset A, say, this would leave some other investors unsatisfied, since they wish to get more A than is available, and to sell some unwanted B. This would result in excess supply of B and excess demand of A, which would alter the prices, the expected returns and consequently the weights of the market portfolio. For the above argument to be valid we have to assume that the market is in equilibrium.
The assumptions needed to arrive at this conclusion are never fulfilled in real life (for instance the assumption that all investors agree on the values of the parameters of the assets) so the practical relevance of the result is rather limited.
Example 1.18
Assume that the covariance matrix C, the vector of expected returns µ, and the risk free rate r are given. Assume also that an investor wishes to spend V and his aim is to achieve an expected return equal to a given rate m. The question is how much he should spend on the risky assets, and how much he should invest risk free.
First we compute the market portfolio $\mathbf{m}$ using (1.21). We can then compute the expected return of the market portfolio using (1.9)
\[ \mu_m = \mathbf{m}^T\boldsymbol{\mu}. \]
Optimal investments lie on the capital market line. The investor needs to hold a combination of the market portfolio and the risk–free security. We assume that he spends λV on the market portfolio and (1 − λ)V risk free. The desired λ can be computed from the expected return of the position
\[ \lambda\mu_m + (1 - \lambda)r = m, \]
giving
\[ \lambda = \frac{m - r}{\mu_m - r}. \]
Since the investor spends λV on the market portfolio, the vector
\[ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \lambda V\,\mathbf{m} \]
gives us the amount v1 invested in the first asset, and v2 invested in the second asset. As mentioned above, (1 − λ)V is invested risk free.
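
The steps of Example 1.18 can be sketched as follows. The function name `allocate` and all numbers are hypothetical; the market portfolio weights are taken as given here rather than computed from C.

```python
def allocate(V, target, r, market_weights, mu):
    """Split V between the risky assets and the risk-free security.

    Returns (lam, risky_amounts, risk_free_amount), where lam is the
    fraction of V placed in the market portfolio.
    """
    mu_m = sum(w * m for w, m in zip(market_weights, mu))  # mu_m = m^T mu
    if mu_m == r:
        raise ValueError("market portfolio earns the risk-free rate; lam is undefined")
    lam = (target - r) / (mu_m - r)
    risky_amounts = [lam * V * w for w in market_weights]
    return lam, risky_amounts, (1 - lam) * V


# Hypothetical inputs: market portfolio (0.5, 0.5), expected returns 10% and 20%,
# risk-free rate 5%, target return 10%, and V = 1000.
lam, risky, risk_free = allocate(1000.0, 0.10, 0.05, (0.5, 0.5), (0.10, 0.20))
print(lam, risky, risk_free)
```

A value of λ above 1 would correspond to borrowing at the risk-free rate, which this sketch does not forbid.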

Figure 1.12 An indifference curve for (σ1, µ1).

1.10 Indifference curves


The preference relation does not help us choose between two assets, one with higher expected
return and higher risk, the other less risky but with lower return. It seems impossible to extend
the relation to solve this decision problem so that this extension would be accepted by all
investors. The relation is based on risk aversion, but the investors who, as assumed, share this
attitude, may differ in the intensity of their aversion. An investor who is sensitive to risk may
require much higher returns as a compensation for increased exposure. Another investor may
be cornered, forced to accept risk to earn the return needed to fulfill the requirement created
by the circumstances, or may be just less sensitive to risk. It is inevitable that we have to allow
modelling individual preferences. So let us fix our attention on one particular investor, and
fix one particular asset (or portfolio of assets). We assume that this investor can answer the
following question: which assets are equally attractive as the fixed one? The answer provides
us with a certain set of assets. Since the preference relation is valid, the intersection of this
set by any line parallel to any of the axes can contain at most one element. So it is a graph of
an increasing function. We assume in addition that this function is convex for each investor –
in other words, to retain his peace of mind, the investor demands that a unit increase of risk
be offset by more than one unit increase in return, as shown in Figure 1.12 - and we call it an
indifference curve.
We assume that indifference curves are level sets of a function
\[ u : \mathbb{R}^2 \to \mathbb{R}. \]
We assume that a curve {u = c2} lies above {u = c1} for c1 < c2. In other words, the higher the value of u, the higher the investor's satisfaction with the investment. Given a set of attainable portfolios, an investor chooses the one placed on the best indifference curve. It is geometrically obvious as a result of convexity of the curves that the optimal portfolio is at the tangency point for some indifference curve, as shown in Figure 1.13.
For another investor, who is less risk averse, that is, who has less steep indifference curves, the optimal portfolio may be different (see Figure 1.13). It lies further to the right, which agrees with our intuition regarding the risk preferences of this investor.

Figure 1.13 Indifference curves and optimal investment for a risk averse investor (left), and for a risk indifferent investor (right).


Example 1.19
Assume that the covariance matrix C, the vector of expected returns µ, and the risk free rate r are given, and that an investor's indifference curves are given by
\[ u(\sigma, \mu) = \mu - a\sigma^2 - b\sigma. \]
We show how the investor should spend V to maximize u.

Using (1.22), (1.9) and (1.10) we can find the market portfolio m, its expected return µm and variance $\sigma_m^2$. Since the slope of the curve u, which is 2aσ + b, needs to match the slope of the capital market line, the tangency point can be found by solving the system of two linear equations
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\,\sigma, \qquad 2a\sigma + b = \frac{\mu_m - r}{\sigma_m}. \]
This means that
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\cdot\frac{1}{2a}\left(\frac{\mu_m - r}{\sigma_m} - b\right). \]
We can now decide how to divide V amongst the assets using the same method as in Example 1.18.
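
A small numerical sketch of Example 1.19 follows. It assumes a > 0, takes µm and σm as given, and uses hypothetical parameter values; the function names are introduced only for this illustration.

```python
def optimal_point_on_cml(a, b, r, mu_m, sigma_m):
    """Tangency of an indifference curve of u = mu - a*sigma^2 - b*sigma with the CML.

    Returns (sigma, mu, lam), where lam = sigma / sigma_m is the fraction
    of wealth held in the market portfolio. Requires a > 0.
    """
    if a <= 0:
        raise ValueError("a must be positive for the curves to be convex")
    cml_slope = (mu_m - r) / sigma_m
    sigma = (cml_slope - b) / (2 * a)   # from 2*a*sigma + b = cml_slope
    mu = r + cml_slope * sigma          # point on the capital market line
    return sigma, mu, sigma / sigma_m


def utility_on_cml(sigma, a, b, r, mu_m, sigma_m):
    """Value of u along the capital market line, as a function of sigma."""
    mu = r + (mu_m - r) / sigma_m * sigma
    return mu - a * sigma ** 2 - b * sigma


# Hypothetical parameters.
sigma, mu, lam = optimal_point_on_cml(a=2.0, b=0.1, r=0.05, mu_m=0.15, sigma_m=0.2)
print(sigma, mu, lam)
```

The returned λ plays the same role as in Example 1.18. If b exceeds the slope of the capital market line, the computed σ is negative and there is no tangency point with σ ≥ 0; the sketch does not treat that case.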

We shall return to indifference curves in Chapter 3, where we will discuss their relation with utility functions.

2 Many assets

2.1 Lagrange multipliers
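The objective of this section is to show how to solve the following problem
\[ \min f(\mathbf{v}) \quad \text{under constraints:} \quad \mathbf{g}(\mathbf{v}) = 0, \tag{2.1} \]
where
\[ f : \mathbb{R}^n \to \mathbb{R}, \qquad \mathbf{g} : \mathbb{R}^n \to \mathbb{R}^k. \]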

Before writing out a solution, we need to introduce some notations.


To keep better track of dimensions, we use a bold font whenever we are dealing with
vectors, and normal font when dealing with numbers. Note that above we used f for a function
taking values in reals and g for a function

g(v) = (g1 (v), . . . , gk (v))

taking values in Rk .
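For a function $f : \mathbb{R}^n \to \mathbb{R}$ and $\mathbf{v} \in \mathbb{R}^n$ we write $\nabla f(\mathbf{v})$ for the vector
\[ \nabla f(\mathbf{v}) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{v}) \\ \vdots \\ \dfrac{\partial f}{\partial x_n}(\mathbf{v}) \end{bmatrix}. \]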

Theorem 2.1
If v∗ is a solution of the problem (2.1), and the derivative of g at v∗ is a matrix of rank k, then there exists a sequence of numbers λ1, . . . , λk ∈ R such that
\[ \nabla f(\mathbf{v}^*) - \left(\lambda_1\nabla g_1(\mathbf{v}^*) + \ldots + \lambda_k\nabla g_k(\mathbf{v}^*)\right) = 0. \tag{2.2} \]

The λ1, . . . , λk from Theorem 2.1 are referred to as Lagrange multipliers, and the function
\[ L(\mathbf{v}) = \nabla f(\mathbf{v}) - \left(\lambda_1\nabla g_1(\mathbf{v}) + \ldots + \lambda_k\nabla g_k(\mathbf{v})\right) \]
is referred to as the Lagrangian.


We give the proof of Theorem 2.1 later on in the section, after introducing some necessary preliminaries. Before doing so let us comment that Theorem 2.1 provides only a necessary condition. Even when (2.2) holds for some v∗, it does not necessarily imply that v∗ is a minimum. This is very similar in spirit to searching for a minimum of a function f : R → R. The obvious candidate for a minimum is a point x∗ satisfying f'(x∗) = 0. It is of course not enough that f'(x∗) = 0 for x∗ to be a minimum though. Some additional conditions need to be checked. Similarly, Theorem 2.1 is a handy tool for finding a candidate for a solution of problem (2.1). To prove that this candidate is indeed a solution usually requires some additional work.
The proof of Theorem 2.1 will rely on the implicit function theorem, which is a classical result in analysis. We therefore write out the theorem without giving its proof. We first introduce some notation.
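For
\[ \mathbf{g} = (g_1, \ldots, g_k) : \mathbb{R}^l \times \mathbb{R}^m \to \mathbb{R}^k \]
and $(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^l \times \mathbb{R}^m$, $\mathbf{x} = (x_1, \ldots, x_l)$ and $\mathbf{y} = (y_1, \ldots, y_m)$, we write $\frac{\partial \mathbf{g}}{\partial \mathbf{x}}$ and $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}$ for the matrices
\[ \frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial g_1}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial g_1}{\partial x_l}(\mathbf{x}, \mathbf{y}) \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_k}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial g_k}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial g_k}{\partial x_l}(\mathbf{x}, \mathbf{y}) \end{bmatrix}, \]
\[ \frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix} \dfrac{\partial g_1}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial g_1}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial g_1}{\partial y_m}(\mathbf{x}, \mathbf{y}) \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_k}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \dfrac{\partial g_k}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \dfrac{\partial g_k}{\partial y_m}(\mathbf{x}, \mathbf{y}) \end{bmatrix}. \]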
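Theorem 2.2 Implicit function theorem

Consider n > k and a $C^1$ function
\[ \mathbf{g} = (g_1, \ldots, g_k) : \mathbb{R}^{n-k} \times \mathbb{R}^k \to \mathbb{R}^k. \]
Assume that for a point $(\mathbf{x}^*, \mathbf{y}^*) \in \mathbb{R}^{n-k} \times \mathbb{R}^k$
\[ \mathbf{g}(\mathbf{x}^*, \mathbf{y}^*) = 0, \]
and that the matrix $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}^*, \mathbf{y}^*)$ is invertible. Then there exists a neighbourhood $U \times V \subset \mathbb{R}^{n-k} \times \mathbb{R}^k$ of $(\mathbf{x}^*, \mathbf{y}^*)$ and a $C^1$ function
\[ \mathbf{h} : U \to V, \]
such that
\[ \mathbf{g}(\mathbf{x}, \mathbf{h}(\mathbf{x})) = 0 \quad \text{for all } \mathbf{x} \in U. \]
Moreover, for any $\mathbf{v} \in U \times V$, if $\mathbf{g}(\mathbf{v}) = 0$, then $\mathbf{v} = (\mathbf{x}, \mathbf{h}(\mathbf{x}))$ for some $\mathbf{x} \in U$.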

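Corollary 2.3
For the function h from Theorem 2.2
\[ \mathbf{h}'(\mathbf{x}) = -\left(\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{h}(\mathbf{x}))\right)^{-1}\frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{h}(\mathbf{x})). \]

Proof
Since g(x, h(x)) = 0, by computing the derivative with respect to x we obtain
\[ \frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{h}(\mathbf{x})) + \frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{h}(\mathbf{x}))\,\mathbf{h}'(\mathbf{x}) = 0. \]
The claim follows by rearranging so that h'(x) is on the left hand side.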

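We are now ready to prove Theorem 2.1:

Proof of Theorem 2.1
Since the derivative of g at v∗ is a matrix of rank k, there exist k coordinates y such that $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*)$ is invertible. We can always re-number the coordinates so that v = (x, y) with $\mathbf{x} \in \mathbb{R}^{n-k}$ and $\mathbf{y} \in \mathbb{R}^k$.
By the implicit function theorem, we know that there exists a function h such that
\[ \mathbf{g}(\mathbf{x}, \mathbf{h}(\mathbf{x})) = 0. \]
Since v∗ = (x∗, y∗) is a solution of problem (2.1), x∗ is a minimum of f(x, h(x)), meaning that the derivative of f(x, h(x)) with respect to x is zero at x∗. Applying Corollary 2.3, this gives
\[ 0 = \frac{\partial f}{\partial \mathbf{x}}(\mathbf{v}^*) + \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*)\,\mathbf{h}'(\mathbf{x}^*) = \frac{\partial f}{\partial \mathbf{x}}(\mathbf{v}^*) - \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*)\left(\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*)\right)^{-1}\frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{v}^*). \tag{2.3} \]
We define a 1 × k matrix Λ as
\[ \Lambda = \begin{bmatrix} \lambda_1 & \lambda_2 & \ldots & \lambda_k \end{bmatrix} = \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*)\left(\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*)\right)^{-1}. \]
From (2.3) it follows that
\[ \frac{\partial f}{\partial \mathbf{x}}(\mathbf{v}^*) = \Lambda\,\frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{v}^*). \tag{2.4} \]
From the definition of Λ
\[ \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*) = \Lambda\,\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*). \tag{2.5} \]
Conditions (2.4) and (2.5) combined give (2.2).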

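In some cases, the necessary condition (2.2) turns out to be sufficient for v∗ to be a solution of the problem (2.1).
Theorem 2.4
Assume that f(v) is a smooth convex function and that
\[ \mathbf{g}(\mathbf{v}) = A\mathbf{v} - \mathbf{c}, \]
where A is a k × n matrix and $\mathbf{c} \in \mathbb{R}^k$. If there exist a sequence of numbers λ1, . . . , λk ∈ R and a point $\mathbf{v}^* \in \mathbb{R}^n$ such that g(v∗) = 0 and (2.2) is satisfied, then v∗ is a solution of the problem (2.1).

Proof
Let us take any v satisfying g(v) = 0. We need to show that f(v) ≥ f(v∗).
Using the notation λ = (λ1, . . . , λk) we can write
\[ \lambda_1\nabla g_1(\mathbf{v}^*) + \ldots + \lambda_k\nabla g_k(\mathbf{v}^*) = A^T\boldsymbol{\lambda}. \tag{2.6} \]
Let w = v − v∗. Since g(v) = 0 and g(v∗) = 0
\[ 0 = \mathbf{g}(\mathbf{v}) = \mathbf{g}(\mathbf{v}^* + \mathbf{w}) = A\mathbf{v}^* + A\mathbf{w} - \mathbf{c} = \mathbf{g}(\mathbf{v}^*) + A\mathbf{w} = A\mathbf{w}. \tag{2.7} \]
By Taylor's formula
\[ f(\mathbf{v}^* + \mathbf{w}) = f(\mathbf{v}^*) + \nabla f(\mathbf{v}^*)\cdot\mathbf{w} + \tfrac{1}{2}D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) \tag{2.8} \]
for some point $\boldsymbol{\xi} \in \mathbb{R}^n$. Since f is convex
\[ D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) \ge 0 \tag{2.9} \]
for any $\boldsymbol{\xi}, \mathbf{w} \in \mathbb{R}^n$.
We can now compute
\[ \begin{aligned}
f(\mathbf{v}) &= f(\mathbf{v}^* + \mathbf{w}) \\
&= f(\mathbf{v}^*) + \nabla f(\mathbf{v}^*)\cdot\mathbf{w} + \tfrac{1}{2}D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) && \text{(from (2.8))} \\
&= f(\mathbf{v}^*) + A^T\boldsymbol{\lambda}\cdot\mathbf{w} + \tfrac{1}{2}D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) && \text{(from (2.2) and (2.6))} \\
&= f(\mathbf{v}^*) + \left(A^T\boldsymbol{\lambda}\right)^T\mathbf{w} + \tfrac{1}{2}D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) \\
&= f(\mathbf{v}^*) + \boldsymbol{\lambda}^T A\mathbf{w} + \tfrac{1}{2}D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) \\
&= f(\mathbf{v}^*) + \tfrac{1}{2}D^2 f(\boldsymbol{\xi})(\mathbf{w}, \mathbf{w}) && \text{(from (2.7))} \\
&\ge f(\mathbf{v}^*). && \text{(from (2.9))}
\end{aligned} \]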
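
To see Theorem 2.4 at work on a toy problem, the sketch below minimises a convex quadratic under one linear constraint. The objective, the constraint and the variable names are hypothetical choices made for this illustration; the multiplier is found by hand from condition (2.2).

```python
def f(v):
    """Convex objective f(v) = v1^2 + 2 v2^2 (a hypothetical example)."""
    return v[0] ** 2 + 2 * v[1] ** 2


def grad_f(v):
    """Gradient of f."""
    return (2 * v[0], 4 * v[1])


# Constraint g(v) = A v - c with A = [1 1] and c = 1, so grad g = (1, 1).
grad_g = (1.0, 1.0)

# Condition (2.2) reads 2 v1 = lam and 4 v2 = lam; together with
# v1 + v2 = 1 this gives lam = 4/3.
lam = 4.0 / 3.0
v_star = (lam / 2, lam / 4)

residual = [grad_f(v_star)[i] - lam * grad_g[i] for i in range(2)]
constraint = v_star[0] + v_star[1] - 1.0

# Compare with other feasible points (t, 1 - t) on a grid.
feasible_values = [f((k / 100, 1 - k / 100)) for k in range(-100, 201)]
print(v_star, residual, constraint, min(feasible_values) >= f(v_star) - 1e-12)
```

Since f is convex and g is affine, the theorem guarantees that the candidate found from (2.2) is a solution; the grid comparison is only a sanity check.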

2.2 Risk and return


A portfolio constructed from n different securities can be described by means of the vector of weights
\[ \mathbf{w} = (w_1, \ldots, w_n), \]
with the constraint $\sum_{j=1}^{n} w_j = 1$. With the notation 1 for the n-dimensional vector
\[ \mathbf{1} = (1, \ldots, 1), \]
the constraint can be conveniently written as
\[ \mathbf{w}^T\mathbf{1} = 1. \tag{2.10} \]
The attainable set A consists of all possible portfolios with weights w:
\[ A = \left\{\mathbf{w} : \mathbf{w}^T\mathbf{1} = 1\right\}. \]
If short-selling is not possible, a constraint $w_j \ge 0$, for all j, is added throughout. Here, unless stated otherwise, we assume availability of short sales.
Alternatively a portfolio is described by the vector of positions taken in particular components (numbers of units of assets)
\[ \mathbf{x} = (x_1, \ldots, x_n). \]
We have the following relations between the weights, prices and the numbers of shares
\[ w_j = \frac{x_j S_j(0)}{V(0)}, \qquad j = 1, \ldots, n, \]
where $x_j$ is the number of shares of type j in the portfolio, $S_j(0)$ is the initial price of security j, and V(0) is the total money invested.
Denote the random returns on the securities by K1, . . . , Kn, and the vector of expected returns by
\[ \boldsymbol{\mu} = (\mu_1, \ldots, \mu_n), \]
with
\[ \mu_j = E(K_j), \quad \text{for } j = 1, \ldots, n. \]
The covariances between returns will be denoted by $\sigma_{jk} = \mathrm{Cov}(K_j, K_k)$, in particular $\sigma_{jj} = \sigma_j^2 = \mathrm{Var}(K_j)$. These are the entries of the n × n covariance matrix
\[ C = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}. \]
We write as before
\[ K_{\mathbf{w}} = \sum_{j=1}^{n} w_j K_j. \]

Theorem 1.10 can easily be generalised.


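Theorem 2.5
The expected return $\mu_{\mathbf{w}} = E(K_{\mathbf{w}})$ and variance $\sigma_{\mathbf{w}}^2 = \mathrm{Var}(K_{\mathbf{w}})$ of a portfolio with weights w are given by
\[ \mu_{\mathbf{w}} = \mathbf{w}^T\boldsymbol{\mu}, \qquad \sigma_{\mathbf{w}}^2 = \mathbf{w}^T C\mathbf{w}. \]

Proof
The formula for $\mu_{\mathbf{w}}$ follows from the linearity of mathematical expectation,
\[ \mu_{\mathbf{w}} = E(K_{\mathbf{w}}) = E\Big(\sum_{j=1}^{n} w_j K_j\Big) = \sum_{j=1}^{n} w_j\mu_j = \mathbf{w}^T\boldsymbol{\mu}. \]
For $\sigma_{\mathbf{w}}^2$ we use the bilinearity of covariance:
\[ \begin{aligned}
\sigma_{\mathbf{w}}^2 &= \mathrm{Var}(K_{\mathbf{w}}) \\
&= \mathrm{Cov}(K_{\mathbf{w}}, K_{\mathbf{w}}) \\
&= \mathrm{Cov}\Big(\sum_{j=1}^{n} w_j K_j, \sum_{k=1}^{n} w_k K_k\Big) \\
&= \sum_{j,k=1}^{n} w_j w_k\sigma_{jk} && \text{(since } \mathrm{Cov}(K_j, K_k) = \sigma_{jk}\text{)} \\
&= \mathbf{w}^T C\mathbf{w}.
\end{aligned} \]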

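Proposition 2.6
For any two portfolios
\[ \mathbf{w}_A = (w_{A,1}, \ldots, w_{A,n}), \qquad \mathbf{w}_B = (w_{B,1}, \ldots, w_{B,n}), \]
the covariance between the returns is
\[ \mathrm{Cov}(K_{\mathbf{w}_A}, K_{\mathbf{w}_B}) = \mathbf{w}_A^T C\mathbf{w}_B. \]

Proof
Using the bilinearity of covariance we compute
\[ \begin{aligned}
\mathrm{Cov}(K_{\mathbf{w}_A}, K_{\mathbf{w}_B}) &= \mathrm{Cov}\Big(\sum_{j=1}^{n} w_{A,j}K_j, \sum_{k=1}^{n} w_{B,k}K_k\Big) \\
&= \sum_{j,k=1}^{n} w_{A,j}w_{B,k}\sigma_{jk} && \text{(since } \mathrm{Cov}(K_j, K_k) = \sigma_{jk}\text{)} \\
&= \mathbf{w}_A^T C\mathbf{w}_B.
\end{aligned} \]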
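
The formulas of Theorem 2.5 and Proposition 2.6 translate directly into code. The following is a minimal sketch in plain Python; the function names and the three-asset data are hypothetical.

```python
def portfolio_return(w, mu):
    """Expected return w^T mu."""
    return sum(wj * mj for wj, mj in zip(w, mu))


def portfolio_covariance(wa, C, wb):
    """Covariance wa^T C wb between the returns of two portfolios."""
    n = len(C)
    return sum(wa[j] * C[j][k] * wb[k] for j in range(n) for k in range(n))


def portfolio_variance(w, C):
    """Variance w^T C w, the special case wa = wb = w."""
    return portfolio_covariance(w, C, w)


# Hypothetical data for three securities.
mu = [0.1, 0.2, 0.3]
C = [[0.01, 0.002, 0.0],
     [0.002, 0.02, 0.004],
     [0.0, 0.004, 0.04]]
w_a = [0.5, 0.3, 0.2]
w_b = [0.2, 0.2, 0.6]
print(portfolio_return(w_a, mu),
      portfolio_variance(w_a, C),
      portfolio_covariance(w_a, C, w_b))
```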

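Instead of using the covariance matrix as initial data, we can use correlations. Assume that we have a matrix of correlations between returns
\[ \varrho = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix}, \qquad \text{with } \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i\sigma_j}. \]
We also use the notation diag(σ1, . . . , σn) to denote the n × n matrix with σ1, . . . , σn on the diagonal, and zeros in the remaining entries. The following lemma gives a link between C and ϱ.
Lemma 2.7
The covariance matrix is equal to
\[ C = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)\,\varrho\,\mathrm{diag}(\sigma_1, \ldots, \sigma_n). \]

Proof
We first compute
\[ \varrho\,\mathrm{diag}(\sigma_1, \ldots, \sigma_n) = \begin{bmatrix} \sigma_1 & \rho_{12}\sigma_2 & \cdots & \rho_{1n}\sigma_n \\ \rho_{21}\sigma_1 & \sigma_2 & \cdots & \rho_{2n}\sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1}\sigma_1 & \rho_{n2}\sigma_2 & \cdots & \sigma_n \end{bmatrix}. \]
Multiplying by diag(σ1, . . . , σn) on the left hand side we have
\[ \begin{aligned}
\mathrm{diag}(\sigma_1, \ldots, \sigma_n)\,\varrho\,\mathrm{diag}(\sigma_1, \ldots, \sigma_n)
&= \begin{bmatrix} \sigma_1\sigma_1 & \sigma_1\rho_{12}\sigma_2 & \cdots & \sigma_1\rho_{1n}\sigma_n \\ \sigma_2\rho_{21}\sigma_1 & \sigma_2\sigma_2 & \cdots & \sigma_2\rho_{2n}\sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_n\rho_{n1}\sigma_1 & \sigma_n\rho_{n2}\sigma_2 & \cdots & \sigma_n\sigma_n \end{bmatrix} \\
&= C \qquad \text{(since } \sigma_{ij} = \sigma_i\rho_{ij}\sigma_j \text{ and } \rho_{ii} = 1\text{)}.
\end{aligned} \]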
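
Lemma 2.7 is convenient in practice when standard deviations and correlations are the available inputs. A minimal sketch, with hypothetical helper names and made-up data, builds C by the two matrix products used in the proof:

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diag(values):
    """Diagonal matrix diag(values)."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def covariance_from_correlation(sigmas, corr):
    """C = diag(sigma) * corr * diag(sigma), as in Lemma 2.7."""
    D = diag(sigmas)
    return matmul(matmul(D, corr), D)


# Hypothetical standard deviations and correlations.
sigmas = [0.1, 0.2, 0.3]
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, -0.2],
        [0.0, -0.2, 1.0]]
C = covariance_from_correlation(sigmas, corr)
print(C)
```

Each entry of the result equals σi ρij σj, so the diagonal carries the variances.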

2.3 Three risky securities


The purpose of this section is to provide geometric intuition as to the shape of the attainable
set.

Figure 2.1 The plots of µw and σw with respect to w1, w2.

Figure 2.2 The lines µw = m (left) and the curves σw = c (right).

In the case when we have three risky assets, the third weight of a portfolio can be computed from the first two weights
\[ w_3 = 1 - w_2 - w_1, \]
meaning that the attainable set is parameterized by w1 and w2. We can write the formulae for µw and σw with respect to these two parameters as
\[ \begin{aligned}
\mu_{\mathbf{w}} &= w_1\mu_1 + w_2\mu_2 + w_3\mu_3 \\
&= w_1\mu_1 + w_2\mu_2 + (1 - w_1 - w_2)\mu_3,
\end{aligned} \]
and
\[ \begin{aligned}
\sigma_{\mathbf{w}}^2 &= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + w_3^2\sigma_3^2 + 2w_1w_2\sigma_{12} + 2w_1w_3\sigma_{13} + 2w_2w_3\sigma_{23} \\
&= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + (1 - w_2 - w_1)^2\sigma_3^2 + 2w_1w_2\sigma_{12} \\
&\quad + 2w_1(1 - w_2 - w_1)\sigma_{13} + 2w_2(1 - w_2 - w_1)\sigma_{23}.
\end{aligned} \]
The plots of µw and σw are given in Figure 2.1. The lines on the graphs represent the level sets {µw = m} and {σw = c} for several values of m and c.
Since the third weight can be computed from the first two, the attainable set is represented as the (w1, w2) plane, as in Figure 2.2. The vertices of the gray triangle represent investments in single assets. The point (1, 0) represents the first asset, (0, 1) the second asset, and since w3 = 1 − w1 − w2, the point (0, 0) represents the third asset. The gray triangle consists of the points
\[ \{(w_1, w_2) : w_1, w_2 \ge 0,\ w_1 + w_2 \le 1\}, \tag{2.11} \]
and contains portfolios attainable without short–selling. The level sets {µw = m} and {σw = c} from Figure 2.1 can be projected onto the (w1, w2) plane in Figure 2.2. These are the straight lines and ellipses on the graphs in Figure 2.2, respectively. The middle point of the ellipses is the minimum variance portfolio. In this particular figure, since the point lies outside of the triangle, we see that the minimum variance portfolio requires short selling. In Figure 2.2 we also see that if short selling is not allowed, then the smallest attainable σw lies on the ellipse which is tangent to the gray triangle. The minimum variance portfolio without short–selling is the tangency point.

Figure 2.3 The plot of σw together with µw = m.

We now discuss the shape the attainable portfolios take on the (σ, µ) plane. We start with Figure 2.3, where we see the plane corresponding to portfolios with µw = m, together with the plot of σw. We see that there is a single point which has the smallest attainable variance under the constraint µw = m. This is the point at the bottom of the intersection of the plane with the hyperbola. From the plot we also see that for µw = m we can have portfolios with arbitrarily large σ. This leads to the conclusion that on the (σ, µ) plane, the set of portfolios with µw = m is a horizontal half line, which is depicted in Figure 2.4. Intuitively one can think of Figure 2.4 as the leftmost graph from Figure 2.3, rotated clockwise by ninety degrees, and projected onto the plane. Since the plot of σw is a hyperbola, one is led to believe that the boundary of the attainable set on the (σ, µ) plane should also be a hyperbola. This is just a geometric intuition, and is by no means meant as a proof. We shall prove this fact later on.
When short–selling is not allowed, the attainable set is restricted to the set from equation (2.11). In such a case, on the (σ, µ) plane the attainable set takes the shape depicted in Figure 2.5. The three points represent the three assets. A hyperbola passing through any two points represents portfolios involving investments in the two securities corresponding to the points. The fragments of the hyperbolas between two points correspond to the edges of the triangle from Figure 2.2. The attainable set in Figure 2.5 can therefore be interpreted as a distorted and folded projection of the triangle from Figure 2.2.

Figure 2.4 Attainable portfolios.

Figure 2.5 Attainable portfolios with short–selling constraints.

2.4 Minimum variance portfolio


In this section we give the formula for the weights of the portfolio with smallest variance.
Before doing so, we need to consider a technical lemma.
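Lemma 2.8
We have the following formulae for the gradients with respect to w
\[ \nabla\left(\mathbf{w}^T\boldsymbol{\mu}\right) = \boldsymbol{\mu}, \tag{2.12} \]
\[ \nabla\left(\mathbf{w}^T\mathbf{1}\right) = \mathbf{1}, \tag{2.13} \]
\[ \nabla\left(\mathbf{w}^T C\mathbf{w}\right) = 2C\mathbf{w}, \tag{2.14} \]
and the Hessian of $\mathbf{w}^T C\mathbf{w}$ is equal to 2C.

Proof
Since
\[ \frac{\partial}{\partial w_i}\left(\mathbf{w}^T\boldsymbol{\mu}\right) = \frac{\partial}{\partial w_i}(w_1\mu_1 + \ldots + w_n\mu_n) = \mu_i \]
we see that
\[ \nabla\left(\mathbf{w}^T\boldsymbol{\mu}\right) = \begin{bmatrix} \dfrac{\partial}{\partial w_1}\left(\mathbf{w}^T\boldsymbol{\mu}\right) \\ \vdots \\ \dfrac{\partial}{\partial w_n}\left(\mathbf{w}^T\boldsymbol{\mu}\right) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \boldsymbol{\mu}, \]
which proves (2.12).
The proof of (2.13) follows from a mirror argument, using 1 instead of µ.
To prove (2.14) we observe that in
\[ \frac{\partial}{\partial w_i}\left(\mathbf{w}^T C\mathbf{w}\right) = \frac{\partial}{\partial w_i}\sum_{j=1}^{n}\sum_{k=1}^{n} w_j w_k\sigma_{jk} \]
the derivative of each term can be non-zero only when j = i or k = i. This means that
\[ \begin{aligned}
\frac{\partial}{\partial w_i}\sum_{j=1}^{n}\sum_{k=1}^{n} w_j w_k\sigma_{jk}
&= \frac{\partial}{\partial w_i}\Big(w_i w_i\sigma_{ii} + \sum_{j=i}\sum_{k\neq i} w_j w_k\sigma_{jk} + \sum_{j\neq i}\sum_{k=i} w_j w_k\sigma_{jk}\Big) \\
&= 2w_i\sigma_{ii} + \sum_{k\neq i} w_k\sigma_{ik} + \sum_{j\neq i} w_j\sigma_{ji} \\
&= 2\sum_{k=1}^{n} w_k\sigma_{ik} \qquad \text{(since } \sigma_{ji} = \sigma_{ij}\text{)} \\
&= 2(C\mathbf{w})_i,
\end{aligned} \tag{2.15} \]
where $(C\mathbf{w})_i$ stands for the i-th coordinate of the vector Cw. Combining the partial derivatives on all coordinates gives (2.14).
Using (2.15) we can compute
\[ \frac{\partial}{\partial w_l}\frac{\partial}{\partial w_i}\left(\mathbf{w}^T C\mathbf{w}\right) = \frac{\partial}{\partial w_l}\Big(2\sum_{k=1}^{n} w_k\sigma_{ik}\Big) = 2\sigma_{il} = 2\sigma_{li}, \]
hence
\[ \left(\frac{\partial^2}{\partial w_l\,\partial w_i}\left(\mathbf{w}^T C\mathbf{w}\right)\right)_{l,i\le n} = (2\sigma_{li})_{l,i\le n} = 2C, \]
which is the Hessian of $\mathbf{w}^T C\mathbf{w}$.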
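
Formula (2.14) can be checked numerically by comparing 2Cw with central finite differences of the quadratic form. The sketch below uses hypothetical names and data, and relies on C being symmetric, as every covariance matrix is.

```python
def quadratic_form(w, C):
    """Value of w^T C w."""
    n = len(w)
    return sum(w[j] * C[j][k] * w[k] for j in range(n) for k in range(n))


def gradient(w, C):
    """Gradient 2 C w from (2.14); valid for symmetric C."""
    n = len(w)
    return [2 * sum(C[i][k] * w[k] for k in range(n)) for i in range(n)]


def numerical_gradient(w, C, h=1e-6):
    """Central finite differences of w^T C w, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        down = list(w)
        up[i] += h
        down[i] -= h
        grad.append((quadratic_form(up, C) - quadratic_form(down, C)) / (2 * h))
    return grad


# Hypothetical covariance matrix and weights.
C = [[0.01, 0.002, 0.0],
     [0.002, 0.02, 0.004],
     [0.0, 0.004, 0.04]]
w = [0.5, 0.3, 0.2]
print(gradient(w, C), numerical_gradient(w, C))
```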

We are now ready to give the formula for the weights of the minimum variance portfolio.
Theorem 2.9
The portfolio with the smallest variance in the attainable set has weights
\[ \mathbf{w}_{\min} = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}}. \tag{2.16} \]

Proof
We need to find the minimum of $\mathbf{w}^T C\mathbf{w}$ subject to the constraint
\[ \mathbf{w}^T\mathbf{1} = 1. \tag{2.17} \]
To this end we use the method of Lagrange multipliers, taking the Lagrangian
\[ L(\mathbf{w}) = \nabla\left(\mathbf{w}^T C\mathbf{w}\right) - \nabla\left(\lambda(\mathbf{1}^T\mathbf{w} - 1)\right). \]
By (2.13) and (2.14) from Lemma 2.8,
\[ L(\mathbf{w}) = 2C\mathbf{w} - \lambda\mathbf{1} = 0, \]
hence
\[ \mathbf{w} = \frac{\lambda}{2}C^{-1}\mathbf{1}. \tag{2.18} \]
Substituting this into the constraint (2.17), we obtain
\[ 1 = \mathbf{w}^T\mathbf{1} = \mathbf{1}^T\mathbf{w} = \frac{\lambda}{2}\mathbf{1}^T C^{-1}\mathbf{1}. \]
Solving this for λ and substituting the result into (2.18) gives (2.16).
We have shown that (2.16) is the only candidate for a local extremum. From Lemma 2.8 we know that the Hessian of $\mathbf{w}^T C\mathbf{w}$ is 2C, which is positive definite. By Theorem ?? this means that $\mathbf{w}_{\min}$ is a local minimum. Since $\mathbf{w}_{\min}$ is the only local extremum, it needs to be a global minimum.

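The minimum variance portfolio has the surprising property that its covariance with any other portfolio is constant. This property will prove useful later on, when discussing the shape of the attainable set on the (σ, µ) plane.

Figure 2.6 Minimum variance line (MVL).

Corollary 2.10
For any portfolio w
\[ \mathrm{Cov}(K_{\mathbf{w}}, K_{\mathbf{w}_{\min}}) = \sigma_{\mathbf{w}_{\min}}^2. \]

Proof
By Proposition 2.6
\[ \mathrm{Cov}(K_{\mathbf{w}}, K_{\mathbf{w}_{\min}}) = \mathbf{w}^T C\mathbf{w}_{\min} = \mathbf{w}^T C\,\frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}} = \frac{\mathbf{w}^T\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}} = \frac{1}{\mathbf{1}^T C^{-1}\mathbf{1}}. \tag{2.19} \]
The above holds for any portfolio w, hence in particular also for $\mathbf{w} = \mathbf{w}_{\min}$, giving
\[ \sigma_{\mathbf{w}_{\min}}^2 = \mathrm{Var}(K_{\mathbf{w}_{\min}}) = \mathrm{Cov}(K_{\mathbf{w}_{\min}}, K_{\mathbf{w}_{\min}}) = \frac{1}{\mathbf{1}^T C^{-1}\mathbf{1}}. \tag{2.20} \]
Combining (2.19) with (2.20) we obtain our claim.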
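
Formula (2.16) and Corollary 2.10 can be illustrated numerically. The sketch below avoids forming C⁻¹ explicitly and instead solves Cx = 1 with a small Gaussian elimination routine; the helper names and the covariance matrix are hypothetical, and a library solver could be swapped in for `solve`.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def min_variance_portfolio(C):
    """Weights C^{-1} 1 / (1^T C^{-1} 1) from (2.16)."""
    x = solve(C, [1.0] * len(C))   # x = C^{-1} 1
    total = sum(x)                 # 1^T C^{-1} 1
    return [xi / total for xi in x]


def covariance(wa, C, wb):
    """wa^T C wb, as in Proposition 2.6."""
    n = len(C)
    return sum(wa[j] * C[j][k] * wb[k] for j in range(n) for k in range(n))


# Hypothetical covariance matrix.
C = [[0.04, 0.01, 0.0],
     [0.01, 0.09, 0.02],
     [0.0, 0.02, 0.16]]
w_min = min_variance_portfolio(C)
var_min = covariance(w_min, C, w_min)
other = [0.7, -0.2, 0.5]  # any portfolio whose weights sum to 1
print(w_min, var_min, covariance(other, C, w_min))
```

The last two printed numbers agree up to rounding, which is the content of Corollary 2.10.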

2.5 Minimum variance line


To find the efficient frontier, we have to recognise and eliminate the dominated portfolios. To this end we fix a level of expected return, denote it by m, and consider all portfolios with µw = m. All of these are redundant except the one with the smallest variance. The family of such portfolios, parametrised by m, is called the minimum variance line (see Figure 2.6).
More precisely, portfolios on the minimum variance line are solutions of the following problem:
\[ \min \mathbf{w}^T C\mathbf{w} \quad \text{under constraints:} \quad \mathbf{w}^T\boldsymbol{\mu} = m, \quad \mathbf{w}^T\mathbf{1} = 1. \tag{2.21} \]

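Theorem 2.11
Let M be a 2 × 2 matrix of the form
\[ M = \begin{bmatrix} \boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} & \boldsymbol{\mu}^T C^{-1}\mathbf{1} \\ \boldsymbol{\mu}^T C^{-1}\mathbf{1} & \mathbf{1}^T C^{-1}\mathbf{1} \end{bmatrix}. \]
If C and M are invertible, then the solution of problem (2.21) is given by
\[ \mathbf{w} = \frac{1}{\det(M)}\,C^{-1}\left(\det(M_1)\,\boldsymbol{\mu} + \det(M_2)\,\mathbf{1}\right), \tag{2.22} \]
where
\[ M_1 = \begin{bmatrix} m & \boldsymbol{\mu}^T C^{-1}\mathbf{1} \\ 1 & \mathbf{1}^T C^{-1}\mathbf{1} \end{bmatrix}, \qquad M_2 = \begin{bmatrix} \boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} & m \\ \boldsymbol{\mu}^T C^{-1}\mathbf{1} & 1 \end{bmatrix}. \]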

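Proof
We introduce the Lagrange multiplier λ = (λ1, λ2), and the Lagrangian
\[ L(\mathbf{w}) = \nabla\left(\mathbf{w}^T C\mathbf{w}\right) - \nabla\left(\lambda_1\left(\mathbf{w}^T\boldsymbol{\mu} - m\right) + \lambda_2\left(\mathbf{w}^T\mathbf{1} - 1\right)\right). \]
Using Lemma 2.8 we can compute
\[ L(\mathbf{w}) = 2C\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = 0. \]
We solve this system for w
\[ \mathbf{w} = \frac{1}{2}\lambda_1 C^{-1}\boldsymbol{\mu} + \frac{1}{2}\lambda_2 C^{-1}\mathbf{1}. \tag{2.23} \]
Since $\mathbf{w}^T\boldsymbol{\mu} = \boldsymbol{\mu}^T\mathbf{w}$ and $\mathbf{w}^T\mathbf{1} = \mathbf{1}^T\mathbf{w}$, substituting (2.23) into the constraints from (2.21), we obtain a system of linear equations
\[ \begin{aligned}
\frac{1}{2}\lambda_1\boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} + \frac{1}{2}\lambda_2\boldsymbol{\mu}^T C^{-1}\mathbf{1} &= m, \\
\frac{1}{2}\lambda_1\mathbf{1}^T C^{-1}\boldsymbol{\mu} + \frac{1}{2}\lambda_2\mathbf{1}^T C^{-1}\mathbf{1} &= 1.
\end{aligned} \]
We can solve the above system for λ1 and λ2 to obtain (note the relevance of the assumption that M is invertible, which ensures that det(M) ≠ 0)
\[ \frac{1}{2}\lambda_1 = \frac{\det(M_1)}{\det(M)}, \qquad \frac{1}{2}\lambda_2 = \frac{\det(M_2)}{\det(M)}. \]
Substituting the above back into (2.23) gives (2.22).
We have found a candidate for the solution of (2.21). By Lemma 2.8 we know that the Hessian of $\mathbf{w}^T C\mathbf{w}$ is equal to 2C, which is a positive definite matrix. This ensures that we have found a global minimum.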

Consider three uncorrelated assets with
\[ \sigma_1^2 = 0.01, \quad \sigma_2^2 = 0.02, \quad \sigma_3^2 = 0.04, \qquad \mu_1 = 0.1, \quad \mu_2 = 0.2, \quad \mu_3 = 0.3. \]
Using (2.22) compute the portfolio which solves the problem (2.21) for m = 0.25.
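
One way to carry out this computation is sketched below. Because the assets are uncorrelated, C is diagonal and its inverse is simply the reciprocal of each variance; the function name is hypothetical, and the sketch covers only this diagonal case.

```python
def min_variance_line_weights(m, mu, variances):
    """Weights from (2.22) for uncorrelated assets, where C = diag(variances)."""
    cinv_mu = [mj / vj for mj, vj in zip(mu, variances)]   # C^{-1} mu
    cinv_one = [1.0 / vj for vj in variances]              # C^{-1} 1
    mu_c_mu = sum(mj * x for mj, x in zip(mu, cinv_mu))    # mu^T C^{-1} mu
    mu_c_one = sum(cinv_mu)                                # mu^T C^{-1} 1
    one_c_one = sum(cinv_one)                              # 1^T C^{-1} 1
    det_m = mu_c_mu * one_c_one - mu_c_one ** 2
    if det_m == 0:
        raise ValueError("M is not invertible")
    det_m1 = m * one_c_one - mu_c_one
    det_m2 = mu_c_mu - m * mu_c_one
    return [(det_m1 * x + det_m2 * y) / det_m for x, y in zip(cinv_mu, cinv_one)]


variances = [0.01, 0.02, 0.04]
mu = [0.1, 0.2, 0.3]
w = min_variance_line_weights(0.25, mu, variances)
print(w)
```

For these data det(M) = 162.5, det(M1) = 16.25 and det(M2) = −1.625, so the weights come out as approximately (0, 0.5, 0.5); they sum to 1 and give the expected return 0.25.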
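The formula (2.22) is long and somewhat cumbersome to apply. Our aim will be to simplify it. The first step towards this end is to notice that all portfolios on the minimum variance line can be expressed as an affine function of two fixed vectors.
Corollary 2.12
There exist two vectors a and b, which depend only on C and µ, such that a solution of the problem (2.21) is
\[ \mathbf{w} = m\mathbf{a} + \mathbf{b}. \]

Proof
Since
\[ \det(M_1) = m\,\mathbf{1}^T C^{-1}\mathbf{1} - \boldsymbol{\mu}^T C^{-1}\mathbf{1}, \qquad \det(M_2) = \boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} - m\,\boldsymbol{\mu}^T C^{-1}\mathbf{1}, \]
from (2.22) we see that w = ma + b for
\[ \begin{aligned}
\mathbf{a} &= \frac{1}{\det(M)}\,C^{-1}\left(\left(\mathbf{1}^T C^{-1}\mathbf{1}\right)\boldsymbol{\mu} - \left(\boldsymbol{\mu}^T C^{-1}\mathbf{1}\right)\mathbf{1}\right), \\
\mathbf{b} &= \frac{1}{\det(M)}\,C^{-1}\left(\left(\boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu}\right)\mathbf{1} - \left(\boldsymbol{\mu}^T C^{-1}\mathbf{1}\right)\boldsymbol{\mu}\right).
\end{aligned} \]

The efficient frontier, which is the set of all portfolios not dominated by any other portfolio, consists of w = ma + b for $m \ge \mu_{\mathbf{w}_{\min}}$ (see Figure 2.7 and Excel).

Figure 2.7 Efficient frontier, together with the minimum variance portfolio (MVP).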

We will now show that the whole minimum variance line can be computed from just two portfolios.
Corollary 2.13
Suppose that w1 and w2 are two portfolios on the minimum variance line with different expected returns: $\mu_{\mathbf{w}_1} \neq \mu_{\mathbf{w}_2}$. Then any portfolio w on the minimum variance line can be obtained from these two, that is, there is an α such that w = αw1 + (1 − α)w2.

Proof
We first find α so that
\[ \mu_{\mathbf{w}} = \alpha\mu_{\mathbf{w}_1} + (1 - \alpha)\mu_{\mathbf{w}_2}. \]
This is possible since the returns are different:
\[ \alpha = \frac{\mu_{\mathbf{w}} - \mu_{\mathbf{w}_2}}{\mu_{\mathbf{w}_1} - \mu_{\mathbf{w}_2}}. \]
Since the two portfolios lie on the minimum variance line, they satisfy
\[ \mathbf{w}_1 = \mu_{\mathbf{w}_1}\mathbf{a} + \mathbf{b}, \qquad \mathbf{w}_2 = \mu_{\mathbf{w}_2}\mathbf{a} + \mathbf{b}. \]
From these relations we have
\[ \alpha\mathbf{w}_1 + (1 - \alpha)\mathbf{w}_2 = \left(\alpha\mu_{\mathbf{w}_1} + (1 - \alpha)\mu_{\mathbf{w}_2}\right)\mathbf{a} + \mathbf{b} = \mu_{\mathbf{w}}\mathbf{a} + \mathbf{b}, \]
but w is also on the minimum variance line, so $\mathbf{w} = \mu_{\mathbf{w}}\mathbf{a} + \mathbf{b}$, hence the result.

The minimum variance portfolio $\mathbf{w}_{\min}$ lies on the minimum variance line (see Excel). We therefore already have a simple formula (2.16) for one of the two portfolios needed to obtain the minimum variance line. The second portfolio will be the market portfolio. We give the formula for the market portfolio in the next section. The resulting parameterization of the minimum variance line is then written out in equation (2.27).

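An important observation follows from Corollary 2.13.
Theorem 2.14
Suppose that there exist two portfolios w1 and w2 on the minimum variance line with different expected returns: $\mu_{\mathbf{w}_1} \neq \mu_{\mathbf{w}_2}$. Then the minimum variance line is a hyperbola with a centre on the vertical axis.

Proof
Let $K_{\mathbf{w}_1}$ and $K_{\mathbf{w}_2}$ be the returns on portfolios w1 and w2, respectively. From Corollary 2.13 we know that any portfolio on the minimum variance line can be expressed as
\[ \mathbf{w} = \alpha\mathbf{w}_1 + (1 - \alpha)\mathbf{w}_2, \]
hence its return is equal to
\[ K_{\mathbf{w}} = \alpha K_{\mathbf{w}_1} + (1 - \alpha)K_{\mathbf{w}_2}. \]
We can treat each of the two portfolios as if it were a single security. Applying the results from Chapter 1 for portfolios consisting of two securities, we know that
\[ \begin{aligned}
\mu_{\mathbf{w}} &= \alpha\mu_{\mathbf{w}_1} + (1 - \alpha)\mu_{\mathbf{w}_2}, \\
\sigma_{\mathbf{w}}^2 &= \alpha^2\sigma_{\mathbf{w}_1}^2 + (1 - \alpha)^2\sigma_{\mathbf{w}_2}^2 + 2\alpha(1 - \alpha)\,\mathrm{Cov}\left(K_{\mathbf{w}_1}, K_{\mathbf{w}_2}\right).
\end{aligned} \]
Since $\mu_{\mathbf{w}_1} \neq \mu_{\mathbf{w}_2}$, by Theorem 1.13 the curve $(\sigma_{\mathbf{w}}, \mu_{\mathbf{w}})$ is a hyperbola.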

2.6 Market portfolio


Recall that the market portfolio is the optimal portfolio on the efficient frontier, taking into account the existence of a risk–free asset. The line connecting the market portfolio with the risk free asset is tangent to the minimum variance line and has the maximal slope among the lines determined by all portfolios (Figure 2.8 and Excel).
Above we found the formula for the market portfolio in the case of two risky securities determining the efficient set. This result is of course applicable to the general situation in view of Corollary 2.13. However, we derive the formula again, this time using the parameters of all n securities.

Figure 2.8 Minimum variance portfolio MVP, the market portfolio MP, and the capital market line CML.

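Theorem 2.15
If the risk free rate is smaller than the expected return of the minimum variance portfolio, then the market portfolio exists and is given by
\[ \mathbf{m} = \frac{C^{-1}(\boldsymbol{\mu} - r\mathbf{1})}{\mathbf{1}^T C^{-1}(\boldsymbol{\mu} - r\mathbf{1})}. \tag{2.24} \]

Proof
From Theorem 2.14 we know that the minimum variance line is a hyperbola. Since its centre is on the vertical axis, there exists a single tangency point for a half line emanating from (0, r), which maximizes the slope (see Figure 2.8). The slope in question is of the form
\[ \frac{\mu_{\mathbf{w}} - r}{\sigma_{\mathbf{w}}} = \frac{\mathbf{w}^T\boldsymbol{\mu} - r}{\sqrt{\mathbf{w}^T C\mathbf{w}}}, \]
where w are the weights of a portfolio and r is the risk free rate of return. At the maximal slope the Lagrangian
\[ L(\mathbf{w}) = \nabla\left(\frac{\mathbf{w}^T\boldsymbol{\mu} - r}{\sqrt{\mathbf{w}^T C\mathbf{w}}}\right) - \lambda\nabla\left(\mathbf{w}^T\mathbf{1} - 1\right) \]
needs to be equal to zero. We can compute the gradients using Lemma 2.8 and equate them to zero:
\[ L(\mathbf{w}) = \frac{\boldsymbol{\mu}\sqrt{\mathbf{w}^T C\mathbf{w}} - \left(\mathbf{w}^T\boldsymbol{\mu} - r\right)\dfrac{1}{2\sqrt{\mathbf{w}^T C\mathbf{w}}}\,2C\mathbf{w}}{\mathbf{w}^T C\mathbf{w}} - \lambda\mathbf{1} = 0. \]
Multiplying by $\sigma_{\mathbf{w}}^2 = \mathbf{w}^T C\mathbf{w}$, this yields
\[ \boldsymbol{\mu}\sigma_{\mathbf{w}} - (\mu_{\mathbf{w}} - r)\frac{C\mathbf{w}}{\sigma_{\mathbf{w}}} - \lambda\sigma_{\mathbf{w}}^2\mathbf{1} = 0, \]
hence
\[ \frac{\mu_{\mathbf{w}} - r}{\sigma_{\mathbf{w}}^2}\,C\mathbf{w} = \boldsymbol{\mu} - \lambda\sigma_{\mathbf{w}}\mathbf{1}. \]
Multiplying by $\mathbf{w}^T$ on the left and using the fact that $\mathbf{w}^T\mathbf{1} = 1$ we get
\[ \frac{\mu_{\mathbf{w}} - r}{\sigma_{\mathbf{w}}^2}\,\mathbf{w}^T C\mathbf{w} = \mu_{\mathbf{w}} - \lambda\sigma_{\mathbf{w}}, \]
so
\[ \lambda = \frac{r}{\sigma_{\mathbf{w}}}, \]
therefore we have the equation
\[ \gamma C\mathbf{w} = \boldsymbol{\mu} - r\mathbf{1}, \]
where $\gamma = \frac{\mu_{\mathbf{w}} - r}{\sigma_{\mathbf{w}}^2}$, so that
\[ \gamma\mathbf{w} = C^{-1}(\boldsymbol{\mu} - r\mathbf{1}). \]
Even though we have a w in the formula for γ, the γ turns out to be a constant. This follows from multiplying the above equation by $\mathbf{1}^T$ on the left, which gives
\[ \gamma = \mathbf{1}^T C^{-1}(\boldsymbol{\mu} - r\mathbf{1}). \tag{2.25} \]
Substituting (2.25) into the equation $\gamma\mathbf{w} = C^{-1}(\boldsymbol{\mu} - r\mathbf{1})$ we obtain our claim.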
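
Formula (2.24) can be tried out numerically as follows. This is a sketch under stated assumptions: the helper `solve` is a plain Gaussian elimination written for the example (a library solver could replace it), and the covariance matrix, expected returns and risk-free rate are hypothetical.

```python
from math import sqrt


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def market_portfolio(C, mu, r):
    """Weights C^{-1}(mu - r 1) / (1^T C^{-1}(mu - r 1)) from (2.24)."""
    excess = [mj - r for mj in mu]
    x = solve(C, excess)   # x = C^{-1} (mu - r 1)
    gamma = sum(x)         # gamma = 1^T C^{-1} (mu - r 1), see (2.25)
    if gamma <= 0:
        # For positive definite C this means r is not below the expected
        # return of the minimum variance portfolio.
        raise ValueError("assumption of Theorem 2.15 is not satisfied")
    return [xi / gamma for xi in x]


def slope(w, C, mu, r):
    """Slope (mu_w - r) / sigma_w of the line from (0, r) through portfolio w."""
    n = len(w)
    mu_w = sum(w[j] * mu[j] for j in range(n))
    var_w = sum(w[j] * C[j][k] * w[k] for j in range(n) for k in range(n))
    return (mu_w - r) / sqrt(var_w)


# Hypothetical data.
C = [[0.04, 0.01, 0.0],
     [0.01, 0.09, 0.02],
     [0.0, 0.02, 0.16]]
mu = [0.08, 0.12, 0.15]
r = 0.03
m = market_portfolio(C, mu, r)
print(m, slope(m, C, mu, r))
```

Shifting a small amount of weight from one asset to another should not increase the slope, which gives a quick sanity check of the tangency property.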

The line joining the risk–free security represented by (0, r) and the market portfolio with coordinates (σm, µm) satisfies the equation
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\,\sigma. \tag{2.26} \]
It is called the Capital Market Line, CML in brief. For a portfolio on the CML with risk σ the term $\frac{\mu_m - r}{\sigma_m}\sigma$ is called the risk premium, which is the additional return above the risk–free level as a reward or compensation for exposure to risk.
If each investor has the same view on the values of the model parameters (the expected
returns on the basic assets and the entries of the covariance matrix) and if each investor chooses
an optimal portfolio according to convex indifference curves on the basis of risk-return
analysis, then all these optimal portfolios lie on the CML. Consequently, all investors should
invest in just one risky portfolio, namely the market portfolio (combining it with the risk free
asset in a preferred individual way). It follows that the market portfolio weights should
represent the relative values of particular shares with respect to the whole market
(by an argument similar to that in chapter two, where we discussed a baby market with just two
ingredients). Such a portfolio is represented in reality by an index. An empirical test of the practical
relevance of the whole theory is therefore to see whether the market index lies on the efficient frontier
and whether it determines the tangent line when combined with the risk free asset.
We now return to the discussion on the shape of the minimum variance line. From
Corollary 2.13 we know that the minimum variance line can be constructed using wmin and m.
By Corollary 2.13, $\mathrm{Cov}(K_{w_{\min}}, K_m) = \sigma_{w_{\min}}^2$, which gives the following parametrization of all
(σw, µw) on the minimum variance line:
\[ \begin{aligned} \mu_w &= \alpha\mu_{w_{\min}} + (1-\alpha)\mu_m, \\ \sigma_w^2 &= \alpha^2\sigma_{w_{\min}}^2 + (1-\alpha)^2\sigma_m^2 + 2\alpha(1-\alpha)\sigma_{w_{\min}}^2. \end{aligned} \tag{2.27} \]

Figure 2.9 Efficient frontier in the case of different rates for investing and borrowing risk free.

The µwmin , σwmin , µm and σm are easy to compute, due to the simplicity of the expressions for
wmin and m. This makes (2.27) a handy tool for making plots of the minimum variance line
(see Excel).
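
As an alternative to a spreadsheet, the parametrization (2.27) can be tabulated in a few lines. The sketch below is only an illustration: the function name `min_variance_line` and the four input values are hypothetical, and the resulting list of points would still have to be passed to a plotting tool.

```python
from math import sqrt

def min_variance_line(mu_min, sigma_min, mu_m, sigma_m, alphas):
    """Points (sigma_w, mu_w) given by (2.27), using Cov(K_wmin, K_m) = sigma_min^2."""
    points = []
    for a in alphas:
        mu_w = a * mu_min + (1 - a) * mu_m
        var_w = (a ** 2 * sigma_min ** 2
                 + (1 - a) ** 2 * sigma_m ** 2
                 + 2 * a * (1 - a) * sigma_min ** 2)
        points.append((sqrt(var_w), mu_w))
    return points

# Hypothetical parameters of the minimum variance and market portfolios.
mu_min, sigma_min = 0.06, 0.15
mu_m, sigma_m = 0.10, 0.20

alphas = [k / 10 for k in range(-10, 21)]  # alpha from -1.0 to 2.0
points = min_variance_line(mu_min, sigma_min, mu_m, sigma_m, alphas)
for sigma_w, mu_w in points[::5]:
    print(f"sigma = {sigma_w:.4f}, mu = {mu_w:.4f}")
```

Here α = 1 reproduces the minimum variance portfolio and α = 0 the market portfolio; the variance in (2.27) simplifies to $\sigma_{w_{\min}}^2 + (1-\alpha)^2(\sigma_m^2 - \sigma_{w_{\min}}^2)$, so no point has smaller risk than the minimum variance portfolio.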
We finish the section by considering a situation where we have different interest rates for
risk free borrowing and investing. This is a more realistic setting than assuming that we have
a single interest rate r.
Assume that we can invest risk free at a rate r1 and borrow at a rate r2 . It is natural to
consider r1 < r2 . Any portfolio w invested in the risky securities can be combined with a risk
free investment at the rate r1 . This gives the following portfolios on the (σ, µ)-plane:

\[ \mu_\alpha = \alpha r_1 + (1-\alpha)\mu_w, \qquad \sigma_\alpha = |1-\alpha|\,\sigma_w, \qquad \text{for } \alpha \ge 0. \]

Note that we cannot take α < 0, since this implies a short position in the risk free investment,
which would mean borrowing at r1.
We can also combine any portfolio w with borrowing at r2, giving

\[ \mu_\alpha = \alpha r_2 + (1-\alpha)\mu_w, \qquad \sigma_\alpha = (1-\alpha)\sigma_w, \qquad \text{for } \alpha \le 0. \]

We cannot take α > 0, since this would mean investing at r2, which is not allowed. We can
only borrow at this rate.
To find the efficient frontier we first establish two tangency portfolios m1 and m2 , for the
half lines starting from (0, r1 ) and (0, r2 ), respectively. The m1 and m2 can be computed using
(2.24) taking r1 and r2 instead of r, respectively. The frontier is depicted in Figure 2.9 and
consists of the segment from (0, r1) to (σm1, µm1), the fragment of the minimum variance
line between (σm1, µm1) and (σm2, µm2), together with the half line starting from (σm2, µm2)
that continues the line through (0, r2) and (σm2, µm2).

2.7 CAPM
Paradoxically, in a world where decisions are based on risk expressed as variance, if we
look at an asset to assess its risk, its variance turns out to be less relevant than its covariance with
all the other assets. This claim will be illustrated by an example, supported by economic
considerations, and finally proved.
We begin with a brief discussion of market equilibrium. The goal of an investment is
to earn a suitable return. The question is what the word
‘suitable’ means. We take a simplistic view that the level of this return, called the required return,
depends on the risk associated with a particular asset. If risk is high, the required rate of return
is also high. We leave for a moment the question of how to relate these notions in a quantitative
way; an answer will be given later in this chapter. The required return is compared with the
expected return (we assume that investors have already estimated it). If the required return
is lower than the expected, the investor decides to buy the asset. As a result of the emerging
demand the price grows, which pushes the expected return down. If on the other hand the
required return is higher than expected, the prices should drop thus leading to an increase of
the expected return (as the very definition of return shows). In equilibrium the expected return
should be equal to the required return.
Consider now a conjecture that the required return is related to variance as a measure of
risk. Consider two assets with the same expected return but different variances and assume that
the market is in equilibrium. We shall demonstrate that a contradiction emerges which leads to
rejecting this conjecture. Indeed, the asset with higher variance should have a higher required
return, and in equilibrium its expected return should also be higher, matching the required
return. Hence the situation described is impossible, that is, the required return cannot be
related to variance.
We support this observation with a simple example.

Example 2.16

Suppose that the weights of a portfolio are of the form $w_j = \frac{1}{n}$, where n is the number
of the assets. We shall investigate the risk of this portfolio depending on n. Assume
that the variances of all assets on the market (suppose there are countably many) are
uniformly bounded, $\sigma_j^2 \le L$. Then
\[ \sigma_w^2 = \sum_{j,k} w_jw_k\sigma_{jk} = \sum_j w_j^2\sigma_j^2 + \sum_{j\ne k} w_jw_k\sigma_{jk} \le \frac{1}{n^2}nL + \frac{1}{n^2}\sum_{j\ne k}\sigma_{jk}. \]

Assume further that the off-diagonal elements of the covariance matrix are all equal,
$\sigma_{jk} = c > 0$, say. Then
\[ \sigma_w^2 \le \frac{L}{n} + \frac{1}{n^2}n(n-1)c. \]
The upper bound converges to c as n → ∞. Hence the risk of a portfolio containing many
assets is determined by the covariances since the variances of the ingredients become
irrelevant for large n.
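
The convergence in Example 2.16 is easy to see numerically. The sketch below is an illustration under simplifying assumptions that go slightly beyond the example: every asset is given the same variance L (so the bound is attained), and the values of L and c as well as the function name `equal_weight_variance` are hypothetical.

```python
def equal_weight_variance(n, var, c):
    """Variance of the portfolio w_j = 1/n when every asset has variance `var`
    and every pair of distinct assets has covariance `c`."""
    w = 1.0 / n
    diagonal = n * w * w * var                # sum_j w_j^2 sigma_j^2 = var / n
    off_diagonal = n * (n - 1) * w * w * c    # sum over j != k of w_j w_k sigma_jk
    return diagonal + off_diagonal

# Hypothetical values: common variance L and common covariance c.
L, c = 0.09, 0.02
variances = {n: equal_weight_variance(n, L, c) for n in (1, 10, 100, 10000)}
for n, v in variances.items():
    print(f"n = {n:5d}: portfolio variance = {v:.6f}")
```

The contribution L/n of the individual variances vanishes as n grows, while the covariance part approaches c, which is the undiversifiable remainder.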

This example motivates the following distinction between two kinds of risk: diversifiable,
or specific risk, which can be reduced to zero by expanding the portfolio, and undiversifiable,
systematic, or market risk which cannot be avoided because the securities are to some extent
linked to the market. We will return to this distinction later on in this chapter.
Let us now recall that the assumption that the market is in equilibrium combined with the
assumption that all investors have homogeneous views on all parameters (expected returns and
the entries of the covariance matrix) leads to the conclusion that they all invest in a single
portfolio, called the market portfolio (as shown in the previous chapters). As a consequence,
each investor has the same proportion of each asset, hence the weights of the market portfolio
should be equal to the relative value of each asset as compared with the whole market:
\[ m_j = \frac{\text{market value of asset } j}{\text{market value of all assets}}. \]
To illustrate this idea consider a pizza with various ingredients arranged in rings. Each slice
of this pizza contains each ingredient in equal proportion, though the sizes of slices may be
different, Figure 2.10.
If the slices were not proper, some guests would be deprived of a fair share of some
ingredients, which might create some tension. This tension on a market leads to demand for some
assets and consequently to price movements contradicting the state of equilibrium, which is
our basic assumption here.

Figure 2.10 The proportional components in slices (ingredients: olives, tomatoes, ham).

Figure 2.11 A contradiction when there is no tangency at the market portfolio.

The expected return on the market portfolio is one of the ingredients of a simple but
effective model, which is the topic of this chapter. The Capital Asset Pricing Model (CAPM in brief)
distinguishes some parameters influencing the return on an asset and describes this influence
by means of a simple formula.

2.8 Derivation of CAPM


As we know, the Capital Market Line is tangent to the efficient frontier at a certain point M
corresponding to a portfolio, which we call the market portfolio and whose weights we denote by
m, so M = (σm, µm). Consider any other asset (it can be a portfolio) represented by a point
A in the (σ, µ)-plane, A = (σA, µA). Consider all portfolios built by means of M and A. They
form a hyperbola which we claim to be tangent to the CML at M. Suppose to the contrary
that this hyperbola crosses the CML at M. Then some of these portfolios lie above the CML,
which contradicts the fact that the slope of the CML is maximal, see Figure 2.11.
We compute the slope of the tangent to the hyperbola at M and then we will use the fact that
the slope of the CML is the same. Denote the weights of A and M in a portfolio on the hyperbola
by (x, 1 − x). The risk and return are of the form
\[ \mu_x = x\mu_A + (1-x)\mu_m, \]
\[ \sigma_x = \left(x^2\sigma_A^2 + (1-x)^2\sigma_m^2 + 2x(1-x)\mathrm{Cov}(K_A, K_m)\right)^{1/2}, \]
and we compute the derivatives with respect to x at x = 0 to get
\[ \frac{\partial\mu_x}{\partial x}\Big|_{x=0} = \mu_A - \mu_m, \qquad \frac{\partial\sigma_x}{\partial x}\Big|_{x=0} = \frac{\mathrm{Cov}(K_A, K_m) - \sigma_m^2}{\sigma_m}. \]
The slope of the tangent is the ratio of these derivatives and we equate it to the slope of the CML:
\[ \frac{\mu_A - \mu_m}{\dfrac{\mathrm{Cov}(K_A, K_m) - \sigma_m^2}{\sigma_m}} = \frac{\mu_m - r}{\sigma_m}. \]
Solving for µA we get
\[ \mu_A = r + \frac{\mathrm{Cov}(K_A, K_m)}{\sigma_m^2}(\mu_m - r). \]
We introduce the famous notation:
Definition 2.17
We call
\[ \beta_A = \frac{\mathrm{Cov}(K_A, K_m)}{\sigma_m^2} \]
the beta factor of the given asset (portfolio).

We have thus proved the following theorem.


Theorem 2.18 CAPM
Suppose that the risk free rate r is lower than the expected return of the minimum variance
portfolio (so that the market portfolio m exists). Then the expected return µA on any
asset is given by the formula
\[ \mu_A = r + \beta_A(\mu_m - r). \]

The second term is called the risk premium. It represents the additional return required by
an investor who faces a risk represented by the link of the portfolio to the whole market.
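
The CAPM formula can be verified on a small numerical example. The sketch below uses hypothetical expected returns, covariances and risk free rate (none of them from the text), and an illustrative helper `solve`; it builds the market portfolio from (2.24), computes the beta of each security and compares r + β(µm − r) with the assumed expected returns.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(n):
            if row != col:
                f = M[row][col] / M[col][col]
                M[row] = [x - f * y for x, y in zip(M[row], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical data: three risky securities and a risk free rate.
mu = [0.08, 0.10, 0.14]
C = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
r = 0.03
n = len(mu)

# Market portfolio from (2.24).
v = solve(C, [mu_j - r for mu_j in mu])
m = [v_j / sum(v) for v_j in v]

# Cov(K_j, K_m) is the j-th entry of C m; sigma_m^2 = m^T C m.
Cm = [sum(C[j][k] * m[k] for k in range(n)) for j in range(n)]
var_m = sum(m[j] * Cm[j] for j in range(n))
mu_m = sum(m[j] * mu[j] for j in range(n))

betas = [Cm[j] / var_m for j in range(n)]
capm = [r + beta * (mu_m - r) for beta in betas]
for j in range(n):
    print(f"asset {j + 1}: beta = {betas[j]:.4f}, "
          f"CAPM return = {capm[j]:.4f}, assumed return = {mu[j]:.4f}")
```

The two columns of returns agree up to rounding, and the weighted average of the betas with the weights m equals 1, the beta of the market portfolio itself.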
The CAPM formula is concerned with expectations. Our next step is to consider the
returns, that is, the random variables. Define the error as
\[ e_w = K_w - r - \beta_w(K_m - r), \]
and of course we have
\[ E(e_w) = 0. \]
It is interesting to see that the principle of error minimising implies the form of the beta
coefficient:

Proposition 2.19
Suppose that E(e) = 0 where $e = K - r - \beta(K_m - r)$ for some number β. The variance
of e is minimal for $\beta = \frac{\mathrm{Cov}(K, K_m)}{\sigma_m^2}$.

Proof
We can easily compute the variance in question using the basic properties of variance
and covariance. Namely, Var(X) = Var(X + a) for any random variable X and constant a,
and the covariance is linear with respect to each argument, with
Cov(X, Y) = Cov(X + a, Y + b) because Cov(a, Y) = Cov(X, b) = Cov(a, b) = 0 for
any random variables X, Y and constants a, b. Hence
\[ \mathrm{Var}(e) = \mathrm{Var}(K - r - \beta(K_m - r)) = \mathrm{Var}(K - \beta K_m) = \beta^2\sigma_m^2 - 2\beta\,\mathrm{Cov}(K, K_m) + \gamma, \]
where γ = Var(K) does not contain β. Differentiating with respect to β and equating the
derivative to zero yields
\[ 0 = 2\beta\sigma_m^2 - 2\,\mathrm{Cov}(K, K_m), \]
which gives the claimed value of β. Since the coefficient of β² is positive, this is indeed
a minimum.

We rewrite the definition of the error
\[ K_w = r + \beta_w(K_m - r) + e_w \tag{2.28} \]
to use this formula to compute the variance of the portfolio with weights w. First we find the
covariance between $e_w$ and $K_m$:
\[ \mathrm{Cov}(K_m, e_w) = \mathrm{Cov}(K_m, K_w - r - \beta_w(K_m - r)) = \mathrm{Cov}(K_m, K_w) - \beta_w\mathrm{Cov}(K_m, K_m) = 0 \]
by the definition of beta. Next, since the two random terms in (2.28) are uncorrelated,
\[ \mathrm{Var}(K_w) = \beta_w^2\mathrm{Var}(K_m - r) + \mathrm{Var}(e_w), \]
so, using the translation invariance of the variance,
\[ \sigma_w^2 = \beta_w^2\sigma_m^2 + \mathrm{Var}(e_w). \]
This formula sheds better light on the above distinction between two kinds of risk. The first
term represents the systematic risk that cannot be avoided by adding more securities to the
portfolio and it is measured by the beta coefficient. The second term is the diversifiable part of
the risk. If w = m, then we have $\mathrm{Var}(e_m) = 0$ so this term can be discarded if we invest in the
market portfolio or in a portfolio sufficiently large to serve in practice as its substitute.
CAPM shows the link between βw and the expected return µw and, consequently, the prices
of securities. An increase of the systematic risk increases the required return and pushes down
the prices. Diversifiable risk attracts no premium having no effect on µw since it can be elim-
inated by spreading an investment in a suitable portfolio, in particular, choosing the market
portfolio which is well approximated by some stock exchange indices.
The above relation between the returns (see (2.28)) and the connection to the minimising
of the variance of the error lead to the following method of finding the beta from historical
data. Beta can be recognised as the result of applying linear regression (see Excel). If
the realized past returns on a security are plotted against the realized returns on the market
portfolio, the line of best fit, also known as the characteristic line, can be found. We can write
the equation of the line obtained in the form

y = βx + α.

If the historical data are consistent with the expectations for the future returns (these are in-
volved in the CAPM formula), then the β obtained from linear regression can be used to find
the expected return on the security by means of CAPM.
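
The regression step can be sketched without a spreadsheet. In the example below the two return series are hypothetical numbers invented for illustration, and the helper names are not from the text. The slope of the least squares line is the sample covariance divided by the sample variance of the market returns, and the sample variance of the security then splits into a systematic and a specific part, mirroring σ²w = β²w σ²m + Var(ew).

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (divisor n - 1); cov(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def characteristic_line(market, security):
    """Least squares fit security = beta * market + alpha."""
    beta = cov(market, security) / cov(market, market)
    alpha = mean(security) - beta * mean(market)
    return beta, alpha

# Hypothetical realized returns over six periods.
market = [0.02, -0.01, 0.03, 0.015, -0.02, 0.04]
security = [0.03, -0.02, 0.035, 0.01, -0.03, 0.06]

beta, alpha = characteristic_line(market, security)
residuals = [s - (beta * x + alpha) for x, s in zip(market, security)]

systematic = beta ** 2 * cov(market, market)   # beta^2 * sigma_m^2
specific = cov(residuals, residuals)           # variance of the error term
print(f"beta = {beta:.4f}, alpha = {alpha:.4f}")
print(f"total variance = {cov(security, security):.6f}")
print(f"systematic + specific = {systematic + specific:.6f}")
```

Subtracting a constant risk free rate from both series would change α but not the slope β, because covariance and variance are translation invariant.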

2.9 Security Market Line


Drawing on the results established before, in Chapter 4, we can give an alternative proof of the
CAPM formula. Consider an arbitrary portfolio with weights w. The vector of weights in the
market portfolio will be denoted by m and as we know it satisfies

γCm = µ − r1
for some number γ > 0. The beta of the portfolio with weights w can, therefore, be written as

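\[ \beta_w = \frac{\mathrm{Cov}(K_w, K_m)}{\sigma_m^2} = \frac{w^TCm}{m^TCm} = \frac{w^T\frac{1}{\gamma}(\mu - r\mathbf{1})}{m^T\frac{1}{\gamma}(\mu - r\mathbf{1})} = \frac{\mu_w - r}{\mu_m - r}. \]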

Solving this for µw we obtain the CAPM relation again:

µw = r + βw (µm − r).
The expected return is an affine function of the beta coefficient. The graph of this function
in the (β, µ)-plane is called the security market line. This straight line is shown in Figure 2.12
where the CML is also plotted for comparison.

Figure 2.12 The capital market line (in the (σ, µ)-plane) and the security market line (in the (β, µ)-plane, with βm = 1).

Going back to the discussion of the market equilibrium let us see the theoretical conse-
quences of possible departure from this state. In the state of equilibrium everyone is holding a
fraction of the market portfolio, the prices determine the expected returns which exactly match
the required returns given by the right hand sides of CAPM applied for each security. Any new
information about a particular security may affect the expected return and the CAPM may no
longer hold. Suppose, for example, that

µA > r + βA (µm − r)

for some security. In this case investors place buy orders and as a result of the demand created
the price goes up, which pushes the expected return down. On the other hand, if

µA < r + βA (µm − r)

investors want to sell or even short-sell the security, the price falls because of the excess supply,
and the expected return increases. In both cases we should observe some adjustments
restoring the CAPM formula and the equilibrium.
Apart from illustrating the market equilibrium, CAPM has applications in analysing the
performance of various investments. The right hand side of CAPM gives the target return and
this is compared with the realized return. The difference, realized return minus target return,
is called the Jensen index. The goal is to achieve a positive value of this index, the higher the
better.
Another approach to the evaluation of performance comes from comparing the market prices
of risk, where the market price of risk is by definition the excess return per unit of risk:
\[ \mathrm{MPR}_w = \frac{\mu_w - r}{\sigma_w}. \]
The Sharpe index or Sharpe Ratio is obtained if the expected returns are replaced by the
realized returns and the standard deviation by the sample standard deviation. The benchmark is
the market price of risk for the market portfolio, in other words the slope of the CML:
\[ \mathrm{MPR}_m = \frac{\mu_m - r}{\sigma_m}. \]
The goal of an investor is to maximize the Sharpe index. Apart from the evaluation of the
performance, the above measures can be used to construct portfolios (on the basis of historical
data).
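
Both performance measures can be computed directly from return series. The following sketch is an illustration only: the data are hypothetical, the function names are introduced here, and the average realized return is used in place of the expected return, as described above.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (divisor n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sharpe_ratio(returns, r):
    """Average excess return divided by the sample standard deviation."""
    return (mean(returns) - r) / sqrt(cov(returns, returns))

def jensen_index(returns, market, r):
    """Average realized return minus the CAPM target r + beta (mu_m - r)."""
    beta = cov(returns, market) / cov(market, market)
    target = r + beta * (mean(market) - r)
    return mean(returns) - target

# Hypothetical realized returns per period and a per-period risk free rate.
market = [0.02, -0.01, 0.03, 0.015, -0.02, 0.04]
fund = [0.03, -0.02, 0.035, 0.01, -0.03, 0.06]
r = 0.002

print("Sharpe ratio of the fund:  ", sharpe_ratio(fund, r))
print("Sharpe ratio of the market:", sharpe_ratio(market, r))
print("Jensen index of the fund:  ", jensen_index(fund, market, r))
```

Applied to the market series itself, the Jensen index is zero, since the market has beta equal to 1 and its target return is its own average return.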
Remark 2.20
The Capital Asset Pricing Model is also called a single factor model. This single factor is
beta showing the dependence of required return on risk related to the whole market. This
theory has been generalised to a multi-factor version. The idea is similar and in addition
to the market we take some other economic quantities, like inflation, growth of the
national product, exchange rates for key foreign currencies, markets in other countries.
The formula for the expected return takes the form

µw = α + β1 F1 + β2 F2 + · · · + βm Fm

where β j are constants and F j represent the numerical values of the factors chosen.
Linear regression can be applied and the beta parameters can be estimated on the basis
of historical data. This theory is called Arbitrage Pricing Theory (APT in brief).

3 Utility functions
3.1 Basic notions and axioms
We begin by recalling some basic probability notation, slightly modified for our needs. The
probability space is discrete, Ω = {1, . . . , N}, and the elements i ∈ Ω are called states. The prices
of securities are denoted by $S_j(0)$, the initial prices, and $S_j(1, i)$, the prices at the end of the
period, which depend on the state. Portfolios will be described by the numbers $x_j$ of securities
held, so a portfolio is represented by a vector $x = (x_0, x_1, \ldots, x_n)$. The initial wealth of the investor is
denoted by W so the formation of a portfolio is subject to the bound
\[ \sum_{j=0}^n x_jS_j(0) \le W. \]

The final wealth is a random variable determined by the portfolio chosen, and we denote it by
$V_x(1)$. In the state i it takes the value
\[ V_x(1, i) = \sum_{j=0}^n x_jS_j(1, i) = (Sx)_i, \]

where $S = [S_{ij}] = [S_j(1, i)]$. This amount can be consumed by the investor so this motivates
the name feasible consumption set for the set
\[ \mathrm{FCS} = \Big\{X : X(i) \ge 0 \text{ for all } i,\ X = V_x(1) \text{ where } \sum_{j=0}^n x_jS_j(0) \le W\Big\}. \]

We assume that the matrix S represents a monomorphism. This means in particular that the
number of rows, i.e. the cardinality N of Ω, is at least the number n of columns, i.e. the
number of assets. This also means that the rank of the matrix is maximal, i.e. n.

Remark 3.1
The following version of this setup is often considered in the literature. An investor
(often called an agent) receives some known endowment e(0) at the beginning and a
random one e(1) at the end of the period, consumes c(0) at the beginning (a known
quantity) and c(1, i) in state i (a random variable). Then the initial constraint is of the
form
\[ \sum_{j=0}^n x_jS_j(0) \le e(0) - c(0), \]

meaning that the money available for investment is what is left after initial consumption.
The end of period consumption is the main object now and the feasible consumption
plans are considered:
\[ \mathrm{FCP} = \Big\{c(1) : c(1) \ge 0,\ c(1) \le e(1) + \sum_{j=0}^n x_jS_j(1),\ \sum_{j=0}^n x_jS_j(0) \le e(0) - c(0)\Big\}. \]

Our setting is a particular case with W = e(0) − c(0), e(1) = 0, and the assumption that the
whole final value of the portfolio is consumed.

We assume that the investor can decide between any two possible final consumptions. So
we assume that a binary relation on FCS is given: for X, Y ∈ FCS we write X ⪯ Y to mean
that Y is (weakly) preferred to X, and X ∼ Y if the investor is indifferent between X and Y. We also
write X ≺ Y if X ⪯ Y and X ≁ Y. The following axioms are formulated to describe the rational
behaviour of investors.

Axiom 1 (transitivity) If X ⪯ Y and Y ⪯ Z then X ⪯ Z.

This axiom is sometimes called the consistency axiom since it excludes irrational
preferences.

Axiom 2 (completeness) For all X, Y either X ⪯ Y or Y ⪯ X.

Here we assume that each individual can always arrive at a decision.

If Axioms 1 and 2 are satisfied, we call ⪯ a preference relation. In practice this relation
may be difficult to specify. An alternative approach is based on employing so-called utility.
Consider a real function $U : \mathbb{R}^N \to \mathbb{R}$ representing the preferences of an investor who wishes to solve
the problem

max{U(X) : X ∈ FCS }.
Usually, we assume that

(i). U is strictly increasing with respect to each variable,

(ii). U is differentiable,

(iii). U is strictly concave.

Such a function determines the preference relation:

X ⪯ Y if and only if U(X) ≤ U(Y).

The question is whether any preference relation can be represented by a utility. The answer is
positive in the finite case.
The situation is different in the general case, where this representation may be impossible
(unless some additional technical assumptions are made).
A particular case of utility is the expected utility determined by means of some u : R → R
called the utility function, and the formula

U(X) = E(u(X)).
If such a u exists we say that U is a von Neumann-Morgenstern utility. The crucial feature
of this representation of utility is that it is done by means of a single variable function. If X
is non-random (corresponding to a portfolio involving risk-free assets only) then X(i) = c for
some constant c and then U(c) = u(c). The following assumptions are usually imposed on
utility functions (analogous to those introduced for utilities):

(i). u is strictly increasing,

(ii). u is differentiable,

(iii). u is strictly concave and as a result the first derivative u′, called the marginal utility, is
decreasing.

Typical examples of utility functions are as follows:

(i). Exponential: $u(x) = -e^{-ax}$;

(ii). Logarithmic: $u(x) = \ln x$;

(iii). Power: $u(x) = ax^a$ for a ≤ 1;

(iv). Quadratic: $u(x) = x - \frac{1}{2}bx^2$ (which is increasing only for $x < \frac{1}{b}$).

The computation of the expected utility involves only the probability distribution $P_X$ of X,
not the random variable itself since
\[ E(u(X)) = \int u(x)\,dP_X(x). \]

For this reason the preference relation concerned with the expected utility is often discussed
on the set P of all probability measures on R. Of course the relation given on random variables
induces the relation on distributions: if X ⪯ Y then we say that $P_X ⪯ P_Y$. The preference
relation on P is assumed to satisfy the following Axioms.

Axiom 1 (transitivity) If P1 ⪯ P2, P2 ⪯ P3 then P1 ⪯ P3.

Axiom 2 (completeness) For all P1, P2 ∈ P either P1 ⪯ P2 or P2 ⪯ P1.

Axiom 3 (independence) For all P1, P2, P3 ∈ P and a ∈ (0, 1], if P1 ≺ P2 then
aP1 + (1 − a)P3 ≺ aP2 + (1 − a)P3.

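In other words, the choice between P1 and P2 is not affected by the appearance of some
other opportunity. However, this axiom is not consistent with empirical facts. We present the
so-called Allais paradox. Given the choice between $P_1 = \delta_{1M}$ (1 million for sure; δ is the Dirac
delta measure, $\delta_a(A) = 1$ if and only if a ∈ A, so $\delta_a(\{a\}) = 1$) and
$P_2 = 0.1 \times \delta_{5M} + 0.89 \times \delta_{1M} + 0.01 \times \delta_0$, most people choose P1. They also prefer
$P_3 = 0.1 \times \delta_{5M} + 0.9 \times \delta_0$ to $P_4 = 0.11 \times \delta_{1M} + 0.89 \times \delta_0$.
This, however, contradicts the independence axiom: with $Q = \frac{10}{11}\delta_{5M} + \frac{1}{11}\delta_0$ we have
$P_1 = 0.11\,\delta_{1M} + 0.89\,\delta_{1M}$, $P_2 = 0.11\,Q + 0.89\,\delta_{1M}$, $P_4 = 0.11\,\delta_{1M} + 0.89\,\delta_0$ and
$P_3 = 0.11\,Q + 0.89\,\delta_0$, so both choices should be decided by the comparison of $\delta_{1M}$ with Q alone.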
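
The conflict with expected utility can be made concrete. In the sketch below the lotteries are those of the paradox, with P4 taken to put the remaining probability 0.89 on 0 (an assumption needed to make it a probability measure); the three utility functions and all names are chosen only for illustration. For any utility u the two differences E(u(P1)) − E(u(P2)) and E(u(P4)) − E(u(P3)) coincide, so an expected utility maximiser who prefers P1 to P2 must prefer P4 to P3.

```python
from math import sqrt, exp, log

def expected_utility(lottery, u):
    """Expected utility of a finite lottery given as (payoff, probability) pairs."""
    return sum(p * u(x) for x, p in lottery)

M = 1_000_000
P1 = [(M, 1.0)]
P2 = [(5 * M, 0.10), (M, 0.89), (0, 0.01)]
P3 = [(5 * M, 0.10), (0, 0.90)]
P4 = [(M, 0.11), (0, 0.89)]   # remaining mass 0.89 at 0 is an assumption

# Illustrative increasing utility functions (not from the text).
utilities = {
    "square root": sqrt,
    "exponential": lambda x: -exp(-x / M),
    "shifted logarithm": lambda x: log(1 + x),
}

gaps = {}
for name, u in utilities.items():
    gap_12 = expected_utility(P1, u) - expected_utility(P2, u)
    gap_43 = expected_utility(P4, u) - expected_utility(P3, u)
    gaps[name] = (gap_12, gap_43)
    print(f"{name}: EU(P1) - EU(P2) = {gap_12:.6f}, EU(P4) - EU(P3) = {gap_43:.6f}")
```

Both differences equal 0.11 u(1M) − 0.1 u(5M) − 0.01 u(0), which is why the commonly observed pair of choices (P1 over P2 and P3 over P4) cannot be produced by any expected utility.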
Axiom 4 (Archimedean) For all P1 ≺ P2 ≺ P3 there exists a ∈ (0, 1) such that aP1 + (1 −
a)P3 ≺ P2 and there exists b ∈ (0, 1) such that P2 ≺ bP1 + (1 − b)P3.

In other words, a ‘good’ plan P3 is never ultimately good since combined with bad P1
becomes worse than P2 . Similarly, there is no ultimately ‘bad’ plan. In financial terms this
axiom is acceptable though with some limitations: if ‘bad’ means ultimate bankruptcy, no
sweetener (even high probability of huge wealth) may be acceptable to some individuals. The
name ‘Archimedean’ comes from a well-known property of ordered fields: for all positive x, y
there is an integer n so that nx > y (despite x being possibly much smaller than y).
One can prove that Axioms 1-4 are equivalent to the existence of a von Neumann-Morgenstern
utility function in the case of measures supported on a finite set. Some additional conditions
allow an extension of this result to a general case. We shall not dwell on that.

3.2 Utility maximisation


We say that a portfolio y = (y1 , . . . , yn ) is an arbitrage opportunity if Vy (0) ≤ 0 and Vy (1) ≥ 0
with Vy (1, i) > 0 for at least one i ∈ Ω = {1, . . . , N}. A fundamental assumption of finance
theory is that arbitrage opportunities do not exist. We shall see how this principle is related to
the existence of a solution to the problem of maximising utility.

Theorem 3.2
Assume that the rank of S is n. If there is a solution to the problem

max{U(X) : X ∈ FCS },

where U satisfies the above conditions, then there is no arbitrage. Conversely, if U is


continuous and there is no arbitrage, then the problem has a solution.

Proof
Suppose there is an x such that $U(X) \le U(V_x(1))$ for any feasible consumption X and
suppose that there exists an arbitrage opportunity y. Take z = x + y. At the initial time
$V_z(0) = V_x(0) + V_y(0) \le V_x(0)$ so z is feasible. Since U is strictly increasing in each variable,
$U(V_z(1)) = U(V_x(1) + V_y(1)) > U(V_x(1))$, which is a contradiction.
For the converse, it is sufficient to see that the set of feasible consumptions is bounded
since it is obviously closed (being defined by weak inequalities). To this end it is
sufficient to show that the set $A \subset \mathbb{R}^n$ of all portfolios x such that $V_x(1) \in \mathrm{FCS}$ is bounded.
Suppose to the contrary that there is a sequence $x_n \in A$ with $\|x_n\|$ unbounded. We may
assume that $\|x_n\| \to \infty$, taking a subsequence if necessary. The sequence $z_n = \frac{x_n}{\|x_n\|}$ is
bounded and has a subsequence convergent to a limit z. We shall see that z is an
arbitrage opportunity, which will be a contradiction. First,
\[ V_{z_n}(0) = \sum_j (z_n)_jS_j(0) = \frac{1}{\|x_n\|}\sum_j (x_n)_jS_j(0) \le \frac{W}{\|x_n\|} \to 0, \]
so passing to the limit $V_z(0) \le 0$. Second, $V_{z_n}(1) \ge 0$ by the definition of FCS, and this inequality is
preserved in the limit. But $V_z(1) \ne 0$ since $\|z\| = 1$ (for otherwise $V_z(1) = Sz = 0$ would
imply z = 0).

We turn to the question of the relation between the security prices at times 0 and 1, introducing
the so-called state prices $\pi_i$, which are positive numbers such that
\[ S_j(0) = \sum_{i=1}^N S_j(1, i)\pi_i. \]

Suppose that one of the securities is risk-free, that is, $S_1(1, i) = 1$ for all i, say. Then
\[ S_1(0) = \sum_{i=1}^N \pi_i, \]

which is the price of a sure euro to be received at time 1, that is, it is the discount factor. We
can then define the interest rate (suppose that the length of the period is one year)
\[ 1 + R = \frac{1}{\sum_i \pi_i}. \]
Remark 3.3
State prices are related to risk neutral probabilities. Recall that
\[ S_j(0) = \frac{1}{1+R}E_q(S_j(1)) = \frac{1}{1+R}\sum_i q_iS_j(1, i), \]
where $q_i$ are risk neutral (martingale) probabilities. The above relation says that the
discounted prices form a martingale. The existence of such probabilities is guaranteed by the
no-arbitrage condition; uniqueness holds in complete models, which in our setting means that each random
variable X representing the payoff at time 1 can be replicated, i.e. written as $X = V_x(1)$
for some portfolio x. This is the case if the matrix $S = [S_j(1, i)]_{ij}$ is square and
invertible. We can see that the existence of state prices is equivalent to the no-arbitrage
principle and
\[ \pi_i = \frac{q_i}{1+R}. \tag{3.1} \]

We shall show that state prices can be found by means of utilities. An interesting fact is
that the result is to some extent independent of the particular form of utility.
Theorem 3.4
Suppose that $X^*$ is a positive solution of the maximisation problem for utility U. Then
there is a number λ such that the state prices are of the form
\[ \pi_i = \lambda\frac{\partial U}{\partial x_i}(X^*). \]
∂xi

Proof
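For any portfolio x such that $\sum_j S_j(0)x_j = 0$ consider the real valued function
$f_x(t) = U(X^* + tSx)$. Such an x does not change the initial cost and $X^*$ is positive, so
$X^* + tSx$ is feasible for small |t|. Since $X^*$ is optimal, $f_x'(0) = 0$, hence
$\sum_{i,j}\frac{\partial U}{\partial x_i}(X^*)S_j(1, i)x_j = 0$. In other words, $\nabla U(X^*)^TS \perp x$
for all x orthogonal to the vector $(S_j(0))_j$. Hence the vectors $(S_j(0))_j$ and $\nabla U(X^*)^TS$ are collinear
and there is a scalar α such that $\nabla U(X^*)^TS = \alpha(S_j(0))_j$. Note that α > 0 and let
$\lambda = \frac{1}{\alpha}$. Therefore
\[ \lambda\sum_{i=1}^N \frac{\partial U}{\partial x_i}(X^*)S_j(1, i) = S_j(0), \]
hence the claim.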

For a particular case of expected utility we have
\[ \frac{\partial U}{\partial x_i}(X^*) = \frac{\partial}{\partial x_i}\sum_{k=1}^N u(X^*(k))p_k = u'(X^*(i))p_i, \]
hence the state prices are of the form
\[ \pi_i = \lambda u'(X^*(i))p_i. \]
Recall the definition of risk neutral probabilities and use the above:
\[ q_i = \frac{\pi_i}{\sum_k \pi_k} = \frac{p_iu'(X^*(i))}{\sum_{k=1}^N u'(X^*(k))p_k} = p_i\frac{u'(X^*(i))}{E(u'(X^*))}. \]
Now we have the following expressions for the current securities prices:
\[ \begin{aligned} S_j(0) &= \sum_{i=1}^N \pi_iS_j(1, i) \\ &= \lambda\sum_{i=1}^N u'(X^*(i))p_iS_j(1, i) \\ &= \lambda E(u'(X^*)S_j(1)) \\ &= \gamma E_q(S_j(1)), \end{aligned} \]
where $\gamma = \sum_i \pi_i$ is the discount factor.

Example 3.5
Consider a trinomial model with two assets: risk free with $S_1(1, i) = S_1(0)(1 + r)$ and
risky $S_2(1, i) = S_2(0)(1 + K_2(i))$, i = 1, 2, 3. Let u(x) = − exp(−ax). Suppose that r = 5%,
\[ K_2(i) = \begin{cases} 20\% & \text{with probability } 0.4, \\ 10\% & \text{with probability } 0.3, \\ -10\% & \text{with probability } 0.3, \end{cases} \]
$S_1(0) = 1$, $S_2(0) = 10$. For a = 0.1 maximising expected utility gives a portfolio
x = (1.6566, 1.8343) (the first entry being the position in the risk free asset) with
$X^*(i)$ = 23.75, 21.92, and 18.25, respectively. Then
$u'(X^*(i))p_i$ = 0.00372, 0.00335, 0.00484 and after normalising we get the risk neutral
probabilities $q_i$ = 0.3124, 0.2814, 0.4062, respectively. Finally, $\sum_i q_iK_2(i) = r = 5\%$ so the
discounted prices of the risky asset form a martingale. For other values of a we get different
measures but the martingale property is preserved as shown in general above (see Excel).
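
The numbers in Example 3.5 can be reproduced with a short script. The sketch below makes one assumption that is not stated in the example: the initial wealth is taken to be W = 20, which is the value implied by the reported holdings and prices. The optimal amount y invested in the risky asset is found by bisection on the first order condition; the variable names are illustrative.

```python
from math import exp

a, r = 0.1, 0.05
W = 20.0                       # assumption: initial wealth implied by the example
S1_0, S2_0 = 1.0, 10.0
K = [0.20, 0.10, -0.10]        # returns on the risky asset
p = [0.4, 0.3, 0.3]            # real-world probabilities

def foc(y):
    """First order condition in y, the amount held in the risky asset.
    Final wealth is W(1 + r) + y (K - r); the positive factor
    a * exp(-a W (1 + r)) is dropped."""
    return sum(p_i * (k - r) * exp(-a * y * (k - r)) for p_i, k in zip(p, K))

# foc is strictly decreasing, positive at 0 and negative at 100: bisection.
lo, hi = 0.0, 100.0
for _ in range(100):
    mid = (lo + hi) / 2
    if foc(mid) > 0:
        lo = mid
    else:
        hi = mid
y = (lo + hi) / 2

x = ((W - y) / S1_0, y / S2_0)                     # numbers of securities held
X = [x[0] * S1_0 * (1 + r) + x[1] * S2_0 * (1 + k) for k in K]
weights = [a * exp(-a * X_i) * p_i for X_i, p_i in zip(X, p)]   # u'(X*(i)) p_i
q = [w / sum(weights) for w in weights]            # risk neutral probabilities

print("portfolio x:", x)
print("final wealth X*:", X)
print("risk neutral probabilities q:", q)
print("sum q_i K_2(i):", sum(q_i * k for q_i, k in zip(q, K)))
```

Changing `a` changes the probabilities q but not the value of the last printed sum, which stays equal to r.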

3.3 Relation to mean variance analysis


Decision making based on utility functions is consistent in a special case with Markowitz
theory based on finding a balance between risk (variance or standard deviation) and return
(expected return). This case is concerned with a special kind of utility, with no restrictions imposed on
the distributions of returns.

Consider the quadratic utility function, $u(x) = ax - \frac{1}{2}bx^2$, which is applicable under the
assumption that the possible future wealth does not exceed $\frac{a}{b}$ (the range where u is increasing). In this case
\[ \max E\Big[aW(1) - \frac{1}{2}bW^2(1)\Big] = \max\Big(aE[W(1)] - \frac{1}{2}b\,(E[W(1)])^2 - \frac{1}{2}b\,\mathrm{Var}(W(1))\Big). \]

We can see that expectation and variance are all that is needed to arrive at a decision. For a
given level of expected wealth (which is equivalent to saying "for a given expected return")
we need to minimise the variance of wealth. But
\[ \mathrm{Var}(W(1)) = W^2(0)\mathrm{Var}(1 + K) = W^2(0)\mathrm{Var}(K), \]

which means that to optimise we need to minimise the variance of return. This means that
the optimal (from the point of view of utility approach) portfolio lies on the frontier (efficient
frontier, if the expected return is large enough).
Our next step is to close another circle of ideas in portfolio theory by showing the
relationship between the utility approach and the Capital Asset Pricing Model.
First, we will relate the results of the previous section to the approach based on returns. To
this end we go back to the description of investment decisions by means of weights and returns
rather than numbers of securities and their prices. Assume that the first security is risk free:

S 1 (1) = S 1 (0)(1 + R)

and for the remaining we have random returns

S j (1) = S j (0)(1 + K j ).

The final wealth can be written in the following way:
\[ \begin{aligned} W(1) &= W(0)\Big(w_1(1 + R) + \sum_{j=2}^n w_j(1 + K_j)\Big) \\ &= W(0)\Big(\Big(1 - \sum_{j=2}^n w_j\Big)(1 + R) + \sum_{j=2}^n w_j(1 + K_j)\Big), \end{aligned} \]
where we eliminate the first weight since it is determined by the remaining ones. The first
order condition for the expected utility maximisation, max E(u(W(1))), reads
\[ \frac{\partial}{\partial w_j}E\,u(W(1)) = W(0)E[u'(W(1))(K_j - R)] = 0. \]
The no-arbitrage condition excludes the cases $K_j \ge R$ almost surely or $K_j \le R$ almost surely
(with strict inequality with positive probability), which is consistent with the above condition.

Next, we have

Cov(u0 (W(1)), (K j − r)) = E[u0 (W(1))(K j − r)] − Eu0 (W(1))E(K j − r)

so
Eu0 (W(1))E(K j − R) = −Cov(u0 (W(1)), (K j − R)) = −Cov(u0 (W(1)), K j )
This relation holds for each investor separately since they have different utility functions. Sup-
pose we have a number of investors with utilities um = am x− 21 bm x2 and optimal wealths Wm (1).
Then u0m (x) = am − bm x and

(am − bm EWm (1))E(K j − R) = −bm Cov(Wm (1), K j ).

Write
am
γm = − + EWm (1)
bm
so that
γm E(K j − R) = Cov(Wm (1), K j )
and after adding up

\[ \sum_m \gamma_m\,E(K_j - R) = \mathrm{Cov}\Big(\sum_m W_m(1), K_j\Big) = \mathrm{Cov}(M, K_j) = M(0)\,\mathrm{Cov}(K_M, K_j) \]

where M = \sum_m W_m(1) is the value of the whole market, M = M(0)(1 + K_M), and K_M is the return on the market portfolio. We can write briefly

\[ E(K_j - R) = \eta\,\mathrm{Cov}(K_M, K_j) \]

for each return (on each asset), with η = M(0)/\sum_m γ_m. In particular, for the market portfolio we have
E(K M − R) = ηCov(K M , K M ) = ηVar(K M ).
Eliminating η from the last two equations we get the familiar CAPM formula

E(K j ) − R = β j (E(K M ) − R).
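
The elimination of η can be spelled out in one line. The sketch below assumes that β_j denotes the usual beta factor Cov(K_j, K_M)/Var(K_M) from the CAPM section; that definition is not restated in this passage.

```latex
\begin{aligned}
\eta &= \frac{E(K_M) - R}{\mathrm{Var}(K_M)}, \\
E(K_j) - R &= \eta\,\mathrm{Cov}(K_M, K_j)
            = \frac{\mathrm{Cov}(K_M, K_j)}{\mathrm{Var}(K_M)}\,\bigl(E(K_M) - R\bigr)
            = \beta_j\,\bigl(E(K_M) - R\bigr).
\end{aligned}
```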

3.4 Risk aversion


An investor is said to be risk averse if

u(E(X)) ≥ E(u(X)) for all X.

An intuitive explanation is this: both sides represent an expected utility. On the left we have
sure consumption at the level E(X), on the right we have an uncertain wealth X, both with the

same expected value. The inequality says that the investor always chooses the sure thing. We
say that the investor is risk neutral if

u(E(X)) = E(u(X)) for all X.

If the investor is risk averse we define the risk premium as a function γ : FCS → R such
that
u(E(X) − γ(X)) = E(u(X))
The number E(X) − γ(X) is called the certainty equivalent of X.
We shall find an approximate formula for γ. Assume that X takes only finitely many values x_1, …, x_n. Take the second-order Taylor expansion of u at x_i around m = E(X) to get

\[ u(x_i) \approx u(m) + (x_i - m)u'(m) + \tfrac{1}{2}(x_i - m)^2 u''(m). \]

Multiplying by p_i = P(X = x_i) and summing, and using E(X − m) = 0, we get

\[ E(u(X)) \approx u(m) + E(X - m)u'(m) + \tfrac{1}{2}E(X - m)^2 u''(m) = u(m) + \tfrac{1}{2}\mathrm{Var}(X)u''(m). \tag{3.2} \]

Taking the first-order Taylor expansion at m − γ(X) around m gives

\[ u(m - \gamma(X)) \approx u(m) - \gamma(X)u'(m) \]

so (by the definition of the risk premium)

\[ E(u(X)) \approx u(m) - \gamma(X)u'(m). \tag{3.3} \]

Comparing the right-hand sides of (3.2) and (3.3) we get

\[ u(m) + \tfrac{1}{2}\mathrm{Var}(X)u''(m) \approx u(m) - \gamma(X)u'(m) \]

which yields
\[ \gamma(X) \approx -\frac{1}{2}\,\frac{u''(E(X))}{u'(E(X))}\,\mathrm{Var}(X). \]
The number
\[ \mathrm{ARA} = -\frac{u''(E(X))}{u'(E(X))} \]
is called the absolute risk aversion coefficient (introduced by Arrow and Pratt).
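
As a rough numerical check of this approximation, the sketch below compares the exact risk premium, obtained by solving u(E(X) − γ) = E(u(X)), with the second-order Taylor estimate −(1/2)·u″(m)/u′(m)·Var(X), which carries the factor 1/2 of the quadratic Taylor term. The logarithmic utility and the two-point distribution (90 or 110 with equal probability) are illustrative assumptions, not taken from the text.

```python
import math

def exact_risk_premium(values, probs, u, u_inverse):
    """Solve u(E(X) - gamma) = E(u(X)) for gamma, using the inverse of u."""
    mean = sum(p * x for x, p in zip(values, probs))
    expected_utility = sum(p * u(x) for x, p in zip(values, probs))
    return mean - u_inverse(expected_utility)

def approx_risk_premium(values, probs, du, d2u):
    """Second-order Taylor estimate: -(1/2) * u''(m) / u'(m) * Var(X)."""
    mean = sum(p * x for x, p in zip(values, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    return -0.5 * d2u(mean) / du(mean) * variance

# Hypothetical example: log utility, wealth 90 or 110 with equal probability.
values = [90.0, 110.0]
probs = [0.5, 0.5]

exact = exact_risk_premium(values, probs, math.log, math.exp)
approx = approx_risk_premium(values, probs,
                             lambda w: 1.0 / w,       # u'(w) for u = log
                             lambda w: -1.0 / w**2)   # u''(w) for u = log
print(exact, approx)
```

For small risks the two numbers should be close; the gap widens as the spread of X grows, since higher-order Taylor terms are ignored.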
For better understanding of the issue we approach the problem from a slightly different angle. The above analysis was concerned with the level of wealth; below we discuss the returns. Suppose that asset number 0 is risk free with return r (the risk-free rate). Denote by W_i the amount of money invested in asset i, that is, W_i = x_i S_i(0). If the whole initial wealth is invested, we have \sum_{i=0}^{n} W_i = W. Denote by K_i the random return on security i. The final wealth is then

\[
\begin{aligned}
X &= W_0(1 + r) + \sum_{i=1}^{n} W_i(1 + K_i) \\
  &= W(1 + r) - \sum_{i=1}^{n} W_i(1 + r) + \sum_{i=1}^{n} W_i(1 + K_i) \\
  &= W(1 + r) + \sum_{i=1}^{n} W_i(K_i - r).
\end{aligned}
\]

The investor wishes to solve


\[ \max\, E\Big(u\Big(W(1 + r) + \sum_{i=1}^{n} W_i(K_i - r)\Big)\Big). \]

If the solution exists (we will discuss the existence problem in the next chapter), then the first-order conditions are satisfied: for all i = 1, …, n,
\[ E(u'(X)(K_i - r)) = 0. \]
If the investor puts all the money in the risk-free asset, W_i = 0 for all i > 0, and does not wish to invest in any risky security, the slope of the expected utility in each direction W_i, i > 0, must be non-positive:
\[ E(u'(W(1 + r))(K_i - r)) \le 0 \]
which implies E(K_i − r) ≤ 0 since u′ is positive and u′(W(1 + r)) is not random. The necessary incentive to invest some of the money in a risky security is therefore E(K_i − r) > 0 for some i. Suppose now that there is only one risky asset with return K_1 = K, and consider a condition under which all the money is invested in it, W_0 = 0, W_1 = W:
\[ E(u'(W(1 + K))(K - r)) \ge 0. \]
Take the first-order Taylor expansion of u′ at W(1 + K) around W(1 + r):
\[
\begin{aligned}
u'(W(1 + K)) &\approx u'(W(1 + r)) + [W(1 + K) - W(1 + r)]\,u''(W(1 + r)) \\
             &= u'(W(1 + r)) + u''(W(1 + r))\,W(K - r).
\end{aligned}
\]
Multiply by K − r and take the expectation to get
\[ E(u'(W(1 + K))(K - r)) \approx u'(W(1 + r))E(K - r) + u''(W(1 + r))\,W\,E(K - r)^2 \]
and this should be equal to 0 (the first-order condition). Approximating E(K − r)^2 by Var(K), the risk premium in terms of the return, that is, the expected excess return above the risk-free rate E(K − r), is therefore given by
\[ E(K - r) \approx -\frac{u''(W(1 + r))}{u'(W(1 + r))}\,W\,\mathrm{Var}(K) \]
and we can recognise on the right the risk aversion coefficient evaluated at W(1 + r).

3.5 Utility functions and indifference curves
Suppose that the final wealth X of a portfolio has normal distribution. Then the expected
utility depends on two parameters: the mean and standard deviation which we denote by m
and s, respectively. The random return K on this portfolio has normal distribution as well
since X = W + W × K. The mean and standard deviation of the return are denoted by µ and σ
as before.
We now assign a real number to each pair (µ, σ) in the following way. Denote the normal density of the wealth by f_W(x, m, s) and put
\[ U(m, s) = \int u(x)\,f_W(x, m, s)\,dx. \]

Since m and s are related to µ and σ, this formula automatically defines a function on the (µ, σ)
plane. The locus of all points with the same value of U is the indifference curve (discussed in
Chapter 3).
We claim that concavity of the utility function (which is related to risk aversion) implies convexity of the indifference curves in the (σ, µ) plane. We make the simplifying assumption that u is differentiable; then u′(x) > 0 since u is increasing, and u′ is decreasing since u is concave. Next we note that the implicit function m(s) given by the condition U(m, s) = const = c is increasing.
Proposition 3.6
The derivative dm(s)/ds is positive.

Proof
Let Y = (X − m)/s, which has the standard normal distribution. Consequently X = sY + m and
\[ U(m, s) = E(u(X)) = \int_{-\infty}^{+\infty} u(sy + m) f(y)\,dy \tag{3.4} \]

where f is the standard normal density (that is, f(x) = f_W(x, 0, 1)). The implicit function theorem says that
\[ \frac{dm(s)}{ds}\Big|_{U=c} = -\frac{\partial U(m,s)/\partial s}{\partial U(m,s)/\partial m}. \]

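We can compute the partial derivatives
\[
\begin{aligned}
\frac{\partial U(m, s)}{\partial s} &= \int_{-\infty}^{+\infty} y\,u'(sy + m) f(y)\,dy, \\
\frac{\partial U(m, s)}{\partial m} &= \int_{-\infty}^{+\infty} u'(sy + m) f(y)\,dy,
\end{aligned}
\]
where clearly the latter is positive. It is sufficient to show that \int_{-\infty}^{+\infty} y\,u'(sy + m) f(y)\,dy < 0, and this would be implied by
\[ -\int_{-\infty}^{0} y\,u'(sy + m) f(y)\,dy > \int_{0}^{+\infty} y\,u'(sy + m) f(y)\,dy \]

which is clear as u′ is decreasing and f is symmetric:

\[
\begin{aligned}
-\int_{-\infty}^{0} y\,u'(sy + m) f(y)\,dy &= \int_{0}^{+\infty} y\,u'(-sy + m) f(-y)\,dy \\
&= \int_{0}^{+\infty} y\,u'(-sy + m) f(y)\,dy \\
&> \int_{0}^{+\infty} y\,u'(sy + m) f(y)\,dy.
\end{aligned}
\]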

To see convexity of an indifference curve take two points on it, U(m_1, s_1) = U(m_2, s_2) = c, and consider the midpoint ((m_1 + m_2)/2, (s_1 + s_2)/2). For convexity it is sufficient to show that U((m_1 + m_2)/2, (s_1 + s_2)/2) > c. To this end we use (3.4), then do simple arithmetic, then use concavity of u, and finally (3.4) again:

\[
\begin{aligned}
U\Big(\frac{m_1 + m_2}{2}, \frac{s_1 + s_2}{2}\Big)
&= \int_{-\infty}^{+\infty} u\Big(\frac{s_1 + s_2}{2}\,y + \frac{m_1 + m_2}{2}\Big) f(y)\,dy \\
&= \int_{-\infty}^{+\infty} u\Big(\frac{m_1 + s_1 y}{2} + \frac{m_2 + s_2 y}{2}\Big) f(y)\,dy \\
&> \int_{-\infty}^{+\infty} \Big[\frac{1}{2}u(m_1 + s_1 y) + \frac{1}{2}u(m_2 + s_2 y)\Big] f(y)\,dy \\
&= \frac{1}{2}U(m_1, s_1) + \frac{1}{2}U(m_2, s_2) = c.
\end{aligned}
\]

4
Value at Risk
Until now we have focused our attention on variance, or more precisely, standard deviation,
as a tool for measuring risk. The standard deviation σK of the return K on a risky investment
measures the spread of the random values of K from their mean µK . In portfolio selection we
seek to minimize σK while maximizing µK . However, an investor seeking to measure the risk
inherent in an asset he holds is naturally more concerned to place a bound on his potential
losses, while remaining relaxed about possible high levels of profit! Thus one looks for risk
measures which focus on the downside risk, that is, measures concerned with the lower tail
of the distribution of K. Variance and standard deviation are symmetric, so they are not good
candidates in this search.
In looking for quantitative measures of the overall risk in a portfolio, we seek a statistic
which can be applied universally, enabling us to compare the risks of different types of risky
portfolio, irrespective of whether these are based on equities, currencies or commodities. Ide-
ally, we look for a number (or set of numbers) that expresses the potential loss with a given
level of confidence, enabling the risk manager to adjudge the risk as acceptable or not.
In the wake of spectacular financial collapses in the mid-1990s at Barings Bank and Orange County, Value at Risk (henceforth abbreviated as VaR) became a standard benchmark for measuring financial risk. It has the advantage of relative simplicity and ease of use when sufficient data are available. Its principal drawback is that it does not provide full protection against extreme (i.e. highly unlikely) events. In this chapter we explore this popular risk measure.

4.1 Quantiles
An investor holding an asset whose future value is uncertain may wish to determine whether
his final position, X, in the asset has at least 95% probability of remaining above a certain
(usually negative) level. Value at Risk at 5% answers this question by specifying the worst
of the best 95% of possible outcomes. Its calculation is therefore closely tied to the values of
the distribution function F X of X. This leads us to examine the so-called quantiles of F X more
closely.
We start with a simple example.

Example 4.1
Consider a two-step binomial model with stock prices

                  121
                ↗
        110 −→ 99
      ↗        ↗
100 −→ 90 −→ 81

Assume that the probability p of the price going up in a single step is p = 0.8. In this example we neglect the time value of money and compute the gain at time two from buying a single share of stock as

\[
X = S(2) - S(0) =
\begin{cases}
21 & \text{with probability } p^2 = 0.64, \\
-1 & \text{with probability } 2p(1 - p) = 0.32, \\
-19 & \text{with probability } (1 - p)^2 = 0.04.
\end{cases}
\]

We can see that the probability that our investment will lead to a loss smaller than 19 is

\[ P(-X < 19) = P(X > -19) = 0.96. \]

This means that with probability 96% we will lose no more than 1. If we agree, for instance, to ignore the worst 5% of potential outcomes, our ‘worst-case scenario’ would be to expect a loss of 1. However, if we are only willing to exclude (say) the worst 2.5%, the loss of 19 should be taken into account.

An outcome at a given probability can be expressed using quantiles. We recall the defini-
tion and some simple properties.
Let (Ω, F, P) be a probability space and let X : Ω → R be a random variable. The cumu-
lative distribution function F X : R → [0, 1], defined by F X (x) = P(X ≤ x) is right-continuous
and non-decreasing (see [PF] for details).

Figure 4.1 The upper and lower quantiles for various distribution functions.

Definition 4.2
For α ∈ (0, 1) the number

\[ q^\alpha(X) = \inf\{x : \alpha < F_X(x)\} \tag{4.1} \]

is called the upper α-quantile of X. The number

\[ q_\alpha(X) = \inf\{x : \alpha \le F_X(x)\} \tag{4.2} \]

is called the lower α-quantile of X. Any

\[ q \in [q_\alpha(X), q^\alpha(X)] \]

is called an α-quantile of X.

The definition is best understood when looking at the graph of the cumulative distribution
function. In Figure 4.1 we can see that the upper and the lower quantiles differ when the plot
of F X (x) becomes flat at the value F X (x) = α, otherwise they are equal.


Figure 4.2 The plot of the distribution function for X from Example 4.1.

Example 4.3
For X from Example 4.1 we can compute the upper and the lower α-quantiles, for α ∈ {0.025, 0.04, 0.05}, as (see Figure 4.2)

\[
\begin{aligned}
q^{0.025}(X) &= -19, & q_{0.025}(X) &= -19, \\
q^{0.04}(X) &= -1, & q_{0.04}(X) &= -19, \\
q^{0.05}(X) &= -1, & q_{0.05}(X) &= -1.
\end{aligned}
\]
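
The quantile definitions (4.1) and (4.2) translate directly into a scan over the cumulative probabilities of a finite discrete distribution. The sketch below is one possible implementation; the function names are hypothetical, and exact fractions are used so that the borderline case α = 0.04 is not blurred by floating-point rounding.

```python
from fractions import Fraction

def upper_quantile(values, probs, alpha):
    """Upper alpha-quantile inf{x : alpha < F_X(x)} of a finite discrete X."""
    cumulative = Fraction(0)
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        if alpha < cumulative:      # strict inequality, as in (4.1)
            return x
    raise ValueError("no x with alpha < F_X(x); alpha must be below 1")

def lower_quantile(values, probs, alpha):
    """Lower alpha-quantile inf{x : alpha <= F_X(x)} of a finite discrete X."""
    cumulative = Fraction(0)
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        if alpha <= cumulative:     # non-strict inequality, as in (4.2)
            return x
    raise ValueError("probabilities sum to less than alpha")

# Distribution of the gain X from Example 4.1.
values = [-19, -1, 21]
probs = [Fraction(4, 100), Fraction(32, 100), Fraction(64, 100)]

for alpha in (Fraction(25, 1000), Fraction(4, 100), Fraction(5, 100)):
    print(float(alpha),
          upper_quantile(values, probs, alpha),
          lower_quantile(values, probs, alpha))
```

The two quantiles differ only at α = 0.04, which is exactly the level at which the distribution function of X is flat.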

We list some basic properties of quantiles. The proofs are all elementary, but we defer the
more technical parts to the end of the chapter to avoid disturbing the flow of development.
Proposition 4.4
Let X, Y be random variables.

(i). X ≥ Y implies q^α(X) ≥ q^α(Y).

(ii). For any b ∈ R, q^α(X + b) = q^α(X) + b.

(iii). For b > 0, q^α(bX) = b q^α(X).

(iv). q^α(−X) = −q_{1−α}(X).

Proof
If X ≥ Y then
F X (x) = P(X ≤ x) ≤ P(Y ≤ x) = FY (x),
hence α < F X (x) implies that α < FY (x). This means that

{x : α < F X (x)} ⊂ {x : α < FY (x)}

which gives

qα (X) = inf{x : α < F X (x)} ≥ inf{x : α < FY (x)} = qα (Y).

The second property follows since with Y = X + b we have

FY (x + b) = P(X + b ≤ x + b) = F X (x),

so that

qα (X + b) = inf{x + b : α < FY (x + b)}


= inf{x : α < FY (x + b)} + b
= inf{x : α < F X (x)} + b
= qα (X) + b.

Since P(bX ≤ x) = P(X ≤ x/b) we see similarly that

FbX (x) = F X (x/b),

hence for b > 0

qα (bX) = inf{x : α < FbX (x)}


= inf{x : α < F X (x/b)}
= inf {by : α < F X (y)}
= b inf{y : α < F X (y)}
= bqα (X).

To prove (iv) we first need to show that for any b ∈ R

inf{x : b ≤ P (X ≤ x)} = inf{x : b ≤ P (X < x)}. (4.3)

Since P (X < x) ≤ P (X ≤ x) , if b ≤ P (X < x) then b ≤ P (X ≤ x) , which means that

{x : b ≤ P (X < x)} ⊂ {x : b ≤ P (X ≤ x)},

hence
inf{x : b ≤ P (X < x)} ≥ inf{x : b ≤ P (X ≤ x)}.
Suppose that

inf{x : b ≤ P (X ≤ x)} < x∗ < inf{x : b ≤ P (X < x)}, (4.4)

for some x∗ ∈ R. Since x∗ lies strictly below inf{x : b ≤ P(X < x)}, we can find an x∗∗ ∈ R with x∗ < x∗∗ < inf{x : b ≤ P(X < x)}, for which

P(X < x∗∗) < b.

This would mean that

P(X ≤ x∗) ≤ P(X < x∗∗) < b.

Since x → P(X ≤ x) is non-decreasing, the fact that P(X ≤ x∗) < b contradicts inf{x : b ≤ P(X ≤ x)} < x∗, which means that (4.4) cannot hold.
To prove (iv) we shall also use the fact that

F−X (x) = P (−X ≤ x) = P (X ≥ −x) = 1 − P (X < −x) . (4.5)

We can now compute

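\[
\begin{aligned}
q^\alpha(-X) &= \inf\{x : \alpha < F_{-X}(x)\} \\
&= -\sup\{-x : \alpha < F_{-X}(x)\} \\
&= -\sup\{-x : \alpha < 1 - P(X < -x)\} && \text{(using (4.5))} \\
&= -\sup\{y : \alpha < 1 - P(X < y)\} && \text{(taking } y = -x\text{)} \\
&= -\sup\{y : P(X < y) < 1 - \alpha\} \\
&= -\inf\{y : 1 - \alpha \le P(X < y)\} && \text{(since } y \to P(X < y) \text{ is non-decreasing)} \\
&= -\inf\{y : 1 - \alpha \le P(X \le y)\} && \text{(using (4.3))} \\
&= -\inf\{y : 1 - \alpha \le F_X(y)\} \\
&= -q_{1-\alpha}(X).
\end{aligned}
\]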

Lemma 4.5
If F_X(x) is continuous and strictly increasing then

\[ q^\alpha(X) = F_X^{-1}(\alpha). \]

Proof
Since F_X(x) is continuous and strictly increasing, the cumulative distribution function F_X(x) is invertible, and α < F_X(x) is equivalent to F_X^{-1}(α) < x. This gives

\[ q^\alpha(X) = \inf\{x : \alpha < F_X(x)\} = \inf\{x : F_X^{-1}(\alpha) < x\} = F_X^{-1}(\alpha). \]

Lemma 4.6
Let X be a random variable. If f : R → R is right-continuous and non-decreasing then

qα ( f (X)) = f (qα (X)).

Proof
Since

F f (X) ( f (qα (X))) = P( f (X) ≤ f (qα (X)))


≥ P(X ≤ qα (X))
= F X (qα (X))
≥ α,

we see that
f (qα (X)) ≥ qα ( f (X)).
If we can show that y ≥ qα ( f (X)) whenever y > f (qα (X)), then f (qα (X)) is the largest α-quantile for f (X).
Take any y > f (qα (X)). Since f is right-continuous and non-decreasing, the set
f −1 (−∞, y) is an open interval of the form (−∞, a), for some a ∈ R. This gives

(−∞, qα (X)] ⊂ {x : f (x) ≤ f (qα (X))} ⊂ {x : f (x) < y} = (−∞, a),

which means that there exists an x∗ for which qα (X) < x∗ < a. Since qα (X) < x∗ we have

α < F X (x∗ ),

hence, with Y = f (X),

FY (y) = P(Y ≤ y) ≥ P(Y < y) = P(X < a) ≥ P(X ≤ x∗ ) = F X (x∗ ) > α,

which implies that y ≥ qα (Y) = qα ( f (X)).

Figure 4.3 −VaRα (X) is the upper α-quantile for X.

4.2 Measuring downside risk


We work in a single-step financial market model in which we invest at time t = 0 and terminate
our investment at t = T. We denote by X the proceeds from the investment at time T .
Definition 4.7
For α in (0, 1), we define the Value at Risk (VaR) of X, at confidence level 1 − α, as (see Figure 4.3)
\[ \mathrm{VaR}^\alpha(X) = -q^\alpha(X) = -\inf\{x : \alpha < F_X(x)\}. \]

To gain some intuition, let us consider the following example.


Example 4.8
Let X be as in Example 4.1. By looking at the distribution function F_X(x) (see Figure 4.2) we can see that

\[ \mathrm{VaR}^{0.05}(X) = 1, \qquad \mathrm{VaR}^{0.025}(X) = 19, \]

which agrees with our intuition of the worst possible loss at confidence levels 0.95 and 0.975, respectively.

Let us observe that since X denotes the gain from an investment, −X is the loss. We can express VaR in terms of the loss as follows:

\[
\begin{aligned}
\mathrm{VaR}^\alpha(X) &= -q^\alpha(X) \\
&= q_{1-\alpha}(-X) && \text{(by (iv) from Proposition 4.4)} \\
&= \inf\{x : 1 - \alpha \le P(-X \le x)\} \\
&= \inf\{x : P(x < -X) \le \alpha\} \\
&= \inf\{x : P(X + x < 0) \le \alpha\}.
\end{aligned}
\]

In loose terms, this means that the probability of the loss exceeding VaRα is no greater than
α. In other words, VaRα is the worst possible loss at the confidence level 1 − α. Its simple
algebraic properties follow from those we proved for the upper quantile:
Proposition 4.9
Let X, Y be random variables.

(i). X ≥ Y implies VaRα (X) ≤ VaRα (Y),

(ii). For any a ∈ R, VaRα (X + a) = VaRα (X) − a,

(iii). For any a > 0, VaRα (aX) = aVaRα (X).

Proof
The proof of all above properties follows directly from the definition of VaRα (X) and
from the respective properties of quantiles proved in Proposition 4.4.

4.3 Examples of computing VaR


To familiarize ourselves with the definition of VaR let us consider a few simple examples.
We shall assume that at time zero we invest V(0) to receive V(T ) at time T . We use Ṽ(t) to
denote the discounted value
Ṽ(t) = e−rt V(t),
where r is the risk-free rate for continuous compounding. We use G(T ) to denote the gain from
an investment at time T
G(T ) = V(T ) − V(0),
and G̃(T ) to denote the discounted gain

G̃(T ) = Ṽ(T ) − V(0).

For investments starting at time zero and terminating at time T we shall be interested in
computing the VaR for
X = G̃(T ).

Example 4.10

Suppose that we invest V(0) risk-free. Then V(T ) = erT V(0) giving

X = G̃(T ) = e−rT V(T ) − V(0) = 0.

The distribution function of X is then


\[
F_X(x) =
\begin{cases}
1 & \text{for } x \ge 0, \\
0 & \text{for } x < 0.
\end{cases}
\]

For any α ∈ (0, 1), qα (X) = 0, which gives

VaRα (X) = −qα (X) = 0.

Example 4.11
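Consider a random variable X with
\[
X =
\begin{cases}
-20 & \text{with probability } 0.025, \\
-10 & \text{with probability } 0.025,
\end{cases}
\tag{4.6}
\]
and P(X > 0) = 0.95. For x < 0
\[
F_X(x) =
\begin{cases}
0 & x \in (-\infty, -20), \\
0.025 & x \in [-20, -10), \\
0.05 & x \in [-10, 0).
\end{cases}
\]

Taking any α ∈ [0.025, 0.05) we have

\[ \mathrm{VaR}^\alpha(X) = -q^\alpha(X) = 10, \]

whereas for any α < 0.025,

\[ \mathrm{VaR}^\alpha(X) = -q^\alpha(X) = 20, \]
which demonstrates that VaR can be sensitive to the choice of α. (At α = 0.05 exactly, F_X(x) = 0.05 is not strictly greater than α for any x < 0, so the upper quantile q^{0.05}(X) is already non-negative.)
Let us now change the −20 in (4.6) to −2000. For α ∈ [0.025, 0.05) the VaR^α still remains equal to 10! This illustrates that VaR does not take into consideration unlikely events, no matter the severity of their outcome. This is an undesirable feature in a risk measure.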

Example 4.12
Consider two independent investments X_1, X_2 with gains
\[
X_i =
\begin{cases}
0 & \text{with probability } p, \\
1 & \text{with probability } 1 - p,
\end{cases}
\]
for i = 1, 2. We can think of these as corporate bonds with the same price and maturity date, of two independent companies that each have a probability of default equal to p. If p < α then F_{X_i}(0) = p < α, so q^α(X_i) = 1 and
\[ \mathrm{VaR}^\alpha(X_1) = \mathrm{VaR}^\alpha(X_2) = -1. \]
If we diversify our investment equally between the two bonds, then our gain will be equal to
\[
\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2 =
\begin{cases}
0 & \text{with probability } p^2, \\
\tfrac{1}{2} & \text{with probability } 2p(1 - p), \\
1 & \text{with probability } (1 - p)^2.
\end{cases}
\]
If we choose α ∈ (p, p^2 + 2p(1 − p)) then
\[ F_{\frac{1}{2}X_1 + \frac{1}{2}X_2}(0) = p^2 < \alpha \quad\text{and}\quad F_{\frac{1}{2}X_1 + \frac{1}{2}X_2}\big(\tfrac{1}{2}\big) = p^2 + 2p(1 - p) > \alpha, \]
hence
\[ \mathrm{VaR}^\alpha\big(\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\big) = -\tfrac{1}{2}. \]
We can see that
\[ \mathrm{VaR}^\alpha\big(\tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\big) > \max\{\mathrm{VaR}^\alpha(X_1), \mathrm{VaR}^\alpha(X_2)\}, \]
which means that the risk of a diversified position, as measured by VaR^α, is greater than the risk of investing all our funds in a single bond. This runs counter to the principle that diversification should reduce risk, and therefore illustrates a second serious drawback in using VaR to measure risk. In the next chapter we will consider risk measures that avoid these defects; for the present we present some further computations with VaR.
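
The comparison in Example 4.12 can be checked numerically. The sketch below applies Definition 4.7 literally to gains of 0 and 1, which gives VaR values of −1 for a single bond and −1/2 for the diversified position; the default probability p = 0.03 and the level α = 0.05 are hypothetical values chosen so that p < α < p² + 2p(1 − p).

```python
from fractions import Fraction

def value_at_risk(values, probs, alpha):
    """VaR^alpha(X) = -inf{x : alpha < F_X(x)} for a finite discrete gain X."""
    cumulative = Fraction(0)
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        if alpha < cumulative:
            return -x
    raise ValueError("alpha must be below 1")

p = Fraction(3, 100)       # hypothetical default probability
alpha = Fraction(5, 100)   # hypothetical level with p < alpha < p^2 + 2p(1-p)

# A single bond: gain 0 on default, 1 otherwise.
single = value_at_risk([0, 1], [p, 1 - p], alpha)

# Half of the funds in each of two independent bonds.
diversified = value_at_risk(
    [0, Fraction(1, 2), 1],
    [p * p, 2 * p * (1 - p), (1 - p) ** 2],
    alpha,
)
print(single, diversified, diversified > single)
```

What matters for the example is the strict inequality between the two figures, which holds for every p and α in the stated range.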

From examples explored so far we see that finding VaR in the case of discrete distributions
is an easy task. This is summarized in the following lemma.

Lemma 4.13
Assume that X is a discrete random variable with P(X = x_i) = p_i, \sum_{i=1}^{N} p_i = 1, and x_1 < x_2 < … < x_N. Then
\[ \mathrm{VaR}^\alpha(X) = -x_{k_\alpha}, \]
where k_α ∈ N is the largest number such that \sum_{i=1}^{k_\alpha - 1} p_i ≤ α.

Proof
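Since X has a discrete distribution and x_1 < x_2 < … < x_N we can see that
\[ P(X \le x_k) = \sum_{i=1}^{k} p_i. \tag{4.7} \]

We shall also use the fact that

\[ \min\Big\{k : \alpha < \sum_{i=1}^{k} p_i\Big\} = \max\Big\{k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\}. \tag{4.8} \]

This gives

\[
\begin{aligned}
q^\alpha(X) &= \inf\{x : \alpha < P(X \le x)\} \\
&= \min\{x_k : \alpha < P(X \le x_k)\} && \text{(since } X \in \{x_1, \dots, x_N\}\text{)} \\
&= \min\Big\{x_k : \alpha < \sum_{i=1}^{k} p_i\Big\} && \text{(by (4.7))} \\
&= \max\Big\{x_k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\} && \text{(by (4.8))} \\
&= x_{k_\alpha} && \text{(by definition of } k_\alpha\text{)}.
\end{aligned}
\]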
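
The rule of Lemma 4.13 is easy to turn into a few lines of code. The following is a minimal sketch (the function name is hypothetical) that searches for the largest k whose partial sum p_1 + … + p_{k−1} does not exceed α, applied to the gain X of Example 4.1. It uses ordinary floats, so levels α that coincide exactly with a partial sum may need exact arithmetic instead.

```python
def var_discrete(values, probs, alpha):
    """VaR^alpha(X) = -x_{k_alpha}, with k_alpha the largest k such that
    p_1 + ... + p_{k-1} <= alpha (Lemma 4.13)."""
    pairs = sorted(zip(values, probs))   # enforce x_1 < x_2 < ... < x_N
    k_alpha = 1                          # k = 1 always qualifies (empty sum is 0)
    partial = 0.0                        # running value of p_1 + ... + p_{k-1}
    for k in range(2, len(pairs) + 1):
        partial += pairs[k - 2][1]       # add p_{k-1}
        if partial <= alpha:
            k_alpha = k
        else:
            break                        # partial sums only grow from here on
    return -pairs[k_alpha - 1][0]

# Gain X from Example 4.1.
values = [-19, -1, 21]
probs = [0.04, 0.32, 0.64]

print(var_discrete(values, probs, 0.05))
print(var_discrete(values, probs, 0.025))
```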

We now turn to random variables with continuous distributions.

Example 4.14
Suppose that today’s price of stock is equal to S(0). Assume also that the price of stock at time T is equal to S(T) = S(0)e^{m+σZ}, with Z having a standard normal distribution N(0, 1). We shall compute VaR^α(X) for

\[ X = e^{-rT}S(T) - S(0). \]

By Lemma 4.5, q^α(Z) = N^{-1}(α), where N is the standard normal distribution function. Observing that
X = f(Z),
where
\[ f(\zeta) = e^{-rT}S(0)e^{m + \sigma\zeta} - S(0) \]
is an increasing function,

\[
\begin{aligned}
\mathrm{VaR}^\alpha(X) &= -q^\alpha(f(Z)) \\
&= -f(q^\alpha(Z)) && \text{(by Lemma 4.6)} \\
&= -f(N^{-1}(\alpha)) && \text{(by Lemma 4.5)} \\
&= S(0)\big(1 - e^{m - rT + \sigma N^{-1}(\alpha)}\big).
\end{aligned}
\tag{4.9}
\]

In Example 4.14 we have exploited the fact that X was a non-decreasing function of a
random variable with standard normal distribution, for which quantiles are easy to compute.
This idea can be formulated in more general terms as follows.
Lemma 4.15
Let f : R → R be a non-decreasing right-continuous function. Then

VaRα ( f (X)) = − f (qα (X)).

Proof
By Lemma 4.6
VaRα ( f (X)) = −qα ( f (X)) = − f (qα (X)).
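
Formula (4.9) from Example 4.14 can be evaluated with the standard library alone. In the sketch below the function name and all parameter values are illustrative assumptions; `statistics.NormalDist().inv_cdf` plays the role of N^{-1}.

```python
from math import exp
from statistics import NormalDist

def var_lognormal(s0, m, sigma, r, t, alpha):
    """Formula (4.9): VaR^alpha of X = e^{-rT} S(T) - S(0),
    where S(T) = S(0) * exp(m + sigma * Z) and Z is standard normal."""
    z_alpha = NormalDist().inv_cdf(alpha)     # N^{-1}(alpha)
    return s0 * (1.0 - exp(m - r * t + sigma * z_alpha))

# Hypothetical inputs: S(0) = 100, m = 0.08, sigma = 0.2, r = 0.03, T = 1.
var_5 = var_lognormal(s0=100.0, m=0.08, sigma=0.2, r=0.03, t=1.0, alpha=0.05)
var_1 = var_lognormal(s0=100.0, m=0.08, sigma=0.2, r=0.03, t=1.0, alpha=0.01)
print(round(var_5, 2), round(var_1, 2))
```

A smaller α looks further into the lower tail, so the reported VaR can only grow as α decreases.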

4.4 VaR in the Black–Scholes model


In the Black–Scholes model we have a single stock and a risk-free asset. The time zero price of the stock is S(0) > 0. The stock price at time T is given by

\[ S(T) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,Z}, \tag{4.10} \]

where µ and σ are positive real parameters, and Z is a random variable with standard normal
distribution N(0, 1). The parameter µ represents the drift and the parameter σ represents the
volatility of stock. The risk free rate is constant and equal to r > 0, with continuous com-
pounding, meaning that the time T price of the risk-free asset is

A(T ) = A(0)erT . (4.11)

A put option with strike price K which expires at time T has a payoff

(K − S (T ))+ = max(K − S (T ), 0),

and costs
\[ P(r, T, K, S(0), \sigma) = Ke^{-rT}N(-d_-) - S(0)N(-d_+), \tag{4.12} \]
where
\[ d_+ = \frac{\ln\frac{S(0)}{K} + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_- = \frac{\ln\frac{S(0)}{K} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \tag{4.13} \]
and N is the standard normal cumulative distribution function. For more details on the Black–Scholes model see [BSM].
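
The put price (4.12) with d₊ and d₋ from (4.13) can be coded directly. This is a minimal sketch using only the standard library; the function name is hypothetical, and the inputs (S(0) = 100, σ = 0.2, r = 0.03, T = 1, strikes 75, 90 and 110) are the ones used later in the chapter's numerical example, where the prices are quoted as roughly 0.406, 2.769 and 12.042.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def put_price(r, t, k, s0, sigma):
    """Black-Scholes price of a European put, formulas (4.12)-(4.13)."""
    n = NormalDist().cdf                      # standard normal CDF N
    d_plus = (log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d_minus = d_plus - sigma * sqrt(t)        # same as (4.13) with r - sigma^2/2
    return k * exp(-r * t) * n(-d_minus) - s0 * n(-d_plus)

strikes = (75.0, 90.0, 110.0)
prices = [put_price(r=0.03, t=1.0, k=k, s0=100.0, sigma=0.2) for k in strikes]
print([round(p, 3) for p in prices])
```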
Let H(t) denote the value of a put option at time t ∈ {0, T }

H(0) = P(r, T, K, S (0), σ),


H(T ) = (K − S (T ))+ . (4.14)

We start with a simple Lemma.


Lemma 4.16
For S(T) and H(T) given by (4.10) and (4.14), respectively,

\[ q^\alpha(S(T)) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,N^{-1}(\alpha)}, \tag{4.15} \]
\[ q^\alpha(-H(T)) = -\left(K - q^\alpha(S(T))\right)^+. \tag{4.16} \]

Proof
By Lemma 4.5, q^α(Z) = N^{-1}(α). Since z ↦ S(0)e^{(µ − σ²/2)T + σ√T z} is an increasing function, (4.15) follows from Lemma 4.6.
Similarly, since ζ ↦ −(K − ζ)^+ is a non-decreasing function, (4.16) follows also from Lemma 4.6.

Assume that we buy a single share of stock. The discounted gain from this investment is

G̃stock (T ) = S̃ (T ) − S (0) = e−rT S (T ) − S (0).

By Lemma 4.15 we can see that

VaRα (G̃stock (T )) = S (0) − e−rT qα (S (T )). (4.17)

We now consider an investment where at time zero we buy x shares of stock and y units of
the risk-free asset. For t ∈ {0, T }, we use V(x,y) (t) to denote the value of the portfolio at time t

V(x,y) (t) = xS (t) + yA(t),

we use Ṽ(x,y) to denote the discounted value of the portfolio

Ṽ(x,y) (t) = e−rt V(x,y) (t),

and \tilde{G}^{stock,rf}_{(x,y)}(T) to denote the discounted gain

\[ \tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T) = \tilde{V}_{(x,y)}(T) - \tilde{V}_{(x,y)}(0). \]

Lemma 4.17
If x ≥ 0 then

\[ \mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T)\big) = V_{(x,y)}(0) - xe^{-rT}q^\alpha(S(T)) - yA(0). \tag{4.18} \]

Proof
Since x ≥ 0, the gain \tilde{G}^{stock,rf}_{(x,y)}(T) can be expressed as a non-decreasing function of S(T),
\[ \tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T) = f(S(T)), \]
with
\[
\begin{aligned}
f(\zeta) &= e^{-rT}(x\zeta + yA(T)) - V_{(x,y)}(0) \\
         &= e^{-rT}x\zeta + yA(0) - V_{(x,y)}(0),
\end{aligned}
\]
hence (4.18) follows from Lemma 4.15.

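Choosing any x ∈ (0, 1) and y = (1 − x)S(0)/A(0) we can see that

\[ V_{(x,y)}(0) = xS(0) + yA(0) = S(0), \]

and, provided that VaR^α(\tilde{G}^{stock}(T)) > 0,

\[
\begin{aligned}
\mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T)\big)
&= V_{(x,y)}(0) - xe^{-rT}q^\alpha(S(T)) - yA(0) && \text{(from (4.18))} \\
&= xS(0) - xe^{-rT}q^\alpha(S(T)) && \text{(since } V_{(x,y)}(0) = xS(0) + yA(0)\text{)} \\
&= x\,\mathrm{VaR}^\alpha(\tilde{G}^{\mathrm{stock}}(T)) && \text{(from (4.17))} \\
&< \mathrm{VaR}^\alpha(\tilde{G}^{\mathrm{stock}}(T)).
\end{aligned}
\]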

This means that diversifying an investment between the stock and the risk-free asset reduces
VaR (which is hardly a surprise!).
Another natural idea to reduce VaR is to buy European put options. By doing so one
can hedge against undesirable scenarios, while leaving oneself open to the positive outcomes.
Assume that at time zero we buy x shares of stock and z put options with strike price K. The value of such an investment is

\[ V_{(x,z)}(t) = xS(t) + zH(t), \]

and the discounted gain is

\[
\begin{aligned}
\tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T) &= \tilde{V}_{(x,z)}(T) - \tilde{V}_{(x,z)}(0) \\
&= e^{-rT}\big(xS(T) + z(K - S(T))^+\big) - V_{(x,z)}(0).
\end{aligned}
\]


Lemma 4.18
If 0 < z < x then

\[ \mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T)\big) = V_{(x,z)}(0) - e^{-rT}\big(xq^\alpha(S(T)) + z(K - q^\alpha(S(T)))^+\big). \tag{4.19} \]

Proof
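Since 0 < z < x, we see that \tilde{G}^{stock,put}_{(x,z)}(T) can be expressed as a non-decreasing function of S(T),
\[ \tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T) = f(S(T)), \]
with
\[ f(\zeta) = e^{-rT}\big(x\zeta + z(K - \zeta)^+\big) - V_{(x,z)}(0). \]

By Lemma 4.15

\[
\begin{aligned}
\mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T)\big) &= -f(q^\alpha(S(T))) \\
&= e^{-rT}\big(-xq^\alpha(S(T)) - z(K - q^\alpha(S(T)))^+\big) + V_{(x,z)}(0),
\end{aligned}
\]

which combined with (4.16) gives (4.19).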

Example 4.19
Assume that we want to invest V_0 at time zero and buy x shares of stock. In order to have V_{(x,z)}(0) = V_0 we need to buy

\[ z = z(K) = \frac{V_0 - xS(0)}{P(r, T, K, S(0), \sigma)} \]
put options (this equals (1 − x)V_0/P(r, T, K, S(0), σ) when V_0 = S(0), as in Figure 4.4). Depending on the choice of the strike price K we obtain different values of

\[ \mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,put}}_{(x,z(K))}(T)\big) = V_0 - e^{-rT}\big(xq^\alpha(S(T)) + z(K)(K - q^\alpha(S(T)))^+\big) \]

(see Figure 4.4). The choice of a high strike price makes the term (K − q^α(S(T)))^+ large, but since options with a high strike price are expensive, their number z(K) is small. On the other hand, if we choose a low strike price, then we can buy a larger number of options z(K), but each offers weaker protection (K − q^α(S(T)))^+. An optimal choice of the strike price K lies somewhere between these extremes (see Figure 4.4 and Excel).
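
The trade-off described in Example 4.19 can be explored with a short scan over strike prices. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the number of puts is taken as z(K) = (V_0 − xS(0))/P(r, T, K, S(0), σ), the parameters are those listed in the caption of Figure 4.4, and strikes whose puts are so cheap that z(K) would exceed x are rejected because Lemma 4.18 does not cover them.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def put_price(r, t, k, s0, sigma):
    """Black-Scholes put price, formulas (4.12)-(4.13)."""
    d_plus = (log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d_minus = d_plus - sigma * sqrt(t)
    return k * exp(-r * t) * STD_NORMAL.cdf(-d_minus) - s0 * STD_NORMAL.cdf(-d_plus)

def stock_quantile(s0, mu, sigma, t, alpha):
    """Upper alpha-quantile of S(T), formula (4.15)."""
    return s0 * exp((mu - 0.5 * sigma**2) * t
                    + sigma * sqrt(t) * STD_NORMAL.inv_cdf(alpha))

def var_stock_put(k, x, v0, s0, mu, sigma, r, t, alpha):
    """VaR (4.19) for x shares plus z(K) puts bought with the remaining budget."""
    z = (v0 - x * s0) / put_price(r, t, k, s0, sigma)
    if not 0.0 <= z <= x:
        raise ValueError("need 0 <= z <= x; this strike is too cheap for the budget")
    q = stock_quantile(s0, mu, sigma, t, alpha)
    return v0 - exp(-r * t) * (x * q + z * max(k - q, 0.0))

# Parameters from the caption of Figure 4.4.
params = dict(x=0.99, v0=100.0, s0=100.0, mu=0.1, sigma=0.2, r=0.03, t=1.0, alpha=0.05)

for strike in (85.0, 90.0, 95.0, 100.0, 110.0):   # illustrative grid of strikes
    print(strike, round(var_stock_put(strike, **params), 2))
```

A finer grid of strikes would trace out a curve of the kind shown in Figure 4.4; the coarse grid here is only meant to show the mechanics.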

Usually we do not have full freedom of choice for the strike price of a put option and need
to choose between options which are available on the market. Let us assume that we can invest
in n put options with strike prices K1 , . . . , Kn and maturities T. We denote by Hi (t) the value of
a put option with strike price Ki ; in particular
Hi (0) = P(r, T, Ki , S (0), σ),
Hi (T ) = (Ki − S (T ))+ .
Assume that we buy x shares of stock and zi put options with strike price Ki , for i = 1, . . . , n.
Assume also that we buy y units of the risk free asset A. Let z, 1 and H(t) for t = 0, T be vectors in R^n defined as
\[
z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}, \qquad
\mathbf{1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad
H(t) = \begin{pmatrix} H_1(t) \\ \vdots \\ H_n(t) \end{pmatrix}.
\]

Figure 4.4 VaR^{5%}(\tilde{G}^{stock,put}_{(x,z(K))}(T)) for different choices of K, for parameters V_0 = S(0) = 100, µ = 0.1, σ = 0.2, r = 0.03, T = 1 and x = 0.99.

The value of our investment at time t is

\[ V_{(x,y,z)}(t) = xS(t) + yA(t) + z^T H(t). \]

We show how to compute VaR^α for

\[ \tilde{G}^{\mathrm{stock,rf,puts}}_{(x,y,z)}(T) = \tilde{V}_{(x,y,z)}(T) - \tilde{V}_{(x,y,z)}(0). \]

Proposition 4.20
If z_i ≥ 0, for i = 1, …, n, and \sum_{i=1}^{n} z_i = z^T 1 ≤ x, then

\[ \mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,rf,puts}}_{(x,y,z)}(T)\big) = V_{(x,y,z)}(0) - e^{-rT}\big(xq^\alpha(S(T)) + yA(T) - z^T q^\alpha(-H(T))\big), \tag{4.20} \]

where
\[
q^\alpha(-H(T)) = -\begin{pmatrix} (K_1 - q^\alpha(S(T)))^+ \\ \vdots \\ (K_n - q^\alpha(S(T)))^+ \end{pmatrix}.
\tag{4.21}
\]

Proof
The formula (4.21) follows from Lemma 4.16.
Since z^T 1 ≤ x, the function
\[ \zeta \mapsto e^{-rT}\Big(x\zeta + yA(T) + \sum_{i=1}^{n} z_i(K_i - \zeta)^+\Big) - V_{(x,y,z)}(0) \]

is non-decreasing, which by Lemma 4.6 implies that

\[ \mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,rf,puts}}_{(x,y,z)}(T)\big) = V_{(x,y,z)}(0) - e^{-rT}\Big(xq^\alpha(S(T)) + yA(T) + \sum_{i=1}^{n} z_i(K_i - q^\alpha(S(T)))^+\Big), \]

which gives (4.20).

From now on we shall assume that x and y are fixed and investigate how to minimize VaR^α(\tilde{G}^{stock,rf,puts}_{(x,y,z)}(T)) by choosing z. We assume that we have V_0 at our disposal for investing and hedging. This means that we spend

\[ c = V_0 - xS(0) - yA(0) \]

on put options. We assume that we do not take short positions in stock or puts, and that the number of options does not exceed the number of shares of stock in our portfolio. These restrictions are imposed by common sense. Later in this chapter we produce an example of what might happen if these are violated. Under such assumptions, by (4.20), minimizing VaR^α(\tilde{G}^{stock,rf,puts}_{(x,y,z)}(T)) is equivalent to the following problem:

\[
\begin{aligned}
\min\quad & z^T q^\alpha(-H(T)) \\
\text{subject to:}\quad & z^T H(0) = c, \\
& z^T \mathbf{1} \le x, \\
& z_1, \dots, z_n \ge 0.
\end{aligned}
\tag{4.22}
\]

Since H(0) and q^α(−H(T)) are fixed vectors in R^n, (4.22) is a typical linear programming problem, which can be solved numerically.

Example 4.21
Consider the Black–Scholes model with parameters S(0) = 100, µ = 0.1, σ = 0.2 and r = 0.03. Assume that we want to invest V_0 = 1000 in stock and put options with strike prices K_1 = 75, K_2 = 90, K_3 = 110 with expiry T = 1. We shall solve the problem (4.22) for α = 0.05, taking y = 0 and considering c = 0, 10, 30, 50 and 80.
We compute the prices of the put options using (4.12)
\[ H(0) = \begin{pmatrix} 0.406 \\ 2.769 \\ 12.042 \end{pmatrix}. \]

Using the fact that N^{-1}(0.05) = −1.645 we compute

\[ q^\alpha(S(T)) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,N^{-1}(\alpha)} = 77.96 \]

and
\[ q^\alpha(-H(T)) = \begin{pmatrix} 0 \\ -12.04 \\ -32.04 \end{pmatrix}. \]

The numerical solutions of (4.22) are given in Table 4.1.

  c     x      z1     z2     z3     VaR^α
  0    10.0   0.00   0.00   0.00   243.44
 10     9.9   0.00   3.61   0.00   208.81
 30     9.7   0.00   9.36   0.34   146.23
 50     9.5   0.00   6.95   2.55   120.68
 80     9.2   0.00   3.32   5.88    82.35

Table 4.1 VaR^α for various hedging expenditures from Example 4.21.
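
Problem (4.22) has only one equality and one inequality constraint besides non-negativity, so for a handful of strikes it can be solved without an LP library. The sketch below is not a general LP solver: it enumerates the vertices of the feasible set (at most two non-zero z_i each) and picks the best one. The function names are hypothetical, and the inputs are the rounded values of H(0), q^α(−H(T)) and q^α(S(T)) quoted in Example 4.21.

```python
from itertools import combinations
from math import exp

def solve_hedge_lp(q_neg_h, h0, c, x, tol=1e-9):
    """Minimise z.q_neg_h subject to z.h0 = c, sum(z) <= x, z >= 0
    by brute-force enumeration of the vertices of the feasible set."""
    n = len(h0)
    candidates = []
    # Vertices with one non-zero component: the whole budget c on strike i.
    for i in range(n):
        z = [0.0] * n
        z[i] = c / h0[i]
        candidates.append(z)
    # Vertices with two non-zero components and sum(z) = x binding.
    for i, j in combinations(range(n), 2):
        det = h0[j] - h0[i]
        if abs(det) < tol:
            continue
        z = [0.0] * n
        z[j] = (c - h0[i] * x) / det
        z[i] = x - z[j]
        candidates.append(z)
    feasible = [z for z in candidates if min(z) >= -tol and sum(z) <= x + tol]
    if not feasible:
        raise ValueError("no feasible hedge for this c and x")
    return min(feasible, key=lambda z: sum(zi * qi for zi, qi in zip(z, q_neg_h)))

def var_hedged(z, q_neg_h, x, v0, q_stock, r, t):
    """Formula (4.20) with y = 0."""
    hedge = -sum(zi * qi for zi, qi in zip(z, q_neg_h))
    return v0 - exp(-r * t) * (x * q_stock + hedge)

h0 = [0.406, 2.769, 12.042]        # put prices H(0) quoted in Example 4.21
q_neg_h = [0.0, -12.04, -32.04]    # q^alpha(-H(T)) quoted in Example 4.21
v0, s0, q_stock, r, t = 1000.0, 100.0, 77.96, 0.03, 1.0

for c in (0.0, 10.0, 30.0):
    x = (v0 - c) / s0              # shares bought with what is left after hedging
    z = solve_hedge_lp(q_neg_h, h0, c, x)
    print(c, x, [round(zi, 2) for zi in z],
          round(var_hedged(z, q_neg_h, x, v0, q_stock, r, t), 2))
```

Because the inputs are rounded, the output should agree with the corresponding rows of Table 4.1 only up to small rounding differences. For larger problems a proper LP solver would be the natural choice.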

Evidently it does not make sense to buy put options with strike prices below q^α(S(T)). Looking at Table 4.1 we can see that when c is small we buy options which are cheaper. When c is large we can afford to spend money on options with a higher strike price, which offer better protection. A full picture is obtained when we look not only at VaR, but also at the distribution of X in Figure 4.5.
In the formulation of (4.22) we have added constraints that we do not take short positions
in puts, and that we do not buy more puts than stocks. We finish the section by demonstrating
that exercising such common sense is often necessary when dealing with VaR.

Figure 4.5 The gain X = \tilde{G}^{stock,rf,puts}_{(x,y,z)}(T) from Example 4.21 for various levels of c (left), and its distribution function (right).

Example 4.22
Consider the data from Example 4.21. Suppose that we want to invest V_0 = 1000 and decide to buy x = 20 shares of stock and hedge them with z_3 = 20 put options with strike price K_3. Clearly V_0 does not provide enough funds to enter such a position. We decide to finance our strategy by taking a short position in put options with strike price K_1,
\[ z_1 = \frac{1}{H_1(0)}\big(V_0 - xS(0) - z_3 H_3(0)\big) = -3056. \]
Clearly our strategy is not a good idea. Common sense dictates that the short position in unhedged puts will be catastrophic if S(T) < K_1. Since the probability of this is small,

\[ P(S(T) < K_1) < P(S(T) \le q^\alpha(S(T))) = \alpha, \]

such a scenario is ignored in the computation of VaR and we obtain (see Figure 4.6)

\[ \mathrm{VaR}^\alpha\big(\tilde{G}^{\mathrm{stock,rf,puts}}_{(x,0,z)}(T)\big) = -1135, \]

which can lull us into a false sense of security, again illustrating the most serious shortcoming of VaR as a risk measure.
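
The two numbers in Example 4.22 can be reproduced by hand from the data of Example 4.21 (H_1(0) = 0.406, H_3(0) = 12.042, K_1 = 75, K_3 = 110, r = 0.03, T = 1). The worked arithmetic below is a sketch of that check; it relies on the observation that for 75 ≤ S(T) ≤ 110 the short puts expire worthless and the stock together with the long puts pays a constant amount.

```latex
\begin{aligned}
z_1 &= \frac{1000 - 20 \cdot 100 - 20 \cdot 12.042}{0.406}
     = \frac{-1240.84}{0.406} \approx -3056, \\
xS(T) + z_3 (K_3 - S(T))^+ &= 20\,S(T) + 20\,(110 - S(T)) = 2200
     \qquad \text{for } 75 \le S(T) \le 110, \\
\mathrm{VaR}^{\alpha} &= V_0 - e^{-rT} \cdot 2200
     = 1000 - e^{-0.03} \cdot 2200 \approx 1000 - 2134.98 \approx -1135 .
\end{aligned}
```

Since q^α(S(T)) = 77.96 lies inside the interval [75, 110] and P(S(T) < 75) < α, the upper α-quantile of the gain is this flat value, which is why the large losses below K_1 leave the VaR untouched.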

In Figure 4.7 we see that the strategy from Example 4.22 can suffer huge losses if stock prices move against us. In Figure 4.6 we can see that with probability of 2% we will suffer a severe loss. The conclusion is that looking at the loss at a given confidence level is not enough; we also need to look at the expected loss.

Figure 4.6 The cumulative distribution function for the gain X = \tilde{G}^{stock,rf,puts}_{(x,0,z)}(T) for the strategy from Example 4.22. The full picture on the left, and a close-up on the right.

Figure 4.7 The gain X = \tilde{G}^{stock,rf,puts}_{(x,0,z)}(T) for the strategy from Example 4.22.
