ST339 - 19-03 Notes
I would like to thank the students and teaching assistants Shiyao Bian, Nikolaos Constantinou, Chester
Gan, Alia Hajji, Scott Hamilton, Kairav Hirani, Nazem Khan, Kevin Lam, Rahul Mathur, Noah Prasad,
Anthony Shaffu, Osian Shelley, Haodong Sun, Anastasiya Tsyhanova, and Ben Windsor for spotting typos in
previous versions. Moreover, I am grateful to my colleague Sebastian Herrmann for fruitful discussions on the
material. Any remaining mistakes and errors are of course mine.
Contents
1 Introduction and Preliminaries
1.1 What is Mathematical Finance?
1.2 A primer on financial assets
1.3 Fundamental concepts of Probability Theory
4 Utility Theory
4.1 Measure theoretic preliminaries
4.2 Preferences on lotteries
4.3 Von Neumann-Morgenstern representation
4.4 Concave functions and Jensen’s inequality
4.5 Expected utility representation
4.6 Measuring risk aversion
4.7 A primer on utility maximisation
1 Introduction and Preliminaries
In this chapter, we briefly discuss what Mathematical Finance is about as a subject, give a
short introduction to financial assets, and then review some key concepts from Probability
Theory.
Another main goal is to understand how to invest optimally in a financial market (e.g. stocks
traded on the London Stock Exchange) taking into account
• preferences of investors.
This module aims to give basic answers to the above questions. Along the way, we also need
to make concepts like risk or preferences of investors mathematically precise.
This module focusses on key concepts and the probability side of Mathematical Finance.
This means that we assume throughout that all stochastic parameters (like the distribution
of assets) are known. In reality, of course, all parameters must be estimated from data. This
is a very interesting and challenging topic of its own and usually referred to as Financial
Econometrics. Moreover, we will not address computational aspects of Mathematical Finance,
which are crucial for the financial industry. This is again a very interesting and challenging
topic of its own and usually referred to as Computational Finance. Finally, we will not address
any questions related to machine learning, which is of course becoming increasingly important
in the financial industry.
One main distinction between financial assets is whether they are equity or debt. An equity
security entitles the holder to a share of the profits of a business, usually paid out as dividends.
The prime example is stocks, which are often traded at large exchanges such as the London
Stock Exchange (LSE) or the New York Stock Exchange (NYSE). By contrast, a debt security
entitles the holder to a fixed, predefined payment stream from a counterparty. Important
examples are corporate or government bonds. They are often also traded at large exchanges
like the LSE or NYSE. Usually equity is considered more “risky” than debt, because the dividends
of a business are uncertain whereas the payments of a bond are fixed in advance. Notwithstanding, bonds
are not “riskless” because businesses (and also countries) may default.
Another main distinction between financial assets is whether they are primary or derivative.
The payment structure of a derivative security (or just derivative) depends on some
other, more basic, underlying variable or financial asset, whereas the payment structure of a
primary asset does not. The prime examples of primary assets are stocks and bonds. The
most important examples of derivative assets are options, futures and swaps.² They are either
traded at large exchanges or over the counter (OTC). Derivatives can either reduce or magnify
risk, depending on how they are used.
If you buy a financial asset you pay today a certain amount of money to get the asset. You
are then said to have a long position/to be long in the asset. By contrast, if you (short-)sell a
financial asset that you do not own, you get some money today and have to deliver the asset in
the future. You are then said to have a short position/to be short in the asset. More complex
portfolios, held e.g. by hedge funds, often involve a mixture of both long and short positions.
Probability spaces
A sample space Ω is a (finite or infinite) set. Each ω ∈ Ω describes a possible “state of the
world”.
Given a sample space Ω, a σ-algebra F on Ω is a collection of subsets of Ω such that
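(a) $\Omega \in \mathcal{F}$;
(b) $A \in \mathcal{F} \implies A^c = \Omega \setminus A \in \mathcal{F}$;
(c) $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$.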
The pair (Ω, F) is called a measurable space and each A ∈ F is called an F-measurable event.
² For an excellent introduction to derivative securities, we refer to the textbook by Hull [4].
Example 1.1. (a) If Ω is finite (or countable), i.e., $\Omega = \{\omega_1, \ldots, \omega_N\}$ (or $\Omega = \{\omega_1, \omega_2, \ldots\}$),
then the usual choice for a σ-algebra on Ω is $\mathcal{F} := 2^\Omega := \{A : A \subset \Omega\}$, the power set of Ω.
(b) If $\Omega = \mathbb{R}$, it turns out that the power set $2^\Omega$ is “too big” to be chosen as σ-algebra.³ For
this reason, one chooses instead $\mathcal{F} := \mathcal{B}_{\mathbb{R}}$, the Borel σ-algebra on $\mathbb{R}$, i.e., the smallest σ-algebra
on $\mathbb{R}$ that contains all intervals of the form (a, b), (a, b], [a, b) and [a, b] for −∞ < a < b < ∞.
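For a finite sample space, the three defining properties of a σ-algebra can be checked mechanically. The following is a minimal sketch, assuming a finite Ω so that countable unions reduce to finite ones; the helper names `power_set` and `is_sigma_algebra` are illustrative and not part of these notes.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return all subsets of a finite set as a set of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(omega, family):
    """Check properties (a)-(c) for a family of subsets of a finite omega.

    On a finite sample space a countable union is a finite union,
    so closure under pairwise unions is enough for (c).
    """
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:                                # (a)
        return False
    if any(omega - a not in family for a in family):       # (b)
        return False
    return all(a | b in family for a in family for b in family)  # (c)

omega = {1, 2, 3}
print(is_sigma_algebra(omega, power_set(omega)))              # True: the power set
print(is_sigma_algebra(omega, [set(), {1}, {2, 3}, omega]))   # True: generated by {1}
print(is_sigma_algebra(omega, [set(), {1}, omega]))           # False: complement of {1} is missing
```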
Given a measurable space (Ω, F), a probability measure P on (Ω, F) is a map F → [0, 1]
such that
Random variables
{X ∈ B} := {ω ∈ Ω : X(ω) ∈ B} ∈ F.
We then also say that X is F-measurable. Whereas this definition is nice from a theoretical
perspective, it is almost useless for checking in practice that a given function X : Ω → R is
F-measurable. But fortunately, one can show that it suffices to check this condition only for
Borel sets B of the form B = (−∞, x] for x ∈ R, i.e., X is F-measurable if and only if for all
x ∈ R,
{X ≤ x} := {ω ∈ Ω : X(ω) ≤ x} ∈ F.
If (Ω, F) = (R, BR ), a random variable is often also called a measurable function and
denoted by f or g, etc. Note that all continuous functions on R are measurable.
The definition of a random variable does not mention any probability measure P at all.
However, if we add a probability measure P to a measurable space (Ω, F), we get some further
notions: For $B \in \mathcal{B}_{\mathbb{R}}$ we say that X ∈ B P-almost surely (P-a.s.) if P[X ∈ B] = 1. The
distribution (or law) of X is defined by
\[ P^X[B] := P[X \in B]. \]
One can check that $P^X$ is again a probability measure, now on the measurable space $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.
³ See [3, Theorem 1.5] for a precise formulation of this statement.
A random variable X is called discrete if there exists a finite or countable set $B \in \mathcal{B}_{\mathbb{R}}$
such that $P^X[B] = P[X \in B] = 1$. In this case, we define the probability mass function
(pmf) $p_X : \mathbb{R} \to [0, 1]$ by
\[ p_X(x) := P[X = x]. \]
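Example 1.2. Let (Ω, F, P) be a probability space and A ∈ F an F-measurable event. Then
the indicator function $1_A$ defined by
\[ 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \in A^c, \end{cases} \]
is a discrete random variable.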
Remark 1.3. If Ω is finite or countable, then every random variable on (Ω, F) is discrete.
Remark 1.4. Note that there are many random variables which are neither discrete nor
continuous.
Expectation
For a random variable X on a probability space (Ω, F, P ), the expectation of X under P (also
called the integral of X with respect to P ), is defined as follows:
• If X is nonnegative, writing $X = \lim_{n\to\infty} X_n$, where $(X_n)_{n\in\mathbb{N}}$ is a nondecreasing
sequence of nonnegative simple random variables, one sets
\[ E^P[X] := \lim_{n\to\infty} E^P[X_n]. \]
• If X is general, one says that X is integrable or has finite expectation if $E^P[|X|] < \infty$.
In this case one sets
\[ E^P[X] := E^P[X^+] - E^P[X^-], \]
where $X^+ = \max\{0, X\}$ denotes the positive part and $X^- = \max\{0, -X\}$ denotes the
negative part of X.
\[ E[X] = \sum_{n=1}^{N} X(\omega_n) P[\{\omega_n\}]. \]
\[ X(\omega) = \sum_{n=1}^{N} c_n 1_{A_n}(\omega). \]
\[ E[X] = \sum_{n=1}^{N} c_n P[A_n] = \sum_{n=1}^{N} X(\omega_n) P[\{\omega_n\}]. \]
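On a finite sample space the expectation is therefore just a weighted sum over states. A minimal sketch of this formula, assuming a fair die as a hypothetical example (the die and the function name `expectation` are illustrative, not taken from these notes):

```python
from fractions import Fraction

def expectation(X, P):
    """E[X] = sum over omega of X(omega) * P[{omega}] on a finite sample space."""
    return sum(X[w] * P[w] for w in P)

# Hypothetical example: a fair die and the indicator of the event A = {5, 6}.
P = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: w for w in P}                              # X(omega) = omega
ind_A = {w: 1 if w in {5, 6} else 0 for w in P}    # indicator function of A

print(expectation(X, P))       # 7/2
print(expectation(ind_A, P))   # 1/3, which equals P[A]
```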
The following result lists two important properties of the expectation operator.
Lemma 1.6. Let X and Y be integrable random variables on some probability space (Ω, F, P ).
Moreover, the inequality in (1.1) is an equality if and only if X = Y P -a.s.
Property (a) is referred to as linearity of the expectation and property (b) as monotonicity
of the expectation.
The following result is very useful for calculating expectations (of functions of) discrete or
continuous random variables.
⁷ If X is discrete, then h(X) is again discrete. However, if X is continuous, h(X) need not be continuous.
2 No-Arbitrage and the Fundamental Theorem of Asset Pricing
In this chapter, we develop a mathematical model for financial markets in one period, introduce
the key concept of no-arbitrage, and formulate and prove the so-called Fundamental Theorem
of Asset Pricing on the absence of arbitrage in this setting.
and call the $\mathbb{R}^d$-valued stochastic process $S = (S_t)_{t\in\{0,1\}}$ the risky assets.¹²
Example 2.1 (One-period Binomial model). Assume that d = 1, i.e., there is only one risky
asset, and there are only two states of the world at time 1, i.e., $\Omega = \{\omega_1, \omega_2\}$. We assume that
$S^1_0 = 1$ and
\[ S^1_1(\omega_1) = 1 + u \quad\text{and}\quad S^1_1(\omega_2) = 1 + d, \]
⁸ Note that some derivative assets such as swaps have zero initial value.
⁹ Government bonds are usually considered to be “riskless” in reality, in particular German or Swiss government
bonds. Notwithstanding, the “bank account” is a somewhat fictitious security, in particular in continuous
time where it denotes the “rollover” of very short term “riskless” bonds.
¹⁰ Before the financial crisis of 2008, interest rates tended to be positive (or at least nonnegative).
¹¹ Note that $S_0$ is an $\mathbb{R}^d$-valued vector and $S_1$ is an $\mathbb{R}^d$-valued random vector. Likewise, $\bar S_0$ is an $\mathbb{R}^{1+d}$-valued
vector and $\bar S_1$ is an $\mathbb{R}^{1+d}$-valued random vector.
¹² $\mathbb{R}^d$-valued stochastic process here means the collection of the two $\mathbb{R}^d$-valued random vectors $S_0$ and $S_1$
(where $S_0(\omega) := S_0$).
where u > d > −1. Here, u and d are mnemonics for “up” and “down”, and it is often assumed
that u > 0. The probabilities for “up” and “down” are given by
\[ P[\{\omega_1\}] = p_1 \quad\text{and}\quad P[\{\omega_2\}] = p_2, \]
where $p_1, p_2 \in (0, 1)$ and $p_1 + p_2 = 1$. One can nicely illustrate this model by the following
trees, where the numbers beside the branches denote probabilities:
S^0:  1 --(1)--> 1 + r
S^1:  1 --(p1)--> 1 + u
      1 --(p2)--> 1 + d
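where $\vartheta^i$ denotes the number of shares held in asset i. The price today for buying the trading
strategy/portfolio $\bar\vartheta$ is
\[ \bar\vartheta \cdot \bar S_0 = \sum_{i=0}^{d} \vartheta^i S^i_0 = \vartheta^0 + \vartheta \cdot S_0, \]
and its value at time 1 is
\[ \bar\vartheta \cdot \bar S_1(\omega) = \sum_{i=0}^{d} \vartheta^i S^i_1(\omega) = \vartheta^0 (1 + r) + \vartheta \cdot S_1(\omega). \]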
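Definition 2.2. A trading strategy $\bar\vartheta \in \mathbb{R}^{1+d}$ is called an arbitrage opportunity for $\bar S$ if
\[ \bar\vartheta \cdot \bar S_0 \le 0, \qquad \bar\vartheta \cdot \bar S_1 \ge 0 \ P\text{-a.s.} \qquad\text{and}\qquad P[\bar\vartheta \cdot \bar S_1 > 0] > 0. \]
The financial market $\bar S$ is called arbitrage-free if there are no arbitrage opportunities. In this
case one also says that $\bar S$ satisfies NA.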
An arbitrage opportunity gives something (a positive chance of strictly positive final wealth
P [ϑ·S 1 > 0] > 0) out of nothing (zero or negative initial wealth ϑ·S 0 ≤ 0) without risk (almost
sure nonnegative final wealth ϑ · S 1 ≥ 0 P -a.s.).
In well-functioning real financial markets, arbitrage opportunities do not exist for long
because they are immediately exploited by so-called arbitrageurs (often hedge funds) and dis-
appear. The reason that arbitrage opportunities disappear is that in real financial markets
(unlike in our textbook setting) prices adjust to the trading activities of the market par-
ticipants, i.e., prices of assets in high demand rise and prices of assets in low demand fall.
Therefore, it is reasonable to assume that financial markets are arbitrage-free, which is indeed
a key assumption of Mathematical Finance.
Example 2.3 (An arbitrage opportunity). Consider the financial market given by the following
trees, where the numbers beside the branches denote probabilities:
S^0:  1   --(1)-->   1.01
S^1:  200 --(0.8)--> 202
      200 --(0.1)--> 200
      200 --(0.1)--> 198
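We claim that the market $\bar S = (S^0_t, S^1_t)_{t\in\{0,1\}}$ admits arbitrage. Indeed, in every state of the
world the riskless asset has the same or a higher return than the risky asset. So we
short the risky asset and use the proceeds to buy the riskless asset. Mathematically, we set
\[ \bar\vartheta := (\vartheta^0, \vartheta^1) = (200, -1). \]
Then at time 0,
\[ \bar\vartheta \cdot \bar S_0 = 200 \times 1 + (-1) \times 200 = 0, \]
and at time 1,
\[ \bar\vartheta \cdot \bar S_1(\omega) = \begin{cases} 200 \times 1.01 + (-1) \times 202 = 0 & \text{if } \omega = \omega_1, \\ 200 \times 1.01 + (-1) \times 200 = 2 > 0 & \text{if } \omega = \omega_2, \\ 200 \times 1.01 + (-1) \times 198 = 4 > 0 & \text{if } \omega = \omega_3. \end{cases} \]
Thus,
\[ P[\bar\vartheta \cdot \bar S_1 \ge 0] = 1 \quad\text{and}\quad P[\bar\vartheta \cdot \bar S_1 > 0] = 0.2, \]
so $\bar\vartheta$ is an arbitrage opportunity.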
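The computation in Example 2.3 can be replayed numerically. The following is a minimal sketch, assuming a finite sample space and the three conditions described in the text (nonpositive initial cost, almost surely nonnegative final wealth, positive chance of strictly positive final wealth); the function name `is_arbitrage` and the state labels are illustrative.

```python
from fractions import Fraction

def is_arbitrage(theta, S0, S1, P):
    """Check the arbitrage conditions on a finite sample space.

    theta, S0: lists of length 1 + d (riskless asset first).
    S1: dict mapping each state to the list of time-1 prices.
    P: dict mapping each state to its probability.
    """
    cost = sum(t * s for t, s in zip(theta, S0))
    payoff = {w: sum(t * s for t, s in zip(theta, S1[w])) for w in P}
    no_loss = all(payoff[w] >= 0 for w in P if P[w] > 0)
    chance_of_gain = sum(P[w] for w in P if payoff[w] > 0) > 0
    return cost <= 0 and no_loss and chance_of_gain

# Data of Example 2.3 (state labels are illustrative).
r = Fraction(1, 100)
S0 = [1, 200]
S1 = {"w1": [1 + r, 202], "w2": [1 + r, 200], "w3": [1 + r, 198]}
P = {"w1": Fraction(8, 10), "w2": Fraction(1, 10), "w3": Fraction(1, 10)}

print(is_arbitrage([200, -1], S0, S1, P))   # True: long the bank account, short one share
print(is_arbitrage([-200, 1], S0, S1, P))   # False: the reverse position can lose money
```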
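Remark 2.4. If the market $\bar S$ admits arbitrage, there always exists an arbitrage opportunity $\bar\vartheta$
with $\bar\vartheta \cdot \bar S_0 = 0$. Indeed, if $\bar\eta = (\eta^0, \eta)$ is an arbitrage opportunity with $\bar\eta \cdot \bar S_0 < 0$, set
$\bar\vartheta := (\vartheta^0, \vartheta) = (\eta^0 - \bar\eta \cdot \bar S_0, \eta)$. Then
\[ \bar\vartheta \cdot \bar S_0 = \vartheta^0 + \vartheta \cdot S_0 = \eta^0 - \bar\eta \cdot \bar S_0 + \eta \cdot S_0 = \eta^0 - \eta^0 - \eta \cdot S_0 + \eta \cdot S_0 = 0. \]
Moreover, as $-\bar\eta \cdot \bar S_0 > 0$,
\[ \begin{aligned} \bar\vartheta \cdot \bar S_1 &= \vartheta^0 (1 + r) + \vartheta \cdot S_1 = \eta^0 (1 + r) + (-\bar\eta \cdot \bar S_0)(1 + r) + \eta \cdot S_1 \\ &= \bar\eta \cdot \bar S_1 + (-\bar\eta \cdot \bar S_0)(1 + r) \ge (-\bar\eta \cdot \bar S_0)(1 + r) > 0 \quad P\text{-a.s.} \end{aligned} \]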
2.3 Discounting
Our next aim is to give a necessary and sufficient condition on the market S to be arbitrage-
free. To this end, we need to introduce two further concepts.
The first concept is the notion of discounting. Assets are denoted in units of something,
e.g. GBP or EUR. Notwithstanding, it is clear that prices (and values) are relative. So basic
concepts of financial markets (like being arbitrage-free) should not and do not depend on the
choice of unit. For this reason, we are free to change the unit, in particular if this makes the
mathematics simpler. It turns out that a good choice is a unit which itself is a traded asset,
and the canonical choice is to use the risk-free asset S 0 . So we discount with S 0 or take S 0 as
numéraire, and define the discounted assets $X^0, X^1, \ldots, X^d$ by
\[ X^i_t = \frac{S^i_t}{S^0_t}, \quad t \in \{0, 1\},\ i \in \{0, 1, \ldots, d\}. \]
Then $X^0 \equiv 1$ and $X = (X^1, \ldots, X^d)$ expresses the value of the risky assets in units of the
numéraire $S^0$.
¹⁴ Here, we have implicitly assumed that $\Omega = \{\omega_1, \omega_2, \omega_3\}$, $S^1_1(\omega_1) = 202$, $S^1_1(\omega_2) = 200$ and $S^1_1(\omega_3) = 198$,
$\mathcal{F} = 2^\Omega$ and $P[\{\omega_1\}] = 0.8$, $P[\{\omega_2\}] = 0.1$, and $P[\{\omega_3\}] = 0.1$.
Remark 2.5. The economic reason for taking a traded asset (as opposed to a standard
currency like GBP) as numéraire is that a standard currency does not reflect the time value
of money: one pound today is not the same as one pound in a year or a pound in 100 years;
think of the (gigantic) value of one pound in the past. By discounting we make prices at
different times comparable.
The mathematical reason for taking a traded asset as numéraire is that it allows us to reduce
the dimension of the market from 1 + d to d.
Example 2.6. Consider the one-period Binomial model from Example 2.1. Then the discounted
risky asset $X^1$ is given by $X^1_0 = 1$ and
\[ X^1_1(\omega_1) = \frac{1 + u}{1 + r} \quad\text{and}\quad X^1_1(\omega_2) = \frac{1 + d}{1 + r}. \]
We can reformulate the notion of arbitrage in terms of the discounted risky assets X only.
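(b) The discounted risky assets X satisfy NA, i.e., there does not exist an arbitrage opportunity
$\vartheta = (\vartheta^1, \ldots, \vartheta^d) \in \mathbb{R}^d$ for X such that
\[ \vartheta \cdot (X_1 - X_0) \ge 0 \ P\text{-a.s.} \quad\text{and}\quad P[\vartheta \cdot (X_1 - X_0) > 0] > 0. \tag{2.1} \]
\[ \bar\vartheta \cdot \bar X_0 = \vartheta^0 X^0_0 + \vartheta \cdot X_0 = \vartheta^0 + \vartheta \cdot X_0 = -\vartheta \cdot X_0 + \vartheta \cdot X_0 = 0. \tag{2.2} \]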
Next, as $X^0_1 - X^0_0 = 1 - 1 = 0$, we note that (2.1) is equivalent to
Now using that inequalities remain unchanged by multiplying by positive constants (here $S^0_1$),
we obtain
\[ \bar\vartheta \cdot \bar S_1 \ge 0 \ P\text{-a.s.} \quad\text{and}\quad P[\bar\vartheta \cdot \bar S_1 > 0] > 0. \]
This together with (2.3) shows that $\bar\vartheta$ is an arbitrage opportunity for $\bar S$, in contradiction to
the hypothesis that $\bar S$ satisfies NA.
Definition 2.8. Let (Ω, F) be a measurable space. Two probability measures P and Q on
(Ω, F) are called equivalent (notation: P ≈ Q) if, for A ∈ F, Q[A] = 0 if and only if P [A] = 0.
Two probability measures are equivalent if they agree on which events will not happen,
i.e., have probability zero. But they may still assign different probabilities to events that
might happen.
Remark 2.11. The terminology equivalent martingale measure stems from the fact that the
$X^i$’s are martingales under the equivalent measure Q. Martingales will be studied in some
detail in Chapter 6.
Alternatively, Q is also often called a risk-neutral measure.
2.5 The Fundamental Theorem of Asset Pricing
We now have all the tools to state and prove the so-called Fundamental Theorem of Asset
Pricing, giving necessary and sufficient conditions for the absence of arbitrage. For multiperiod
models, this was only established in 1990 by Dalang, Morton, and Willinger. For this reason,
it is sometimes also referred to as the Dalang-Morton-Willinger theorem.
Theorem 2.12 (Fundamental Theorem of Asset Pricing). Let $\bar S = (S^0_t, S_t)_{t\in\{0,1\}}$ be a one-period
financial market on some probability space (Ω, F, P). The following are equivalent:
(a) The market $\bar S$ satisfies NA.
(b) There exists an EMM for the discounted risky assets $X = S/S^0$.
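Proof. We first establish the “easy” direction (b) ⇒ (a). So let Q ≈ P be an EMM. By
Proposition 2.7, it suffices to show that X satisfies NA. Seeking a contradiction, suppose there
is $\vartheta \in \mathbb{R}^d$ such that
\[ \vartheta \cdot (X_1 - X_0) \ge 0 \ P\text{-a.s.} \quad\text{and}\quad P[\vartheta \cdot (X_1 - X_0) > 0] > 0. \]
As Q ≈ P, the same holds with P replaced by Q, and hence
\[ E^Q[\vartheta \cdot (X_1 - X_0)] > 0. \]
But by linearity of the expectation operator (cf. Lemma 1.6 (a)) and the fact that Q is an
EMM,
\[ E^Q[\vartheta \cdot (X_1 - X_0)] = \sum_{i=1}^{d} \vartheta^i E^Q[X^i_1 - X^i_0] = \sum_{i=1}^{d} \vartheta^i \times 0 = 0, \]
which is a contradiction.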
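Then K corresponds to the collection of all random variables of the form $\vartheta \cdot (X_1 - X_0)$ for $\vartheta \in \mathbb{R}^d$.
Mathematically, K is an (at most d-dimensional) vector subspace of $\mathbb{R}^N$. By Proposition 2.7,
\[ K \cap \mathbb{R}^N_+ = \{0\}, \tag{2.6} \]
where $\mathbb{R}^N_+ = [0, \infty)^N$. Next, define the standard simplex of dimension N − 1 by
\[ \Delta^{N-1} := \Big\{ x \in \mathbb{R}^N_+ : \sum_{i=1}^{N} x^i = 1 \Big\}. \]
Then $\Delta^{N-1} \subset \mathbb{R}^N_+$ and $0 \notin \Delta^{N-1}$, so that
\[ K \cap \Delta^{N-1} = \emptyset. \]
[Figure: the subspace K, the standard simplex $\Delta^{N-1}$ and the separating hyperplane $a \cdot x = \lambda$.]
¹⁷ For a proof with general Ω and F (which requires more measure theory), we refer to [2, Theorem 1.7].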
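As K and $\Delta^{N-1}$ are both nonempty and convex, K is closed and $\Delta^{N-1}$ is compact, it follows
from the strict separating hyperplane theorem¹⁸ and the fact that K is a vector subspace that
there exists a vector $a \in \mathbb{R}^N \setminus \{0\}$ and λ > 0 such that
\[ a \cdot k = 0 \text{ for all } k \in K, \qquad a \cdot x \ge \lambda > 0 \text{ for all } x \in \Delta^{N-1}. \]
In particular, applying the second property to the unit vectors $e_i \in \Delta^{N-1}$ gives
\[ a \cdot e_i = a_i > 0, \quad i \in \{1, \ldots, N\}. \]
Define the probability measure Q on (Ω, F) by
\[ Q[\{\omega_n\}] = \frac{a_n}{\sum_{k=1}^{N} a_k} > 0. \]
¹⁸ For a proof, we refer to [1, Proposition B.14].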
Then Q ≈ P by Example 2.9. Moreover, for i ∈ {1, . . . , d}, set
Example 2.13. Consider the Binomial model from Example 2.1. Using the FTAP, we want
to check when $\bar S$ satisfies NA. So let Q be a probability measure on (Ω, F), and set $q_1 := Q[\{\omega_1\}]$ and
$q_2 := Q[\{\omega_2\}]$. We know from Example 2.9 that Q ≈ P if and only if $q_1 > 0$ and $q_2 > 0$.
Moreover, Q is an EMM for $X^1$ if and only if
\[ E^Q[X^1_1] = X^1_0 = 1 \iff q_1 X^1_1(\omega_1) + q_2 X^1_1(\omega_2) = 1 \iff q_1 \frac{1 + u}{1 + r} + q_2 \frac{1 + d}{1 + r} = 1. \]
Since $q_2 = 1 - q_1$, this is equivalent to
\[ q_1 (1 + u) + (1 - q_1)(1 + d) = 1 + r \iff q_1 = \frac{r - d}{u - d}, \]
and thus
\[ q_2 = 1 - q_1 = \frac{u - r}{u - d}. \]
Clearly, $q_1, q_2 > 0$ if and only if u > r > d.
So $\bar S$ is arbitrage-free if and only if u > r > d, in which case the (unique) EMM satisfies
\[ q_1 = \frac{r - d}{u - d} \quad\text{and}\quad q_2 = \frac{u - r}{u - d}. \]
The condition u > r > d is economically quite intuitive as it says that the risky asset must
offer the chance of a higher return than the interest rate in one state of the world (u > r) but
also have a lower return than the interest rate in another state of the world (d < r). Note
that the EMM Q does not depend on the values of $p_1$ and $p_2$.
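The closed-form EMM of the Binomial model is easy to evaluate and to sanity-check against the martingale property. A minimal sketch, assuming hypothetical parameter values u = 5%, r = 1%, d = 0% (the function name `binomial_emm` is illustrative):

```python
from fractions import Fraction

def binomial_emm(u, r, d):
    """Return (q1, q2) for the one-period Binomial model, or None if NA fails.

    Uses q1 = (r - d) / (u - d) and q2 = (u - r) / (u - d); both are
    strictly positive exactly when u > r > d.
    """
    if not (u > r > d):
        return None
    q1 = (r - d) / (u - d)
    return q1, 1 - q1

# Hypothetical parameters: u = 5%, r = 1%, d = 0%.
u, r, d = Fraction(5, 100), Fraction(1, 100), Fraction(0)
q1, q2 = binomial_emm(u, r, d)
print(q1, q2)   # 1/5 4/5

# Martingale property: the Q-expectation of the discounted price equals X_0 = 1.
print(q1 * (1 + u) / (1 + r) + q2 * (1 + d) / (1 + r))   # 1

# If the risky asset never beats the bank account, there is no EMM.
print(binomial_emm(Fraction(1, 100), Fraction(5, 100), Fraction(0)))   # None
```

The probabilities $p_1$ and $p_2$ do not appear anywhere in the function, in line with the last observation of the example.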
3 Mean-Variance Portfolio Selection and the CAPM
In this chapter, we seek to answer the question how to optimally invest in a financial market
taking into account the mean and the variance of the return of a portfolio. We then deduce
that if all market participants behave optimally in a mean-variance sense, the financial market
has a special structure, which is described by the Capital Asset Pricing Model (CAPM).
with r > −1. We assume that all asset prices today are positive, i.e., $S^0_0, S^1_0, \ldots, S^d_0 > 0$, and
have finite second moments, i.e., $E[(S^i_t)^2] < \infty$, $i \in \{0, 1, \ldots, d\}$, $t \in \{0, 1\}$.¹⁹ We also
assume that $\bar S$ satisfies NA. We may then assume without loss of generality²⁰ that the risky
assets are non-redundant in the sense that
\[ \bar\vartheta \cdot \bar S_1 = 0 \ P\text{-a.s.} \implies \bar\vartheta = 0. \tag{3.1} \]
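The return of asset i is defined by
\[ R^i := \frac{S^i_1 - S^i_0}{S^i_0}, \quad i \in \{0, \ldots, d\}, \]
and its expected return by
\[ \mu^i := E[R^i], \quad i \in \{0, \ldots, d\}. \]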
Set $\mu = (\mu^1, \ldots, \mu^d)$ and $\bar\mu = (\mu^0, \mu)$. Note that $R^0 \equiv r = \mu^0$, i.e., the return of asset 0
is deterministic and equals the interest rate r. For the risky assets $S^1, \ldots, S^d$, however, the
return is stochastic, and we denote by $\Sigma = (\Sigma^{ij})_{1 \le i,j \le d} \in \mathbb{R}^{d \times d}$ the covariance matrix of the
return vector $R = (R^1, \ldots, R^d)$ of the risky assets, given by
\[ \Sigma^{ij} := \mathrm{Cov}[R^i, R^j]. \]
¹⁹ For t = 0 (as well as i = 0), this integrability condition is of course trivially satisfied.
²⁰ Otherwise, there are $\bar\vartheta \neq 0$ and $j \in \{1, \ldots, d\}$ with $\vartheta^j \neq 0$ such that
\[ S^j_1 = \sum_{i \neq j} \frac{-\vartheta^i}{\vartheta^j} S^i_1 \quad P\text{-a.s.}, \]
i.e., the risky asset j can be written as a linear combination of the other assets, and hence be omitted.
One can show that by the non-redundancy assumption (3.1) on S, it follows that Σ is positive
definite and hence invertible.
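For a portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$ with $\bar\vartheta \cdot \bar S_0 \neq 0$, we denote its return by
\[ R_{\bar\vartheta} := \frac{\bar\vartheta \cdot \bar S_1 - \bar\vartheta \cdot \bar S_0}{\bar\vartheta \cdot \bar S_0}. \]
The expected return and the variance of the return of $\bar\vartheta$ are then given by
\[ \mu_{\bar\vartheta} := E[R_{\bar\vartheta}], \qquad \sigma^2_{\bar\vartheta} := \mathrm{Var}[R_{\bar\vartheta}] = E\big[(R_{\bar\vartheta} - E[R_{\bar\vartheta}])^2\big]. \]
Similarly, for a risk-only portfolio $\vartheta \in \mathbb{R}^d$ with $\vartheta \cdot S_0 \neq 0$,
\[ R_\vartheta := \frac{\vartheta \cdot S_1 - \vartheta \cdot S_0}{\vartheta \cdot S_0}, \qquad \mu_\vartheta := E[R_\vartheta], \qquad \sigma^2_\vartheta := \mathrm{Var}[R_\vartheta] = E\big[(R_\vartheta - E[R_\vartheta])^2\big]. \]
Given an initial wealth $x_0 > 0$, a portfolio $\bar\vartheta$ is called $x_0$-feasible if
\[ \bar\vartheta \cdot \bar S_0 = x_0. \tag{3.2} \]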
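Example 3.1. Consider a Binomial model with u = 0.05, r = 0.01, d = 0, $p_1 = \frac{2}{5}$ and $p_2 = \frac{3}{5}$.
Then
\[ E[S^1_1] = \frac{2}{5} \times 1.05 + \frac{3}{5} \times 1 = 1.02. \]
Let $x_0 = 1000$. Then $\bar\vartheta \in \mathbb{R}^2$ is $x_0$-feasible if and only if $\vartheta^0 = 1000 - \vartheta^1$. Moreover,
\[ \sup_{\{\bar\vartheta \in \mathbb{R}^2 \,:\, \vartheta^0 + \vartheta^1 = 1000\}} \mu_{\bar\vartheta} = 0.01 + \sup_{\vartheta^1 \in \mathbb{R}} \frac{0.01\,\vartheta^1}{1000} = +\infty. \]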
Let us briefly comment on what goes wrong in Example 3.1. Focussing on expected
return only motivates buying huge amounts of the risky asset by borrowing money from the
bank account. By doing this, one can achieve any expected return one likes. However, this
completely ignores the risk inherent in the investment. To illustrate this point, suppose we
choose $\vartheta^1 = 1{,}000{,}000$ and $\vartheta^0 = -999{,}000$. Then we have an expected return of $\mu_{\bar\vartheta} = 10.01 = 1001\%$,
which sounds amazing. However,
\[ \bar\vartheta \cdot \bar S_1(\omega) = \begin{cases} -999{,}000 \times 1.01 + 1{,}000{,}000 \times 1.05 = 41{,}010 & \text{if } \omega = \omega_1, \\ -999{,}000 \times 1.01 + 1{,}000{,}000 \times 1 = -8{,}990 & \text{if } \omega = \omega_2, \end{cases} \]
so that with a probability of $P[\{\omega_2\}] = 60\%$ we lose everything and even have a debt of 8990
in one year.
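The numbers in this illustration can be reproduced directly. A minimal sketch using the parameters of Example 3.1 (the function names are illustrative, and $S^0_0 = S^1_0 = 1$ as in the Binomial model):

```python
from fractions import Fraction

# Parameters of Example 3.1.
u, r, d = Fraction(5, 100), Fraction(1, 100), Fraction(0)
p1, p2 = Fraction(2, 5), Fraction(3, 5)
x0 = 1000

def final_wealth(theta1):
    """Time-1 wealth in the up and down state of an x0-feasible portfolio."""
    theta0 = x0 - theta1                      # feasibility: theta0 + theta1 = x0
    up = theta0 * (1 + r) + theta1 * (1 + u)
    down = theta0 * (1 + r) + theta1 * (1 + d)
    return up, down

def expected_return(theta1):
    """Expected return of the x0-feasible portfolio holding theta1 shares."""
    up, down = final_wealth(theta1)
    return (p1 * up + p2 * down - x0) / x0

up, down = final_wealth(1_000_000)
print(up, down)                      # 41010 -8990
print(expected_return(1_000_000))    # 1001/100, i.e. 1001%
print(expected_return(0))            # 1/100: everything in the bank account
```

The expected return grows linearly in the number of shares, while the loss in the down state grows at the same rate, which is exactly what the mean-only criterion ignores.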
There are two versions of the mean-variance problem, each of which has a formulation
with risk-only portfolios and one with general portfolios, i.e., portfolios which also allow an investment in
the riskless asset:
(1) Given an initial wealth $x_0 > 0$ and a minimal desired expected return $\mu_{\min} > 0$, minimise
the variance of the return $\sigma^2_{\bar\vartheta}$ among all $x_0$-feasible portfolios $\bar\vartheta \in \mathbb{R}^{1+d}$ that satisfy
$\mu_{\bar\vartheta} \ge \mu_{\min}$.
(1’) Given an initial wealth $x_0 > 0$ and a minimal desired expected return $\mu_{\min} > 0$, minimise
the variance of the return $\sigma^2_\vartheta$ among all risk-only $x_0$-feasible portfolios $\vartheta \in \mathbb{R}^d$ that satisfy
$\mu_\vartheta \ge \mu_{\min}$.
(2) Given an initial wealth $x_0 > 0$ and a maximal variance of the return $\sigma^2_{\max} \ge 0$, maximise
the expected return $\mu_{\bar\vartheta}$ among all $x_0$-feasible portfolios $\bar\vartheta \in \mathbb{R}^{1+d}$ that satisfy $\sigma^2_{\bar\vartheta} \le \sigma^2_{\max}$.
(2’) Given an initial wealth $x_0 > 0$ and a maximal variance of the return $\sigma^2_{\max} \ge 0$, maximise
the expected return $\mu_\vartheta$ among all risk-only $x_0$-feasible portfolios $\vartheta \in \mathbb{R}^d$ that satisfy
$\sigma^2_\vartheta \le \sigma^2_{\max}$.
We shall see that problems (1) and (2) (and (1’) and (2’)) are two sides of the same coin.
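The idea underlying the mean-variance problems is that (a high) expected return is desirable
whereas (a high) variance of the return is undesirable. We say that an investor has
mean-variance preferences if for two portfolios $\bar\vartheta, \bar\vartheta'$, portfolio $\bar\vartheta$ is preferred over portfolio $\bar\vartheta'$
whenever $\mu_{\bar\vartheta} \ge \mu_{\bar\vartheta'}$ and $\sigma^2_{\bar\vartheta} \le \sigma^2_{\bar\vartheta'}$, with at least one inequality being strict.²³
If $\bar\vartheta$ is a portfolio parametrised in numbers of shares with $\bar\vartheta \cdot \bar S_0 > 0$, define
\[ \pi^i := \frac{\vartheta^i S^i_0}{\bar\vartheta \cdot \bar S_0}, \quad i \in \{0, \ldots, d\}, \tag{3.3} \]
and call $\pi^i$ the fraction of wealth invested in asset $i \in \{0, \ldots, d\}$. We set $\bar\pi := (\pi^0, \pi) =
(\pi^0, \pi^1, \ldots, \pi^d)$. Note that $\bar\pi \in H^{1+d-1}$.
²³ Note that in general two portfolios $\bar\vartheta, \bar\vartheta'$ cannot be compared in a mean-variance sense, i.e., neither is $\bar\vartheta$
preferred over $\bar\vartheta'$ nor is $\bar\vartheta'$ preferred over $\bar\vartheta$.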
Similarly, if ϑ is a risk-only portfolio parametrised in numbers of shares with $\vartheta \cdot S_0 > 0$,
define $\pi^i := \frac{\vartheta^i S^i_0}{\vartheta \cdot S_0}$ for $i \in \{1, \ldots, d\}$, call $\pi^i$ the fraction of wealth invested in asset $i \in \{1, \ldots, d\}$,
and set $\pi = (\pi^1, \ldots, \pi^d)$. Note that in the risk-only case $\pi \in H^{d-1}$.
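Conversely, for $\bar\pi \in H^{1+d-1}$ and an initial wealth $x_0 > 0$, define the portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$
(parametrised in numbers of shares) by
\[ \vartheta^i = x_0 \frac{\pi^i}{S^i_0}, \quad i \in \{0, \ldots, d\}, \tag{3.4} \]
and, in the risk-only case, for $\pi \in H^{d-1}$, define $\vartheta \in \mathbb{R}^d$ by
\[ \vartheta^i = x_0 \frac{\pi^i}{S^i_0}, \quad i \in \{1, \ldots, d\}. \]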
Remark 3.2. Calling $\bar\pi \in H^{1+d-1}$ a portfolio is a slight abuse of notation because to recover
the numbers of shares corresponding to $\bar\pi$, we also need to specify the initial wealth $x_0$.
However, if $\bar\pi \in H^{1+d-1}$ is a portfolio parametrised in fractions of wealth and $x_0, x_0' > 0$
are different initial wealths with corresponding portfolios $\bar\vartheta$ and $\bar\vartheta'$ in numbers of shares, then
$\frac{\bar\vartheta}{x_0} = \frac{\bar\vartheta'}{x_0'}$
and $R_{\bar\vartheta} = R_{\bar\vartheta'}$. So if we are only interested in returns, it suffices to consider portfolios
parametrised in fractions of wealth; see also Lemma 3.3 below.
Parametrising portfolios in fractions of wealth is tailor-made for studying the mean-variance
problems.
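The two parametrisations can be converted into each other with formulas (3.3) and (3.4). A minimal sketch, assuming hypothetical prices and holdings for a market with one riskless and two risky assets (the function names `to_fractions` and `to_shares` are illustrative):

```python
from fractions import Fraction as F

def to_fractions(theta, S0):
    """Formula (3.3): fractions of wealth from numbers of shares."""
    wealth = sum(t * s for t, s in zip(theta, S0))
    if wealth == 0:
        raise ValueError("initial wealth must be nonzero")
    return [t * s / wealth for t, s in zip(theta, S0)]

def to_shares(pi, S0, x0):
    """Formula (3.4): numbers of shares from fractions of wealth and initial wealth x0."""
    return [x0 * p / s for p, s in zip(pi, S0)]

S0 = [F(1), F(200), F(50)]       # hypothetical prices, riskless asset first
theta = [F(300), F(2), F(6)]     # hypothetical holdings; initial wealth is 1000

pi = to_fractions(theta, S0)
print(pi, sum(pi))                        # fractions 3/10, 2/5, 3/10 summing to 1
print(to_shares(pi, S0, 1000) == theta)   # True: (3.4) inverts (3.3)
print(to_shares(pi, S0, 2000))            # doubling the wealth doubles every holding
```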
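fractions of wealth of an $x_0$-feasible portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$, then
\[ R_{\bar\vartheta} = R_{\bar\pi} := \sum_{i=0}^{d} \pi^i R^i = \bar\pi \cdot \bar R, \tag{3.5} \]
\[ \mu_{\bar\vartheta} = \mu_{\bar\pi} := E[R_{\bar\pi}] = \sum_{i=0}^{d} \pi^i \mu^i = \bar\pi \cdot \bar\mu, \tag{3.6} \]
\[ \sigma^2_{\bar\vartheta} = \sigma^2_{\bar\pi} := \mathrm{Var}[R_{\bar\pi}] = \sum_{i,j=1}^{d} \pi^i \pi^j \Sigma^{ij} = \pi^\top \Sigma \pi. \tag{3.7} \]
Similarly, in the risk-only case,
\[ R_\vartheta = R_\pi := \sum_{i=1}^{d} \pi^i R^i = \pi \cdot R, \tag{3.8} \]
\[ \mu_\vartheta = \mu_\pi := E[R_\pi] = \sum_{i=1}^{d} \pi^i \mu^i = \pi \cdot \mu, \tag{3.9} \]
\[ \sigma^2_\vartheta = \sigma^2_\pi := \mathrm{Var}[R_\pi] = \sum_{i,j=1}^{d} \pi^i \pi^j \Sigma^{ij} = \pi^\top \Sigma \pi. \tag{3.10} \]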
Proof. We only prove the case of general portfolios; the risk-only case is completely analogous.
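The first statement follows from the fact that the map
\[ \Phi : \big\{ \bar\vartheta \in \mathbb{R}^{1+d} : \bar\vartheta \cdot \bar S_0 = x_0 \big\} \to H^{1+d-1}, \qquad \Phi^i(\bar\vartheta) = \frac{\vartheta^i S^i_0}{x_0}, \quad i \in \{0, \ldots, d\}, \]
is a bijection.²⁵ For the second statement, recall that
\[ \pi^i := \frac{\vartheta^i S^i_0}{\bar\vartheta \cdot \bar S_0}, \quad i \in \{0, \ldots, d\}. \]
Thus,
\[ R_{\bar\vartheta} = \frac{\bar\vartheta \cdot \bar S_1 - \bar\vartheta \cdot \bar S_0}{\bar\vartheta \cdot \bar S_0} = \sum_{i=0}^{d} \frac{\vartheta^i}{\bar\vartheta \cdot \bar S_0} (S^i_1 - S^i_0) = \sum_{i=0}^{d} \frac{\vartheta^i S^i_0}{\bar\vartheta \cdot \bar S_0} R^i = \sum_{i=0}^{d} \pi^i R^i = \bar\pi \cdot \bar R. \]
²⁵ This uses that $x_0 > 0$ and $S^0_0, \ldots, S^d_0 > 0$.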
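So we have (3.5), and (3.6) follows by linearity of the expectation. Finally, to establish (3.7),
we use that $\mathrm{Cov}[R^i, R^0] = 0$ for $i \in \{0, \ldots, d\}$ because $R^0$ is deterministic, and obtain
\[ \begin{aligned} \sigma^2_{\bar\vartheta} = \mathrm{Var}[R_{\bar\vartheta}] = \mathrm{Var}[R_{\bar\pi}] &= \mathrm{Var}\Big[ \sum_{i=0}^{d} \pi^i R^i \Big] = \sum_{i,j=0}^{d} \pi^i \pi^j \mathrm{Cov}[R^i, R^j] \\ &= \sum_{i,j=1}^{d} \pi^i \pi^j \mathrm{Cov}[R^i, R^j] = \sum_{i,j=1}^{d} \pi^i \pi^j \Sigma^{ij} = \pi^\top \Sigma \pi. \end{aligned} \]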
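It is given by²⁶
\[ \pi_{\min} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} \tag{3.11} \]
and satisfies
\[ \mu_{\pi_{\min}} = \frac{\mu^\top \Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} \quad\text{and}\quad \sigma^2_{\pi_{\min}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}. \]
Proof. First, as $\mathbf{1} \cdot \pi_{\min} = \frac{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} = 1$, it follows that $\pi_{\min} \in H^{d-1}$. Next, let $\pi \in H^{d-1}$ be
arbitrary. Set
\[ y := \pi - \pi_{\min}. \]
Then by (3.10), the fact that Σ is symmetric, the definition of $\pi_{\min}$ and the fact that y and $\mathbf{1}$
are orthogonal,
\[ \begin{aligned} \sigma^2_\pi = \pi^\top \Sigma \pi &= (\pi_{\min} + y)^\top \Sigma (\pi_{\min} + y) = \pi_{\min}^\top \Sigma \pi_{\min} + 2 y^\top \Sigma \pi_{\min} + y^\top \Sigma y \\ &= \frac{\pi_{\min}^\top \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + \frac{2\, y^\top \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + y^\top \Sigma y = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + y^\top \Sigma y. \end{aligned} \]
As Σ is symmetric and positive definite, $y^\top \Sigma y \ge 0$ with equality if and only if y = 0. This
shows both that $\pi_{\min}$ is the unique optimiser and yields $\sigma^2_{\pi_{\min}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}$. The formula for
$\mu_{\pi_{\min}}$ follows directly from (3.9).
²⁶ Note that as the market is non-redundant, it follows that Σ is positive definite and hence invertible.
Therefore, $\Sigma^{-1}$ is well defined and $\mathbf{1}^\top \Sigma^{-1} \mathbf{1} > 0$ by the fact that $\Sigma^{-1}$ is positive definite.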
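Formula (3.11) can be evaluated exactly for a small market. The following is a minimal sketch for d = 2, assuming hypothetical mean returns and a hypothetical covariance matrix; the helpers `inv2`, `matvec` and `dot` are illustrative plain-Python stand-ins for a linear algebra library.

```python
from fractions import Fraction as F

def inv2(S):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(v, w):
    """Euclidean scalar product."""
    return sum(x * y for x, y in zip(v, w))

# Hypothetical two-asset market: mean returns and covariance matrix.
mu = [F(5, 100), F(10, 100)]
Sigma = [[F(4, 100), F(1, 100)], [F(1, 100), F(9, 100)]]

ones = [F(1), F(1)]
Sinv_ones = matvec(inv2(Sigma), ones)
A = dot(ones, Sinv_ones)                        # A = 1' Sigma^{-1} 1
pi_min = [x / A for x in Sinv_ones]             # formula (3.11)

mean_min = dot(mu, pi_min)                      # expected return of pi_min
var_min = dot(pi_min, matvec(Sigma, pi_min))    # variance of pi_min, equals 1 / A

print(pi_min, mean_min, var_min)

# Any other risk-only portfolio has a strictly larger variance.
pi_other = [F(1, 2), F(1, 2)]
print(dot(pi_other, matvec(Sigma, pi_other)) > var_min)   # True
```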
Next, we seek to find the risk-only portfolio which minimises the variance among all risk-
only portfolios with a given expected return µ0 . To this end, note that if µ and 1 are collinear,
then every risk-only portfolio has the same expected return. Indeed, if µ and 1 are collinear,
then $\mu^1 = \ldots = \mu^d$ and hence, for any $\pi \in H^{d-1}$,
\[ \mu_\pi = \mu \cdot \pi = \mu^1\, \mathbf{1} \cdot \pi = \mu^1. \]
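It is given by²⁷
\[ \pi_{\mu_0} = \frac{C - B \mu_0}{AC - B^2} \Sigma^{-1} \mathbf{1} + \frac{A \mu_0 - B}{AC - B^2} \Sigma^{-1} \mu, \tag{3.12} \]
where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and $C = \mu^\top \Sigma^{-1} \mu$. Moreover,
\[ \sigma^2_{\pi_{\mu_0}} = \frac{A \mu_0^2 - 2 B \mu_0 + C}{AC - B^2} = \sigma^2_{\pi_{\min}} + \frac{A}{AC - B^2} (\mu_0 - \mu_{\pi_{\min}})^2. \tag{3.13} \]
Proof. First, as Σ is positive definite, it is invertible and $\Sigma^{-1}$ is again positive definite. Hence
it induces a scalar product $\langle \cdot, \cdot \rangle_{\Sigma^{-1}}$ on $\mathbb{R}^d$ given by
\[ \langle x, y \rangle_{\Sigma^{-1}} := x^\top \Sigma^{-1} y. \]
By the Cauchy-Schwarz inequality,
\[ B^2 = \langle \mathbf{1}, \mu \rangle_{\Sigma^{-1}}^2 \le \langle \mathbf{1}, \mathbf{1} \rangle_{\Sigma^{-1}} \langle \mu, \mu \rangle_{\Sigma^{-1}} = AC, \]
where the inequality is an equality if and only if µ and $\mathbf{1}$ are collinear. As they are not, it
follows that $AC - B^2 > 0$.
²⁷ It is part of the assertion that $AC - B^2 \neq 0$; we even have $AC - B^2 > 0$.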
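Next, we check that $\pi_{\mu_0}$ given by (3.12) is indeed in $H^{d-1}$ and has expected return $\mu_0$. By
the definitions of A, B and C and (3.9),
\[ \mathbf{1} \cdot \pi_{\mu_0} = \frac{(C - B\mu_0) A + (A\mu_0 - B) B}{AC - B^2} = 1 \quad\text{and}\quad \mu_{\pi_{\mu_0}} = \mu \cdot \pi_{\mu_0} = \frac{(C - B\mu_0) B + (A\mu_0 - B) C}{AC - B^2} = \mu_0. \]
Next, let $\pi \in H^{d-1}$ with $\mu_\pi = \mu_0$ be arbitrary, and set
\[ y := \pi - \pi_{\mu_0}. \]
Then $y \cdot \mathbf{1} = \pi \cdot \mathbf{1} - \pi_{\mu_0} \cdot \mathbf{1} = 1 - 1 = 0$ and
\[ y \cdot \mu = \pi \cdot \mu - \pi_{\mu_0} \cdot \mu = \mu_\pi - \mu_{\pi_{\mu_0}} = \mu_0 - \mu_0 = 0. \]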
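Then by (3.10), the fact that Σ is symmetric, the definition of $\pi_{\mu_0}$, the fact that $\pi_{\mu_0} \cdot \mu = \mu_0$
and the fact that y is orthogonal to $\mathbf{1}$ and µ,
\[ \begin{aligned} \sigma^2_\pi = \pi^\top \Sigma \pi &= (\pi_{\mu_0} + y)^\top \Sigma (\pi_{\mu_0} + y) = \pi_{\mu_0}^\top \Sigma \pi_{\mu_0} + 2 y^\top \Sigma \pi_{\mu_0} + y^\top \Sigma y \\ &= \pi_{\mu_0}^\top \Big( \frac{C - B\mu_0}{AC - B^2} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2} \mu \Big) + 2 y^\top \Big( \frac{C - B\mu_0}{AC - B^2} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2} \mu \Big) + y^\top \Sigma y \\ &= \frac{C - B\mu_0}{AC - B^2} \times 1 + \frac{A\mu_0 - B}{AC - B^2} \mu_0 + 2 \times (0 + 0) + y^\top \Sigma y \\ &= \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2} + y^\top \Sigma y. \end{aligned} \]
As Σ is symmetric and positive definite, $y^\top \Sigma y \ge 0$ with equality if and only if y = 0. This
shows both that $\pi_{\mu_0}$ is the unique optimiser and yields the first equality in (3.13). The second
equality in (3.13) follows by a simple rearrangement using that $\sigma^2_{\pi_{\min}} = \frac{1}{A}$ and $\mu_{\pi_{\min}} = \frac{B}{A}$.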
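Formulas (3.12) and (3.13) can be checked against each other on a small example. A minimal sketch for d = 2, assuming the same kind of hypothetical market data as one might use for the minimum-variance portfolio; the helpers `inv2`, `matvec`, `dot` and the names `pi_target`, `frontier_variance` are illustrative.

```python
from fractions import Fraction as F

def inv2(S):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

# Hypothetical two-asset market: mean returns and covariance matrix.
mu = [F(5, 100), F(10, 100)]
Sigma = [[F(4, 100), F(1, 100)], [F(1, 100), F(9, 100)]]

Sinv = inv2(Sigma)
ones = [F(1), F(1)]
Sinv_ones, Sinv_mu = matvec(Sinv, ones), matvec(Sinv, mu)
A, B, C = dot(ones, Sinv_ones), dot(ones, Sinv_mu), dot(mu, Sinv_mu)
D = A * C - B * B        # strictly positive because mu and 1 are not collinear

def pi_target(mu0):
    """Formula (3.12): minimum-variance risk-only portfolio with expected return mu0."""
    a = (C - B * mu0) / D
    b = (A * mu0 - B) / D
    return [a * x + b * y for x, y in zip(Sinv_ones, Sinv_mu)]

def frontier_variance(mu0):
    """First expression in (3.13)."""
    return (A * mu0 ** 2 - 2 * B * mu0 + C) / D

mu0 = F(8, 100)
pi = pi_target(mu0)
print(pi, sum(pi), dot(pi, mu))                             # fractions, their sum, expected return
print(dot(pi, matvec(Sigma, pi)), frontier_variance(mu0))   # both give the same variance

# Second expression in (3.13): minimum variance plus a quadratic penalty.
print(1 / A + A / D * (mu0 - B / A) ** 2 == frontier_variance(mu0))   # True
```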
For the following result, we introduce the key concept of a risk-only efficient portfolio.
Definition 3.6. A risk-only portfolio $\pi \in H^{d-1}$ is called risk-only efficient (in the mean-variance
sense) if there does not exist another risk-only portfolio $\pi' \in H^{d-1}$ such that $\mu_{\pi'} \ge \mu_\pi$
and $\sigma^2_{\pi'} \le \sigma^2_\pi$ with at least one inequality being strict.
\[ \mathcal{E} := \Big\{ (\sigma_0^2, \mu_0) \in \mathbb{R}^2 : \mu_0 \ge \frac{B}{A},\ \sigma_0^2 = \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2} \Big\}. \]
(a) For each point $(\sigma_0^2, \mu_0) \in \mathcal{E}$, there exists exactly one risk-only portfolio $\pi \in H^{d-1}$ such
that $(\sigma^2_\pi, \mu_\pi) = (\sigma_0^2, \mu_0)$. It is given by
\[ \pi = \pi_{\mu_0} = \frac{C - B\mu_0}{AC - B^2} \Sigma^{-1} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2} \Sigma^{-1} \mu. \]
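Proof. (a). Let $(\sigma_0^2, \mu_0) \in \mathcal{E}$. It follows from Lemma 3.5 that $\pi_{\mu_0}$ satisfies $(\sigma^2_{\pi_{\mu_0}}, \mu_{\pi_{\mu_0}}) =
(\sigma_0^2, \mu_0)$. If $\pi' \in H^{d-1}$ is any other portfolio with $\mu_{\pi'} = \mu_0$ but $\pi' \neq \pi_{\mu_0}$, then $\sigma^2_{\pi'} > \sigma^2_{\pi_{\mu_0}} = \sigma_0^2$
again by Lemma 3.5. So we have both existence and uniqueness of π.
(b). First, assume that $\pi \in H^{d-1}$ is risk-only efficient. Then $\sigma^2_\pi \ge \sigma^2_{\pi_{\min}}$ by Lemma 3.4, and
so $\mu_\pi \ge \mu_{\pi_{\min}} = \frac{B}{A}$ by the definition of efficiency. Set $\mu_0 := \mu_\pi$ and $\sigma_0^2 := \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2}$. Then
$(\sigma_0^2, \mu_0) \in \mathcal{E}$ and by (a), $(\sigma^2_{\pi_{\mu_0}}, \mu_{\pi_{\mu_0}}) = (\sigma_0^2, \mu_0)$. Efficiency of π together with $\mu_\pi = \mu_0 = \mu_{\pi_{\mu_0}}$
gives $\sigma^2_\pi \le \sigma^2_{\pi_{\mu_0}} = \sigma_0^2$. On the other hand, Lemma 3.5 gives $\sigma^2_\pi \ge \sigma^2_{\pi_{\mu_0}} = \sigma_0^2$, whence $\sigma^2_\pi = \sigma_0^2$,
and so $(\sigma^2_\pi, \mu_\pi) \in \mathcal{E}$.²⁸
Conversely, let $\pi \in H^{d-1}$ be such that $(\sigma^2_\pi, \mu_\pi) \in \mathcal{E}$. Set $\mu_0 := \mu_\pi$ and $\sigma_0^2 := \sigma^2_\pi$. Then
$\pi = \pi_{\mu_0}$ by (a). Seeking a contradiction, suppose there is $\pi' \in H^{d-1}$ such that $\mu_{\pi'} \ge \mu_0$ and
$\sigma^2_{\pi'} \le \sigma_0^2$ with at least one of the inequalities being strict. If $\mu_{\pi'} = \mu_0$, then $\sigma^2_{\pi'} < \sigma_0^2 = \sigma^2_{\pi_{\mu_0}}$,
and by Lemma 3.5, we arrive at a contradiction. Otherwise, if $\mu_{\pi'} > \mu_0$, set $\mu_1 := \mu_{\pi'}$ and
$\sigma_1^2 = \frac{A\mu_1^2 - 2B\mu_1 + C}{AC - B^2}$. Then $\sigma_1^2 > \sigma_0^2$ because the function $x \mapsto \frac{Ax^2 - 2Bx + C}{AC - B^2}$ is strictly increasing
for $x \ge \frac{B}{A}$. This means that $\sigma^2_{\pi'} < \sigma_1^2$. But $\sigma_1^2 = \sigma^2_{\pi_{\mu_1}}$ by (a). Thus, $\mu_{\pi'} = \mu_1 = \mu_{\pi_{\mu_1}}$ and
$\sigma^2_{\pi'} < \sigma_1^2 = \sigma^2_{\pi_{\mu_1}}$, and again by Lemma 3.5, we arrive at a contradiction.
²⁸ By (a), it even follows that $\pi = \pi_{\mu_0}$.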
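[Figure: the efficient frontier $\mathcal{E}$ in the $(\sigma^2_\pi, \mu_\pi)$-plane, starting at the minimum-variance portfolio with variance $\sigma^2_{\pi_{\min}}$.]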
With the help of Theorem 3.7, we can now fully solve the risk-only versions of the mean-
variance problems.
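(1’) Let $\mu_{\min} \ge \frac{B}{A}$ be given.²⁹ Then the risk-only mean-variance problem
\[ \text{minimise } \sigma^2_\pi \text{ over all } \pi \in H^{d-1} \text{ with } \mu_\pi \ge \mu_{\min} \]
has a unique solution. It is given by
\[ \pi_* = \pi_{\mu_{\min}} = \frac{C - B\mu_{\min}}{AC - B^2} \Sigma^{-1} \mathbf{1} + \frac{A\mu_{\min} - B}{AC - B^2} \Sigma^{-1} \mu. \]
It is risk-only efficient and satisfies
\[ \mu_{\pi_*} = \mu_{\min} \quad\text{and}\quad \sigma^2_{\pi_*} = \frac{A\mu_{\min}^2 - 2B\mu_{\min} + C}{AC - B^2}. \]
(2’) Let $\sigma^2_{\max} \ge \frac{1}{A}$ be given.³⁰ Then the risk-only mean-variance problem
\[ \text{maximise } \mu_\pi \text{ over all } \pi \in H^{d-1} \text{ with } \sigma^2_\pi \le \sigma^2_{\max} \tag{3.14} \]
has a unique solution. It is given by
\[ \pi_* = \pi_{\mu_{\sigma^2_{\max}}} = \frac{C - B\mu_{\sigma^2_{\max}}}{AC - B^2} \Sigma^{-1} \mathbf{1} + \frac{A\mu_{\sigma^2_{\max}} - B}{AC - B^2} \Sigma^{-1} \mu, \]
where
\[ \mu_{\sigma^2_{\max}} = \frac{B}{A} + \frac{\sqrt{(AC - B^2)(A\sigma^2_{\max} - 1)}}{A}. \]
It is risk-only efficient and satisfies
\[ \mu_{\pi_*} = \mu_{\sigma^2_{\max}} = \frac{B}{A} + \frac{\sqrt{(AC - B^2)(A\sigma^2_{\max} - 1)}}{A} \quad\text{and}\quad \sigma^2_{\pi_*} = \sigma^2_{\max}. \]
²⁹ Note that by Lemma 3.4, $\frac{B}{A} = \mu_{\pi_{\min}}$, the mean of the minimum-variance portfolio.
³⁰ Note that by Lemma 3.4, $\frac{1}{A} = \sigma^2_{\pi_{\min}}$, the variance of the minimum-variance portfolio.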
Proof. We only establish (2’). The proof of (1’) is very similar and even easier.
First, it follows from Theorem 3.7(a) and (b) that $\pi_* = \pi_{\mu_{\sigma^2_{\max}}}$ is efficient, and a straightforward calculation (using (3.13)) shows that indeed $\sigma^2_{\pi_*} = \sigma^2_{\max}$. So $\pi_*$ satisfies the constraint
in (3.14) with equality. Let π 0 ∈ H d−1 be any portfolio satisfying the constraint in (3.14).
Then by efficiency of π∗ it follows that µπ0 ≤ µπ∗ , which gives existence of a solution to (3.14).
To establish uniqueness, note that if µπ0 = µπ∗ , then σπ2 0 ≥ σπ2∗ by efficiency of π∗ and hence
by the constraint in (3.14), σπ2 0 = σπ2∗ . But this then implies that (σπ2 0 , µπ0 ) = (σπ2∗ , µπ∗ ) ∈ E,
and by Theorem 3.7(a), it follows that π 0 = π∗ .
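As a numerical sanity check of the formulas in (1') and (2'), the following Python sketch evaluates them in a hypothetical market with two risky assets. The values of `mu`, `Sigma`, `mu_min` and `sigma2_max` as well as all helper names are illustrative assumptions and not taken from the notes.

```python
import math

# Hypothetical two-asset market (illustrative numbers, not from the notes).
mu = [0.05, 0.10]                      # mean returns
Sigma = [[0.04, 0.01], [0.01, 0.09]]   # covariance matrix of returns

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

one = [1.0, 1.0]
Si_one = matvec(inv2(Sigma), one)      # Sigma^{-1} 1
Si_mu = matvec(inv2(Sigma), mu)        # Sigma^{-1} mu
A, B, C = dot(one, Si_one), dot(one, Si_mu), dot(mu, Si_mu)
D = A * C - B * B                      # AC - B^2

def frontier_portfolio(m):
    """pi_m = (C - B m)/D * Sigma^{-1} 1 + (A m - B)/D * Sigma^{-1} mu."""
    return [(C - B * m) / D * s1 + (A * m - B) / D * s2
            for s1, s2 in zip(Si_one, Si_mu)]

def variance(pi):
    return dot(pi, matvec(Sigma, pi))

# (1'): prescribed minimal mean mu_min >= B/A
mu_min = 0.08
pi1 = frontier_portfolio(mu_min)
var1_formula = (A * mu_min**2 - 2 * B * mu_min + C) / D

# (2'): prescribed maximal variance sigma2_max >= 1/A
sigma2_max = 0.05
mu_sigma = B / A + math.sqrt(D * (A * sigma2_max - 1)) / A
pi2 = frontier_portfolio(mu_sigma)
```

In this sketch, `pi1` has weights summing to one, mean `mu_min` and variance `var1_formula`, while `pi2` attains the variance bound `sigma2_max` with equality, in line with the statement of (2').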
µπ − r = (µ − r1d ) · π.
We now seek to find the portfolio in H 1+d−1 which minimises the variance among all
portfolios in H 1+d−1 with a given expected return µ0 . Note that below 1 will always be 1d .
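Lemma 3.10. Let $S = (S^0_t, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on a probability space $(\Omega, \mathcal{F}, P)$. Assume that $S$ has finite second moments and $S^0_0, \dots, S^d_0 > 0$. Denote by $\mu$ and $\Sigma$ the mean vector and covariance matrix of the return vector $R$ of the risky assets, and let $r$ be the interest rate. Assume that $\mu \ne r\mathbf{1}$. Let $\mu_0 \in \mathbb{R}$ be given. Then there exists a unique portfolio $\bar\pi_{\mu_0,r} \in H^{1+d-1}$ such that $\mu_{\bar\pi_{\mu_0,r}} = \mu_0$ and
\[ \pi_{\mu_0,r} = \frac{\mu_0 - r}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})}\,\Sigma^{-1}(\mu - r\mathbf{1}). \tag{3.15} \]
It satisfies
\[ \sigma^2_{\pi_{\mu_0,r}} = \frac{(\mu_0 - r)^2}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} = \frac{(\mu_0 - r)^2}{Ar^2 - 2Br + C}, \tag{3.16} \]
where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and $C = \mu^\top \Sigma^{-1} \mu$.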
Proof. First, it follows from the condition $\mu \ne r\mathbf{1}$ and the fact that $\Sigma^{-1}$ is symmetric and positive definite that $(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1}) > 0$. We proceed to check that $\bar\pi_{\mu_0,r} \in H^{1+d-1}$ and $\mu_{\bar\pi_{\mu_0,r}} = \mu_0$. By the definition of $\bar\pi_{\mu_0,r}$ and Proposition 3.9,
y := π − π µ0 ,r .
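Then by (3.7), the fact that $\Sigma$ is symmetric, the definition of $\pi_{\mu_0,r}$, Proposition 3.9 and the fact that $y$ is orthogonal to $\mu - r\mathbf{1}$,
\begin{align*}
\sigma_\pi^2 = \pi^\top \Sigma \pi &= (\pi_{\mu_0,r} + y)^\top \Sigma (\pi_{\mu_0,r} + y) = \pi_{\mu_0,r}^\top \Sigma \pi_{\mu_0,r} + 2 y^\top \Sigma \pi_{\mu_0,r} + y^\top \Sigma y \\
&= (\mu_0 - r)\frac{\pi_{\mu_0,r}^\top (\mu - r\mathbf{1})}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + 2(\mu_0 - r)\frac{y^\top (\mu - r\mathbf{1})}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + y^\top \Sigma y \\
&= (\mu_0 - r)\frac{(\mu_0 - r)}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + 2(\mu_0 - r) \times 0 + y^\top \Sigma y \\
&= \frac{(\mu_0 - r)^2}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + y^\top \Sigma y. \tag{3.17}
\end{align*}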
As $\Sigma$ is symmetric and positive definite, $y^\top \Sigma y \ge 0$ with equality if and only if $y = 0$. Moreover, since $\bar\pi, \bar\pi_{\mu_0,r} \in H^{1+d-1}$, it follows that
\[ \sum_{k=0}^d y^k = \bar y \cdot \mathbf{1}_{1+d} = \bar\pi \cdot \mathbf{1}_{1+d} - \bar\pi_{\mu_0,r} \cdot \mathbf{1}_{1+d} = 1 - 1 = 0, \]
which implies in particular that $\bar y \ne 0$ if and only if $y \ne 0$ (here $\bar y = (y^0, y)$ denotes the full vector and $y$ its risky part). Hence, (3.17) shows both that $\bar\pi_{\mu_0,r}$ is the unique optimiser and yields the first equality in (3.16). The second equality in (3.16) follows by expanding the denominator and using the definitions of $A$, $B$ and $C$.
We proceed to formulate the analogue of Theorem 3.7. To this end, we also need to
consider the notion of efficiency for general portfolios.
Definition 3.11. A portfolio π ∈ H 1+d−1 is called efficient (in the mean-variance sense) if
there does not exist another portfolio π 0 ∈ H 1+d−1 such that µπ0 ≥ µπ and σπ2 0 ≤ σπ2 with one
inequality being strict.
\[ E := \Big\{ (\sigma_0^2, \mu_0) \in \mathbb{R}^2 : \mu_0 \ge r,\ \sigma_0^2 = \frac{(\mu_0 - r)^2}{Ar^2 - 2Br + C} \Big\}. \]
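(a) For each point $(\sigma_0^2, \mu_0) \in E$, there exists exactly one portfolio $\bar\pi \in H^{1+d-1}$ such that $(\sigma^2_{\bar\pi}, \mu_{\bar\pi}) = (\sigma_0^2, \mu_0)$. It is given by $\bar\pi = (1 - \pi_{\mu_0,r} \cdot \mathbf{1}, \pi_{\mu_0,r})$, where
\[ \pi_{\mu_0,r} = \frac{\mu_0 - r}{Ar^2 - 2Br + C}\,\Sigma^{-1}(\mu - r\mathbf{1}). \]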
Proof. The argument for (a) and (b) is almost verbatim the same as in the proof of Theorem
3.7. The only difference is that we replace the minimum variance portfolio by the riskless
portfolio and use Lemma 3.10 instead of Lemma 3.5.
We proceed to formulate the analogue of Theorem 3.8, the solution to the mean-variance
problems with a riskless asset. The proof is very similar to the proof of Theorem 3.8 and
hence omitted.
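(1) Let $\mu_{\min} \ge r$ be given. Then the mean-variance problem with a riskless asset
\[ \pi_{\mu_{\min},r} = \frac{\mu_{\min} - r}{Ar^2 - 2Br + C}\,\Sigma^{-1}(\mu - r\mathbf{1}). \]
\[ \mu_{\pi_*} = \mu_{\min} \quad\text{and}\quad \sigma^2_{\pi_*} = \frac{(\mu_{\min} - r)^2}{Ar^2 - 2Br + C}. \]
(2) Let $\sigma^2_{\max} \ge 0$ be given. Then the mean-variance problem with riskless asset
\[ \pi_{\mu_{\sigma^2_{\max}},r} = \frac{\mu_{\sigma^2_{\max}} - r}{Ar^2 - 2Br + C}\,\Sigma^{-1}(\mu - r\mathbf{1}) \]
and
\[ \mu_{\sigma^2_{\max}} = r + \sigma_{\max}\sqrt{Ar^2 - 2Br + C}. \]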
3.7 The Markowitz tangency portfolio and the capital market line
Next, we study the relationship between (general) efficient and risk-only efficient portfolios.
(a) There exists a unique efficient portfolio $\bar\pi_{\tan}$, called the Markowitz tangency portfolio, that is risk-only, i.e., $\bar\pi_{\tan} = (0, \pi_{\tan})$. It satisfies32
\[ \pi_{\tan} = \frac{1}{B - rA}\,\Sigma^{-1}(\mu - r\mathbf{1}) \]
Proof. (a). Let π ∈ H 1+d−1 be an efficient portfolio. It follows from Theorem 3.12, that there
is µ0 ≥ r such that π = π µ0 ,r . Moreover, π µ0 ,r is risk-only if and only if πµ0 ,r · 1 = 1. Using
the definitions of πµ0 ,r , A and B, we get
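\[ \pi_{\mu_0,r} \cdot \mathbf{1} = \frac{\mu_0 - r}{Ar^2 - 2Br + C}\,\mathbf{1}^\top \Sigma^{-1}(\mu - r\mathbf{1}) = \frac{\mu_0 - r}{Ar^2 - 2Br + C}\,(B - rA). \]
Setting this equal to 1 and solving for $\mu_0$ gives
\[ \mu_0 = r + \frac{Ar^2 - 2Br + C}{B - rA} =: \mu_{\tan}. \]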
Note that µtan > r33 since Ar2 − 2Br + C = (µ − r1)> Σ−1 (µ − r1) > 0 and B − rA > 0.
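This implies by Theorem 3.12 that $\bar\pi_{\mu_{\tan},r}$ is indeed efficient. We conclude that an efficient portfolio $\bar\pi$ is risk-only if and only if $\bar\pi = \bar\pi_{\mu_{\tan},r}$. Moreover,
\[ \pi_{\mu_{\tan},r} = \frac{1}{B - rA}\,\Sigma^{-1}(\mu - r\mathbf{1}) = \pi_{\tan}. \]
31 Note that the condition $r < \frac{B}{A}$ implies that $\mu \ne r\mathbf{1}$. Indeed, if $\mu = r\mathbf{1}$, then $B = rA$ and so $r = \frac{B}{A}$.
32 Note that $B - rA > 0$ because $r < \frac{B}{A}$ so that $\pi_{\tan}$ is well defined. Moreover, note that if $\mu$ and $\mathbf{1}$ are collinear, then $\pi_{\tan} = \pi_{\min}$.
33 We even have $\mu_{\tan} \ge \frac{B}{A}$. This can be seen as follows: By Cauchy-Schwarz, it follows that $AC - B^2 \ge 0$ (note that we do not assume here that $\mu$ and $\mathbf{1}$ are not collinear). This together with $B - rA > 0$ gives:
\[ \mu_{\tan} - \frac{B}{A} = r - \frac{B}{A} + \frac{Ar^2 - 2Br + C}{B - rA} = \frac{AC - B^2}{A(B - rA)} \ge 0. \]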
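for some $\lambda \ge 0$. Using that $\bar\pi_{\min,r} = (1, 0)$ and $\bar\pi_{\tan} = \bar\pi_{\mu_{\tan},r}$, where $\mu_{\tan} = r + \frac{Ar^2 - 2Br + C}{B - rA}$, we obtain
\begin{align*}
\pi &= \lambda \pi_{\tan} + (1 - \lambda)\pi_{\min,r} \\
&= \lambda \frac{\mu_{\tan} - r}{Ar^2 - 2Br + C}\,\Sigma^{-1}(\mu - r\mathbf{1}) + (1 - \lambda) \times 0 \\
&= \frac{\mu_\lambda - r}{Ar^2 - 2Br + C}\,\Sigma^{-1}(\mu - r\mathbf{1}), \tag{3.19}
\end{align*}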
where µλ := λµtan + (1 − λ)r. Thus, π = πµλ ,r , and then also π = π µλ ,r . Since µλ ≥ r (this
uses that µtan > r and λ ≥ 0), it follows from Theorem 3.12 that π is efficient.
Conversely, suppose that π ∈ H 1+d−1 is efficient. By Theorem 3.12, there is µ0 ≥ r such
that π = π µ0 ,r . Set
µ0 − r
λ := ,
µtan − r
where µtan is defined as above. Then λ ≥ 0 (because µ0 ≥ r and µtan > r) and µ0 =
λµtan + (1 − λ)r. The same calculation as in (3.19) shows that π = λπtan + (1 − λ)πmin,r and
then also π = λπ tan + (1 − λ)π min,r .
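The tangency portfolio and the mutual fund decomposition can be illustrated numerically. The sketch below reuses a hypothetical two-asset market; the numbers for `mu`, `Sigma`, `r` and `mu0` and all helper names are assumptions made for illustration only. It computes the risky part of the tangency portfolio, the mean $\mu_{\tan}$, and the weight $\lambda = (\mu_0 - r)/(\mu_{\tan} - r)$ for one target mean.

```python
# Hypothetical market data (illustrative numbers, not from the notes).
mu = [0.05, 0.10]                      # mean returns of the risky assets
Sigma = [[0.04, 0.01], [0.01, 0.09]]   # covariance matrix
r = 0.02                               # interest rate, assumed to satisfy r < B/A

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

one = [1.0, 1.0]
Si = inv2(Sigma)
A = dot(one, matvec(Si, one))
B = dot(one, matvec(Si, mu))
C = dot(mu, matvec(Si, mu))
excess = [m - r for m in mu]           # mu - r 1
Si_excess = matvec(Si, excess)         # Sigma^{-1} (mu - r 1)
q_form = A * r**2 - 2 * B * r + C      # equals (mu - r1)^T Sigma^{-1} (mu - r1)

pi_tan = [s / (B - r * A) for s in Si_excess]   # risky part of the tangency portfolio
mu_tan = r + q_form / (B - r * A)

def risky_part(mu0):
    """Risky part pi_{mu0,r} of the efficient portfolio with mean mu0, cf. (3.15)."""
    return [(mu0 - r) / q_form * s for s in Si_excess]

mu0 = 0.06                             # target mean, mu0 >= r
lam = (mu0 - r) / (mu_tan - r)         # weight on the tangency portfolio
pi_eff = risky_part(mu0)
var_eff = dot(pi_eff, matvec(Sigma, pi_eff))
```

Here `pi_tan` sums to one (no money in the bank account), and `pi_eff` coincides with `lam` times `pi_tan`; the remaining fraction `1 - lam` of wealth sits in the riskless asset.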
Remark 3.15. (a). Theorem 3.14(b) is usually referred to as a mutual fund theorem because
it states that every efficient portfolio is a combination of the (efficient) mutual funds π tan and
π min,r , the first one containing only risky assets, the second one containing only the riskless
asset.
(b) In the setting of Theorem 3.14, define the capital market line (CML) by
Then it follows from Theorem 3.14 that a portfolio π is efficient if and only if it lies on the
capital market line in the sense that (µπ , σπ ) ∈ CML.
(c) Note that the capital market line CML is just a reparametrisation of the efficient
frontier E.
[Figure 3: A graphical illustration of the relationship between the risk-only and the general case. Labels shown: the Markowitz tangency portfolio, the interest rate $r$, and $\sigma_{\pi_{\min}}$ on the $\sigma_\pi$-axis.]
3.8 On mean-variance equilibria
We proceed to study what happens if all agents investing in the financial market S are mean-variance optimisers in the sense that they solve the mean-variance problem (1) or (2) (for some choice of $\mu_{\min}$ or $\sigma^2_{\max}$) and hold the corresponding optimal portfolio.
For the arguments that follow, one must be very careful not to run into circular reasoning. So far, we have always assumed that a financial market S is given exogenously, i.e., prices are not influenced by the investment decisions of the market participants. We have then derived optimal trading strategies for agents that are mean-variance optimisers. Now, we want to use the form of these optimal trading strategies to draw conclusions on the structure of the financial
market S. To avoid circular reasoning, we have therefore to assume a priori that the structure
of S is consistent with the derived mean-variance optimal strategies. This is a big assumption.
In economic terms, we have to assume a priori that there exists a mean-variance equilibrium.
For the following definition, we need the notion of shares outstanding: For a stock S i ,
the shares outstanding denotes the total number η i of shares of that stock held by all market
participants together.34 The shares outstanding times the market value of the stock η i S0i gives
the market capitalisation of the stock.
Remark 3.17. (a) Property (1) in Definition 3.16 is usually referred to as individual opti-
mality, and property (2) as market clearing. Both requirements together are at the core of
the concept of equilibrium, which extends beyond mean-variance preferences. Note that the
market clearing condition (b) consists in fact of d conditions, one for each stock S 1 , . . . , S d .
(b) We have not specified how many “shares” of the bank account S 0 are outstanding nor
have we required market clearing for the bank account. It is usually assumed that there are
0 “shares outstanding” of the bank account, and in this case one says that the bank account is in zero net supply. If we made this assumption, we would also have to require that $\sum_{k=1}^K \vartheta^0_k = 0$. However, in the context of the CAPM, this does not really matter.
34 In addition to the shares outstanding (which have legal ownership rights), there are also treasury shares, which are shares held by the corporation issuing the shares itself and have no exercisable rights. The issued shares are the sum of the shares outstanding and the treasury shares.
35 This means that the corresponding portfolio $\bar\pi_k$ in fractions of wealth is mean-variance optimal, i.e., a solution to the mean-variance problem (1) or (2) (for some choice of $\mu_{\min}$ or $\sigma^2_{\max}$).
(c) In the context of equilibrium, it is often assumed (w.l.o.g.) that η = 1, in which case
one says that the stocks are in unit net supply.
\[ \pi_m^i = \frac{\eta^i S_0^i}{\eta \cdot S_0}, \quad i \in \{1, \dots, d\}. \]
\[ \beta_\pi = \frac{\operatorname{Cov}[R_\pi, R_m]}{\operatorname{Var}[R_m]}. \]
Proof. (a). Since (S, η) is a mean-variance equilibrium, there are market participants 1, . . . , K
with portfolios ϑ1 , . . . , ϑK ∈ R1+d (parametrised in numbers of shares) such that each indi-
vidual portfolio ϑk is mean-variance optimal and the stock markets clear:
\[ \sum_{k=1}^K \vartheta_k = \eta. \]
Denote by π k ∈ H 1+d−1 the portfolio parametrised in fractions of wealth corresponding to ϑk .
It follows from Theorem 3.13 and the fact that ϑk is mean-variance optimal, that each π k is
efficient. Hence, for each k ∈ {1, . . . , K} , Theorem 3.14(b) gives λk ≥ 0 such that
πk = λk πtan ,
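\[ \vartheta^i_k = x_k \frac{\pi^i_k}{S^i_0} = x_k \lambda_k \frac{\pi^i_{\tan}}{S^i_0}, \quad i \in \{1, \dots, d\}. \]
\[ \eta^i = \sum_{k=1}^K \vartheta^i_k = \sum_{k=1}^K x_k \lambda_k \frac{\pi^i_{\tan}}{S^i_0} = \Big( \sum_{k=1}^K x_k \lambda_k \Big) \frac{\pi^i_{\tan}}{S^i_0}, \quad i \in \{1, \dots, d\}. \]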
(b). Fix π ∈ H 1+d−1 . Then by (a), a similar calculation as in Lemma 3.3, Theorem 3.14(a)
and Proposition 3.9, we obtain
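\[ \operatorname{Cov}[R_\pi, R_m] = \operatorname{Cov}[R_\pi, R_{\pi_{\tan}}] = \pi^\top \Sigma \pi_{\tan} = \frac{\pi^\top (\mu - r\mathbf{1})}{B - rA} = \frac{\mu_\pi - r}{B - rA}, \]
\[ \operatorname{Var}[R_m] = \operatorname{Var}[R_{\pi_{\tan}}] = \pi_{\tan}^\top \Sigma \pi_{\tan} = \frac{\pi_{\tan}^\top (\mu - r\mathbf{1})}{B - rA} = \frac{\mu_{\pi_{\tan}} - r}{B - rA}. \]
Hence $\beta_\pi = \frac{\mu_\pi - r}{\mu_{\pi_{\tan}} - r}$, and so
\[ \beta_\pi (E[R_m] - r) = \beta_\pi (\mu_{\pi_{\tan}} - r) = \frac{\mu_\pi - r}{\mu_{\pi_{\tan}} - r}\,(\mu_{\pi_{\tan}} - r) = \mu_\pi - r = E[R_\pi] - r. \]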
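The CAPM relation $E[R_\pi] - r = \beta_\pi (E[R_m] - r)$ can be checked numerically once the market portfolio is identified with the tangency portfolio. The following sketch does this for a hypothetical two-asset market; all numbers and names (`mu`, `Sigma`, `r`, `pi`, `cov`) are illustrative assumptions.

```python
# Hypothetical market data (illustrative numbers, not from the notes).
mu = [0.05, 0.10]
Sigma = [[0.04, 0.01], [0.01, 0.09]]
r = 0.02                               # interest rate, assumed to satisfy r < B/A

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cov(p, q):
    """Covariance of the returns of two portfolios with risky parts p and q."""
    return dot(p, matvec(Sigma, q))

one = [1.0, 1.0]
Si = inv2(Sigma)
A = dot(one, matvec(Si, one))
B = dot(one, matvec(Si, mu))
excess = [m - r for m in mu]           # mu - r 1

# Market portfolio = tangency portfolio (risky part; nothing in the bank account)
pi_m = [s / (B - r * A) for s in matvec(Si, excess)]

# Risky part of an arbitrary portfolio; the remaining 1 - sum(pi) is in the bank account
pi = [0.7, -0.2]

mu_pi = r + dot(pi, excess)            # expected return of the portfolio
mu_m = r + dot(pi_m, excess)           # expected return of the market portfolio
beta = cov(pi, pi_m) / cov(pi_m, pi_m)
```

The portfolio `pi` is deliberately not efficient; the relation holds for every portfolio, which is the content of part (b).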
Remark 3.19. (a) It is often said that the CAPM is a single factor model. However, this is
only partly true since one requires for a factor model residuals to be uncorrelated:36 It follows
from the CAPM formula, that each return Ri can be written as
Ri = r + β i (Rm − r) + εi , (3.21)
36 For a thorough discussion on factor models, we refer to Chapter 20 (and in particular Section 20.5) of [6].
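where $\beta^i := \frac{\operatorname{Cov}[R^i, R_m]}{\operatorname{Var}[R_m]}$ is the beta of asset $i$ and $\varepsilon^i$ is a random variable with mean zero and $\operatorname{Cov}[\varepsilon^i, R_m] = 0$. Indeed, setting $\varepsilon^i := R^i - r - \beta^i (R_m - r)$, the CAPM formula gives
\[ E[\varepsilon^i] = E[R^i] - r - \beta^i (E[R_m] - r) = 0, \tag{3.22} \]
\[ \operatorname{Cov}[\varepsilon^i, R_m] = \operatorname{Cov}[R^i, R_m] - \beta^i \operatorname{Var}[R_m] = \operatorname{Cov}[R^i, R_m] - \operatorname{Cov}[R^i, R_m] = 0. \tag{3.23} \]
However, it does not follow that $\operatorname{Cov}[\varepsilon^i, \varepsilon^j] = 0$ for $i \ne j$ as would be required for a single factor model. Indeed,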
One calls $(\beta^i)^2 \sigma^2_{\pi_m}$ the systematic or undiversifiable risk of asset $i$ and $\operatorname{Var}[\varepsilon^i]$ the unsystematic risk of asset $i$ that can be eliminated by diversification.37 Indeed, if $\bar\pi \in H^{1+d-1}$ is any portfolio, note that
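\[ \sum_{i=1}^d \pi^i \beta^i = \frac{1}{\operatorname{Var}[R_m]} \sum_{i=1}^d \pi^i \operatorname{Cov}[R^i, R_m] = \frac{1}{\operatorname{Var}[R_m]} \operatorname{Cov}[\pi \cdot R, R_m] = \frac{1}{\operatorname{Var}[R_m]} \operatorname{Cov}[\bar\pi \cdot \bar R, R_m] = \beta_\pi. \]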
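\begin{align*}
R_\pi = \bar\pi \cdot \bar R &= \pi^0 r + \sum_{i=1}^d \pi^i R^i = \pi^0 r + \sum_{i=1}^d \pi^i r + \sum_{i=1}^d \pi^i \beta^i (R_m - r) + \sum_{i=1}^d \pi^i \varepsilon^i \\
&= r + \beta_\pi (R_m - r) + \varepsilon_\pi,
\end{align*}
where $\varepsilon_\pi := \sum_{i=1}^d \pi^i \varepsilon^i$. Moreover, it follows from (3.22) and (3.23) and linearity of the expectation and the covariance in the first component that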
Now if π is efficient, then π = λπm by Theorems 3.14(b) and 3.18(a). This implies that
where the εi are mean-zero and uncorrelated (this is an assumption!) error terms. This model
is also called the single index model (SIM).39 The constant term/intercept αi in this regression
is usually referred to as Jensen’s alpha or just alpha and called the abnormal return. Note
that in the CAPM setting, the alpha is zero.
• The mean-variance criterion assumes that the variance of the return of a portfolio is a
good measure of the risk related to the portfolio. However, if returns are not normally
distributed, this is arguably not the case; cf. Chapter 5.
• The CAPM is a one period model, so that there is no opportunity to consume and rebalance portfolios over time. For this reason, the CAPM has been extended to multiperiod and continuous time models by Rubinstein and Merton.
• One key quantity in the CAPM is the market portfolio. In real financial markets,
however, this is quite an ambiguous object. In particular, choosing a major stock index
as market portfolio (as often done in practice) is somewhat arbitrary.
• Empirical studies indicate that the CAPM does not fully explain the variation of stock
returns. For instance, stocks with a high book to market ratio 41 tend to offer higher
38 This together with $E[\varepsilon_\pi] = 0$ implies that even $\varepsilon_\pi = 0$ P-a.s.
39 In its original version, the $\varepsilon^i$ are also assumed to follow a multivariate normal distribution.
40 Here, we do not consider the criticism that mean-variance portfolio selection and the CAPM ignore frictions, trading constraints, etc. Moreover, we do not consider the criticism that markets might be inefficient or that market participants might act in a non-rational or only partially rational way (which is studied in the field of Behavioural Finance).
41 This is the book value of a company (calculated via balance sheet considerations) divided by its market capitalisation.
expected returns than the CAPM would predict. For this reason, the CAPM in its
“linear regression version” of Remark 3.19(c) has been extended to multi-factor models
like the three-factor model by Fama and French or the four-factor model by Carhart.
4 Utility Theory
In this chapter, we seek to systematically describe preferences of an investor who has to com-
pare random outcomes like the future payoff of a financial asset or the return of a portfolio. To
this end, we will follow the axiomatic approach proposed by von Neumann and Morgenstern.
\[ P^X[B] := P[X \in B]. \]
Lemma 4.1. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $D \subset \mathbb{R}$ a nonempty interval. Moreover, let $X$ be a $D$-valued random variable with distribution $\nu := P^X$ and $h : D \to \mathbb{R}$ a measurable function. Then $h(X)$ is $P$-integrable if and only if $h$ is $\nu$-integrable and in this case
\[ E[h(X)] = \int h(x)\,\nu(dx). \]
The Dirac measure represents the distribution of a $D$-valued random variable $X$ that takes the value $x$ with probability 1, i.e., $X = x$ P-a.s. Note that for any measurable function $h : D \to \mathbb{R}$,
\[ \int h(y)\,\delta_x(dy) = h(x). \]
42 The main examples are $D = \mathbb{R}$, $D = [0, \infty)$ and $D = (0, \infty)$.
43 Note that usually the expectation is defined via the integral (and not vice versa).
In the sequel, we also need the notion of the mixture of two distributions. If ν1 and ν2
are probability distributions on (D, BD ) and α ∈ [0, 1], then the distribution αν1 + (1 − α)ν2
is called the mixture of ν1 and ν2 with weights α and (1 − α).44 If (Ω, F, P ) is a probability
space, then a random variable X has distribution αν1 + (1 − α)ν2 , if and only if there is a
Bernoulli random variable Y with P [Y = 1] = α and P [Y = 0] = (1 − α) such that X has
conditional distribution ν1 given that Y = 1 and conditional distribution ν2 given that Y = 0.45
Warning: If X1 is a random variable with distribution ν1 and X2 is a random variable
with distribution ν2 , then in general αX1 +(1−α)X2 does not have distribution αν1 +(1−α)ν2 .
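The warning can be made concrete with the simplest possible choice, which is our own illustration and not an example from the notes: take $\nu_1 = \delta_0$, $\nu_2 = \delta_2$ and $\alpha = 1/2$. The sketch below represents a discrete distribution as a dictionary mapping values to probabilities (a hypothetical encoding chosen for this example).

```python
from fractions import Fraction as F

alpha = F(1, 2)
nu1 = {0: F(1)}   # delta_0, the distribution of X1 = 0
nu2 = {2: F(1)}   # delta_2, the distribution of X2 = 2

# Mixture alpha * nu1 + (1 - alpha) * nu2: mix the probabilities, not the values
mixture = {}
for nu, weight in ((nu1, alpha), (nu2, 1 - alpha)):
    for value, prob in nu.items():
        mixture[value] = mixture.get(value, 0) + weight * prob

# Since X1 = 0 and X2 = 2 almost surely, alpha*X1 + (1-alpha)*X2 = 1 almost surely
law_of_combination = {alpha * 0 + (1 - alpha) * 2: F(1)}

def mean(nu):
    return sum(value * prob for value, prob in nu.items())
```

The mixture puts mass 1/2 on each of the values 0 and 2, whereas the convex combination of the random variables is the constant 1. Both distributions have the same mean, but they are different.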
The following result states that the integral $\int h(x)\,\nu(dx)$ is linear not only in the integrand $h$ but also in the integrator $\nu$.
ν = α1 δx1 + · · · + αN δxN .
Note that ν is the distribution of a discrete random variable Y taking the value xn with
probability αn , n ∈ {1, . . . , N }.
Definition 4.3. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all probability measures on (D, BD ). A preference order on M is a binary relation ⪰ with the following properties:
(a) Completeness: For all ν1 , ν2 ∈ M, either ν1 ⪰ ν2 or ν2 ⪰ ν1 or both are true.
44 The cases α = 0 and α = 1 are somewhat degenerate.
45 More generally, if $\nu_1, \dots, \nu_N$ are probability distributions on $(D, \mathcal{B}_D)$ and $\alpha_1, \dots, \alpha_N \in [0, 1]$ with $\sum_{n=1}^N \alpha_n = 1$, then $\sum_{n=1}^N \alpha_n \nu_n$ is called the mixture of $\nu_1, \dots, \nu_N$ with weights $\alpha_1, \dots, \alpha_N$. If $(\Omega, \mathcal{F}, P)$ is a probability space, then a random variable $X$ has distribution $\sum_{n=1}^N \alpha_n \nu_n$ if and only if there is a random variable $Y$ with $P[Y = n] = \alpha_n$ such that $X$ has conditional distribution $\nu_n$ given that $Y = n$, $n \in \{1, \dots, N\}$.
Example 4.4. Let M be the (convex) set of all probability measures on (R, B) with finite second moment, i.e., ν ∈ M if and only if $\int_{\mathbb{R}} x^2\,\nu(dx) < \infty$. For ν ∈ M denote by $\mu_\nu := \int_{\mathbb{R}} x\,\nu(dx)$ the mean of ν and by $\sigma_\nu^2 := \int_{\mathbb{R}} (x - \mu_\nu)^2\,\nu(dx)$ the variance of ν.46
is a preference order.
\[ \nu_1 \succeq \nu_2 \quad\text{if and only if}\quad \mu_{\nu_1} - \frac{\gamma}{2}\sigma^2_{\nu_1} \ge \mu_{\nu_2} - \frac{\gamma}{2}\sigma^2_{\nu_2}, \]
Definition 4.5. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all probability measures on (D, BD ). A preference order ⪰ on M is said to have a von Neumann-Morgenstern representation if there exists a measurable function U : D → R that is integrable with respect to any ν ∈ M such that
\[ \nu_1 \succeq \nu_2 \iff \int U(x)\,\nu_1(dx) \ge \int U(x)\,\nu_2(dx). \]
Remark 4.6. Linearity of the expectation and the fact that inequalities remain unchanged by
multiplication with positive constants imply that a von Neumann-Morgenstern representation
can only be unique up to a positive affine transformation, i.e., if U describes a preference order, then aU + b, where a > 0 and b ∈ R, describes the same preference order.
Our goal is now to find axioms for preference orders that together imply a von Neumann-
Morgenstern representation. Surprisingly, essentially only two axioms are needed to ensure a
von Neumann-Morgenstern representation.
The first axiom is quite intuitive from an economic perspective.
The independence axiom says that if we strictly prefer lottery ν1 over lottery ν2 , then
we should also strictly prefer the mixed lottery αν1 + (1 − α)ν3 over the mixed lottery
αν2 + (1 − α)ν3 . From a normative perspective, this is quite reasonable: Comparing the
mixed lotteries, with probability α, we have to choose between ν1 and ν2 , and with probabil-
ity (1−α), we do not have to make any choice because we get the lottery ν3 . The independence
axiom says that the conditional and the unconditional choice should coincide.
Even though the independence axiom has a good theoretical foundation, it is not clear if
it reflects people’s preferences in practice.
ν1 := δ2400 ,
ν2 := 0.33δ2500 + 0.66δ2400 + 0.01δ0 .
Which one would you choose? Empirical studies show that most people would prefer the sure
amount and so choose lottery ν1 .
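Now consider the following lotteries:
\[ \nu_1' := 0.34\,\delta_{2400} + 0.66\,\delta_0, \]
\[ \nu_2' := 0.33\,\delta_{2500} + 0.67\,\delta_0. \]
Again, which one would you choose? Here, empirical studies show that most people would prefer the slightly riskier lottery $\nu_2'$ because it has the higher expectation ($\mu_{\nu_1'} = 816$ and $\mu_{\nu_2'} = 825$).
However, the choice $\nu_1 \succ \nu_2$ together with the choice $\nu_2' \succ \nu_1'$ violates the independence axiom.
Indeed, if the independence axiom were satisfied, then
\[ \tfrac12 \nu_1 + \tfrac12 \nu_2' \succ \tfrac12 \nu_2 + \tfrac12 \nu_2' \quad\text{and}\quad \tfrac12 \nu_2' + \tfrac12 \nu_2 \succ \tfrac12 \nu_1' + \tfrac12 \nu_2. \]
Transitivity of $\succ$ yields47
\[ \tfrac12 \nu_1 + \tfrac12 \nu_2' \succ \tfrac12 \nu_1' + \tfrac12 \nu_2. \]
But this is a contradiction because $\tfrac12 \nu_1 + \tfrac12 \nu_2' = \tfrac{0.33}{2}\,\delta_{2500} + \tfrac12\,\delta_{2400} + \tfrac{0.67}{2}\,\delta_0 = \tfrac12 \nu_1' + \tfrac12 \nu_2$. This ends the example.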
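The arithmetic behind this example can be verified with exact fractions. The sketch below takes $\nu_1' = 0.34\,\delta_{2400} + 0.66\,\delta_0$ and $\nu_2' = 0.33\,\delta_{2500} + 0.67\,\delta_0$, which is the choice consistent with the stated means 816 and 825 and with the mixture identity above. The dictionary encoding of lotteries and the helper names `mean`, `mix` and `expected_utility` are our own assumptions for illustration.

```python
import math
from fractions import Fraction as F

# Lotteries as {payoff: probability}
nu1 = {2400: F(1)}
nu2 = {2500: F(33, 100), 2400: F(66, 100), 0: F(1, 100)}
nu1p = {2400: F(34, 100), 0: F(66, 100)}   # nu_1'
nu2p = {2500: F(33, 100), 0: F(67, 100)}   # nu_2'

def mean(nu):
    return sum(x * p for x, p in nu.items())

def mix(alpha, a, b):
    """Mixture alpha * a + (1 - alpha) * b of two lotteries."""
    out = {}
    for nu, w in ((a, alpha), (b, 1 - alpha)):
        for x, p in nu.items():
            out[x] = out.get(x, 0) + w * p
    return out

def expected_utility(U, nu):
    return sum(float(p) * U(x) for x, p in nu.items())

half = F(1, 2)
left = mix(half, nu1, nu2p)     # 1/2 nu_1 + 1/2 nu_2'
right = mix(half, nu1p, nu2)    # 1/2 nu_1' + 1/2 nu_2

# For any utility function U the two expected-utility differences coincide;
# the square root is just one arbitrary choice.
U = math.sqrt
diff = expected_utility(U, nu1) - expected_utility(U, nu2)
diff_p = expected_utility(U, nu1p) - expected_utility(U, nu2p)
```

The equality of `diff` and `diff_p` shows from a different angle why no expected utility maximiser can prefer $\nu_1$ to $\nu_2$ and at the same time $\nu_2'$ to $\nu_1'$.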
The second axiom is economically less intuitive but natural from a mathematical perspec-
tive.
Definition 4.9. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all probability measures on (D, BD ). A preference order ⪰ on M is said to satisfy the continuity axiom if for any triple ν1 ≻ ν2 ≻ ν3 , there is α ∈ (0, 1) such that
\[ \alpha \nu_1 + (1 - \alpha)\nu_3 \sim \nu_2. \]
The continuity axiom says that if a lottery ν2 lies preference-wise strictly in between two
other lotteries ν1 and ν3 , then there is a convex combination of ν1 and ν3 such that one is
indifferent between ν2 and this convex combination.
For simple lotteries, the independence and the continuity axiom together imply a von Neumann-Morgenstern representation. For a proof of the following result, we refer to [2, Section 2.2].48
Theorem 4.10. Let M denote the collection of all simple lotteries on (D, BD ), where D ⊂ R is a nonempty interval. Let ⪰ be a preference order on M satisfying the independence and
47 Note that transitivity of ⪰ implies transitivity of ≻. Indeed, if ν1 ≻ ν2 and ν2 ≻ ν3 , then a fortiori ν1 ⪰ ν2 and ν2 ⪰ ν3 . So transitivity of ⪰ gives ν1 ⪰ ν3 . Seeking a contradiction, suppose that ν1 ⊁ ν3 . Then ν3 ⪰ ν1 , and as ν1 ⪰ ν2 , transitivity of ⪰ gives ν3 ⪰ ν2 . But this is in contradiction to ν2 ≻ ν3 .
48 Note that the result is wrong for general lotteries on (D, BD ). For the general case, one needs stronger continuity properties of ⪰; see [2, Theorems 2.27 and 2.29].
[Figure: graph of U(x) with the points x1 and x2 marked on the x-axis.]
It is called strictly concave if the inequality in (4.1) is strict for x1 ≠ x2 and λ ∈ (0, 1).
Graphically speaking, (strict) concavity means that straight line segments joining (x1 , U (x1 ))
to (x2 , U (x2 )) always lie (strictly) below the graph of U .
We proceed to state and prove the fundamental inequality for concave functions.
Lemma 4.13 (Jensen’s inequality). Let (Ω, F, P ) be a probability space and X an integrable random variable with values in a nonempty interval D ⊂ R. Let U : D → R be concave and suppose that E [|U (X)|] < ∞. Then
\[ E[U(X)] \le U(E[X]). \]
Moreover, the inequality is strict when U is strictly concave and X is not P -a.s. constant.
49 The converse is not true: For example, the function $U : \mathbb{R} \to \mathbb{R}$, $x \mapsto -x^4$ is strictly concave, but $U''(0) = 0$.
Proof. First, using the definition of concavity, one can show that for each a ∈ D, there is
b ∈ R such that
U (x) ≤ U (a) + b(x − a), (4.2)
and
\[ P[U(X) < U(a) + b(X - a)] = P[X \ne a] > 0, \]
if U is strictly concave and X is not P -a.s. constant. Thus, by monotonicity and linearity of
the integral and the fact that a = E [X],
E [U (X)] ≤ E [U (a) + b(X − a)] = U (a) + b(E [X] − a) = U (a) + b(a − a) = U (a)
= U (E [X]),
where the inequality is strict if U is strictly concave and X is not P -a.s. constant.
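Jensen's inequality is easy to check numerically for a discrete random variable. The following sketch uses the strictly concave function U = log and a hypothetical three-point distribution; the values and probabilities are illustrative assumptions.

```python
import math

# Hypothetical discrete distribution of X (illustrative numbers)
values = [1.0, 2.0, 4.0]
probs = [0.5, 0.25, 0.25]

def expect(f, xs, ps):
    """E[f(X)] for a discrete random variable X."""
    return sum(p * f(x) for x, p in zip(xs, ps))

mean = expect(lambda x: x, values, probs)     # E[X]
lhs = expect(math.log, values, probs)         # E[U(X)]
rhs = math.log(mean)                          # U(E[X])

# Degenerate case: X is constant, so both sides coincide
lhs_const = expect(math.log, [3.0], [1.0])
rhs_const = math.log(expect(lambda x: x, [3.0], [1.0]))
```

In the first case the inequality is strict because log is strictly concave and X is not constant; in the degenerate case both sides agree.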
Definition 4.14. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all probability measures on (D, B) having a finite expectation, i.e., $\int |x|\,\nu(dx) < \infty$ for ν ∈ M. Moreover, assume that M contains all Dirac measures δx for x ∈ D. Then a preference order ⪰ on M is called
• risk averse if $\delta_{\mu_\nu} \succ \nu$ for ν ∈ M unless $\nu = \delta_{\mu_\nu}$, where $\mu_\nu = \int x\,\nu(dx)$.
Monotonicity of a preference order means that we strictly prefer a higher sure amount over
a lower sure amount, i.e., we strictly prefer “more to less”. This is a very natural assumption
both from a conceptual and an empirical perspective.
Risk-aversion of a preference order means that for a lottery ν ∈ M, we strictly prefer
to receive the actuarially fair value µν over the lottery ν itself, unless, of course, the lottery
is deterministic and there is no difference between the two. While risk aversion is a natural
requirement from a normative perspective, in reality, there are persons who are not risk averse
but even risk-seeking, i.e., they strictly prefer a lottery over the actuarially fair value.51
The following result characterises monotonicity and risk aversion for preference orders that
admit a von-Neumann-Morgenstern representation.
Lemma 4.15. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all probability measures on (D, B) having a finite expectation, and assume that M contains all Dirac measures δx for x ∈ D. Suppose a preference order ⪰ on M has a von Neumann-Morgenstern representation
\[ \nu_1 \succeq \nu_2 \iff \int U(x)\,\nu_1(dx) \ge \int U(x)\,\nu_2(dx), \]
(a). Let x, y ∈ D with x > y. Then (4.3) shows that δx ≻ δy if and only if U (x) > U (y).
(b). First, assume that ⪰ is risk averse. Let x ≠ y and α ∈ (0, 1). Then
\[ \delta_{\alpha x + (1 - \alpha) y} \succ \alpha \delta_x + (1 - \alpha)\delta_y. \]
Hence, by the von Neumann-Morgenstern representation, linearity of the integral in the integrator, and (4.3), we obtain
\begin{align*}
U(\alpha x + (1 - \alpha) y) = \int U(z)\,\delta_{\alpha x + (1 - \alpha) y}(dz) &> \int U(z)\,(\alpha \delta_x + (1 - \alpha)\delta_y)(dz) \\
&= \alpha \int U(z)\,\delta_x(dz) + (1 - \alpha) \int U(z)\,\delta_y(dz)
\end{align*}
51 A sad illustration of this fact is the existence of the gambling industry.
U (x) = − exp(−γx)
• Power utility. Let γ ∈ (0, ∞) \ {1}. Moreover, let D = [0, ∞) for γ ∈ (0, 1) and D = (0, ∞) for γ ∈ (1, ∞). Then the function U : D → R given by
\[ U(x) = \frac{x^{1-\gamma}}{1 - \gamma} \]
• Logarithmic utility. Let D = (0, ∞). Then the function U : D → R given by
U (x) = log(x)
Remark 4.18. Logarithmic utility can be seen as power utility with parameter γ = 1. Indeed,
differentiating the utility function gives
\[ U'(x) = \begin{cases} x^{-\gamma} & \text{for power utility with } \gamma \in (0, \infty) \setminus \{1\}, \\ x^{-1} & \text{for logarithmic utility.} \end{cases} \]
i.e., agents are indifferent between receiving the sure amount cU (ν) or the lottery ν. For this
reason, cU (ν) is called the certainty equivalent of ν for the utility function U . Moreover,
\[ \rho_U(\nu) := \mu_\nu - c_U(\nu) \]
is called the risk premium of ν for the utility function U , where $\mu_\nu := \int x\,\nu(dx)$ denotes the actuarially fair value of ν. By risk aversion and monotonicity of ⪰ it follows that $\rho_U(\nu) \ge 0$, where the inequality is strict unless $\nu = \delta_{\mu_\nu}$.
We proceed to study how the risk premium depends on the utility function U . To this
end, we argue heuristically and also assume that U is twice continuously differentiable and ν
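has second moments. First, a Taylor expansion of order 1 on the left-hand side of (4.4) gives
\[ U(c_U(\nu)) \approx U(\mu_\nu) + U'(\mu_\nu)(c_U(\nu) - \mu_\nu) = U(\mu_\nu) - U'(\mu_\nu)\rho_U(\nu). \tag{4.5} \]
Next, a Taylor expansion under the integral of order 2 on the right-hand side of (4.4) and linearity of the integral gives
\begin{align*}
\int U(x)\,\nu(dx) &\approx \int \Big( U(\mu_\nu) + U'(\mu_\nu)(x - \mu_\nu) + \frac12 U''(\mu_\nu)(x - \mu_\nu)^2 \Big)\,\nu(dx) \\
&= U(\mu_\nu) + U'(\mu_\nu)(\mu_\nu - \mu_\nu) + \frac12 U''(\mu_\nu)\sigma_\nu^2 \\
&= U(\mu_\nu) + \frac12 U''(\mu_\nu)\sigma_\nu^2. \tag{4.6}
\end{align*}
Combining (4.5) and (4.6) via (4.4) and solving for $\rho_U(\nu)$ gives
\[ \rho_U(\nu) \approx -\frac12 \frac{U''(\mu_\nu)}{U'(\mu_\nu)}\,\sigma_\nu^2. \tag{4.7} \]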
So the risk premium of ν is approximately 1/2 times the variance of ν times the coefficient $-\frac{U''(\mu_\nu)}{U'(\mu_\nu)}$. For this reason, for a utility function U : D → R that is twice continuously differentiable in the interior of D, the function $A_U : D^\circ \to (0, \infty)$ defined by
\[ A_U(x) = -\frac{U''(x)}{U'(x)} \]
is called the Arrow-Pratt coefficient of absolute risk aversion of U . The higher the absolute
risk aversion AU of a utility function U , the more risk averse an agent is.
Risk aversion can not only be measured in absolute but also in relative terms. If we divide (4.7) by µν on both sides to get an approximation of the relative risk premium, we obtain
\[ \frac{\rho_U(\nu)}{\mu_\nu} \approx -\frac12 \frac{\mu_\nu U''(\mu_\nu)}{U'(\mu_\nu)}\,\frac{\sigma_\nu^2}{\mu_\nu^2}, \]
where $\frac{\sigma_\nu^2}{\mu_\nu^2} = \int \big( \frac{x - \mu_\nu}{\mu_\nu} \big)^2\,\nu(dx)$ denotes the relative variance of ν. So the relative risk premium of ν is approximately 1/2 times the relative variance of ν times the coefficient $-\frac{\mu_\nu U''(\mu_\nu)}{U'(\mu_\nu)}$. For this reason, for a utility function U : D → R that is twice continuously differentiable in the interior of D, the function $R_U : D^\circ \to (0, \infty)$, defined by
\[ R_U(x) := x A_U(x) = -\frac{x U''(x)}{U'(x)} \]
is called the Arrow-Pratt coefficient of relative risk aversion of U . The higher the relative risk aversion $R_U$ of a utility function U , the more risk averse an agent is.
Example 4.19. (a). Let U : R → R be an exponential utility function with parameter γ > 0,
i.e., U (x) = − exp(−γx). Then
\[ A_U(x) = -\frac{U''(x)}{U'(x)} = -\frac{-\gamma^2 \exp(-\gamma x)}{\gamma \exp(-\gamma x)} = \gamma. \]
So for exponential utility, the absolute risk aversion is constant and equal to the parameter γ.
For this reason, exponential utility is also called CARA utility, where CARA is an acronym
for constant absolute risk aversion.
(b). Let U : D → R be a power utility function with parameter γ ∈ (0, ∞) \ {1}, where D = [0, ∞) for γ ∈ (0, 1) and D = (0, ∞) for γ ∈ (1, ∞), i.e., $U(x) = \frac{x^{1-\gamma}}{1 - \gamma}$. Then
\[ R_U(x) = -\frac{x(-\gamma x^{-\gamma - 1})}{x^{-\gamma}} = \gamma. \]
So for power (and logarithmic) utility, the relative risk aversion is constant and equal to the
parameter γ (1 for the logarithmic case). For this reason, power utility is also called CRRA
utility, where CRRA is an acronym for constant relative risk aversion.
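To see how good the heuristic approximation (4.7) is, one can compare it with the exact risk premium in a case where the certainty equivalent is available in closed form. The sketch below does this for exponential utility and a hypothetical two-point lottery; the parameter `gamma`, the lottery and the helper names are illustrative assumptions. The certainty equivalent is computed as $U^{-1}$ applied to the expected utility.

```python
import math

gamma = 0.5                              # absolute risk aversion (CARA parameter)
values, probs = [99.0, 101.0], [0.5, 0.5]    # hypothetical lottery: 100 +/- 1

def U(x):
    return -math.exp(-gamma * x)

def U_inv(u):
    return -math.log(-u) / gamma

mean = sum(p * x for x, p in zip(values, probs))
var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))

expected_utility = sum(p * U(x) for x, p in zip(values, probs))
cert_equiv = U_inv(expected_utility)     # c_U(nu)
risk_premium = mean - cert_equiv         # rho_U(nu), exact
approx = 0.5 * gamma * var               # Arrow-Pratt approximation (4.7) with A_U = gamma
```

For this lottery the exact risk premium equals $\frac{1}{\gamma}\log\cosh(\gamma)$, which is slightly below the approximation $\frac12 \gamma \sigma_\nu^2$; the gap shrinks as the spread of the lottery decreases.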
Of course, we also need to make sure that the expectation in (4.8) is well defined; in particular
we need to check that ϑ · S 1 ∈ D P -a.s.
The problem (4.8) is surprisingly difficult, and only for very special cases closed-form
solutions obtain. So we proceed to study existence and uniqueness of (4.8). To this end, we
reformulate (4.8) in terms of the discounted risky assets X = S/S 0 . If ϑ ∈ Rd+1 is such that
ϑ · S 0 = x0 , then with X = (1, X) and using that X10 − X00 = 1 − 1 = 0, we obtain
ϑ · X 0 = ϑ0 × 1 + ϑ · X0 = x0 − ϑ · X0 + ϑ · X0 = x0 .
which is an unconstrained optimisation problem.55 Finally, if D is invariant by multiplication with positive constants, we can define the function $\tilde U : D \to \mathbb{R}$ by
\[ \tilde U(x) = U((1 + r)x), \quad x \in D. \]
For the following theorem, we need one key result from Measure Theory. For a proof and
more information, we refer to [5, Chapter 28].
E Q [X] = E P [ZX] .
55
dP
and so Z = dQ .
The following result gives, for the domain D = [0, ∞) and under relatively weak assumptions, existence, uniqueness and further properties of the solution to the expected utility maximisation problem in its simplified version (4.10), where we write again $U$ instead of $\tilde U$.
and t ∈ {0, 1}. Let U : [0, ∞) → R be a utility function that is continuously differentiable on (0, ∞) and satisfies U (0) = 0 and $\lim_{x \to \infty} U(x) = +\infty$. Fix x > 0 and set
\[ A(x) := \big\{ \vartheta \in \mathbb{R}^d : x + \vartheta \cdot (X_1 - X_0) \ge 0\ P\text{-a.s.} \big\}, \]
\[ u(x) := \sup_{\vartheta \in A(x)} E\big[ U\big( x + \vartheta \cdot (X_1 - X_0) \big) \big], \]
(a) The set A(x) is convex. It is compact if and only if S satisfies NA.
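(d) If S satisfies NA and the unique $\vartheta_* \in A(x)$ from part (c) lies in the interior of A(x), then $E\big[ U'\big( x + \vartheta_* \cdot (X_1 - X_0) \big) \big] < \infty$ and the measure Q ≈ P on F defined by
\[ \frac{dQ}{dP} = \frac{U'\big( x + \vartheta_* \cdot (X_1 - X_0) \big)}{E\big[ U'\big( x + \vartheta_* \cdot (X_1 - X_0) \big) \big]} \]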
Remark 4.24. (a) The set A(x) is called the set of admissible strategies for initial wealth x,
and the function u is called the indirect utility function. If S satisfies NA, one can show that
u is again a utility function.
(b) If Ω is finite and U satisfies the Inada condition $\lim_{x \to 0} U'(x) = +\infty$, it is not difficult to check that ϑ∗ lies in the interior of A(x).57
(c) For finite Ω, Theorem 4.23(d) can be seen as a constructive version of the Fundamental Theorem of Asset Pricing. Indeed, choose $U(x) = \sqrt{x}$.
57 If Ω is infinite, even with the Inada condition, ϑ∗ does in general not lie in the interior of A(x) and the assertion in (b) is false.
Sketch of part (c) and (d) of Theorem 4.23. (c). First, we establish existence of ϑ∗ . By part
(b), u(x) < ∞. So there is a sequence (ϑn )n∈N in A(x) such that
Since A(x) is compact by part (a), by the Bolzano-Weierstraß theorem, there exists a sub-
sequence, denoted again by (ϑn )n∈N , converging to some ϑ∗ ∈ A(x). Assuming that we can
interchange expectation and limits and using continuity of U , we obtain
\[ u(x) = \lim_{n \to \infty} E[U(x + \vartheta_n \cdot (X_1 - X_0))] = E\Big[ \lim_{n \to \infty} U(x + \vartheta_n \cdot (X_1 - X_0)) \Big] = E[U(x + \vartheta_* \cdot (X_1 - X_0))]. \]
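Next, we establish uniqueness of $\vartheta_*$. Seeking a contradiction, let $\tilde\vartheta_* \ne \vartheta_*$ in $A(x)$ be such that $E\big[ U\big( x + \tilde\vartheta_* \cdot (X_1 - X_0) \big) \big] = u(x)$. Set $\hat\vartheta_* := \frac12 \vartheta_* + \frac12 \tilde\vartheta_*$. Then $\hat\vartheta_* \in A(x)$ by convexity of $A(x)$. By strict concavity of U and nonredundancy of S,
\[ U\big( x + \hat\vartheta_* \cdot (X_1 - X_0) \big) \ge \frac12 U\big( x + \vartheta_* \cdot (X_1 - X_0) \big) + \frac12 U\big( x + \tilde\vartheta_* \cdot (X_1 - X_0) \big), \]
where the inequality is strict with positive probability. Taking expectation gives
\begin{align*}
E\big[ U\big( x + \hat\vartheta_* \cdot (X_1 - X_0) \big) \big] &> \frac12 E\big[ U\big( x + \vartheta_* \cdot (X_1 - X_0) \big) \big] + \frac12 E\big[ U\big( x + \tilde\vartheta_* \cdot (X_1 - X_0) \big) \big] \\
&= \frac12 u(x) + \frac12 u(x) = u(x) \\
&= \sup_{\vartheta \in A(x)} E\big[ U\big( x + \vartheta \cdot (X_1 - X_0) \big) \big],
\end{align*}
which is a contradiction.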
(d). Fix i ∈ {1, . . . , d}. Since ϑ∗ is an interior point of A(x), for all sufficiently small ε > 0,
ϑ∗ ± εei ∈ A(x), where ei denotes the ith unit vector in Rd . Maximality of ϑ∗ then gives
Dividing by ε, letting ε → 0 and assuming that we can interchange limits and expectations,
we obtain
and hence
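\[ E\left[ U'\big(x + \vartheta^* \cdot (X_1 - X_0)\big) (X_1^i - X_0^i) \right] = 0. \]
\[ \begin{aligned}
E^Q\left[ X_1^i - X_0^i \right] &= E^P\left[ \frac{dQ}{dP} (X_1^i - X_0^i) \right] = E^P\left[ \frac{U'\big(x + \vartheta^* \cdot (X_1 - X_0)\big)}{E^P\left[ U'\big(x + \vartheta^* \cdot (X_1 - X_0)\big) \right]} (X_1^i - X_0^i) \right] \\
&= \frac{1}{E^P\left[ U'\big(x + \vartheta^* \cdot (X_1 - X_0)\big) \right]} E^P\left[ U'\big(x + \vartheta^* \cdot (X_1 - X_0)\big) (X_1^i - X_0^i) \right] = 0.
\end{aligned} \]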
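The construction in part (d) can be illustrated numerically. The following is a minimal sketch, not part of the notes: it assumes a hypothetical two-state, one-asset model (the numbers `p`, `dx`, `x` are made up), takes $U(x) = \sqrt{x}$, finds ϑ∗ by bisection on the first-order condition, and then builds Q from the density $U'/E[U']$.

```python
import math

# Hypothetical two-state, one-asset model (numbers chosen for illustration).
p = [0.5, 0.5]      # P[{omega_j}]
dx = [0.2, -0.1]    # increments X_1 - X_0 in the two states
x = 1.0             # initial wealth


def u_prime(w):
    """Derivative of U(w) = sqrt(w)."""
    return 0.5 / math.sqrt(w)


def first_order_condition(theta):
    """E[U'(x + theta * (X_1 - X_0)) * (X_1 - X_0)], decreasing in theta."""
    return sum(pj * u_prime(x + theta * dj) * dj for pj, dj in zip(p, dx))


# Here A(x) is the interval [-x / max(dx), -x / min(dx)]; bisect strictly inside it.
lo, hi = -x / max(dx) + 1e-9, -x / min(dx) - 1e-9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if first_order_condition(mid) > 0:
        lo = mid
    else:
        hi = mid
theta_star = 0.5 * (lo + hi)

# Density dQ/dP = U'(...) / E[U'(...)] and the resulting Q-probabilities.
marginal = [u_prime(x + theta_star * dj) for dj in dx]
normaliser = sum(pj * mj for pj, mj in zip(p, marginal))
q = [pj * mj / normaliser for pj, mj in zip(p, marginal)]
q_mean_increment = sum(qj * dj for qj, dj in zip(q, dx))
```

In this toy model the maximiser is interior and the increment has Q-mean zero up to rounding, which is the computation carried out in the proof of part (d).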
5 Introduction to Risk Measures
In this short chapter, we briefly discuss how to quantify the downside risk of a financial
position. To this end, we follow the axiomatic approach initiated by Artzner, Delbaen, Eber
and Heath in a seminal paper in 1999.
Definition 5.1. Let X be a linear subspace of all real-valued random variables on a measurable
space (Ω, F), containing the constants. A map ρ : X → R is called a monetary measure of
risk if it has the following properties:
• Normalisation: ρ(0) = 0.
Let us briefly comment on the above properties: The number ρ(X) is the amount in capital
(or cash) which has to be added to the position X to make X acceptable, e.g. from the point
of view of a regulator. This explains normalisation and cash-invariance, which is also called
translation invariance. The financial meaning of monotonicity is that the downside risk of a
position is decreased if the payoff is increased.
We proceed to show that the variance – even transformed in an appropriate way – is not a
good measure of risk because it fails to be monotone, unless we restrict ourselves to the case
of normal random variables.
Example 5.2. Let (Ω, F, P ) be a probability space and X the set of all real-valued random
variables having finite second moment, i.e., X ∈ X if and only if $E[X^2] < \infty$. Moreover, let
XN be a linear subspace of X such that all X ∈ XN are normally distributed, where we agree
that a normal distribution with variance zero and mean µ ∈ R is the Dirac distribution δµ .
Define the map ρ : X → R by
\[ \rho(X) = \sqrt{\mathrm{Var}[X]} - E[X]. \]
Then ρ is normalised. It is also cash-invariant on X (and on XN ). Indeed, let X ∈ X and
m ∈ R. Then
\[ \rho(X + m) = \sqrt{\mathrm{Var}[X + m]} - E[X + m] = \sqrt{\mathrm{Var}[X]} - E[X] - m = \rho(X) - m. \]
Then $X_2 \geq 1 = X_1$.$^{58}$ Using the formula for the mean and the variance of a Pareto distribution, we obtain
\[ E[X_2] = \frac{3}{3 - 1} = \frac{3}{2} \quad \text{and} \quad \mathrm{Var}[X_2] = \frac{3}{(3 - 1)^2 (3 - 2)} = \frac{3}{4}. \]
Thus,
\[ \rho(X_2) = \sqrt{\frac{3}{4}} - \frac{3}{2} = \frac{\sqrt{3} - 3}{2} > -1 = \rho(X_1), \]
and so ρ is not monotone on X .
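The arithmetic above can be checked with a few lines of code. This is a sketch under the assumption, inferred from the displayed formulas, that $X_1 \equiv 1$ and that $X_2$ is Pareto distributed with scale 1 and shape 3; the helper names are hypothetical.

```python
import math


def pareto_mean(shape, scale=1.0):
    """Mean of a Pareto distribution (requires shape > 1)."""
    return shape * scale / (shape - 1)


def pareto_variance(shape, scale=1.0):
    """Variance of a Pareto distribution (requires shape > 2)."""
    return shape * scale ** 2 / ((shape - 1) ** 2 * (shape - 2))


def rho(mean, variance):
    """rho(X) = sqrt(Var[X]) - E[X], expressed through the first two moments."""
    return math.sqrt(variance) - mean


rho_x1 = rho(1.0, 0.0)                                  # X_1 is constant equal to 1
rho_x2 = rho(pareto_mean(3.0), pareto_variance(3.0))    # assumed Pareto(scale 1, shape 3)
violates_monotonicity = rho_x2 > rho_x1                 # although X_2 >= X_1
```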
However, ρ is monotone on XN . Indeed, let X1 , X2 ∈ XN with X1 ≥ X2 . We may assume
without loss of generality both X1 and X2 have nonvanishing variances $\sigma_1^2 > 0$ and $\sigma_2^2 > 0$.$^{59}$
As X1 ≥ X2 it follows from monotonicity of the integral that µ1 := E [X1 ] ≥ E [X2 ] =: µ2 .
Let Φ be the cdf of a standard normal distribution, i.e.,
\[ \Phi(x) := \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right) du, \quad x \in \mathbb{R}. \]
As X1 ≥ X2 , it follows that
\[ F_{X_1}(x) \leq F_{X_2}(x) \iff \Phi\left( \frac{x - \mu_1}{\sigma_1} \right) \leq \Phi\left( \frac{x - \mu_2}{\sigma_2} \right), \quad x \in \mathbb{R}. \]
58
We may assume without loss of generality that this equation does not hold only P -a.s. but even for all ω.
59
Note that X1 ≥ X2 implies that either both X1 and X2 are Dirac measures (in which case the claim is
trivial) or both X1 and X2 have nonvanishing variance.
As Φ is strictly increasing, it follows that
\[ \frac{x - \mu_1}{\sigma_1} \leq \frac{x - \mu_2}{\sigma_2}. \tag{5.2} \]
Dividing (5.2) by x and letting x → ±∞, we may deduce that $\sigma_1^2 = \sigma_2^2$. Together with µ1 ≥ µ2 ,
we obtain
ρ(X1 ) = σ1 − µ1 ≤ σ2 − µ2 = ρ(X2 ).
5.2 Value at Risk and Expected Shortfall
We proceed to discuss the two most important risk measures used in practice. The first
example is value at risk which was “invented” in 1993 as part of the seminal “G-30 report”
and widely propagated by the “RiskMetrics” of JP Morgan launched in 1994.60
Definition 5.6. Let (Ω, F, P ) be a probability space and X the set of all random variables.
Let α ∈ (0, 1) be a confidence level. For X ∈ X , the Value at Risk of X at level α is given
by61
VaRα (X) = inf{m ∈ R : P [m + X < 0] ≤ α}.
The Value at Risk at level α is the smallest amount of capital which, if added to X, keeps
the probability of a negative outcome below or equal to α. Typical values for α are 0.05, 0.01
or 0.001.
Value at Risk is probably the most widely used risk measure in practice. One can easily
check that it is normalised, monotone, cash-invariant and positively homogeneous.
However, Value at Risk fails to be convex, i.e., it may punish diversification instead of
encouraging it.
Example 5.7. Let X1 and X2 be two independent identically distributed random variables
on some probability space (Ω, F, P ), where
and so
VaR0.01 (Xi ) = −90, i ∈ {1, 2}.
Hence, both X1 and X2 are acceptable and even very good from a VaR0.01 perspective. Now
consider the “diversified position”
1 1
X := X1 + X2 .
2 2
60
However, the use of value at risk (without the name) goes back to the early decades of the 20th century.
61
Note that in part of the literature α is replaced by 1 − α.
Then the distribution of X satisfies
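and we have
\[ P[X + m < 0] = \begin{cases} 1 & \text{if } m \in (-\infty, -90), \\ 0.0199 & \text{if } m \in [-90, 5), \\ 0.0001 & \text{if } m \in [5, 100), \\ 0 & \text{if } m \in [100, \infty). \end{cases} \tag{5.4} \]
Hence,
\[ \mathrm{VaR}_{0.01}(X) = 5, \]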
Let us briefly comment on what goes wrong in Example 5.7. The probability of a loss is
higher for X than for Xi . Indeed,
However, the expected size of a loss when it does happen is much larger for Xi than for X.
Indeed,
\[ E[-X_i \mid -X_i > 0] = \frac{100 \times 0.01}{0.01} = 100, \quad i \in \{1, 2\}, \]
whereas
\[ E[-X \mid -X > 0] = \frac{100 \times 0.0001 + 5 \times 0.0198}{0.0199} \approx 5.477. \]
So even though the probability of a loss is a bit higher for X than for Xi , the expected loss
given default is dramatically lower for X than for Xi . Therefore, from a regulatory point of
view, X is a much better risk than Xi .
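The numbers in Example 5.7 and in the discussion above can be reproduced with exact rational arithmetic. The sketch below assumes, consistently with (5.4) and the stated Value at Risk, that each $X_i$ takes the value 90 with probability 0.99 and −100 with probability 0.01; the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def value_at_risk(dist, alpha):
    """inf{m : P[m + X < 0] <= alpha} for a finite distribution {value: prob}."""
    best = None
    for v in sorted(dist):
        prob_below = sum(p for w, p in dist.items() if w < v)  # P[X < v]
        if prob_below <= alpha:
            best = v  # the largest admissible v gives the smallest m = -v
    return -best


def expected_loss_given_loss(dist):
    """E[-X | -X > 0] for a finite distribution {value: prob}."""
    p_loss = sum(p for v, p in dist.items() if v < 0)
    return sum(-v * p for v, p in dist.items() if v < 0) / p_loss


# Assumed law of X_1 and X_2 (independent, identically distributed).
x_single = {F(90): F(99, 100), F(-100): F(1, 100)}

# Law of the diversified position X = X_1 / 2 + X_2 / 2.
x_diversified = {}
for (a, pa), (b, pb) in product(x_single.items(), repeat=2):
    value = (a + b) / 2
    x_diversified[value] = x_diversified.get(value, F(0)) + pa * pb

alpha = F(1, 100)
var_single = value_at_risk(x_single, alpha)
var_diversified = value_at_risk(x_diversified, alpha)
loss_single = expected_loss_given_loss(x_single)
loss_diversified = expected_loss_given_loss(x_diversified)
```

Fractions are used instead of floats so that the comparison with α = 0.01 at the boundary case P[X_i < 90] = 0.01 is exact.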
For this reason, one might want to look at a more conservative, i.e., larger risk measure
than Value at Risk, which also takes into account the size of the loss given default.
Definition 5.8. Let (Ω, F, P ) be a probability space and X the set of all real-valued random
variables having finite first moments. Let α ∈ (0, 1) be a confidence level. For X ∈ X , the
Expected Shortfall of X at level α is given by
\[ \mathrm{ES}_\alpha(X) = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_u(X) \, du. \]
Since VaRu is nonincreasing in u, it follows that ESα (X) ≥ VaRα (X) for all α and all
X. Moreover, one can show that unlike Value at Risk, Expected Shortfall is a coherent risk
measure, i.e., it is in particular a convex risk measure; see [2, Theorem 4.52]. One can even
show that it is “optimal” in the sense that it is the smallest law-invariant convex risk measure
that is more conservative than Value at Risk, see [2, Theorem 4.67].
We state an alternative characterisation of Expected Shortfall for continuous distributions,
which shows that Expected Shortfall takes care of the size of the loss given default. For a
proof of this result, we refer to [7, Lemma 2.13].
Lemma 5.9. Let X be an integrable random variable on a probability space (Ω, F, P ). Suppose
that the distribution of X is continuous. Then for α ∈ (0, 1),
The following example shows that Expected Shortfall encourages diversification and also
demonstrates that Lemma 5.9 is false without the assumption of a continuous distribution.
Example 5.10. Consider the setup of Example 5.7. Using (5.3), we obtain for i ∈ {1, 2},
\[ \mathrm{VaR}_u(X_i) = \begin{cases} 100 & \text{if } u \in (0, 0.01), \\ -90 & \text{if } u \in [0.01, 1). \end{cases} \]
This gives
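\[ \begin{aligned}
\mathrm{ES}_{0.01}(X) &= \frac{1}{0.01} \int_0^{0.01} \mathrm{VaR}_u(X) \, du = \frac{1}{0.01} \left( \int_0^{0.0001} 100 \, du + \int_{0.0001}^{0.01} 5 \, du \right) \\
&= \frac{1}{0.01} (0.0001 \times 100 + 0.0099 \times 5) = 5.95.
\end{aligned} \]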
So the Expected Shortfall of the “diversified position” X is significantly lower than the Ex-
pected Shortfall of the individual positions Xi .
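For a finite distribution, $u \mapsto \mathrm{VaR}_u(X)$ is a step function, so the integral in Definition 5.8 reduces to a finite sum. The following sketch computes it exactly; it assumes the same laws for $X_i$ and X as inferred from Example 5.7 (they are hard-coded here), and `expected_shortfall` is a hypothetical helper name.

```python
from fractions import Fraction as F


def expected_shortfall(dist, alpha):
    """(1/alpha) * integral_0^alpha VaR_u(X) du for a finite law {value: prob}.

    With values v_1 < ... < v_n and c_k = P[X <= v_k], one has
    VaR_u(X) = -v_k for u in [c_{k-1}, c_k), so the integral is a finite sum.
    """
    total, cum_prev = F(0), F(0)
    for v in sorted(dist):
        cum = cum_prev + dist[v]
        total += -v * (min(cum, alpha) - min(cum_prev, alpha))
        cum_prev = cum
    return total / alpha


# Assumed laws: X_i as in Example 5.7, X = X_1 / 2 + X_2 / 2.
x_single = {F(90): F(99, 100), F(-100): F(1, 100)}
x_diversified = {F(90): F(9801, 10000), F(-5): F(198, 10000), F(-100): F(1, 10000)}

alpha = F(1, 100)
es_single = expected_shortfall(x_single, alpha)
es_diversified = expected_shortfall(x_diversified, alpha)
```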
Moreover, note that
6 Pricing and Hedging in Finite Discrete Time
The goal of this chapter is to describe the pricing and hedging of derivative contracts such as call or
put options in financial markets in finite discrete time.
(1) Y is G-measurable,
The random variable Y in Definition 6.1 is to be interpreted as the best prognosis for
X given the information G. The measurability property (1) ensures that Y only uses the
information given in G, and the averaging property (2) ensures that Y is indeed the best
prognosis.
G := σ(A1 , . . . , AN ),
i.e., G is the smallest σ-algebra containing A1 , . . . , AN . One can check that A ∈ G if and only
if there are indices n1 , . . . , nK ∈ {1, . . . , N } such that $A = \bigcup_{k=1}^{K} A_{n_k}$. One can then show that
a random variable Y : Ω → R is G-measurable if and only if it can be written as
\[ Y = \sum_{n=1}^{N} y_n 1_{A_n}, \tag{6.1} \]
\[ Y := \sum_{n=1}^{N} E[X \mid A_n] 1_{A_n} \tag{6.2} \]
is (a version of) the conditional expectation of X given G, where for A ∈ F, the elementary
conditional expectation of X given A is defined by
\[ E[X \mid A] := \begin{cases} \dfrac{E[X 1_A]}{P[A]} & \text{if } P[A] > 0, \\ 0 & \text{if } P[A] = 0. \end{cases} \tag{6.3} \]
Indeed, it follows from (6.1) that Y satisfies the measurability property (1) of a conditional
expectation. To check that Y also satisfies the averaging property (2), let A ∈ G. Then there
are indices n1 , . . . , nK ∈ {1, . . . , N } such that $A = \bigcup_{k=1}^{K} A_{n_k}$. Note that by definition of Y
and (6.3), for each k ∈ {1, . . . , K},
\[ E\left[ Y 1_{A_{n_k}} \right] = E\left[ E[X \mid A_{n_k}] 1_{A_{n_k}} \right] = E[X \mid A_{n_k}] P[A_{n_k}] = E\left[ X 1_{A_{n_k}} \right]. \]
\[ E[Y 1_A] = \sum_{k=1}^{K} E\left[ Y 1_{A_{n_k}} \right] = \sum_{k=1}^{K} E\left[ X 1_{A_{n_k}} \right] = E[X 1_A]. \]
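Formula (6.2) is easy to evaluate on a finite probability space. The sketch below is illustrative only: the sample space (a fair die), the random variable X and the partition are made-up choices, and the helper names are hypothetical. It builds Y from the elementary conditional expectations (6.3) and checks the averaging property on every union of partition sets.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical example: a fair die, X = number of pips, and a partition of Omega.
omega = range(6)
P = {w: F(1, 6) for w in omega}
X = {w: w + 1 for w in omega}
partition = [{0, 1}, {2, 3, 4}, {5}]


def expectation_on(Z, A):
    """E[Z 1_A] for a random variable Z given as a dict on Omega."""
    return sum(Z[w] * P[w] for w in A)


def elementary_cond_exp(Z, A):
    """E[Z | A] as in (6.3)."""
    p_a = sum(P[w] for w in A)
    return expectation_on(Z, A) / p_a if p_a > 0 else F(0)


# Y = sum_n E[X | A_n] 1_{A_n}, stored state by state.
Y = {w: elementary_cond_exp(X, A) for A in partition for w in A}

# Averaging property: E[Y 1_A] = E[X 1_A] for every union A of partition sets.
averaging_holds = all(
    expectation_on(Y, set().union(*chosen)) == expectation_on(X, set().union(*chosen))
    for k in range(1, len(partition) + 1)
    for chosen in combinations(partition, k)
)
```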
The following result gives existence, uniqueness and further properties of conditional ex-
pectations.62
(d) If X1 and X2 are integrable random variables and a, b ∈ R, then
If in addition P [X1 > X2 ] > 0, then P [E [X1 | G] > E [X2 | G]] > 0.
E [E [X | G] | H] = E [X | H] P -a.s.
E [ZX | G] = ZE [X | G] P -a.s.
by the σ-algebra Ft . As information increases over time, it is natural to assume that
F0 ⊂ F1 ⊂ · · · ⊂ FT , (6.4)
and we call any increasing sequence of σ-algebras (Ft )t∈{0,...,T } satisfying (6.4) a filtration and
(Ω, F, (Ft )t∈{0,...,T } , P ) a filtered probability space. To simplify the presentation, we always
assume that F0 = {∅, Ω} (“trivial information”) and FT = F (“full information”). With regard
to a filtration (Ft )t∈{0,...,T } , there are two important notions for stochastic processes.
• A stochastic process X = (Xt )t∈{0,...,T } is called adapted (to the filtration (Ft )t∈{0,...,T } ),
if each Xt is Ft -measurable.
• A stochastic process Y = (Yt )t∈{1,...,T } is called predictable 64 (for the filtration (Ft )t∈{0,...,T } ),
if each Yt is Ft−1 -measurable.
If a stochastic process X = (Xt )t∈{0,...,T } is adapted, then at each time t, we are given the
information Ft and so can fully observe Xt (and also X0 , . . . , Xt−1 ). By contrast, we may not
be able to fully observe Xt+1 , . . . , XT . If a stochastic process Y = (Yt )t∈{1,...,T } is predictable,
Yt (and also Y1 , . . . , Yt−1 ) can not only be fully observed at time t but already at time t − 1,
i.e., one period ahead. So we can accurately “predict” Yt already at time t − 1.
For an adapted process X = (Xt )t∈{0,...,T } , knowledge about Xt does not give any knowl-
edge about Xt+1 in general. The special case that Xt gives the best available information
about Xt+1 , i.e., Xt = E [Xt+1 | Ft ] P -a.s. leads to the concept of a martingale.
Definition 6.5. Let M = (Mt )t∈{0,...,T } be a real-valued stochastic process on some filtered
probability space (Ω, F, (Ft )t∈{0,...,T } , P ). Then M is called a martingale (with respect to P
and (Ft )t∈{0,...,T } ) if
Remark 6.6. (a) In Definition 6.5, property (1) is referred to as adaptedness, property (2)
is referred to as integrability, and property (3), which is the crucial property, is referred to as
martingale property.
(b) The martingale property (3) in Definition 6.5 is equivalent to the formally weaker
property
E [Mt | Ft−1 ] = Mt−1 P -a.s. for all t ∈ {1, . . . , T }. (6.5)
64
Note that in our definition, predictable processes start at t = 1.
Indeed, if the martingale property (3) in Definition 6.5 is satisfied, then also (6.5) is satisfied
(pick s = t − 1). Conversely, if (6.5) is satisfied, let 0 ≤ s ≤ t ≤ T . If s = t, then
E [Mt | Fs ] = E [Mt | Ft ] = Mt = Ms P -a.s. by adaptedness of M and the pull-out property of
conditional expectations. Otherwise, there is n ∈ {1, . . . , T } such that s = t − n. The tower
property of conditional expectations and (6.5) give
(c) If “=” in property (3) of Definition 6.5 is replaced by “≥”, M is called a submartingale,
and if it is replaced by “≤”, M is called a supermartingale.
Example 6.7. (a) Let (Ω, F, P ) be a probability space and X1 , . . . , XT independent integrable
random variables with mean 0. Also assume that F = σ(X1 , . . . , XT ). Set F0 := {∅, Ω} and
Ft := σ(X1 , . . . , Xt ), t ∈ {1, . . . , T },
i.e., Ft is the smallest σ-algebra for which X1 , . . . , Xt are measurable. Define the process
M = (Mt )t∈{0,...,T } by65
\[ M_t := \sum_{i=1}^{t} X_i. \]
So M is a martingale.
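For a finite example, the martingale property in the form (6.5) can be verified by brute force. The sketch below assumes hypothetical increments $X_i \in \{+1, -1\}$ with probability 1/2 each and T = 3, and computes $E[M_t \mid \mathcal{F}_{t-1}]$ on each atom of $\mathcal{F}_{t-1}$ via the elementary conditional expectation.

```python
from fractions import Fraction as F
from itertools import product

T = 3
step_law = {1: F(1, 2), -1: F(1, 2)}        # assumed law of each X_i (mean zero)
paths = list(product(step_law, repeat=T))   # all (x_1, ..., x_T)


def prob(path):
    """P[{path}] under independence of the increments."""
    result = F(1)
    for step in path:
        result *= step_law[step]
    return result


def M(path, t):
    """M_t = X_1 + ... + X_t, with the empty sum equal to 0."""
    return sum(path[:t])


def cond_exp_M(t, prefix):
    """E[M_t | X_1, ..., X_{t-1} = prefix], computed on the corresponding atom."""
    atom = [w for w in paths if w[:t - 1] == prefix]
    p_atom = sum(prob(w) for w in atom)
    return sum(M(w, t) * prob(w) for w in atom) / p_atom


is_martingale = all(
    cond_exp_M(t, prefix) == sum(prefix)     # sum(prefix) is M_{t-1} on the atom
    for t in range(1, T + 1)
    for prefix in product(step_law, repeat=t - 1)
)
```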
(b) Let (Ω, F, (Ft )t∈{0,...,T } , P ) be a filtered probability space and Z an F-measurable
integrable random variable. Then the process M = (Mt )t∈{0,...,T } defined by
Mt := E [Z | Ft ]
65
By convention, $\sum_{i=1}^{0} X_i := 0$.
is a martingale with MT = Z P -a.s. Indeed, adaptedness and integrability of M follow from
the definition of conditional expectations. The martingale property of M follows from the
tower property of conditional expectations, and MT = Z P -a.s. follows from the pull-out
property of conditional expectations and our standing assumption that FT = F.
\[ S_t^0 := \prod_{k=1}^{t} (1 + r_k), \]
where rk > −1 P -a.s. is Fk−1 -measurable and denotes the interest rate in period k, i.e., from
k − 1 to k. So the process (rt )t∈{1,...,T } is predictable. We also refer to S 0 as bank account.
We set as in Chapter 2,
and call the Rd -valued stochastic process S = (St1 , . . . , Std )t∈{0,...,T } the risky assets.
Example 6.8 (Binomial model). Assume that d = 1, i.e., there is only one risky asset. Let
r > −1 and u > d > −1. Assume that the bank account is given by
\[ S_t^1 = S_0^1 \prod_{i=1}^{t} Y_i, \]
where S01 > 0 and Y1 , . . . , YT are i.i.d. random variables on some suitable probability space
(Ω, F, P ) satisfying
where p1 , p2 > 0 and p1 + p2 = 1. We assume that the filtration (Ft )t∈{0,...,T } is given by
i.e., Ft is the smallest σ-algebra with respect to which S01 , . . . , St1 are measurable. One can
check that Ft = σ(Y1 , . . . , Yt ) for t ∈ {1, . . . , T }, i.e., Ft is also the smallest σ-algebra generated
by Y1 , . . . , Yt .
For small T , e.g. T = 3, the above model can be nicely illustrated by the
following trees, where the numbers beside the branches denote transition probabilities. For
convenience, we assume that S01 = 1.
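S^0 (each step with probability 1):
  1 → 1 + r → (1 + r)^2 → (1 + r)^3
S^1 (at every node, an up-move has probability p1 and a down-move has probability p2):
  t = 0:  1
  t = 1:  1 + u,  1 + d
  t = 2:  (1 + u)^2,  (1 + u)(1 + d),  (1 + d)^2
  t = 3:  (1 + u)^3,  (1 + u)^2 (1 + d),  (1 + u)(1 + d)^2,  (1 + d)^3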
Let us finally describe how to give a rigorous description of the binomial model. This is
for instance important for implementing the binomial model on a computer. For Ω, we take
the path space
\[ \Omega := \{1, 2\}^T = \left\{ \omega = (x_1, \ldots, x_T) : x_1, \ldots, x_T \in \{1, 2\} \right\}, \]
i.e., each ω = (x1 , . . . , xT ) describes one path in the tree corresponding to the model; e.g.
ω = (1, . . . , 1) describes the state of the world that the stock goes up in each period. We set
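$\mathcal{F} := 2^\Omega$, define the random variables Y1 , . . . , YT by
\[ Y_t(\omega) = Y_t((x_1, \ldots, x_T)) = \begin{cases} 1 + u & \text{if } x_t = 1, \\ 1 + d & \text{if } x_t = 2, \end{cases} \]
\[ P[\{\omega\}] := P[\{(x_1, \ldots, x_T)\}] = \prod_{t=1}^{T} p_{x_t}. \]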
Next, we set for t ∈ {1, . . . , T } and x1 , . . . , xt ∈ {1, 2},
\[ A_{(x_1, \ldots, x_t)} := \left\{ \omega = (\tilde x_1, \ldots, \tilde x_T) \in \Omega : \tilde x_1 = x_1, \ldots, \tilde x_t = x_t \right\}. \]
Then A(x1 ,...,xt ) denotes all states of the world with “path up to time t” given by (x1 , . . . , xt ).
It is not difficult to check that
\[ \mathcal{F}_t := \sigma\left( A_{(x_1, \ldots, x_t)} : x_1, \ldots, x_t \in \{1, 2\} \right), \quad t \in \{1, \ldots, T\}, \]
so for a state of the world ω = (x1 , . . . , xT ) ∈ Ω given the information in Ft , we can determine
the values of x1 , . . . , xt but not the values of xt+1 , . . . , xT , i.e., we can say if the stock went
up or down in period 1, . . . , t, but we cannot say if the stock will go up or down in periods
t + 1, . . . , T .
Remark 6.9. Note that for the binomial model, the tree for S 1 is recombining, so that the
number of nodes only grows linearly in time. For non-recombining trees, the number of nodes
grows exponentially in time. This difference is very important from a computational/numerical
perspective.
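The path-space description translates almost literally into code. The following is a minimal sketch with made-up parameter values (`T`, `u`, `d`, `p` are assumptions for illustration); it also shows the recombining effect of Remark 6.9: there are $2^T$ paths but only T + 1 distinct terminal stock prices.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters of the binomial model.
T = 3
u, d = F(1, 10), F(-1, 20)
p = {1: F(3, 5), 2: F(2, 5)}     # p_1 (up) and p_2 (down)
s0 = F(1)                        # S_0^1

# Path space Omega = {1, 2}^T.
omega = list(product((1, 2), repeat=T))


def Y(t, w):
    """Y_t(omega) for t in {1, ..., T}."""
    return 1 + u if w[t - 1] == 1 else 1 + d


def prob(w):
    """P[{omega}] = p_{x_1} * ... * p_{x_T}."""
    result = F(1)
    for x in w:
        result *= p[x]
    return result


def stock(t, w):
    """S_t^1(omega) = S_0^1 * Y_1(omega) * ... * Y_t(omega)."""
    value = s0
    for i in range(1, t + 1):
        value *= Y(i, w)
    return value


def atom(prefix):
    """A_(x_1,...,x_t): all paths that start with the given prefix."""
    return [w for w in omega if w[:len(prefix)] == prefix]


total_probability = sum(prob(w) for w in omega)
terminal_values = {stock(T, w) for w in omega}
```

Storing all paths is only feasible for small T; for larger T one would exploit the recombining structure and work with the T + 1 terminal nodes instead.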
As in Chapter 2, we discount with S 0 (or take S 0 as numéraire) and define the discounted
assets X 0 , . . . , X d by
\[ X_t^i := \frac{S_t^i}{S_t^0}, \quad t \in \{0, \ldots, T\}, \ i \in \{0, \ldots, d\}. \]
Then X 0 ≡ 1, and X = (Xt1 , . . . , Xtd )t∈{0,...,T } expresses the value of the risky assets in units
of the numéraire S 0 .
Definition 6.10. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). A trading strategy is an R1+d -valued stochastic process ϑ =
(ϑ0t , ϑt )t∈{1,...,T } that is predictable.
together, the amount invested at time t − 1 (after trading) is
\[ \vartheta_t \cdot S_{t-1} = \sum_{i=0}^{d} \vartheta_t^i S_{t-1}^i, \]
\[ \vartheta_t \cdot S_t = \sum_{i=0}^{d} \vartheta_t^i S_t^i. \]
i.e., the pre-trading and the post-trading value of the strategy coincide. This leads to the
notion of a self-financing strategy.
Definition 6.11. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). A trading strategy ϑ = (ϑ0t , ϑt )t∈{1,...,T } is called a self-financing
strategy if
ϑt · S t = ϑt+1 · S t P -a.s. for t ∈ {1, . . . , T − 1} (6.6)
The self-financing condition (6.6) is extremely natural from an economic perspective. From
a mathematical perspective, however, it is a rather inconvenient constraint. For this reason,
we seek to find an alternative characterisation of self-financing strategies. It turns out that to
this end, we have to look at discounted quantities.
Definition 6.12. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). For a trading strategy ϑ = (ϑ0t , ϑt )t∈{1,...,T } , define the
\[ G_0(\vartheta) := 0 \quad \text{and} \quad G_t(\vartheta) := \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1}) \quad \text{for } t \in \{1, \ldots, T\}; \]
The name “value process” for V (ϑ) comes from the fact that V (ϑ) denotes the (discounted)
value of the strategy ϑ at time t (after trading for t = 0 and before trading for t ∈ {1, . . . , T }).
To understand the name “gains process” for G(ϑ), first note that for t ∈ {1, . . . , T }, by the
fact that $X_t^0 - X_{t-1}^0 = 1 - 1 = 0$,
expresses the (discounted) gains (or losses) from trading in period t, i.e., from t − 1 to t. So
$G_t(\vartheta) = \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1})$ denotes the (discounted) accumulated gains (or losses) from
Proposition 6.13. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ) and ϑ = (ϑ0t , ϑt )t∈{1,...,T } a trading strategy. Then the following
are equivalent:
(a) ϑ is self-financing.
Proof. Dividing both sides of (6.6) by S 0 shows that (a) is equivalent to (b). Moreover, (b)
is equivalent to
Moreover, the definition of the value and the gains process and the fact that X10 − X00 =
1 − 1 = 0 give
Now assuming (b), summing over (6.8) and then adding (6.9) gives (c), and assuming (c) and
subtracting (c) for t from (c) for t + 1 gives (6.8), which in turn is equivalent to (b).
The equivalence of (a) and (c) in Proposition 6.13 has a very important consequence. Any
pair (V0 , ϑ), where V0 ∈ R and ϑ = (ϑt )t∈{1,...,T } is a Rd -valued predictable process can be
66
Note that the value process V (ϑ) depends on all 1 + d coordinates of ϑ, whereas the gains process G(ϑ)
only depends on the last d coordinates ϑ of ϑ.
identified with the self-financing strategy ϑ = (ϑ0t , ϑt )t∈{1,...,T } whose value process satisfies
and set ϑ := (ϑ0 , ϑ). It follows from the second equality in (6.11) that ϑ0 and hence also ϑ
are predictable. Moreover, by the definition of the value process and (6.11), we obtain
We will make use of the identification (6.10) throughout the rest of the chapter. To this end,
we introduce the shorthand notation
ϑ ≙ (V0 , ϑ).
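The identification of a pair (V0, ϑ) with a self-financing strategy can be illustrated along a single scenario. The sketch below is an assumption-laden illustration: the displays (6.10) and (6.11) are not reproduced here, so it uses the standard construction $\vartheta_t^0 := V_0 + G_{t-1}(\vartheta) - \vartheta_t \cdot X_{t-1}$ for the bank account holdings (with d = 1 and made-up numbers), and then checks the self-financing condition in discounted terms together with $V_t = V_0 + G_t$.

```python
# Hypothetical data along one scenario (d = 1, T = 3), in units of the numeraire.
X = [100, 105, 98, 103]      # discounted risky asset X_0, ..., X_T
theta = [2, -1, 3]           # risky holdings theta_1, ..., theta_T
V0 = 10                      # initial capital
T = len(theta)

# Gains process G_0 = 0, G_t = sum_{k <= t} theta_k * (X_k - X_{k-1}).
gains = [0]
for t in range(1, T + 1):
    gains.append(gains[-1] + theta[t - 1] * (X[t] - X[t - 1]))

# Value process V_t = V_0 + G_t.
value = [V0 + g for g in gains]

# Assumed construction of the bank account holdings (uses only data up to time t - 1).
theta0 = [V0 + gains[t - 1] - theta[t - 1] * X[t - 1] for t in range(1, T + 1)]

# Self-financing in discounted terms: pre-trading and post-trading values at time t agree.
self_financing = all(
    theta0[t - 1] + theta[t - 1] * X[t] == theta0[t] + theta[t] * X[t]
    for t in range(1, T)
)

# The portfolio value before trading at time t equals V_0 + G_t.
value_matches = all(
    theta0[t - 1] + theta[t - 1] * X[t] == value[t] for t in range(1, T + 1)
)
```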
Definition 6.14. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probabil-
ity space (Ω, F, (Ft )t∈{0,...,T } , P ). A self-financing strategy ϑ = (ϑ0t , ϑt )t∈{1,...,T } is called an
arbitrage opportunity if
The financial market S is called arbitrage-free if there are no arbitrage opportunities. In this
case one also says that S satisfies NA.
Remark 6.15. By the same argument as in Remark 2.4, one can show that if the market S
admits arbitrage, there always exists an arbitrage opportunity with ϑ1 · S 0 = 0.
The following result is the multiperiod version of Proposition 2.7. Its proof is left as an
exercise.
Proposition 6.16. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). The following are equivalent:
(b) There does not exist a predictable process ϑ = (ϑ1t , . . . , ϑdt )t∈{1,...,T } such that
Definition 6.17. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). Denote by $X := S/S^0$ the discounted risky assets. A measure
Q on (Ω, F) is called an equivalent martingale measure (EMM) for X if Q ≈ P and each X i
is a Q-martingale, i.e., a martingale under the measure Q.
Theorem 6.18. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). Denote by $X := S/S^0$ the discounted risky assets. Let Q ≈ P
be an equivalent measure. Then the following are equivalent:
(b) For all self-financing strategies ϑ = (ϑ0t , ϑt )t∈{1,...,T } for which ϑ is bounded, the value
process V (ϑ) is a Q-martingale.67
(c) For all self-financing strategies ϑ = (ϑ0t , ϑt )t∈{1,...,T } with VT (ϑ) ≥ 0 Q-a.s., the value
process V (ϑ) is a (nonnegative) Q-martingale.
Proof. We only establish (a) ⇒ (b).68 So let Q be an EMM for X and ϑ = (ϑ0t , ϑt )t∈{1,...,T } a
self-financing strategy for which ϑ is bounded. Proposition 6.13 gives
\[ V_t(\vartheta) = V_0(\vartheta) + G_t(\vartheta) = V_0(\vartheta) + \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1}), \quad t \in \{0, \ldots, T\}. \tag{6.12} \]
As |V0 (ϑ)| is a constant and each |Xki | is Q-integrable by the fact that each X i is a Q-
martingale, it follows that |Vt (ϑ)| and hence also Vt (ϑ) is Q-integrable.
67
Note that (ϑ0t )t∈{1,...,T } might be unbounded.
68
For a proof of the more difficult directions (b) ⇒ (c) and (c) ⇒ (a), we refer to [2, Theorem 5.14].
To check the Q-martingale property of V (ϑ) (in the form of (6.5)), fix t ∈ {1, . . . , T }.
Then by (6.12), the fact that Vt−1 (ϑ) and ϑt are Ft−1 -measurable, linearity and the pull-out
property of conditional expectations, and the Q-martingale property of each X i , we obtain
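\[ \begin{aligned}
E^Q\left[ V_t(\vartheta) \mid \mathcal{F}_{t-1} \right] &= V_{t-1}(\vartheta) + E^Q\left[ \sum_{i=1}^{d} \vartheta_t^i (X_t^i - X_{t-1}^i) \,\Big|\, \mathcal{F}_{t-1} \right] \\
&= V_{t-1}(\vartheta) + \sum_{i=1}^{d} \vartheta_t^i E^Q\left[ X_t^i - X_{t-1}^i \mid \mathcal{F}_{t-1} \right] \\
&= V_{t-1}(\vartheta) + \sum_{i=1}^{d} \vartheta_t^i (X_{t-1}^i - X_{t-1}^i)
\end{aligned} \]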
(b) There exists an EMM for the discounted risky assets X = S/S 0 .
Proof. We only establish the easy direction (b) ⇒ (a). So let Q ≈ P be an EMM and
ϑ = (ϑ1t , . . . , ϑdt )t∈{1,...,T } a predictable process with GT (ϑ) ≥ 0 P -a.s. By Proposition 6.16, it
suffices to show that GT (ϑ) = 0 P -a.s. As Q ≈ P , we have GT (ϑ) ≥ 0 Q-a.s. and it suffices
to show that GT (ϑ) = 0 Q-a.s. Set ϑ ≙ (0, ϑ). Then V0 (ϑ) = 0 and VT (ϑ) = 0 + GT (ϑ) ≥ 0
Q-a.s. Theorem 6.18 gives that V (ϑ) is a Q-martingale, which implies in particular that
\[ Q[\{(x_1, \ldots, x_T)\}] = q_{x_1} \, q_{x_2 | x_1} \, q_{x_3 | (x_1, x_2)} \times \cdots \times q_{x_T | (x_1, \ldots, x_{T-1})} \]
where $q_{x_1} = Q[Y_1 = y_{x_1}]$ and
By Example 2.9, Q ≈ P if and only if $q_{x_1}, \ldots, q_{x_T | (x_1, \ldots, x_{T-1})} > 0$ for all x1 , . . . , xT ∈ {1, 2}.
Moreover, if Q ≈ P , then by the equivalent characterisation (6.5) of the martingale property,
Q is an EMM for X 1 if and only if69
\[ E^Q\left[ X_t^1 \mid \mathcal{F}_{t-1} \right] = X_{t-1}^1 \quad Q\text{-a.s.}, \quad t \in \{1, \ldots, T\}. \]
x1 ,...,xt−1 ∈{1,2}
\[ q_1 (1 + u) + q_2 (1 + d) = 1 + r, \]
\[ (1 + u) \, q_{1 | (x_1, \ldots, x_{t-1})} + (1 + d) \, q_{2 | (x_1, \ldots, x_{t-1})} = 1 + r, \quad x_1, \ldots, x_{t-1} \in \{1, 2\}, \ t \in \{2, \ldots, T\}. \]
Using that $q_1 + q_2 = 1$ and $q_{1 | (x_1, \ldots, x_{t-1})} + q_{2 | (x_1, \ldots, x_{t-1})} = 1$, it follows that $X^1$ is a Q-martingale
69
Note that Q integrability of X 1 is trivially satisfied as X 1 is bounded.
if and only if
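\[ q_1 = \frac{r - d}{u - d} \quad \text{and} \quad q_2 = \frac{u - r}{u - d}, \]
\[ q_{1 | (x_1, \ldots, x_{t-1})} = \frac{r - d}{u - d} \quad \text{and} \quad q_{2 | (x_1, \ldots, x_{t-1})} = \frac{u - r}{u - d}, \quad x_1, \ldots, x_{t-1} \in \{1, 2\}, \ t \in \{2, \ldots, T\}. \]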
Note that q1|(x1 ,...,xt−1 ) and q2|(x1 ,...,xt−1 ) do not depend on x1 , . . . , xt−1 . This implies that the
Yk are also independent under Q. Clearly, q1 , q2 , q1|(x1 ,...,xt−1 ) , q2|(x1 ,...,xt−1 ) > 0 if and only if
d < r < u. So, in conclusion, the multi-period Binomial model is arbitrage-free if and only if
d < r < u, and in this case there exists a unique EMM for X 1 satisfying
\[ Q[\{(x_1, \ldots, x_T)\}] = \prod_{t=1}^{T} q_{x_t}, \]
where $q_1 := \frac{r - d}{u - d}$ and $q_2 := \frac{u - r}{u - d}$.
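The formulas for $q_1$ and $q_2$ can be checked numerically. The sketch below uses made-up parameters (`u`, `d`, `r`, `T`) and a hypothetical helper `emm_weights`; it verifies the one-step martingale equation and, by summing over all paths, that the discounted stock has Q-expectation equal to its initial value.

```python
from fractions import Fraction as F
from itertools import product


def emm_weights(u, d, r):
    """Return {1: q_1, 2: q_2} if d < r < u, and None otherwise (no EMM)."""
    if not d < r < u:
        return None
    q1 = (r - d) / (u - d)
    return {1: q1, 2: 1 - q1}


# Hypothetical parameters with d < r < u.
u, d, r, T = F(1, 10), F(-1, 20), F(1, 50), 3
q = emm_weights(u, d, r)
growth = {1: 1 + u, 2: 1 + d}


def discounted_price(path, t):
    """X_t^1 along a path, with S_0^1 = 1."""
    value = F(1)
    for x in path[:t]:
        value *= growth[x] / (1 + r)
    return value


def q_prob(path):
    """Q[{(x_1, ..., x_T)}] = q_{x_1} * ... * q_{x_T}."""
    result = F(1)
    for x in path:
        result *= q[x]
    return result


one_step = q[1] * (1 + u) + q[2] * (1 + d)
expected_terminal = sum(
    q_prob(w) * discounted_price(w, T) for w in product((1, 2), repeat=T)
)
```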
Definition 6.21. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). A nonnegative FT -measurable random variable C is called a
European contingent claim (with maturity T ). It is called a derivative security if it can be
written as a measurable function of St0 , . . . , Std for t ∈ {0, . . . , T }.
Example 6.22. (a) The owner of a European call option on asset i ∈ {1, . . . , d} with strike
K > 0 and maturity T has the right but not the obligation to buy asset i at time T for
price K. Of course, any rational person will exercise (i.e. make use of) the right if and only
if STi (ω) > K, and in this case the net payoff of the option is STi (ω) − K. Otherwise, i.e.,
if STi (ω) ≤ K, the option is worthless. So the value of the option at time T is given by the
contingent claim
\[ C = (S_T^i - K)^+, \]
course, any rational person will exercise (i.e. make use of) the right if and only if STi (ω) < K,
and in this case the net payoff of the option is K − STi (ω). Otherwise, i.e., if STi (ω) ≥ K, the
option is worthless. So the value of the option at time T is given by the contingent claim
\[ C = (K - S_T^i)^+. \]
(c) Let A denote the event of an extreme weather situation like hail at time T . It is natural
to assume that A is FT -measurable but independent of the market S. A toy example of a
weather derivative is a contract that pays one unit of money at time T if the extreme event
A happens and zero otherwise. The corresponding contingent claim is given by
C = 1A .
Remark 6.23. (a) The (somewhat confusing) qualifier “European” signifies that the contingent
claim may be exercised only at one date, i.e., at maturity. By contrast, so-called American
contingent claims can be exercised at any time up to and including maturity. Whereas in
reality most contingent claims are American, we only consider European ones in the sequel
because the theory for them is somewhat easier. For an excellent treatment of American
contingent claims, we refer to [2, Chapter 7].
(b) The notion of a European contingent claim can be extended to contracts with maturity
t < T . A contingent claim C with maturity t < T is just a nonnegative Ft -measurable random variable.
We proceed to study the question how we can assign to a contingent claim C a value at
times t < T , in particular at t = 0. Assuming that the underlying market S satisfies NA, we
want to do this in such a way that we do not create any new arbitrage opportunities.
The following example illustrates that this is not as straightforward as one might think, and
naively approaching the problem does not work.
Example 6.24. Consider the one-period model S = (St0 , St1 )t∈{0,1} described by the following
trees, where the numbers beside the branches denote probabilities.
S^0: 1 → 1 (with probability 1)
S^1: 100 → 110 (probability 0.98), 100 (probability 0.01), 90 (probability 0.01)
More precisely, set Ω := {ω1 , ω2 , ω3 }, $\mathcal{F} = \mathcal{F}_1 = 2^\Omega$, $S_1^1(\omega_1) = 110$, $S_1^1(\omega_2) = 100$, $S_1^1(\omega_3) = 90$
and P [{ω1 }] = 0.98, P [{ω2 }] = 0.01, and P [{ω3 }] = 0.01. Note that the interest rate is zero
so that $S^1 = X^1$. It is not difficult to check that the market S satisfies NA. Now consider a
call option on S 1 with strike K = 100 and maturity 1, i.e., the contingent claim
\[ C = (S_1^1 - 100)^+ \]
If we agree that $v^C \in \mathbb{R}$ is a fair value at time 0, we can represent this “new asset” $S^2$ by the
tree
S^2: v^C → 10 (probability 0.98), 0 (probability 0.01), 0 (probability 0.01)
We proceed to describe in a systematic way how to find fair, i.e., arbitrage-free values for
a contingent claim. We first consider the ideal case.
Definition 6.25. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ). A contingent claim C is called attainable or replicable if there
exists a self-financing trading strategy ϑ such that
ϑT · S T = C P -a.s.
and in this case, we say that the discounted contingent claim H is attainable and call ϑ a
replication strategy for H.
The following result shows that for arbitrage-free markets, the value process of an at-
tainable contingent claim is unique and can be easily computed if one knows at least one
EMM.
Theorem 6.26. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ) and H an attainable discounted contingent claim. Assume that
S satisfies NA and denote by P the set of all EMMs for X = S/S 0 . Then
(c) There exists a P -a.s. unique adapted process (VtH )t∈{0,...,T } with VTH = H P -a.s. such
that the extended (1+d+1)-dimensional market (S, S 0 V H ) = (St0 , St1 , . . . , Std , St0 VtH )t∈{0,...,T }
satisfies NA. It is given by
(d) If ϑ = (ϑ0t , ϑt )t∈{1,...,T } is any replication strategy for H, then VtH = Vt (ϑ) P -a.s., for
t ∈ {0, . . . , T }.
Proof. As H is attainable, there exists a self-financing strategy ϑ = (ϑ0t , ϑt )t∈{1,...,T } such that
H = VT (ϑ) P -a.s.
Let Q ∈ P and note that P ≠ ∅ by the fundamental theorem of asset pricing (Theorem 6.19).
As H = VT (ϑ) ≥ 0 P -a.s. and hence also Q-a.s., it follows from Theorem 6.18 that V (ϑ) is a
Q-martingale. This implies in particular that H = VT (ϑ) is Q-integrable, and so we have (a).
Next, fix Q1 ∈ P and define the process (VtH )t∈{0,...,T } by
Then the Q1 -martingale property of V (ϑ), the fact that H = VT (ϑ) and the fact that Q1 ≈ P
imply that
On the other hand V H is a Q-martingale for any Q ∈ P by (6.15) and Example 6.7(b). So we
have (c).
Theorem 6.26 completely answers all questions concerning the valuation of a (discounted)
contingent claim H provided that it is attainable. However, it does not give any criterion to
determine whether a contingent claim is attainable or not, nor does it provide any guidance
concerning the valuation of non-attainable contingent claims. Not surprisingly, this is more
difficult.
The next result provides a necessary and sufficient criterion to decide whether a contingent
claim is attainable or not. Moreover, it gives a full description of all arbitrage-free prices at
time 0 for a non-attainable contingent claim.71 For a proof, we refer to [2, Theorems 5.29 and
5.32].
71
One can also give a full characterisation of all arbitrage-free values at intermediate times t ∈ {1, . . . , T − 1}
for a non-attainable contingent claim. However, this is rather complicated.
Theorem 6.27. Let S = (St0 , St )t∈{0,...,T } be a financial market on some filtered probability
space (Ω, F, (Ft )t∈{0,...,T } , P ) and H a discounted contingent claim. Assume that S satisfies
NA and denote by P the set of all EMMs for X = S/S 0 . Then the set of arbitrage-free prices
for H is non-empty and given by
Moreover:
(a) H is attainable if and only if Π(H) consists of a single element. In this case,
\[ \Pi(H) = \{ V_0^H \}, \]
(b) H is not attainable if and only if Π(H) consists of more than one element. In this case,
\[ \Pi(H) = \big( \pi_{\inf}(H), \pi_{\sup}(H) \big), \]
\[ \pi_{\inf}(H) := \inf_{Q \in \mathcal{P}} E^Q[H] \in [0, \infty) \quad \text{and} \quad \pi_{\sup}(H) := \sup_{Q \in \mathcal{P}} E^Q[H] \in (0, \infty]. \]
Example 6.28. Consider the setup of Example 6.24. If we describe an equivalent measure
$Q \approx P$ by the probability vector $(q_1, q_2, q_3)$, where $q_i = Q[\{\omega_i\}]$, $i \in \{1, 2, 3\}$, one can check
that the set $\mathcal{P}$ of equivalent martingale measures for $X^1$ is given by
So by Theorem 6.27 and the fact that H is bounded, the set of arbitrage-free prices for H = C
(because S11 = 1) is given by
\[ \Pi(H) = \big\{ E^{Q_\lambda}[H] : \lambda \in (0, 1) \big\} = \big\{ \tfrac{1}{2} \lambda \times 10 : \lambda \in (0, 1) \big\} = (0, 5). \]
As Π(H) contains more than one element, it follows that H is not attainable and any number
in (0, 5) is a fair value for v C .
Let us summarise the steps to value a (discounted) contingent claim H in an arbitrage-free
financial market S in finite discrete time.
(3a) If $Q \mapsto E^Q[H]$ is constant, then H is attainable, and its unique arbitrage-free value
process is given by
\[ V^H_t = E^Q[H \mid \mathcal{F}_t], \quad t \in \{0, \dots, T\}, \]
for any $Q \in \mathcal{P}$.
(3b) If $Q \mapsto E^Q[H]$ is not constant, then H is not attainable and $E^Q[H]$ is a fair price at
time 0 for all $Q \in \mathcal{P}$ with $E^Q[H] < \infty$.
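As an illustration of steps (3a) and (3b), the following Python sketch works through a hypothetical one-period model with three states. It is not the model of Example 6.24: the bank account is assumed to be constant, the discounted risky asset is assumed to start at 10 and to take the values 5, 10 and 20, and the names `emm`, `expectation`, `price_bounds` and `is_attainable` are made up for this example. In this toy model the EMMs form the one-parameter family $Q_\lambda = (2\lambda/3, 1 - \lambda, \lambda/3)$, $\lambda \in (0, 1)$.

```python
from fractions import Fraction

# Hypothetical one-period market (an assumption for illustration only):
# S^0 = 1, discounted risky asset X_0 = 10, X_1 in {5, 10, 20}.
X0 = Fraction(10)
X1 = [Fraction(5), Fraction(10), Fraction(20)]


def emm(lam):
    """Q_lam = (2*lam/3, 1 - lam, lam/3).

    For lam in (0, 1) these are exactly the strictly positive solutions of
    q1 + q2 + q3 = 1 and 5*q1 + 10*q2 + 20*q3 = 10, i.e. the EMMs.
    For lam = 0 or lam = 1 we only get boundary points (not equivalent to P).
    """
    lam = Fraction(lam)
    return [2 * lam / 3, 1 - lam, lam / 3]


def expectation(q, payoff):
    """E^Q[H] on a finite state space."""
    return sum(qi * hi for qi, hi in zip(q, payoff))


def price_bounds(payoff):
    """Return (pi_inf, pi_sup) for the discounted claim H = payoff.

    lam -> E^{Q_lam}[H] is affine, so the infimum and supremum over
    lam in (0, 1) are the limits at the endpoints lam = 0 and lam = 1.
    """
    ends = [expectation(emm(0), payoff), expectation(emm(1), payoff)]
    return min(ends), max(ends)


def is_attainable(payoff):
    """Step (3a)/(3b): H is attainable iff Q -> E^Q[H] is constant."""
    lo, hi = price_bounds(payoff)
    return lo == hi


call = [max(x - 10, 0) for x in X1]  # H = (X_1 - 10)^+ = (0, 0, 10)

print(price_bounds(call), is_attainable(call))  # open interval of fair prices
print(price_bounds(X1), is_attainable(X1))      # buy-and-hold claim H = X_1
```

In this sketch the call is not attainable and every number strictly between the two bounds is an arbitrage-free price, whereas the claim $H = X_1$ has the same expectation under every $Q_\lambda$ and is therefore attainable.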
In case (3b), we are faced with a genuine problem: There is no unique fair price. So unlike in
the complete case, we have to take preferences of the investor into account. How exactly to do
this is an ongoing debate in the literature, and there is no easy answer to this question.
Warning: In large parts of the literature, in particular, in credit risk and in more applied
and computational settings, one often just fixes one “nice” equivalent martingale measure Q
(often referred to as the risk-neutral measure) and calls
\[ V^{H,Q}_t := E^Q[H \mid \mathcal{F}_t] \]
the “risk-neutral price” of H at time t. However, if H is not attainable, $V^{H,Q}_t$ crucially depends
on Q, so that one should at least think very carefully which Q ∈ P one chooses. Otherwise,
there might not be much economic warrant for the resulting “prices”. The key problem is that
unlike in the complete case, those “prices” are not linked to any hedging strategy, and so if one
sells a derivative product for a certain “price”, it is unclear how to hedge the risk involved with
selling the product.
The next result shows that there is a very simple criterion in terms of EMMs to decide
whether an arbitrage-free financial market is complete or incomplete.
Theorem 6.30. Let $S = (S^0_t, S_t)_{t \in \{0,\dots,T\}}$ be a financial market on some filtered probability
space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0,\dots,T\}}, P)$. Assume that S satisfies NA. Then S is complete if and only if
there exists a unique EMM for $X = S/S^0$.
Proof. Denote by P the set of all EMMs for X. As S satisfies NA, it is nonempty by the
FTAP (Theorem 6.19).
First, if P only contains one element Q, Theorem 6.27 implies that for each discounted
contingent claim H,
\[ \Pi(H) = \{E^Q[H]\}, \]
Theorem 6.30 is sometimes called the second fundamental theorem of asset pricing. Together
with the first fundamental theorem of asset pricing (Theorem 6.19) it gives a very
beautiful and conclusive description of financial markets in finite discrete time:
• Existence of an EMM is equivalent to the market being arbitrage-free.
• Uniqueness of an EMM is equivalent to the market being complete (and arbitrage-free).
Both results can be extended to continuous or infinite discrete time. However, the precise
formulations become more subtle and the proofs far more difficult.
Remark 6.31. One can show that if a financial market S is complete, then necessarily $\mathcal{F} = \mathcal{F}_T$
is finite. More precisely, it may have at most $(1 + d)^T$ atoms; see [2, Theorem 5.37]. This
shows that even though it makes things nice and simple, completeness is a very restrictive
assumption.
As an application, we establish the so-called put-call parity for complete markets.^72
Theorem 6.32. Let $S = (S^0_t, S_t)_{t \in \{0,\dots,T\}}$ be a complete and arbitrage-free financial market on
some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0,\dots,T\}}, P)$. Assume that $S^0_t = (1 + r)^t$ for some
constant interest rate $r > -1$, $t \in \{0, \dots, T\}$. Fix $i \in \{1, \dots, d\}$ and $K > 0$. Denote by $V^{H^{\mathrm{call}}}$
the (discounted) value process corresponding to the (undiscounted) call option
\[ C^{\mathrm{call}} = (S^i_T - K)^+ \]
on asset i with (undiscounted) strike K and maturity T and by $V^{H^{\mathrm{put}}}$ the (discounted) value
process corresponding to the (undiscounted) put option
\[ C^{\mathrm{put}} = (K - S^i_T)^+ \]
on asset i with (undiscounted) strike K and maturity T. Then we have put-call parity:
\[ V^{H^{\mathrm{call}}}_t - V^{H^{\mathrm{put}}}_t = X^i_t - \frac{K}{(1 + r)^T} \quad P\text{-a.s.}, \quad t \in \{0, \dots, T\}. \]
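Proof. The (discounted) contingent claims corresponding to the put and the call option are
given by
\[ H^{\mathrm{call}} = \frac{(S^i_T - K)^+}{S^0_T} = \Big( X^i_T - \frac{K}{(1 + r)^T} \Big)^+, \]
\[ H^{\mathrm{put}} = \frac{(K - S^i_T)^+}{S^0_T} = \Big( \frac{K}{(1 + r)^T} - X^i_T \Big)^+. \]
Note that
\[ H^{\mathrm{call}} - H^{\mathrm{put}} = X^i_T - \frac{K}{(1 + r)^T}. \tag{6.16} \]
Since S is arbitrage-free and complete, both the call and the put option are attainable, and there
exists a unique EMM Q for the discounted risky assets $X = S/S^0$ by Theorems 6.19 and
6.30. Moreover, by Theorem 6.26(c),
\[ V^{H^{\mathrm{call}}}_t = E^Q\big[ H^{\mathrm{call}} \,\big|\, \mathcal{F}_t \big] \ P\text{-a.s.} \quad \text{and} \quad V^{H^{\mathrm{put}}}_t = E^Q\big[ H^{\mathrm{put}} \,\big|\, \mathcal{F}_t \big] \ P\text{-a.s.} \]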
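The remaining step is not shown here. A sketch of how the argument is usually finished: subtract the two conditional expectations, use linearity, insert (6.16), and apply the Q-martingale property of $X^i$ (the constant $K/(1+r)^T$ is unaffected by conditioning).

```latex
\begin{align*}
V^{H^{\mathrm{call}}}_t - V^{H^{\mathrm{put}}}_t
  &= E^Q\big[ H^{\mathrm{call}} - H^{\mathrm{put}} \,\big|\, \mathcal{F}_t \big] \\
  &= E^Q\Big[ X^i_T - \frac{K}{(1 + r)^T} \,\Big|\, \mathcal{F}_t \Big] \\
  &= X^i_t - \frac{K}{(1 + r)^T} \quad P\text{-a.s.}, \quad t \in \{0, \dots, T\}.
\end{align*}
```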
6.26, it admits a unique arbitrage-free price process $(V^H_t)_{t \in \{0,\dots,T\}}$ given by
\[ V^H_t := E^Q[H \mid \mathcal{F}_t], \quad t \in \{0, \dots, T\}. \]
By the Q-martingale property of $V^H$, we can calculate the latter by the recursive algorithm
\[ V^H_T := H \quad \text{and} \quad V^H_{t-1} := E^Q\big[ V^H_t \,\big|\, \mathcal{F}_{t-1} \big], \quad t \in \{1, \dots, T\}. \]
Using that $\mathcal{F}_t = \sigma(A_{(x_1,\dots,x_t)} : x_1, \dots, x_t \in \{1, 2\})$ for $t \in \{1, \dots, T\}$, it follows from Examples
6.2 and 6.20 that
\[ V^H_t = \sum_{x_1,\dots,x_t \in \{1,2\}} v^H_t(x_1, \dots, x_t) \, 1_{A_{(x_1,\dots,x_t)}}, \quad t \in \{1, \dots, T\}, \]
\[ V^H_0 = v^H_0, \]
where the functions $v^H_t : \{1, 2\}^t \to [0, \infty)$, $t \in \{1, \dots, T\}$, and the number $v^H_0$ can be calculated
recursively by
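If $\vartheta^H \mathrel{\widehat{=}} (V_0(\vartheta^H), \vartheta^{H,1})$ is a replication strategy for H, it follows from Theorem 6.26(d) that
\[ V_t(\vartheta^H) = V^H_t, \quad t \in \{0, \dots, T\}. \]
Consequently,
\[ V^H_t - V^H_{t-1} = V_t(\vartheta^H) - V_{t-1}(\vartheta^H) = G_t(\vartheta^H) - G_{t-1}(\vartheta^H) = \vartheta^{H,1}_t (X^1_t - X^1_{t-1}), \quad t \in \{1, \dots, T\}. \]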
Rearranging yields
\[ \vartheta^{H,1}_t = \frac{V^H_t - V^H_{t-1}}{X^1_t - X^1_{t-1}} =: \frac{\Delta V^H_t}{\Delta X^1_t}, \quad t \in \{1, \dots, T\}. \tag{6.17} \]
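As $\vartheta^{H,1}_t$ is $\mathcal{F}_{t-1}$-measurable and $\mathcal{F}_{t-1} = \sigma(A_{(x_1,\dots,x_{t-1})} : x_1, \dots, x_{t-1} \in \{1, 2\})$, it follows that
\[ \vartheta^{H,1}_t = \sum_{x_1,\dots,x_{t-1} \in \{1,2\}} \zeta^H_t(x_1, \dots, x_{t-1}) \, 1_{A_{(x_1,\dots,x_{t-1})}}, \quad t \in \{2, \dots, T\}, \]
\[ \vartheta^{H,1}_1 = \zeta^H_1, \]
where the functions $\zeta^H_t : \{1, 2\}^{t-1} \to \mathbb{R}$, $t \in \{2, \dots, T\}$, and the number $\zeta^H_1$ in $\mathbb{R}$ can be
calculated as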
Remark 6.33. (a) Formula (6.17) can be seen as the discrete-time version of the “Delta-
Hedge”, i.e., the derivative of the value process with respect to the underlying.
(b) Note that the value process $V^H$ and the (risky part of the) hedging strategy $\vartheta^{H,1}$ can
be calculated in parallel while working backwards through the tree for $V^H$.
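The backward recursion for $V^H$ and formula (6.17) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the notes' own algorithm: it assumes a binomial model with $S^0_t = (1+r)^t$, up and down factors $y_1 = 1+u$ and $y_2 = 1+d$ with $d < r < u$, and the standard risk-neutral up-probability $(r-d)/(u-d)$. The function names `discounted_stock` and `binomial_tree` and all parameter values are hypothetical. Paths are tuples $(x_1, \dots, x_t)$ with entries in $\{1, 2\}$, matching the atoms $A_{(x_1,\dots,x_t)}$.

```python
from fractions import Fraction
from itertools import product


def discounted_stock(path, s0, u, d, r):
    """X^1_t along the path (x_1, ..., x_t); x_k = 1 is 'up', x_k = 2 is 'down'."""
    s0, u, d, r = (Fraction(z) for z in (s0, u, d, r))
    price = s0
    for step in path:
        price *= (1 + (u if step == 1 else d)) / (1 + r)
    return price


def binomial_tree(payoff, s0, u, d, r, T):
    """Return (v, zeta): v[t][path] is V^H_t and zeta[t][path] is the hedge
    held over period t, both on the atom given by `path`.

    `payoff` maps the undiscounted terminal stock price to the
    undiscounted payoff C, so that H = C / S^0_T.
    """
    s0, u, d, r = (Fraction(z) for z in (s0, u, d, r))
    if not d < r < u:
        raise ValueError("need d < r < u for an arbitrage-free model")
    q_up = (r - d) / (u - d)  # assumed Q-probability of an up move
    q_down = 1 - q_up
    growth = (1 + r) ** T     # S^0_T

    # Terminal condition V^H_T = H.
    v = {T: {p: Fraction(payoff(discounted_stock(p, s0, u, d, r) * growth)) / growth
             for p in product((1, 2), repeat=T)}}
    zeta = {}
    for t in range(T, 0, -1):
        v[t - 1], zeta[t] = {}, {}
        for p in product((1, 2), repeat=t - 1):
            up, down = p + (1,), p + (2,)
            # V^H_{t-1} = E^Q[V^H_t | F_{t-1}] on the atom A_p.
            v[t - 1][p] = q_up * v[t][up] + q_down * v[t][down]
            # Formula (6.17), evaluated on the up branch.
            dx = discounted_stock(up, s0, u, d, r) - discounted_stock(p, s0, u, d, r)
            zeta[t][p] = (v[t][up] - v[t - 1][p]) / dx
    return v, zeta


# Hypothetical parameters: s0 = 100, u = 10%, d = -10%, r = 2%, T = 3.
K = 100
params = (100, Fraction(1, 10), Fraction(-1, 10), Fraction(1, 50), 3)
v_call, zeta_call = binomial_tree(lambda s: max(s - K, 0), *params)
v_put, zeta_put = binomial_tree(lambda s: max(K - s, 0), *params)
print(v_call[0][()], zeta_call[1][()])
```

Exact rational arithmetic is used so that identities such as put-call parity (Theorem 6.32) and the replication property $V_0^H + \sum_t \vartheta^{H,1}_t \Delta X^1_t = H$ can be checked with equality rather than up to rounding.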
^73 Recall that $y_1 = (1 + u)$ and $y_2 = (1 + d)$.
References
[1] D. Bertsekas, Nonlinear programming, second ed., Athena Scientific, Belmont, MA, 1999.
[2] H. Föllmer and A. Schied, Stochastic Finance, 4th ext. ed., de Gruyter Studies in Mathematics,
vol. 27, Walter de Gruyter & Co., Berlin, 2016.
[3] H.-O. Georgii, Stochastics, De Gruyter Textbook, Walter de Gruyter & Co., Berlin, 2008.
[4] J. C. Hull, Options, futures, and other derivatives, eighth ed., Pearson, Boston, MA, 2012.
[5] J. Jacod and P. Protter, Probability essentials, second ed., Universitext, Springer-Verlag,
Berlin, 2003.
[6] S. Le Roy and J. Werner, Principles of Financial Economics, second ed., Cambridge University
Press, New York, NY, 2014.
[7] A. McNeil, R. Frey, and P. Embrechts, Quantitative risk management, second revised ed.,
Princeton Series in Finance, Princeton University Press, Princeton, NJ, 2015.