ST339 - 19-03 Notes

ST339 Introduction to Mathematical Finance
Lecture notes by Martin Herdegen¹

Department of Statistics
University of Warwick
This version: March 24, 2020
¹ I would like to thank the students and teaching assistants Shiyao Bian, Nikolaos Constantinou, Chester
Gan, Alia Hajji, Scott Hamilton, Kairav Hirani, Nazem Khan, Kevin Lam, Rahul Mathur, Noah Prasad,
Anthony Shaffu, Osian Shelley, Haodong Sun, Anastasiya Tsyhanova, and Ben Windsor for spotting typos in
previous versions. Moreover, I am grateful to my colleague Sebastian Herrmann for fruitful discussions on the
material. Any remaining mistakes and errors are of course mine.
Contents
1 Introduction and Preliminaries
    1.1 What is Mathematical Finance?
    1.2 A primer on financial assets
    1.3 Fundamental concepts of Probability Theory
2 No-Arbitrage and the Fundamental Theorem of Asset Pricing
    2.1 A mathematical model for a financial market in one period
    2.2 Trading strategies and arbitrage opportunities
    2.3 Discounting
    2.4 Equivalent Martingale Measures
    2.5 The Fundamental Theorem of Asset Pricing
3 Mean-Variance Portfolio Selection and the CAPM
    3.1 The return of an asset and of a portfolio
    3.2 Maximising the expected return
    3.3 The mean-variance problems
    3.4 Portfolios in fractions of wealth
    3.5 The case without a riskless asset
    3.6 The case with a riskless asset
    3.7 The Markowitz tangency portfolio and the capital market line
    3.8 On mean-variance equilibria
    3.9 The Capital Asset Pricing Model (CAPM)
    3.10 Criticism of mean-variance portfolio selection and the CAPM
4 Utility Theory
    4.1 Measure theoretic preliminaries
    4.2 Preferences on lotteries
    4.3 Von Neumann-Morgenstern representation
    4.4 Concave functions and Jensen’s inequality
    4.5 Expected utility representation
    4.6 Measuring risk aversion
    4.7 A primer on utility maximisation
5 Introduction to Risk Measures
    5.1 Monetary measures of risk
    5.2 Value at Risk and Expected Shortfall
6 Pricing and Hedging in Finite Discrete Time
    6.1 Conditional expectations
    6.2 Filtrations and martingales
    6.3 Financial markets in finite discrete time
    6.4 Self-financing strategies
    6.5 The Fundamental Theorem of Asset Pricing revisited
    6.6 Valuation of contingent claims
    6.7 Complete markets
    6.8 Pricing and hedging in the binomial model
1 Introduction and Preliminaries
In this chapter, we briefly discuss what Mathematical Finance is about as a subject, give a
short introduction to financial assets, and then review some key concepts from Probability
Theory.
1.1 What is Mathematical Finance?

Broadly speaking, Mathematical Finance as a discipline studies financial markets using meth-
ods from probability theory and statistics. One main goal is to understand how financial
assets such as stocks, bonds, and options are correctly priced taking into account
• uncertainty about the future or risk,
• the passage of time,
• other investment opportunities.
Another main goal is to understand how to invest optimally in a financial market (e.g. stocks
traded on the London Stock Exchange) taking into account
• uncertainty about the future or risk,
• the passage of time,
• preferences of investors.
This module aims to give basic answers to the above questions. Along the way, we also need
to make concepts like risk or preferences of investors mathematically precise.
This module focusses on key concepts and the probability side of Mathematical Finance.
This means that we assume throughout that all stochastic parameters (like the distribution
of assets) are known. In reality, of course, all parameters must be estimated from data. This
is a very interesting and challenging topic of its own and usually referred to as Financial
Econometrics. Moreover, we will not address computational aspects of Mathematical Finance,
which are crucial for the financial industry. This is again a very interesting and challenging
topic of its own and usually referred to as Computational Finance. Finally, we will not address
any questions related to machine learning, which is of course becoming increasingly important
in the financial industry.
1.2 A primer on financial assets

Financial assets, also called securities, can be grouped in various different ways, and the notion
of a financial asset itself is fuzzy.
One main distinction between financial assets is whether they are equity or debt. An equity
security entitles the holder to a share of the profits of a business, usually paid out as dividends.
The prime examples are stocks, which are often traded at large exchanges such as the London
Stock Exchange (LSE) or the New York Stock Exchange (NYSE). By contrast, a debt security
entitles the holder to a fixed, predefined payment stream from a counterparty. Important
examples are corporate or government bonds. They are often also traded at large exchanges
like the LSE or NYSE. Usually equity is considered more “risky” than debt, because the dividends
of a business are uncertain whereas the payments of a bond are fixed in advance. Notwithstanding,
bonds are not “riskless” because businesses (and also countries) may default.
Another main distinction between financial assets is whether they are primary or derivative.
The payment structure of a derivative security (or just derivative) depends on some
other, more basic, underlying variable or financial asset, whereas the payment structure of a
primary asset does not. The prime examples of primary assets are stocks and bonds. The
most important examples of derivative assets are options, futures and swaps.² They are either
traded at large exchanges or over the counter (OTC). Derivatives can either reduce or magnify
risk, depending on how they are used.
If you buy a financial asset, you pay a certain amount of money today to get the asset. You
are then said to have a long position/to be long in the asset. By contrast, if you (short-)sell a
financial asset that you do not own, you get some money today and have to deliver the asset in
the future. You are then said to have a short position/to be short in the asset. More complex
portfolios, held e.g. by hedge funds, often involve a mixture of both long and short positions.
1.3 Fundamental concepts of Probability Theory

Probability Theory is a cornerstone of Mathematical Finance. Here, we review some of its key
concepts. They are fundamental for the understanding of this module. For more details, we
refer to the textbook “Probability Essentials” by Jacod and Protter [5].
Probability spaces
A sample space Ω is a (finite or infinite) set. Each ω ∈ Ω describes a possible “state of the
world”.
Given a sample space Ω, a σ-algebra F on Ω is a collection of subsets of Ω such that
(a) Ω ∈ F;
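(b) $A \in \mathcal{F} \implies A^c = \Omega \setminus A \in \mathcal{F}$;
(c) $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$.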
The pair (Ω, F) is called a measurable space and each A ∈ F is called an F-measurable event.
² For an excellent introduction to derivative securities, we refer to the textbook by Hull [4].
Example 1.1. (a) If $\Omega$ is finite (or countable), i.e., $\Omega = \{\omega_1, \ldots, \omega_N\}$ (or $\Omega = \{\omega_1, \omega_2, \ldots\}$),
then the usual choice for a σ-algebra on $\Omega$ is $\mathcal{F} := 2^\Omega := \{A : A \subset \Omega\}$, the power set of $\Omega$.
(b) If $\Omega = \mathbb{R}$, it turns out that the power set $2^\Omega$ is “too big” to be chosen as σ-algebra.³ For
this reason, one chooses instead $\mathcal{F} := \mathcal{B}_{\mathbb{R}}$, the Borel σ-algebra on $\mathbb{R}$, i.e., the smallest σ-algebra
on $\mathbb{R}$ that contains all intervals of the form $(a, b)$, $(a, b]$, $[a, b)$ and $[a, b]$ for $-\infty < a < b < \infty$.
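For a finite sample space, the σ-algebra axioms can be checked mechanically. The following is a minimal sketch, not part of the original notes; the helper names `power_set` and `is_sigma_algebra` are illustrative, and it relies on the fact that on a finite set closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import chain, combinations


def power_set(omega):
    """Return all subsets of the finite set omega as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def is_sigma_algebra(omega, family):
    """Check axioms (a)-(c) for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:                              # axiom (a)
        return False
    if any(omega - a not in family for a in family):     # axiom (b)
        return False
    # axiom (c): pairwise unions suffice on a finite set
    return all(a | b in family for a in family for b in family)


omega = {"w1", "w2", "w3"}
print(len(power_set(omega)))                             # 8
print(is_sigma_algebra(omega, power_set(omega)))         # True
print(is_sigma_algebra(omega, [set(), {"w1"}, omega]))   # False
```

The last family fails because the complement of {w1} is missing.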
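Given a measurable space $(\Omega, \mathcal{F})$, a probability measure $P$ on $(\Omega, \mathcal{F})$ is a map $\mathcal{F} \to [0, 1]$
such that
(a) $P[\emptyset] = 0$ and $P[\Omega] = 1$;
(b) $A_1, A_2, \ldots \in \mathcal{F}$ with $A_i \cap A_j = \emptyset$ for $i \neq j$ $\implies$ $P\big[\bigcup_{n \in \mathbb{N}} A_n\big] = \sum_{n=1}^{\infty} P[A_n]$.
The triple $(\Omega, \mathcal{F}, P)$ is called a probability space.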

If $\Omega$ is finite (or countable) and $\mathcal{F} = 2^\Omega$, then a probability measure $P$ is characterised by its
values on the elementary events $\{\omega\}$, i.e., we only need to specify $P[\{\omega_n\}]$ for $n \in \{1, \ldots, N\}$
(or $n \in \mathbb{N}$).
If $\Omega = \mathbb{R}$ and $\mathcal{F} = \mathcal{B}_{\mathbb{R}}$, then a probability measure $P$ is uniquely characterised by its
(cumulative) distribution function $F_P : \mathbb{R} \to [0, 1]$ defined by
\[ F_P(x) = P[(-\infty, x]]. \]
Random variables
A (real valued) random variable $X$ on a measurable space $(\Omega, \mathcal{F})$ is a function $\Omega \to \mathbb{R}$ such
that for any Borel set $B \in \mathcal{B}_{\mathbb{R}}$,
\[ \{X \in B\} := \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}. \]
We then also say that $X$ is $\mathcal{F}$-measurable. Whereas this definition is nice from a theoretical
perspective, it is almost useless for checking in practice that a given function $X : \Omega \to \mathbb{R}$ is
$\mathcal{F}$-measurable. But fortunately, one can show that it suffices to check this condition only for
Borel sets $B$ of the form $B = (-\infty, x]$ for $x \in \mathbb{R}$, i.e., $X$ is $\mathcal{F}$-measurable if and only if for all
$x \in \mathbb{R}$,
\[ \{X \le x\} := \{\omega \in \Omega : X(\omega) \le x\} \in \mathcal{F}. \]
If $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, a random variable is often also called a measurable function and
denoted by $f$ or $g$, etc. Note that all continuous functions on $\mathbb{R}$ are measurable.
The definition of a random variable does not mention any probability measure $P$ at all.
However, if we add a probability measure $P$ to a measurable space $(\Omega, \mathcal{F})$, we get some further
notions: For $B \in \mathcal{B}_{\mathbb{R}}$ we say that $X \in B$ $P$-almost surely ($P$-a.s.) if $P[X \in B] = 1$. The
distribution (or law) of $X$ is defined by
\[ P^X[B] := P[X \in B]. \]
One can check that $P^X$ is again a probability measure, now on the measurable space $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.
³ See [3, Theorem 1.5] for a precise formulation of this statement.
A random variable $X$ is called discrete if there exists a finite or countable set $B \in \mathcal{B}_{\mathbb{R}}$
such that $P^X[B] = P[X \in B] = 1$. In this case, we define the probability mass function
(pmf) $p_X : \mathbb{R} \to [0, 1]$ by
\[ p_X(x) = P^X[\{x\}] = P[X = x], \quad x \in \mathbb{R}. \]
Then $p_X(x) = 0$ for $x \in B^c$ and
\[ \sum_{x \in B} p_X(x) = 1. \]
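Example 1.2. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A \in \mathcal{F}$ an $\mathcal{F}$-measurable event. Then
the indicator function $1_A$ defined by
\[ 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \in A^c, \end{cases} \]
is a discrete random variable (taking only the values 0 and 1).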
Remark 1.3. If Ω is finite or countable, then every random variable on (Ω, F) is discrete.
A random variable $X$ is called continuous if there exists a measurable (e.g. continuous)
nonnegative function $f_X : \mathbb{R} \to [0, \infty)$ satisfying
\[ \int_{-\infty}^{x} f_X(y) \, dy = P^X[(-\infty, x]] = P[X \le x], \quad x \in \mathbb{R}. \]
The function $f_X$ is also called the probability density function (pdf) of $X$.
Remark 1.4. Note that there are many random variables which are neither discrete nor
continuous.
Expectation
For a random variable X on a probability space (Ω, F, P ), the expectation of X under P (also
called the integral of X with respect to P ), is defined as follows:
• If $X$ is a simple random variable, i.e., $X = c_1 1_{A_1} + \cdots + c_n 1_{A_n}$, where $c_i \in \mathbb{R}$ and $A_i \in \mathcal{F}$,
one sets
\[ E^P[X] := c_1 P[A_1] + \cdots + c_n P[A_n]. \]
• If $X$ is nonnegative, writing $X = \lim_{n \to \infty} X_n$, where $(X_n)_{n \in \mathbb{N}}$ is a nondecreasing
sequence of nonnegative simple random variables,⁴ one sets⁵
\[ E^P[X] := \lim_{n \to \infty} E^P[X_n] \in [0, \infty]. \]
• If $X$ is general, one says that $X$ is integrable or has finite expectation if $E^P[|X|] < \infty$.
In this case one sets
\[ E^P[X] := E^P[X^+] - E^P[X^-], \]
where $X^+ = \max\{0, X\}$ denotes the positive part and $X^- = \max\{0, -X\}$ denotes the
negative part of $X$.⁶
If there is no danger of confusion, we often drop the qualifier $P$ in $E^P$.
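The second step of the construction approximates a nonnegative random variable from below by simple ones. A minimal sketch of the standard dyadic approximation $X_n := \min(2^{-n} \lfloor 2^n X \rfloor, n)$, evaluated at a single value $x = X(\omega)$, is given here; the function name `dyadic_approx` and the value 2.7 are illustrative assumptions.

```python
import math


def dyadic_approx(x, n):
    """n-th dyadic approximation min(2^-n * floor(2^n * x), n) of x >= 0."""
    return min(math.floor(2**n * x) / 2**n, n)


x = 2.7  # hypothetical value X(omega)
values = [dyadic_approx(x, n) for n in range(1, 8)]
print(values)

# The sequence is nondecreasing, stays below x, and converges to x.
print(all(a <= b for a, b in zip(values, values[1:])))
print(abs(dyadic_approx(x, 20) - x) < 1e-5)
```

Each $X_n$ takes only finitely many values (multiples of $2^{-n}$ between 0 and $n$), which is what makes it a simple random variable.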
Example 1.5. Let $\Omega = \{\omega_1, \ldots, \omega_N\}$ be a finite sample space, $\mathcal{F} = 2^\Omega$ and $P$ a probability
measure on $(\Omega, \mathcal{F})$. Then any random variable $X$ on $(\Omega, \mathcal{F})$ is simple and
\[ E[X] = \sum_{n=1}^{N} X(\omega_n) P[\{\omega_n\}]. \]
Indeed, setting $c_n := X(\omega_n)$ and $A_n := \{\omega_n\}$ for $n \in \{1, \ldots, N\}$, we get
\[ X(\omega) = \sum_{n=1}^{N} c_n 1_{A_n}(\omega). \]
By definition of the expectation for simple random variables, we obtain
\[ E[X] = \sum_{n=1}^{N} c_n P[A_n] = \sum_{n=1}^{N} X(\omega_n) P[\{\omega_n\}]. \]
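The formula of Example 1.5 translates directly into a weighted sum. The following sketch is an illustration only: the fair die and the function name `expectation` are assumptions, and `fractions.Fraction` is used to keep the arithmetic exact.

```python
from fractions import Fraction


def expectation(X, P):
    """E[X] = sum of X(w) * P[{w}] over a finite sample space.

    X and P are dicts keyed by the states w.
    """
    return sum(X[w] * P[w] for w in P)


# Hypothetical example: a fair die, Omega = {1, ..., 6}
P = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: w for w in P}                            # X(w) = w
indicator_even = {w: 1 if w % 2 == 0 else 0 for w in P}

print(expectation(X, P))                # 7/2
print(expectation(indicator_even, P))   # 1/2
```

The second line illustrates that the expectation of an indicator $1_A$ is just $P[A]$.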
The following result lists two important properties of the expectation operator.
Lemma 1.6. Let X and Y be integrable random variables on some probability space (Ω, F, P ).
(a) For a, b, c ∈ R, aX + bY + c is again an integrable random variable and
E[aX + bY + c] = aE[X] + bE[Y ] + c.
(b) If X ≥ Y P -a.s., then

E[X] ≥ E[Y ]. (1.1)
⁴ For instance, one can set $X_n(\omega) := \min(2^{-n} \lfloor 2^n X(\omega) \rfloor, n)$, where $\lfloor \cdot \rfloor$ denotes the floor function.
⁵ Of course, one has to show that this is well defined, i.e., independent of the choice of the approximating
sequence $(X_n)_{n \in \mathbb{N}}$.
⁶ Note that both $X^+$ and $X^-$ are nonnegative random variables, $X = X^+ - X^-$, and $|X| = X^+ + X^-$.
Moreover, the inequality in (1.1) is an equality if and only if X = Y P -a.s.
Property (a) is referred to as linearity of the expectation and property (b) as monotonicity
of the expectation.
The following result is very useful for calculating expectations (of functions of) discrete or
continuous random variables.
Lemma 1.7. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{F}, P)$ and $h : \mathbb{R} \to \mathbb{R}$ a
measurable (e.g. continuous) function. Then $h(X)$ is again a random variable.⁷ Moreover,
(a) if $X$ is discrete with pmf $p_X$, then $h(X)$ is integrable if and only if
\[ \sum_{x} |h(x)| p_X(x) < \infty, \]
and in this case,
\[ E[h(X)] = \sum_{x} h(x) p_X(x). \]
(b) if $X$ is continuous with pdf $f_X$, then $h(X)$ is integrable if and only if
\[ \int_{-\infty}^{\infty} |h(x)| f_X(x) \, dx < \infty, \]
and in this case,
\[ E[h(X)] = \int_{-\infty}^{\infty} h(x) f_X(x) \, dx. \]
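Part (a) of Lemma 1.7 can be illustrated numerically for a pmf with finite support. This is a sketch under assumptions: the fair-die pmf and the name `expect_h` are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expect_h(h, pmf):
    """E[h(X)] = sum of h(x) * p_X(x) for a pmf with finite support."""
    return sum(h(x) * p for x, p in pmf.items())


# Hypothetical pmf: X uniform on {1, ..., 6}
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expect_h(lambda x: x, pmf)
second_moment = expect_h(lambda x: x * x, pmf)
variance = second_moment - mean**2

print(mean, second_moment, variance)   # 7/2 91/6 35/12
```

With finite support the integrability condition is automatic, so only the formula for $E[h(X)]$ is exercised here.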
⁷ If $X$ is discrete, then $h(X)$ is again discrete. However, if $X$ is continuous, $h(X)$ need not be continuous.
2 No-Arbitrage and the Fundamental Theorem of Asset Pricing
In this chapter, we develop a mathematical model for financial markets in one period, introduce
the key concept of no-arbitrage, and formulate and prove the so-called Fundamental Theorem
of Asset Pricing on the absence of arbitrage in this setting.
2.1 A mathematical model for a financial market in one period

We consider a financial market with $1 + d$ assets. We assume that the assets are priced at two
times, at $t = 0$ (“today”) and at $t = 1$ (“in one year”). Asset prices today are known and given
by the (usually positive) constants $S_0^0, S_0^1, \ldots, S_0^d \in \mathbb{R}$.⁸ Asset prices in one year, however, are
usually not known today. So we model them as real-valued (usually positive) random variables
$S_1^0(\omega), S_1^1(\omega), \ldots, S_1^d(\omega)$ on some probability space $(\Omega, \mathcal{F}, P)$. Every $\omega \in \Omega$ corresponds to a
possible “state of the world” in one year, and $S_1^i(\omega)$ denotes the price of asset $i$ if the state of
the world in one year happens to be $\omega \in \Omega$.
It is convenient to identify each asset $i$ with the stochastic process $S^i = (S_t^i)_{t \in \{0,1\}}$, i.e.,
with the collection of the two random variables $S_0^i$ and $S_1^i$ (where $S_0^i(\omega) := S_0^i$).
In most financial markets, not all asset prices in one year are unknown. Usually, there is a
riskless asset, often also called bank account, which will pay a sure amount in one year.⁹ We
will assume throughout that $S^0$ is riskless and satisfies
\[ S_0^0 = 1 \quad \text{and} \quad S_1^0(\omega) \equiv 1 + r, \]
where $r > -1$ denotes the interest rate.¹⁰

In order to distinguish the riskless asset $S^0$ from the risky assets $S^1, \ldots, S^d$, we will use
the notation¹¹
\[ S_t = (S_t^1, \ldots, S_t^d) \quad \text{and} \quad \bar{S}_t = (S_t^0, S_t), \quad t \in \{0, 1\}, \]
and call the $\mathbb{R}^d$-valued stochastic process $S = (S_t)_{t \in \{0,1\}}$ the risky assets.¹²
Example 2.1 (One-period Binomial model). Assume that $d = 1$, i.e., there is only one risky
asset, and there are only two states of the world at time 1, i.e., $\Omega = \{\omega_1, \omega_2\}$. We assume that
$S_0^1 = 1$ and
\[ S_1^1(\omega_1) = 1 + u \quad \text{and} \quad S_1^1(\omega_2) = 1 + d, \]
where $u > d > -1$. Here, $u$ and $d$ are mnemonics for “up” and “down”, and it is often assumed
that $u > 0$. The probabilities for “up” and “down” are given by
\[ P[\{\omega_1\}] = p_1 \quad \text{and} \quad P[\{\omega_2\}] = p_2, \]
where $p_1, p_2 \in (0, 1)$ and $p_1 + p_2 = 1$.¹³ One can nicely illustrate this model by the following
trees, where the numbers beside the branches denote probabilities:
S^0:  1 --(1)----> 1 + r
S^1:  1 --(p_1)--> 1 + u
      1 --(p_2)--> 1 + d
⁸ Note that some derivative assets such as swaps have zero initial value.
⁹ Government bonds are usually considered to be “riskless” in reality, in particular German or Swiss
government bonds. Notwithstanding, the “bank account” is a somewhat fictitious security, in particular in continuous
time where it denotes the “rollover” of very short term “riskless” bonds.
¹⁰ Before the financial crisis of 2008, interest rates tended to be positive (or at least nonnegative).
¹¹ Note that $S_0$ is an $\mathbb{R}^d$-valued vector and $S_1$ is an $\mathbb{R}^d$-valued random vector. Likewise, $\bar{S}_0$ is an $\mathbb{R}^{1+d}$-valued
vector and $\bar{S}_1$ is an $\mathbb{R}^{1+d}$-valued random vector.
¹² $\mathbb{R}^d$-valued stochastic process here means the collection of the two $\mathbb{R}^d$-valued random vectors $S_0$ and $S_1$
(where $S_0(\omega) := S_0$).
2.2 Trading strategies and arbitrage opportunities

We shall assume throughout that we have a frictionless market. This means that there are
no transaction costs, i.e., assets can be bought and sold at the same price, and there are no
constraints on the number of assets one holds. In particular, one can hold a negative amount
of some asset, i.e., assets can be shorted. Moreover, we shall assume that asset prices are
exogenously given and not influenced by the trading activities of other market participants.
All this is of course an idealisation of reality but one has to start with the simplest case before
building more realistic and therefore more complex models.
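Given a financial market $\bar{S} = (S_t^0, S_t)_{t \in \{0,1\}}$ as above, a trading strategy, often also called
a portfolio, is a vector
\[ \bar{\vartheta} = (\vartheta^0, \vartheta) = (\vartheta^0, \vartheta^1, \ldots, \vartheta^d) \in \mathbb{R}^{1+d}, \]
where $\vartheta^i$ denotes the number of shares held in asset $i$. The price today for buying the trading
strategy/portfolio $\bar{\vartheta}$ is
\[ \bar{\vartheta} \cdot \bar{S}_0 = \sum_{i=0}^{d} \vartheta^i S_0^i = \vartheta^0 + \vartheta \cdot S_0. \]
In one year, i.e., at $t = 1$, the value of the trading strategy/portfolio will be
\[ \bar{\vartheta} \cdot \bar{S}_1(\omega) = \sum_{i=0}^{d} \vartheta^i S_1^i(\omega) = \vartheta^0 (1 + r) + \vartheta \cdot S_1(\omega), \]
depending on the state of the world $\omega \in \Omega$.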
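The cost and the terminal value of a portfolio are plain inner products, which the following sketch evaluates in the one-period Binomial model. All numbers ($r = 1\%$, $u = 10\%$, $d = -5\%$, the holdings) and the name `portfolio_value` are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction as F


def portfolio_value(theta_bar, prices_bar):
    """Inner product theta_bar . S_bar: value of holding theta_bar[i] shares of asset i."""
    return sum(t * s for t, s in zip(theta_bar, prices_bar))


# Hypothetical parameters of a one-period Binomial model
r, u, d = F(1, 100), F(10, 100), F(-5, 100)
S0_bar = (F(1), F(1))                                   # (S_0^0, S_0^1)
S1_bar = {"w1": (1 + r, 1 + u), "w2": (1 + r, 1 + d)}   # (S_1^0, S_1^1) per state

theta_bar = (F(-2), F(3))   # borrow 2 units of the bank account, buy 3 shares

cost = portfolio_value(theta_bar, S0_bar)
payoff = {w: portfolio_value(theta_bar, s) for w, s in S1_bar.items()}

print(cost)                        # 1
print(payoff["w1"], payoff["w2"])  # 32/25 83/100
```

A negative entry in `theta_bar` corresponds to a short position, which the frictionless-market assumption permits.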

The following definition is one of the cornerstones of Mathematical Finance:
¹³ To make the model mathematically rigorous, we also have to specify the σ-algebra $\mathcal{F}$. As is standard
in models with finite (or countable) $\Omega$, it is given by $\mathcal{F} = 2^\Omega$, so that $\mathcal{F} = \{\emptyset, \{\omega_1\}, \{\omega_2\}, \{\omega_1, \omega_2\}\}$.
Definition 2.2. A trading strategy $\bar{\vartheta} \in \mathbb{R}^{1+d}$ is called an arbitrage opportunity for $\bar{S}$ if
\[ \bar{\vartheta} \cdot \bar{S}_0 \le 0, \quad \bar{\vartheta} \cdot \bar{S}_1 \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\bar{\vartheta} \cdot \bar{S}_1 > 0] > 0. \]
The financial market $\bar{S}$ is called arbitrage-free if there are no arbitrage opportunities. In this
case one also says that $\bar{S}$ satisfies NA.
An arbitrage opportunity gives something (a positive chance of strictly positive final wealth,
$P[\bar{\vartheta} \cdot \bar{S}_1 > 0] > 0$) out of nothing (zero or negative initial wealth, $\bar{\vartheta} \cdot \bar{S}_0 \le 0$) without risk (almost
sure nonnegative final wealth, $\bar{\vartheta} \cdot \bar{S}_1 \ge 0$ $P$-a.s.).
In well-functioning real financial markets, arbitrage opportunities do not exist for long
because they are immediately exploited by so-called arbitrageurs (often hedge funds) and dis-
appear. The reason that arbitrage opportunities disappear is that in real financial markets
(unlike in our textbook setting) prices adjust to the trading activities of the market par-
ticipants, i.e., prices of assets in high demand rise and prices of assets in low demand fall.
Therefore, it is reasonable to assume that financial markets are arbitrage-free, which is indeed
a key assumption of Mathematical Finance.
Example 2.3 (An arbitrage opportunity). Consider the financial market given by the following
trees, where the numbers beside the branches denote probabilities:
S^0:  1   --(1)----> 1.01
S^1:  200 --(0.8)--> 202
      200 --(0.1)--> 200
      200 --(0.1)--> 198
We claim that the market $\bar{S} = (S_t^0, S_t^1)_{t \in \{0,1\}}$ admits arbitrage. Indeed, in every state of the
world the riskless asset has the same or a higher return than the risky asset. So we
short the risky asset and use the proceeds to buy the riskless asset. Mathematically, we set
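\[ \bar{\vartheta} = (\vartheta^0, \vartheta^1) := (200, -1). \]
Then at time 0,
\[ \bar{\vartheta} \cdot \bar{S}_0 = 200 \times 1 + (-1) \times 200 = 0, \]
and at time 1,
\[ \bar{\vartheta} \cdot \bar{S}_1(\omega) = \begin{cases} 200 \times 1.01 + (-1) \times 202 = 0 & \text{if } \omega = \omega_1, \\ 200 \times 1.01 + (-1) \times 200 = 2 > 0 & \text{if } \omega = \omega_2, \\ 200 \times 1.01 + (-1) \times 198 = 4 > 0 & \text{if } \omega = \omega_3. \end{cases} \]
Thus,
\[ P[\bar{\vartheta} \cdot \bar{S}_1 \ge 0] = 1 \quad \text{and} \quad P[\bar{\vartheta} \cdot \bar{S}_1 > 0] = 0.2, \]
whence $\bar{\vartheta}$ is an arbitrage opportunity.¹⁴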
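On a finite sample space, the three conditions of Definition 2.2 can be checked state by state. The sketch below does this for the data of Example 2.3; the function names `dot` and `is_arbitrage` are illustrative, and exact fractions are used so that the boundary case (payoff exactly 0 in the first state) is not affected by rounding.

```python
from fractions import Fraction as F


def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def is_arbitrage(theta_bar, S0_bar, S1_bar, P):
    """Check Definition 2.2 on a finite sample space.

    S1_bar maps each state to the vector of time-1 prices; P maps each
    state to its probability. States of probability zero are ignored,
    which is what "P-a.s." amounts to on a finite space.
    """
    support = [w for w in P if P[w] > 0]
    payoff = {w: dot(theta_bar, S1_bar[w]) for w in support}
    return (dot(theta_bar, S0_bar) <= 0
            and all(v >= 0 for v in payoff.values())
            and any(v > 0 for v in payoff.values()))


# Data of Example 2.3
S0_bar = (F(1), F(200))
S1_bar = {"w1": (F(101, 100), F(202)),
          "w2": (F(101, 100), F(200)),
          "w3": (F(101, 100), F(198))}
P = {"w1": F(8, 10), "w2": F(1, 10), "w3": F(1, 10)}

print(is_arbitrage((F(200), F(-1)), S0_bar, S1_bar, P))   # True
print(is_arbitrage((F(-200), F(1)), S0_bar, S1_bar, P))   # False
```

The second call shows that the opposite position (long the stock, financed by borrowing) is not an arbitrage opportunity, since it loses money in the second and third states.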
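Remark 2.4. If the market $\bar{S}$ admits arbitrage, there always exists an arbitrage opportunity
with $\bar{\vartheta} \cdot \bar{S}_0 = 0$. Indeed, if $\bar{\eta} = (\eta^0, \eta)$ is an arbitrage opportunity with $\bar{\eta} \cdot \bar{S}_0 < 0$, set
$\bar{\vartheta} := (\vartheta^0, \vartheta) = (\eta^0 - \bar{\eta} \cdot \bar{S}_0, \eta)$. Then
\[ \bar{\vartheta} \cdot \bar{S}_0 = \vartheta^0 + \vartheta \cdot S_0 = \eta^0 - \bar{\eta} \cdot \bar{S}_0 + \eta \cdot S_0 = \eta^0 - \eta^0 - \eta \cdot S_0 + \eta \cdot S_0 = 0. \]
Moreover, as $-\bar{\eta} \cdot \bar{S}_0 > 0$,
\[ \begin{aligned} \bar{\vartheta} \cdot \bar{S}_1 &= \vartheta^0 (1 + r) + \vartheta \cdot S_1 = \eta^0 (1 + r) + (-\bar{\eta} \cdot \bar{S}_0)(1 + r) + \eta \cdot S_1 \\ &= \bar{\eta} \cdot \bar{S}_1 + (-\bar{\eta} \cdot \bar{S}_0)(1 + r) \ge (-\bar{\eta} \cdot \bar{S}_0)(1 + r) > 0 \quad P\text{-a.s.} \end{aligned} \]
Thus, $\bar{\vartheta}$ is an arbitrage opportunity with $\bar{\vartheta} \cdot \bar{S}_0 = 0$.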
2.3 Discounting
Our next aim is to give a necessary and sufficient condition on the market S to be arbitrage-
free. To this end, we need to introduce two further concepts.
The first concept is the notion of discounting. Asset prices are denominated in units of something,
e.g. GBP or EUR. Notwithstanding, it is clear that prices (and values) are relative. So basic
concepts of financial markets (like being arbitrage-free) should not and do not depend on the
choice of unit. For this reason, we are free to change the unit, in particular if this makes the
mathematics simpler. It turns out that a good choice is a unit which itself is a traded asset,
and the canonical choice is to use the risk-free asset $S^0$. So we discount with $S^0$ or take $S^0$ as
numéraire, and define the discounted assets $X^0, X^1, \ldots, X^d$ by
\[ X_t^i = \frac{S_t^i}{S_t^0}, \quad t \in \{0, 1\}, \ i \in \{0, 1, \ldots, d\}. \]
Then $X^0 \equiv 1$ and $X = (X^1, \ldots, X^d)$ expresses the value of the risky assets in units of the
numéraire $S^0$.
¹⁴ Here, we have implicitly assumed that $\Omega = \{\omega_1, \omega_2, \omega_3\}$, $S_1^1(\omega_1) = 202$, $S_1^1(\omega_2) = 200$ and $S_1^1(\omega_3) = 198$,
$\mathcal{F} = 2^\Omega$ and $P[\{\omega_1\}] = 0.8$, $P[\{\omega_2\}] = 0.1$, and $P[\{\omega_3\}] = 0.1$.
Remark 2.5. The economic reason for taking a traded asset (as opposed to a standard
currency like GBP) as numéraire is that a standard currency does not reflect the time value
of money: one pound today is not the same as one pound in a year or a pound in 100 years
(think of the gigantic value of one pound in the past). By discounting we make prices at
different times comparable.¹⁵
The mathematical reason for taking a traded asset as numéraire is that it allows us to reduce
the dimension of the market from $1 + d$ to $d$.
Example 2.6. Consider the one-period Binomial model from Example 2.1. Then the
discounted risky asset $X^1$ is given by $X_0^1 = 1$ and
\[ X_1^1(\omega_1) = \frac{1 + u}{1 + r} \quad \text{and} \quad X_1^1(\omega_2) = \frac{1 + d}{1 + r}. \]
We can reformulate the notion of arbitrage in terms of the discounted risky assets X only.
Proposition 2.7. The following are equivalent:
(a) The market $\bar{S}$ satisfies NA.
(b) The discounted risky assets $X$ satisfy NA, i.e., there does not exist an arbitrage
opportunity¹⁶ $\vartheta = (\vartheta^1, \ldots, \vartheta^d) \in \mathbb{R}^d$ for $X$ such that
\[ \vartheta \cdot (X_1 - X_0) \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\vartheta \cdot (X_1 - X_0) > 0] > 0. \]
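Proof. We only prove the more difficult direction (a) ⇒ (b).
Seeking a contradiction, suppose that there exists an arbitrage opportunity $\vartheta \in \mathbb{R}^d$ for $X$
satisfying
\[ \vartheta \cdot (X_1 - X_0) \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\vartheta \cdot (X_1 - X_0) > 0] > 0. \tag{2.1} \]
We aim to extend $\vartheta$ to an arbitrage opportunity $\bar{\vartheta}$ for $\bar{S}$ by choosing $\vartheta^0$ in an appropriate
way. Set
\[ \vartheta^0 := -\vartheta \cdot X_0. \]
Then with $\bar{\vartheta} := (\vartheta^0, \vartheta)$ and $\bar{X}_0 := (X_0^0, X_0) = (1, X_0)$, we get
\[ \bar{\vartheta} \cdot \bar{X}_0 = \vartheta^0 X_0^0 + \vartheta \cdot X_0 = \vartheta^0 + \vartheta \cdot X_0 = -\vartheta \cdot X_0 + \vartheta \cdot X_0 = 0. \tag{2.2} \]
Multiplying (2.2) by $S_0^0 = 1$ gives
\[ \bar{\vartheta} \cdot \bar{S}_0 = 0. \tag{2.3} \]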
¹⁵ This has been known and used by actuaries for centuries.
¹⁶ This is of course a slight abuse of notation, but a very common one in Mathematical Finance.
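Next, as $X_1^0 - X_0^0 = 1 - 1 = 0$, we note that (2.1) is equivalent to
\[ \bar{\vartheta} \cdot (\bar{X}_1 - \bar{X}_0) \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\bar{\vartheta} \cdot (\bar{X}_1 - \bar{X}_0) > 0] > 0. \tag{2.4} \]
Then plugging (2.2) into (2.4) shows that
\[ \bar{\vartheta} \cdot \bar{X}_1 \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\bar{\vartheta} \cdot \bar{X}_1 > 0] > 0. \]
Now using that inequalities are preserved under multiplication by positive constants (here $S_1^0$),
we obtain
\[ \bar{\vartheta} \cdot \bar{S}_1 \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\bar{\vartheta} \cdot \bar{S}_1 > 0] > 0. \]
This together with (2.3) shows that $\bar{\vartheta}$ is an arbitrage opportunity for $\bar{S}$, in contradiction to
the hypothesis that $\bar{S}$ satisfies NA.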
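The discounted formulation of arbitrage in Proposition 2.7 (b) only involves the risky holdings $\vartheta$ and the increments $X_1 - X_0$. The sketch below checks it on a finite sample space, using a three-state market with $r = 1\%$, $S_0^1 = 200$ and $S_1^1 \in \{202, 200, 198\}$; the function names are illustrative assumptions.

```python
from fractions import Fraction as F


def discounted_gains(theta, X0, X1):
    """Return theta . (X1(w) - X0) for every state w."""
    return {w: sum(t * (x1 - x0) for t, x0, x1 in zip(theta, X0, X1_w))
            for w, X1_w in X1.items()}


def is_discounted_arbitrage(theta, X0, X1, P):
    """Check the two conditions of Proposition 2.7 (b) on a finite space."""
    gains = discounted_gains(theta, X0, X1)
    support = [w for w in P if P[w] > 0]
    return (all(gains[w] >= 0 for w in support)
            and any(gains[w] > 0 for w in support))


r = F(1, 100)
S0 = (F(200),)                                           # risky asset today
S1 = {"w1": (F(202),), "w2": (F(200),), "w3": (F(198),)}
P = {"w1": F(8, 10), "w2": F(1, 10), "w3": F(1, 10)}

X0 = S0                                                  # S_0^0 = 1, so X_0 = S_0
X1 = {w: tuple(s / (1 + r) for s in prices) for w, prices in S1.items()}

print(is_discounted_arbitrage((F(-1),), X0, X1, P))      # True
print(is_discounted_arbitrage((F(1),), X0, X1, P))       # False
```

Note that no initial-cost condition appears: as in the proof, the bank account position $\vartheta^0 := -\vartheta \cdot X_0$ can always be chosen so that the initial cost is zero.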
2.4 Equivalent Martingale Measures

The second concept is the notion of an equivalent martingale measure (EMM).
To define it, we need another concept from Probability Theory.
Definition 2.8. Let (Ω, F) be a measurable space. Two probability measures P and Q on
(Ω, F) are called equivalent (notation: P ≈ Q) if, for A ∈ F, Q[A] = 0 if and only if P [A] = 0.
Two probability measures are equivalent if they agree on which events will not happen,
i.e., have probability zero. But they may still assign different probabilities to events that
might happen.
Example 2.9. Let $\Omega = \{\omega_1, \ldots, \omega_N\}$ be a finite sample space and $\mathcal{F} = 2^\Omega$. Let $P$ be a
probability measure on $(\Omega, \mathcal{F})$ with $P[\{\omega_n\}] > 0$ for all $n \in \{1, \ldots, N\}$. Then a probability
measure $Q$ on $(\Omega, \mathcal{F})$ is equivalent to $P$ if and only if $Q[\{\omega_n\}] > 0$ for all $n \in \{1, \ldots, N\}$.
Indeed, if $Q[\{\omega_n\}] = 0$ for some $n \in \{1, \ldots, N\}$, then $Q$ cannot be equivalent to $P$. Otherwise,
fix $A \in \mathcal{F}$. Then $P[A], Q[A] > 0$ unless $A = \emptyset$; and if $A = \emptyset$, then trivially $P[A] = 0 = Q[A]$.
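On a finite sample space with the power set as σ-algebra, equivalence of two measures reduces to a comparison of their null states. A minimal sketch of this check follows; the function name `are_equivalent` and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction as F


def are_equivalent(P, Q):
    """Check P ≈ Q on a finite sample space with the power set as sigma-algebra.

    P and Q are dicts state -> probability on the same set of states.
    The null sets of a measure are exactly the subsets of its null states,
    so it suffices to compare the null states.
    """
    if P.keys() != Q.keys():
        raise ValueError("P and Q must be defined on the same sample space")
    return all((P[w] == 0) == (Q[w] == 0) for w in P)


P = {"w1": F(8, 10), "w2": F(1, 10), "w3": F(1, 10)}
Q1 = {"w1": F(1, 3), "w2": F(1, 3), "w3": F(1, 3)}
Q2 = {"w1": F(1, 2), "w2": F(1, 2), "w3": F(0)}

print(are_equivalent(P, Q1))   # True
print(are_equivalent(P, Q2))   # False
```

`Q2` fails because it rules out the third state, which $P$ considers possible.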
We now can define the concept of an equivalent martingale measure.
Definition 2.10. Let $X$ be discounted risky assets on a probability space $(\Omega, \mathcal{F}, P)$. A
probability measure $Q$ on $(\Omega, \mathcal{F})$ is called an equivalent martingale measure for $X$ if $Q \approx P$, each $X_1^i$ is
$Q$-integrable and
\[ E^Q[X_1^i] = X_0^i, \quad i \in \{1, \ldots, d\}. \]
Remark 2.11. The terminology equivalent martingale measure stems from the fact that the
$X^i$ are martingales under the equivalent measure $Q$. Martingales will be studied in some
detail in Chapter 6.
Alternatively, Q is also often called a risk-neutral measure.
2.5 The Fundamental Theorem of Asset Pricing
We now have all the tools to state and prove the so-called Fundamental Theorem of Asset
Pricing, giving necessary and sufficient conditions for the absence of arbitrage. For multiperiod
models, this was only established in 1990 by Dalang, Morton, and Willinger. For this reason,
it is sometimes also referred to as the Dalang-Morton-Willinger theorem.
Theorem 2.12 (Fundamental Theorem of Asset Pricing). Let $\bar{S} = (S_t^0, S_t)_{t \in \{0,1\}}$ be a
one-period financial market on some probability space $(\Omega, \mathcal{F}, P)$. The following are equivalent:
(a) The market $\bar{S}$ satisfies NA.
(b) There exists an EMM for the discounted risky assets $X = S/S^0$.
Proof. We first establish the “easy” direction (b) ⇒ (a). So let $Q \approx P$ be an EMM. By
Proposition 2.7, it suffices to show that $X$ satisfies NA. Seeking a contradiction, suppose there
is $\vartheta \in \mathbb{R}^d$ such that
\[ \vartheta \cdot (X_1 - X_0) \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\vartheta \cdot (X_1 - X_0) > 0] > 0. \]
By the fact that $Q$ is equivalent to $P$, we have
\[ \vartheta \cdot (X_1 - X_0) \ge 0 \ Q\text{-a.s.} \quad \text{and} \quad Q[\vartheta \cdot (X_1 - X_0) > 0] > 0. \]
By monotonicity of the expectation operator together with the characterisation of the equality
case (cf. Lemma 1.6 (b)),
\[ E^Q[\vartheta \cdot (X_1 - X_0)] > 0. \]
But by linearity of the expectation operator (cf. Lemma 1.6 (a)) and the fact that $Q$ is an
EMM,
\[ E^Q[\vartheta \cdot (X_1 - X_0)] = \sum_{i=1}^{d} \vartheta^i E^Q[X_1^i - X_0^i] = \sum_{i=1}^{d} \vartheta^i \times 0 = 0, \]
and we arrive at a contradiction.

For the proof of the “difficult” direction (a) ⇒ (b), we only consider the special case that
$\Omega = \{\omega_1, \ldots, \omega_N\}$ is finite, $\mathcal{F} = 2^\Omega$ and $P[\{\omega_n\}] > 0$ for all $n \in \{1, \ldots, N\}$.¹⁷ As is the case
with many abstract existence theorems, the proof is not constructive. We are going to identify
a random variable $Y$ on $(\Omega, \mathcal{F})$ with the $\mathbb{R}^N$-valued vector $(Y(\omega_1), \ldots, Y(\omega_N))$. First, set
\[ K := \{(\vartheta \cdot (X_1(\omega_1) - X_0), \ldots, \vartheta \cdot (X_1(\omega_N) - X_0)) : \vartheta \in \mathbb{R}^d\}. \tag{2.5} \]
Then $K$ corresponds to the collection of all random variables of the form $\vartheta \cdot (X_1 - X_0)$ for $\vartheta \in \mathbb{R}^d$.
Mathematically, $K$ is an (at most $d$-dimensional) vector subspace of $\mathbb{R}^N$. By Proposition 2.7,
$X$ satisfies NA. In terms of $K$ this means that
\[ K \cap \mathbb{R}^N_+ = \{0\}, \tag{2.6} \]
where $\mathbb{R}^N_+ = [0, \infty)^N$. Next, define the standard simplex of dimension $N - 1$ by
\[ \Delta^{N-1} := \Big\{ x \in \mathbb{R}^N_+ : \sum_{i=1}^{N} x^i = 1 \Big\}. \]
Then $\Delta^{N-1} \subset \mathbb{R}^N_+$ and $0 \notin \Delta^{N-1}$, so that
\[ K \cap \Delta^{N-1} = \emptyset. \]
Figure 1: An illustration of the separating hyperplane theorem for $N = 2$ (showing the simplex $\Delta^{N-1}$ and the hyperplane $a \cdot x = \lambda$).
¹⁷ For a proof with general $\Omega$ and $\mathcal{F}$ (which requires more measure theory), we refer to [2, Theorem 1.7].
As $K$ and $\Delta^{N-1}$ are both nonempty and convex, $K$ is closed and $\Delta^{N-1}$ is compact, it follows
from the strict separating hyperplane theorem¹⁸ and the fact that $K$ is a vector subspace that
there exist a vector $a \in \mathbb{R}^N \setminus \{0\}$ and $\lambda > 0$ such that
\[ a \cdot k = 0 \quad \text{for all } k \in K, \]
\[ a \cdot x \ge \lambda > 0 \quad \text{for all } x \in \Delta^{N-1}. \]
As $\Delta^{N-1}$ contains all standard unit vectors $e_i$ in $\mathbb{R}^N$, it follows that
\[ a \cdot e_i = a_i > 0, \quad i \in \{1, \ldots, N\}. \]
Now define the probability measure $Q$ on $(\Omega, \mathcal{F})$ by
\[ Q[\{\omega_n\}] = \frac{a_n}{\sum_{k=1}^{N} a_k} > 0. \]
¹⁸ For a proof, we refer to [1, Proposition B.14].
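Then $Q \approx P$ by Example 2.9. Moreover, for $i \in \{1, \ldots, d\}$, set
\[ k^i = (e_i \cdot (X_1(\omega_1) - X_0), \ldots, e_i \cdot (X_1(\omega_N) - X_0)) \in K, \]
where $e_i$ denotes the $i$-th standard unit vector in $\mathbb{R}^d$. Then
\[ \begin{aligned} E^Q[X_1^i - X_0^i] &= \sum_{n=1}^{N} (X_1^i(\omega_n) - X_0^i) Q[\{\omega_n\}] = \frac{1}{\sum_{k=1}^{N} a_k} \sum_{n=1}^{N} a_n (X_1^i(\omega_n) - X_0^i) \\ &= \frac{1}{\sum_{k=1}^{N} a_k} \sum_{n=1}^{N} a_n \, e_i \cdot (X_1(\omega_n) - X_0) = \frac{a \cdot k^i}{\sum_{k=1}^{N} a_k} = 0, \end{aligned} \]
where we have used in the last step that $a \cdot k = 0$ for all $k \in K$.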
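The last step of the proof, turning a strictly positive vector $a$ orthogonal to $K$ into an EMM, can be traced on a small example. This is a sketch only: the three-state market with $X_0 = 1$ and $X_1 \in \{6/5, 1, 4/5\}$ and the vector $a = (1, 2, 1)$ are assumptions chosen by hand, whereas the proof obtains $a$ non-constructively from the separating hyperplane theorem.

```python
from fractions import Fraction as F


def measure_from_normal(a):
    """Normalise a strictly positive vector a to Q[{w_n}] = a_n / sum(a)."""
    if any(a_n <= 0 for a_n in a):
        raise ValueError("all components of a must be strictly positive")
    total = sum(a)
    return [a_n / total for a_n in a]


# Hypothetical market: one risky asset (d = 1), three states (N = 3)
X0 = F(1)
X1 = [F(6, 5), F(1), F(4, 5)]
k = [x - X0 for x in X1]          # K is the line spanned by this vector

# Hand-picked vector with positive entries and a . k = 0 (an assumption)
a = [F(1), F(2), F(1)]
orthogonal = sum(a_n * k_n for a_n, k_n in zip(a, k)) == 0

Q = measure_from_normal(a)
martingale_gap = sum(q * k_n for q, k_n in zip(Q, k))   # E^Q[X_1 - X_0]

print(orthogonal)            # True
print(martingale_gap == 0)   # True
```

In this market the choice of $a$ is not unique: $a = (1, 1, 1)$ is also strictly positive and orthogonal to $K$ and yields a different EMM, so an EMM need not be unique when there are more states than assets.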
Example 2.13. Consider the Binomial model from Example 2.1. Using the FTAP, we want
to check when $\bar{S}$ satisfies NA. So let $Q$ be a probability measure on $(\Omega, \mathcal{F})$, and set $q_1 := Q[\{\omega_1\}]$ and
$q_2 := Q[\{\omega_2\}]$. We know from Example 2.9 that $Q \approx P$ if and only if $q_1 > 0$ and $q_2 > 0$.
Moreover, $Q$ is an EMM for $X^1$ if and only if
\[ E^Q[X_1^1] = X_0^1 = 1 \iff q_1 X_1^1(\omega_1) + q_2 X_1^1(\omega_2) = 1 \iff q_1 \frac{1 + u}{1 + r} + q_2 \frac{1 + d}{1 + r} = 1. \]
Rearranging and using that $q_2 = 1 - q_1$ gives
\[ q_1 (1 + u) + (1 - q_1)(1 + d) = 1 + r \iff q_1 = \frac{r - d}{u - d}, \]
and thus
\[ q_2 = 1 - q_1 = \frac{u - r}{u - d}. \]
Clearly, $q_1, q_2 > 0$ if and only if $u > r > d$.
So $\bar{S}$ is arbitrage-free if and only if $u > r > d$, in which case the (unique) EMM satisfies
\[ q_1 = \frac{r - d}{u - d} \quad \text{and} \quad q_2 = \frac{u - r}{u - d}. \]
The condition u > r > d is economically quite intuitive as it says that the risky asset must
offer the chance of a higher return than the interest rate in one state of the world (u > r) but
also have a lower return than the interest rate in another state of the world (d < r). Note
that the EMM Q does not depend on the values of p1 and p2 .
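The formulas of Example 2.13 are easy to evaluate and to sanity-check against the martingale condition. In the sketch below, the function name `binomial_emm` and the parameter values ($u = 10\%$, $d = -5\%$, $r = 1\%$) are illustrative assumptions.

```python
from fractions import Fraction as F


def binomial_emm(u, d, r):
    """Return (q1, q2) for the one-period Binomial model, or None if NA fails.

    NA holds if and only if u > r > d; then q1 = (r - d) / (u - d).
    """
    if not (u > r > d):
        return None
    q1 = (r - d) / (u - d)
    return q1, 1 - q1


u, d, r = F(10, 100), F(-5, 100), F(1, 100)   # hypothetical parameters
q1, q2 = binomial_emm(u, d, r)                # q1 = 2/5, q2 = 3/5

# Martingale condition: the discounted expected price equals today's price 1
print(q1 * (1 + u) / (1 + r) + q2 * (1 + d) / (1 + r) == 1)   # True

# If d >= r, the risky asset dominates the bank account and NA fails
print(binomial_emm(F(10, 100), F(2, 100), F(1, 100)))          # None
```

The function takes no probabilities $p_1, p_2$ as input, mirroring the observation that the EMM does not depend on them.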
3 Mean-Variance Portfolio Selection and the CAPM
In this chapter, we seek to answer the question how to optimally invest in a financial market
taking into account the mean and the variance of the return of a portfolio. We then deduce
that if all market participants behave optimally in a mean-variance sense, the financial market
has a special structure, which is described by the Capital Asset Pricing Model (CAPM).
3.1 The return of an asset and of a portfolio

Throughout this chapter, we consider a $(1 + d)$-dimensional financial market $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$
on some probability space (Ω, F, P), where $S^0$ is a riskless bank account and satisfies
\[ S_0^0 = 1 \quad \text{and} \quad S_1^0 = 1 + r, \]
with $r > -1$. We assume that all asset prices today are positive, i.e., $S_0^0, S_0^1, \ldots, S_0^d > 0$, and
have finite second moments, i.e., $E\big[(S_t^i)^2\big] < \infty$, $i \in \{0, 1, \ldots, d\}$, $t \in \{0, 1\}$.¹⁹ We also
assume that $\bar S$ satisfies NA. We may then assume without loss of generality²⁰ that the
assets are non-redundant in the sense that
\[ \bar\vartheta \cdot \bar S_1 = 0 \;\; P\text{-a.s.} \implies \bar\vartheta = 0. \tag{3.1} \]
Define the (relative) return of asset $S^i$ by
\[ R^i := \frac{S_1^i - S_0^i}{S_0^i}, \quad i \in \{0, \ldots, d\}. \]
The expected return of asset $S^i$ is then given by
\[ \mu^i := E\big[R^i\big], \quad i \in \{0, \ldots, d\}. \]
Set $\mu = (\mu^1, \ldots, \mu^d)$ and $\bar\mu = (\mu^0, \mu)$. Note that $R^0 \equiv r = \mu^0$, i.e., the return of asset 0
is deterministic and equals the interest rate r. For the risky assets $S^1, \ldots, S^d$, however, the
return is stochastic, and we denote by $\Sigma = (\Sigma^{ij})_{1 \le i,j \le d} \in \mathbb{R}^{d \times d}$ the covariance matrix of the
return vector $R = (R^1, \ldots, R^d)$ of the risky assets, given by
\[ \Sigma^{ij} = \mathrm{Cov}[R^i, R^j] = E\big[(R^i - \mu^i)(R^j - \mu^j)\big], \quad i, j \in \{1, \ldots, d\}. \]
One can show that by the non-redundancy assumption (3.1) on $\bar S$, it follows that Σ is positive
definite and hence invertible.

¹⁹ For t = 0 (as well as i = 0), this integrability condition is of course trivially satisfied.
²⁰ Otherwise, there are $\bar\vartheta \neq 0$ with $\bar\vartheta \cdot \bar S_1 = 0$ P-a.s. and $j \in \{1, \ldots, d\}$ with $\vartheta^j \neq 0$ such that
\[ S_1^j = \sum_{i \neq j} \frac{-\vartheta^i}{\vartheta^j} S_1^i \quad P\text{-a.s.} \]
Thus, by the fundamental theorem of asset pricing, where Q ≈ P is an EMM,
\[ S_0^j = X_0^j = E^Q\!\left[\frac{S_1^j}{S_1^0}\right] = \sum_{i \neq j} \frac{-\vartheta^i}{\vartheta^j} E^Q\!\left[\frac{S_1^i}{S_1^0}\right] = \sum_{i \neq j} \frac{-\vartheta^i}{\vartheta^j} X_0^i = \sum_{i \neq j} \frac{-\vartheta^i}{\vartheta^j} S_0^i, \]
i.e., the risky asset j can be written as a linear combination of the other assets, and hence be omitted.
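For a portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$ with $\bar\vartheta \cdot \bar S_0 \neq 0$,²¹ we denote its return by
\[ R_{\bar\vartheta} := \frac{\bar\vartheta \cdot \bar S_1 - \bar\vartheta \cdot \bar S_0}{\bar\vartheta \cdot \bar S_0}. \]
The expected return and the variance of the return of $\bar\vartheta$ are then given by
\[ \mu_{\bar\vartheta} := E\big[R_{\bar\vartheta}\big], \qquad \sigma^2_{\bar\vartheta} := \mathrm{Var}\big[R_{\bar\vartheta}\big] = E\Big[\big(R_{\bar\vartheta} - E[R_{\bar\vartheta}]\big)^2\Big]. \]
We call a portfolio $\bar\vartheta = (\vartheta^0, \vartheta)$ risk-only if $\vartheta^0 = 0$. In a slight abuse of notation, we then
identify $\vartheta = (\vartheta^1, \ldots, \vartheta^d)$ with $(0, \vartheta)$ and call $\vartheta$ itself a risk-only portfolio and define the return,
the expected return and the variance of the return of $\vartheta$ by
\[ R_\vartheta := \frac{\vartheta \cdot S_1 - \vartheta \cdot S_0}{\vartheta \cdot S_0}, \qquad \mu_\vartheta := E[R_\vartheta], \qquad \sigma^2_\vartheta := \mathrm{Var}[R_\vartheta] = E\big[(R_\vartheta - E[R_\vartheta])^2\big]. \]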
3.2 Maximising the expected return

We consider an investor with initial wealth $x_0 > 0$, who wants to invest in the financial market
$\bar S$. So she chooses a portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$ subject to the budget constraint
\[ \bar\vartheta \cdot \bar S_0 = x_0. \tag{3.2} \]
A portfolio $\bar\vartheta$ satisfying the budget constraint (3.2) is called $x_0$-feasible.

Similarly, a risk-only portfolio $\vartheta$ satisfying the budget constraint $\vartheta \cdot S_0 = x_0$ is called
risk-only $x_0$-feasible.
In general, there are many x0 -feasible portfolios. So one would like to “maximise” the
return Rϑ among all x0 -feasible portfolios ϑ. But since Rϑ is a random variable, it is not
clear how to do this. Therefore, one might try to maximise the expected return µϑ among all
x0 -feasible portfolios ϑ. This is, however, not a good criterion, as can be seen by the following
example.
²¹ If $\bar\vartheta \cdot \bar S_0 = 0$, the return of $\bar\vartheta$ is not defined.
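Example 3.1. Consider a Binomial model with $u = 0.05$, $r = 0.01$, $d = 0$, $p_1 = \frac{2}{5}$ and $p_2 = \frac{3}{5}$.
Then
\[ E\big[S_1^1\big] = \frac{2}{5} \times 1.05 + \frac{3}{5} \times 1 = 1.02. \]
Let $x_0 = 1000$. Then $\bar\vartheta \in \mathbb{R}^2$ is $x_0$-feasible if and only if $\vartheta^0 = 1000 - \vartheta^1$. Moreover,
\[
\begin{aligned}
R_{\bar\vartheta} &= \frac{\vartheta^0 S_1^0 + \vartheta^1 S_1^1 - (\vartheta^0 S_0^0 + \vartheta^1 S_0^1)}{\vartheta^0 S_0^0 + \vartheta^1 S_0^1} = \frac{\vartheta^0 (1 + r) + \vartheta^1 S_1^1 - 1000}{1000} \\
&= \frac{(1000 - \vartheta^1) \times 1.01 + \vartheta^1 S_1^1 - 1000}{1000} = \frac{10 - 1.01\vartheta^1 + \vartheta^1 S_1^1}{1000}, \\
\mu_{\bar\vartheta} &= E\big[R_{\bar\vartheta}\big] = \frac{10 - 1.01\vartheta^1 + 1.02\vartheta^1}{1000} = 0.01 + \frac{0.01\vartheta^1}{1000}.
\end{aligned}
\]
Maximising over all $x_0$-feasible portfolios yields
\[ \sup_{\{\bar\vartheta \in \mathbb{R}^2 :\, \vartheta^0 + \vartheta^1 = 1000\}} \mu_{\bar\vartheta} = 0.01 + \sup_{\vartheta^1 \in \mathbb{R}} \frac{0.01\vartheta^1}{1000} = +\infty. \]
Moreover, a maximising strategy does not exist.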
Let us briefly comment on what goes wrong in Example 3.1. Focussing on expected
return only motivates buying huge amounts of the risky asset by borrowing money from the
bank account. By doing this, one can achieve any expected return one likes. However, this
completely ignores the risk inherent in the investment. To illustrate this point, suppose we
choose $\vartheta^1 = 1{,}000{,}000$ and $\vartheta^0 = -999{,}000$. Then we have an expected return of $\mu_{\bar\vartheta} = 10.01 = 1001\%$,
which sounds amazing. However,
\[
\bar\vartheta \cdot \bar S_1(\omega) =
\begin{cases}
-999{,}000 \times 1.01 + 1{,}000{,}000 \times 1.05 = 41{,}010 & \text{if } \omega = \omega_1, \\
-999{,}000 \times 1.01 + 1{,}000{,}000 \times 1 = -8{,}990 & \text{if } \omega = \omega_2,
\end{cases}
\]
so that with a probability of $P[\{\omega_2\}] = 60\%$ we lose everything and even have a debt of 8990
in one year.²²
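The numbers in this discussion can be reproduced with a few lines of Python. This is a minimal sketch under the assumptions of Example 3.1 (in particular $S_0^0 = S_0^1 = 1$); the function name `leveraged_outcome` is hypothetical and chosen here for illustration.

```python
def leveraged_outcome(theta1, x0=1000.0, u=0.05, d=0.0, r=0.01, p1=0.4):
    """Expected return and terminal wealth in both states for an x0-feasible portfolio.

    Assumes S_0^0 = S_0^1 = 1, so the budget constraint reads theta0 + theta1 = x0.
    """
    theta0 = x0 - theta1                                # bank account position
    wealth_up = theta0 * (1 + r) + theta1 * (1 + u)     # state omega_1
    wealth_down = theta0 * (1 + r) + theta1 * (1 + d)   # state omega_2
    expected_wealth = p1 * wealth_up + (1 - p1) * wealth_down
    expected_return = (expected_wealth - x0) / x0
    return expected_return, wealth_up, wealth_down


# Borrow 999,000 and hold 1,000,000 units of the risky asset.
mu_lev, up, down = leveraged_outcome(1_000_000.0)
```

Increasing `theta1` further raises the expected return without bound, while the loss in state $\omega_2$ grows at the same rate.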
3.3 The mean-variance problems

As we have seen, maximising the expected return alone is not a good criterion for portfolio
choice since it does not control the risk inherent in an investment. Markowitz, in a seminal
work in 1952 (for which he was awarded the Nobel Prize in Economics in 1990), proposed to
consider the variance of the return as a measure of the risk of a portfolio and introduced what
is now known as mean-variance portfolio selection.
²² Borrowing money to buy stocks was a popular investment strategy in the late 1920s. It was one of the key
drivers of the Wall Street crash of October 1929, which led to the financial ruin of many investors.
There are two versions of the mean-variance problem, each of which has a formulation
with risk-only portfolios and one with general portfolios, i.e., portfolios which also allow an investment in
the riskless asset:
(1) Given an initial wealth $x_0 > 0$ and a minimal desired expected return $\mu_{\min} > 0$, minimise
the variance of the return $\sigma^2_{\bar\vartheta}$ among all $x_0$-feasible portfolios $\bar\vartheta \in \mathbb{R}^{1+d}$ that satisfy
$\mu_{\bar\vartheta} \ge \mu_{\min}$.
(1’) Given an initial wealth $x_0 > 0$ and a minimal desired expected return $\mu_{\min} > 0$, minimise
the variance of the return $\sigma^2_\vartheta$ among all risk-only $x_0$-feasible portfolios $\vartheta \in \mathbb{R}^d$ that satisfy
$\mu_\vartheta \ge \mu_{\min}$.
(2) Given an initial wealth $x_0 > 0$ and a maximal variance of the return $\sigma^2_{\max} \ge 0$, maximise
the expected return $\mu_{\bar\vartheta}$ among all $x_0$-feasible portfolios $\bar\vartheta \in \mathbb{R}^{1+d}$ that satisfy $\sigma^2_{\bar\vartheta} \le \sigma^2_{\max}$.
(2’) Given an initial wealth $x_0 > 0$ and a maximal variance of the return $\sigma^2_{\max} \ge 0$, maximise
the expected return $\mu_\vartheta$ among all risk-only $x_0$-feasible portfolios $\vartheta \in \mathbb{R}^d$ that satisfy
$\sigma^2_\vartheta \le \sigma^2_{\max}$.
We shall see that problems (1) and (2) (and (1’) and (2’)) are two sides of the same coin.
The idea underlying the mean-variance problems is that (a high) expected return is
desirable whereas (a high) variance of the return is undesirable. We say that an investor has
mean-variance preferences if for two portfolios $\bar\vartheta, \bar\vartheta'$, portfolio $\bar\vartheta$ is preferred over portfolio $\bar\vartheta'$
whenever $\mu_{\bar\vartheta} \ge \mu_{\bar\vartheta'}$ and $\sigma^2_{\bar\vartheta} \le \sigma^2_{\bar\vartheta'}$, with at least one inequality being strict.²³
3.4 Portfolios in fractions of wealth

In order to study the mean-variance problems, it is convenient to parametrise portfolios not
in numbers of shares but in fractions of wealth. To this end, we first introduce some notation.
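For $N \in \mathbb{N}$, set $\mathbf{1}_N := (1, \ldots, 1) \in \mathbb{R}^N$ and denote by $H^{N-1}$ the unit hyperplane in $\mathbb{R}^N$ given
by
\[ H^{N-1} := \{x \in \mathbb{R}^N : x \cdot \mathbf{1}_N = 1\}. \]
If there is no danger of confusion, we write $\mathbf{1}$ instead of $\mathbf{1}_N$.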

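If $\bar\vartheta \in \mathbb{R}^{1+d}$ is a portfolio parametrised in numbers of shares with $\bar\vartheta \cdot \bar S_0 > 0$, define
\[ \pi^i := \frac{\vartheta^i S_0^i}{\bar\vartheta \cdot \bar S_0}, \quad i \in \{0, \ldots, d\}, \tag{3.3} \]
and call $\pi^i$ the fraction of wealth invested in asset $i \in \{0, \ldots, d\}$. We set $\bar\pi := (\pi^0, \pi) =
(\pi^0, \pi^1, \ldots, \pi^d)$. Note that $\bar\pi \in H^{1+d-1}$.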
²³ Note that in general two portfolios $\bar\vartheta, \bar\vartheta'$ cannot be compared in a mean-variance sense, i.e., neither is $\bar\vartheta$
preferred over $\bar\vartheta'$ nor is $\bar\vartheta'$ preferred over $\bar\vartheta$.
Similarly, if $\vartheta$ is a risk-only portfolio parametrised in numbers of shares with $\vartheta \cdot S_0 > 0$,
define $\pi^i := \frac{\vartheta^i S_0^i}{\vartheta \cdot S_0}$ for $i \in \{1, \ldots, d\}$, call $\pi^i$ the fraction of wealth invested in asset $i \in \{1, \ldots, d\}$,
and set $\pi = (\pi^1, \ldots, \pi^d)$. Note that in the risk-only case $\pi \in H^{d-1}$.
Conversely, for $\bar\pi \in H^{1+d-1}$ and an initial wealth $x_0 > 0$, define the portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$
(parametrised in numbers of shares) by
\[ \vartheta^i = \frac{\pi^i}{S_0^i}\, x_0, \quad i \in \{0, \ldots, d\}, \tag{3.4} \]
and call $\bar\pi$ a portfolio parametrised in fractions of wealth.

Similarly, for $\pi \in H^{d-1}$ and an initial wealth $x_0 > 0$, define the risk-only portfolio $\vartheta \in \mathbb{R}^d$
(parametrised in numbers of shares) by
\[ \vartheta^i = \frac{\pi^i}{S_0^i}\, x_0, \quad i \in \{1, \ldots, d\}, \]
and call $\pi$ a risk-only portfolio parametrised in fractions of wealth.
Remark 3.2. Calling $\bar\pi \in H^{1+d-1}$ a portfolio is a slight abuse of notation because to recover
the numbers of shares corresponding to $\bar\pi$, we also need to specify the initial wealth $x_0$.
However, if $\bar\pi \in H^{1+d-1}$ is a portfolio parametrised in fractions of wealth and $x_0, x_0' > 0$
are different initial wealths with corresponding portfolios $\bar\vartheta$ and $\bar\vartheta'$ in numbers of shares, then
\[ \frac{\bar\vartheta}{x_0} = \frac{\bar\vartheta'}{x_0'} \]
and $R_{\bar\vartheta} = R_{\bar\vartheta'}$. So if we are only interested in returns, it suffices to consider portfolios
parametrised in fractions of wealth; see also Lemma 3.3 below.²⁴
Parametrising portfolios in fractions of wealth is tailor-made for studying the mean-variance
problems.
Lemma 3.3. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant $(1 + d)$-dimensional
market on some probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and
$S_0^0, \ldots, S_0^d > 0$. Denote by µ and Σ the mean vector and covariance matrix of the return
vector R of the risky assets. Let $x_0 > 0$ be some fixed initial wealth. Then there is a one-to-one
correspondence between $x_0$-feasible portfolios $\bar\vartheta \in \mathbb{R}^{1+d}$ parametrised in numbers of shares
and portfolios $\bar\pi \in H^{1+d-1}$ parametrised in fractions of wealth. If $\bar\pi \in H^{1+d-1}$ denotes the
fractions of wealth of an $x_0$-feasible portfolio $\bar\vartheta \in \mathbb{R}^{1+d}$, then
\[ R_{\bar\vartheta} = R_{\bar\pi} := \sum_{i=0}^d \pi^i R^i = \bar\pi \cdot \bar R, \tag{3.5} \]
\[ \mu_{\bar\vartheta} = \mu_{\bar\pi} := E[R_{\bar\pi}] = \sum_{i=0}^d \pi^i \mu^i = \bar\pi \cdot \bar\mu, \tag{3.6} \]
\[ \sigma^2_{\bar\vartheta} = \sigma^2_{\bar\pi} := \mathrm{Var}[R_{\bar\pi}] = \sum_{i,j=1}^d \pi^i \pi^j \Sigma^{ij} = \pi^\top \Sigma \pi. \tag{3.7} \]
Similarly, there is a one-to-one correspondence between risk-only $x_0$-feasible portfolios $\vartheta \in \mathbb{R}^d$
parametrised in numbers of shares and risk-only portfolios $\pi \in H^{d-1}$ parametrised in fractions
of wealth. If $\pi \in H^{d-1}$ denotes the fractions of wealth of a risk-only $x_0$-feasible portfolio
$\vartheta \in \mathbb{R}^d$, then
\[ R_\vartheta = R_\pi := \sum_{i=1}^d \pi^i R^i = \pi \cdot R, \tag{3.8} \]
\[ \mu_\vartheta = \mu_\pi := E[R_\pi] = \sum_{i=1}^d \pi^i \mu^i = \pi \cdot \mu, \tag{3.9} \]
\[ \sigma^2_\vartheta = \sigma^2_\pi := \mathrm{Var}[R_\pi] = \sum_{i,j=1}^d \pi^i \pi^j \Sigma^{ij} = \pi^\top \Sigma \pi. \tag{3.10} \]

²⁴ Of course, all this carries over to the risk-only case.
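Proof. We only prove the case of general portfolios; the risk-only case is completely analogous.
The first statement follows from the fact that the map
\[ \Phi : \big\{\bar\vartheta \in \mathbb{R}^{1+d} : \bar\vartheta \cdot \bar S_0 = x_0\big\} \to H^{1+d-1}, \quad \Phi^i(\bar\vartheta) = \frac{\vartheta^i S_0^i}{x_0}, \quad i \in \{0, \ldots, d\}, \]
is bijective.²⁵ To prove the second statement, let $\bar\vartheta \in \mathbb{R}^{1+d}$ be an $x_0$-feasible portfolio
parametrised in numbers of shares. The corresponding portfolio $\bar\pi \in H^{1+d-1}$ parametrised in
fractions of wealth is then given by
\[ \pi^i := \frac{\vartheta^i S_0^i}{\bar\vartheta \cdot \bar S_0}, \quad i \in \{0, \ldots, d\}. \]
Thus,
\[ R_{\bar\vartheta} = \frac{\bar\vartheta \cdot \bar S_1 - \bar\vartheta \cdot \bar S_0}{\bar\vartheta \cdot \bar S_0} = \sum_{i=0}^d \frac{\vartheta^i}{\bar\vartheta \cdot \bar S_0} (S_1^i - S_0^i) = \sum_{i=0}^d \frac{\vartheta^i S_0^i}{\bar\vartheta \cdot \bar S_0} R^i = \sum_{i=0}^d \pi^i R^i = \bar\pi \cdot \bar R. \]
So we have (3.5), and (3.6) follows by linearity of the expectation. Finally, to establish (3.7),
we use that $\mathrm{Cov}[R^i, R^0] = 0$ for $i \in \{0, \ldots, d\}$ because $R^0$ is deterministic, and obtain
\[
\begin{aligned}
\sigma^2_{\bar\vartheta} = \mathrm{Var}[R_{\bar\vartheta}] = \mathrm{Var}[R_{\bar\pi}] &= \mathrm{Var}\Big[\sum_{i=0}^d \pi^i R^i\Big] = \sum_{i,j=0}^d \pi^i \pi^j\, \mathrm{Cov}[R^i, R^j] \\
&= \sum_{i,j=1}^d \pi^i \pi^j\, \mathrm{Cov}[R^i, R^j] = \sum_{i,j=1}^d \pi^i \pi^j \Sigma^{ij} = \pi^\top \Sigma \pi.
\end{aligned}
\]

²⁵ This uses that $x_0 > 0$ and $S_0^0, \ldots, S_0^d > 0$.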
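The change of parametrisation in (3.3) and (3.4) and the identity (3.5) can be illustrated for a single scenario ω. The following Python sketch is only an illustration: the helper names and all prices are hypothetical and not taken from the notes.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def fractions_of_wealth(theta, s0):
    """Map numbers of shares to fractions of wealth as in (3.3)."""
    wealth = dot(theta, s0)
    if wealth <= 0:
        raise ValueError("initial wealth must be positive")
    return [t * s / wealth for t, s in zip(theta, s0)]


def shares(pi, s0, x0):
    """Map fractions of wealth back to numbers of shares as in (3.4)."""
    return [x0 * p / s for p, s in zip(pi, s0)]


# Hypothetical prices today and in one scenario tomorrow; asset 0 is the bank account.
s0 = [1.0, 10.0, 20.0]
s1 = [1.02, 11.0, 19.0]
theta = [100.0, 30.0, 15.0]          # numbers of shares, initial wealth 700

pi = fractions_of_wealth(theta, s0)
asset_returns = [(b - a) / a for a, b in zip(s0, s1)]

# Left- and right-hand side of (3.5) in this scenario.
return_from_shares = (dot(theta, s1) - dot(theta, s0)) / dot(theta, s0)
return_from_fractions = dot(pi, asset_returns)
```

Both ways of computing the return agree, and `shares(pi, s0, 700.0)` recovers `theta`, which is the one-to-one correspondence of Lemma 3.3 for $x_0 = 700$.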
3.5 The case without a riskless asset

We start by considering the risk-only case. First, we characterise the so-called minimum
variance portfolio.
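Lemma 3.4. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on some
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$.
Denote by µ and Σ the mean vector and covariance matrix of the return vector R of the
risky assets. Then there exists a unique risk-only portfolio $\pi_{\min} \in H^{d-1}$, called the minimum
variance portfolio, such that
\[ \sigma^2_{\pi_{\min}} \le \sigma^2_\pi \quad \text{for all } \pi \in H^{d-1}. \]
It is given by²⁶
\[ \pi_{\min} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} \tag{3.11} \]
and satisfies
\[ \mu_{\pi_{\min}} = \frac{\mu^\top \Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} \quad \text{and} \quad \sigma^2_{\pi_{\min}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}. \]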
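Proof. First, since $\mathbf{1} \cdot \pi_{\min} = \frac{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} = 1$, it follows that $\pi_{\min} \in H^{d-1}$. Next, let $\pi \in H^{d-1}$ be
arbitrary. Set
\[ y := \pi - \pi_{\min} \]
and note that
\[ y \cdot \mathbf{1} = \pi \cdot \mathbf{1} - \pi_{\min} \cdot \mathbf{1} = 1 - 1 = 0. \]
Then by (3.10), the fact that Σ is symmetric, the definition of $\pi_{\min}$ and the fact that y and $\mathbf{1}$
are orthogonal,
\[
\begin{aligned}
\sigma^2_\pi = \pi^\top \Sigma \pi &= (\pi_{\min} + y)^\top \Sigma (\pi_{\min} + y) = \pi_{\min}^\top \Sigma \pi_{\min} + 2 y^\top \Sigma \pi_{\min} + y^\top \Sigma y \\
&= \pi_{\min}^\top \Sigma \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + 2 y^\top \Sigma \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + y^\top \Sigma y \\
&= \frac{\pi_{\min}^\top \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + 2\, \frac{y^\top \mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + y^\top \Sigma y \\
&= \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}} + 2 \times 0 + y^\top \Sigma y.
\end{aligned}
\]
As Σ is symmetric and positive definite, $y^\top \Sigma y \ge 0$ with equality if and only if $y = 0$. This
shows that $\pi_{\min}$ is the unique optimiser and yields $\sigma^2_{\pi_{\min}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}$. The formula for
$\mu_{\pi_{\min}}$ follows directly from (3.9).

²⁶ Note that as the market is non-redundant, it follows that Σ is positive definite and hence invertible.
Therefore, $\Sigma^{-1}$ is well defined and $\mathbf{1}^\top \Sigma^{-1} \mathbf{1} > 0$ by the fact that $\Sigma^{-1}$ is positive definite.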
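Formula (3.11) can be evaluated numerically by solving the linear system $\Sigma x = \mathbf{1}$ and normalising. The sketch below uses only the Python standard library; the three-asset mean vector and covariance matrix are hypothetical inputs chosen for illustration, and `solve` is a small Gauss-Jordan helper written here, not a library routine.

```python
def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def variance(pi, sigma):
    """Return pi^T Sigma pi, see (3.10)."""
    return dot(pi, [dot(row, pi) for row in sigma])


# Hypothetical inputs (symmetric, strictly diagonally dominant, hence positive definite).
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
ones = [1.0, 1.0, 1.0]

sigma_inv_ones = solve(sigma, ones)           # Sigma^{-1} 1
A = dot(ones, sigma_inv_ones)                 # A = 1^T Sigma^{-1} 1
pi_min = [w / A for w in sigma_inv_ones]      # formula (3.11)
mu_pi_min = dot(mu, pi_min)                   # should equal B / A
var_pi_min = variance(pi_min, sigma)          # should equal 1 / A
```

Perturbing `pi_min` by any vector whose entries sum to zero keeps the portfolio in $H^{d-1}$ and increases the variance, in line with the proof above.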
Next, we seek to find the risk-only portfolio which minimises the variance among all risk-only
portfolios with a given expected return $\mu_0$. To this end, note that if µ and $\mathbf{1}$ are collinear,
then every risk-only portfolio has the same expected return. Indeed, if µ and $\mathbf{1}$ are collinear,
then $\mu^1 = \ldots = \mu^d$ and hence, for any $\pi \in H^{d-1}$,
\[ \mu_\pi = \mu \cdot \pi = \mu^1\, \mathbf{1} \cdot \pi = \mu^1. \]
So we assume in the sequel that µ and $\mathbf{1}$ are not collinear.
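Lemma 3.5. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on a
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$. Denote
by µ and Σ the mean vector and covariance matrix of the return vector R of the risky assets.
Assume that µ and $\mathbf{1}$ are not collinear. Let $\mu_0 \in \mathbb{R}$ be given. Then there exists a unique
risk-only portfolio $\pi_{\mu_0} \in H^{d-1}$ such that $\mu_{\pi_{\mu_0}} = \mu_0$ and
\[ \sigma^2_{\pi_{\mu_0}} \le \sigma^2_\pi \quad \text{for all } \pi \in H^{d-1} \text{ with } \mu_\pi = \mu_0. \]
It is given by²⁷
\[ \pi_{\mu_0} = \frac{C - B\mu_0}{AC - B^2}\, \Sigma^{-1} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2}\, \Sigma^{-1} \mu, \tag{3.12} \]
where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and $C = \mu^\top \Sigma^{-1} \mu$. Moreover,
\[ \sigma^2_{\pi_{\mu_0}} = \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2} = \sigma^2_{\pi_{\min}} + \frac{A}{AC - B^2} (\mu_0 - \mu_{\pi_{\min}})^2. \tag{3.13} \]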
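Proof. First, as Σ is positive definite, it is invertible and $\Sigma^{-1}$ is again positive definite. Hence
it induces a scalar product $\langle \cdot, \cdot \rangle_{\Sigma^{-1}}$ on $\mathbb{R}^d$ given by
\[ \langle x, y \rangle_{\Sigma^{-1}} := x^\top \Sigma^{-1} y. \]
Thus, by the Cauchy-Schwarz inequality,
\[ B^2 = \big(\langle \mathbf{1}, \mu \rangle_{\Sigma^{-1}}\big)^2 \le \langle \mathbf{1}, \mathbf{1} \rangle_{\Sigma^{-1}} \langle \mu, \mu \rangle_{\Sigma^{-1}} = AC, \]
where the inequality is an equality if and only if µ and $\mathbf{1}$ are collinear. As they are not, it
follows that $AC - B^2 > 0$.
Next, we check that $\pi_{\mu_0}$ given by (3.12) is indeed in $H^{d-1}$ and has expected return $\mu_0$. By
the definitions of A, B and C and (3.9),
\[
\begin{aligned}
\mathbf{1} \cdot \pi_{\mu_0} &= \frac{C - B\mu_0}{AC - B^2}\, \mathbf{1}^\top \Sigma^{-1} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2}\, \mathbf{1}^\top \Sigma^{-1} \mu = \frac{C - B\mu_0}{AC - B^2}\, A + \frac{A\mu_0 - B}{AC - B^2}\, B \\
&= \frac{AC - AB\mu_0 + AB\mu_0 - B^2}{AC - B^2} = 1, \\
\mu_{\pi_{\mu_0}} = \mu \cdot \pi_{\mu_0} &= \frac{C - B\mu_0}{AC - B^2}\, \mu^\top \Sigma^{-1} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2}\, \mu^\top \Sigma^{-1} \mu = \frac{C - B\mu_0}{AC - B^2}\, B + \frac{A\mu_0 - B}{AC - B^2}\, C \\
&= \frac{CB - B^2\mu_0 + AC\mu_0 - BC}{AC - B^2} = \mu_0.
\end{aligned}
\]
Now let $\pi \in H^{d-1}$ with $\mu_\pi = \mu_0$. Set
\[ y := \pi - \pi_{\mu_0}. \]
Note that $y \cdot \mathbf{1} = \pi \cdot \mathbf{1} - \pi_{\mu_0} \cdot \mathbf{1} = 1 - 1 = 0$ and
\[ y \cdot \mu = \pi \cdot \mu - \pi_{\mu_0} \cdot \mu = \mu_\pi - \mu_{\pi_{\mu_0}} = \mu_0 - \mu_0 = 0. \]
Then by (3.10), the fact that Σ is symmetric, the definition of $\pi_{\mu_0}$, the fact that $\pi_{\mu_0} \cdot \mu = \mu_0$
and the fact that y is orthogonal to $\mathbf{1}$ and µ,
\[
\begin{aligned}
\sigma^2_\pi = \pi^\top \Sigma \pi &= (\pi_{\mu_0} + y)^\top \Sigma (\pi_{\mu_0} + y) = \pi_{\mu_0}^\top \Sigma \pi_{\mu_0} + 2 y^\top \Sigma \pi_{\mu_0} + y^\top \Sigma y \\
&= \pi_{\mu_0}^\top \Big( \frac{C - B\mu_0}{AC - B^2}\, \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2}\, \mu \Big) + 2 y^\top \Big( \frac{C - B\mu_0}{AC - B^2}\, \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2}\, \mu \Big) + y^\top \Sigma y \\
&= \Big( \frac{C - B\mu_0}{AC - B^2} \times 1 + \frac{A\mu_0 - B}{AC - B^2}\, \mu_0 \Big) + 2 \times (0 + 0) + y^\top \Sigma y \\
&= \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2} + y^\top \Sigma y.
\end{aligned}
\]
As Σ is symmetric and positive definite, $y^\top \Sigma y \ge 0$ with equality if and only if $y = 0$. This
shows that $\pi_{\mu_0}$ is the unique optimiser and yields the first equality in (3.13). The second
equality in (3.13) follows by a simple rearrangement using that $\sigma^2_{\pi_{\min}} = \frac{1}{A}$ and $\mu_{\pi_{\min}} = \frac{B}{A}$.

²⁷ It is part of the assertion that $AC - B^2 \neq 0$; we even have $AC - B^2 > 0$.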
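Formulas (3.12) and (3.13) translate directly into a short computation. The following Python sketch is an illustration under assumed inputs: the data `mu`, `sigma`, the target return and the names `solve` and `frontier_portfolio` are hypothetical and introduced here only to check the statement of Lemma 3.5 numerically.

```python
def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def variance(pi, sigma):
    return dot(pi, [dot(row, pi) for row in sigma])


def frontier_portfolio(mu, sigma, mu0):
    """Return (pi_{mu0}, variance) according to (3.12) and (3.13)."""
    ones = [1.0] * len(mu)
    s1 = solve(sigma, ones)                  # Sigma^{-1} 1
    sm = solve(sigma, mu)                    # Sigma^{-1} mu
    A, B, C = dot(ones, s1), dot(ones, sm), dot(mu, sm)
    D = A * C - B * B                        # > 0 when mu and 1 are not collinear
    if D <= 0:
        raise ValueError("mu and 1 must not be collinear")
    pi = [((C - B * mu0) * a + (A * mu0 - B) * b) / D for a, b in zip(s1, sm)]
    var = (A * mu0 ** 2 - 2 * B * mu0 + C) / D
    return pi, var


# Hypothetical three-asset inputs and target expected return.
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
pi_target, var_target = frontier_portfolio(mu, sigma, 0.10)
```

For $d = 3$, the vector $y = (0.04, -0.07, 0.03)$ is orthogonal to both $\mathbf{1}$ and this particular µ, so adding multiples of it keeps both constraints intact while increasing the variance.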
For the following result, we introduce the key concept of a risk-only efficient portfolio.
Definition 3.6. A risk-only portfolio $\pi \in H^{d-1}$ is called risk-only efficient (in the mean-variance
sense) if there does not exist another risk-only portfolio $\pi' \in H^{d-1}$ such that $\mu_{\pi'} \ge \mu_\pi$
and $\sigma^2_{\pi'} \le \sigma^2_\pi$ with at least one inequality being strict.
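Theorem 3.7. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on some
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$.
Denote by µ and Σ the mean vector and covariance matrix of the return vector R of the risky
assets. Assume that µ and $\mathbf{1}$ are not collinear, and set $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and
$C = \mu^\top \Sigma^{-1} \mu$. Define the risk-only efficient frontier by
\[ \mathcal{E} := \Big\{ (\sigma_0^2, \mu_0) \in \mathbb{R}^2 : \mu_0 \ge \frac{B}{A},\; \sigma_0^2 = \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2} \Big\}. \]
(a) For each point $(\sigma_0^2, \mu_0) \in \mathcal{E}$, there exists exactly one risk-only portfolio $\pi \in H^{d-1}$ such
that $(\sigma^2_\pi, \mu_\pi) = (\sigma_0^2, \mu_0)$. It is given by
\[ \pi = \pi_{\mu_0} = \frac{C - B\mu_0}{AC - B^2}\, \Sigma^{-1} \mathbf{1} + \frac{A\mu_0 - B}{AC - B^2}\, \Sigma^{-1} \mu. \]
(b) A risk-only portfolio $\pi \in H^{d-1}$ is risk-only efficient if and only if $(\sigma^2_\pi, \mu_\pi) \in \mathcal{E}$.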
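Proof. (a). Let $(\sigma_0^2, \mu_0) \in \mathcal{E}$. It follows from Lemma 3.5 that $\pi_{\mu_0}$ satisfies $(\sigma^2_{\pi_{\mu_0}}, \mu_{\pi_{\mu_0}}) =
(\sigma_0^2, \mu_0)$. If $\pi' \in H^{d-1}$ is any other portfolio with $\mu_{\pi'} = \mu_0$ but $\pi' \neq \pi_{\mu_0}$, then $\sigma^2_{\pi'} > \sigma^2_{\pi_{\mu_0}} = \sigma_0^2$
again by Lemma 3.5. So we have both existence and uniqueness of π.
(b). First, assume that $\pi \in H^{d-1}$ is risk-only efficient. Then $\sigma^2_\pi \ge \sigma^2_{\pi_{\min}}$ by Lemma 3.4, and
so $\mu_\pi \ge \mu_{\pi_{\min}} = \frac{B}{A}$ by the definition of efficiency. Set $\mu_0 := \mu_\pi$ and $\sigma_0^2 := \frac{A\mu_0^2 - 2B\mu_0 + C}{AC - B^2}$. Then
$(\sigma_0^2, \mu_0) \in \mathcal{E}$ and by (a), $(\sigma^2_{\pi_{\mu_0}}, \mu_{\pi_{\mu_0}}) = (\sigma_0^2, \mu_0)$. Efficiency of π together with $\mu_\pi = \mu_0 = \mu_{\pi_{\mu_0}}$
gives $\sigma^2_\pi \le \sigma^2_{\pi_{\mu_0}} = \sigma_0^2$. On the other hand, Lemma 3.5 gives $\sigma^2_\pi \ge \sigma^2_{\pi_{\mu_0}} = \sigma_0^2$, whence $\sigma^2_\pi = \sigma_0^2$,
and so $(\sigma^2_\pi, \mu_\pi) \in \mathcal{E}$.²⁸
Conversely, let $\pi \in H^{d-1}$ be such that $(\sigma^2_\pi, \mu_\pi) \in \mathcal{E}$. Set $\mu_0 := \mu_\pi$ and $\sigma_0^2 := \sigma^2_\pi$. Then
$\pi = \pi_{\mu_0}$ by (a). Seeking a contradiction, suppose there is $\pi' \in H^{d-1}$ such that $\mu_{\pi'} \ge \mu_0$ and
$\sigma^2_{\pi'} \le \sigma_0^2$ with at least one of the inequalities being strict. If $\mu_{\pi'} = \mu_0$, then $\sigma^2_{\pi'} < \sigma_0^2 = \sigma^2_{\pi_{\mu_0}}$,
and by Lemma 3.5, we arrive at a contradiction. Otherwise, if $\mu_{\pi'} > \mu_0$, set $\mu_1 := \mu_{\pi'}$ and
$\sigma_1^2 := \frac{A\mu_1^2 - 2B\mu_1 + C}{AC - B^2}$. Then $\sigma_1^2 > \sigma_0^2$ because the function $x \mapsto \frac{Ax^2 - 2Bx + C}{AC - B^2}$ is strictly increasing
for $x \ge \frac{B}{A}$. This means that $\sigma^2_{\pi'} \le \sigma_0^2 < \sigma_1^2$. But $\sigma_1^2 = \sigma^2_{\pi_{\mu_1}}$ by (a). Thus, $\mu_{\pi'} = \mu_1 = \mu_{\pi_{\mu_1}}$ and
$\sigma^2_{\pi'} < \sigma_1^2 = \sigma^2_{\pi_{\mu_1}}$, and again by Lemma 3.5, we arrive at a contradiction.

²⁸ By (a), it even follows that $\pi = \pi_{\mu_0}$.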
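[Figure 2: A graphical illustration of the risk-only case. Axes: $\sigma^2_\pi$ (horizontal) and $\mu_\pi$ (vertical). The plot marks the minimum-variance portfolio at $(\sigma^2_{\pi_{\min}}, \mu_{\pi_{\min}})$, the efficient frontier $\mathcal{E}$ and the set of feasible portfolios.]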
With the help of Theorem 3.7, we can now fully solve the risk-only versions of the mean-
variance problems.
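Theorem 3.8. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on a
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$. Denote
by µ and Σ the mean vector and covariance matrix of the return vector R of the risky assets.
Assume that µ and $\mathbf{1}$ are not collinear, and set $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and $C = \mu^\top \Sigma^{-1} \mu$.
(1’) Let $\mu_{\min} \ge \frac{B}{A}$ be given.²⁹ Then the risk-only mean-variance problem
\[ \operatorname*{argmin}_{\pi \in H^{d-1}} \sigma^2_\pi \quad \text{subject to } \mu_\pi \ge \mu_{\min} \]
has a unique solution $\pi_*$ given by
\[ \pi_* = \pi_{\mu_{\min}} = \frac{C - B\mu_{\min}}{AC - B^2}\, \Sigma^{-1} \mathbf{1} + \frac{A\mu_{\min} - B}{AC - B^2}\, \Sigma^{-1} \mu. \]
It is risk-only efficient and satisfies
\[ \mu_{\pi_*} = \mu_{\min} \quad \text{and} \quad \sigma^2_{\pi_*} = \frac{A\mu_{\min}^2 - 2B\mu_{\min} + C}{AC - B^2}. \]
(2’) Let $\sigma^2_{\max} \ge \frac{1}{A}$ be given.³⁰ Then the risk-only mean-variance problem
\[ \operatorname*{argmax}_{\pi \in H^{d-1}} \mu_\pi \quad \text{subject to } \sigma^2_\pi \le \sigma^2_{\max} \tag{3.14} \]
has a unique solution $\pi_*$ given by
\[ \pi_* = \pi_{\mu_{\sigma^2_{\max}}} = \frac{C - B\mu_{\sigma^2_{\max}}}{AC - B^2}\, \Sigma^{-1} \mathbf{1} + \frac{A\mu_{\sigma^2_{\max}} - B}{AC - B^2}\, \Sigma^{-1} \mu, \]
where
\[ \mu_{\sigma^2_{\max}} = \frac{B}{A} + \frac{\sqrt{(AC - B^2)(A\sigma^2_{\max} - 1)}}{A}. \]
It is risk-only efficient and satisfies
\[ \mu_{\pi_*} = \mu_{\sigma^2_{\max}} = \frac{B}{A} + \frac{\sqrt{(AC - B^2)(A\sigma^2_{\max} - 1)}}{A} \quad \text{and} \quad \sigma^2_{\pi_*} = \sigma^2_{\max}. \]

²⁹ Note that by Lemma 3.4, $\frac{B}{A} = \mu_{\pi_{\min}}$, the mean of the minimum-variance portfolio.
³⁰ Note that by Lemma 3.4, $\frac{1}{A} = \sigma^2_{\pi_{\min}}$, the variance of the minimum-variance portfolio.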
Proof. We only establish (2’). The proof of (1’) is very similar and even easier.
First, it follows from Theorem 3.7(a) and (b) that $\pi_* = \pi_{\mu_{\sigma^2_{\max}}}$ is efficient, and a straightforward
calculation (using (3.13)) shows that indeed $\sigma^2_{\pi_*} = \sigma^2_{\max}$. So $\pi_*$ satisfies the constraint
in (3.14) with equality. Let $\pi' \in H^{d-1}$ be any portfolio satisfying the constraint in (3.14).
Then by efficiency of $\pi_*$ it follows that $\mu_{\pi'} \le \mu_{\pi_*}$, which gives existence of a solution to (3.14).
To establish uniqueness, note that if $\mu_{\pi'} = \mu_{\pi_*}$, then $\sigma^2_{\pi'} \ge \sigma^2_{\pi_*}$ by efficiency of $\pi_*$ and hence
by the constraint in (3.14), $\sigma^2_{\pi'} = \sigma^2_{\pi_*}$. But this then implies that $(\sigma^2_{\pi'}, \mu_{\pi'}) = (\sigma^2_{\pi_*}, \mu_{\pi_*}) \in \mathcal{E}$,
and by Theorem 3.7(a), it follows that $\pi' = \pi_*$.
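Problem (2’) can be solved numerically by first computing the target return $\mu_{\sigma^2_{\max}}$ and then plugging it into (3.12). The following Python sketch is only illustrative; the inputs and the function name `max_return_risk_only` are assumptions made here.

```python
import math


def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def variance(pi, sigma):
    return dot(pi, [dot(row, pi) for row in sigma])


def max_return_risk_only(mu, sigma, var_max):
    """Solve problem (2') of Theorem 3.8: return (pi_star, mu_star)."""
    ones = [1.0] * len(mu)
    s1 = solve(sigma, ones)                  # Sigma^{-1} 1
    sm = solve(sigma, mu)                    # Sigma^{-1} mu
    A, B, C = dot(ones, s1), dot(ones, sm), dot(mu, sm)
    D = A * C - B * B
    if var_max < 1.0 / A:
        raise ValueError("var_max is below the minimum variance 1/A")
    mu_star = B / A + math.sqrt(D * (A * var_max - 1.0)) / A
    pi = [((C - B * mu_star) * a + (A * mu_star - B) * b) / D
          for a, b in zip(s1, sm)]
    return pi, mu_star


# Hypothetical three-asset inputs and variance bound.
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
pi_star, mu_star = max_return_risk_only(mu, sigma, 0.05)
```

The resulting portfolio attains the variance bound with equality, as shown in the proof.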
3.6 The case with a riskless asset

We proceed to study the case with a riskless asset. The analogue of Lemma 3.4 is trivial,
because with a riskless asset, we can achieve zero risk by just investing in the riskless asset,
i.e., by choosing the portfolio $\bar\pi_{\min,r} := (1, 0, \ldots, 0) \in H^{1+d-1}$. So we directly consider the
analogue of Lemma 3.5. Since we can now also invest in the riskless asset, we do not need to
assume that µ and $\mathbf{1}_d$ are not collinear but only that $\bar\mu$ and $\mathbf{1}_{1+d}$ are not collinear. By the
fact that $\mu^0 = r$, this is equivalent to assuming that $\mu \neq r\mathbf{1}_d$.
Before stating the next lemma, we need a little result on the excess return $\mu_{\bar\pi} - r$ of a
portfolio $\bar\pi$ over the risk-free rate r.
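Proposition 3.9. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on a
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$.
Denote by µ and Σ the mean vector and covariance matrix of the return vector R of the risky
assets, and let r be the interest rate. Let $\bar\pi = (\pi^0, \pi) \in H^{1+d-1}$. Then
\[ \mu_{\bar\pi} - r = (\mu - r\mathbf{1}_d) \cdot \pi. \]
Proof. By the fact that $\pi^0 = 1 - \pi \cdot \mathbf{1}_d$ and $\mu^0 = r$,
\[
\begin{aligned}
(\mu - r\mathbf{1}_d) \cdot \pi &= \mu \cdot \pi + r(-\pi \cdot \mathbf{1}_d) = \mu \cdot \pi + r(1 - \pi \cdot \mathbf{1}_d) - r = \mu \cdot \pi + \mu^0 \pi^0 - r \\
&= \bar\mu \cdot \bar\pi - r = \mu_{\bar\pi} - r.
\end{aligned}
\]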
We now seek to find the portfolio in $H^{1+d-1}$ which minimises the variance among all
portfolios in $H^{1+d-1}$ with a given expected return $\mu_0$. Note that below $\mathbf{1}$ will always be $\mathbf{1}_d$.
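Lemma 3.10. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on a
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$. Denote
by µ and Σ the mean vector and covariance matrix of the return vector R of the risky assets,
and let r be the interest rate. Assume that $\mu \neq r\mathbf{1}$. Let $\mu_0 \in \mathbb{R}$ be given. Then there exists a
unique portfolio $\bar\pi_{\mu_0,r} \in H^{1+d-1}$ such that $\mu_{\bar\pi_{\mu_0,r}} = \mu_0$ and
\[ \sigma^2_{\bar\pi_{\mu_0,r}} \le \sigma^2_{\bar\pi} \quad \text{for all } \bar\pi \in H^{1+d-1} \text{ with } \mu_{\bar\pi} = \mu_0. \]
It is given by $\bar\pi_{\mu_0,r} = (1 - \pi_{\mu_0,r} \cdot \mathbf{1}, \pi_{\mu_0,r})$, where
\[ \pi_{\mu_0,r} = \frac{\mu_0 - r}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})}\, \Sigma^{-1} (\mu - r\mathbf{1}). \tag{3.15} \]
It satisfies
\[ \sigma^2_{\bar\pi_{\mu_0,r}} = \frac{(\mu_0 - r)^2}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} = \frac{(\mu_0 - r)^2}{Ar^2 - 2Br + C}, \tag{3.16} \]
where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and $C = \mu^\top \Sigma^{-1} \mu$.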
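Proof. First, it follows from the condition $\mu \neq r\mathbf{1}$ and the fact that $\Sigma^{-1}$ is symmetric and
positive definite that $(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1}) > 0$. We proceed to check that $\bar\pi_{\mu_0,r} \in H^{1+d-1}$
and $\mu_{\bar\pi_{\mu_0,r}} = \mu_0$. By the definition of $\bar\pi_{\mu_0,r}$ and Proposition 3.9,
\[
\begin{aligned}
\bar\pi_{\mu_0,r} \cdot \mathbf{1}_{1+d} &= \pi^0_{\mu_0,r} + \pi_{\mu_0,r} \cdot \mathbf{1}_d = (1 - \pi_{\mu_0,r} \cdot \mathbf{1}_d) + (\pi_{\mu_0,r} \cdot \mathbf{1}_d) = 1, \\
\mu_{\bar\pi_{\mu_0,r}} &= (\mu_{\bar\pi_{\mu_0,r}} - r) + r = (\mu - r\mathbf{1}) \cdot \pi_{\mu_0,r} + r \\
&= (\mu_0 - r)\, \frac{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + r = (\mu_0 - r) + r = \mu_0.
\end{aligned}
\]
Next, let $\bar\pi \in H^{1+d-1}$ with $\mu_{\bar\pi} = \mu_0$. Set
\[ \bar y := \bar\pi - \bar\pi_{\mu_0,r}. \]
Proposition 3.9 gives
\[
\begin{aligned}
(\mu - r\mathbf{1}) \cdot y &= (\mu - r\mathbf{1}) \cdot \pi - (\mu - r\mathbf{1}) \cdot \pi_{\mu_0,r} = (\mu_{\bar\pi} - r) - (\mu_{\bar\pi_{\mu_0,r}} - r) \\
&= (\mu_0 - r) - (\mu_0 - r) = 0.
\end{aligned}
\]
Then by (3.7), the fact that Σ is symmetric, the definition of $\pi_{\mu_0,r}$, Proposition 3.9 and
the fact that y is orthogonal to $\mu - r\mathbf{1}$,
\[
\begin{aligned}
\sigma^2_{\bar\pi} = \pi^\top \Sigma \pi &= (\pi_{\mu_0,r} + y)^\top \Sigma (\pi_{\mu_0,r} + y) = \pi_{\mu_0,r}^\top \Sigma \pi_{\mu_0,r} + 2 y^\top \Sigma \pi_{\mu_0,r} + y^\top \Sigma y \\
&= (\mu_0 - r)\, \frac{\pi_{\mu_0,r}^\top (\mu - r\mathbf{1})}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + 2 (\mu_0 - r)\, \frac{y^\top (\mu - r\mathbf{1})}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + y^\top \Sigma y \\
&= (\mu_0 - r)\, \frac{\mu_0 - r}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + 2 (\mu_0 - r) \times 0 + y^\top \Sigma y \\
&= \frac{(\mu_0 - r)^2}{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})} + y^\top \Sigma y.
\end{aligned}
\tag{3.17}
\]
As Σ is symmetric and positive definite, $y^\top \Sigma y \ge 0$ with equality if and only if $y = 0$. Moreover,
since $\bar\pi, \bar\pi_{\mu_0,r} \in H^{1+d-1}$, it follows that
\[ \sum_{k=0}^d y^k = \bar y \cdot \mathbf{1}_{1+d} = \bar\pi \cdot \mathbf{1}_{1+d} - \bar\pi_{\mu_0,r} \cdot \mathbf{1}_{1+d} = 1 - 1 = 0, \]
which implies in particular that $\bar y \neq 0$ if and only if $y \neq 0$. Hence, (3.17) shows that
$\bar\pi_{\mu_0,r}$ is the unique optimiser and yields the first equality in (3.16). The second equality in
(3.16) follows by expanding the denominator and using the definitions of A, B and C.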
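Formulas (3.15) and (3.16) only require solving one linear system with right-hand side $\mu - r\mathbf{1}$. The Python sketch below checks them on hypothetical data; the inputs and the function name `min_variance_with_riskless` are assumptions introduced here for illustration.

```python
def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def variance(pi, sigma):
    return dot(pi, [dot(row, pi) for row in sigma])


def min_variance_with_riskless(mu, sigma, r, mu0):
    """Return (pi0, pi, var): riskless weight, risky weights (3.15), variance (3.16)."""
    excess = [m - r for m in mu]             # mu - r 1
    z = solve(sigma, excess)                 # Sigma^{-1} (mu - r 1)
    h = dot(excess, z)                       # equals A r^2 - 2 B r + C
    if h <= 0:
        raise ValueError("requires mu != r 1")
    pi = [(mu0 - r) / h * zi for zi in z]
    pi0 = 1.0 - sum(pi)                      # remainder goes to the bank account
    return pi0, pi, (mu0 - r) ** 2 / h


# Hypothetical three-asset inputs, interest rate and target expected return.
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
r, mu0 = 0.02, 0.10
pi0, pi, var = min_variance_with_riskless(mu, sigma, r, mu0)
```

Choosing `mu0 = r` returns the riskless portfolio $(1, 0, \ldots, 0)$ with zero variance, which matches the remark preceding Proposition 3.9.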
We proceed to formulate the analogue of Theorem 3.7. To this end, we also need to
consider the notion of efficiency for general portfolios.
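Definition 3.11. A portfolio $\bar\pi \in H^{1+d-1}$ is called efficient (in the mean-variance sense) if
there does not exist another portfolio $\bar\pi' \in H^{1+d-1}$ such that $\mu_{\bar\pi'} \ge \mu_{\bar\pi}$ and $\sigma^2_{\bar\pi'} \le \sigma^2_{\bar\pi}$ with at least one
inequality being strict.
Theorem 3.12. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on some
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$.
Denote by µ and Σ the mean vector and covariance matrix of the return vector R of the
risky assets, and let r be the interest rate. Assume that $\mu - r\mathbf{1} \neq 0$, and set $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$,
$B = \mathbf{1}^\top \Sigma^{-1} \mu$, and $C = \mu^\top \Sigma^{-1} \mu$. Define the efficient frontier by
\[ \mathcal{E} := \Big\{ (\sigma_0^2, \mu_0) \in \mathbb{R}^2 : \mu_0 \ge r,\; \sigma_0^2 = \frac{(\mu_0 - r)^2}{Ar^2 - 2Br + C} \Big\}. \]
(a) For each point $(\sigma_0^2, \mu_0) \in \mathcal{E}$, there exists exactly one portfolio $\bar\pi \in H^{1+d-1}$ such that
$(\sigma^2_{\bar\pi}, \mu_{\bar\pi}) = (\sigma_0^2, \mu_0)$. It is given by $\bar\pi = (1 - \pi_{\mu_0,r} \cdot \mathbf{1}, \pi_{\mu_0,r})$, where
\[ \pi_{\mu_0,r} = \frac{\mu_0 - r}{Ar^2 - 2Br + C}\, \Sigma^{-1} (\mu - r\mathbf{1}). \]
(b) A portfolio $\bar\pi \in H^{1+d-1}$ is efficient if and only if $(\sigma^2_{\bar\pi}, \mu_{\bar\pi}) \in \mathcal{E}$.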
Proof. The argument for (a) and (b) is almost verbatim the same as in the proof of Theorem
3.7. The only difference is that we replace the minimum variance portfolio by the riskless
portfolio and use Lemma 3.10 instead of Lemma 3.5.
We proceed to formulate the analogue of Theorem 3.8, the solution to the mean-variance
problems with a riskless asset. The proof is very similar to the proof of Theorem 3.8 and
hence omitted.
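Theorem 3.13. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on a
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$. Denote
by µ and Σ the mean vector and covariance matrix of the return vector R of the risky assets,
and let r be the interest rate. Assume that $\mu \neq r\mathbf{1}$. Set $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$, $B = \mathbf{1}^\top \Sigma^{-1} \mu$, and
$C = \mu^\top \Sigma^{-1} \mu$.
(1) Let $\mu_{\min} \ge r$ be given. Then the mean-variance problem with a riskless asset
\[ \operatorname*{argmin}_{\bar\pi \in H^{1+d-1}} \sigma^2_{\bar\pi} \quad \text{subject to } \mu_{\bar\pi} \ge \mu_{\min} \]
has a unique solution $\bar\pi_*$ given by $\bar\pi_* = (1 - \pi_{\mu_{\min},r} \cdot \mathbf{1}, \pi_{\mu_{\min},r})$, where
\[ \pi_{\mu_{\min},r} = \frac{\mu_{\min} - r}{Ar^2 - 2Br + C}\, \Sigma^{-1} (\mu - r\mathbf{1}). \]
It is efficient and satisfies
\[ \mu_{\bar\pi_*} = \mu_{\min} \quad \text{and} \quad \sigma^2_{\bar\pi_*} = \frac{(\mu_{\min} - r)^2}{Ar^2 - 2Br + C}. \]
(2) Let $\sigma^2_{\max} \ge 0$ be given. Then the mean-variance problem with a riskless asset
\[ \operatorname*{argmax}_{\bar\pi \in H^{1+d-1}} \mu_{\bar\pi} \quad \text{subject to } \sigma^2_{\bar\pi} \le \sigma^2_{\max} \]
has a unique solution $\bar\pi_*$ given by $\bar\pi_* = (1 - \pi_{\mu_{\sigma^2_{\max}},r} \cdot \mathbf{1}, \pi_{\mu_{\sigma^2_{\max}},r})$, where
\[ \pi_{\mu_{\sigma^2_{\max}},r} = \frac{\mu_{\sigma^2_{\max}} - r}{Ar^2 - 2Br + C}\, \Sigma^{-1} (\mu - r\mathbf{1}) \]
and
\[ \mu_{\sigma^2_{\max}} = r + \sigma_{\max} \sqrt{Ar^2 - 2Br + C}. \]
It is efficient and satisfies
\[ \mu_{\bar\pi_*} = \mu_{\sigma^2_{\max}} = r + \sigma_{\max} \sqrt{Ar^2 - 2Br + C} \quad \text{and} \quad \sigma^2_{\bar\pi_*} = \sigma^2_{\max}. \]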
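Part (2) of Theorem 3.13 can be checked numerically in the same way. In the Python sketch below, the data and the function name `max_return_with_riskless` are hypothetical; the quantity `h` stands for $Ar^2 - 2Br + C = (\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})$.

```python
import math


def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def variance(pi, sigma):
    return dot(pi, [dot(row, pi) for row in sigma])


def max_return_with_riskless(mu, sigma, r, var_max):
    """Solve problem (2) of Theorem 3.13: return (pi0, pi, mu_star)."""
    excess = [m - r for m in mu]
    z = solve(sigma, excess)                  # Sigma^{-1} (mu - r 1)
    h = dot(excess, z)                        # A r^2 - 2 B r + C
    mu_star = r + math.sqrt(var_max * h)      # r + sigma_max * sqrt(h)
    pi = [(mu_star - r) / h * zi for zi in z]
    return 1.0 - sum(pi), pi, mu_star


# Hypothetical inputs; var_max = 0.04 corresponds to sigma_max = 0.2.
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
r = 0.02
pi0_star, pi_star, mu_star = max_return_with_riskless(mu, sigma, r, 0.04)
```

The optimal expected return grows linearly in $\sigma_{\max}$ with slope $\sqrt{Ar^2 - 2Br + C}$, in contrast to the square-root behaviour in the risk-only case of Theorem 3.8.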
3.7 The Markowitz tangency portfolio and the capital market line
Next, we study the relationship between (general) efficient and risk-only efficient portfolios.

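Theorem 3.14. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on some
probability space (Ω, F, P). Assume that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$.
Denote by µ and Σ the mean vector and covariance matrix of the return vector R of the
risky assets. Let r be the interest rate, and assume that $r < \frac{B}{A}$,³¹ where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$ and
$B = \mathbf{1}^\top \Sigma^{-1} \mu$.
(a) There exists a unique efficient portfolio $\bar\pi_{\tan}$, called the Markowitz tangency portfolio,
that is risk-only, i.e., $\bar\pi_{\tan} = (0, \pi_{\tan})$. It satisfies³²
\[ \pi_{\tan} = \frac{1}{B - rA}\, \Sigma^{-1} (\mu - r\mathbf{1}). \]
(b) A portfolio $\bar\pi \in H^{1+d-1}$ is efficient if and only if it can be written as
\[ \bar\pi = \lambda \bar\pi_{\tan} + (1 - \lambda) \bar\pi_{\min,r} = (1 - \lambda, \lambda \pi_{\tan}), \tag{3.18} \]
where $\bar\pi_{\min,r} = (1, 0)$ denotes the riskless portfolio and $\lambda \ge 0$.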
Proof. (a). Let $\bar\pi \in H^{1+d-1}$ be an efficient portfolio. It follows from Theorem 3.12 that there
is $\mu_0 \ge r$ such that $\bar\pi = \bar\pi_{\mu_0,r}$. Moreover, $\bar\pi_{\mu_0,r}$ is risk-only if and only if $\pi_{\mu_0,r} \cdot \mathbf{1} = 1$. Using
the definitions of $\pi_{\mu_0,r}$, A and B, we get
\[ \pi_{\mu_0,r} \cdot \mathbf{1} = \frac{\mu_0 - r}{Ar^2 - 2Br + C}\, \mathbf{1}^\top \Sigma^{-1} (\mu - r\mathbf{1}) = \frac{\mu_0 - r}{Ar^2 - 2Br + C}\, (B - rA). \]
Solving $\pi_{\mu_0,r} \cdot \mathbf{1} = 1$ for $\mu_0$, we obtain that $\pi_{\mu_0,r} \in H^{d-1}$ if and only if
\[ \mu_0 = r + \frac{Ar^2 - 2Br + C}{B - rA} =: \mu_{\tan}. \]
Note that $\mu_{\tan} > r$³³ since $Ar^2 - 2Br + C = (\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1}) > 0$ and $B - rA > 0$.
This implies by Theorem 3.12 that $\bar\pi_{\mu_{\tan},r}$ is indeed efficient. We conclude that an efficient
portfolio $\bar\pi$ is risk-only if and only if $\bar\pi = \bar\pi_{\mu_{\tan},r}$. Moreover,
\[ \pi_{\mu_{\tan},r} = \frac{1}{B - rA}\, \Sigma^{-1} (\mu - r\mathbf{1}) = \pi_{\tan}. \]

³¹ Note that the condition $r < \frac{B}{A}$ implies that $\mu \neq r\mathbf{1}$. Indeed, if $\mu = r\mathbf{1}$, then $B = rA$ and so $r = \frac{B}{A}$.
³² Note that $B - rA > 0$ because $r < \frac{B}{A}$ so that $\pi_{\tan}$ is well defined. Moreover, note that if µ and $\mathbf{1}$ are
collinear, then $\pi_{\tan} = \pi_{\min}$.
³³ We even have $\mu_{\tan} \ge \frac{B}{A}$. This can be seen as follows: By Cauchy-Schwarz, it follows that $AC - B^2 \ge 0$
(note that we do not assume here that µ and $\mathbf{1}$ are not collinear). This together with $B - rA > 0$ gives:
\[
\begin{aligned}
\mu_{\tan} &= \frac{1}{A} \Big( Ar + \frac{A^2 r^2 - 2ABr + AC}{B - rA} \Big) \ge \frac{1}{A} \Big( Ar + \frac{A^2 r^2 - 2ABr + B^2}{B - rA} \Big) = \frac{1}{A} \Big( Ar + \frac{(B - rA)^2}{B - rA} \Big) \\
&= \frac{1}{A} \big( Ar + (B - rA) \big) = \frac{B}{A}.
\end{aligned}
\]
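(b). First suppose that $\bar\pi \in H^{1+d-1}$ is of the form
\[ \bar\pi = \lambda \bar\pi_{\tan} + (1 - \lambda) \bar\pi_{\min,r} \]
for some $\lambda \ge 0$. Using that $\bar\pi_{\min,r} = (1, 0)$ and $\bar\pi_{\tan} = \bar\pi_{\mu_{\tan},r}$, where $\mu_{\tan} = r + \frac{Ar^2 - 2Br + C}{B - rA}$,
we obtain for the risky parts
\[
\begin{aligned}
\pi &= \lambda \pi_{\tan} + (1 - \lambda) \pi_{\min,r} \\
&= \lambda\, \frac{\mu_{\tan} - r}{Ar^2 - 2Br + C}\, \Sigma^{-1} (\mu - r\mathbf{1}) + (1 - \lambda) \times 0 \\
&= \frac{\mu_\lambda - r}{Ar^2 - 2Br + C}\, \Sigma^{-1} (\mu - r\mathbf{1}),
\end{aligned}
\tag{3.19}
\]
where $\mu_\lambda := \lambda \mu_{\tan} + (1 - \lambda) r$. Thus, $\pi = \pi_{\mu_\lambda,r}$, and then also $\bar\pi = \bar\pi_{\mu_\lambda,r}$. Since $\mu_\lambda \ge r$ (this
uses that $\mu_{\tan} > r$ and $\lambda \ge 0$), it follows from Theorem 3.12 that $\bar\pi$ is efficient.
Conversely, suppose that $\bar\pi \in H^{1+d-1}$ is efficient. By Theorem 3.12, there is $\mu_0 \ge r$ such
that $\bar\pi = \bar\pi_{\mu_0,r}$. Set
\[ \lambda := \frac{\mu_0 - r}{\mu_{\tan} - r}, \]
where $\mu_{\tan}$ is defined as above. Then $\lambda \ge 0$ (because $\mu_0 \ge r$ and $\mu_{\tan} > r$) and $\mu_0 =
\lambda \mu_{\tan} + (1 - \lambda) r$. The same calculation as in (3.19) shows that $\pi = \lambda \pi_{\tan} + (1 - \lambda) \pi_{\min,r}$ and
then also $\bar\pi = \lambda \bar\pi_{\tan} + (1 - \lambda) \bar\pi_{\min,r}$.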
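The tangency portfolio and the decomposition (3.18) can be illustrated numerically. The following Python sketch rests on hypothetical inputs that satisfy $r < \frac{B}{A}$; all variable names are chosen here for illustration. It uses that $B - rA = \mathbf{1}^\top \Sigma^{-1} (\mu - r\mathbf{1})$, so $\pi_{\tan}$ is simply $\Sigma^{-1}(\mu - r\mathbf{1})$ normalised to sum to one.

```python
def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Hypothetical three-asset inputs and interest rate.
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
r = 0.02

excess = [m - r for m in mu]
z = solve(sigma, excess)                     # Sigma^{-1} (mu - r 1)
h = dot(excess, z)                           # A r^2 - 2 B r + C
b_minus_ra = sum(z)                          # B - r A, positive iff r < B / A
if b_minus_ra <= 0:
    raise ValueError("the tangency portfolio requires r < B/A")

pi_tan = [zi / b_minus_ra for zi in z]       # Theorem 3.14(a)
mu_tan = r + h / b_minus_ra

# Efficient portfolio with target return mu0, once via (3.15) and once via (3.18).
mu0 = 0.10
risky_direct = [(mu0 - r) / h * zi for zi in z]
lam = (mu0 - r) / (mu_tan - r)
risky_two_fund = [lam * p for p in pi_tan]
riskless_weight = 1.0 - lam
```

Both constructions give the same risky weights, so every efficient portfolio differs from the tangency portfolio only by the scaling factor λ and the offsetting position in the bank account.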
Remark 3.15. (a) Theorem 3.14(b) is usually referred to as a mutual fund theorem because
it states that every efficient portfolio is a combination of the (efficient) mutual funds $\bar\pi_{\tan}$ and
$\bar\pi_{\min,r}$, the first one containing only risky assets, the second one containing only the riskless
asset.
(b) In the setting of Theorem 3.14, define the capital market line (CML) by
\[ \mathrm{CML} = \{ (\lambda \mu_{\pi_{\tan}} + (1 - \lambda) r,\; \lambda \sigma_{\pi_{\tan}}) : \lambda \ge 0 \}. \]
Then it follows from Theorem 3.14 that a portfolio $\bar\pi$ is efficient if and only if it lies on the
capital market line in the sense that $(\mu_{\bar\pi}, \sigma_{\bar\pi}) \in \mathrm{CML}$.
(c) Note that the capital market line CML is just a reparametrisation of the efficient
frontier $\mathcal{E}$.
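[Figure 3: A graphical illustration of the relationship between the risk-only and the general case. Axes: $\sigma_\pi$ (horizontal) and $\mu_\pi$ (vertical). The plot marks the interest rate r on the vertical axis, the minimum-variance portfolio at $(\sigma_{\pi_{\min}}, \mu_{\pi_{\min}})$, the risk-only efficient frontier $\mathcal{E}$ (reparametrised), the Markowitz tangency portfolio and the capital market line.]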
3.8 On mean-variance equilibria
We proceed to study what happens if all agents investing in the financial market $\bar S$ are
mean-variance optimisers in the sense that they solve the mean-variance problem (1) or (2) (for
some choice of $\mu_{\min}$ or $\sigma^2_{\max}$) and hold the corresponding optimal portfolio.
For the arguments that follow, one must be very careful not to run into circular reasoning.
So far, we have always assumed that a financial market $\bar S$ is given exogenously, i.e., prices are
not influenced by the investment decisions of the market participants. We have then derived
optimal trading strategies for agents that are mean-variance optimisers. Now, we want to use the
form of these optimal trading strategies to draw conclusions on the structure of the financial
market $\bar S$. To avoid circular reasoning, we therefore have to assume a priori that the structure
of $\bar S$ is consistent with the derived mean-variance optimal strategies. This is a big assumption.
In economic terms, we have to assume a priori that there exists a mean-variance equilibrium.
For the following definition, we need the notion of shares outstanding: For a stock $S^i$,
the shares outstanding denotes the total number $\eta^i$ of shares of that stock held by all market
participants together.³⁴ The shares outstanding times the market value of the stock, $\eta^i S_0^i$, gives
the market capitalisation of the stock.
Definition 3.16. Let $\bar S = (S_t^0, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on some
probability space (Ω, F, P) and $\eta \in \mathbb{R}^d_{++}$ the number of risky shares outstanding. Assume
that $\bar S$ has finite second moments and $S_0^0, \ldots, S_0^d > 0$. Denote by µ and Σ the mean vector
and covariance matrix of the return vector R of the risky assets. Let r be the interest rate,
and assume that $r < \frac{B}{A}$, where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$ and $B = \mathbf{1}^\top \Sigma^{-1} \mu$. Then the pair $(\bar S, \eta)$ is
called a mean-variance equilibrium if there exist market participants $1, \ldots, K$ with portfolios
$\bar\vartheta_1, \ldots, \bar\vartheta_K$ (parametrised in numbers of shares) such that
(1) Each individual portfolio $\bar\vartheta_k$ is mean-variance optimal.³⁵
(2) The stock markets clear:
\[ \sum_{k=1}^K \vartheta_k = \eta. \]
Remark 3.17. (a) Property (1) in Definition 3.16 is usually referred to as individual opti-
mality, and property (2) as market clearing. Both requirements together are at the core of
the concept of equilibrium, which extends beyond mean-variance preferences. Note that the
market clearing condition (2) in fact consists of d conditions, one for each stock $S^1, \ldots, S^d$.
(b) We have not specified how many “shares” of the bank account $S^0$ are outstanding, nor
have we required market clearing for the bank account. It is usually assumed that there are
0 “shares outstanding” of the bank account, and in this case one says that the bank account
is in zero net supply. If we made this assumption, we would also have to require that
$\sum_{k=1}^{K} \vartheta^0_k = 0$. However, in the context of the CAPM, this does not really matter.
(c) In the context of equilibrium, it is often assumed (w.l.o.g.) that η = 1, in which case
one says that the stocks are in unit net supply.
34 In addition to the shares outstanding (which have legal ownership rights), there are also treasury shares,
which are shares held by the corporation issuing the shares itself and have no exercisable rights. The number of
issued shares is the sum of the shares outstanding and the treasury shares.
35 This means that the corresponding portfolio $\pi_k$ in fractions of wealth is mean-variance optimal, i.e., a
solution to the mean-variance problem (1) or (2) (for some choice of $\mu_{\min}$ or $\sigma^2_{\max}$).
3.9 The Capital Asset Pricing Model (CAPM)

Assuming that a mean-variance equilibrium exists, we can now deduce the Capital Asset
Pricing Model (CAPM). This model was developed in the 1960s by Treynor, Sharpe (who was
awarded the Nobel prize in Economics in 1990), Lintner and Mossin.
If $S = (S^0_t, S_t)_{t \in \{0,1\}}$ is a market and $\eta \in \mathbb{R}^d_{++}$ the number of risky shares outstanding,
then the (risk-only) portfolio in fractions of wealth corresponding to η is called the market
portfolio and denoted by $\pi_m$. To wit,
\[ \pi^i_m = \frac{\eta^i S^i_0}{\eta \cdot S_0}, \quad i \in \{1, \ldots, d\}. \]
We denote the return of the market portfolio by $R_m$, i.e., $R_m := R_{\pi_m}$.
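The definition of the market portfolio can be illustrated with a minimal Python sketch. The function name `market_portfolio` and the three-stock data are hypothetical and chosen only for illustration; the computation itself is just the formula above, i.e., each stock's market capitalisation divided by the total capitalisation of all risky assets.

```python
def market_portfolio(eta, s0):
    """Risk-only market portfolio in fractions of wealth.

    eta: shares outstanding of each risky asset (all > 0)
    s0:  time-0 prices of the risky assets (all > 0)
    """
    caps = [n * p for n, p in zip(eta, s0)]   # market capitalisations eta^i * S_0^i
    total = sum(caps)                         # eta . S_0
    return [c / total for c in caps]


# Hypothetical data for three stocks (not taken from the notes).
eta = [1000.0, 500.0, 2000.0]
s0 = [10.0, 40.0, 5.0]
pi_m = market_portfolio(eta, s0)
print(pi_m)
```

By construction the weights are positive and sum to one, so the market portfolio is indeed a risk-only portfolio in fractions of wealth.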

Theorem 3.18. Let $S = (S^0_t, S_t)_{t \in \{0,1\}}$ be an arbitrage-free non-redundant market on some
probability space $(\Omega, \mathcal{F}, P)$ and $\eta \in \mathbb{R}^d_{++}$ the number of risky shares outstanding. Assume
that S has finite second moments and $S^0_0, \ldots, S^d_0 > 0$. Denote by µ and Σ the mean vector
and covariance matrix of the return vector R of the risky assets. Let r be the interest rate,
and assume that $r < B/A$, where $A = \mathbf{1}^\top \Sigma^{-1} \mathbf{1}$ and $B = \mathbf{1}^\top \Sigma^{-1} \mu$. Suppose that (S, η) is a
mean-variance equilibrium.
(a) The market portfolio $\pi_m$ equals the Markowitz tangency portfolio $\pi_{\tan}$.
(b) For all $\pi \in H^{1+d-1}$, we have the CAPM formula
\[ E[R_\pi] = r + \beta_\pi (E[R_m] - r), \tag{3.20} \]
where the portfolio beta is given by
\[ \beta_\pi = \frac{\mathrm{Cov}[R_\pi, R_m]}{\mathrm{Var}[R_m]}. \]
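Proof. (a). Since (S, η) is a mean-variance equilibrium, there are market participants 1, . . . , K
with portfolios $\vartheta_1, \ldots, \vartheta_K \in \mathbb{R}^{1+d}$ (parametrised in numbers of shares) such that each
individual portfolio $\vartheta_k$ is mean-variance optimal and the stock markets clear:
\[ \sum_{k=1}^{K} \vartheta_k = \eta. \]
Denote by $\pi_k \in H^{1+d-1}$ the portfolio parametrised in fractions of wealth corresponding to $\vartheta_k$.
It follows from Theorem 3.13 and the fact that $\vartheta_k$ is mean-variance optimal that each $\pi_k$ is
efficient. Hence, for each $k \in \{1, \ldots, K\}$, Theorem 3.14(b) gives $\lambda_k \geq 0$ such that
\[ \pi_k = \lambda_k \pi_{\tan}, \]
where $\pi_{\tan}$ is the (risk-only) Markowitz tangency portfolio. With $x_k := \vartheta_k \cdot S_0$ for
$k \in \{1, \ldots, K\}$, it follows that
\[ \vartheta^i_k = x_k \frac{\pi^i_k}{S^i_0} = x_k \lambda_k \frac{\pi^i_{\tan}}{S^i_0}, \quad i \in \{1, \ldots, d\}. \]
Summing over k and using market clearing, we obtain
\[ \eta^i = \sum_{k=1}^{K} \vartheta^i_k = \sum_{k=1}^{K} x_k \lambda_k \frac{\pi^i_{\tan}}{S^i_0} = \left( \sum_{k=1}^{K} x_k \lambda_k \right) \frac{\pi^i_{\tan}}{S^i_0}, \quad i \in \{1, \ldots, d\}. \]
Thus, we may deduce that
\[ \pi^i_m = \frac{\eta^i S^i_0}{\eta \cdot S_0} = \frac{\left( \sum_{k=1}^{K} x_k \lambda_k \right) \pi^i_{\tan}}{\left( \sum_{k=1}^{K} x_k \lambda_k \right) \pi_{\tan} \cdot \mathbf{1}} = \pi^i_{\tan}, \quad i \in \{1, \ldots, d\}. \]
(b). Fix $\pi \in H^{1+d-1}$. Then by (a), a similar calculation as in Lemma 3.3, Theorem 3.14(a)
and Proposition 3.9, we obtain
\[ \mathrm{Cov}[R_\pi, R_m] = \mathrm{Cov}[R_\pi, R_{\pi_{\tan}}] = \pi^\top \Sigma \pi_{\tan} = \frac{\pi^\top (\mu - r\mathbf{1})}{B - rA} = \frac{\mu_\pi - r}{B - rA}, \]
\[ \mathrm{Var}[R_m] = \mathrm{Var}[R_{\pi_{\tan}}] = \pi_{\tan}^\top \Sigma \pi_{\tan} = \frac{\pi_{\tan}^\top (\mu - r\mathbf{1})}{B - rA} = \frac{\mu_{\pi_{\tan}} - r}{B - rA}. \]
Thus, using again (a), (3.20) follows via
\[ \beta_\pi (E[R_m] - r) = \beta_\pi (\mu_{\pi_{\tan}} - r) = \frac{\mu_\pi - r}{\mu_{\pi_{\tan}} - r} (\mu_{\pi_{\tan}} - r) = \mu_\pi - r = E[R_\pi] - r. \]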
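The CAPM formula (3.20) can be checked numerically. The following Python sketch assumes d = 2 risky assets with hypothetical values for r, µ and Σ (none of them from the notes), and it takes the tangency portfolio in the form $\pi_{\tan} = \Sigma^{-1}(\mu - r\mathbf{1})/(B - rA)$, which is the form used in the covariance computation of the proof. The helper names `inv2`, `matvec` and `dot` are ad hoc.

```python
# Hypothetical market data with d = 2 risky assets (illustration only).
r = 0.02
mu = [0.08, 0.12]
Sigma = [[0.04, 0.01],
         [0.01, 0.09]]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


ones = [1.0, 1.0]
Sinv = inv2(Sigma)
A = dot(ones, matvec(Sinv, ones))
B = dot(ones, matvec(Sinv, mu))
assert r < B / A  # standing assumption of the theorem

excess = [m - r for m in mu]                                # mu - r 1
pi_tan = [w / (B - r * A) for w in matvec(Sinv, excess)]    # tangency = market portfolio


def mean_return(pi):
    """E[R_pi] for risky weights pi, with the remainder 1 - sum(pi) in the bank account."""
    return r + dot(pi, excess)


def cov_with_market(pi):
    """Cov[R_pi, R_m] = pi^T Sigma pi_tan."""
    return dot(pi, matvec(Sigma, pi_tan))


pi = [0.7, -0.2]  # arbitrary risky weights; the remaining 0.5 sits in the bank account
beta = cov_with_market(pi) / cov_with_market(pi_tan)
lhs = mean_return(pi)
rhs = r + beta * (mean_return(pi_tan) - r)
print(lhs, rhs)
```

Both sides agree up to floating-point error for any choice of `pi`, which is exactly the statement of part (b) once the market portfolio has been identified with the tangency portfolio.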
Remark 3.19. (a) It is often said that the CAPM is a single factor model. However, this is
only partly true since one requires for a factor model residuals to be uncorrelated:36 It follows
from the CAPM formula that each return $R^i$ can be written as
\[ R^i = r + \beta^i (R_m - r) + \varepsilon^i, \tag{3.21} \]
where $\beta^i := \frac{\mathrm{Cov}[R^i, R_m]}{\mathrm{Var}[R_m]}$ is the beta of asset i and $\varepsilon^i$ is a random variable with mean zero and
$\mathrm{Cov}[\varepsilon^i, R_m] = 0$. Indeed, setting $\varepsilon^i := R^i - r - \beta^i (R_m - r)$, the CAPM formula gives
\[ E[\varepsilon^i] = E[R^i] - r - \beta^i (E[R_m] - r) = 0, \tag{3.22} \]
\[ \mathrm{Cov}[\varepsilon^i, R_m] = \mathrm{Cov}[R^i, R_m] - \beta^i \mathrm{Var}[R_m] = \mathrm{Cov}[R^i, R_m] - \mathrm{Cov}[R^i, R_m] = 0. \tag{3.23} \]
However, it does not follow that $\mathrm{Cov}[\varepsilon^i, \varepsilon^j] = 0$ for $i \neq j$ as would be required for a single
factor model. Indeed,
\[ \mathrm{Cov}[\varepsilon^i, \varepsilon^j] = \mathrm{Cov}[R^i, R^j] - \beta^j \mathrm{Cov}[R^i, R_m] - \beta^i \mathrm{Cov}[R^j, R_m] + \beta^i \beta^j \mathrm{Var}[R_m] \]
\[ = \Sigma^{ij} - \beta^i \beta^j \mathrm{Var}[R_m], \quad i \neq j. \tag{3.24} \]
In general the left-hand side of (3.24) does not vanish.
36 For a thorough discussion on factor models, we refer to Chapter 20 (and in particular Section 20.5) of [6].
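To see that the residual covariance in (3.24) is typically non-zero, here is a small Python sketch with d = 2 and hypothetical values for r, µ and Σ (assumptions for illustration, not data from the notes). It identifies the market portfolio with the tangency portfolio, as in Theorem 3.18(a), computes the betas, and evaluates $\Sigma^{12} - \beta^1 \beta^2 \mathrm{Var}[R_m]$.

```python
# Hypothetical inputs (d = 2), for illustration only.
r = 0.02
mu = [0.08, 0.12]
Sigma = [[0.04, 0.01],
         [0.01, 0.09]]

det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sinv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
        [-Sigma[1][0] / det, Sigma[0][0] / det]]
excess = [m - r for m in mu]

# Sigma^{-1}(mu - r 1); its components sum to B - rA, so normalising gives pi_tan = pi_m.
raw = [Sinv[i][0] * excess[0] + Sinv[i][1] * excess[1] for i in range(2)]
pi_m = [w / sum(raw) for w in raw]

cov_im = [Sigma[i][0] * pi_m[0] + Sigma[i][1] * pi_m[1] for i in range(2)]  # Cov[R^i, R_m]
var_m = pi_m[0] * cov_im[0] + pi_m[1] * cov_im[1]                           # Var[R_m]
beta = [c / var_m for c in cov_im]

resid_cov = Sigma[0][1] - beta[0] * beta[1] * var_m   # Cov[eps^1, eps^2] as in (3.24)
print(beta, resid_cov)
```

In this example the residual covariance is clearly negative, so the residuals are correlated even though each of them is uncorrelated with the market return.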

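(b) Writing $R^i$ as in (3.21), it follows from (3.23) that
\[ \mathrm{Var}[R^i] = (\beta^i)^2 \sigma^2_{\pi_m} + \mathrm{Var}[\varepsilon^i]. \]
One calls $(\beta^i)^2 \sigma^2_{\pi_m}$ the systematic or undiversifiable risk of asset i and $\mathrm{Var}[\varepsilon^i]$ the unsystematic
risk of asset i that can be eliminated by diversification.37 Indeed, if $\pi \in H^{1+d-1}$ is any
portfolio, note that
\[ \sum_{i=1}^{d} \pi^i \beta^i = \frac{1}{\mathrm{Var}[R_m]} \sum_{i=1}^{d} \pi^i \mathrm{Cov}[R^i, R_m] = \frac{1}{\mathrm{Var}[R_m]} \mathrm{Cov}[\pi \cdot R, R_m] \]
\[ = \frac{1}{\mathrm{Var}[R_m]} \mathrm{Cov}[\bar\pi \cdot \bar R, R_m] = \beta_\pi. \]
Hence, it follows from (3.21) that
\[ R_\pi = \bar\pi \cdot \bar R = \pi^0 r + \sum_{i=1}^{d} \pi^i R^i = \pi^0 r + \sum_{i=1}^{d} \pi^i r + \sum_{i=1}^{d} \pi^i \beta^i (R_m - r) + \sum_{i=1}^{d} \pi^i \varepsilon^i \]
\[ = r + \beta_\pi (R_m - r) + \varepsilon_\pi, \]
where $\varepsilon_\pi := \sum_{i=1}^{d} \pi^i \varepsilon^i$. Moreover, it follows from (3.22) and (3.23) and linearity of the
expectation and the covariance in the first component that
\[ E[\varepsilon_\pi] = 0 \quad \text{and} \quad \mathrm{Cov}[\varepsilon_\pi, R_m] = 0, \]
which in turn gives
\[ \mathrm{Var}[R_\pi] = \beta_\pi^2 \sigma^2_{\pi_m} + \mathrm{Var}[\varepsilon_\pi]. \tag{3.25} \]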
37 Calling $\varepsilon^i$ the idiosyncratic risk of asset i is, however, not justified because in general $\varepsilon^i$ is not
uncorrelated with $\varepsilon^j$ for $i \neq j$; cf. (a).
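Now if π is efficient, then $\pi = \lambda \pi_m$ by Theorems 3.14(b) and 3.18(a). This implies that
\[ \mathrm{Var}[R_\pi] = \pi^\top \Sigma \pi = (\lambda \pi_m)^\top \Sigma (\lambda \pi_m) = \lambda^2 \pi_m^\top \Sigma \pi_m = \lambda^2 \mathrm{Var}[R_m], \]
\[ \beta_\pi = \frac{\mathrm{Cov}[R_\pi, R_m]}{\mathrm{Var}[R_m]} = \frac{\pi^\top \Sigma \pi_m}{\mathrm{Var}[R_m]} = \frac{(\lambda \pi_m)^\top \Sigma \pi_m}{\pi_m^\top \Sigma \pi_m} = \lambda. \]
Thus, it follows from (3.25) that $\mathrm{Var}[\varepsilon_\pi] = 0$.38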

(c) Without the assumptions of the CAPM, one can always solve the (multiple) linear
regression
\[ R^i - r = \alpha^i + \beta^i (R_m - r) + \varepsilon^i, \quad i \in \{1, \ldots, d\}, \]
where the εi are mean-zero and uncorrelated (this is an assumption!) error terms. This model
is also called the single index model (SIM).39 The constant term/intercept αi in this regression
is usually referred to as Jensen’s alpha or just alpha and called the abnormal return. Note
that in the CAPM setting, the alpha is zero.
3.10 Criticism of mean-variance portfolio selection and the CAPM

Mean-variance portfolio selection and the CAPM are very appealing for their tractability,
and so they have rightly become cornerstones of modern financial theory. Nevertheless,
there are important points which one can and should criticise from a conceptual and empirical
perspective:40
• The mean-variance criterion assumes that the variance of the return of a portfolio is a
good measure of the risk related to the portfolio. However, if returns are not normally
distributed, this is arguably not the case; cf. Chapter 5.
• The CAPM is a one-period model, so that there is no opportunity to consume and
rebalance portfolios over time. For this reason, the CAPM has been extended to multiperiod
and continuous time models by Rubinstein and Merton.
• One key quantity in the CAPM is the market portfolio. In real financial markets,
however, this is quite an ambiguous object. In particular, choosing a major stock index
as market portfolio (as often done in practice) is somewhat arbitrary.
• Empirical studies indicate that the CAPM does not fully explain the variation of stock
returns. For instance, stocks with a high book to market ratio41 tend to offer higher
expected returns than the CAPM would predict. For this reason, the CAPM in its
“linear regression version” of Remark 3.19(c) has been extended to multi-factor models
like the three-factor model by Fama and French or the four-factor model by Carhart.
38 This together with $E[\varepsilon_\pi] = 0$ implies that even $\varepsilon_\pi = 0$ P-a.s.
39 In its original version, the $\varepsilon^i$ are also assumed to follow a multivariate normal distribution.
40 Here, we do not consider the criticism that mean-variance portfolio selection and the CAPM ignore
frictions, trading constraints, etc. Moreover, we do not consider the criticism that markets might be inefficient
or that market participants might act in a non-rational or only partially rational way (which is studied in the
field of Behavioural Finance).
41 This is the book value of a company (calculated via balance sheet considerations) divided by its market
capitalisation.
4 Utility Theory
In this chapter, we seek to systematically describe preferences of an investor who has to com-
pare random outcomes like the future payoff of a financial asset or the return of a portfolio. To
this end, we will follow the axiomatic approach proposed by von Neumann and Morgenstern.
4.1 Measure theoretic preliminaries

It turns out that from a mathematical perspective, it is easier to describe preferences on the
level of probability distributions than on the level of random variables (even though the latter
might be more intuitive from an economic perspective). For this reason, we will throughout
this chapter consider probability distributions on a nonempty interval D ⊂ R.42 We start by
recalling some facts from measure theory.
For a nonempty interval D ⊂ R, denote by $\mathcal{B}_D$ the Borel σ-algebra on D, i.e.,
$\mathcal{B}_D = \{A \cap D : A \in \mathcal{B}\}$. If (Ω, F, P) is a probability space and X a D-valued random variable, then
the distribution $P^X$ of X is a probability measure on $(D, \mathcal{B}_D)$, defined by
\[ P^X[B] := P[X \in B]. \]
If ν is a probability measure on $(D, \mathcal{B}_D)$ and h : D → R a measurable function that is
ν-integrable, we set
\[ \int h(x)\, \nu(dx) := E^\nu[h] \]
and call $\int h(x)\, \nu(dx)$ the integral of h with respect to ν.43
The following result is a generalisation of Lemma 1.7.
Lemma 4.1. Let (Ω, F, P) be a probability space and D ⊂ R a nonempty interval. Moreover,
let X be a D-valued random variable with distribution $\nu := P^X$ and h : D → R a measurable
function. Then h(X) is P-integrable if and only if h is ν-integrable, and in this case
\[ E[h(X)] = \int h(x)\, \nu(dx). \]
An important example of a probability measure ν on $(D, \mathcal{B}_D)$ is the Dirac measure $\delta_x$ for
a point x ∈ D, which is defined by
\[ \delta_x(B) := \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases} \]
The Dirac measure represents the distribution of a D-valued random variable X that takes
the value x with probability 1, i.e., X = x P-a.s. Note that for any measurable function
h : D → R,
\[ \int h(y)\, \delta_x(dy) = h(x). \]
42 The main examples are D = R, D = [0, ∞) and D = (0, ∞).
43 Note that usually the expectation is defined via the integral (and not vice versa).
In the sequel, we also need the notion of the mixture of two distributions. If ν1 and ν2
are probability distributions on (D, BD ) and α ∈ [0, 1], then the distribution αν1 + (1 − α)ν2
is called the mixture of ν1 and ν2 with weights α and (1 − α).44 If (Ω, F, P ) is a probability
space, then a random variable X has distribution αν1 + (1 − α)ν2 , if and only if there is a
Bernoulli random variable Y with P [Y = 1] = α and P [Y = 0] = (1 − α) such that X has
conditional distribution ν1 given that Y = 1 and conditional distribution ν2 given that Y = 0.45
Warning: If X1 is a random variable with distribution ν1 and X2 is a random variable
with distribution ν2 , then in general αX1 +(1−α)X2 does not have distribution αν1 +(1−α)ν2 .
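The warning can be made concrete with two Dirac measures. The Python sketch below represents a finite distribution as a dictionary mapping values to probabilities, which is a hypothetical encoding chosen for illustration. It compares the mixture of $\delta_0$ and $\delta_1$ with the law of the convex combination of the corresponding random variables; the latter is computed under the additional assumption that X1 and X2 are independent.

```python
from fractions import Fraction


def mixture(alpha, nu1, nu2):
    """Mixture alpha*nu1 + (1-alpha)*nu2 of two finite distributions {value: probability}."""
    out = {}
    for nu, weight in ((nu1, alpha), (nu2, 1 - alpha)):
        for x, p in nu.items():
            out[x] = out.get(x, 0) + weight * p
    return out


def law_of_combination(alpha, nu1, nu2):
    """Law of alpha*X1 + (1-alpha)*X2 for independent X1 ~ nu1 and X2 ~ nu2 (an assumption)."""
    out = {}
    for x1, p1 in nu1.items():
        for x2, p2 in nu2.items():
            z = alpha * x1 + (1 - alpha) * x2
            out[z] = out.get(z, 0) + p1 * p2
    return out


half = Fraction(1, 2)
nu1 = {0: Fraction(1)}   # Dirac measure at 0
nu2 = {1: Fraction(1)}   # Dirac measure at 1

mixed = mixture(half, nu1, nu2)                 # fair coin flip between 0 and 1
combined = law_of_combination(half, nu1, nu2)   # the constant 1/2
print(mixed, combined)
```

The mixture is a genuinely random lottery, whereas the convex combination of the random variables is deterministic, so the two distributions differ.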
The following result states that the integral $\int h(x)\, \nu(dx)$ is linear not only in the integrand
h but also in the integrator ν.
Lemma 4.2. Let D ⊂ R be a nonempty interval, ν1, ν2 probability measures on $(D, \mathcal{B}_D)$ and
α ∈ [0, 1]. Then αν1 + (1 − α)ν2 is again a probability measure on $(D, \mathcal{B}_D)$. Moreover, if
h : D → R is a measurable function that is integrable with respect to ν1 and ν2, then h is
also integrable with respect to αν1 + (1 − α)ν2, and we have
\[ \int h(x)\, \big(\alpha\nu_1 + (1-\alpha)\nu_2\big)(dx) = \alpha \int h(x)\, \nu_1(dx) + (1-\alpha) \int h(x)\, \nu_2(dx). \]
4.2 Preferences on lotteries

Let D ⊂ R be a nonempty interval. A probability measure ν on $(D, \mathcal{B}_D)$ is also called a lottery
(on D). It is called a simple lottery (on D) if it is a mixture of finitely many Dirac measures,
i.e., there exist $x_1, \ldots, x_N \in D$ and $\alpha_1, \ldots, \alpha_N \in (0, 1)$ with $\sum_{n=1}^{N} \alpha_n = 1$ such that
\[ \nu = \alpha_1 \delta_{x_1} + \cdots + \alpha_N \delta_{x_N}. \]
Note that ν is the distribution of a discrete random variable Y taking the value $x_n$ with
probability $\alpha_n$, $n \in \{1, \ldots, N\}$.
Definition 4.3. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all
probability measures on $(D, \mathcal{B}_D)$. A preference order on M is a binary relation $\succeq$ with the
following properties:
(a) Completeness: For all $\nu_1, \nu_2 \in M$, either $\nu_1 \succeq \nu_2$ or $\nu_2 \succeq \nu_1$ or both are true.
(b) Transitivity: If $\nu_1 \succeq \nu_2$ and $\nu_2 \succeq \nu_3$, then also $\nu_1 \succeq \nu_3$.
If $\nu_1 \succeq \nu_2$, we say that ν1 is weakly preferred over ν2.
If $\nu_1 \succeq \nu_2$ and $\nu_2 \succeq \nu_1$, we say that we are indifferent between ν1 and ν2, and write $\nu_1 \sim \nu_2$. By
contrast, if $\nu_1 \succeq \nu_2$ and $\nu_2 \not\succeq \nu_1$, we say that ν1 is strictly preferred over ν2 and write $\nu_1 \succ \nu_2$.
44 The cases α = 0 and α = 1 are somewhat degenerate.
45 More generally, if $\nu_1, \ldots, \nu_N$ are probability distributions on $(D, \mathcal{B}_D)$ and $\alpha_1, \ldots, \alpha_N \in [0, 1]$ with
$\sum_{n=1}^{N} \alpha_n = 1$, then $\sum_{n=1}^{N} \alpha_n \nu_n$ is called the mixture of $\nu_1, \ldots, \nu_N$ with weights $\alpha_1, \ldots, \alpha_N$. If (Ω, F, P)
is a probability space, then a random variable X has distribution $\sum_{n=1}^{N} \alpha_n \nu_n$ if and only if there is a random
variable Y with $P[Y = n] = \alpha_n$ such that X has conditional distribution $\nu_n$ given that Y = n, $n \in \{1, \ldots, N\}$.
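Example 4.4. Let M be the (convex) set of all probability measures on (R, B) with finite
second moment, i.e., ν ∈ M if and only if $\int_{\mathbb{R}} x^2\, \nu(dx) < \infty$. For ν ∈ M denote by
$\mu_\nu := \int_{\mathbb{R}} x\, \nu(dx)$ the mean of ν and by $\sigma^2_\nu := \int_{\mathbb{R}} (x - \mu_\nu)^2\, \nu(dx)$ the variance of ν.46
(a) The binary relation $\succeq$ defined by
\[ \nu_1 \succeq \nu_2 \quad \text{if and only if} \quad \mu_{\nu_1} \geq \mu_{\nu_2} \]
is a preference order.
(b) The binary relation $\succeq$ defined by
\[ \nu_1 \succeq \nu_2 \quad \text{if and only if} \quad \mu_{\nu_1} \geq \mu_{\nu_2} \text{ and } \sigma^2_{\nu_1} \leq \sigma^2_{\nu_2} \]
is not a preference order because it fails to be complete.
(c) The binary relation $\succeq$ defined by
\[ \nu_1 \succeq \nu_2 \quad \text{if and only if} \quad \mu_{\nu_1} - \frac{\gamma}{2} \sigma^2_{\nu_1} \geq \mu_{\nu_2} - \frac{\gamma}{2} \sigma^2_{\nu_2}, \]
where γ > 0 denotes a risk aversion parameter, is a preference order.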
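The three relations of this example can be tried out on simple lotteries. In the Python sketch below, a lottery is encoded as a dictionary from outcomes to probabilities; the function names and the two test lotteries are hypothetical choices for illustration. The pair `safe`, `risky` shows why relation (b) is not complete: neither lottery is weakly preferred over the other.

```python
def mean(nu):
    return sum(x * p for x, p in nu.items())


def variance(nu):
    m = mean(nu)
    return sum((x - m) ** 2 * p for x, p in nu.items())


def weakly_prefers_a(nu1, nu2):
    """Relation (a): compare means only."""
    return mean(nu1) >= mean(nu2)


def weakly_prefers_b(nu1, nu2):
    """Relation (b): higher mean AND lower variance (not complete)."""
    return mean(nu1) >= mean(nu2) and variance(nu1) <= variance(nu2)


def weakly_prefers_c(nu1, nu2, gamma=1.0):
    """Relation (c): compare mean minus gamma/2 times variance."""
    score1 = mean(nu1) - gamma / 2 * variance(nu1)
    score2 = mean(nu2) - gamma / 2 * variance(nu2)
    return score1 >= score2


safe = {1.0: 1.0}               # Dirac measure at 1: mean 1, variance 0
risky = {0.0: 0.5, 4.0: 0.5}    # mean 2, variance 4

print(weakly_prefers_a(risky, safe))                                  # higher mean wins under (a)
print(weakly_prefers_b(safe, risky), weakly_prefers_b(risky, safe))   # both fail under (b)
print(weakly_prefers_c(safe, risky))                                  # with gamma = 1: 1 >= 2 - 2
```

Relations (a) and (c) compare a single real number per lottery, which is why completeness and transitivity are inherited from the usual order on the real line.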
4.3 Von Neumann-Morgenstern representation

Mathematically, the definition of a preference order is satisfactory. From a practical
perspective, however, it is very unwieldy because we need to specify for each pair of lotteries ν1 and ν2
whether we weakly prefer ν1 over ν2, or ν2 over ν1, or both. For this reason, we seek to find
another description of preference orders that encodes preferences by a single mathematical
object.
In a seminal paper in 1944, von Neumann and Morgenstern showed that many preference
orders can be neatly described by specifying a single function.
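46 If $X_\nu$ is a random variable with distribution ν, it follows from Lemma 4.1 that $\mu_\nu = E[X_\nu]$ and
$\sigma^2_\nu = \mathrm{Var}[X_\nu]$.
Definition 4.5. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all
probability measures on $(D, \mathcal{B}_D)$. A preference order $\succeq$ on M is said to have a von
Neumann-Morgenstern representation if there exists a measurable function U : D → R that is integrable
with respect to any ν ∈ M such that
\[ \nu_1 \succeq \nu_2 \quad \Leftrightarrow \quad \int U(x)\, \nu_1(dx) \geq \int U(x)\, \nu_2(dx). \]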
Remark 4.6. Linearity of the expectation and the fact that inequalities remain unchanged by
multiplication with positive constants imply that a von Neumann-Morgenstern representation
can only be unique up to a positive affine transformation, i.e., if U describes a preference
order, then aU + b, where a > 0 and b ∈ R, describes the same preference order.
Our goal is now to find axioms for preference orders that together imply a von
Neumann-Morgenstern representation. Surprisingly, essentially only two axioms are needed to ensure a
von Neumann-Morgenstern representation.
The first axiom is quite intuitive from an economic perspective.
Definition 4.7. Let D ⊂ R be a nonempty interval and M a nonempty convex subset
of all probability measures on $(D, \mathcal{B}_D)$. A preference order $\succeq$ on M is said to satisfy the
independence axiom if for all lotteries $\nu_1, \nu_2 \in M$, the strict preference $\nu_1 \succ \nu_2$ implies
\[ \alpha\nu_1 + (1-\alpha)\nu_3 \succ \alpha\nu_2 + (1-\alpha)\nu_3 \]
for all lotteries $\nu_3 \in M$ and all α ∈ (0, 1).
The independence axiom says that if we strictly prefer lottery ν1 over lottery ν2 , then
we should also strictly prefer the mixed lottery αν1 + (1 − α)ν3 over the mixed lottery
αν2 + (1 − α)ν3 . From a normative perspective, this is quite reasonable: Comparing the
mixed lotteries, with probability α, we have to choose between ν1 and ν2 , and with probabil-
ity (1−α), we do not have to make any choice because we get the lottery ν3 . The independence
axiom says that the conditional and the unconditional choice should coincide.
Even though the independence axiom has a good theoretical foundation, it is not clear if
it reflects people’s preferences in practice.
Example 4.8 (Allais’ Paradox). First, consider the following lotteries:
\[ \nu_1 := \delta_{2400}, \]
\[ \nu_2 := 0.33\,\delta_{2500} + 0.66\,\delta_{2400} + 0.01\,\delta_0. \]
Which one would you choose? Empirical studies show that most people would prefer the sure
amount and so choose lottery ν1.
Now consider the following lotteries:
\[ \nu_1' := 0.34\,\delta_{2400} + 0.66\,\delta_0, \]
\[ \nu_2' := 0.33\,\delta_{2500} + 0.67\,\delta_0. \]
Again, which one would you choose? Here, empirical studies show that most people would
prefer the slightly riskier lottery $\nu_2'$ because it has the higher expectation ($\mu_{\nu_1'} = 816$ and
$\mu_{\nu_2'} = 825$).
However, the choice $\nu_1 \succ \nu_2$ together with the choice $\nu_2' \succ \nu_1'$ violates the independence
axiom.
Indeed, if the independence axiom were satisfied, then
\[ \tfrac{1}{2}\nu_1 + \tfrac{1}{2}\nu_2' \succ \tfrac{1}{2}\nu_2 + \tfrac{1}{2}\nu_2' \quad \text{and} \quad \tfrac{1}{2}\nu_2' + \tfrac{1}{2}\nu_2 \succ \tfrac{1}{2}\nu_1' + \tfrac{1}{2}\nu_2. \]
Transitivity of $\succ$ yields47
\[ \tfrac{1}{2}\nu_1 + \tfrac{1}{2}\nu_2' \succ \tfrac{1}{2}\nu_1' + \tfrac{1}{2}\nu_2. \]
But this is a contradiction because
\[ \tfrac{1}{2}\nu_1 + \tfrac{1}{2}\nu_2' = \tfrac{0.33}{2}\,\delta_{2500} + \tfrac{1}{2}\,\delta_{2400} + \tfrac{0.67}{2}\,\delta_0 = \tfrac{1}{2}\nu_1' + \tfrac{1}{2}\nu_2. \]
This ends the example.
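The arithmetic behind Allais' paradox is easy to verify with exact fractions. The following Python sketch encodes the four lotteries as dictionaries from payoffs to probabilities (a hypothetical encoding, as are the names `mix` and `mean`) and confirms that the two half-half mixtures coincide, as well as the two expectations quoted in the example.

```python
from fractions import Fraction as F


def mix(alpha, nu1, nu2):
    """Mixture alpha*nu1 + (1-alpha)*nu2 of two simple lotteries {payoff: probability}."""
    out = {}
    for nu, weight in ((nu1, alpha), (nu2, 1 - alpha)):
        for x, p in nu.items():
            out[x] = out.get(x, 0) + weight * p
    return out


def mean(nu):
    return sum(x * p for x, p in nu.items())


nu1 = {2400: F(1)}
nu2 = {2500: F(33, 100), 2400: F(66, 100), 0: F(1, 100)}
nu1_prime = {2400: F(34, 100), 0: F(66, 100)}
nu2_prime = {2500: F(33, 100), 0: F(67, 100)}

lhs = mix(F(1, 2), nu1, nu2_prime)   # 1/2 nu1 + 1/2 nu2'
rhs = mix(F(1, 2), nu1_prime, nu2)   # 1/2 nu1' + 1/2 nu2

print(lhs == rhs)                          # the two mixed lotteries are identical
print(mean(nu1_prime), mean(nu2_prime))    # 816 and 825
```

Since the two mixtures are the same lottery, no preference order can rank one strictly above the other, which is the contradiction used in the example.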
The second axiom is economically less intuitive but natural from a mathematical perspec-
tive.
Definition 4.9. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all
probability measures on $(D, \mathcal{B}_D)$. A preference order $\succeq$ on M is said to satisfy the continuity
axiom if for any triple $\nu_1 \succ \nu_2 \succ \nu_3$, there is α ∈ (0, 1) such that
\[ \alpha\nu_1 + (1-\alpha)\nu_3 \sim \nu_2. \]
The continuity axiom says that if a lottery ν2 lies preference-wise strictly in between two
other lotteries ν1 and ν3 , then there is a convex combination of ν1 and ν3 such that one is
indifferent between ν2 and this convex combination.
For simple lotteries, the independence and the continuity axiom together imply a von
Neumann-Morgenstern representation. For a proof of the following result, we refer to [2, Section
2.2].48
Theorem 4.10. Let M denote the collection of all simple lotteries on $(D, \mathcal{B}_D)$, where D ⊂ R
is a nonempty interval. Let $\succeq$ be a preference order on M satisfying the independence and
the continuity axiom. Then $\succeq$ admits a von Neumann-Morgenstern representation:
\[ \nu_1 \succeq \nu_2 \quad \Leftrightarrow \quad \int U(x)\, \nu_1(dx) \geq \int U(x)\, \nu_2(dx), \]
where the measurable function U : D → R is unique up to a positive affine transformation.
47 Note that transitivity of $\succeq$ implies transitivity of $\succ$. Indeed, if $\nu_1 \succ \nu_2$ and $\nu_2 \succ \nu_3$, then a fortiori $\nu_1 \succeq \nu_2$
and $\nu_2 \succeq \nu_3$. So transitivity of $\succeq$ gives $\nu_1 \succeq \nu_3$. Seeking a contradiction, suppose that $\nu_1 \not\succ \nu_3$. Then $\nu_3 \succeq \nu_1$,
and as $\nu_1 \succeq \nu_2$, transitivity of $\succeq$ gives $\nu_3 \succeq \nu_2$. But this is in contradiction to $\nu_2 \succ \nu_3$.
48 Note that the result is wrong for general lotteries on $(D, \mathcal{B}_D)$. For the general case, one needs stronger
continuity properties of $\succeq$; see [2, Theorems 2.27 and 2.29].
[Figure 4: Example of a concave function (graph of U(x) against x, with two points x1 and x2 marked on the horizontal axis)]
4.4 Concave functions and Jensen’s inequality

In order to study further properties of preference orders admitting a von Neumann-Morgenstern
representation, we need to recall the notion of a concave function.
Definition 4.11. Let D ⊂ R be a non-empty interval. A real-valued function U : D → R is
called concave if
\[ U(\lambda x_1 + (1-\lambda) x_2) \geq \lambda U(x_1) + (1-\lambda) U(x_2), \quad x_1, x_2 \in D,\ \lambda \in [0, 1]. \tag{4.1} \]
It is called strictly concave if the inequality in (4.1) is strict for $x_1 \neq x_2$ and λ ∈ (0, 1).
Graphically speaking, (strict) concavity means that straight line segments joining $(x_1, U(x_1))$
to $(x_2, U(x_2))$ always lie (strictly) below the graph of U.
Remark 4.12. (a) If U is (strictly) concave, then −U is (strictly) convex.
(b) If U : D → R is twice continuously differentiable, then U is concave if and only if
$U'' \leq 0$. Moreover, it is strictly concave if $U'' < 0$.49
We proceed to state and prove the fundamental inequality for concave functions.
Lemma 4.13 (Jensen’s inequality). Let (Ω, F, P) be a probability space and X an integrable
random variable with values in a non-empty interval D ⊂ R. Let U : D → R be concave and
suppose that E[|U(X)|] < ∞. Then
\[ E[U(X)] \leq U(E[X]). \]
Moreover, the inequality is strict when U is strictly concave and X is not P-a.s. constant.
49 The converse is not true: For example, the function U : R → R, $x \mapsto -x^4$ is strictly concave, but $U''(0) = 0$.
Proof. First, using the definition of concavity, one can show that for each a ∈ D, there is
b ∈ R such that
\[ U(x) \leq U(a) + b(x - a), \tag{4.2} \]
where the inequality in (4.2) is strict for $x \neq a$ if U is strictly concave.50
Next, choose a := E[X]. One can show that a ∈ D because D is an interval. Let b ∈ R
be such that (4.2) is satisfied. Then
\[ U(X) \leq U(a) + b(X - a) \]
and
\[ P[U(X) < U(a) + b(X - a)] = P[X \neq a] > 0 \]
if U is strictly concave and X is not P-a.s. constant. Thus, by monotonicity and linearity of
the integral and the fact that a = E[X],
\[ E[U(X)] \leq E[U(a) + b(X - a)] = U(a) + b(E[X] - a) = U(a) + b(a - a) = U(a) = U(E[X]), \]
where the inequality is strict if U is strictly concave and X is not P-a.s. constant.
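Jensen's inequality can be checked on a simple lottery. The Python sketch below uses the logarithm as a strictly concave function on D = (0, ∞) and a hypothetical two-point lottery; the dictionary encoding and the function name `expectation` are illustrative choices, not part of the notes. It also shows the equality case for a Dirac measure.

```python
import math


def expectation(nu, h=lambda x: x):
    """Integral of h with respect to a simple lottery nu = {value: probability}."""
    return sum(h(x) * p for x, p in nu.items())


# Non-degenerate lottery on D = (0, infinity): strict inequality for strictly concave U.
nu = {1.0: 0.5, 4.0: 0.5}
lhs = expectation(nu, math.log)       # E[U(X)]
rhs = math.log(expectation(nu))       # U(E[X])
print(lhs, rhs)

# Degenerate lottery (Dirac measure): both sides coincide.
dirac = {3.0: 1.0}
lhs_dirac = expectation(dirac, math.log)
rhs_dirac = math.log(expectation(dirac))
print(lhs_dirac, rhs_dirac)
```

For the two-point lottery, E[U(X)] equals log 2 while U(E[X]) equals log 2.5, so the gap is strictly positive, as the lemma predicts.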
4.5 Expected utility representation

In this section, we study preference orders admitting a von Neumann-Morgenstern represen-
tation that are in addition monotone and risk averse.
Definition 4.14. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all
probability measures on $(D, \mathcal{B}_D)$ having a finite expectation, i.e., $\int |x|\, \nu(dx) < \infty$ for ν ∈ M.
Moreover, assume that M contains all Dirac measures $\delta_x$ for x ∈ D. Then a preference order $\succeq$
on M is called
• monotone if $\delta_x \succ \delta_y$ for x > y, x, y ∈ D,
• risk averse if $\delta_{\mu_\nu} \succ \nu$ for ν ∈ M unless $\nu = \delta_{\mu_\nu}$, where $\mu_\nu = \int x\, \nu(dx)$.
50 If U is twice continuously differentiable, the (weak) inequality (4.2) can be easily derived as follows: Fix
a ∈ D and set $b := U'(a)$. By a Taylor expansion of U in a of order 1 with Lagrange remainder term, we
obtain for fixed x ∈ D
\[ U(x) = U(a) + b(x - a) + \frac{1}{2} U''(\xi)(x - a)^2, \]
where ξ lies in the interval with the endpoints x and a. Since $U'' \leq 0$ by concavity of U, (4.2) follows.
Monotonicity of a preference order means that we strictly prefer a higher sure amount over
a lower sure amount, i.e., we strictly prefer “more to less”. This is a very natural assumption
both from a conceptual and an empirical perspective.
Risk-aversion of a preference order means that for a lottery ν ∈ M, we strictly prefer
to receive the actuarially fair value µν over the lottery ν itself, unless, of course, the lottery
is deterministic and there is no difference between the two. While risk aversion is a natural
requirement from a normative perspective, in reality, there are persons who are not risk averse
but even risk-seeking, i.e., they strictly prefer a lottery over the actuarially fair value.51
The following result characterises monotonicity and risk aversion for preference orders that
admit a von-Neumann-Morgenstern representation.
Lemma 4.15. Let D ⊂ R be a nonempty interval and M a nonempty convex subset of all
probability measures on $(D, \mathcal{B}_D)$ having a finite expectation, and assume that M contains all
Dirac measures $\delta_x$ for x ∈ D. Suppose a preference order $\succeq$ on M has a von
Neumann-Morgenstern representation
\[ \nu_1 \succeq \nu_2 \quad \Leftrightarrow \quad \int U(x)\, \nu_1(dx) \geq \int U(x)\, \nu_2(dx), \]
where U : D → R is a measurable function that is integrable for every ν ∈ M. Then
(a) $\succeq$ is monotone if and only if U is strictly increasing.
(b) $\succeq$ is risk averse if and only if U is strictly concave.
Proof. First note that for x ∈ D,
\[ U(x) = \int U(z)\, \delta_x(dz). \tag{4.3} \]
(a). Let x, y ∈ D with x > y. Then (4.3) shows that $\delta_x \succ \delta_y$ if and only if U(x) > U(y).
(b). First, assume that $\succeq$ is risk averse. Let $x \neq y$ and α ∈ (0, 1). Then
\[ \delta_{\alpha x + (1-\alpha) y} \succ \alpha\delta_x + (1-\alpha)\delta_y. \]
Hence, by the von Neumann-Morgenstern representation, linearity of the integral in the
integrator, and (4.3), we obtain
\[ U(\alpha x + (1-\alpha) y) = \int U(z)\, \delta_{\alpha x + (1-\alpha) y}(dz) > \int U(z)\, (\alpha\delta_x + (1-\alpha)\delta_y)(dz) \]
\[ = \alpha \int U(z)\, \delta_x(dz) + (1-\alpha) \int U(z)\, \delta_y(dz) = \alpha U(x) + (1-\alpha) U(y). \]
Thus, it follows that U is strictly concave.52
Conversely, assume that U is strictly concave. Let ν ∈ M with $\nu \neq \delta_{\mu_\nu}$. Then by Jensen’s
inequality (Lemma 4.13), it follows that
\[ \int U(x)\, \delta_{\mu_\nu}(dx) = U(\mu_\nu) = U\left( \int x\, \nu(dx) \right) > \int U(x)\, \nu(dx). \]
Thus, we may deduce that $\delta_{\mu_\nu} \succ \nu$.
51 A sad illustration of this fact is the existence of the gambling industry.
Definition 4.16. Let D ⊂ R be a nonempty interval. A function U : D → R is called a
utility function if U is continuous, strictly increasing and strictly concave.53
Definition 4.17. Let D ⊂ R be a nonempty interval and M be a nonempty convex subset
of all probability measures on $(D, \mathcal{B}_D)$. A preference order $\succeq$ on M is said to have an expected
utility representation if it has a von Neumann-Morgenstern representation, where the function
U : D → R is a utility function.
We proceed to list standard examples of utility functions:
• Exponential utility. Let D = R and γ > 0. Then the function U : D → R given by
\[ U(x) = -\exp(-\gamma x) \]
is called exponential utility with parameter γ.
• Power utility. Let γ ∈ (0, ∞) \ {1}. Moreover, let D = [0, ∞) for γ ∈ (0, 1) and
D = (0, ∞) for γ ∈ (1, ∞). Then the function U : D → R given by
\[ U(x) = \frac{x^{1-\gamma}}{1-\gamma} \]
is called power utility with parameter γ.
• Logarithmic utility. Let D = (0, ∞). Then the function U : D → R given by
\[ U(x) = \log(x) \]
is called logarithmic utility.
Remark 4.18. Logarithmic utility can be seen as power utility with parameter γ = 1. Indeed,
differentiating the utility function gives
\[ U'(x) = \begin{cases} x^{-\gamma} & \text{for power utility with } \gamma \in (0, \infty) \setminus \{1\}, \\ x^{-1} & \text{for logarithmic utility.} \end{cases} \]
52 Note that for x = y or α ∈ {0, 1}, the condition for concavity is trivially satisfied.
53 One can show that an increasing concave function is automatically continuous on any interval (a, b] ⊂ D,
so the requirement of continuity is only relevant if D contains its infimum.
4.6 Measuring risk aversion

Agents having preferences admitting an expected utility representation are risk averse by
Lemma 4.15. In this section, we try to quantify in terms of their utility function how risk
averse they are.
To this end, fix a nonempty interval D ⊂ R and M a nonempty convex subset of all
probability measures on $(D, \mathcal{B}_D)$ having a finite expectation, and assume that M contains
all point masses $\delta_x$ for x ∈ D. Moreover, let $\succeq$ be a preference order on M admitting an
expected utility representation with a utility function U : D → R. It follows from the fact
that U is continuous and strictly increasing that U(D) is again an interval and the function
U : D → U(D) is bijective. So for each ν ∈ M, there exists a unique constant $c_U(\nu) \in D$
such that
\[ U(c_U(\nu)) = \int U(x)\, \nu(dx), \tag{4.4} \]
i.e., agents are indifferent between receiving the sure amount $c_U(\nu)$ or the lottery ν. For this
reason, $c_U(\nu)$ is called the certainty equivalent of ν for the utility function U. Moreover,
\[ \rho_U(\nu) := \mu_\nu - c_U(\nu) \]
is called the risk premium of ν for the utility function U, where $\mu_\nu := \int x\, \nu(dx)$ denotes the
actuarially fair value of ν. By risk aversion and monotonicity of $\succeq$, it follows that $\rho_U(\nu) \geq 0$,
where the inequality is strict unless $\nu = \delta_{\mu_\nu}$.
We proceed to study how the risk premium depends on the utility function U. To this
end, we argue heuristically and also assume that U is twice continuously differentiable and ν
has second moments. First, a Taylor expansion of order 1 on the left-hand side of (4.4) gives
\[ U(c_U(\nu)) \approx U(\mu_\nu) + U'(\mu_\nu)(c_U(\nu) - \mu_\nu) = U(\mu_\nu) - U'(\mu_\nu)\rho_U(\nu). \tag{4.5} \]
Next, a Taylor expansion under the integral of order 2 on the right-hand side of (4.4) and
linearity of the integral give
\[ \int U(x)\, \nu(dx) \approx \int \left( U(\mu_\nu) + U'(\mu_\nu)(x - \mu_\nu) + \frac{1}{2} U''(\mu_\nu)(x - \mu_\nu)^2 \right) \nu(dx) \]
\[ = U(\mu_\nu) + U'(\mu_\nu)(\mu_\nu - \mu_\nu) + \frac{1}{2} U''(\mu_\nu)\sigma^2_\nu \]
\[ = U(\mu_\nu) + \frac{1}{2} U''(\mu_\nu)\sigma^2_\nu. \tag{4.6} \]
Now equating (4.5) with (4.6), we obtain
\[ \rho_U(\nu) \approx \frac{1}{2} \left( -\frac{U''(\mu_\nu)}{U'(\mu_\nu)} \right) \sigma^2_\nu. \tag{4.7} \]
So the risk premium of ν is approximately 1/2 times the variance of ν times the coefficient
$-\frac{U''(\mu_\nu)}{U'(\mu_\nu)}$. For this reason, for a utility function U : D → R that is twice continuously
differentiable in the interior of D, the function $A_U : D^\circ \to (0, \infty)$ defined by
\[ A_U(x) = -\frac{U''(x)}{U'(x)} \]
is called the Arrow-Pratt coefficient of absolute risk aversion of U. The higher the absolute
risk aversion $A_U$ of a utility function U, the more risk averse an agent is.
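The quality of the approximation (4.7) can be inspected in a small example. The Python sketch below assumes logarithmic utility, for which the inverse of U is the exponential function and $A_U(x) = 1/x$, together with a hypothetical two-point lottery; it computes the certainty equivalent from (4.4), the exact risk premium, and the approximation (4.7).

```python
import math

# Hypothetical lottery on D = (0, infinity): 90 or 110 with equal probability.
nu = {90.0: 0.5, 110.0: 0.5}

mu = sum(x * p for x, p in nu.items())                  # actuarially fair value
var = sum((x - mu) ** 2 * p for x, p in nu.items())     # variance of the lottery

# Logarithmic utility U(x) = log(x); its inverse is exp, so (4.4) can be solved explicitly.
expected_utility = sum(math.log(x) * p for x, p in nu.items())
certainty_equivalent = math.exp(expected_utility)

risk_premium = mu - certainty_equivalent                # exact risk premium
approx = 0.5 * (1.0 / mu) * var                         # (4.7) with A_U(mu) = 1/mu for log utility

print(certainty_equivalent, risk_premium, approx)
```

Here the exact risk premium is slightly above 0.5 while the approximation gives 0.5, which illustrates that (4.7) is accurate when the lottery is concentrated near its mean.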
Risk aversion can be measured not only in absolute but also in relative terms. If we divide
(4.7) by $\mu_\nu$ on both sides to get an approximation of the relative risk premium, we obtain
\[ \frac{\rho_U(\nu)}{\mu_\nu} \approx \frac{1}{2} \left( -\frac{\mu_\nu U''(\mu_\nu)}{U'(\mu_\nu)} \right) \frac{\sigma^2_\nu}{\mu^2_\nu}, \]
where $\frac{\sigma^2_\nu}{\mu^2_\nu} = \int \left( \frac{x - \mu_\nu}{\mu_\nu} \right)^2 \nu(dx)$ denotes the relative variance of ν. So the relative risk premium
of ν is approximately 1/2 times the relative variance of ν times the coefficient $-\frac{\mu_\nu U''(\mu_\nu)}{U'(\mu_\nu)}$. For
this reason, for a utility function U : D → R that is twice continuously differentiable in the
interior of D, the function $R_U : D^\circ \to (0, \infty)$ defined by
\[ R_U(x) = x A_U(x) := -\frac{x U''(x)}{U'(x)} \]
is called the Arrow-Pratt coefficient of relative risk aversion of U. The higher the relative risk
aversion $R_U$ of a utility function U, the more risk averse an agent is.
Example 4.19. (a). Let U : R → R be an exponential utility function with parameter γ > 0,
i.e., U(x) = − exp(−γx). Then
\[ A_U(x) = -\frac{U''(x)}{U'(x)} = -\frac{-\gamma^2 \exp(-\gamma x)}{\gamma \exp(-\gamma x)} = \gamma. \]
So for exponential utility, the absolute risk aversion is constant and equal to the parameter γ.
For this reason, exponential utility is also called CARA utility, where CARA is an acronym
for constant absolute risk aversion.
(b). Let U : D → R be a power utility function with parameter γ ∈ (0, ∞) \ {1}, where
D = [0, ∞) for γ ∈ (0, 1) and D = (0, ∞) for γ ∈ (1, ∞), i.e., $U(x) = \frac{x^{1-\gamma}}{1-\gamma}$. Then
\[ R_U(x) = -\frac{x(-\gamma x^{-\gamma-1})}{x^{-\gamma}} = \gamma. \]
So for power (and logarithmic) utility, the relative risk aversion is constant and equal to the
parameter γ (1 for the logarithmic case). For this reason, power utility is also called CRRA
utility, where CRRA is an acronym for constant relative risk aversion.
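The constancy of the Arrow-Pratt coefficients in this example can also be observed numerically. The following Python sketch approximates U' and U'' by central finite differences; this is a numerical stand-in for the exact derivatives, and the step size `h`, the parameter value γ = 2 and the function names are assumptions made for illustration.

```python
import math


def absolute_risk_aversion(U, x, h=1e-4):
    """Approximate A_U(x) = -U''(x)/U'(x) with central finite differences."""
    d1 = (U(x + h) - U(x - h)) / (2 * h)
    d2 = (U(x + h) - 2 * U(x) + U(x - h)) / h ** 2
    return -d2 / d1


def relative_risk_aversion(U, x, h=1e-4):
    """Approximate R_U(x) = x * A_U(x)."""
    return x * absolute_risk_aversion(U, x, h)


gamma = 2.0


def exponential_utility(x):
    return -math.exp(-gamma * x)


def power_utility(x):
    return x ** (1 - gamma) / (1 - gamma)


for x in (0.5, 1.0, 3.0):
    # Both columns should stay close to gamma, independently of x.
    print(x, absolute_risk_aversion(exponential_utility, x), relative_risk_aversion(power_utility, x))
```

Replacing `power_utility` by `math.log` in the last column gives values close to 1, in line with the interpretation of logarithmic utility as the case γ = 1.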
4.7 A primer on utility maximisation

We now consider an investor who wants to invest in a financial market $\bar S = (S^0_t, S_t)_{t \in \{0,1\}}$ on
some probability space (Ω, F, P). Recall that $S^0_0 = 1$ and $S^0_1 = 1 + r$, where r > −1 denotes
the interest rate. The investor has initial wealth $x_0 > 0$ and wants to maximise her final
wealth $\bar\vartheta \cdot \bar S_1$ among all feasible strategies $\bar\vartheta \in \mathbb{R}^{1+d}$ which satisfy $\bar\vartheta \cdot \bar S_0 = x_0$. We assume that
the preferences of the investor are described by a utility function U : D → R. So she solves
\[ E\left[ U\left( \bar\vartheta \cdot \bar S_1 \right) \right] \to \max_{\bar\vartheta \in \mathbb{R}^{d+1}}! \quad \text{subject to } \bar\vartheta \cdot \bar S_0 = x_0. \tag{4.8} \]
Of course, we also need to make sure that the expectation in (4.8) is well defined; in particular
we need to check that $\bar\vartheta \cdot \bar S_1 \in D$ P-a.s.
The problem (4.8) is surprisingly difficult, and closed-form solutions are available only in very
special cases. So we proceed to study existence and uniqueness of solutions to (4.8). To this end, we
reformulate (4.8) in terms of the discounted risky assets $X = S/S^0$. If $\bar\vartheta \in \mathbb{R}^{d+1}$ is such that
$\bar\vartheta \cdot \bar S_0 = x_0$, then with $\bar X = (1, X)$ and using that $\bar X^0_1 - \bar X^0_0 = 1 - 1 = 0$, we obtain
\[ \bar\vartheta \cdot \bar S_1 = (1+r)(\bar\vartheta \cdot \bar X_1) = (1+r)(\bar\vartheta \cdot \bar X_1 - \bar\vartheta \cdot \bar X_0 + x_0) = (1+r)(x_0 + \bar\vartheta \cdot (\bar X_1 - \bar X_0)) \]
\[ = (1+r)(x_0 + \vartheta \cdot (X_1 - X_0)). \]
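So (4.8) is equivalent to⁵⁴
\[
E\left[U\left((1+r)(x_0 + \vartheta \cdot (X_1 - X_0))\right)\right] \to \max_{\vartheta \in \mathbb{R}^d}!, \tag{4.9}
\]
which is an unconstrained optimisation problem.⁵⁵ Finally, if D is invariant by multiplication with positive constants, we can define the function $\tilde U : D \to \mathbb{R}$ by
\[
\tilde U(x) = U((1+r)x), \quad x \in D.
\]
Then $\tilde U$ is again a utility function, and (4.9) is equivalent to⁵⁶
\[
E\left[\tilde U\left(x_0 + \vartheta \cdot (X_1 - X_0)\right)\right] \to \max_{\vartheta \in \mathbb{R}^d}! \tag{4.10}
\]
⁵⁴ Note that for $\vartheta \in \mathbb{R}^d$, setting $\bar\vartheta := (\vartheta^0, \vartheta) := (x_0 - \vartheta \cdot X_0, \vartheta)$ gives $\bar\vartheta \cdot \bar X_0 = \vartheta^0 \times 1 + \vartheta \cdot X_0 = x_0 - \vartheta \cdot X_0 + \vartheta \cdot X_0 = x_0$.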
For the following theorem, we need one key result from Measure Theory. For a proof and
more information, we refer to [5, Chapter 28].
Theorem 4.20 (Radon-Nikodým). Let (Ω, F, P) be a probability space. A probability measure Q on (Ω, F) is equivalent to P if and only if there exists an F-measurable P-integrable random variable Z > 0 P-a.s. such that
\[
Q[A] = E^P[Z 1_A] \quad \text{for all } A \in \mathcal{F}.
\]
Moreover, if it exists, Z is P-a.s. unique.
If Q ≈ P and Z is as in Theorem 4.20, we often write $Z = \frac{dQ}{dP}$ and call Z or $\frac{dQ}{dP}$ the Radon-Nikodým derivative of Q with respect to P. Note that $E^P[Z] = Q[\Omega] = 1$.
We also have the following important corollary.
Corollary 4.21. Let (Ω, F, P) be a probability space, Q ≈ P an equivalent probability measure and $Z = \frac{dQ}{dP}$ the corresponding Radon-Nikodým derivative. Then a random variable X is Q-integrable if and only if ZX is P-integrable, in which case
\[
E^Q[X] = E^P[ZX].
\]
Example 4.22. Let Ω = {ω1, . . . , ωN} be a finite sample space, $\mathcal{F} = 2^\Omega$ and P and Q probability measures on (Ω, F) with P[{ωn}], Q[{ωn}] > 0 for all n ∈ {1, . . . , N}. Then Q ≈ P and $\frac{dQ}{dP}(\omega) = \frac{Q[\{\omega\}]}{P[\{\omega\}]}$, ω ∈ Ω. Indeed, the first part of the assertion follows from Example 2.9. For the second part of the assertion, we check the definition of a Radon-Nikodým derivative. Set $Z(\omega) := \frac{Q[\{\omega\}]}{P[\{\omega\}]}$ and let A ∈ F. Then
\[
\begin{aligned}
Q[A] &= \sum_{\{n : \omega_n \in A\}} Q[\{\omega_n\}] = \sum_{\{n : \omega_n \in A\}} \frac{Q[\{\omega_n\}]}{P[\{\omega_n\}]} P[\{\omega_n\}] = \sum_{\{n : \omega_n \in A\}} Z(\omega_n) P[\{\omega_n\}] \\
&= \sum_{n=1}^N Z(\omega_n) 1_A(\omega_n) P[\{\omega_n\}] = E^P[Z 1_A],
\end{aligned}
\]
and so $Z = \frac{dQ}{dP}$.
⁵⁵ Of course, we still have the constraint that $(1 + r)(x_0 + \vartheta \cdot (X_1 - X_0)) \in D$ P-a.s.
⁵⁶ We still have the constraint that $x_0 + \vartheta \cdot (X_1 - X_0) \in D$ P-a.s.
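On a finite sample space, the Radon-Nikodým derivative and the change-of-measure formula of Corollary 4.21 reduce to elementary sums. The following is a minimal sketch; the three-point sample space, the two measures and the random variable X are hypothetical numbers chosen only for illustration.

```python
# Hypothetical finite sample space with two equivalent probability measures.
P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
Q = {"w1": 0.2, "w2": 0.3, "w3": 0.5}

# Radon-Nikodym derivative Z = dQ/dP as in Example 4.22.
Z = {w: Q[w] / P[w] for w in P}

def expectation(measure, f):
    """Expectation of f under the given measure on the finite space."""
    return sum(measure[w] * f(w) for w in measure)

X = {"w1": 10.0, "w2": -4.0, "w3": 1.0}  # an arbitrary random variable

expectation_under_Q = expectation(Q, lambda w: X[w])
expectation_via_Z = expectation(P, lambda w: Z[w] * X[w])  # Corollary 4.21

A = {"w1", "w3"}                                           # an arbitrary event
q_of_A = sum(Q[w] for w in A)
ep_of_Z_on_A = expectation(P, lambda w: Z[w] if w in A else 0.0)

print(expectation_under_Q, expectation_via_Z)
print(q_of_A, ep_of_Z_on_A)
```

Both printed pairs agree up to floating-point rounding, and the P-expectation of Z equals 1.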
The following result gives, for the domain D = [0, ∞) and under relatively weak assumptions, existence, uniqueness and further properties of the solution to the expected utility maximisation problem in its simplified version (4.10), where we write again U instead of $\tilde U$.
Theorem 4.23. Let $\bar S = (S^0_t, S_t)_{t \in \{0,1\}}$ be a non-redundant market on some probability space (Ω, F, P), and assume that $\bar S$ has finite first moments, i.e., $E[|S^i_t|] < \infty$ for i ∈ {0, . . . , d} and t ∈ {0, 1}. Let U : [0, ∞) → R be a utility function that is continuously differentiable on (0, ∞) and satisfies U(0) = 0 and $\lim_{x \to \infty} U(x) = +\infty$. Fix x > 0 and set
\[
\begin{aligned}
A(x) &:= \left\{\vartheta \in \mathbb{R}^d : x + \vartheta \cdot (X_1 - X_0) \ge 0 \ P\text{-a.s.}\right\}, \\
u(x) &:= \sup_{\vartheta \in A(x)} E\left[U\left(x + \vartheta \cdot (X_1 - X_0)\right)\right],
\end{aligned}
\]
where $X = S/S^0$ denotes the discounted risky assets. Then:
(a) The set A(x) is convex. It is compact if and only if $\bar S$ satisfies NA.
(b) We have u(x) < ∞ if and only if $\bar S$ satisfies NA.
(c) If $\bar S$ satisfies NA, there exists a unique $\vartheta^* \in A(x)$ such that
\[
E\left[U\left(x + \vartheta^* \cdot (X_1 - X_0)\right)\right] = u(x) < \infty.
\]
(d) If $\bar S$ satisfies NA and the unique $\vartheta^* \in A(x)$ from part (c) lies in the interior of A(x), then $E\left[U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)\right] < \infty$ and the measure Q ≈ P on F defined by
\[
\frac{dQ}{dP} = \frac{U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)}{E\left[U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)\right]}
\]
is an equivalent martingale measure for X.
Remark 4.24. (a) The set A(x) is called the set of admissible strategies for initial wealth x,
and the function u is called the indirect utility function. If S satisfies NA, one can show that
u is again a utility function.
(b) If Ω is finite and U satisfies the Inada condition $\lim_{x \to 0} U'(x) = +\infty$, it is not difficult to check that $\vartheta^*$ lies in the interior of A(x).⁵⁷
(c) For finite Ω, Theorem 4.23(d) can be seen as a constructive version of the Fundamental Theorem of Asset Pricing. Indeed, choose $U(x) = \sqrt{x}$.
⁵⁷ If Ω is infinite, even with the Inada condition, $\vartheta^*$ does in general not lie in the interior of A(x) and the assertion in (b) is false.
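The construction in Remark 4.24(c) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical arbitrage-free market with one risky asset and three states, U(x) = √x and initial wealth x = 1. The maximiser is found by bisection on the first-order condition E[U'(x + ϑ(X_1 − X_0))(X_1 − X_0)] = 0, which is a numerical shortcut chosen here for illustration and not a method prescribed by the theorem.

```python
import math

# Hypothetical one-period market: d = 1 risky asset, three states.
p = [0.5, 0.3, 0.2]      # P[{omega_n}]
dX = [0.2, 0.0, -0.1]    # discounted increments X_1 - X_0 in each state
x = 1.0                  # initial wealth

def marginal_utility(w):
    """U'(w) for U(w) = sqrt(w)."""
    return 0.5 / math.sqrt(w)

def first_order_condition(theta):
    """E[U'(x + theta * dX) * dX]; strictly decreasing in theta."""
    return sum(pn * marginal_utility(x + theta * d) * d for pn, d in zip(p, dX))

# Admissible set A(x) = [-x / max(dX), -x / min(dX)]; stay in its interior.
a = -x / max(dX) + 1e-9
b = -x / min(dX) - 1e-9
for _ in range(200):
    mid = 0.5 * (a + b)
    if first_order_condition(mid) > 0:
        a = mid          # the root lies to the right of mid
    else:
        b = mid
theta_star = 0.5 * (a + b)

# Density dQ/dP proportional to U'(optimal terminal wealth), as in part (d).
weights = [marginal_utility(x + theta_star * d) for d in dX]
norm = sum(pn * wn for pn, wn in zip(p, weights))
q = [pn * wn / norm for pn, wn in zip(p, weights)]

emm_check = sum(qn * d for qn, d in zip(q, dX))  # E^Q[X_1 - X_0]
print(theta_star, q, emm_check)
```

For these numbers the first-order condition can also be solved by hand, giving ϑ* = 80/9 and Q = (0.25, 0.25, 0.5), under which the discounted increment has mean zero.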
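Sketch of part (c) and (d) of Theorem 4.23. (c). First, we establish existence of $\vartheta^*$. By part (b), u(x) < ∞. So there is a sequence $(\vartheta_n)_{n \in \mathbb{N}}$ in A(x) such that
\[
\lim_{n \to \infty} E\left[U(x + \vartheta_n \cdot (X_1 - X_0))\right] = u(x) < \infty.
\]
Since A(x) is compact by part (a), by the Bolzano-Weierstraß theorem, there exists a subsequence, denoted again by $(\vartheta_n)_{n \in \mathbb{N}}$, converging to some $\vartheta^* \in A(x)$. Assuming that we can interchange expectation and limits and using continuity of U, we obtain
\[
u(x) = \lim_{n \to \infty} E\left[U(x + \vartheta_n \cdot (X_1 - X_0))\right] = E\left[\lim_{n \to \infty} U(x + \vartheta_n \cdot (X_1 - X_0))\right] = E\left[U(x + \vartheta^* \cdot (X_1 - X_0))\right].
\]
Next, we establish uniqueness of $\vartheta^*$. Seeking a contradiction, let $\tilde\vartheta^* \in A(x)$ with $\tilde\vartheta^* \neq \vartheta^*$ be such that $E\left[U\left(x + \tilde\vartheta^* \cdot (X_1 - X_0)\right)\right] = u(x)$. Set $\hat\vartheta^* := \frac{1}{2}\vartheta^* + \frac{1}{2}\tilde\vartheta^*$. Then $\hat\vartheta^* \in A(x)$ by convexity of A(x). By strict concavity of U and nonredundancy of $\bar S$,
\[
U\left(x + \hat\vartheta^* \cdot (X_1 - X_0)\right) \ge \frac{1}{2} U\left(x + \vartheta^* \cdot (X_1 - X_0)\right) + \frac{1}{2} U\left(x + \tilde\vartheta^* \cdot (X_1 - X_0)\right),
\]
where the inequality is strict with positive probability. Taking expectation gives
\[
\begin{aligned}
E\left[U\left(x + \hat\vartheta^* \cdot (X_1 - X_0)\right)\right] &> \frac{1}{2} E\left[U\left(x + \vartheta^* \cdot (X_1 - X_0)\right)\right] + \frac{1}{2} E\left[U\left(x + \tilde\vartheta^* \cdot (X_1 - X_0)\right)\right] \\
&= \frac{1}{2} u(x) + \frac{1}{2} u(x) = u(x) \\
&= \sup_{\vartheta \in A(x)} E\left[U\left(x + \vartheta \cdot (X_1 - X_0)\right)\right],
\end{aligned}
\]
which is a contradiction.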
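(d). Fix i ∈ {1, . . . , d}. Since $\vartheta^*$ is an interior point of A(x), for all sufficiently small ε > 0, $\vartheta^* \pm \varepsilon e_i \in A(x)$, where $e_i$ denotes the ith unit vector in $\mathbb{R}^d$. Maximality of $\vartheta^*$ then gives
\[
\begin{aligned}
E\left[U\left(x + (\vartheta^* + \varepsilon e_i) \cdot (X_1 - X_0)\right) - U\left(x + \vartheta^* \cdot (X_1 - X_0)\right)\right] &\le 0, \\
E\left[U\left(x + (\vartheta^* - \varepsilon e_i) \cdot (X_1 - X_0)\right) - U\left(x + \vartheta^* \cdot (X_1 - X_0)\right)\right] &\le 0.
\end{aligned}
\]
Dividing by ε, letting ε → 0 and assuming that we can interchange limits and expectations, we obtain
\[
\begin{aligned}
E\left[U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)(X^i_1 - X^i_0)\right] &\le 0, \\
E\left[-U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)(X^i_1 - X^i_0)\right] &\le 0,
\end{aligned}
\]
and hence
\[
E\left[U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)(X^i_1 - X^i_0)\right] = 0.
\]
Assuming that $U'(x + \vartheta^* \cdot (X_1 - X_0))$ is P-integrable, and defining the measure Q ≈ P on F by
\[
\frac{dQ}{dP} = \frac{U'(x + \vartheta^* \cdot (X_1 - X_0))}{E\left[U'(x + \vartheta^* \cdot (X_1 - X_0))\right]},
\]
we obtain by Corollary 4.21,
\[
\begin{aligned}
E^Q\left[X^i_1 - X^i_0\right] &= E^P\left[\frac{dQ}{dP}(X^i_1 - X^i_0)\right] = E^P\left[\frac{U'(x + \vartheta^* \cdot (X_1 - X_0))}{E^P\left[U'(x + \vartheta^* \cdot (X_1 - X_0))\right]}(X^i_1 - X^i_0)\right] \\
&= \frac{1}{E^P\left[U'(x + \vartheta^* \cdot (X_1 - X_0))\right]} E^P\left[U'\left(x + \vartheta^* \cdot (X_1 - X_0)\right)(X^i_1 - X^i_0)\right] = 0.
\end{aligned}
\]
Thus, Q is an EMM for $X^i$. As i ∈ {1, . . . , d} was arbitrary, the claim follows.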
5 Introduction to Risk Measures
In this short chapter, we briefly discuss how to quantify the downside risk of a financial position. To this end, we follow the axiomatic approach initiated by Artzner, Delbaen, Eber and Heath in a seminal paper in 1999.
5.1 Monetary measures of risk

Throughout this chapter, we consider a linear subspace $\mathcal{X}$ of all real-valued random variables on a measurable space (Ω, F), containing the constants. Each $X \in \mathcal{X}$ is interpreted as a possible financial position of a large company, the prime examples being banks and insurance companies. We aim to measure the downside risk related to the position X.
Definition 5.1. Let $\mathcal{X}$ be a linear subspace of all real-valued random variables on a measurable space (Ω, F), containing the constants. A map $\rho : \mathcal{X} \to \mathbb{R}$ is called a monetary measure of risk if it has the following properties:
• Normalisation: ρ(0) = 0.
• Monotonicity: If X1 ≤ X2, then ρ(X1) ≥ ρ(X2).
• Cash-invariance: If m ∈ R, then ρ(X + m) = ρ(X) − m.
Moreover, if $\rho : \mathcal{X} \to \mathbb{R}$ is a monetary measure of risk, a position $X \in \mathcal{X}$ is called acceptable for ρ if ρ(X) ≤ 0.
Let us briefly comment on the above properties: The number ρ(X) is the amount in capital
(or cash) which has to be added to the position X to make X acceptable, e.g. from the point
of view of a regulator. This explains normalisation and cash-invariance, which is also called
translation invariance. The financial meaning of monotonicity is that the downside risk of a
position is decreased if the payoff is increased.
We proceed to show that the variance – even transformed in an appropriate way – is not a
good measure of risk because it fails to be monotone, unless we restrict ourselves to the case
of normal random variables.
Example 5.2. Let (Ω, F, P) be a probability space and $\mathcal{X}$ the set of all real-valued random variables having finite second moment, i.e., $X \in \mathcal{X}$ if and only if $E[X^2] < \infty$. Moreover, let $\mathcal{X}_N$ be a linear subspace of $\mathcal{X}$ such that all $X \in \mathcal{X}_N$ are normally distributed, where we agree that a normal distribution with variance zero and mean µ ∈ R is the Dirac distribution $\delta_\mu$. Define the map $\rho : \mathcal{X} \to \mathbb{R}$ by
\[
\rho(X) = \sqrt{\mathrm{Var}[X]} - E[X].
\]
Then ρ is normalised. It is also cash-invariant on $\mathcal{X}$ (and on $\mathcal{X}_N$). Indeed, let $X \in \mathcal{X}$ and m ∈ R. Then
\[
\rho(X + m) = \sqrt{\mathrm{Var}[X + m]} - E[X + m] = \sqrt{\mathrm{Var}[X]} - E[X] - m = \rho(X) - m.
\]
However, ρ is not monotone on $\mathcal{X}$. Indeed, let X1 = 1 and X2 be a Pareto distributed random variable with parameter 3, i.e., the cdf of X2 is given by
\[
F_{X_2}(x) = \begin{cases} 1 - x^{-3} & \text{if } x \ge 1, \\ 0 & \text{if } x < 1. \end{cases} \tag{5.1}
\]
Then X2 ≥ 1 = X1.⁵⁸ Using the formula for the mean and the variance of a Pareto distribution, we obtain
\[
E[X_2] = \frac{3}{3 - 1} = \frac{3}{2} \quad \text{and} \quad \mathrm{Var}[X_2] = \frac{3}{(3 - 1)^2 (3 - 2)} = \frac{3}{4}.
\]
Thus,
\[
\rho(X_2) = \sqrt{\frac{3}{4}} - \frac{3}{2} = \frac{\sqrt{3} - 3}{2} > -1 = \rho(X_1),
\]
and so ρ is not monotone on $\mathcal{X}$.
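The arithmetic of this counterexample can be reproduced in a few lines. This is a minimal sketch using the standard closed-form mean and variance of a Pareto distribution with scale 1 and shape α; the helper names are illustrative.

```python
import math

def pareto_mean(alpha):
    """Mean of a Pareto distribution with scale 1 and shape alpha > 1."""
    return alpha / (alpha - 1)

def pareto_variance(alpha):
    """Variance of a Pareto distribution with scale 1 and shape alpha > 2."""
    return alpha / ((alpha - 1) ** 2 * (alpha - 2))

def rho(mean, variance):
    """rho(X) = sqrt(Var[X]) - E[X], expressed through mean and variance."""
    return math.sqrt(variance) - mean

rho_X1 = rho(1.0, 0.0)                              # X1 = 1 is constant
rho_X2 = rho(pareto_mean(3), pareto_variance(3))    # X2 ~ Pareto(3), X2 >= X1

print(rho_X1, rho_X2)
```

Although X2 dominates X1, it is assigned the larger risk, which is the failure of monotonicity described above.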
However, ρ is monotone on $\mathcal{X}_N$. Indeed, let $X_1, X_2 \in \mathcal{X}_N$ with X1 ≥ X2. We may assume without loss of generality that both X1 and X2 have nonvanishing variances $\sigma_1^2 > 0$ and $\sigma_2^2 > 0$.⁵⁹ As X1 ≥ X2, it follows from monotonicity of the integral that $\mu_1 := E[X_1] \ge E[X_2] =: \mu_2$. Let Φ be the cdf of a standard normal distribution, i.e.,
\[
\Phi(x) := \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du, \quad x \in \mathbb{R}.
\]
Then by the scaling properties of normal distributions,
\[
F_{X_i}(x) := P[X_i \le x] = P\left[\frac{X_i - \mu_i}{\sigma_i} \le \frac{x - \mu_i}{\sigma_i}\right] = \Phi\left(\frac{x - \mu_i}{\sigma_i}\right), \quad x \in \mathbb{R}, \ i \in \{1, 2\}.
\]
As X1 ≥ X2, it follows that
\[
F_{X_1}(x) \le F_{X_2}(x) \iff \Phi\left(\frac{x - \mu_1}{\sigma_1}\right) \le \Phi\left(\frac{x - \mu_2}{\sigma_2}\right), \quad x \in \mathbb{R}.
\]
As Φ is strictly increasing, it follows that
\[
\frac{x - \mu_1}{\sigma_1} \le \frac{x - \mu_2}{\sigma_2}, \quad x \in \mathbb{R}. \tag{5.2}
\]
Dividing (5.2) by x and letting x → ±∞, we may deduce that $\sigma_1^2 = \sigma_2^2$. Together with µ1 ≥ µ2, we obtain
\[
\rho(X_1) = \sigma_1 - \mu_1 \le \sigma_2 - \mu_2 = \rho(X_2).
\]
This shows that ρ is monotone on $\mathcal{X}_N$.
⁵⁸ We may assume without loss of generality that this inequality holds not only P-a.s. but even for all ω.
⁵⁹ Note that X1 ≥ X2 implies that either both X1 and X2 have Dirac distributions (in which case the claim is trivial) or both X1 and X2 have nonvanishing variance.

Remark 5.3. One can show that if one replaces in Example 5.2 the variance Var[X] by the semivariance SVar[X], which is defined by
\[
\mathrm{SVar}[X] := E\left[(X - \mu)^2 1_{\{X \le \mu\}}\right], \quad \text{where } \mu := E[X],
\]
then the corresponding map $\rho : \mathcal{X} \to \mathbb{R}$ given by
\[
\rho(X) := \sqrt{\mathrm{SVar}[X]} - E[X]
\]
is a monetary measure of risk; it is in particular monotone.

We proceed to introduce two further desirable properties of monetary measures of risk.
Definition 5.4. Let $\mathcal{X}$ be a linear subspace of all real-valued random variables on a measurable space (Ω, F), containing the constants. A monetary measure of risk $\rho : \mathcal{X} \to \mathbb{R}$ is called a convex measure of risk if in addition it has the property:
• Convexity: ρ(λX1 + (1 − λ)X2) ≤ λρ(X1) + (1 − λ)ρ(X2) for $X_1, X_2 \in \mathcal{X}$ and λ ∈ [0, 1].
Moreover, a convex measure of risk $\rho : \mathcal{X} \to \mathbb{R}$ is called a coherent measure of risk if in addition it has the property:
• Positive homogeneity: ρ(λX) = λρ(X) for $X \in \mathcal{X}$ and λ ≥ 0.
Remark 5.5. Under the assumption of positive homogeneity, convexity of a risk measure ρ is equivalent to subadditivity of ρ, which means that ρ(X1 + X2) ≤ ρ(X1) + ρ(X2) for $X_1, X_2 \in \mathcal{X}$.
Convexity of a risk measure formalises the idea that diversification should reduce the
risk. This is best seen in the equivalent formulation as subadditivity (assuming also positive
homogeneity): If a large company has different product lines or “desks”, the total risk of the
aggregate position is bounded by the sum of the individual risks related to each product line
or “desk”. Apart from being a reasonable requirement from an economic perspective, this
property is quite useful from a management perspective.
Positive homogeneity of a risk measure means that risk grows in a linear way. Even though this is convenient from a mathematical perspective, it is less clear whether this is a good assumption from an economic perspective.
5.2 Value at Risk and Expected Shortfall
We proceed to discuss the two most important risk measures used in practice. The first example is value at risk, which was “invented” in 1993 as part of the seminal “G-30 report” and widely propagated by the “RiskMetrics” of JP Morgan launched in 1994.⁶⁰
Definition 5.6. Let (Ω, F, P) be a probability space and $\mathcal{X}$ the set of all random variables. Let α ∈ (0, 1) be a confidence level. For $X \in \mathcal{X}$, the Value at Risk of X at level α is given by⁶¹
\[
\mathrm{VaR}_\alpha(X) = \inf\{m \in \mathbb{R} : P[m + X < 0] \le \alpha\}.
\]
The Value at Risk at level α is the smallest amount of capital which, if added to X, keeps
the probability of a negative outcome below or equal to α. Typical values for α are 0.05, 0.01
or 0.001.
Value at Risk is probably the most widely used risk measure in practice. One can easily
check that it is normalised, monotone, cash-invariant and positively homogeneous.
However, Value at Risk fails to be convex, i.e., it may punish diversification instead of
encouraging it.
Example 5.7. Let X1 and X2 be two independent identically distributed random variables on some probability space (Ω, F, P), where
\[
P[X_i = -100] = 0.01 \quad \text{and} \quad P[X_i = 90] = 0.99, \quad i \in \{1, 2\}.
\]
Then for i ∈ {1, 2},
\[
P[X_i + m < 0] = \begin{cases} 1 & \text{if } m \in (-\infty, -90), \\ 0.01 & \text{if } m \in [-90, 100), \\ 0 & \text{if } m \in [100, \infty), \end{cases} \tag{5.3}
\]
and so
\[
\mathrm{VaR}_{0.01}(X_i) = -90, \quad i \in \{1, 2\}.
\]
Hence, both X1 and X2 are acceptable and even very good from a $\mathrm{VaR}_{0.01}$ perspective. Now consider the “diversified position”
\[
X := \frac{1}{2} X_1 + \frac{1}{2} X_2.
\]
Then the distribution of X satisfies
\[
\begin{aligned}
P[X = -100] &= (0.01)^2 = 0.0001, \\
P[X = -5] &= 2 \times 0.01 \times 0.99 = 0.0198, \\
P[X = 90] &= (0.99)^2 = 0.9801,
\end{aligned}
\]
and we have
\[
P[X + m < 0] = \begin{cases} 1 & \text{if } m \in (-\infty, -90), \\ 0.0199 & \text{if } m \in [-90, 5), \\ 0.0001 & \text{if } m \in [5, 100), \\ 0 & \text{if } m \in [100, \infty). \end{cases} \tag{5.4}
\]
Hence,
\[
\mathrm{VaR}_{0.01}(X) = 5,
\]
and the diversified position $X = \frac{1}{2} X_1 + \frac{1}{2} X_2$ is no longer acceptable.
⁶⁰ However, the use of value at risk (without the name) goes back to the early decades of the 20th century.
⁶¹ Note that in part of the literature α is replaced by 1 − α.
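The Value at Risk computations of Example 5.7 can be reproduced exactly for finite distributions. The following is a minimal sketch: it uses that m ↦ P[m + X < 0] is a step function that only jumps at the points m = −v for values v of X, so the infimum in Definition 5.6 is attained at one of these points. The function name and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def value_at_risk(dist, alpha):
    """VaR_alpha of a finite distribution given as {value: probability}."""
    candidates = []
    for v in dist:
        # P[X + m < 0] at m = -v equals P[X < v].
        prob_below = sum(p for w, p in dist.items() if w < v)
        if prob_below <= alpha:
            candidates.append(-v)
    return min(candidates)

# Distribution of X_i from Example 5.7.
single = {Fraction(-100): Fraction(1, 100), Fraction(90): Fraction(99, 100)}

# Distribution of X = X_1 / 2 + X_2 / 2 for independent copies X_1, X_2.
diversified = {}
for (v1, p1), (v2, p2) in product(single.items(), repeat=2):
    v = (v1 + v2) / 2
    diversified[v] = diversified.get(v, 0) + p1 * p2

alpha = Fraction(1, 100)
var_single = value_at_risk(single, alpha)
var_diversified = value_at_risk(diversified, alpha)
print(var_single, var_diversified)  # prints "-90 5"
```

Exact rational arithmetic avoids spurious failures of the comparison `prob_below <= alpha` when a probability equals α, as happens for X_i at level 0.01.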
Let us briefly comment on what goes wrong in Example 5.7. The probability of a loss is
higher for X than for Xi . Indeed,
P [X < 0] = 0.0199 > 0.01 = P [Xi < 0], i ∈ {1, 2}.
However, the expected size of a loss when it does happen is much larger for Xi than for X. Indeed,
\[
E[-X_i \mid -X_i > 0] = \frac{100 \times 0.01}{0.01} = 100, \quad i \in \{1, 2\},
\]
whereas
\[
E[-X \mid -X > 0] = \frac{100 \times 0.0001 + 5 \times 0.0198}{0.0199} \approx 5.477.
\]
So even though the probability of a loss is a bit higher for X than for Xi , the expected loss
given default is dramatically lower for X than for Xi . Therefore, from a regulatory point of
view, X is a much better risk than Xi .
For this reason, one might want to look at a more conservative, i.e., larger risk measure
than Value at Risk, which also takes into account the size of the loss given default.
Definition 5.8. Let (Ω, F, P) be a probability space and $\mathcal{X}$ the set of all real-valued random variables having finite first moments. Let α ∈ (0, 1) be a confidence level. For $X \in \mathcal{X}$, the Expected Shortfall of X at level α is given by
\[
\mathrm{ES}_\alpha(X) = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_u(X) \, du.
\]
Since $\mathrm{VaR}_u$ is nonincreasing in u, it follows that $\mathrm{ES}_\alpha(X) \ge \mathrm{VaR}_\alpha(X)$ for all α and all X. Moreover, one can show that unlike Value at Risk, Expected Shortfall is a coherent risk measure, i.e., it is in particular a convex risk measure; see [2, Theorem 4.52]. One can even show that it is “optimal” in the sense that it is the smallest law-invariant convex risk measure that is more conservative than Value at Risk; see [2, Theorem 4.67].
We state an alternative characterisation of Expected Shortfall for continuous distributions,
which shows that Expected Shortfall takes care of the size of the loss given default. For a
proof of this result, we refer to [7, Lemma 2.13].
Lemma 5.9. Let X be an integrable random variable on a probability space (Ω, F, P ). Suppose
that the distribution of X is continuous. Then for α ∈ (0, 1),
ESα (X) = E [−X | − X ≥ VaRα (X)] . (5.5)
The following example shows that Expected Shortfall encourages diversification and also
demonstrates that Lemma 5.9 is false without the assumption of a continuous distribution.
Example 5.10. Consider the setup of Example 5.7. Using (5.3), we obtain for i ∈ {1, 2},
\[
\mathrm{VaR}_u(X_i) = \begin{cases} 100 & \text{if } u \in (0, 0.01), \\ -90 & \text{if } u \in [0.01, 1). \end{cases}
\]
This gives for i ∈ {1, 2},
\[
\mathrm{ES}_{0.01}(X_i) = \frac{1}{0.01} \int_0^{0.01} \mathrm{VaR}_u(X_i) \, du = \frac{1}{0.01} \int_0^{0.01} 100 \, du = \frac{1}{0.01} \times 0.01 \times 100 = 100.
\]
Moreover, using (5.4), we obtain
\[
\mathrm{VaR}_u(X) = \begin{cases} 100 & \text{if } u \in (0, 0.0001), \\ 5 & \text{if } u \in [0.0001, 0.0199), \\ -90 & \text{if } u \in [0.0199, 1). \end{cases}
\]
This gives
\[
\begin{aligned}
\mathrm{ES}_{0.01}(X) &= \frac{1}{0.01} \int_0^{0.01} \mathrm{VaR}_u(X) \, du = \frac{1}{0.01} \left( \int_0^{0.0001} 100 \, du + \int_{0.0001}^{0.01} 5 \, du \right) \\
&= \frac{1}{0.01} (0.0001 \times 100 + 0.0099 \times 5) = 5.95.
\end{aligned}
\]
So the Expected Shortfall of the “diversified position” X is significantly lower than the Expected Shortfall of the individual positions Xi.
Moreover, note that
\[
E[-X \mid -X \ge \mathrm{VaR}_{0.01}(X)] = E[-X \mid -X \ge 5] = \frac{100 \times 0.0001 + 5 \times 0.0198}{0.0199} \approx 5.477 < 5.95 = \mathrm{ES}_{0.01}(X).
\]
This shows that (5.5) is not true in this example.
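The integrals in Example 5.10 can be evaluated exactly for finite distributions, because u ↦ VaR_u(X) is a step function: with the values of X sorted increasingly, VaR_u(X) equals minus the k-th value for u between the cumulative probabilities before and after that value. The following is a minimal sketch under this observation; the function names are illustrative.

```python
from fractions import Fraction

def expected_shortfall(dist, alpha):
    """ES_alpha = (1 / alpha) * integral of VaR_u over (0, alpha), for a finite
    distribution {value: probability}, using that VaR_u is a step function."""
    total = Fraction(0)
    cumulative = Fraction(0)
    for v in sorted(dist):
        lower, cumulative = cumulative, cumulative + dist[v]
        overlap = min(cumulative, alpha) - lower   # length of [lower, cumulative) within (0, alpha)
        if overlap <= 0:
            break
        total += -v * overlap                      # VaR_u = -v on this interval
    return total / alpha

def tail_conditional_loss(dist, threshold):
    """E[-X | -X >= threshold] for a finite distribution."""
    tail = {v: p for v, p in dist.items() if -v >= threshold}
    return sum(-v * p for v, p in tail.items()) / sum(tail.values())

single = {Fraction(-100): Fraction(1, 100), Fraction(90): Fraction(99, 100)}
diversified = {
    Fraction(-100): Fraction(1, 10000),
    Fraction(-5): Fraction(198, 10000),
    Fraction(90): Fraction(9801, 10000),
}

alpha = Fraction(1, 100)
es_single = expected_shortfall(single, alpha)
es_diversified = expected_shortfall(diversified, alpha)
tail = tail_conditional_loss(diversified, 5)
print(es_single, es_diversified, float(tail))
```

The results are 100 and 119/20 = 5.95 for the two Expected Shortfalls and 1090/199 ≈ 5.477 for the conditional tail loss, in line with the example.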
6 Pricing and Hedging in Finite Discrete Time
The goal of this chapter is to describe pricing and hedging of derivative contracts like call or put options in financial markets in finite discrete time.
6.1 Conditional expectations

One of the key objects of mathematical finance is the martingale. In order to define and study martingales in the next section, we first need to introduce the measure theoretic version of a conditional expectation.
Before giving the precise definition, let us explain the underlying idea. Consider a random variable X on some probability space (Ω, F, P). If we have full information, i.e., we know F, for each state of the world ω ∈ Ω, we know if ω ∈ A or ω ∉ A for all A ∈ F. In particular, as {X = x} ∈ F for all x ∈ R, we can fully observe X. But if we have only partial information, i.e., we are given a sub-σ-algebra G ⊂ F, for each state of the world ω ∈ Ω, we know if ω ∈ A or ω ∉ A only for all A ∈ G. In particular, as {X = x} may not be in G, we can no longer fully observe X. The extreme case is that we have trivial information, i.e., G = {∅, Ω}, where we can only assert that ω ∈ Ω. So how can we make a best prognosis for X if we have only partial information, i.e., if we are given a sub-σ-algebra G? For trivial information, i.e., G = {∅, Ω}, this best prognosis is the expectation E[X]. The conditional expectation generalises the concept of expectation to general partial information.
Definition 6.1. Let (Ω, F, P ) be a probability space, G ⊂ F a sub-σ-algebra, and X an

F-measurable integrable random variable. Any integrable random variable Y satisfying
(1) Y is G-measurable,
(2) E [Y 1A ] = E [X1A ] for all A ∈ G,
is called (a version of ) the conditional expectation of X given G, and we write Y =: E [X | G].
The random variable Y in Definition 6.1 is to be interpreted as the best prognosis for
X given the information G. The measurability property (1) ensures that Y only uses the
information given in G, and the averaging property (2) ensures that Y is indeed the best
prognosis.
Example 6.2. Let (Ω, F, P) be a probability space and A1, . . . , AN an F-measurable partition of Ω, i.e., An ∈ F for n ∈ {1, . . . , N}, $\bigcup_{n=1}^N A_n = \Omega$ and $A_i \cap A_j = \emptyset$ for i ≠ j. Let
\[
\mathcal{G} := \sigma(A_1, \dots, A_N),
\]
i.e., G is the smallest σ-algebra containing A1, . . . , AN. One can check that A ∈ G if and only if there are indices n1, . . . , nK ∈ {1, . . . , N} such that $A = \bigcup_{k=1}^K A_{n_k}$. One can then show that a random variable Y : Ω → R is G-measurable if and only if it can be written as
\[
Y = \sum_{n=1}^N y_n 1_{A_n}, \tag{6.1}
\]
for y1, . . . , yN ∈ R, i.e., Y is constant on each An.
If X is an F-measurable integrable random variable, then
\[
Y := \sum_{n=1}^N E[X \mid A_n] 1_{A_n} \tag{6.2}
\]
is (a version of) the conditional expectation of X given G, where for A ∈ F, the elementary conditional expectation of X given A is defined by
\[
E[X \mid A] := \begin{cases} \dfrac{E[X 1_A]}{P[A]} & \text{if } P[A] > 0, \\ 0 & \text{if } P[A] = 0. \end{cases} \tag{6.3}
\]
Indeed, it follows from (6.1) that Y satisfies the measurability property (1) of a conditional expectation. To check that Y also satisfies the averaging property (2), let A ∈ G. Then there are indices n1, . . . , nK ∈ {1, . . . , N} such that $A = \bigcup_{k=1}^K A_{n_k}$. Note that by definition of Y and (6.3), for each k ∈ {1, . . . , K},
\[
E\left[Y 1_{A_{n_k}}\right] = E\left[E[X \mid A_{n_k}] 1_{A_{n_k}}\right] = E[X \mid A_{n_k}] P[A_{n_k}] = E\left[X 1_{A_{n_k}}\right].
\]
This and linearity of the expectation give
\[
E[Y 1_A] = \sum_{k=1}^K E\left[Y 1_{A_{n_k}}\right] = \sum_{k=1}^K E\left[X 1_{A_{n_k}}\right] = E[X 1_A].
\]
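Formulas (6.2) and (6.3) translate directly into code on a finite sample space. The following is a minimal sketch; the fair die, the random variable X(ω) = ω and the partition are hypothetical choices made only for illustration.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                     # hypothetical example: a fair die
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w) for w in omega}            # X(omega) = omega
partition = [{1, 2}, {3, 4, 5}, {6}]           # A_1, A_2, A_3

def expectation(f, event=None):
    """E[f 1_event]; the whole space is used if no event is given."""
    return sum(P[w] * f[w] for w in omega if event is None or w in event)

def conditional_expectation(f, partition):
    """Version of E[f | G] for G generated by the partition, following (6.2)-(6.3)."""
    Y = {}
    for A in partition:
        prob = sum(P[w] for w in A)
        value = expectation(f, A) / prob if prob > 0 else Fraction(0)
        for w in A:
            Y[w] = value                       # Y is constant on each A_n
    return Y

Y = conditional_expectation(X, partition)
print(Y)

# Averaging property (2) on an event of G, here A_1 union A_3.
A = {1, 2, 6}
print(expectation(Y, A), expectation(X, A))
```

Here Y takes the values 3/2, 4 and 6 on the three blocks, and the two expectations in the last line coincide, as required by property (2) of Definition 6.1.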
The following result gives existence, uniqueness and further properties of conditional expectations.⁶²
Theorem 6.3. Let (Ω, F, P) be a probability space, G ⊂ F a sub-σ-algebra, and X an F-measurable integrable random variable. Then the conditional expectation E[X | G] exists and is P-a.s. unique, i.e., if Y and Y′ are random variables satisfying properties (1) and (2) in Definition 6.1, then Y = Y′ P-a.s. Moreover, we have the following properties:
(a) E[E[X | G]] = E[X].
(b) If X is G-measurable (e.g. a constant), then E[X | G] = X P-a.s.
(c) If G = {∅, Ω}, then E[X | G] = E[X].
(d) If X1 and X2 are integrable random variables and a, b ∈ R, then
\[
E[aX_1 + bX_2 \mid \mathcal{G}] = aE[X_1 \mid \mathcal{G}] + bE[X_2 \mid \mathcal{G}] \quad P\text{-a.s.}
\]
(e) If X1 and X2 are integrable random variables with X1 ≥ X2 P-a.s., then
\[
E[X_1 \mid \mathcal{G}] \ge E[X_2 \mid \mathcal{G}] \quad P\text{-a.s.}
\]
If in addition P[X1 > X2] > 0, then P[E[X1 | G] > E[X2 | G]] > 0.
(f) If H ⊂ G is a sub-σ-algebra of G, then
\[
E[E[X \mid \mathcal{G}] \mid \mathcal{H}] = E[X \mid \mathcal{H}] \quad P\text{-a.s.}
\]
(g) If Z is G-measurable and ZX is integrable, then
\[
E[ZX \mid \mathcal{G}] = ZE[X \mid \mathcal{G}] \quad P\text{-a.s.}
\]
(h) If X is independent of G, i.e., P[{X ∈ B} ∩ A] = P[X ∈ B]P[A] for all $B \in \mathcal{B}_{\mathbb{R}}$ and A ∈ G, then
\[
E[X \mid \mathcal{G}] = E[X] \quad P\text{-a.s.}
\]
(i) If U : R → R is convex⁶³ and E[|U(X)|] < ∞, then
\[
E[U(X) \mid \mathcal{G}] \ge U(E[X \mid \mathcal{G}]) \quad P\text{-a.s.}
\]
In the above theorem, property (d) is referred to as linearity of conditional expectations, property (e) is referred to as monotonicity of conditional expectations, property (f) is referred to as tower property of conditional expectations, property (g) is referred to as pull-out property of conditional expectations, property (h) is referred to as independence property of conditional expectations, and property (i) is referred to as Jensen’s inequality for conditional expectations.
⁶² For a proof, we refer to [5, Chapter 23].
6.2 Filtrations and martingales

We proceed to study stochastic processes on a probability space (Ω, F, P ). By definition,
a stochastic process is simply a family of random variables indexed by time, where in our
context, the index set is either {0, . . . , T } or {1, . . . , T }.
The basic definition of a stochastic process does not say anything about the flow of information. To model the latter, we assume that the information available at time t is described by the σ-algebra $\mathcal{F}_t$. As information increases over time, it is natural to assume that
\[
\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_T, \tag{6.4}
\]
and we call any increasing sequence of σ-algebras $(\mathcal{F}_t)_{t \in \{0,\dots,T\}}$ satisfying (6.4) a filtration and $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0,\dots,T\}}, P)$ a filtered probability space. To simplify the presentation, we always assume that $\mathcal{F}_0 = \{\emptyset, \Omega\}$ (“trivial information”) and $\mathcal{F}_T = \mathcal{F}$ (“full information”). With regard to a filtration $(\mathcal{F}_t)_{t \in \{0,\dots,T\}}$, there are two important notions for stochastic processes.
⁶³ Note that unlike in Lemma 4.13, we consider a convex instead of concave function, so the direction of the inequality is changed.
Definition 6.4. Let (Ω, F, (Ft )t∈{0,...,T } , P ) be a filtered probability space.
• A stochastic process X = (Xt )t∈{0,...,T } is called adapted (to the filtration (Ft )t∈{0,...,T } ),
if each Xt is Ft -measurable.
• A stochastic process $Y = (Y_t)_{t \in \{1,\dots,T\}}$ is called predictable⁶⁴ (for the filtration $(\mathcal{F}_t)_{t \in \{0,\dots,T\}}$), if each $Y_t$ is $\mathcal{F}_{t-1}$-measurable.
If a stochastic process X = (Xt )t∈{0,...,T } is adapted, then at each time t, we are given the
information Ft and so can fully observe Xt (and also X0 , . . . , Xt−1 ). By contrast, we may not
be able to fully observe Xt+1 , . . . , XT . If a stochastic process Y = (Yt )t∈{1,...,T } is predictable,
Yt (and also Y1 , . . . , Yt−1 ) can not only be fully observed at time t but already at time t − 1,
i.e., one period ahead. So we can accurately “predict” Yt already at time t − 1.
For an adapted process X = (Xt )t∈{0,...,T } , knowledge about Xt does not give any knowl-
edge about Xt+1 in general. The special case that Xt gives the best available information
about Xt+1 , i.e., Xt = E [Xt+1 | Ft ] P -a.s. leads to the concept of a martingale.
Definition 6.5. Let M = (Mt )t∈{0,...,T } be a real-valued stochastic process on some filtered
probability space (Ω, F, (Ft )t∈{0,...,T } , P ). Then M is called a martingale (with respect to P
and (Ft )t∈{0,...,T } ) if
(1) M is adapted to (Ft )t∈{0,...,T } ;
(2) M is P -integrable, i.e., each Mt is P -integrable;
(3) E [Mt | Fs ] = Ms P -a.s. for all 0 ≤ s ≤ t ≤ T .
Remark 6.6. (a) In Definition 6.5, property (1) is referred to as adaptedness, property (2)
is referred to as integrability, and property (3), which is the crucial property, is referred to as
martingale property.
(b) The martingale property (3) in Definition 6.5 is equivalent to the formally weaker property
\[
E[M_t \mid \mathcal{F}_{t-1}] = M_{t-1} \quad P\text{-a.s. for all } t \in \{1, \dots, T\}. \tag{6.5}
\]
Indeed, if the martingale property (3) in Definition 6.5 is satisfied, then also (6.5) is satisfied (pick s = t − 1). Conversely, if (6.5) is satisfied, let 0 ≤ s ≤ t ≤ T. If s = t, then $E[M_t \mid \mathcal{F}_s] = E[M_t \mid \mathcal{F}_t] = M_t = M_s$ P-a.s. by adaptedness of M and the pull-out property of conditional expectations. Otherwise, there is n ∈ {1, . . . , T} such that s = t − n. The tower property of conditional expectations and (6.5) give
\[
\begin{aligned}
E[M_t \mid \mathcal{F}_s] &= E[M_t \mid \mathcal{F}_{t-n}] = E\left[E[M_t \mid \mathcal{F}_{t-1}] \mid \mathcal{F}_{t-n}\right] = E[M_{t-1} \mid \mathcal{F}_{t-n}] \\
&= E\left[E[M_{t-1} \mid \mathcal{F}_{t-2}] \mid \mathcal{F}_{t-n}\right] = E[M_{t-2} \mid \mathcal{F}_{t-n}] = \cdots \\
&= E\left[M_{t-(n-1)} \mid \mathcal{F}_{t-n}\right] = M_{t-n} = M_s \quad P\text{-a.s.}
\end{aligned}
\]
(c) If “=” in property (3) of Definition 6.5 is replaced by “≥”, M is called a submartingale, and if it is replaced by “≤”, M is called a supermartingale.
⁶⁴ Note that in our definition, predictable processes start at t = 1.
There are countless examples of martingales; here we just consider two.
Example 6.7. (a) Let (Ω, F, P) be a probability space and X1, . . . , XT independent integrable random variables with mean 0. Also assume that F = σ(X1, . . . , XT). Set $\mathcal{F}_0 := \{\emptyset, \Omega\}$ and
\[
\mathcal{F}_t := \sigma(X_1, \dots, X_t), \quad t \in \{1, \dots, T\},
\]
i.e., $\mathcal{F}_t$ is the smallest σ-algebra for which X1, . . . , Xt are measurable. Define the process $M = (M_t)_{t \in \{0,\dots,T\}}$ by⁶⁵
\[
M_t := \sum_{i=1}^t X_i.
\]
Then M is a martingale. Indeed, fix t ∈ {0, . . . , T}. Then $M_t$ is the sum of $\mathcal{F}_t$-measurable and integrable random variables and hence again $\mathcal{F}_t$-measurable and integrable. This gives adaptedness and integrability of M. To check the martingale property, we use the alternative characterisation (6.5). So let t ∈ {1, . . . , T}. By linearity, the pull-out property, the independence property of conditional expectations (using that $X_t$ is independent of $\mathcal{F}_{t-1}$), and the fact that $E[X_t] = 0$, we obtain
\[
\begin{aligned}
E[M_t \mid \mathcal{F}_{t-1}] &= E[M_{t-1} + X_t \mid \mathcal{F}_{t-1}] = E[M_{t-1} \mid \mathcal{F}_{t-1}] + E[X_t \mid \mathcal{F}_{t-1}] \\
&= M_{t-1} + E[X_t] = M_{t-1} \quad P\text{-a.s.}
\end{aligned}
\]
So M is a martingale.
(b) Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0,\dots,T\}}, P)$ be a filtered probability space and Z an F-measurable integrable random variable. Then the process $M = (M_t)_{t \in \{0,\dots,T\}}$ defined by
\[
M_t := E[Z \mid \mathcal{F}_t]
\]
is a martingale with $M_T = Z$ P-a.s. Indeed, adaptedness and integrability of M follow from the definition of conditional expectations. The martingale property of M follows from the tower property of conditional expectations, and $M_T = Z$ P-a.s. follows from the pull-out property of conditional expectations and our standing assumption that $\mathcal{F}_T = \mathcal{F}$.
⁶⁵ By convention, $\sum_{i=1}^0 X_i := 0$.
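The one-step martingale property (6.5) for the sum of independent mean-zero increments in part (a) can be checked by brute force on a finite path space. The following is a minimal sketch, assuming i.i.d. increments with a finite law; by Example 6.2, the conditional expectation given F_{t−1} is computed separately on each atom {(X_1, ..., X_{t−1}) = prefix}. The function name and the two step laws are illustrative.

```python
from fractions import Fraction
from itertools import product

def satisfies_one_step_property(step_law, T):
    """Check (6.5) for M_t = X_1 + ... + X_t with i.i.d. steps X_i ~ step_law."""
    for t in range(1, T + 1):
        # Atoms of F_{t-1}: all possible values of (X_1, ..., X_{t-1}).
        for prefix in product(step_law, repeat=t - 1):
            m_prev = sum(prefix)                   # M_{t-1} on this atom
            # X_t is independent of the prefix, so average over its law.
            cond_exp = sum(p * (m_prev + s) for s, p in step_law.items())
            if cond_exp != m_prev:
                return False
    return True

fair = {+1: Fraction(1, 2), -1: Fraction(1, 2)}    # mean 0
biased = {+1: Fraction(3, 4), -1: Fraction(1, 4)}  # mean 1/2

print(satisfies_one_step_property(fair, T=3))      # True
print(satisfies_one_step_property(biased, T=3))    # False
```

With the biased law the conditional expectation exceeds M_{t−1} by 1/2 on every atom, so the process is a submartingale but not a martingale.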
6.3 Financial markets in finite discrete time

We now consider a financial market with 1 + d assets as in Chapter 2, but now assume that the assets are priced at times t = 0, 1, . . . , T for some finite time horizon T ∈ N. We work on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0,\dots,T\}}, P)$, where the filtration $(\mathcal{F}_t)_{t \in \{0,\dots,T\}}$ describes the flow of information. We always assume without further mentioning that $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_T = \mathcal{F}$. We model the assets as adapted stochastic processes $(S^i_t)_{t \in \{0,\dots,T\}}$, i ∈ {0, . . . , d}. We assume that $S^0$ is locally riskless in the sense that $S^0_t$ is already known one period ahead, i.e., $S^0_t$ is $\mathcal{F}_{t-1}$-measurable for t ∈ {1, . . . , T}. More precisely, we assume that
\[
S^0_t := \prod_{k=1}^t (1 + r_k),
\]
where $r_k > -1$ P-a.s. is $\mathcal{F}_{k-1}$-measurable and denotes the interest rate in period k, i.e., from k − 1 to k. So the process $(r_t)_{t \in \{1,\dots,T\}}$ is predictable. We also refer to $S^0$ as bank account.
We set as in Chapter 2,
\[
S_t = (S^1_t, \dots, S^d_t) \quad \text{and} \quad \bar S_t = (S^0_t, S_t), \quad t \in \{0, \dots, T\},
\]
and call the $\mathbb{R}^d$-valued stochastic process $S = (S^1_t, \dots, S^d_t)_{t \in \{0,\dots,T\}}$ the risky assets.
Example 6.8 (Binomial model). Assume that d = 1, i.e., there is only one risky asset. Let r > −1 and u > d > −1. Assume that the bank account is given by
\[
S^0_t = (1 + r)^t, \quad t \in \{0, \dots, T\}.
\]
Moreover, assume that the risky asset $S^1 = (S^1_t)_{t \in \{0,\dots,T\}}$ is given by
\[
S^1_t = S^1_0 \prod_{i=1}^t Y_i,
\]
where $S^1_0 > 0$ and Y1, . . . , YT are i.i.d. random variables on some suitable probability space (Ω, F, P) satisfying
\[
P[Y_i = 1 + u] = p_1 \quad \text{and} \quad P[Y_i = 1 + d] = p_2,
\]
where p1, p2 > 0 and p1 + p2 = 1. We assume that the filtration $(\mathcal{F}_t)_{t \in \{0,\dots,T\}}$ is given by
\[
\mathcal{F}_t = \sigma(S^1_0, S^1_1, \dots, S^1_t),
\]
i.e., $\mathcal{F}_t$ is the smallest σ-algebra with respect to which $S^1_0, \dots, S^1_t$ are measurable. One can check that $\mathcal{F}_t = \sigma(Y_1, \dots, Y_t)$ for t ∈ {1, . . . , T}, i.e., $\mathcal{F}_t$ is also the smallest σ-algebra generated by Y1, . . . , Yt.
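For small T, e.g. T = 3, the above model can be nicely illustrated by the following trees, where the transition probabilities are attached to the branches. For convenience, we assume that $S^1_0 = 1$.
Tree for S^0 (each transition has probability 1):
  t = 0: 1
  t = 1: 1 + r
  t = 2: (1 + r)^2
  t = 3: (1 + r)^3
Tree for S^1 (from each node, the upper branch has probability p1 and multiplies by 1 + u; the lower branch has probability p2 and multiplies by 1 + d):
  t = 0: 1
  t = 1: 1 + u,  1 + d
  t = 2: (1 + u)^2,  (1 + u)(1 + d),  (1 + d)^2
  t = 3: (1 + u)^3,  (1 + u)^2 (1 + d),  (1 + u)(1 + d)^2,  (1 + d)^3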
Let us finally describe how to give a rigorous description of the binomial model. This is for instance important for implementing the binomial model on a computer. For Ω, we take the path space
\[
\Omega := \{1, 2\}^T = \left\{\omega = (x_1, \dots, x_T) : x_1, \dots, x_T \in \{1, 2\}\right\},
\]
i.e., each ω = (x1, . . . , xT) describes one path in the tree corresponding to the model; e.g. ω = (1, . . . , 1) describes the state of the world that the stock goes up in each period. We set $\mathcal{F} := 2^\Omega$, define the random variables Y1, . . . , YT by
\[
Y_t(\omega) = Y_t((x_1, \dots, x_T)) = \begin{cases} 1 + u & \text{if } x_t = 1, \\ 1 + d & \text{if } x_t = 2, \end{cases}
\]
and let the probability measure P be given by
\[
P[\{\omega\}] := P[\{(x_1, \dots, x_T)\}] = \prod_{t=1}^T p_{x_t}.
\]
Next, we set for t ∈ {1, . . . , T} and x1, . . . , xt ∈ {1, 2},
\[
A_{(x_1, \dots, x_t)} := \left\{\omega = (\tilde x_1, \dots, \tilde x_T) \in \Omega : \tilde x_1 = x_1, \dots, \tilde x_t = x_t\right\}.
\]
Then $A_{(x_1, \dots, x_t)}$ denotes all states of the world with “path up to time t” given by (x1, . . . , xt). It is not difficult to check that
\[
\mathcal{F}_t = \sigma\left(A_{(x_1, \dots, x_t)} : x_1, \dots, x_t \in \{1, 2\}\right), \quad t \in \{1, \dots, T\},
\]
so for a state of the world ω = (x1, . . . , xT) ∈ Ω given the information in $\mathcal{F}_t$, we can determine the values of x1, . . . , xt but not the values of xt+1, . . . , xT, i.e., we can say if the stock went up or down in periods 1, . . . , t, but we cannot say if the stock will go up or down in periods t + 1, . . . , T.
Remark 6.9. Note that for the binomial model, the tree for S 1 is recombining, so that the
number of nodes only grows linearly in time. For non-recombining trees, the number of nodes
grows exponentially in time. This difference is very important from a computational/numerical
perspective.
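The path-space description above lends itself directly to a computer implementation. The following is a minimal sketch that enumerates Ω = {1, 2}^T, attaches the probability P[{ω}] and the price path of the risky asset to each ω, and then compares the number of paths with the number of distinct terminal prices. The function name and the parameter values are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

def binomial_paths(T, u, d, p1, s0=Fraction(1)):
    """Enumerate Omega = {1, 2}^T as triples (omega, P[{omega}], [S^1_0, ..., S^1_T])."""
    prob_of = {1: p1, 2: 1 - p1}
    factor_of = {1: 1 + u, 2: 1 + d}
    paths = []
    for omega in product((1, 2), repeat=T):
        prob = Fraction(1)
        prices = [s0]
        for x_t in omega:
            prob *= prob_of[x_t]                           # product of p_{x_t}
            prices.append(prices[-1] * factor_of[x_t])     # S^1_t = S^1_{t-1} * Y_t
        paths.append((omega, prob, prices))
    return paths

# Hypothetical parameters: u = 10%, d = -5%, p1 = 3/5, T = 3.
paths = binomial_paths(T=3, u=Fraction(1, 10), d=Fraction(-1, 20), p1=Fraction(3, 5))

total_probability = sum(prob for _, prob, _ in paths)
terminal_nodes = {prices[-1] for _, _, prices in paths}
expected_terminal = sum(prob * prices[-1] for _, prob, prices in paths)

print(len(paths), len(terminal_nodes))   # 8 paths, but only 4 terminal nodes
print(total_probability, expected_terminal)
```

The 2^T paths collapse to T + 1 terminal nodes because the tree is recombining, which is the point of Remark 6.9. The expected terminal price equals (p1(1 + u) + p2(1 + d))^T by independence of the Y_i.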
As in Chapter 2, we discount with $S^0$ (or take $S^0$ as numéraire) and define the discounted assets $X^0, \dots, X^d$ by
\[
X^i_t := \frac{S^i_t}{S^0_t}, \quad t \in \{0, \dots, T\}, \ i \in \{0, \dots, d\}.
\]
Then $X^0 \equiv 1$, and $X = (X^1_t, \dots, X^d_t)_{t \in \{0,\dots,T\}}$ expresses the value of the risky assets in units of the numéraire $S^0$.
6.4 Self-financing strategies

Describing trading in the multiperiod market $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ is more complicated than in the one-period setting of Chapter 2. For each asset $i$ and each trading period $t$, we have to specify the number $\vartheta^i_t$ of shares held in asset $i$ in period $t$, i.e., from $t-1$ to $t$. This quantity is in general no longer a number but a random variable. However, as we cannot look into the future, $\vartheta^i_t$ can only use the information available at the beginning of period $t$, i.e., at time $t-1$, so it must be $\mathcal{F}_{t-1}$-measurable.
Definition 6.10. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. A trading strategy is an $\mathbb{R}^{1+d}$-valued stochastic process $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ that is predictable.
If $\bar\vartheta$ is a trading strategy for $\bar S$, then $\vartheta^i_t S^i_{t-1}$ is the amount invested into asset $i$ at time $t-1$ (after trading), and $\vartheta^i_t S^i_t$ is the resulting value at time $t$ (before trading). So for all assets together, the amount invested at time $t-1$ (after trading) is
\[ \bar\vartheta_t \cdot \bar S_{t-1} = \sum_{i=0}^{d} \vartheta^i_t S^i_{t-1}, \]
and the resulting value at time $t$ (before trading) is
\[ \bar\vartheta_t \cdot \bar S_t = \sum_{i=0}^{d} \vartheta^i_t S^i_t. \]
So $\bar\vartheta_t \cdot \bar S_t$ is the pre-trading value of $\bar\vartheta$ at time $t$ and $\bar\vartheta_{t+1} \cdot \bar S_t$ is the post-trading value of $\bar\vartheta$ at time $t$. If we assume that we neither withdraw nor inject money at time $t$, we must have the “book-keeping identity”
\[ \bar\vartheta_t \cdot \bar S_t = \bar\vartheta_{t+1} \cdot \bar S_t \quad P\text{-a.s.}, \]
i.e., the pre-trading and the post-trading value of the strategy coincide. This leads to the notion of a self-financing strategy.
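Definition 6.11. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. A trading strategy $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ is called a self-financing strategy if
\[ \bar\vartheta_t \cdot \bar S_t = \bar\vartheta_{t+1} \cdot \bar S_t \quad P\text{-a.s. for } t \in \{1, \ldots, T-1\}. \tag{6.6} \]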
The self-financing condition (6.6) is extremely natural from an economic perspective. From
a mathematical perspective, however, it is a rather inconvenient constraint. For this reason,
we seek to find an alternative characterisation of self-financing strategies. It turns out that to
this end, we have to look at discounted quantities.
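Definition 6.12. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. For a trading strategy $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$, define the
• (discounted) value process $(V_t(\bar\vartheta))_{t \in \{0, \ldots, T\}}$ by
\[ V_0(\bar\vartheta) := \bar\vartheta_1 \cdot \bar X_0 \quad \text{and} \quad V_t(\bar\vartheta) := \bar\vartheta_t \cdot \bar X_t \ \text{ for } t \in \{1, \ldots, T\}; \]
• (discounted) gains process $(G_t(\vartheta))_{t \in \{0, \ldots, T\}}$ by
\[ G_0(\vartheta) := 0 \quad \text{and} \quad G_t(\vartheta) := \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1}) \ \text{ for } t \in \{1, \ldots, T\}. \]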
The name “value process” for $V(\bar\vartheta)$ comes from the fact that $V_t(\bar\vartheta)$ denotes the (discounted) value of the strategy $\bar\vartheta$ at time $t$ (after trading for $t = 0$ and before trading for $t \in \{1, \ldots, T\}$). To understand the name “gains process” for $G(\vartheta)$, first note that for $t \in \{1, \ldots, T\}$, by the fact that $X^0_t - X^0_{t-1} = 1 - 1 = 0$,
\[ G_t(\vartheta) - G_{t-1}(\vartheta) = \vartheta_t \cdot (X_t - X_{t-1}) = \bar\vartheta_t \cdot (\bar X_t - \bar X_{t-1}) = \bar\vartheta_t \cdot \bar X_t - \bar\vartheta_t \cdot \bar X_{t-1} \]
expresses the (discounted) gains (or losses) from trading in period $t$, i.e., from $t-1$ to $t$. So $G_t(\vartheta) = \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1})$ denotes the (discounted) accumulated gains (or losses) from trading up to time $t$.⁶⁶

The following result provides an equivalent characterisation of self-financing strategies.
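Proposition 6.13. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$ and $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ a trading strategy. Then the following are equivalent:
(a) $\bar\vartheta$ is self-financing.
(b) $\bar\vartheta_t \cdot \bar X_t = \bar\vartheta_{t+1} \cdot \bar X_t$ $P$-a.s. for $t \in \{1, \ldots, T-1\}$.
(c) $V_t(\bar\vartheta) = V_0(\bar\vartheta) + G_t(\vartheta) = \bar\vartheta_1 \cdot \bar X_0 + \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1})$ $P$-a.s. for $t \in \{0, \ldots, T\}$.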
Proof. Dividing both sides of (6.6) by $S^0_t$ shows that (a) is equivalent to (b). Moreover, (b) is equivalent to
\[ \bar\vartheta_{t+1} \cdot \bar X_{t+1} - \bar\vartheta_t \cdot \bar X_t = \bar\vartheta_{t+1} \cdot \bar X_{t+1} - \bar\vartheta_{t+1} \cdot \bar X_t = \bar\vartheta_{t+1} \cdot (\bar X_{t+1} - \bar X_t) = \vartheta_{t+1} \cdot (X_{t+1} - X_t) \quad P\text{-a.s.}, \ t \in \{1, \ldots, T-1\}, \tag{6.7} \]
where we have used in the last step that $X^0_{t+1} - X^0_t = 1 - 1 = 0$. Rewriting (6.7) by using the definition of the value and the gains process, it follows that (b) is equivalent to
\[ V_{t+1}(\bar\vartheta) - V_t(\bar\vartheta) = G_{t+1}(\vartheta) - G_t(\vartheta), \quad t \in \{1, \ldots, T-1\}. \tag{6.8} \]
Moreover, the definition of the value and the gains process and the fact that $X^0_1 - X^0_0 = 1 - 1 = 0$ give
\[ V_1(\bar\vartheta) - V_0(\bar\vartheta) = \bar\vartheta_1 \cdot (\bar X_1 - \bar X_0) = \vartheta_1 \cdot (X_1 - X_0) = G_1(\vartheta). \tag{6.9} \]
Now assuming (b), summing over (6.8) and then adding (6.9) gives (c), and assuming (c) and subtracting (c) for $t$ from (c) for $t+1$ gives (6.8), which in turn is equivalent to (b).
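⁶⁶ Note that the value process $V(\bar\vartheta)$ depends on all $1+d$ coordinates of $\bar\vartheta$, whereas the gains process $G(\vartheta)$ only depends on the last $d$ coordinates $\vartheta$ of $\bar\vartheta$.

The equivalence of (a) and (c) in Proposition 6.13 has a very important consequence. Any pair $(V_0, \vartheta)$, where $V_0 \in \mathbb{R}$ and $\vartheta = (\vartheta_t)_{t \in \{1, \ldots, T\}}$ is an $\mathbb{R}^d$-valued predictable process, can be identified with the self-financing strategy $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ whose value process satisfies
\[ V_t(\bar\vartheta) = V_0 + G_t(\vartheta), \quad t \in \{0, \ldots, T\}. \tag{6.10} \]
More precisely, define the process $(\vartheta^0_t)_{t \in \{1, \ldots, T\}}$ by
\[ \vartheta^0_t := V_0 + G_t(\vartheta) - \vartheta_t \cdot X_t = V_0 + G_{t-1}(\vartheta) - \vartheta_t \cdot X_{t-1}, \tag{6.11} \]
and set $\bar\vartheta := (\vartheta^0, \vartheta)$. It follows from the second equality in (6.11) that $\vartheta^0$ and hence also $\bar\vartheta$ are predictable. Moreover, by the definition of the value process and (6.11), we obtain
\[ V_0(\bar\vartheta) = \vartheta^0_1 + \vartheta_1 \cdot X_0 = V_0 + G_0(\vartheta) - \vartheta_1 \cdot X_0 + \vartheta_1 \cdot X_0 = V_0, \]
\[ V_t(\bar\vartheta) = \vartheta^0_t + \vartheta_t \cdot X_t = V_0 + G_t(\vartheta) - \vartheta_t \cdot X_t + \vartheta_t \cdot X_t = V_0 + G_t(\vartheta), \quad t \in \{1, \ldots, T\}. \]
Hence $\bar\vartheta$ satisfies (6.10) and is self-financing by Proposition 6.13. We will make use of the identification (6.10) throughout the rest of the chapter. To this end, we introduce the shorthand notation
\[ \bar\vartheta \mathrel{\widehat{=}} (V_0, \vartheta). \]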
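The identification can be checked numerically along a single path. The following sketch assumes one risky asset ($d = 1$) and takes the discounted prices and risky holdings along one fixed $\omega$ as plain lists; the function name `complete_strategy` and all numbers are hypothetical and only serve as an illustration of (6.10) and (6.11).

```python
from math import isclose


def complete_strategy(v0, theta, x):
    """Build theta^0 via (6.11) along one path.

    v0    : initial capital V_0
    theta : risky holdings, theta[t] for t = 1..T (theta[0] is unused)
    x     : discounted risky prices, x[t] for t = 0..T
    Returns (theta0, values) with theta0[t] for t = 1..T and
    values[t] = V_0 + G_t(theta) for t = 0..T.
    """
    T = len(x) - 1
    gains = [0.0]
    for t in range(1, T + 1):
        gains.append(gains[-1] + theta[t] * (x[t] - x[t - 1]))
    # Second form of (6.11): only information up to time t - 1 is used.
    theta0 = [None] + [v0 + gains[t - 1] - theta[t] * x[t - 1]
                       for t in range(1, T + 1)]
    values = [v0 + g for g in gains]
    return theta0, values


# Hypothetical path and holdings.
x = [1.0, 1.1, 0.99, 1.089]
theta = [None, 2.0, -1.0, 0.5]
theta0, values = complete_strategy(10.0, theta, x)

# Self-financing condition (b) of Proposition 6.13 at times t = 1, ..., T - 1:
# pre-trading value theta0[t] + theta[t] * x[t] equals post-trading value.
self_financing = all(
    isclose(theta0[t] + theta[t] * x[t], theta0[t + 1] + theta[t + 1] * x[t])
    for t in range(1, len(x) - 1)
)
```

In this sketch the bank account holding `theta0` is never chosen freely: it is whatever is needed to keep the strategy self-financing given the initial capital and the risky holdings.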
6.5 The Fundamental Theorem of Asset Pricing revisited

We proceed to extend the definition of an arbitrage opportunity from Chapter 2 to the mul-
tiperiod setup.
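Definition 6.14. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. A self-financing strategy $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ is called an arbitrage opportunity if
\[ \bar\vartheta_1 \cdot \bar S_0 \le 0, \quad \bar\vartheta_T \cdot \bar S_T \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[\bar\vartheta_T \cdot \bar S_T > 0] > 0. \]
The financial market $\bar S$ is called arbitrage-free if there are no arbitrage opportunities. In this case one also says that $\bar S$ satisfies NA.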
Remark 6.15. By the same argument as in Remark 2.4, one can show that if the market $\bar S$ admits arbitrage, there always exists an arbitrage opportunity with $\bar\vartheta_1 \cdot \bar S_0 = 0$.
The following result is the multiperiod version of Proposition 2.7. Its proof is left as an
exercise.
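Proposition 6.16. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. The following are equivalent:
(a) The market $\bar S$ is arbitrage-free.
(b) There does not exist a predictable process $\vartheta = (\vartheta^1_t, \ldots, \vartheta^d_t)_{t \in \{1, \ldots, T\}}$ such that
\[ G_T(\vartheta) \ge 0 \ P\text{-a.s.} \quad \text{and} \quad P[G_T(\vartheta) > 0] > 0. \]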
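We next extend the definition of an equivalent martingale measure to the multiperiod setting.
Definition 6.17. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. Denote by $X := S/S^0$ the discounted risky assets. A measure $Q$ on $(\Omega, \mathcal{F})$ is called an equivalent martingale measure (EMM) for $X$ if $Q \approx P$ and each $X^i$ is a $Q$-martingale, i.e., a martingale under the measure $Q$.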
The following result on equivalent martingale measures is a version of Doob’s famous “systems theorem” for martingales.
Theorem 6.18. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. Denote by $X := S/S^0$ the discounted risky assets. Let $Q \approx P$ be an equivalent measure. Then the following are equivalent:
(a) $Q$ is an EMM for $X$.
(b) For all self-financing strategies $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ for which $\vartheta$ is bounded, the value process $V(\bar\vartheta)$ is a $Q$-martingale.⁶⁷
(c) For all self-financing strategies $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ with $V_T(\bar\vartheta) \ge 0$ $Q$-a.s., the value process $V(\bar\vartheta)$ is a (nonnegative) $Q$-martingale.
Proof. We only establish (a) ⇒ (b).⁶⁸ So let $Q$ be an EMM for $X$ and $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ a self-financing strategy for which $\vartheta$ is bounded. Proposition 6.13 gives
\[ V_t(\bar\vartheta) = V_0(\bar\vartheta) + G_t(\vartheta) = V_0(\bar\vartheta) + \sum_{k=1}^{t} \vartheta_k \cdot (X_k - X_{k-1}), \quad t \in \{0, \ldots, T\}. \tag{6.12} \]
Adaptedness of $V(\bar\vartheta)$ follows from adaptedness of $X$ and predictability of $\vartheta$.
To show $Q$-integrability of $V(\bar\vartheta)$, fix $t \in \{0, \ldots, T\}$ and let $c > 0$ be such that $|\vartheta^i_k| \le c$ for all $i \in \{1, \ldots, d\}$ and all $k \in \{1, \ldots, T\}$. Then by (6.12),
\[ |V_t(\bar\vartheta)| = \Big| V_0(\bar\vartheta) + \sum_{k=1}^{t} \sum_{i=1}^{d} \vartheta^i_k (X^i_k - X^i_{k-1}) \Big| \le |V_0(\bar\vartheta)| + \sum_{k=1}^{t} \sum_{i=1}^{d} c \big( |X^i_k| + |X^i_{k-1}| \big). \]
As $|V_0(\bar\vartheta)|$ is a constant and each $|X^i_k|$ is $Q$-integrable by the fact that each $X^i$ is a $Q$-martingale, it follows that $|V_t(\bar\vartheta)|$ and hence also $V_t(\bar\vartheta)$ is $Q$-integrable.
⁶⁷ Note that $(\vartheta^0_t)_{t \in \{1, \ldots, T\}}$ might be unbounded.
⁶⁸ For a proof of the more difficult directions (b) ⇒ (c) and (c) ⇒ (a), we refer to [2, Theorem 5.14].
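To check the $Q$-martingale property of $V(\bar\vartheta)$ (in the form of (6.5)), fix $t \in \{1, \ldots, T\}$. Then by (6.12), the fact that $V_{t-1}(\bar\vartheta)$ and $\vartheta_t$ are $\mathcal{F}_{t-1}$-measurable, linearity and the pull-out property of conditional expectations, and the $Q$-martingale property of each $X^i$, we obtain
\begin{align*}
E^Q\big[ V_t(\bar\vartheta) \,\big|\, \mathcal{F}_{t-1} \big] &= E^Q\big[ V_{t-1}(\bar\vartheta) + \vartheta_t \cdot (X_t - X_{t-1}) \,\big|\, \mathcal{F}_{t-1} \big] \\
&= V_{t-1}(\bar\vartheta) + \sum_{i=1}^{d} E^Q\big[ \vartheta^i_t (X^i_t - X^i_{t-1}) \,\big|\, \mathcal{F}_{t-1} \big] \\
&= V_{t-1}(\bar\vartheta) + \sum_{i=1}^{d} \vartheta^i_t \, E^Q\big[ X^i_t - X^i_{t-1} \,\big|\, \mathcal{F}_{t-1} \big] \\
&= V_{t-1}(\bar\vartheta) + \sum_{i=1}^{d} \vartheta^i_t (X^i_{t-1} - X^i_{t-1}) \\
&= V_{t-1}(\bar\vartheta) \quad Q\text{-a.s.}
\end{align*}
Thus $V(\bar\vartheta)$ is a $Q$-martingale.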
We proceed to formulate the fundamental theorem of asset pricing in the multiperiod setup.
Theorem 6.19 (Fundamental Theorem of Asset Pricing). Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. The following are equivalent:
(a) The market $\bar S$ is arbitrage-free.
(b) There exists an EMM for the discounted risky assets $X = S/S^0$.
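Proof. We only establish the easy direction (b) ⇒ (a). So let $Q \approx P$ be an EMM and $\vartheta = (\vartheta^1_t, \ldots, \vartheta^d_t)_{t \in \{1, \ldots, T\}}$ a predictable process with $G_T(\vartheta) \ge 0$ $P$-a.s. By Proposition 6.16, it suffices to show that $G_T(\vartheta) = 0$ $P$-a.s. As $Q \approx P$, we have $G_T(\vartheta) \ge 0$ $Q$-a.s. and it suffices to show that $G_T(\vartheta) = 0$ $Q$-a.s. Set $\bar\vartheta \mathrel{\widehat{=}} (0, \vartheta)$. Then $V_0(\bar\vartheta) = 0$ and $V_T(\bar\vartheta) = 0 + G_T(\vartheta) \ge 0$ $Q$-a.s. Theorem 6.18 gives that $V(\bar\vartheta)$ is a $Q$-martingale, which implies in particular that
\[ 0 = V_0(\bar\vartheta) = E^Q\big[ V_T(\bar\vartheta) \,\big|\, \mathcal{F}_0 \big] = E^Q\big[ V_T(\bar\vartheta) \big]. \]
As $V_T(\bar\vartheta) \ge 0$ $Q$-a.s., this yields $V_T(\bar\vartheta) = G_T(\vartheta) = 0$ $Q$-a.s.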
We proceed to illustrate the FTAP in the case of the Binomial model.

Example 6.20. Consider the Binomial model from Example 6.8. Set $y_1 := 1 + u$ and $y_2 := 1 + d$ so that each $Y_k$ takes the values $y_1$ or $y_2$. Any probability measure $Q$ on $\mathcal{F}$ can be described by
\[ Q[\{(x_1, \ldots, x_T)\}] = q_{x_1} \, q_{x_2 | x_1} \, q_{x_3 | (x_1, x_2)} \times \cdots \times q_{x_T | (x_1, \ldots, x_{T-1})}, \]
where $q_{x_1} = Q[Y_1 = y_{x_1}]$ and
\[ q_{x_t | (x_1, \ldots, x_{t-1})} := Q[Y_t = y_{x_t} \mid Y_1 = y_{x_1}, \ldots, Y_{t-1} = y_{x_{t-1}}], \quad t \in \{2, \ldots, T\}. \]
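By Example 2.9, $Q \approx P$ if and only if $q_{x_1}, \ldots, q_{x_T | (x_1, \ldots, x_{T-1})} > 0$ for all $x_1, \ldots, x_T \in \{1, 2\}$. Moreover, if $Q \approx P$, then by the equivalent characterisation (6.5) of the martingale property, $Q$ is an EMM for $X^1$ if and only if⁶⁹
\[ E^Q\big[ X^1_t \,\big|\, \mathcal{F}_{t-1} \big] = X^1_{t-1} \quad Q\text{-a.s.}, \ t \in \{1, \ldots, T\}. \]
By the fact that
\[ X^1_t = S^1_0 \prod_{k=1}^{t} \frac{Y_k}{1+r} = X^1_{t-1} \frac{Y_t}{1+r}, \]
it follows from the pull-out property of conditional expectations that $X^1$ is a $Q$-martingale if and only if, for all $t \in \{1, \ldots, T\}$,
\[ E^Q\Big[ \frac{Y_t}{1+r} \,\Big|\, \mathcal{F}_{t-1} \Big] = 1 \ Q\text{-a.s.} \quad \Leftrightarrow \quad E^Q[Y_t \mid \mathcal{F}_{t-1}] = 1 + r \ Q\text{-a.s.} \]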
It follows from the properties of conditional expectations, the fact that
\[ \mathcal{F}_{t-1} = \sigma\big( A_{(x_1, \ldots, x_{t-1})} : x_1, \ldots, x_{t-1} \in \{1, 2\} \big) \]
and Example 6.2 that
\[ E^Q[Y_1 \mid \mathcal{F}_0] = E^Q[Y_1] = q_1 (1+u) + q_2 (1+d), \]
\[ E^Q[Y_t \mid \mathcal{F}_{t-1}] = \sum_{x_1, \ldots, x_{t-1} \in \{1, 2\}} E^Q\big[ Y_t \,\big|\, A_{(x_1, \ldots, x_{t-1})} \big] \, 1_{A_{(x_1, \ldots, x_{t-1})}} \quad Q\text{-a.s.} \]
Moreover, for $x_1, \ldots, x_{t-1} \in \{1, 2\}$,
\[ E^Q\big[ Y_t \,\big|\, A_{(x_1, \ldots, x_{t-1})} \big] = (1+u) \, q_{1 | (x_1, \ldots, x_{t-1})} + (1+d) \, q_{2 | (x_1, \ldots, x_{t-1})}. \]
So $X^1$ is a $Q$-martingale if and only if
\[ q_1 (1+u) + q_2 (1+d) = 1 + r, \]
\[ (1+u) \, q_{1 | (x_1, \ldots, x_{t-1})} + (1+d) \, q_{2 | (x_1, \ldots, x_{t-1})} = 1 + r, \quad x_1, \ldots, x_{t-1} \in \{1, 2\}, \ t \in \{2, \ldots, T\}. \]
Using that $q_1 + q_2 = 1$ and $q_{1 | (x_1, \ldots, x_{t-1})} + q_{2 | (x_1, \ldots, x_{t-1})} = 1$, it follows that $X^1$ is a $Q$-martingale if and only if
\[ q_1 = \frac{r-d}{u-d} \quad \text{and} \quad q_2 = \frac{u-r}{u-d}, \]
\[ q_{1 | (x_1, \ldots, x_{t-1})} = \frac{r-d}{u-d} \quad \text{and} \quad q_{2 | (x_1, \ldots, x_{t-1})} = \frac{u-r}{u-d}, \quad x_1, \ldots, x_{t-1} \in \{1, 2\}, \ t \in \{2, \ldots, T\}. \]
⁶⁹ Note that $Q$-integrability of $X^1$ is trivially satisfied as $X^1$ is bounded.
Note that $q_{1 | (x_1, \ldots, x_{t-1})}$ and $q_{2 | (x_1, \ldots, x_{t-1})}$ do not depend on $x_1, \ldots, x_{t-1}$. This implies that the $Y_k$ are also independent under $Q$. Clearly, $q_1, q_2, q_{1 | (x_1, \ldots, x_{t-1})}, q_{2 | (x_1, \ldots, x_{t-1})} > 0$ if and only if $d < r < u$. So, in conclusion, the multi-period Binomial model is arbitrage-free if and only if $d < r < u$, and in this case there exists a unique EMM for $X^1$ satisfying
\[ Q[\{(x_1, \ldots, x_T)\}] = \prod_{t=1}^{T} q_{x_t}, \]
where $q_1 := \frac{r-d}{u-d}$ and $q_2 := \frac{u-r}{u-d}$.
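The conclusion of the example can be checked numerically. The sketch below computes $q_1, q_2$ and verifies the one-step martingale condition at every node of the tree by brute force; the function names and parameter values are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def risk_neutral_probabilities(u, d, r):
    """Return {1: q_1, 2: q_2} for the unique EMM, assuming d < r < u."""
    if not d < r < u:
        raise ValueError("the binomial model is arbitrage-free only if d < r < u")
    q1 = (r - d) / (u - d)
    return {1: q1, 2: 1.0 - q1}


def discounted_stock(prefix, u, d, r, s0=1.0):
    """X^1_t on the set A_prefix, where t = len(prefix)."""
    y = {1: 1.0 + u, 2: 1.0 + d}
    return s0 * prod(y[x] / (1.0 + r) for x in prefix)


def martingale_defect(T, u, d, r):
    """Largest |E^Q[X^1_t | A_prefix] - X^1_{t-1}| over all nodes of the tree."""
    q = risk_neutral_probabilities(u, d, r)
    worst = 0.0
    for t in range(1, T + 1):
        for prefix in product((1, 2), repeat=t - 1):
            conditional_expectation = sum(
                q[x] * discounted_stock(prefix + (x,), u, d, r) for x in (1, 2)
            )
            previous_value = discounted_stock(prefix, u, d, r)
            worst = max(worst, abs(conditional_expectation - previous_value))
    return worst


# Hypothetical parameters with d < r < u.
q = risk_neutral_probabilities(u=0.1, d=-0.05, r=0.02)
defect = martingale_defect(T=4, u=0.1, d=-0.05, r=0.02)
```

Up to floating-point rounding, `defect` should be zero, while calling `risk_neutral_probabilities` with $r \ge u$ or $r \le d$ raises an error, in line with the arbitrage-free condition $d < r < u$.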
6.6 Valuation of contingent claims

So far, we have fully characterised those models for financial markets in finite discrete time
that are reasonable in the sense that they are arbitrage-free. We proceed to study what we
can say about the price or value of a derivative asset like a European call or put option in
an arbitrage-free market. More precisely, we assume that the market without the derivative
asset is arbitrage-free and want to price the derivative asset in such a way that there are no
new arbitrage opportunities.
Let us first give a general definition of derivative assets.
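Definition 6.21. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. A nonnegative $\mathcal{F}_T$-measurable random variable $C$ is called a European contingent claim (with maturity $T$). It is called a derivative security if it can be written as a measurable function of $S^0_t, \ldots, S^d_t$ for $t \in \{0, \ldots, T\}$.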
Example 6.22. (a) The owner of a European call option on asset $i \in \{1, \ldots, d\}$ with strike $K > 0$ and maturity $T$ has the right but not the obligation to buy asset $i$ at time $T$ for the price $K$. Of course, any rational person will exercise (i.e. make use of) the right if and only if $S^i_T(\omega) > K$, and in this case the net payoff of the option is $S^i_T(\omega) - K$. Otherwise, i.e., if $S^i_T(\omega) \le K$, the option is worthless. So the value of the option at time $T$ is given by the contingent claim
\[ C = (S^i_T - K)^+, \]
where $x^+ := \max(0, x)$ for $x \in \mathbb{R}$.
(b) The owner of a European put option on asset $i \in \{1, \ldots, d\}$ with strike $K > 0$ and maturity $T$ has the right but not the obligation to sell asset $i$ at time $T$ for the price $K$. Of course, any rational person will exercise (i.e. make use of) the right if and only if $S^i_T(\omega) < K$, and in this case the net payoff of the option is $K - S^i_T(\omega)$. Otherwise, i.e., if $S^i_T(\omega) \ge K$, the option is worthless. So the value of the option at time $T$ is given by the contingent claim
\[ C = (K - S^i_T)^+. \]
(c) Let $A$ denote the event of an extreme weather situation like hail at time $T$. It is natural to assume that $A$ is $\mathcal{F}_T$-measurable but independent of the market $\bar S$. A toy example of a weather derivative is a contract that pays one unit of money at time $T$ if the extreme event $A$ happens and zero otherwise. The corresponding contingent claim is given by
\[ C = 1_A. \]
Unlike the call or put option, this is not a derivative security.
Remark 6.23. (a) The (somewhat confusing) qualifier “European” signifies that the contingent claim may be exercised only at one date, i.e., at maturity. By contrast, so-called American contingent claims can be exercised at any time up to and including maturity. Whereas in reality most contingent claims are American, we only consider European ones in the sequel because the theory for them is somewhat easier. For an excellent treatment of American contingent claims, we refer to [2, Chapter 7].
(b) The notion of a European contingent claim can be extended to contracts with maturity $t < T$. A contingent claim $C$ with maturity $t < T$ is just a nonnegative $\mathcal{F}_t$-measurable random variable.
We proceed to study the question of how to assign to a contingent claim $C$ a value at times $t < T$, in particular at $t = 0$. Assuming that the underlying market $\bar S$ satisfies NA, we want to do this in such a way that we do not create any new arbitrage opportunities.
The following example illustrates that this is not as straightforward as one might think, and that approaching the problem naively does not work.
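Example 6.24. Consider the one-period model $\bar S = (S^0_t, S^1_t)_{t \in \{0, 1\}}$ described by the following trees.
[Figure: trees for $S^0$ and $S^1$. The bank account satisfies $S^0_0 = S^0_1 = 1$ (probability 1). The stock starts at $S^1_0 = 100$ and at time 1 takes the values 110, 100 and 90 with probabilities 0.98, 0.01 and 0.01, respectively.]
More precisely, set $\Omega := \{\omega_1, \omega_2, \omega_3\}$, $\mathcal{F} = \mathcal{F}_1 = 2^\Omega$, $S^1_1(\omega_1) = 110$, $S^1_1(\omega_2) = 100$, $S^1_1(\omega_3) = 90$ and $P[\{\omega_1\}] = 0.98$, $P[\{\omega_2\}] = 0.01$, and $P[\{\omega_3\}] = 0.01$. Note that the interest rate is zero so that $S^1 = X^1$. It is not difficult to check that the market $\bar S$ satisfies NA. Now consider a call option on $S^1$ with strike $K = 100$ and maturity 1, i.e., the contingent claim
\[ C = (S^1_1 - 100)^+. \]
If we agree that $v^C \in \mathbb{R}$ is a fair value at time 0, we can represent this “new asset” $S^2$ by a tree.
[Figure: tree for $S^2$. The option starts at $S^2_0 = v^C$ and at time 1 takes the values 10, 0 and 0 in the states $\omega_1$, $\omega_2$ and $\omega_3$, i.e., with probabilities 0.98, 0.01 and 0.01, respectively.]
But what is a fair value for $v^C$?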

A first guess might be to take the expectation $v^C = E[C] = 9.8$.⁷⁰ However, this is not a good idea. Consider the strategy $\bar\vartheta \mathrel{\widehat{=}} (0, (0.1, -1))$ for the “extended market” $(S^0, S^1, S^2)$. Then the “extended gains process” $G((0.1, -1))$ satisfies
\[ G_1((0.1, -1))(\omega) = 0.1 \times (X^1_1 - X^1_0)(\omega) - 1 \times (X^2_1 - X^2_0)(\omega) = \begin{cases} 0.1 \times (110 - 100) - 1 \times (10 - 9.8) = 0.8 > 0 & \text{if } \omega = \omega_1, \\ 0.1 \times (100 - 100) - 1 \times (0 - 9.8) = 9.8 > 0 & \text{if } \omega = \omega_2, \\ 0.1 \times (90 - 100) - 1 \times (0 - 9.8) = 8.8 > 0 & \text{if } \omega = \omega_3. \end{cases} \]
So by Proposition 6.16, the extended market $(S^0, S^1, S^2)$ admits arbitrage.

As the above arbitrage strategy involves shortselling the option, it follows that the option was overpriced, i.e., we need a smaller value for $v^C$. One might think that $v^C = 5$ (which is almost half of the expectation of $C$) is small enough. However, this is not the case. Consider the strategy $\bar\vartheta \mathrel{\widehat{=}} (0, (0.5, -1))$ for the “extended market” $(S^0, S^1, S^2)$. Then the “extended gains process” $G((0.5, -1))$ satisfies
\[ G_1((0.5, -1))(\omega) = 0.5 \times (X^1_1 - X^1_0)(\omega) - 1 \times (X^2_1 - X^2_0)(\omega) = \begin{cases} 0.5 \times (110 - 100) - 1 \times (10 - 5) = 0 \ge 0 & \text{if } \omega = \omega_1, \\ 0.5 \times (100 - 100) - 1 \times (0 - 5) = 5 > 0 & \text{if } \omega = \omega_2, \\ 0.5 \times (90 - 100) - 1 \times (0 - 5) = 0 \ge 0 & \text{if } \omega = \omega_3. \end{cases} \]
So by Proposition 6.16, the extended market $(S^0, S^1, S^2)$ admits arbitrage.

So the question remains: What is a fair value for v C ?
⁷⁰ Note that there is no need to discount as the interest rate is 0.
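The two computations in Example 6.24 can be reproduced with a short script. The sketch below hard-codes the numbers of the example; the function names `extended_gains` and `is_arbitrage` are hypothetical, and the arbitrage test relies on the fact that every state has positive probability under $P$.

```python
def extended_gains(v_c, theta_stock, theta_option):
    """G_1 in the three states for holdings in the stock and in the option.

    v_c is the candidate time-0 value of the call with strike 100.
    The interest rate is zero, so no discounting is needed.
    """
    stock_0 = 100.0
    stock_1 = (110.0, 100.0, 90.0)
    option_1 = tuple(max(s - 100.0, 0.0) for s in stock_1)
    return [theta_stock * (s - stock_0) + theta_option * (c - v_c)
            for s, c in zip(stock_1, option_1)]


def is_arbitrage(gains, tol=1e-12):
    """Gains are nonnegative in all states and strictly positive in at least one."""
    return all(g >= -tol for g in gains) and any(g > tol for g in gains)


gains_first_guess = extended_gains(9.8, 0.1, -1.0)
gains_second_guess = extended_gains(5.0, 0.5, -1.0)
```

A candidate value such as $v^C = 2.5$ with the analogous holdings $(0.25, -1)$ gives a loss in state $\omega_1$, so this particular strategy is no longer an arbitrage; whether some other strategy is requires the systematic treatment that follows.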
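We proceed to describe in a systematic way how to find fair, i.e., arbitrage-free values for a contingent claim. We first consider the ideal case.
Definition 6.25. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. A contingent claim $C$ is called attainable or replicable if there exists a self-financing trading strategy $\bar\vartheta$ such that
\[ \bar\vartheta_T \cdot \bar S_T = C \quad P\text{-a.s.} \]
In this case $\bar\vartheta$ is called a replication strategy or (perfect) hedge for $C$.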
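A contingent claim $C$ is attainable if and only if the corresponding discounted contingent claim
\[ H := \frac{C}{S^0_T} \]
satisfies
\[ H = \bar\vartheta_T \cdot \bar X_T = V_T(\bar\vartheta) \quad P\text{-a.s.} \]
for some self-financing strategy $\bar\vartheta$, and in this case, we say that the discounted contingent claim $H$ is attainable and call $\bar\vartheta$ a replication strategy for $H$.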
The following result shows that for arbitrage-free markets, the value process of an at-
tainable contingent claim is unique and can be easily computed if one knows at least one
EMM.
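Theorem 6.26. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$ and $H$ an attainable discounted contingent claim. Assume that $\bar S$ satisfies NA and denote by $\mathcal{P}$ the set of all EMMs for $X = S/S^0$. Then
(a) $E^Q[H] < \infty$ for all $Q \in \mathcal{P}$.
(b) $E^{Q_1}[H \mid \mathcal{F}_t] = E^{Q_2}[H \mid \mathcal{F}_t]$ $P$-a.s. for all $Q_1, Q_2 \in \mathcal{P}$ and all $t \in \{0, \ldots, T\}$.
(c) There exists a $P$-a.s. unique adapted process $(V^H_t)_{t \in \{0, \ldots, T\}}$ with $V^H_T = H$ $P$-a.s. such that the extended $(1+d+1)$-dimensional market $(\bar S, S^0 V^H) = (S^0_t, S^1_t, \ldots, S^d_t, S^0_t V^H_t)_{t \in \{0, \ldots, T\}}$ satisfies NA. It is given by
\[ V^H_t = E^Q[H \mid \mathcal{F}_t] \quad P\text{-a.s.}, \ t \in \{0, \ldots, T\}, \ \text{for all } Q \in \mathcal{P}. \]
(d) If $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ is any replication strategy for $H$, then $V^H_t = V_t(\bar\vartheta)$ $P$-a.s., for $t \in \{0, \ldots, T\}$.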
Proof. As $H$ is attainable, there exists a self-financing strategy $\bar\vartheta = (\vartheta^0_t, \vartheta_t)_{t \in \{1, \ldots, T\}}$ such that $H = V_T(\bar\vartheta)$ $P$-a.s. Let $Q \in \mathcal{P}$ and note that $\mathcal{P} \neq \emptyset$ by the fundamental theorem of asset pricing (Theorem 6.19). As $H = V_T(\bar\vartheta) \ge 0$ $P$-a.s. and hence also $Q$-a.s., it follows from Theorem 6.18 that $V(\bar\vartheta)$ is a $Q$-martingale. This implies in particular that $H = V_T(\bar\vartheta)$ is $Q$-integrable, and so we have (a).
Next, fix $Q_1 \in \mathcal{P}$ and define the process $(V^H_t)_{t \in \{0, \ldots, T\}}$ by
\[ V^H_t = E^{Q_1}[H \mid \mathcal{F}_t], \quad t \in \{0, \ldots, T\}. \tag{6.13} \]
Then the $Q_1$-martingale property of $V(\bar\vartheta)$, the fact that $H = V_T(\bar\vartheta)$ and the fact that $Q_1 \approx P$ imply that
\[ V^H_t = E^{Q_1}[H \mid \mathcal{F}_t] = E^{Q_1}\big[ V_T(\bar\vartheta) \,\big|\, \mathcal{F}_t \big] = V_t(\bar\vartheta) \quad P\text{-a.s.}, \ t \in \{0, \ldots, T\}. \tag{6.14} \]
As the left-hand side of (6.14) does not depend on $\bar\vartheta$, we have (d).
Now, let $Q \in \mathcal{P}$ be arbitrary. Then by the $Q$-martingale property of $V(\bar\vartheta)$, the fact that $H = V_T(\bar\vartheta)$, (6.14) and the fact that $Q \approx P$,
\[ E^Q[H \mid \mathcal{F}_t] = E^Q\big[ V_T(\bar\vartheta) \,\big|\, \mathcal{F}_t \big] = V_t(\bar\vartheta) = V^H_t \quad P\text{-a.s.}, \ t \in \{0, \ldots, T\}. \tag{6.15} \]
As the right-hand side of (6.15) does not depend on $Q$, we have (b).
Finally, if $(V_t)_{t \in \{0, \ldots, T\}}$ is an adapted process with $V_T = H$, by the Fundamental Theorem of Asset Pricing, the extended market $(\bar S, S^0 V) = (S^0_t, S^1_t, \ldots, S^d_t, S^0_t V_t)_{t \in \{0, \ldots, T\}}$ satisfies NA if and only if there exists an EMM for $(X, V)$, i.e., if and only if there is $Q \in \mathcal{P}$ such that $V$ is a $Q$-martingale. If there exists $Q \in \mathcal{P}$ such that $V$ is a $Q$-martingale, then by the fact that $H = V_T$ $Q$-a.s. (this uses that $Q \approx P$), the martingale property and (6.15), we obtain
\[ V_t = E^Q[V_T \mid \mathcal{F}_t] = E^Q[H \mid \mathcal{F}_t] = V^H_t \quad P\text{-a.s.}, \ t \in \{0, \ldots, T\}. \]
On the other hand, $V^H$ is a $Q$-martingale for any $Q \in \mathcal{P}$ by (6.15) and Example 6.7(b). So we have (c).
Theorem 6.26 completely answers all questions concerning the valuation of a (discounted) contingent claim $H$ provided that it is attainable. However, it does not give any criterion to determine whether a contingent claim is attainable or not, nor does it provide any guidance concerning the valuation of non-attainable contingent claims. Not surprisingly, this is more difficult.
The next result provides a necessary and sufficient criterion to decide whether a contingent claim is attainable or not. Moreover, it gives a full description of all arbitrage-free prices at time 0 for a non-attainable contingent claim.⁷¹ For a proof, we refer to [2, Theorems 5.29 and 5.32].
⁷¹ One can also give a full characterisation of all arbitrage-free values at intermediate times $t \in \{1, \ldots, T-1\}$ for a non-attainable contingent claim. However, this is rather complicated.
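Theorem 6.27. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$ and $H$ a discounted contingent claim. Assume that $\bar S$ satisfies NA and denote by $\mathcal{P}$ the set of all EMMs for $X = S/S^0$. Then the set of arbitrage-free prices for $H$ is non-empty and given by
\[ \Pi(H) = \big\{ E^Q[H] : Q \in \mathcal{P} \text{ and } E^Q[H] < \infty \big\}. \]
Moreover:
(a) $H$ is attainable if and only if $\Pi(H)$ consists of a single element. In this case,
\[ \Pi(H) = \{ V^H_0 \}, \]
where $V^H$ is as in Theorem 6.26.
(b) $H$ is not attainable if and only if $\Pi(H)$ consists of more than one element. In this case,
\[ \Pi(H) = \big( \pi_{\inf}(H), \pi_{\sup}(H) \big), \]
where $\pi_{\inf}(H) < \pi_{\sup}(H)$ and
\[ \pi_{\inf}(H) := \inf_{Q \in \mathcal{P}} E^Q[H] \in [0, \infty) \quad \text{and} \quad \pi_{\sup}(H) := \sup_{Q \in \mathcal{P}} E^Q[H] \in (0, \infty]. \]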
With the help of Theorem 6.27, we can complete Example 6.24.
Example 6.28. Consider the setup of Example 6.24. If we describe an equivalent measure $Q \approx P$ by the probability vector $(q_1, q_2, q_3)$, where $q_i = Q[\{\omega_i\}]$, $i \in \{1, 2, 3\}$, one can check that the set $\mathcal{P}$ of equivalent martingale measures for $X^1$ is given by
\[ \mathcal{P} = \{ Q^\lambda : \lambda \in (0, 1) \}, \]
where $Q^\lambda$ is described by the probability vector
\[ \Big( \tfrac{1}{2} \lambda, \ 1 - \lambda, \ \tfrac{1}{2} \lambda \Big). \]
So by Theorem 6.27 and the fact that $H$ is bounded, the set of arbitrage-free prices for $H = C$ (because $S^0_1 = 1$) is given by
\[ \Pi(H) = \big\{ E^{Q^\lambda}[H] : \lambda \in (0, 1) \big\} = \Big\{ \tfrac{1}{2} \lambda \times 10 : \lambda \in (0, 1) \Big\} = (0, 5). \]
As $\Pi(H)$ contains more than one element, it follows that $H$ is not attainable and any number in $(0, 5)$ is a fair value for $v^C$.
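The family $Q^\lambda$ and the resulting prices can be tabulated with a few lines of code. The sketch below samples $\lambda$ on a coarse grid (an arbitrary choice made for illustration) and uses the numbers of Example 6.24; the helper names `emm` and `expectation` are hypothetical.

```python
def emm(lam):
    """Probability vector of Q^lambda on (omega_1, omega_2, omega_3)."""
    return (lam / 2.0, 1.0 - lam, lam / 2.0)


def expectation(q, values):
    """Expectation of a random variable given state by state under q."""
    return sum(q_i * v for q_i, v in zip(q, values))


stock_1 = (110.0, 100.0, 90.0)
payoff = tuple(max(s - 100.0, 0.0) for s in stock_1)  # call with strike 100

lambdas = [k / 10.0 for k in range(1, 10)]             # grid inside (0, 1)
stock_means = [expectation(emm(lam), stock_1) for lam in lambdas]
prices = [expectation(emm(lam), payoff) for lam in lambdas]
```

Every entry of `stock_means` equals the initial stock price 100 (the martingale condition), while `prices` sweeps through values strictly between 0 and 5 as $\lambda$ varies.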
Let us summarise the steps to value a (discounted) contingent claim $H$ in an arbitrage-free financial market $\bar S$ in finite discrete time.
(1) Find the set $\mathcal{P}$ of all EMMs $Q$ for $X = S/S^0$.
(2) Compute $E^Q[H]$ for all EMMs $Q \in \mathcal{P}$.
(3a) If $Q \mapsto E^Q[H]$ is constant, then $H$ is attainable, and its unique arbitrage-free value process is given by
\[ V^H_t = E^Q[H \mid \mathcal{F}_t], \quad t \in \{0, \ldots, T\}, \]
where $Q$ is any EMM.
(3b) If $Q \mapsto E^Q[H]$ is not constant, then $H$ is not attainable and $E^Q[H]$ is a fair price at time 0 for all $Q \in \mathcal{P}$ with $E^Q[H] < \infty$.
In case (3b), we are faced with a genuine problem: There is no unique fair price. So unlike in the complete case, we have to take preferences of the investor into account. How exactly to do this is an ongoing debate in the literature, and there is no easy answer to this question.
Warning: In large parts of the literature, in particular, in credit risk and in more applied and computational settings, one often just fixes one “nice” equivalent martingale measure $Q$ (often referred to as the risk-neutral measure) and calls
\[ V^{H,Q}_t := E^Q[H \mid \mathcal{F}_t] \]
the “risk-neutral price” of $H$ at time $t$. However, if $H$ is not attainable, $V^{H,Q}_t$ crucially depends on $Q$, so that one should at least think very carefully which $Q \in \mathcal{P}$ one chooses. Otherwise, there might not be much economic warrant for the resulting “prices”. The key problem is that unlike in the complete case, those “prices” are not linked to any hedging strategy, and so if one sells a derivative product for a certain “price”, it is unclear how to hedge the risk involved with selling the product.
6.7 Complete markets

Valuation of contingent claims is much nicer for attainable than for non-attainable contingent
claims. The best possible case is if all contingent claims are attainable.
Definition 6.29. A financial market $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$ is called complete if each contingent claim $C$ is attainable. Otherwise, it is called incomplete.
The next result shows that there is a very simple criterion in terms of EMMs to decide whether an arbitrage-free financial market is complete or incomplete.
Theorem 6.30. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. Assume that $\bar S$ satisfies NA. Then $\bar S$ is complete if and only if there exists a unique EMM for $X = S/S^0$.
Proof. Denote by $\mathcal{P}$ the set of all EMMs for $X$. As $\bar S$ satisfies NA, it is nonempty by the FTAP (Theorem 6.19).
First, if $\mathcal{P}$ only contains one element $Q$, Theorem 6.27 implies that for each discounted contingent claim $H$,
\[ \Pi(H) = \{ E^Q[H] \}, \]
and so $H$ is attainable. As an (undiscounted) contingent claim $C$ is attainable if and only if the corresponding discounted contingent claim $H = C/S^0_T$ is attainable, it follows that $\bar S$ is complete.
Conversely, assume that $\bar S$ is complete. Let $Q_1, Q_2 \in \mathcal{P}$. Then for $A \in \mathcal{F} = \mathcal{F}_T$, applying Theorem 6.26(b) for $t = 0$ to the discounted contingent claim $H := 1_A$, it follows that
\[ Q_1[A] = E^{Q_1}[H] = E^{Q_1}[H \mid \mathcal{F}_0] = E^{Q_2}[H \mid \mathcal{F}_0] = E^{Q_2}[H] = Q_2[A]. \]
As $A \in \mathcal{F}$ was arbitrary, it follows that $Q_1 = Q_2$.
Theorem 6.30 is sometimes called the second fundamental theorem of asset pricing. To-
gether with the first fundamental theorem of asset pricing (Theorem 6.19) it gives a very
beautiful and conclusive description of financial markets in finite discrete time:
• Existence of an EMM is equivalent to the market being arbitrage-free.
• Uniqueness of an EMM is equivalent to the market being complete (and arbitrage free).
Both results can be extended to continuous or infinite discrete time. However, the precise
formulations become more subtle and the proofs far more difficult.
Remark 6.31. One can show that if a financial market $\bar S$ is complete, then necessarily $\mathcal{F} = \mathcal{F}_T$ is finite. More precisely, it may have at most $(1+d)^T$ atoms; see [2, Theorem 5.37]. This shows that even though it makes things nice and simple, completeness is a very restrictive assumption.
As an application, we establish the so-called put-call parity for complete markets.⁷²
Theorem 6.32. Let $\bar S = (S^0_t, S_t)_{t \in \{0, \ldots, T\}}$ be a complete and arbitrage-free financial market on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \{0, \ldots, T\}}, P)$. Assume that $S^0_t = (1+r)^t$ for some constant interest rate $r > -1$, $t \in \{0, \ldots, T\}$. Fix $i \in \{1, \ldots, d\}$ and $K > 0$. Denote by $V^{H^{\mathrm{call}}}$ the (discounted) value process corresponding to the (undiscounted) call option
\[ C^{\mathrm{call}} = (S^i_T - K)^+ \]
on asset $i$ with (undiscounted) strike $K$ and maturity $T$ and by $V^{H^{\mathrm{put}}}$ the (discounted) value process corresponding to the (undiscounted) put option
\[ C^{\mathrm{put}} = (K - S^i_T)^+ \]
on asset $i$ with (undiscounted) strike $K$ and maturity $T$. Then we have put-call parity:
\[ V^{H^{\mathrm{call}}}_t - V^{H^{\mathrm{put}}}_t = X^i_t - \frac{K}{(1+r)^T} \quad P\text{-a.s.}, \ t \in \{0, \ldots, T\}. \]
⁷² Put-call parity also holds in incomplete markets. However, the precise formulation is a bit subtle.
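Proof. The (discounted) contingent claims corresponding to the call and the put option are given by
\[ H^{\mathrm{call}} = \frac{(S^i_T - K)^+}{S^0_T} = \Big( X^i_T - \frac{K}{(1+r)^T} \Big)^+, \]
\[ H^{\mathrm{put}} = \frac{(K - S^i_T)^+}{S^0_T} = \Big( \frac{K}{(1+r)^T} - X^i_T \Big)^+. \]
Note that
\[ H^{\mathrm{call}} - H^{\mathrm{put}} = X^i_T - \frac{K}{(1+r)^T}. \tag{6.16} \]
Since $\bar S$ is arbitrage-free and complete, both the call and the put option are attainable, and there exists a unique EMM $Q$ for the discounted risky assets $X = S/S^0$ by Theorems 6.19 and 6.30. Moreover, by Theorem 6.26(c),
\[ V^{H^{\mathrm{call}}}_t = E^Q\big[ H^{\mathrm{call}} \,\big|\, \mathcal{F}_t \big] \ P\text{-a.s.} \quad \text{and} \quad V^{H^{\mathrm{put}}}_t = E^Q\big[ H^{\mathrm{put}} \,\big|\, \mathcal{F}_t \big] \ P\text{-a.s.} \]
Thus, by linearity of conditional expectations, (6.16) and the $Q$-martingale property of $X^i$, we obtain
\[ V^{H^{\mathrm{call}}}_t - V^{H^{\mathrm{put}}}_t = E^Q\big[ H^{\mathrm{call}} - H^{\mathrm{put}} \,\big|\, \mathcal{F}_t \big] = E^Q\Big[ X^i_T - \frac{K}{(1+r)^T} \,\Big|\, \mathcal{F}_t \Big] = X^i_t - \frac{K}{(1+r)^T} \quad P\text{-a.s.}, \ t \in \{0, \ldots, T\}. \]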
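Put-call parity at time $t = 0$ can be sanity-checked in the binomial model, which is complete and arbitrage-free for $d < r < u$. The sketch below prices by brute-force enumeration of all paths under the unique EMM; the function name `price_at_zero`, the normalisation $S^1_0 = 1$ and the parameter values are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def price_at_zero(payoff, T, u, d, r):
    """Discounted time-0 value E^Q[payoff(S^1_T)] / (1 + r)^T with S^1_0 = 1."""
    q1 = (r - d) / (u - d)
    q = {1: q1, 2: 1.0 - q1}
    y = {1: 1.0 + u, 2: 1.0 + d}
    value = 0.0
    for omega in product((1, 2), repeat=T):
        terminal_stock = prod(y[x] for x in omega)
        value += prod(q[x] for x in omega) * payoff(terminal_stock)
    return value / (1.0 + r) ** T


# Hypothetical parameters with d < r < u and strike K.
T, u, d, r, K = 3, 0.1, -0.05, 0.02, 1.05
call = price_at_zero(lambda s: max(s - K, 0.0), T, u, d, r)
put = price_at_zero(lambda s: max(K - s, 0.0), T, u, d, r)

# Right-hand side of put-call parity at t = 0, where X^1_0 = 1.
parity_gap = (call - put) - (1.0 - K / (1.0 + r) ** T)
```

Up to rounding, `parity_gap` vanishes, as the theorem predicts for $t = 0$.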
6.8 Pricing and hedging in the binomial model

We conclude this chapter by briefly outlining how the preceding theory can be applied in the special case of a binomial model $\bar S = (S^0_t, S^1_t)_{t \in \{0, \ldots, T\}}$, where we always assume that $u > r > d > -1$ so that $\bar S$ is arbitrage-free; cf. Examples 6.8 and 6.20.
It follows from Example 6.20 that $\bar S$ admits a unique EMM $Q$ parametrised by $q_1 = \frac{r-d}{u-d}$ and $q_2 = \frac{u-r}{u-d}$. Hence the model is arbitrage-free and complete by Theorems 6.19 and 6.30. If $H$ is a discounted contingent claim, it is attainable by completeness of $\bar S$. So by Theorem 6.26, it admits a unique arbitrage-free price process $(V^H_t)_{t \in \{0, \ldots, T\}}$ given by
\[ V^H_t := E^Q[H \mid \mathcal{F}_t], \quad t \in \{0, \ldots, T\}. \]
By the $Q$-martingale property of $V^H$, we can calculate the latter by the recursive algorithm
\[ V^H_T := H \quad \text{and} \quad V^H_{t-1} := E^Q\big[ V^H_t \,\big|\, \mathcal{F}_{t-1} \big], \quad t \in \{1, \ldots, T\}. \]
Using that $\mathcal{F}_t = \sigma(A_{(x_1, \ldots, x_t)} : x_1, \ldots, x_t \in \{1, 2\})$ for $t \in \{1, \ldots, T\}$, it follows from Examples 6.2 and 6.20 that
\[ V^H_t = \sum_{x_1, \ldots, x_t \in \{1, 2\}} v^H_t(x_1, \ldots, x_t) \, 1_{A_{(x_1, \ldots, x_t)}}, \quad t \in \{1, \ldots, T\}, \]
\[ V^H_0 = v^H_0, \]
where the functions $v^H_t : \{1, 2\}^t \to [0, \infty)$, $t \in \{1, \ldots, T\}$, and the number $v^H_0$ can be calculated recursively by
\[ v^H_T(x_1, \ldots, x_T) = H((x_1, \ldots, x_T)), \]
\[ v^H_{t-1}(x_1, \ldots, x_{t-1}) = q_1 v^H_t(x_1, \ldots, x_{t-1}, 1) + q_2 v^H_t(x_1, \ldots, x_{t-1}, 2), \quad t \in \{2, \ldots, T\}, \]
\[ v^H_0 = q_1 v^H_1(1) + q_2 v^H_1(2). \]
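The backward recursion translates directly into code. The sketch below stores $v^H_t$ as a dictionary indexed by path prefixes, which also covers path-dependent claims but uses $2^T$ terminal entries; for the payoff it assumes (for simplicity of the example only) a claim depending on the terminal stock price. The function name `value_functions` and the parameter values are hypothetical.

```python
from itertools import product


def value_functions(payoff, T, u, d, r, s0=1.0):
    """Return a list v with v[t][(x_1, ..., x_t)] = v_t^H(x_1, ..., x_t).

    `payoff` maps the undiscounted terminal stock price to the undiscounted
    claim C; the discounted claim is H = C / (1 + r)^T.
    """
    q1 = (r - d) / (u - d)
    q2 = 1.0 - q1
    y = {1: 1.0 + u, 2: 1.0 + d}
    v = [dict() for _ in range(T + 1)]
    # Terminal condition: v_T^H(omega) = H(omega).
    for omega in product((1, 2), repeat=T):
        terminal_stock = s0
        for x in omega:
            terminal_stock *= y[x]
        v[T][omega] = payoff(terminal_stock) / (1.0 + r) ** T
    # Backward step: v_{t-1}^H(prefix) = q1 * v_t^H(prefix, 1) + q2 * v_t^H(prefix, 2).
    for t in range(T, 0, -1):
        for prefix in product((1, 2), repeat=t - 1):
            v[t - 1][prefix] = q1 * v[t][prefix + (1,)] + q2 * v[t][prefix + (2,)]
    return v


# Hypothetical example: call with strike 1 in a two-period model with r = 0.
v = value_functions(lambda s: max(s - 1.0, 0.0), T=2, u=0.1, d=-0.1, r=0.0)
initial_value = v[0][()]  # the number v_0^H
```

Because the binomial tree is recombining, a claim that depends only on $S^1_T$ could be priced with far fewer nodes; the prefix-indexed version is kept here because it mirrors the formulas above one to one.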
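If $\bar\vartheta^H \mathrel{\widehat{=}} (V_0(\bar\vartheta^H), \vartheta^{H,1})$ is a replication strategy for $H$, it follows from Theorem 6.26(d) that
\[ V_t(\bar\vartheta^H) = V^H_t, \quad t \in \{0, \ldots, T\}. \]
Combining this with Proposition 6.13(c), we obtain
\[ V^H_t - V^H_{t-1} = V_t(\bar\vartheta^H) - V_{t-1}(\bar\vartheta^H) = G_t(\vartheta^H) - G_{t-1}(\vartheta^H) = \vartheta^{H,1}_t (X^1_t - X^1_{t-1}), \quad t \in \{1, \ldots, T\}. \]
Rearranging yields
\[ \vartheta^{H,1}_t = \frac{V^H_t - V^H_{t-1}}{X^1_t - X^1_{t-1}} =: \frac{\Delta V^H_t}{\Delta X^1_t}, \quad t \in \{1, \ldots, T\}. \tag{6.17} \]
As $\vartheta^{H,1}_t$ is $\mathcal{F}_{t-1}$-measurable and $\mathcal{F}_{t-1} = \sigma(A_{(x_1, \ldots, x_{t-1})} : x_1, \ldots, x_{t-1} \in \{1, 2\})$, it follows that
\[ \vartheta^{H,1}_t = \sum_{x_1, \ldots, x_{t-1} \in \{1, 2\}} \zeta^H_t(x_1, \ldots, x_{t-1}) \, 1_{A_{(x_1, \ldots, x_{t-1})}}, \quad t \in \{2, \ldots, T\}, \]
\[ \vartheta^{H,1}_1 = \zeta^H_1, \]
where the functions $\zeta^H_t : \{1, 2\}^{t-1} \to \mathbb{R}$, $t \in \{2, \ldots, T\}$, and the number $\zeta^H_1 \in \mathbb{R}$ can be calculated as
\[ \zeta^H_t(x_1, \ldots, x_{t-1}) = \frac{v^H_t(x_1, \ldots, x_{t-1}, 1) - v^H_{t-1}(x_1, \ldots, x_{t-1})}{\xi^1_t(x_1, \ldots, x_{t-1}, 1) - \xi^1_{t-1}(x_1, \ldots, x_{t-1})} = \frac{v^H_t(x_1, \ldots, x_{t-1}, 2) - v^H_{t-1}(x_1, \ldots, x_{t-1})}{\xi^1_t(x_1, \ldots, x_{t-1}, 2) - \xi^1_{t-1}(x_1, \ldots, x_{t-1})}, \]
\[ \zeta^H_1 = \frac{v^H_1(1) - v^H_0}{\xi^1_1(1) - \xi^1_0} = \frac{v^H_1(2) - v^H_0}{\xi^1_1(2) - \xi^1_0}, \]
where $\xi^1_t(x_1, \ldots, x_t) := S^1_0 \prod_{k=1}^{t} \frac{y_{x_k}}{1+r}$, $t \in \{1, \ldots, T\}$, and $\xi^1_0 := S^1_0$.⁷³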
Remark 6.33. (a) Formula (6.17) can be seen as the discrete-time version of the “Delta-
Hedge”, i.e., the derivative of the value process with respect to the underlying.
(b) Note that the value process V H and the (risky part of the) hedging strategy ϑH,1 can
be calculated in parallel while working backwards through the tree for V H .
^73 Recall that $y_1 = 1 + u$ and $y_2 = 1 + d$.
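As a rough illustration of Remark 6.33(b), the following Python sketch computes the value functions and the hedge ratios in one backward pass through the tree. The function name `price_and_hedge`, the data layout, and all numbers are hypothetical. The formula `q1 = (r - d) / (u - d)` is the standard risk-neutral up-probability of the binomial model and is assumed here to coincide with the $q_1$ of the notes. Exact fractions are used so that the two quotients defining each hedge ratio can be compared without rounding error.

```python
from fractions import Fraction
from itertools import product


def price_and_hedge(payoff, T, S0, u, d, r):
    """Return (v, zeta) for a discounted payoff on the binomial tree.

    v maps a path prefix (x_1, ..., x_t) to v_t^H(x_1, ..., x_t);
    zeta maps (x_1, ..., x_{t-1}) to zeta_t^H(x_1, ..., x_{t-1}).
    The empty tuple () is the key for v_0^H and zeta_1^H.
    """
    y = {1: 1 + u, 2: 1 + d}
    q1 = (r - d) / (u - d)  # assumed risk-neutral up-probability
    q2 = 1 - q1

    def xi(path):
        # Discounted stock price xi_t^1 along the path; xi(()) equals S0.
        price = S0
        for x in path:
            price = price * y[x] / (1 + r)
        return price

    v = {path: payoff(path) for path in product((1, 2), repeat=T)}
    zeta = {}
    for t in range(T, 0, -1):
        for prefix in product((1, 2), repeat=t - 1):
            up, down = prefix + (1,), prefix + (2,)
            v[prefix] = q1 * v[up] + q2 * v[down]
            # Quotient using the successor 1; successor 2 gives the same number.
            zeta[prefix] = (v[up] - v[prefix]) / (xi(up) - xi(prefix))
    return v, zeta


# Hypothetical example: discounted call, strike 100, two periods.
T, S0, K = 2, Fraction(100), Fraction(100)
u, d, r = Fraction(1), Fraction(-1, 2), Fraction(1, 4)


def discounted_call(path):
    s = S0
    for x in path:
        s *= (1 + u) if x == 1 else (1 + d)
    return max(s - K, 0) / (1 + r) ** T


v, zeta = price_and_hedge(discounted_call, T, S0, u, d, r)
print(v[()], zeta[()], zeta[(1,)], zeta[(2,)])  # 48 4/5 1 0
```

In this example the alternative quotient for $\zeta^H_1$ is $(0 - 48)/(40 - 100) = 4/5$, which agrees with the printed value, as the two expressions in the formula for $\zeta^H_1$ require.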
References
[1] D. Bertsekas, Nonlinear programming, second ed., Athena Scientific, Belmont, MA, 1999.
[2] H. Föllmer and A. Schied, Stochastic Finance, 4th ext. ed., de Gruyter Studies in Mathe-
matics, vol. 27, Walter de Gruyter & Co., Berlin, 2016.
[3] H.-O. Georgii, Stochastics, De Gruyter Textbook, Walter de Gruyter & Co., Berlin, 2008.
[4] J. C. Hull, Options, futures, and other derivatives, eight ed., Pearson, Boston, MA, 2012.
[5] J. Jacod and P. Protter, Probability essentials, second ed., Universitext, Springer-Verlag,
Berlin, 2003.
[6] S. Le Roy and J. Werner, Principles of Financial Economics, second ed., Cambridge Uni-
versity Press, New York, NY, 2014.
[7] A. McNeil, R. Frey, and P. Embrechts, Quantitative risk management, second revised ed.,
Princeton Series in Finance, Princeton University Press, Princeton, NJ, 2015.