© by Antonio Mele
London School of Economics & Political Science
October 2009
Contents
Preface
I Foundations
1 The classic capital asset pricing model
1.1 Portfolio selection
1.1.1 The wealth constraint
1.1.2 Portfolio choice
1.1.3 Without the safe asset
1.1.4 The market portfolio
1.2 The CAPM
1.3 The APT
1.3.1 A first derivation
1.3.2 The APT with idiosyncratic risk and a large number of assets
1.3.3 Empirical evidence
1.4 Appendix 1: Some analytical details for portfolio choice
1.4.1 The primal program
1.4.2 The dual program
1.5 Appendix 2: The market portfolio
1.5.1 The tangent portfolio is the market portfolio
1.5.2 Tangency condition
1.6 Appendix 3: An alternative derivation of the SML
1.7 Appendix 4: Broader definitions of risk - Rothschild and Stiglitz theory
References
“Many of the models in the literature are not general equilibrium models in my sense. Of
those that are, most are intermediate in scope: broader than examples, but much narrower
than the full general equilibrium model. They are narrower, not for carefully-spelled-out
economic reasons, but for reasons of convenience. I don’t know what to do with models
like that, especially when the designer says he imposed restrictions to simplify the model
or to make it more likely that conventional data will lead to reject it. The full general
equilibrium model is about as simple as a model can be: we need only a few equations to
describe it, and each is easy to understand. The restrictions usually strike me as extreme.
When we reject a restricted version of the general equilibrium model, we are not rejecting
the general equilibrium model itself. So why bother testing the restricted version?”
Fischer Black, 1995, p. 4, Exploring General Equilibrium, The MIT Press.
Preface
The present Lecture Notes in Financial Economics are based on my teaching notes for advanced
undergraduate and graduate courses on financial economics, macroeconomic dynamics, financial
econometrics and financial engineering. These notes are still underground. The economic
motivation and intuition are not always developed as deeply as they deserve, some derivations
are inelegant, and sometimes the English is a bit informal. Moreover, I have not yet included
material on asset pricing with asymmetric information, monetary models of asset prices, bubbles,
the asset pricing implications of overlapping generations models, or financial frictions. Finally, I need
to include more extensive surveys for each topic I cover, especially in Part II. I plan to revise
these notes to fill these gaps. Meanwhile, any comments on this version are more than welcome.
Antonio Mele
October 2009
Part I
Foundations
1
The classic capital asset pricing model
\[
w^{+} = x_0\theta_0 + \sum_{i=1}^{m} x_i\theta_i \equiv R\pi_0 + \sum_{i=1}^{m}\tilde{R}_i\pi_i
\quad\text{and}\quad
w = \pi_0 + \sum_{i=1}^{m}\pi_i. \tag{1.1}
\]
1.1. Portfolio selection
© by A. Mele
Combining the two expressions for w+ and w, we obtain, after a few simple computations,
where ν is a Lagrange multiplier for the variance constraint. By plugging the first condition
into the second, we obtain $(2\nu)^{-1} = \mp\, w\cdot v_p/\sqrt{Sh}$, where $Sh$
is the Sharpe market performance. To ensure efficiency, we take the positive solution. Substitut-
ing the positive solution for (2ν)−1 into the first order condition, we obtain that the portfolio
that solves [1.P1] is
\[
\frac{\hat{\pi}(v_p)}{w} \equiv \frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\cdot v_p. \tag{1.5}
\]
We are now ready to calculate the value of [1.P1], E [w+ (π̂ (vp ))] and, hence, the expected
portfolio return, defined as,
\[
\mu_p(v_p) \equiv \frac{E[w^{+}(\hat{\pi}(v_p))] - w}{w} = r + \sqrt{Sh}\cdot v_p, \tag{1.6}
\]
where the last equality follows by simple computations. Eq. (1.6) describes what is known as
the Capital Market Line (CML).
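The construction of the CML can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes two risky assets, borrows the expected returns and volatilities used for Figure 1.1, and picks a correlation, a safe rate, an initial wealth and a target volatility purely for illustration. It also assumes that wealth not held in risky assets earns the safe rate, so that the expected portfolio return is r plus the expected excess return on the risky holdings.

```python
import math

# Illustrative inputs. b and the volatilities echo Figure 1.1; rho, r, w and
# the target volatility vp are assumptions made for this sketch.
b = [0.10, 0.15]
s1, s2, rho = 0.20, 0.25, 0.5
r, w, vp = 0.03, 1.0, 0.10

# Covariance matrix Sigma and its closed-form 2x2 inverse.
c12 = rho * s1 * s2
det = s1**2 * s2**2 - c12**2
Sinv = [[s2**2 / det, -c12 / det],
        [-c12 / det, s1**2 / det]]

excess = [bi - r for bi in b]                       # b - 1_m r
z = [sum(Sinv[i][j] * excess[j] for j in range(2))  # Sigma^{-1} (b - 1_m r)
     for i in range(2)]
Sh = sum(e * zi for e, zi in zip(excess, z))        # Sharpe market performance

# Eq. (1.5): risky holdings that deliver the target volatility vp.
pi = [w * zi * vp / math.sqrt(Sh) for zi in z]

# Variance of w+/w and expected return, assuming the remainder of wealth
# sits in the safe asset.
var_p = (pi[0]**2 * s1**2 + 2 * pi[0] * pi[1] * c12 + pi[1]**2 * s2**2) / w**2
mu_p = r + sum(p * e for p, e in zip(pi, excess)) / w

print(round(var_p, 6), round(mu_p, 6))
```

Under these assumptions the portfolio variance equals vp squared and the expected return lies on the line in Eq. (1.6).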
where a and ũ are as defined in Eq. (1.2). We can use Eq. (1.7) to compute the expected
return and the variance of the portfolio value, which are:
\[
E[w^{+}(\pi)] = \pi^{\top} b + w, \ \text{where } w = \pi^{\top} 1_m, \quad\text{and}\quad \operatorname{var}[w^{+}(\pi)] = \pi^{\top}\Sigma\pi. \tag{1.8}
\]
The program our investor solves, now, is:
\[
\hat{\pi}(v_p) = \arg\max_{\pi\in\mathbb{R}^m} E[w^{+}(\pi)] \quad\text{s.t.}\quad \operatorname{var}[w^{+}(\pi)] = w^2\cdot v_p^2 \ \text{ and } \ w = \pi^{\top} 1_m. \tag*{[1.P2]}
\]
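In the appendix, we show that provided $\alpha\gamma - \beta^2 > 0$ (a second order condition), the solution
to [1.P2] is,
\[
\frac{\hat{\pi}(v_p)}{w} = \frac{\gamma\mu_p(v_p) - \beta}{\alpha\gamma - \beta^2}\,\Sigma^{-1}b + \frac{\alpha - \beta\mu_p(v_p)}{\alpha\gamma - \beta^2}\,\Sigma^{-1}1_m, \tag{1.9}
\]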
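where $\alpha \equiv b^{\top}\Sigma^{-1}b$, $\beta \equiv 1_m^{\top}\Sigma^{-1}b$ and $\gamma \equiv 1_m^{\top}\Sigma^{-1}1_m$, and $\mu_p(v_p)$ is the expected portfolio return,
defined as in Eq. (1.6). In the appendix, we also show that,
\[
v_p^2 = \frac{1}{\gamma}\left[1 + \frac{1}{\alpha\gamma - \beta^2}\left(\gamma\mu_p(v_p) - \beta\right)^2\right]. \tag{1.10}
\]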
Therefore, the global minimum variance portfolio achieves a variance equal to $v_p^2 = \gamma^{-1}$ and an
expected return equal to $\mu_p = \beta/\gamma$.
Note that for each vp , there are two values of μp (vp ) that solve Eq. (1.10). The optimal choice
for our investor is that with the highest μp . We define the efficient portfolio frontier as the set
of values (vp , μp ) that solve Eq. (1.10) with the highest μp . It has the following expression,
\[
\mu_p(v_p) = \frac{\beta}{\gamma} + \frac{1}{\gamma}\sqrt{\left(\gamma v_p^2 - 1\right)\left(\alpha\gamma - \beta^2\right)}. \tag{1.11}
\]
Clearly, the efficient portfolio frontier is an increasing and concave function of vp . It can be
interpreted as a sort of “production function,” one that produces “expected returns” through
inputs of “levels of risk” (see, e.g., Figure 1.1). Which portfolio is actually
selected depends on the investor’s preferences toward risk.
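Eqs. (1.9)-(1.11) can be verified on a small example. The sketch below is illustrative only: the three expected returns, the covariance matrix and the target return are made-up numbers, and `solve` is a small helper written for this example (a plain Gauss-Jordan elimination), not something taken from the notes.

```python
import math

def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [u - f * v for u, v in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(x, y):
    return sum(u * v for u, v in zip(x, y))

# Hypothetical inputs: expected returns b and covariance matrix Sigma.
b = [0.08, 0.10, 0.15]
Sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.0625, 0.02],
         [0.00, 0.02, 0.09]]
ones = [1.0, 1.0, 1.0]

Sb = solve(Sigma, b)       # Sigma^{-1} b
S1 = solve(Sigma, ones)    # Sigma^{-1} 1_m
alpha, beta, gamma = dot(b, Sb), dot(ones, Sb), dot(ones, S1)
D = alpha * gamma - beta**2            # must be positive

def frontier_weights(mu):
    """Eq. (1.9): portfolio shares pi/w on the frontier for expected return mu."""
    k1 = (gamma * mu - beta) / D
    k2 = (alpha - beta * mu) / D
    return [k1 * x + k2 * y for x, y in zip(Sb, S1)]

mu = 0.12
wts = frontier_weights(mu)
var_direct = dot(wts, [dot(row, wts) for row in Sigma])       # pi' Sigma pi / w^2
var_formula = (1 + (gamma * mu - beta)**2 / D) / gamma        # Eq. (1.10)
mu_back = beta / gamma + math.sqrt((gamma * var_formula - 1) * D) / gamma  # Eq. (1.11)

print(wts, var_direct, var_formula, mu_back)
```

The shares sum to one, deliver the target expected return, and their variance agrees with Eq. (1.10); inverting through Eq. (1.11) returns the target, because it lies above the expected return β/γ of the global minimum variance portfolio.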
Example 1.1. Let the number of risky assets m = 2. In this case, we do not need to
optimize anything, as the budget constraint, $\pi_1/w + \pi_2/w = 1$, pins down a unique relation between
the portfolio expected return and the variance of the portfolio’s value. So we simply have,
$\mu_p = \frac{E[w^{+}(\pi)] - w}{w} = \frac{\pi_1}{w} b_1 + \frac{\pi_2}{w} b_2$, or,
\[
\begin{cases}
\mu_p = b_1 + (b_2 - b_1)\dfrac{\pi_2}{w} \\[2mm]
v_p^2 = \left(1 - \dfrac{\pi_2}{w}\right)^2\sigma_1^2 + 2\left(1 - \dfrac{\pi_2}{w}\right)\dfrac{\pi_2}{w}\,\sigma_{12} + \left(\dfrac{\pi_2}{w}\right)^2\sigma_2^2
\end{cases}
\]
[Figure 1.1: expected return μp (vertical axis) plotted against volatility vp (horizontal axis), with one curve for each of ρ = −1, −0.5, 0, 0.5, 1.]
FIGURE 1.1. From top to bottom: portfolio frontiers corresponding to ρ = −1, −0.5, 0, 0.5, 1. Parameters
are set to b1 = 0.10, b2 = 0.15, σ1 = 0.20, σ2 = 0.25. For each portfolio frontier, the efficient
portfolio frontier includes those portfolios which yield the lowest volatility for a given expected return.
whence:
\[
v_p = \frac{1}{b_2 - b_1}\sqrt{(b_2 - \mu_p)^2\sigma_1^2 + 2(b_2 - \mu_p)(\mu_p - b_1)\rho\sigma_1\sigma_2 + (\mu_p - b_1)^2\sigma_2^2}.
\]
When ρ = 1,
\[
\mu_p = b_1 + \frac{(b_1 - b_2)(\sigma_1 - v_p)}{\sigma_2 - \sigma_1}.
\]
In the general case, diversification pays when the asset returns are not perfectly positively
correlated (see Figure 1.1). As Figure 1.1 reveals, it is even possible to obtain a portfolio that
is less risky than the less risky asset. Moreover, risk can be zeroed when ρ = −1, in which case
$v_p = \left|\left(1 - \frac{\pi_2}{w}\right)\sigma_1 - \frac{\pi_2}{w}\sigma_2\right|$, which vanishes at
$\frac{\pi_1}{w} = \frac{\sigma_2}{\sigma_1 + \sigma_2}$ and $\frac{\pi_2}{w} = \frac{\sigma_1}{\sigma_1 + \sigma_2}$.
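The two-asset case is easy to explore numerically. The sketch below uses the parameter values reported for Figure 1.1; the function names and the particular portfolio shares are choices made for this illustration. It evaluates the volatility both as a function of the share invested in asset 2 and as a function of the expected return, and checks that with ρ = −1 the variance, which then equals ((1 − x)σ1 − xσ2)², vanishes at x = σ1/(σ1 + σ2).

```python
import math

b1, b2, s1, s2 = 0.10, 0.15, 0.20, 0.25   # values used for Figure 1.1

def vol_from_weight(x, rho):
    """Portfolio volatility when a share x of wealth is held in asset 2."""
    var = (1 - x)**2 * s1**2 + 2 * (1 - x) * x * rho * s1 * s2 + x**2 * s2**2
    return math.sqrt(max(var, 0.0))        # guard against tiny negative rounding

def vol_from_mu(mu, rho):
    """The same volatility, written as a function of the expected return mu."""
    q = ((b2 - mu)**2 * s1**2
         + 2 * (b2 - mu) * (mu - b1) * rho * s1 * s2
         + (mu - b1)**2 * s2**2)
    return math.sqrt(max(q, 0.0)) / (b2 - b1)

x = 0.4                                    # illustrative share in asset 2
mu = b1 + (b2 - b1) * x                    # implied expected return
x_zero = s1 / (s1 + s2)                    # zero-risk share when rho = -1

print(vol_from_weight(x, 0.5), vol_from_mu(mu, 0.5))
print(vol_from_weight(0.3, 0.0), vol_from_weight(x_zero, -1.0))
```

With ρ = 0 and 30% in asset 2, the portfolio is already less volatile than either asset on its own, which is the diversification effect discussed in the text.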
Let us return to the general case. The portfolio in Eq. (1.9) can be decomposed into two
components, as follows:
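\[
\frac{\hat{\pi}(v_p)}{w} = \ell(v_p)\,\frac{\pi_d}{w} + \left[1 - \ell(v_p)\right]\frac{\pi_g}{w}, \qquad \ell(v_p) \equiv \frac{\beta\left(\mu_p(v_p)\gamma - \beta\right)}{\alpha\gamma - \beta^2},
\]
where
\[
\frac{\pi_d}{w} \equiv \frac{\Sigma^{-1}b}{\beta}, \qquad \frac{\pi_g}{w} \equiv \frac{\Sigma^{-1}1_m}{\gamma}.
\]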
Hence, we see that $\pi_g/w$ is the global minimum variance portfolio, for we know from Eq. (1.10)
that the minimum variance occurs at $(v_p, \mu_p) = \left(\sqrt{1/\gamma},\ \beta/\gamma\right)$, in which case $\ell(v_p) = 0$.¹ More
generally, we can span any portfolio on the frontier by just choosing a convex combination of
$\pi_d/w$ and $\pi_g/w$, with weight equal to $\ell(v_p)$. This is a mutual fund separation theorem.
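The two-fund decomposition can be checked directly against Eq. (1.9). The following sketch is a hypothetical illustration with two risky assets; the correlation and the target return are assumed values, and the weight on the first fund is written `ell` because that is simply the name chosen here. It also verifies the property stated in footnote 1, namely that the covariance of the global minimum variance portfolio with a frontier portfolio equals 1/γ.

```python
# Two risky assets; rho and the target return mu are assumed for illustration.
b = [0.10, 0.15]
s1, s2, rho = 0.20, 0.25, 0.5
c12 = rho * s1 * s2
Sigma = [[s1**2, c12], [c12, s2**2]]
det = Sigma[0][0] * Sigma[1][1] - c12**2
Sinv = [[Sigma[1][1] / det, -c12 / det], [-c12 / det, Sigma[0][0] / det]]

def dot(x, y):
    return sum(u * v for u, v in zip(x, y))

def mv(A, x):
    return [dot(row, x) for row in A]

ones = [1.0, 1.0]
Sb, S1 = mv(Sinv, b), mv(Sinv, ones)
alpha, beta, gamma = dot(b, Sb), dot(ones, Sb), dot(ones, S1)
D = alpha * gamma - beta**2

pi_d = [x / beta for x in Sb]     # first fund, Sigma^{-1} b / beta
pi_g = [x / gamma for x in S1]    # global minimum variance portfolio

def ell(mu):
    """Weight on pi_d for a frontier portfolio with expected return mu."""
    return beta * (gamma * mu - beta) / D

def frontier(mu):
    """Eq. (1.9), portfolio shares pi/w."""
    return [((gamma * mu - beta) * x + (alpha - beta * mu) * y) / D
            for x, y in zip(Sb, S1)]

mu = 0.13
mix = [ell(mu) * d + (1 - ell(mu)) * g for d, g in zip(pi_d, pi_g)]
cov_g = dot(pi_g, mv(Sigma, frontier(mu)))   # covariance with the frontier portfolio

print(mix, frontier(mu), cov_g, 1 / gamma)
```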
1 It is easy to show that the covariance of the global minimum variance portfolio with any other portfolio equals $\gamma^{-1}$.
2 The existence of the market portfolio requires a restriction on r, derived in Eq. (1.12) below.
3 Figure 1.2 also depicts the dotted line MZ, which is the value of the investor’s problem when he invests a proportion higher
than 100% in the market portfolio, leveraged at an interest rate for borrowing higher than the interest rate for lending. In this case,
the CML coincides with rM, up to the point M. From M onwards, the CML coincides with the highest between MZ and MA.
FIGURE 1.2. [Plot not reproduced. Labels: CML, frontier AMC, market portfolio M at (vM, μM), safe rate r, points P and Z.]
We now characterize the market portfolio. We need to assume that the interest rate is
sufficiently low to allow the CML to be tangent to the efficient portfolio frontier. The technical
condition that ensures this is that the return on the safe asset be less than the expected return
on the global minimum variance portfolio, viz
\[
r < \frac{\beta}{\gamma}. \tag{1.12}
\]
Let $\pi_M$ be the market portfolio. To identify $\pi_M$, we note that it belongs to AMC if $\pi_M^{\top}1_m = w$,
where $\pi_M$ also belongs to the CML and, therefore, by Eq. (1.5), is such that:
\[
\frac{\pi_M}{w} = \frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\cdot v_M. \tag{1.13}
\]
Therefore, we must be looking for the value $v_M$ that solves
\[
w = 1_m^{\top}\pi_M = w\cdot 1_m^{\top}\frac{\Sigma^{-1}(b - 1_m r)}{\sqrt{Sh}}\cdot v_M,
\]
i.e.
\[
v_M = \frac{\sqrt{Sh}}{\beta - \gamma r}. \tag{1.14}
\]
Then, we plug this value of $v_M$ into the expression for $\pi_M$ in Eq. (1.13) and obtain,⁴
\[
\frac{\pi_M}{w} = \frac{1}{\beta - \gamma r}\,\Sigma^{-1}(b - 1_m r). \tag{1.15}
\]
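As a numerical illustration of Eqs. (1.12)-(1.15), the sketch below computes the market portfolio for two risky assets. The correlation and the safe rate are assumed values chosen so that condition (1.12) holds; nothing here comes from the notes beyond the formulas themselves.

```python
import math

# Two risky assets; rho and r are assumptions made for this sketch.
b = [0.10, 0.15]
s1, s2, rho, r = 0.20, 0.25, 0.5, 0.03
c12 = rho * s1 * s2
Sigma = [[s1**2, c12], [c12, s2**2]]
det = Sigma[0][0] * Sigma[1][1] - c12**2
Sinv = [[Sigma[1][1] / det, -c12 / det], [-c12 / det, Sigma[0][0] / det]]

def dot(x, y):
    return sum(u * v for u, v in zip(x, y))

def mv(A, x):
    return [dot(row, x) for row in A]

ones = [1.0, 1.0]
beta = dot(ones, mv(Sinv, b))
gamma = dot(ones, mv(Sinv, ones))
condition_ok = r < beta / gamma            # Eq. (1.12)

excess = [bi - r for bi in b]              # b - 1_m r
z = mv(Sinv, excess)                       # Sigma^{-1} (b - 1_m r)
Sh = dot(excess, z)

pi_M = [zi / (beta - gamma * r) for zi in z]        # Eq. (1.15), shares of wealth
v_M = math.sqrt(dot(pi_M, mv(Sigma, pi_M)))         # volatility of the market portfolio

print(condition_ok, pi_M, v_M)
```

The shares add up to one, so the market portfolio holds no safe asset, and its volatility matches Eq. (1.14).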
4 While the market portfolio depends on r, this portfolio obviously does not include any share in the safe asset.
1.2. The CAPM
Naturally, the market portfolio belongs to the efficient portfolio frontier. Indeed, on the
one hand, the market portfolio cannot be above the efficient portfolio frontier, as this would
contradict the efficiency of the AMC curve, which is obtained by investing in the risky assets
only; on the other hand, the market portfolio cannot be below the efficient portfolio frontier, for
by construction, it belongs to the CML which, as shown before, dominates the efficient portfolio
frontier. In the appendix, we confirm, analytically, that the market portfolio does indeed satisfy
the tangency condition.
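Therefore,
\[
\left.\frac{d\tilde{\mu}_p(\alpha)}{d\tilde{v}_p(\alpha)}\right|_{\alpha=0} = \frac{b_i - \mu_M}{\frac{1}{\sigma_M}\left(\sigma_{iM} - \sigma_M^2\right)}. \tag{1.17}
\]
On the other hand, the slope of the CML is $(\mu_M - r)/\sigma_M$ which, equated to the slope in Eq.
(1.17), yields,
\[
b_i - r = \beta_i(\mu_M - r), \qquad \beta_i \equiv \frac{\sigma_{iM}}{v_M^2}, \quad i = 1,\cdots,m. \tag{1.18}
\]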
FIGURE 1.3. [Plot not reproduced. Labels: CML, frontier AMC, market portfolio M at (vM, μM), safe rate r, asset i, point A’.]
Eq. (1.18) is the celebrated Security Market Line (SML). The appendix provides an alternative
derivation of the SML. Assets with β i > 1 are called “aggressive” assets. Assets with β i < 1
are called “conservative” assets.
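The SML can be confirmed numerically once the market portfolio is known. The sketch below is illustrative: it assumes two risky assets, a correlation of 0.5 and a safe rate of 3%, builds the market portfolio from Eq. (1.15), computes each asset's beta as its covariance with the market divided by the market variance, and compares the right-hand side of Eq. (1.18) with the assumed expected returns.

```python
# Assumed inputs for two risky assets.
b = [0.10, 0.15]
s1, s2, rho, r = 0.20, 0.25, 0.5, 0.03
c12 = rho * s1 * s2
Sigma = [[s1**2, c12], [c12, s2**2]]
det = Sigma[0][0] * Sigma[1][1] - c12**2
Sinv = [[Sigma[1][1] / det, -c12 / det], [-c12 / det, Sigma[0][0] / det]]

def dot(x, y):
    return sum(u * v for u, v in zip(x, y))

def mv(A, x):
    return [dot(row, x) for row in A]

ones = [1.0, 1.0]
beta_, gamma_ = dot(ones, mv(Sinv, b)), dot(ones, mv(Sinv, ones))
z = mv(Sinv, [bi - r for bi in b])
pi_M = [zi / (beta_ - gamma_ * r) for zi in z]     # Eq. (1.15)

mu_M = dot(pi_M, b)                    # expected return on the market
sigma_iM = mv(Sigma, pi_M)             # covariance of each asset with the market
v2_M = dot(pi_M, sigma_iM)             # variance of the market
betas = [s / v2_M for s in sigma_iM]   # asset betas

sml = [r + bt * (mu_M - r) for bt in betas]        # right-hand side of Eq. (1.18)
print(betas, sml, b)
```

In this example one asset has a beta below one and the other a beta above one, and the market-weighted average of the betas equals one.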
Note, the SML can be interpreted as a projection of the excess return on asset i (i.e. b̃i − r)
on the excess returns on the market portfolio (i.e. b̃M − r). In other words,
The previous relation leads to the following decomposition of the volatility (or risk) related to
the i-th asset return:
\[
\sigma_i^2 = \beta_i^2 v_M^2 + \operatorname{var}(\varepsilon_i), \quad i = 1,\cdots,m.
\]
The quantity $\beta_i^2 v_M^2$ is usually referred to as systematic risk. The quantity $\operatorname{var}(\varepsilon_i) \ge 0$, instead,
is what we term idiosyncratic risk. In the next section, we shall show that idiosyncratic risk
can be eliminated through a “well-diversified” portfolio - roughly, a portfolio that contains a
large number of assets. Naturally, economic theory does not tell us anything substantial about
how important idiosyncratic risk is for any particular asset.
The CAPM can be usefully interpreted within a classical hedging framework. Suppose we
hold an asset that delivers a return equal to z̃ - perhaps, a nontradable asset. We wish to
hedge against movements of this asset by purchasing a portfolio that places a proportion α
in the market portfolio and a proportion 1 − α in a safe asset. The hedging criterion
we wish to use is the variance of the overall exposure of the position, which we minimize by
solving $\min_{\alpha}\operatorname{var}[\tilde{z} - ((1 - \alpha)r + \alpha\tilde{b}_M)]$. It is straightforward to show that the solution to this basic
problem is $\hat{\alpha} \equiv \beta_{\tilde{z}} \equiv \operatorname{cov}(\tilde{z}, \tilde{b}_M)/v_M^2$. That is, the proportion to hold is simply the beta of the
asset to hedge with the market portfolio.
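The hedging result can be illustrated on simulated data. This is a sketch under assumptions of its own: the returns are drawn from a normal distribution with made-up parameters, the asset to hedge is constructed with a true sensitivity of 0.7 to the market, and the moments are sample moments. The point is only that the variance of the hedged exposure, viewed as a function of α, is minimized at the sample beta.

```python
import random

random.seed(0)
r = 0.03                                                   # safe rate (assumed)
bM = [random.gauss(0.08, 0.20) for _ in range(500)]        # simulated market returns
z = [0.02 + 0.7 * x + random.gauss(0.0, 0.10) for x in bM] # asset to hedge

def mean(x):
    return sum(x) / len(x)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)

def exposure_var(a):
    """Sample variance of z - ((1 - a) r + a bM), the hedged exposure."""
    resid = [zi - ((1 - a) * r + a * xi) for zi, xi in zip(z, bM)]
    return cov(resid, resid)

alpha_hat = cov(z, bM) / cov(bM, bM)    # the beta of z with the market

print(alpha_hat, exposure_var(alpha_hat),
      exposure_var(alpha_hat - 0.05), exposure_var(alpha_hat + 0.05))
```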
1.3. The APT
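The CAPM is a model for the required return for any asset and so, it is a very first tool we
can use to evaluate risky projects. Let
\[
V = \text{value of a project} = \frac{E(C^{+})}{1 + r_C},
\]
where $C^{+}$ is future cash flow and $r_C$ is the risk-adjusted discount rate for this project. We have:
\[
\begin{aligned}
\frac{E(C^{+})}{V} &= 1 + r_C \\
&= 1 + r + \beta_C(\mu_M - r) \\
&= 1 + r + \frac{\operatorname{cov}\left(\frac{C^{+}}{V} - 1, \tilde{x}_M\right)}{v_M^2}(\mu_M - r) \\
&= 1 + r + \frac{1}{V}\frac{\operatorname{cov}(C^{+}, \tilde{x}_M)}{v_M^2}(\mu_M - r) \\
&= 1 + r + \frac{1}{V}\operatorname{cov}(C^{+}, \tilde{x}_M)\frac{\lambda}{v_M},
\end{aligned}
\]
where $\lambda \equiv \frac{\mu_M - r}{v_M}$, the unit market risk-premium.
Rearranging terms in the previous equation leaves:
\[
V = \frac{E(C^{+}) - \frac{\lambda}{v_M}\operatorname{cov}(C^{+}, \tilde{x}_M)}{1 + r}. \tag{1.20}
\]
The certainty equivalent $\bar{C}$ is defined as:
\[
\bar{C} : V = \frac{E(C^{+})}{1 + r_C} = \frac{\bar{C}}{1 + r},
\]
or,
\[
\bar{C} = (1 + r)V,
\]
and using Eq. (1.20),
\[
\bar{C} = E(C^{+}) - \frac{\lambda}{v_M}\operatorname{cov}(C^{+}, \tilde{x}_M).
\]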
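A short numerical example may help to see that the two valuation routes agree. All numbers below are hypothetical: an expected cash flow of 110, a covariance of 1.2 between the cash flow and the market return, a safe rate of 3%, and a market with expected return 8% and volatility 20%.

```python
# Hypothetical inputs.
r, mu_M, v_M = 0.03, 0.08, 0.20
EC = 110.0       # E(C+), expected future cash flow
cov_CM = 1.2     # cov(C+, x_M)

lam = (mu_M - r) / v_M                        # unit market risk-premium
V = (EC - (lam / v_M) * cov_CM) / (1 + r)     # Eq. (1.20)

# Risk-adjusted discounting: the beta of the project return C+/V - 1.
beta_C = (cov_CM / V) / v_M**2
r_C = r + beta_C * (mu_M - r)
V_check = EC / (1 + r_C)

C_bar = (1 + r) * V                           # certainty equivalent

print(V, V_check, r_C, C_bar)
```

Discounting the expected cash flow at the risk-adjusted rate gives the same value as discounting the certainty equivalent at the safe rate.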
where a and B are a vector and a matrix of constants, and f is a k-dimensional vector of factors
supposed to affect the asset returns, with k ≤ m. Let us normalize [var(f )]−1 = Ik×k , so that
B = cov(b̃, f ). With this normalization, we have,
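\[
\tilde{b} = a + \begin{bmatrix} \operatorname{cov}(\tilde{b}_1, f) \\ \vdots \\ \operatorname{cov}(\tilde{b}_m, f) \end{bmatrix}\cdot f
= a + \begin{bmatrix} \sum_{j=1}^{k}\operatorname{cov}(\tilde{b}_1, f_j) f_j \\ \vdots \\ \sum_{j=1}^{k}\operatorname{cov}(\tilde{b}_m, f_j) f_j \end{bmatrix}.
\]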
Next, let us consider a portfolio π including the m risky assets. The return of this portfolio
is,
\[
\pi^{\top}\tilde{b} = \pi^{\top}a + \pi^{\top}Bf,
\]
where as usual, $\pi^{\top}1_m = 1$. An arbitrage opportunity arises if there exists some portfolio π
such that the return on the portfolio is certain, and different from the safe interest rate r, i.e. if
$\exists\pi : \pi^{\top}B = 0$ and $\pi^{\top}a \neq r$. Mathematically, this is ruled out whenever $\exists\lambda \in \mathbb{R}^k : a = B\lambda + 1_m r$.
Substituting this relation into Eq. (1.21) leaves,
The APT collapses to the CAPM, once we assume that the only factor affecting the returns
is the market portfolio. To show this, we must normalize the market portfolio return so that its
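variance equals one, consistently with Eq. (1.22). So let $\tilde{r}_M$ be the normalized market return,
defined as $\tilde{r}_M \equiv v_M^{-1}\tilde{b}_M$, so that $\operatorname{var}(\tilde{r}_M) = 1$. We have,
\[
\tilde{b}_i = a_i + \beta_i\tilde{r}_M, \quad i = 1,\cdots,m,
\]
where $\beta_i = \operatorname{cov}(\tilde{b}_i, \tilde{r}_M) = v_M^{-1}\operatorname{cov}(\tilde{b}_i, \tilde{b}_M)$. Then, we have,
\[
b_i = r + \beta_i\lambda, \quad i = 1,\cdots,m. \tag{1.23}
\]
In particular, $\beta_M = \operatorname{cov}(\tilde{b}_M, \tilde{r}_M) = v_M^{-1}\operatorname{var}(\tilde{b}_M) = v_M$, and so, by Eq. (1.23),
\[
\lambda = \frac{b_M - r}{v_M},
\]
which is known as the Sharpe ratio for the market portfolio, or the market price of risk.
By replacing $\beta_i = v_M^{-1}\operatorname{cov}(\tilde{b}_i, \tilde{b}_M)$ and the expression for λ above into Eq. (1.23), we obtain,
\[
b_i = r + \frac{\operatorname{cov}(\tilde{b}_i, \tilde{b}_M)}{v_M^2}(b_M - r), \quad i = 1,\cdots,m.
\]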
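The no-arbitrage restriction used in this section, a = Bλ + 1r, can be illustrated with one factor and three assets. The numbers are made up: the exposures, the factor premium and the safe rate are assumptions, and the portfolio π = (2, −1, 0) is chosen by hand so that it has no factor exposure and its shares sum to one.

```python
# One factor, three assets; all numbers are hypothetical.
r, lam = 0.03, 0.04
B = [0.5, 1.0, 1.5]                    # exposures to the single factor
a = [r + bi * lam for bi in B]         # expected returns satisfying a = B*lam + 1*r

def dot(x, y):
    return sum(u * v for u, v in zip(x, y))

pi = [2.0, -1.0, 0.0]                  # shares sum to one, and pi'B = 0

riskless_return = dot(pi, a)           # certain return when the restriction holds

# Misprice the second asset by one percentage point: the same portfolio
# is still free of factor risk but no longer earns the safe rate.
a_bad = a[:]
a_bad[1] += 0.01
mispriced_return = dot(pi, a_bad)

print(dot(pi, B), riskless_return, mispriced_return)
```

When the restriction holds, the factor-neutral portfolio earns exactly r; once it fails for a single asset, the same portfolio earns a certain return different from r, which is an arbitrage opportunity in the sense defined above.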
1.3.2 The APT with idiosyncratic risk and a large number of assets
[Ross (1976), and Connor (1984), Huberman (1983).]
How can idiosyncratic risk be eliminated? Consider, for example, Eq. (1.19). Intuitively, we
may form portfolios with a large number of assets, so as to make idiosyncratic risk negligible, by
the law of large numbers. But would the beta-relation still hold in this case? More generally,
would the APT relation in Eq. (1.22) still be valid? The answer is in the affirmative, although
it deserves some qualifications.
Consider the APT equation (1.21), and “add” a vector of idiosyncratic returns, ε, which are
independent of f, and have mean zero and variance $\sigma_{\varepsilon}^2$:
\[
\tilde{b} = a + B\cdot f + \varepsilon.
\]
We wish to show that in the absence of arbitrage, to be defined below, it must be that the
number of assets such that Eq. (1.22) does not hold, N(m) say, is bounded as m gets large,
i.e.:
\[
\left|a_i - \left((B\lambda)_i + r\right)\right| > 0, \quad i = 1,\cdots,N(m), \tag{1.24}
\]
where
\[
\lim_{m\to\infty} N(m) < \infty. \tag{1.25}
\]
In other words, we wish to show that in a “large” market, Eq. (1.22) does indeed hold for most
of the assets, an approach close to that in Huang and Litzenberger (1988, p. 106-108).
By the same arguments leading to Eq. (1.1), the wealth generated by a portfolio of the assets
satisfying (1.24), $w^{+}_{N(m)}$ say, is,
\[
w^{+}_{N(m)} = \pi_{N(m)}^{\top}\left(a_{N(m)} - 1_{N(m)}r\right) + Rw_{N(m)} + \pi_{N(m)}^{\top}\left(B_{N(m)}f + \varepsilon_{N(m)}\right),
\]
where aN , BN and εN are (i) the vector of the expected returns, (ii) the return volatility (or
factor exposures) matrix and (iii) the vector of idiosyncratic return components affecting these
assets, and, finally, π N and wN are the portfolio and the initial wealth invested in these assets.
In this context, we may define an arbitrage as the portfolio π N(m) that in the limit, as the
number of all the existing assets m gets large, is riskless and yet delivers an expected return
strictly larger than the safe interest rate, viz
\[
\lim_{m\to\infty}\frac{E[w^{+}_{N(m)}]}{w_{N(m)}} > R, \quad\text{and}\quad \lim_{m\to\infty}\operatorname{var}[w^{+}_{N(m)}] = 0. \tag{1.26}
\]
We want to show that this situation does not arise under the condition in (1.25), thereby
establishing that the linear APT relation in Eq. (1.22) is valid for most of the assets, in a large
market.
So suppose the linear relation, aN − 1N r = BN λ, doesn’t hold. Then, there exists a portfolio
π such that,
\[
\pi^{\top}B_N = 0 \quad\text{and}\quad \pi^{\top}(a_N - 1_N r) \neq 0. \tag{1.27}
\]
Consider the portfolio:
\[
\hat{\pi}_N = \frac{1}{N}\cdot\operatorname{sign}\left(\pi^{\top}(a_N - 1_N r)\right)\cdot\pi,
\]
where π is as in (1.27). With this portfolio we have, clearly, that $E[w_N^{+}] = \hat{\pi}_N^{\top}(a_N - 1_N r) + Rw_N > Rw_N$,
for each N, and even for N large. That is, $\lim_{m\to\infty}E[w^{+}_{N(m)}]/w_{N(m)} > R$, which
is the first condition in (1.26). As regards the second condition in (1.26), we have that
\[
\operatorname{var}[w_N^{+}] = \hat{\pi}_N^{\top}\left(B_N B_N^{\top} + \sigma_{\varepsilon}^2 I_{N\times N}\right)\hat{\pi}_N = \sigma_{\varepsilon}^2\,\hat{\pi}_N^{\top}\hat{\pi}_N,
\]
where the second equality follows by the first relation in (1.27). Clearly, $\lim_{m\to\infty}\operatorname{var}[w^{+}_{N(m)}] = 0$
as N(m) → ∞. Hence, in the absence of arbitrage, the condition in (1.25) must hold.
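The limiting argument can be made concrete with a stylized example. Everything in the sketch below is an assumption made for illustration: a single factor with unit exposures, pricing errors of equal size δ that alternate in sign across assets, and a portfolio π that goes long the underpriced assets and short the overpriced ones, so that it carries no factor exposure when N is even.

```python
sigma_eps = 0.30   # idiosyncratic volatility (assumed)
delta = 0.01       # size of each pricing error (assumed)

def arbitrage_stats(N):
    """Factor exposure of pi, and expected excess return and variance of pi_hat."""
    B = [1.0] * N                                          # one factor, unit exposures
    pi = [1.0 if i % 2 == 0 else -1.0 for i in range(N)]   # pi'B = 0 for even N
    err = [delta * p for p in pi]                          # a_N - 1_N r - B_N lambda
    exposure = sum(p * bi for p, bi in zip(pi, B))
    gain = sum(p * e for p, e in zip(pi, err))             # equals pi'(a_N - 1_N r) here
    sign = 1.0 if gain >= 0 else -1.0
    pi_hat = [sign * p / N for p in pi]                    # the scaled portfolio
    excess = sum(p * e for p, e in zip(pi_hat, err))
    variance = sigma_eps**2 * sum(p * p for p in pi_hat)   # sigma_eps^2 * pi_hat'pi_hat
    return exposure, excess, variance

for N in (10, 100, 1000):
    print(N, arbitrage_stats(N))
```

The expected excess return stays at δ for every N while the variance shrinks like 1/N, so an unbounded number of mispriced assets would deliver an arbitrage in the limit.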
1.4. Appendix 1: Some analytical details for portfolio choice
where ν 1 and ν 2 are two Lagrange multipliers. The first order conditions are,
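\[
\hat{\pi} = \frac{1}{2\nu_1}\Sigma^{-1}(b - \nu_2 1_m), \qquad \hat{\pi}^{\top}\Sigma\hat{\pi} = w^2\cdot v_p^2, \qquad \hat{\pi}^{\top}1_m = w. \tag{1A.1}
\]
Using the first and the third conditions, we obtain,
\[
w = 1_m^{\top}\hat{\pi} = \frac{1}{2\nu_1}\Big(\underbrace{1_m^{\top}\Sigma^{-1}b}_{\equiv\beta} - \nu_2\underbrace{1_m^{\top}\Sigma^{-1}1_m}_{\equiv\gamma}\Big) \equiv \frac{1}{2\nu_1}(\beta - \nu_2\gamma).
\]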
where we have emphasized the dependence of μp on vp , which arises through the presence of the
Lagrange multiplier ν 1 .
Let us rewrite the first equation in (1A.5) as follows,
\[
\frac{1}{2\nu_1 w} = \left(\alpha\gamma - \beta^2\right)^{-1}\left(\gamma\mu_p(v_p) - \beta\right). \tag{1A.6}
\]
We can use this expression for $\nu_1$ to express π̂ in Eq. (1A.2) in terms of the portfolio expected return,
$\mu_p(v_p)$. We have,
\[
\frac{\hat{\pi}}{w} = \frac{\Sigma^{-1}1_m}{\gamma} + \left(\alpha\gamma - \beta^2\right)^{-1}\left(\gamma\mu_p(v_p) - \beta\right)\left(\Sigma^{-1}b - \frac{\beta}{\gamma}\Sigma^{-1}1_m\right).
\]
By rearranging terms in the previous equation, we obtain Eq. (1.9) in the main text.
Finally, we substitute Eq. (1A.6) into the second equation in (1A.5), and obtain:
\[
v_p^2 = \frac{1}{\gamma}\left[1 + \left(\alpha\gamma - \beta^2\right)^{-1}\left(\gamma\mu_p(v_p) - \beta\right)^2\right],
\]
which is Eq. (1.10) in the main text. Note, also, that the second condition in (1A.5) reveals that,
\[
\left(\frac{1}{2\nu_1 w}\right)^2 = \frac{\gamma v_p^2 - 1}{\alpha\gamma - \beta^2}.
\]
Given that αγ − β 2 > 0, the previous equation confirms the properties of the global minimum variance
portfolio stated in the main text.
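\[
\frac{\hat{\pi}}{w} = \frac{\nu_1 w}{2}\Sigma^{-1}b + \frac{\nu_2 w}{2}\Sigma^{-1}1_m; \qquad \hat{\pi}^{\top}b = E_p - w; \qquad w = \hat{\pi}^{\top}1_m; \tag{1A.7}
\]
where $\nu_1$ and $\nu_2$ are two Lagrange multipliers. By replacing the first condition in (1A.7) into the second
one,
\[
E_p - w = \hat{\pi}^{\top}b = w^2\Big(\frac{\nu_1}{2}\underbrace{b^{\top}\Sigma^{-1}b}_{\equiv\alpha} + \frac{\nu_2}{2}\underbrace{1_m^{\top}\Sigma^{-1}b}_{\equiv\beta}\Big) \equiv w^2\left(\frac{\nu_1}{2}\alpha + \frac{\nu_2}{2}\beta\right). \tag{1A.8}
\]
Next, let $\mu_p \equiv \frac{E_p - w}{w}$. By Eqs. (1A.8) and (1A.9), the solutions for $\nu_1$ and $\nu_2$ are,
\[
\frac{\nu_1 w}{2} = \frac{\mu_p\gamma - \beta}{\alpha\gamma - \beta^2}; \qquad \frac{\nu_2 w}{2} = \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}
\]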
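\[
\frac{\hat{\pi}}{w} = \frac{\gamma\mu_p - \beta}{\alpha\gamma - \beta^2}\Sigma^{-1}b + \frac{\alpha - \beta\mu_p}{\alpha\gamma - \beta^2}\Sigma^{-1}1_m.
\]
Finally, the value of the program is,
\[
\operatorname{var}\left[\frac{w^{+}(\hat{\pi})}{w}\right] = \frac{1}{w^2}\hat{\pi}^{\top}\Sigma\hat{\pi}
= \frac{\mu_p\gamma - \beta}{\alpha\gamma - \beta^2}\,\frac{1}{w}\hat{\pi}^{\top}b + \frac{\alpha - \mu_p\beta}{\alpha\gamma - \beta^2}\,\frac{1}{w}\hat{\pi}^{\top}1_m
= \frac{\gamma\mu_p^2 - 2\beta\mu_p + \alpha}{\alpha\gamma - \beta^2}
= \frac{(\gamma\mu_p - \beta)^2}{(\alpha\gamma - \beta^2)\gamma} + \frac{1}{\gamma},
\]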
1.5. Appendix 2: The market portfolio
where we have made use of the equality $Sh = \alpha - 2\beta r + \gamma r^2$, obtained by elaborating on the definition
of the Sharpe market performance Sh given in Eq. (1.4). This is indeed the variance of the market
portfolio given in Eq. (1.14).
1.6. Appendix 3: An alternative derivation of the SML
where we have used the expression for the market portfolio given in Eq. (1.15). Next, premultiply the
previous equation by $\frac{\pi_M^{\top}}{w}$ to obtain:
\[
v_M^2 = \frac{\pi_M^{\top}}{w}\Sigma\frac{\pi_M}{w} = \frac{\pi_M^{\top}}{w}(b - 1_m r)\frac{1}{\beta - \gamma r} = \frac{1}{(\beta - \gamma r)^2}Sh, \tag{1A.12}
\]
or $v_M = \frac{\sqrt{Sh}}{\beta - \gamma r}$, which confirms Eq. (1.14).
Let us rewrite Eq. (1A.11) component by component. That is, for i = 1, · · ·, m,
\[
\sigma_{iM} \equiv \operatorname{cov}(\tilde{x}_i, \tilde{x}_M) = \frac{1}{\beta - \gamma r}(b_i - r) = \frac{v_M}{\sqrt{Sh}}(b_i - r) = \frac{v_M^2}{\mu_M - r}(b_i - r),
\]
where the last two equalities follow by Eq. (1A.12) and by the relation, $\sqrt{Sh} = \frac{\mu_M - r}{v_M}$. By rearranging
terms, we obtain Eq. (1.18).
1.7. Appendix 4: Broader definitions of risk - Rothschild and Stiglitz theory
Definition A.1 (First-order stochastic dominance). x̃2 dominates x̃1 if, for each utility function
u satisfying $u' \ge 0$, we have also that E[u(x̃2)] ≥ E[u(x̃1)].
We have:
Proof. We provide the proof when the support is compact, say [a, b]. First, we show that b) ⇒ c).
We have: $\forall t_0 \in [a, b]$, $F_1(t_0) \equiv \Pr(\tilde{x}_1 \le t_0) = \Pr(\tilde{x}_2 \le t_0 + \eta) \ge \Pr(\tilde{x}_2 \le t_0) \equiv F_2(t_0)$. Next, we show
that c) ⇒ a). By integrating by parts,
\[
E[u(x)] = \int_a^b u(x)\,dF(x) = u(b) - \int_a^b u'(x)F(x)\,dx,
\]
where we have used the fact that: F(a) = 0 and F(b) = 1. Therefore,
\[
E[u(\tilde{x}_2)] - E[u(\tilde{x}_1)] = \int_a^b u'(x)\left[F_1(x) - F_2(x)\right]dx.
\]
Finally, it is easy to show that a) ⇒ b). ∎
Definition A.3. x̃1 is more risky than x̃2 if, for each function u satisfying $u'' < 0$, we have also
that E[u(x̃1)] ≤ E[u(x̃2)] for x̃1 and x̃2 having the same mean.
This definition of “increasing risk” does not rely on the sign of $u'$. Furthermore, if var(x̃1) >
var(x̃2), x̃1 is not necessarily more risky than x̃2, according to the previous definition. The standard
counterexample is the following one. Let x̃2 = 1 w.p. 0.8, and 100 w.p. 0.2. Let x̃1 = 10 w.p. 0.99, and
1090 w.p. 0.01. We have, E(x̃1) = E(x̃2) = 20.8, but var(x̃1) = 11547.36 and var(x̃2) = 1568.16.
However, consider u(x) = log x. Then, E(log(x̃1)) = 2.35 > E(log(x̃2)) = 0.92. It is easily seen that
in this particular example, the distribution function F1 of x̃1 “intersects” F2, which is in contradiction
with the following theorem.
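The counterexample can be recomputed directly from the two-point distributions. The sketch below only uses the outcomes and probabilities stated in the text; the helper names are chosen for this illustration.

```python
import math

# Each distribution is a list of (outcome, probability) pairs.
x1 = [(10.0, 0.99), (1090.0, 0.01)]
x2 = [(1.0, 0.8), (100.0, 0.2)]

def expect(dist, f=lambda x: x):
    """Expected value of f(x) under a discrete distribution."""
    return sum(p * f(x) for x, p in dist)

def variance(dist):
    m = expect(dist)
    return expect(dist, lambda x: (x - m) ** 2)

print("means:     ", expect(x1), expect(x2))
print("variances: ", variance(x1), variance(x2))
print("E[log x]:  ", expect(x1, math.log), expect(x2, math.log))
```

The two random variables share the same mean, the first has the larger variance, and yet a log-utility investor strictly prefers it, so a larger variance does not imply "more risky" in the sense of Definition A.3.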
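\[
\begin{aligned}
E[u(\tilde{x}_1)] &= E[u(\tilde{x}_2 + \epsilon)] \\
&= E\left[E\left(u(\tilde{x}_2 + \epsilon) \mid \tilde{x}_2 = x_2\right)\right] \\
&\le E\left[u\left(E(\tilde{x}_2 + \epsilon \mid \tilde{x}_2 = x_2)\right)\right] \\
&= E\left[u\left(E(\tilde{x}_2 \mid \tilde{x}_2 = x_2)\right)\right] \\
&= E[u(\tilde{x}_2)].
\end{aligned}
\]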
References
Connor, G. (1984): “A Unified Beta Pricing Theory.” Journal of Economic Theory 34, 13-31.
Huang, C-f. and R.H. Litzenberger (1988): Foundations for Financial Economics. New York:
North-Holland.
Ross, S. (1976): “Arbitrage Theory of Capital Asset Pricing.” Journal of Economic Theory
13, 341-360.
Rothschild, M. and J. Stiglitz (1971): “Increasing Risk: II. Its Economic Consequences.”
Journal of Economic Theory 5, 66-84.
Sharpe, W. F. (1964): “Capital Asset Prices: A Theory of Market Equilibrium under Condi-
tions of Risk.” Journal of Finance 19, 425-442.
2
The CAPM in general equilibrium
2.1 Introduction
This chapter develops the general equilibrium foundations of the CAPM, within a framework
that abstracts from the production sphere of the economy. For this reason, we usually refer
to the resulting model as the “Consumption-CAPM.” First, we review the static model of
general equilibrium, without uncertainty. Then, we illustrate the economic rationale behind the
existence of financial assets in an uncertain world. Finally, we derive the Consumption-CAPM.
Assumption 2.1 (Preferences). The utility functions $u_j$ satisfy the following properties:
(i) Monotonicity; (ii) Continuity; and (iii) Quasi-concavity: $u_j(x) \ge u_j(y)$, and $\forall\alpha \in (0, 1)$,
$u_j(\alpha x + (1 - \alpha)y) > u_j(y)$ or, $\frac{\partial u_j}{\partial c_{ij}}(c_{1j},\cdots,c_{mj}) \ge 0$ and $\frac{\partial^2 u_j}{\partial c_{ij}^2}(c_{1j},\cdots,c_{mj}) \le 0$.
Let $B_j(p_1,\cdots,p_m) = \{(c_{1j},\cdots,c_{mj}) : \sum_{i=1}^{m}p_i c_{ij} \le \sum_{i=1}^{m}p_i w_{ij} \equiv R_j\}$, a bounded, closed and
convex set, hence a compact set. Each agent maximizes his utility function subject to the budget
constraint:
This problem certainly has a solution, for $B_j$ is a compact set and, by Assumption 2.1, $u_j$ is
continuous, and a continuous function attains its maximum on a compact set. Moreover, the
Appendix shows that this maximum is unique.
The first order conditions to [P1] are, for each agent j,
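\[
\begin{cases}
\dfrac{\partial u_j/\partial c_{1j}}{p_1} = \dfrac{\partial u_j/\partial c_{2j}}{p_2} = \cdots = \dfrac{\partial u_j/\partial c_{mj}}{p_m} \\[3mm]
\sum_{i=1}^{m}p_i c_{ij} = \sum_{i=1}^{m}p_i w_{ij}
\end{cases} \tag{2.1}
\]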
These conditions form a system of m equations with m unknowns. Let us denote the solution
to this system with [ĉ1j (p, wj ), · · ·, ĉmj (p, wj )]. The total demand for the i-th commodity is,
\[
\hat{c}_i(p, w) = \sum_{j=1}^{n}\hat{c}_{ij}(p, w_j), \quad i = 1,\cdots,m.
\]
We emphasize that the economy we consider in this chapter is one that completely abstracts from
production. Here, prices are the key determinants of how resources are allocated in the end. The
perspective is, of course, radically different from that taken by the Classical school (Ricardo,
Marx and Sraffa), for which prices and resource allocation cannot be disentangled from the
production side of the economy. In the next chapter and more advanced parts of the lectures,
we consider the asset pricing implications of production, following the Neoclassical perspective.
\[
\forall p, \quad 0 = \sum_{i=1}^{m}p_i\left(\hat{c}_{ij}(p, w_j) - w_{ij}\right). \tag{2.2}
\]
Next, define the total excess demand for the i-th commodity as ei (p, w) ≡ ĉi (p, w) − wi . By
aggregating the budget constraint across all the agents,
\[
\forall p, \quad 0 = \sum_{j=1}^{n}\sum_{i=1}^{m}p_i\left(\hat{c}_{ij}(p, w_j) - w_{ij}\right) = \sum_{i=1}^{m}p_i e_i(p, w).
\]
Walras’ law holds by the mere aggregation of the agents’ constraints. But the agents’ constraints
are accounting identities. In particular, Walras’ law holds for any price vector and, a fortiori,
it holds for the equilibrium price vector,
\[
0 = \sum_{i=1}^{m}\bar{p}_i e_i(\bar{p}, w) = \sum_{i=1}^{m-1}\bar{p}_i e_i(\bar{p}, w) + \bar{p}_m e_m(\bar{p}, w). \tag{2.3}
\]
Now suppose that the first m−1 markets are in equilibrium, or ei (p̄, w) ≤ 0, for i = 1, ···, m−1.
By the definition of an equilibrium, we have that sign (ei (p̄, w)) p̄i = 0. Therefore, by Eq. (2.3),
we conclude that if m − 1 markets are in equilibrium, then, the remaining market is also in
equilibrium.
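Walras' law and the homogeneity of excess demands are easy to check in a small exchange economy. The sketch below assumes Cobb-Douglas preferences, which the notes do not impose: each agent then spends a fixed share of income on each commodity, so demands are available in closed form. The shares, endowments and prices are made-up numbers.

```python
# Two agents, three commodities. Cobb-Douglas expenditure shares (an assumption
# of this sketch) and endowments, one row per agent.
shares = [[0.2, 0.3, 0.5],
          [0.6, 0.1, 0.3]]
endow = [[1.0, 2.0, 3.0],
         [4.0, 1.0, 0.5]]

def excess_demand(p):
    """Total excess demand e_i(p, w) for each commodity i."""
    m = len(p)
    e = [-sum(w[i] for w in endow) for i in range(m)]    # minus total endowment
    for a, w in zip(shares, endow):
        income = sum(pi * wi for pi, wi in zip(p, w))    # value of the endowment
        for i in range(m):
            e[i] += a[i] * income / p[i]                 # Cobb-Douglas demand
    return e

p = [1.0, 2.5, 0.7]                                      # an arbitrary price vector
e = excess_demand(p)
walras = sum(pi * ei for pi, ei in zip(p, e))            # value of excess demand
e_scaled = excess_demand([3.0 * pi for pi in p])         # same relative prices

print(e, walras, e_scaled)
```

The value of aggregate excess demand is zero at prices that do not clear any market, which is the sense in which Walras' law is an accounting identity, and scaling all prices by the same factor leaves excess demands unchanged.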
2.2.2.2 The notion of numéraire
The excess demand functions are homogeneous of degree zero. Walras’ law implies that if m − 1
markets are in equilibrium, then, the m-th remaining market is also in equilibrium. We wish
to link these two results. A first remark is that by Walras’ law, the equations that define a
competitive equilibrium are not independent. Once m − 1 of these equations are satisfied, the
m-th remaining equation is also satisfied. In other words, there are m − 1 independent relations
and m unknowns in the equations that define a competitive equilibrium. So, there exists an
infinity of solutions.
Suppose, then, that we choose the m-th price to be a sort of exogenous datum. The result
is that we obtain a system of m − 1 equations with m − 1 unknowns. Provided it exists, such
a solution is a function f of the m-th price, p̄i = fi (p̄m ), i = 1, · · ·, m − 1. Then, we may
refer to the m-th commodity as the numéraire. In other words, general equilibrium can only
determine a structure of relative prices. The scale of these relative prices depends on the price
level of the numéraire. It is easily checked that if the functions fi are homogeneous of degree
one, multiplying pm by a strictly positive number λ does not change the relative price structure.
Indeed, by the equilibrium condition, for all i = 1, · · ·, m,
where the second equality is due to the homogeneity property of the functions fi , and the
last equality holds because the excess demand functions ei are homogeneous of degree zero. In
particular, by defining relative prices as $\hat{p}_j = p_j/p_m$, one has that $p_j = \hat{p}_j\cdot p_m$ is a function
that is homogeneous of degree one. In other words, if $\lambda \equiv \bar{p}_m^{-1}$, then,
\[
0 \ge e_i(\bar{p}_1,\cdots,\bar{p}_m, w) = e_i(\lambda\bar{p}_1,\cdots,\lambda\bar{p}_m, w) \equiv e_i\left(\frac{\bar{p}_1}{\bar{p}_m},\cdots,1, w\right).
\]
2.2.3 Optimality
Let cj = (c1j , · · ·, cmj ) be the allocation to agent j, j = 1, · · ·, n. The following definition is the
well-known concept of a desirable resource allocation in a society, according to Pareto.
2.2. The static general equilibrium in a nutshell
Theorem 2.3 (First welfare theorem). Every competitive equilibrium is a Pareto optimum.
Proof. Let us suppose on the contrary that c̄ is an equilibrium but not a Pareto optimum.
Then, there exists a $c : u_{j^*}(c^{j^*}) > u_{j^*}(\bar{c}^{j^*})$, for some $j^*$. Because $\bar{c}^{j^*}$ is optimal for agent $j^*$,
$c^{j^*} \notin B_{j^*}(\bar{p})$, or $\bar{p}c^{j^*} > \bar{p}w^{j^*}$ and, by aggregating: $\bar{p}\sum_{j=1}^{n}c^j > \bar{p}\sum_{j=1}^{n}w^j$, which is unfeasible. It
follows that c cannot be an equilibrium. ∎
Next, we show that any Pareto optimal allocation can be “decentralized.” That is, corre-
sponding to a given Pareto optimum c̄, there exist ways of redistributing endowments around,
and a price vector p̄ : p̄c̄ = p̄w, which is an equilibrium for the initial set of resources.
Theorem 2.4 (Second welfare theorem). Every Pareto optimum can be decentralized.
The previous theorem can be interpreted as one that supports an equilibrium with transfer
payments. For any given Pareto optimum c̄j , a social planner can always give p̄wj to each
agent (with p̄c̄j = p̄wj , where wj is chosen by the planner), and agents choose c̄j . Figure 2.1
illustrates such a decentralization procedure within the Edgeworth box. Suppose that the
objective is to achieve c̄. Given an initial allocation w chosen by the planner, each agent is
given p̄wj . Under laissez faire, c̄ will obtain. In other words, agents are given a constraint of
the form p̄cj = p̄wj . If wj and p̄ are chosen so as to induce each agent to choose c̄j , then p̄ is a
supporting equilibrium price. In this case, the marginal rates of substitution are identical, as
established by the following celebrated result:
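Theorem 2.5 (Characterization of Pareto optima: I). A feasible allocation c̄ = (c̄1 , · · ·, c̄n )
is a Pareto optimum if and only if there exists a $\tilde{\phi} \in \mathbb{R}^{m-1}_{++}$ such that
\[
\tilde{\nabla}u_j = \tilde{\phi}, \quad j = 1,\cdots,n, \quad\text{where}\quad \tilde{\nabla}u_j \equiv \left(\frac{\partial u_j/\partial c_{2j}}{\partial u_j/\partial c_{1j}},\cdots,\frac{\partial u_j/\partial c_{mj}}{\partial u_j/\partial c_{1j}}\right). \tag{2.4}
\]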
But because a competitive equilibrium is also a Pareto optimum, then, by Theorem 2.5,
\[
\tilde{\nabla}u_j = \tilde{\phi} \equiv \left(\frac{\phi_2}{\phi_1},\cdots,\frac{\phi_m}{\phi_1}\right).
\]
Hence, φ̃ represents the vector of relative, shadow prices arising within the centralized allocation
process.
We provide a further characterization of Pareto optimal allocations.
Theorem 2.6 (Characterization of Pareto optima: II). A feasible allocation c̄ = (c̄1 , · · ·, c̄n )
is a Pareto optimum if and only if there exists a vector of weights $\vartheta > 0$ such that c̄ is solution to the following
program:
\[
u(w, \vartheta) = \max_{c^1,\cdots,c^n}\sum_{j=1}^{n}\vartheta_j u_j(c^j) \quad\text{subject to}\quad \sum_{j=1}^{n}c^j \le w \quad (\psi_j,\ j = 1,\cdots,m) \tag*{[P2]}
\]
Proof. The if part is simple and at the same time instructive. Let us solve the program in
[P2]. The Lagrangian is,
\[
L = \sum_{j=1}^{n}\vartheta_j u_j(c^j) - \sum_{i=1}^{m}\psi_i\sum_{j=1}^{n}\left(c_{ij} - w_{ij}\right),
\]
So agents with a high marginal utility of income, for a given price vector, will receive little
social weight in the centralized planner allocation procedure. This result is particularly useful
when it comes to studying financial markets in economies with heterogeneous agents. Theorem 2.7
is also a point of reference, a place to start from, when it comes to studying asset prices in a world
of incomplete markets. Chapter 8 contains several examples of these applications.
By comparing the competitive equilibrium solution in Eq. (2.6) with the Pareto optimality
property of the equilibrium in Eq. (2.5), we deduce that a competitive equilibrium $(\bar c, p)$ can
be implemented by a social planner acting as in Theorem 2.6 when $\ell_j = 1/\kappa_j$, in which case
it also follows that, necessarily, $\psi = p$, by the resource constraint, $\sum_{j=1}^n c^j \le w$, which has to
hold both in the competitive economy and in the centralized one. ∎
2.3. Time and uncertainty c
°by A. Mele
General equilibrium theory can be used to study a variety of fields, by making an appropriate
use of the previous definition - from the theory of international commerce to finance. To deal
with uncertainty, Debreu (1959, Chapter 7) extended the previous definition, by emphasizing
that a commodity should be described through a list of physical properties, with the structure
of dates and places replaced by some event structure. The following example illustrates the
difference between two contracts underlying delivery of corn arising under conditions of certainty
(case A) and uncertainty (case B):
A The first agent will deliver 5000 tons of corn of a specified type to the second agent, who
will accept the delivery at date t and in place ℓ.
B The first agent will deliver 5000 tons of corn of a specified type to the second agent, who
will accept the delivery in place ℓ and in the event st at time t. If st does not occur at
time t, no delivery will take place.
nil otherwise. More generally, a financial asset is a function $x : S \mapsto \mathbb{R}$, where S is the set
of all future events. Then, let m be the number of financial assets. To link financial assets to
commodities, we note that if the state of nature s occurs, then any agent can use the payoff
$x_i(s)$ promised by the i-th asset $A_i$ to finance net transactions on the commodity markets, viz
$$
p(s) \cdot e(s) = \sum_{i=1}^m \theta_i x_i(s), \quad \forall s \in S, \tag{2.7}
$$
where p(s) and e(s) denote some vectors of prices and excess demands related to the commodi-
ties, contingent on the realization of state s, and θi is the number of assets i held by the agent.
In other words, the role of financial assets, here, is to transfer value from a state of nature to
another to finance state-contingent consumption.
Unfortunately, Eq. (2.7) does not hold, in general. A condition is that the number of assets,
m, be sufficiently high to let each agent cope with the number of future events in S, sn . Market
completeness merely reduces to a size problem - the assets have to be sufficiently diverse to
span all possible events in the future. Indeed, we shall show that if no payoffs
are perfectly correlated, then markets are complete if and only if m = sn . Note, also, that
this reduces the dimension of our original problem, for we are then considering a competitive
equilibrium in sn + m markets, instead of a competitive equilibrium in sn · m markets.
where $x_i(s_j)$ is the payoff promised by the i-th asset in the state $s_j$. Then, to implement any
state contingent consumption plan $c \in \mathbb{R}^{s_n}$, Mr Law has to be able to solve the following system,
$$
c = X \cdot \theta,
$$
where $\theta \in \mathbb{R}^m$ is the portfolio. A unique solution to the previous system exists if $\mathrm{rank}(X) = s_n = m$, and is given by $\hat\theta = X^{-1} c$. Consider, for example, the previous case, in which $s_n = 2$.
Let us assume that m = 2, for any additional assets would be redundant here. Then, we have,
$$
\begin{cases}
\hat\theta_1 = \dfrac{(1 + x_2(r))\, c_s - (1 + x_2(s))\, c_r}{S_1 \left[ (1 + x_1(s))(1 + x_2(r)) - (1 + x_1(r))(1 + x_2(s)) \right]} \\[2ex]
\hat\theta_2 = \dfrac{(1 + x_1(s))\, c_r - (1 + x_1(r))\, c_s}{S_2 \left[ (1 + x_1(s))(1 + x_2(r)) - (1 + x_1(r))(1 + x_2(s)) \right]}
\end{cases}
$$
Finally, assume that the second asset is safe, or that it yields the same return in the two states
of nature: $x_2(r) = x_2(s) \equiv r$. Let $x_s = x_1(s)$ and $x_r = x_1(r)$. Then, the pair $(\hat\theta_1, \hat\theta_2)$ can be
rewritten as,
$$
\hat\theta_1 = \frac{c_s - c_r}{S_1 (x_s - x_r)}, \qquad \hat\theta_2 = \frac{(1 + x_s)\, c_r - (1 + x_r)\, c_s}{S_2 (1 + r)(x_s - x_r)}.
$$
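The replication formulas above can be checked numerically. The following is a minimal sketch, assuming that the payoff of asset i in a given state equals $S_i (1 + x_i(\text{state}))$, as the formulas suggest; the function names and all the numbers are hypothetical and serve only as an illustration.

```python
def replicating_portfolio(S1, S2, x1, x2, c):
    """Solve c = X * theta for two assets and two states ("s" sunny, "r" rainy).

    x1, x2: dicts mapping a state to the net return of assets 1 and 2.
    c: dict mapping a state to the target consumption.
    Assumption: the payoff of asset i in a state is S_i * (1 + x_i(state)).
    """
    det = (1 + x1["s"]) * (1 + x2["r"]) - (1 + x1["r"]) * (1 + x2["s"])
    if det == 0:
        raise ValueError("collinear payoffs: the two assets do not span both states")
    theta1 = ((1 + x2["r"]) * c["s"] - (1 + x2["s"]) * c["r"]) / (S1 * det)
    theta2 = ((1 + x1["s"]) * c["r"] - (1 + x1["r"]) * c["s"]) / (S2 * det)
    return theta1, theta2


def portfolio_payoff(S1, S2, x1, x2, theta, state):
    """Payoff delivered by the portfolio theta in the given state."""
    theta1, theta2 = theta
    return theta1 * S1 * (1 + x1[state]) + theta2 * S2 * (1 + x2[state])


# Hypothetical numbers: a risky asset and a safe asset yielding r = 5%.
S1, S2 = 10.0, 1.0
x1 = {"s": 0.20, "r": -0.10}
x2 = {"s": 0.05, "r": 0.05}
c = {"s": 3.0, "r": 12.0}  # target consumption in the sunny and rainy states

theta = replicating_portfolio(S1, S2, x1, x2, c)
payoffs = {state: portfolio_payoff(S1, S2, x1, x2, theta, state) for state in ("s", "r")}
```

With these numbers the portfolio is short the risky asset and long the safe one, and its payoff matches the target plan in both states.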
As is clear, the issues we are dealing with relate to the replication of random variables. Here,
the random variable is a state contingent consumption plan (ci )i=r,s , where cr and cs are known,
which we want to replicate for hedging purposes. (Mr Law will need to buy either a pair of
sun-glasses or an umbrella, tomorrow.)
In the previous two-state example, two assets with independent payoffs are able to generate
any two-state variable. The next step, now, is to understand what happens when we assume
that there exists a third asset, A say, that delivers the same random variable (ci )i=r,s we can
obtain by using the previous pair (θ̂1 , θ̂2 ).
We claim that if the current price of the third asset A is H, then, it must be that,
$$
H = V \equiv \hat\theta_1 S_1 + \hat\theta_2 S_2, \tag{2.9}
$$
for the financial market to be free of arbitrage opportunities, to be defined informally below.
Indeed, if V < H, we can buy $\hat\theta$ and sell at the same time the third asset A. The result is a sure
profit, or an arbitrage opportunity, equal to H − V, for $\hat\theta$ generates $c_r$ if it rains tomorrow
and $c_s$ if it does not. In both cases, the portfolio $\hat\theta$ generates the payments that
are necessary to honour the contract commitments related to the selling of A. By a symmetric
argument, the inequality V > H would also generate an arbitrage opportunity. Hence, Eq. (2.9)
must hold true.
It remains to compute the right hand side of Eq. (2.9), which in turn leads to an evaluation
formula for the asset A. We have:
$$
H = \frac{1}{1+r} \left[ P^* c_s + (1 - P^*)\, c_r \right], \qquad P^* = \frac{r - x_r}{x_s - x_r}. \tag{2.10}
$$
Importantly, then, H can be understood as the discounted (by 1 + r) expectation of payoffs
promised by A, taken under some “artificial” probability P ∗ .
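The algebra behind this evaluation formula can be spelled out as follows. This is a sketch of the intermediate steps, taking V to be the cost of the replicating portfolio, $V = \hat\theta_1 S_1 + \hat\theta_2 S_2$, and using the expressions for $(\hat\theta_1, \hat\theta_2)$ in the safe-asset case:

```latex
\begin{aligned}
V &= \hat\theta_1 S_1 + \hat\theta_2 S_2
   = \frac{c_s - c_r}{x_s - x_r} + \frac{(1 + x_s)\, c_r - (1 + x_r)\, c_s}{(1 + r)(x_s - x_r)} \\
  &= \frac{(1 + r)(c_s - c_r) + (1 + x_s)\, c_r - (1 + x_r)\, c_s}{(1 + r)(x_s - x_r)}
   = \frac{(r - x_r)\, c_s + (x_s - r)\, c_r}{(1 + r)(x_s - x_r)} \\
  &= \frac{1}{1 + r} \left[ P^* c_s + (1 - P^*)\, c_r \right],
   \qquad P^* = \frac{r - x_r}{x_s - x_r}, \quad 1 - P^* = \frac{x_s - r}{x_s - x_r}.
\end{aligned}
```

Note that $P^*$ lies strictly between 0 and 1 exactly when $x_r < r < x_s$, that is, when the risky asset neither dominates nor is dominated by the safe asset.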
Remark 2.8. In this introductory example, the asset A can be priced without making
reference to any agents’ preferences. The key observation to obtain this result is that the
payoffs promised by A can be obtained through the portfolio θ̂. This fact obviously does not
mean that any agent should use this portfolio. For example, it may be the case that Mr Law
is so poor that his budget constraint would not even allow him to implement the portfolio θ̂.
The point underlying the previous example is that the portfolio θ̂ could be used to construct an
arbitrage opportunity, arising when Eq. (2.9) does not hold. In this case, any penniless agent
could implement the arbitrage described above.
The next step is to extend the results in Eq. (2.10) to a dynamic setting. Suppose that
an additional day is available for trading, with the same uncertainty structure: the day after
tomorrow, the asset A will pay off css if it will be sunny (provided the previous day was sunny),
and crs if it will be sunny (provided the previous day was raining). By using the same arguments
leading to Eq. (2.10), we obtain that:
$$
H = \frac{1}{(1+r)^2} \left[ P^{*2} c_{ss} + P^* (1 - P^*)\, c_{sr} + (1 - P^*) P^* c_{rs} + (1 - P^*)^2 c_{rr} \right].
$$
$$
H = \frac{1}{(1+r)^T} E^* (c_T), \tag{2.11}
$$
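The T-period evaluation formula can be illustrated by enumerating all the paths of sunny and rainy days. The sketch below assumes that the same risk-neutral probability $P^*$ applies in every period, as in the two-period case; the function name and the payoff numbers are hypothetical.

```python
from itertools import product


def risk_neutral_price(payoff, p_star, r, T):
    """Discounted risk-neutral expectation of a payoff due after T periods.

    payoff: function mapping a tuple of T states ("s" or "r") to a payment.
    p_star: risk-neutral probability of state "s" in each period (assumed constant).
    """
    expectation = 0.0
    for path in product("sr", repeat=T):
        n_sunny = path.count("s")
        prob = p_star ** n_sunny * (1 - p_star) ** (T - n_sunny)
        expectation += prob * payoff(path)
    return expectation / (1 + r) ** T


# Hypothetical two-period example.
r = 0.05
x_s, x_r = 0.20, -0.10
p_star = (r - x_r) / (x_s - x_r)

payoffs = {("s", "s"): 4.0, ("s", "r"): 2.0, ("r", "s"): 2.0, ("r", "r"): 1.0}
H = risk_neutral_price(lambda path: payoffs[path], p_star, r, T=2)
```

Enumerating paths costs $2^T$ evaluations, which is fine for an illustration but not for large T; a recombining tree would be the usual alternative.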
Let $v_{si} \equiv v_i(\omega_s)$, $v_{s,\cdot} \equiv [v_{s1}, \cdots, v_{sm}]$, $v_{\cdot,i} \equiv [v_{1i}, \cdots, v_{di}]^\top$. We assume that $\mathrm{rank}(V) = m \le d$.
c1 − w1 = V θ.
We define an arbitrage opportunity as a portfolio that has a negative value at the first period,
and a positive value in at least one state of the world in the second period, or a positive value in
all states of the world in the second period and a nonpositive value in the first period.
Notation: $\forall x \in \mathbb{R}^m$, $x > 0$ means that at least one component of x is strictly positive while
the other components of x are nonnegative. $x \gg 0$ means that all components of x are strictly
positive.
As we shall show below (Theorem 2.11), an arbitrage opportunity cannot exist in a competitive
equilibrium, for the agents’ program would not be well defined in this case. Introduce,
then, the $(d + 1) \times m$ matrix,
$$
W = \begin{bmatrix} -S \\ V \end{bmatrix},
$$
the vector subspace of $\mathbb{R}^{d+1}$,
$$
\langle W \rangle = \left\{ z \in \mathbb{R}^{d+1} : z = W\theta, \ \theta \in \mathbb{R}^m \right\},
$$
The economic interpretation of the vector subspace $\langle W \rangle$ is that of the excess demand space for
all the states of nature, generated by the “wealth transfers” that the investments in the
assets bring about. Naturally, $\langle W \rangle$ and $\langle W \rangle^\perp$ are orthogonal, as $\langle W \rangle^\perp = \{ x \in \mathbb{R}^{d+1} : x^\top z = 0, \ \forall z \in \langle W \rangle \}$.
Mathematically, the assumption that there are no arbitrage opportunities is equivalent to the
following condition,
$$
\langle W \rangle \cap \mathbb{R}^{d+1}_+ = \{0\}. \tag{2.12}
$$
The interpretation of (2.12) is in fact very simple. In the absence of arbitrage opportunities,
there should be no portfolios generating “wealth transfers” that are nonnegative and strictly
positive in at least one state, i.e. $\nexists \theta : W\theta > 0$. Hence, $\langle W \rangle$ and the positive orthant $\mathbb{R}^{d+1}_+$
intersect only at the origin.
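The condition $W\theta > 0$ is easy to test for a given portfolio. The following is a minimal sketch with hypothetical prices and payoffs; note that it only checks one candidate portfolio, so a negative answer does not by itself establish the absence of arbitrage.

```python
def wealth_transfers(S, V, theta):
    """Return W * theta, where W stacks -S on top of V, a (d+1) x m matrix.

    S[i] is the price of asset i, V[s][i] its payoff in state s.
    """
    first_period = -sum(price * t for price, t in zip(S, theta))
    second_period = [sum(v * t for v, t in zip(row, theta)) for row in V]
    return [first_period] + second_period


def is_arbitrage(S, V, theta, tol=1e-12):
    """True if W * theta > 0: no negative component, at least one strictly positive."""
    z = wealth_transfers(S, V, theta)
    return all(x >= -tol for x in z) and any(x > tol for x in z)


# Hypothetical example with d = 2 states and m = 2 assets.
V = [[1.0, 2.0],
     [1.0, 0.5]]
fair_prices = [1.0, 1.0]   # consistent with state prices (1/3, 2/3)
cheap_prices = [1.0, 0.4]  # asset 2 costs less than its worst payoff
theta = [-0.4, 1.0]        # short the safe asset, long asset 2

arbitrage_when_cheap = is_arbitrage(cheap_prices, V, theta)
arbitrage_when_fair = is_arbitrage(fair_prices, V, theta)
```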
¹ $V\theta \ge 0$ means that $[V\theta]_j \ge 0$, $j = 1, \cdots, d$, i.e. it allows for $[V\theta]_j = 0$, $j = 1, \cdots, d$.
² $V\theta > 0$ means $[V\theta]_j \ge 0$, $j = 1, \cdots, d$, with at least one j for which $[V\theta]_j > 0$.
The following result provides a general characterization of how the no-arbitrage condition in
(2.12) restricts the price of all the assets in the economy.
The previous theorem provides the foundations for many developments in financial economics.
To provide its intuition, let us pre-multiply the second constraint by $\phi^\top$, obtaining,
$$
\phi^\top (c^1 - w^1) = \phi^\top V \theta = S\theta = -(c_0 - w_0),
$$
where the second equality follows by Theorem 2.10, and the third equality is due to the first
period budget constraint. Critically, then, Theorem 2.10 shows that in the absence of arbitrage
opportunities, each agent has access to the following budget constraint,
$$
0 = c_0 - w_0 + \phi^\top (c^1 - w^1) = c_0 - w_0 + \sum_{s=1}^d \phi_s (c_s - w_s), \quad \text{with } c^1 - w^1 \in \langle V \rangle. \tag{2.13}
$$
The budget constraint in (2.13) reveals that φ can be interpreted as the vector of prices of
the commodity in the future d states of nature, and that the numéraire in this economy is
the first-period consumption. We usually refer to φ as the state price vector, or Arrow-Debreu
state price vector. However, it would be misleading to say that the budget constraint in (2.13)
is the one we are used to seeing in the static Arrow-Debreu type model of Section 2.2. In fact, the
Arrow-Debreu economy of Section 2.2 obtains when m = d, in which case $\langle V \rangle = \mathbb{R}^d$ in (2.13).
This case, which according to Theorem 2.10 arises when markets are complete, also implies the
remarkable property that there exists a unique φ that is compatible with the asset prices we
observe.
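When m = d the relation $S = \phi^\top V$ can be inverted to recover the unique state price vector from observed asset prices. Below is a minimal sketch for d = m = 2, solved by Cramer's rule; the function name and the numbers are hypothetical.

```python
def state_prices_2x2(S, V):
    """Solve S = phi^T V for phi when d = m = 2.

    V[s][i] is the payoff of asset i in state s; S[i] is the price of asset i.
    """
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    if det == 0:
        raise ValueError("V is singular: the two assets do not span both states")
    phi1 = (S[0] * V[1][1] - S[1] * V[1][0]) / det
    phi2 = (S[1] * V[0][0] - S[0] * V[0][1]) / det
    return phi1, phi2


# Hypothetical payoffs and prices: asset 1 is safe, asset 2 is risky.
V = [[1.0, 2.0],
     [1.0, 0.5]]
S = [0.95, 1.10]

phi = state_prices_2x2(S, V)
repriced = [phi[0] * V[0][i] + phi[1] * V[1][i] for i in range(2)]
```

Here both state prices come out strictly positive, which is what the absence of arbitrage requires; prices S that produced a negative component would signal an arbitrage opportunity.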
The situation is radically different if m < d. In other terms, $\langle V \rangle$ is the subspace of excess
demands agents have access to in the second period, and it can be “smaller” than $\mathbb{R}^d$ if markets
are incomplete. Indeed, $\langle V \rangle$ is the subspace generated by the payoffs obtained by the portfolio
choices made in the first period,
$$
\langle V \rangle = \left\{ e \in \mathbb{R}^d : e = V\theta, \ \theta \in \mathbb{R}^m \right\}.
$$
Consider, for example, the case d = 2 and m = 1. In this case, $\langle V \rangle = \{ e \in \mathbb{R}^2 : e = V\theta, \ \theta \in \mathbb{R} \}$,
with $V = V_1$, where $V_1 = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ say, and $\dim \langle V \rangle = 1$, as illustrated by Figure 2.2.
Next, suppose we open a new market for a second financial asset with payoffs given by $V_2 = \begin{pmatrix} v_3 \\ v_4 \end{pmatrix}$. Then, m = 2, $V = \begin{pmatrix} v_1 & v_3 \\ v_2 & v_4 \end{pmatrix}$, and
$$
\langle V \rangle = \left\{ e \in \mathbb{R}^2 : e = \begin{pmatrix} \theta_1 v_1 + \theta_2 v_3 \\ \theta_1 v_2 + \theta_2 v_4 \end{pmatrix}, \ \theta \in \mathbb{R}^2 \right\},
$$
i.e. $\langle V \rangle = \mathbb{R}^2$. As
a result, we can now generate any excess demand in $\mathbb{R}^2$, just as in the Arrow-Debreu economy
of Section 2.2. To generate any excess demand, we multiply the payoff vector $V_1$ by $\theta_1$ and the
payoff vector $V_2$ by $\theta_2$. For example, suppose we wish to generate the payoff vector
$V_4$ in Figure 2.3. Then, we choose some $\theta_1 > 1$ and $\theta_2 < 1$. (The exact values of $\theta_1$ and $\theta_2$
are obtained by solving a linear system.) In Figure 2.3, the payoff vector $V_3$ is obtained with
$\theta_1 = \theta_2 = 1$.
To summarize, if markets are complete, then, hV i = Rd . If markets are incomplete, hV i
is only a subspace of Rd , which makes the agents’ choice space smaller than in the complete
markets case.
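[Figure 2.2: the subspace ⟨V⟩ generated by the payoff vector V1 , with coordinates v1 and v2 .]
[Figure 2.3: the payoff vectors V1 , V2 , V3 and V4 , with coordinates v1 , v2 , v3 and v4 .]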
2.6. Equivalent martingales and equilibrium c
°by A. Mele
We now present a fundamental result, about the “viability of the model.” Define the second
period consumption $c^1_j \equiv [c_{1j}, \cdots, c_{dj}]^\top$, where $c_{sj}$ is the second-period consumption in state s,
and let,
$$
\left( \hat c_{0j}, \hat c^1_j \right) \in \arg\max_{c_{0j},\, c^1_j} \left[ u_j(c_{0j}) + \beta_j E(\nu_j(c^1_j)) \right], \quad \text{subject to} \quad
\begin{cases} c_{0j} - w_{0j} = -S\theta_j \\ c^1_j - w^1_j = V\theta_j \end{cases} \tag*{[P3]}
$$
where uj and ν j are utility functions, both satisfying Assumption 2.1. Naturally, we could use
more general formulations of utilities than that in [P3], and in fact we shall in more advanced
parts of this book. For the sake of this introductory chapter, we only consider additive utility.
We have:
Theorem 2.11. The program [P3] has a solution if and only if there are no arbitrage oppor-
tunities.
Proof. Let us suppose on the contrary that the program [P3] has a solution $\hat c_{0j}, \hat c^1_j, \hat\theta_j$, but
that there exists a $\theta : W\theta > 0$. The program constraint is, with straightforward notation,
$\hat c_j = w_j + W\hat\theta_j$. Then, we may define a portfolio $\theta_j = \hat\theta_j + \theta$, such that $c_j = w_j + W(\hat\theta_j + \theta) = \hat c_j + W\theta > \hat c_j$, which contradicts the optimality of $\hat c_j$. For the converse, note that the absence of
arbitrage opportunities implies that $\exists \phi \in \mathbb{R}^d_{++} : S = \phi^\top V$, which leads to the budget constraint
in (2.13), for a given φ. This budget constraint is clearly a closed subset of the compact budget
constraint $B_j$ in [P1] (in fact, it is $B_j$ restricted to $\langle V \rangle$). Therefore, it is a compact set and,
hence, the program [P3] has a solution, as a continuous function attains its maximum on a
compact set. ∎
Definition 2.12. An equilibrium is given by allocations and prices $\{ (\hat c_{0j})_{j=1}^n, ((\hat c_{sj})_{j=1}^n)_{s=1}^d, (\hat S_i)_{i=1}^m \in \mathbb{R}^n_+ \times \mathbb{R}^{nd}_+ \times \mathbb{R}^m_+ \}$, where the allocations are solutions of the program [P3] and satisfy:
$$
0 = \sum_{j=1}^n (\hat c_{0j} - w_{0j}), \qquad 0 = \sum_{j=1}^n (\hat c_{sj} - w_{sj}) \ \ (s = 1, \cdots, d), \qquad 0 = \sum_{j=1}^n \theta_{ij} \ \ (i = 1, \cdots, m).
$$
We now express demand functions in terms of the stochastic discount factor, and then look
for an equilibrium by looking for the stochastic discount factor that clears the commodity
markets. By Walras’ law, this also implies the equilibrium on the financial market. Indeed, by
aggregating the agents’ constraints in the second period,
$$
\sum_{j=1}^n \left( c^1_j - w^1_j \right) = V \sum_{j=1}^n \theta_j(m).
$$
For simplicity, we also assume that $u_j'(x) > 0$, $u_j''(x) < 0$ $\forall x > 0$ and $\lim_{x \to 0} u_j'(x) = \infty$,
$\lim_{x \to \infty} u_j'(x) = 0$ and that $\nu_j$ satisfies the same properties.
$$
S_i = \phi^\top v_{\cdot,i} = \sum_{s=1}^d \phi_s v_{s,i}, \quad i = 1, \cdots, m. \tag{2.14}
$$
Let us assume that the first asset is a safe asset, i.e. $v_{s,1} = 1$ $\forall s$. Then, we have
$$
S_1 \equiv \frac{1}{1+r} = \sum_{s=1}^d \phi_s. \tag{2.15}
$$
Eq. (2.15) confirms the economic interpretation of the state prices in (2.13). Recall, the states
of nature are exhaustive and mutually exclusive. Therefore, φs can be interpreted as the price
to be paid today for obtaining, for sure, one unit of numéraire, tomorrow, in state s. This is
indeed the economic interpretation of the budget constraint in (2.13). Eq. (2.15) confirms this
as it says that the prices of all these rights sum up to the price of a pure discount bond, i.e. an
asset that yields one unit of numéraire, tomorrow, for sure.
Eq. (2.15) can be elaborated to provide us with a second interpretation of the state prices in
Theorem 2.10. Define,
$$
P^*_s \equiv (1 + r)\, \phi_s,
$$
which satisfies, by construction,
$$
\sum_{s=1}^d P^*_s = 1.
$$
$$
S_i = \frac{1}{1+r} \sum_{s=1}^d P^*_s v_{s,i} = \frac{1}{1+r} E^{P^*}(v_{\cdot,i}), \quad i = 1, \cdots, m. \tag{2.16}
$$
Eq. (2.16) confirms Eq. (2.10), obtained in the introductory example of Section 2.5. It says
that the price of any asset is the expectation of its future payoffs, taken under the proba-
bility P ∗ , discounted at the risk-free interest rate r. For this reason, we usually refer to the
probability P ∗ as the risk-neutral probability. Eq. (2.16) can be extended to a dynamic con-
text, as we shall see in later chapters. Intuitively, consider an asset that distributes dividends
in every period, let S (t) be its price at time t, and D (t) the dividend paid off at time t.
Then, the “payoff” it promises for the next period is S (t + 1) + D (t + 1). By Eq. (2.16),
$S(t) = (1 + r)^{-1} E^{P^*}(S(t+1) + D(t+1))$ or, by rearranging terms,
$$
E^{P^*}\!\left( \frac{S(t+1) + D(t+1) - S(t)}{S(t)} \right) = r. \tag{2.17}
$$
That is, the expected return on the asset under P ∗ equals the safe interest rate, r. In a dynamic
context, the risk-neutral probability P ∗ is also referred to as the risk-neutral martingale measure,
or equivalent martingale measure, for the following reason. Define a money market account as
an asset with value evolving over time as $M(t) \equiv (1 + r)^t$. Then, Eq. (2.17) can be rewritten
as $S(t)/M(t) = E^{P^*}[(S(t+1) + D(t+1))/M(t+1)]$. This shows that if $D(t+1) = 0$ for
all t, then the discounted process $S(t)/M(t)$ is a martingale under $P^*$.
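The link between state prices, the risk-neutral probability and expected returns can be checked in a one-period example, where the payoff $v_{s,i}$ plays the role of S(t + 1) + D(t + 1). The sketch below uses hypothetical state prices and payoffs, with d = 3 states and a safe first asset.

```python
phi = [0.30, 0.45, 0.20]   # hypothetical state prices, d = 3
V = [[1.0, 2.0],           # V[s][i]: payoff of asset i in state s
     [1.0, 1.0],
     [1.0, 0.2]]

bond_price = sum(phi)                      # Eq. (2.15): S_1 = 1 / (1 + r)
r = 1.0 / bond_price - 1.0
p_star = [(1.0 + r) * f for f in phi]      # P*_s = (1 + r) * phi_s


def price(i):
    """Eq. (2.16): discounted expectation of asset i's payoff under P*."""
    expected_payoff = sum(p * row[i] for p, row in zip(p_star, V))
    return expected_payoff / (1.0 + r)


def expected_return(i):
    """Expected net return of asset i under P*, as in Eq. (2.17)."""
    S_i = price(i)
    return sum(p * (row[i] - S_i) / S_i for p, row in zip(p_star, V))
```

Under $P^*$ both the safe and the risky asset earn the same expected return r, although their payoffs differ widely across states.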
Next, let us substitute $P^*$ into the budget constraint in (2.13), to obtain, for $(c^1 - w^1) \in \langle V \rangle$,
$$
0 = c_0 - w_0 + \sum_{s=1}^d \phi_s (c_s - w_s) = c_0 - w_0 + \frac{1}{1+r} \sum_{s=1}^d P^*_s (c_s - w_s) = c_0 - w_0 + \frac{1}{1+r} E^{P^*}\!\left( c^1 - w^1 \right). \tag{2.18}
$$
For reasons developed below, it is also useful to derive an alternative representation of the
budget constraint, in terms of the objective probability P (say). Let us introduce, first, the
ratio η, defined as,
$$
P^*_s = \eta_s P_s, \quad s = 1, \cdots, d.
$$
The ratio $\eta_s$ indicates how far apart $P^*$ and P are. We assume $\eta_s$ is strictly positive, which means
that $P^*$ and P are equivalent measures, i.e. they assign the same weight to the null sets. Finally,
let us introduce the stochastic discount factor, $m = (m_s)_{s=1}^d$, defined as,
$$
m_s \equiv (1 + r)^{-1} \eta_s.
$$
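We have,
$$
\frac{1}{1+r} E^{P^*}\!\left( c^1 - w^1 \right) = \frac{1}{1+r} \sum_{s=1}^d P^*_s (c_s - w_s) = \sum_{s=1}^d \underbrace{\frac{1}{1+r}\, \eta_s}_{= m_s} (c_s - w_s)\, P_s = E\left[ m \cdot \left( c^1 - w^1 \right) \right].
$$
Similarly, by replacing the stochastic discount factor m into Eq. (2.16) we obtain,
$$
S_i = \frac{1}{1+r} E^{P^*}(v_{\cdot,i}) = E(m \cdot v_{\cdot,i}), \quad i = 1, \cdots, m. \tag{2.19}
$$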
Naturally, despite all such different ways to express budget constraints and asset prices, the
key of the model is still φ,
$$
m_s = (1 + r)^{-1} \eta_s = (1 + r)^{-1} \frac{P^*_s}{P_s} = \frac{\phi_s}{P_s},
$$
which can be recovered, once we solve for the equilibrium stochastic discount factor m, as we
shall illustrate in the next section.
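The three equivalent pricing representations (state prices, risk-neutral probability, stochastic discount factor) can be compared numerically. The following is a minimal sketch with hypothetical state prices, objective probabilities and payoffs.

```python
phi = [0.30, 0.45, 0.20]     # hypothetical state prices
P = [0.25, 0.50, 0.25]       # hypothetical objective probabilities
payoff = [2.0, 1.0, 0.2]     # payoff of some asset i in each state

r = 1.0 / sum(phi) - 1.0                             # from Eq. (2.15)
eta = [(1.0 + r) * f / p for f, p in zip(phi, P)]    # eta_s = P*_s / P_s
m = [e / (1.0 + r) for e in eta]                     # m_s = eta_s / (1 + r)

price_from_phi = sum(f * v for f, v in zip(phi, payoff))             # Eq. (2.14)
price_from_sdf = sum(p * ms * v for p, ms, v in zip(P, m, payoff))   # E(m * v), Eq. (2.19)
expected_sdf = sum(p * ms for p, ms in zip(P, m))                    # E(m) = 1 / (1 + r)
```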
In the complete markets case, hV i = Rd , so that the first order conditions to the program [P4]
are,
$$
u_j'(\hat c_{0j}) = \lambda_j, \qquad \beta_j \nu_j'(\hat c_{sj}) = \lambda_j m_s, \quad s = 1, \cdots, d,
$$
where λj is a Lagrange multiplier. So, really, the properties of this model are the same as those
of the static model in Section 2.2. Formally, the complete markets economy in this section is the
same as the static economy in Section 2.2, once we set m = d, where m is the dimension of the
commodity space, in Section 2.2, and ps = φs , where ps is the price of the s-th commodity in
Section 2.2, with p1 = 1 (the numéraire), and φs is the Arrow-Debreu state price in the unified
budget constraint of Eq. (2.18).
These simple observations have profound implications: an economy subject to uncertainty can
be understood through a static model, in the presence of complete markets! Under the conditions
stated in Section 2.2, even complicated models with heterogeneous agents, with potentially
interesting asset pricing implications, and still, apparently, so hopelessly difficult to analyze,
can actually be “centralized,” through a dedicated design of Pareto’s weights, as formalized
in Theorem 2.7. We can actually do much more. First, this centralization property is easily
extended to a dynamic context, as we shall see in more advanced parts of these lectures (see
Chapter 8), provided markets satisfy the property of being dynamically complete, a property
explained in the next two chapters. Second, the assumption that agents can exchange Arrow-Debreu
securities for all future states of the world is clearly unrealistic: markets are quite likely to
be incomplete, one possible reason why financial innovation is so pervasive in practice. Yet
the theory about centralization can be extended to an incomplete markets setting, through a
system of “stochastic Pareto weights,” as we discuss in detail in Chapter 8. For now, let us
proceed with the next simple and fundamental steps.
To illustrate the equilibrium implications of the first order conditions in a simple case, consider
an economy with a single agent. In this economy, the first order conditions immediately lead to
the following stochastic discount factor,
$$
m_s = \beta \frac{\nu'(w_s)}{u'(w_0)}.
$$
The economic interpretation of this stochastic discount factor is the following. In the autarchic
state,
$$
- \left. \frac{dc_0}{dc_s} \right|_{c_0 = w_0,\, c_s = w_s} = \beta \frac{\nu'(w_s)}{u'(w_0)} P_s = m_s P_s = \phi_s
$$
is the present consumption the agent is willing to give up at t = 0, in order to obtain
additional consumption at time t = 1, in state s. In other words, $\phi_s$ is the price, in terms of the
present consumption numéraire, of one additional unit of consumption at time t = 1 and state
s. So it is a state price, such that the agent is happy to consume his own endowment, without
any incentives to trade in the financial markets. The risk-neutral probability is,
$$
P^*_s = \eta_s P_s = (1 + r)\, m_s P_s = (1 + r)\, \beta \frac{\nu'(w_s)}{u'(w_0)} P_s.
$$
By the first order conditions, and the pure discount bond evaluation formula, it is easily checked
that $1 = \sum_{s=1}^d P^*_s$. Moreover,
$$
\frac{P^*_s}{P_s} = m_s (1 + r) = m_s \left[ \beta E\!\left( \frac{\nu'(w_s)}{u'(w_0)} \right) \right]^{-1} = m_s \beta^{-1} \frac{u'(w_0)}{E[\nu'(w_s)]} = \frac{\nu'(w_s)}{E[\nu'(w_s)]},
$$
where the second equality follows by the pure discount bond evaluation formula: $\frac{1}{1+r} = E(m)$.
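These formulas can be illustrated for a single agent with power utility. The sketch below assumes $u = \nu$ with marginal utility $x^{-\gamma}$ (a CRRA specification chosen only for illustration); the discount factor, the endowments and the probabilities are hypothetical.

```python
beta, gamma = 0.96, 2.0      # hypothetical discount factor and CRRA coefficient
w0 = 1.0                     # endowment today
w = [1.10, 1.00, 0.90]       # endowment tomorrow in the good, middle and bad state
P = [0.3, 0.4, 0.3]          # objective probabilities


def marginal_utility(x):
    """Marginal utility of the assumed CRRA function (u = nu)."""
    return x ** (-gamma)


m = [beta * marginal_utility(ws) / marginal_utility(w0) for ws in w]
phi = [ms * ps for ms, ps in zip(m, P)]      # phi_s = m_s * P_s
r = 1.0 / sum(phi) - 1.0                     # 1 / (1 + r) = E(m)
p_star = [(1.0 + r) * f for f in phi]        # P*_s = (1 + r) * phi_s
expected_mu = sum(ps * marginal_utility(ws) for ps, ws in zip(P, w))
ratios = [ps_star / ps for ps_star, ps in zip(p_star, P)]
```

The risk-neutral probability overweights the bad state (ratio above one) and underweights the good state (ratio below one), because marginal utility is high when the endowment is low.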
In the multi-agent case, the situation is similar as soon as markets are complete. Indeed,
consider the first order conditions of each agent,
$$
\beta_j \frac{\nu_j'(\hat c_{sj})}{u_j'(\hat c_{0j})} = m_s, \quad s = 1, \cdots, d, \quad j = 1, \cdots, n.
$$
The previous relation reveals that as soon as markets are complete, agents must have the same
marginal rate of substitution, in equilibrium. This is because by Theorem 2.10, the state price
vector φ is unique if and only if markets are complete, which then implies uniqueness of $m_s = \phi_s / P_s$
and, hence, the fact that each marginal rate of substitution $\beta_j \nu_j'(\hat c_{sj}) / u_j'(\hat c_{0j})$ is independent of j. In this
case, the equilibrium allocation is clearly a Pareto optimum, by the discussion at the beginning
of this section, and Theorem 2.5.
The result that agents have the same marginal rate of substitution for each state of the world
is known as risk sharing. It means that, given an initial endowment distribution among the
agents, the market mechanism, through a system of complete securities markets, is such that
consumption risk is shifted around the economy, so that it is borne by the agents most willing to
take it. For example, suppose that two agents 1 and 2 have the same discount rate, and utility
functions $u_j = \nu_j$, with CRRA given by $\eta_1$ and $\eta_2$, where $\eta_1 < \eta_2$. Then, $Gr_{s1} = (Gr_{s2})^{\eta_2 / \eta_1}$,
where $Gr_{si}$ is consumption growth for the i-th agent in state s. In good times, when $Gr_{s2} > 1$,
the more risk-averse agent experiences, ex-post, a lower consumption growth rate, $Gr_{s2} < Gr_{s1}$.
In bad times, however, when $Gr_{s2} < 1$, the more risk-averse agent experiences, ex-post, a higher
consumption growth rate, $Gr_{s2} > Gr_{s1}$. In other words, capital markets, when complete, operate
in such a way as to have the more risk-averse agent face a less volatile consumption growth.
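The relation between the two consumption growth rates follows directly from the first order conditions. A short derivation, assuming marginal utilities of the power form $u_j'(c) = \nu_j'(c) = c^{-\eta_j}$ and a common discount factor β, and writing $Gr_{sj} \equiv \hat c_{sj} / \hat c_{0j}$:

```latex
\begin{aligned}
\beta \left( \frac{\hat c_{s1}}{\hat c_{01}} \right)^{-\eta_1}
  &= m_s
   = \beta \left( \frac{\hat c_{s2}}{\hat c_{02}} \right)^{-\eta_2}
  && \text{(first order conditions of agents 1 and 2)} \\
\Longrightarrow \quad Gr_{s1}^{-\eta_1} &= Gr_{s2}^{-\eta_2}
  \quad \Longrightarrow \quad Gr_{s1} = \left( Gr_{s2} \right)^{\eta_2 / \eta_1}.
\end{aligned}
```

Since $\eta_2 / \eta_1 > 1$, raising $Gr_{s2}$ to this power amplifies deviations from one, so that $Gr_{s1} > Gr_{s2}$ when $Gr_{s2} > 1$ and $Gr_{s1} < Gr_{s2}$ when $Gr_{s2} < 1$.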
If markets are incomplete, marginal rates of substitution cannot be equal, among agents, except
perhaps on a set of endowment distributions with measure zero. The best outcome in this case,
is a set of equilibria called constrained Pareto optima, i.e. constrained by ... the states of nature.
As it turns out, there might not even exist constrained Pareto optima in multiperiod economies
with incomplete markets–except perhaps those arising on a set of endowments distributions
with zero measure.
When markets are incomplete, the state price vector φ is not unique. That is, suppose that
$\phi^\top$ is an equilibrium state price. Then, all the elements of the set Φ in (2.20)
are also equilibrium state prices - there exists an infinity of equilibrium state prices that are
consistent with absence of arbitrage opportunities. In other words, there exists an infinity of
equilibrium state prices guaranteeing the same observable asset price vector S, for $\phi'^\top V = \phi^\top V = S$.
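A small numerical example makes the non-uniqueness concrete. The sketch below uses hypothetical payoffs with d = 3 states and m = 2 assets, and perturbs a state price vector along a direction that is orthogonal to the columns of V; both vectors are strictly positive and price the two assets identically.

```python
V = [[1.0, 2.0],             # V[s][i]: payoff of asset i in state s (d = 3, m = 2)
     [1.0, 1.0],
     [1.0, 0.2]]
phi = [0.30, 0.45, 0.20]     # a hypothetical state price vector
direction = [0.8, -1.8, 1.0] # chosen so that direction^T V = 0


def prices(state_prices):
    """Asset prices S = phi^T V implied by a state price vector."""
    return [sum(f * row[i] for f, row in zip(state_prices, V)) for i in range(2)]


phi_alt = [f + 0.1 * n for f, n in zip(phi, direction)]   # another admissible vector
S = prices(phi)
S_alt = prices(phi_alt)
```

Any sufficiently small multiple of the direction would do, so the admissible state prices form a continuum here.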
How do we proceed in this case? Introduce the following budget constraint:
$$
C = \left\{ c \in \mathbb{R}^d_{++} : 0 = c_0 - w_0 + \phi^\top \left( c^1 - w^1 \right), \ c^1 - w^1 \in \langle V \rangle, \ \forall \phi \in \mathbb{R}^d_{++} : S = \phi^\top V \right\}. \tag{2.21}
$$
This budget constraint, and the previous reasoning about the set Φ in (2.20), show that in the
context of incomplete markets, there exist many constraints to take care of, and the previous
“martingale methods” do not apply.
Yet let Val$(P_I)$ be the value of the following program in the incomplete markets at hand:
$$
\max_{c \in C} \left[ u_j(c_{0j}) + \beta_j E(\nu_j(c^1_j)) \right], \tag*{[$P_I$]}
$$
and let Val$(P_\phi)$ be the value of the program in some abstract complete markets case:
$$
\max_{c \in C_\phi} \left[ u_j(c_{0j}) + \beta_j E(\nu_j(c^1_j)) \right]. \tag*{[$P_\phi$]}
$$
Clearly, we have, Val (PI ) ≤ Val (Pφ ) for all φ, for the constraint in the incomplete markets
case, C, is more stringent than that in any complete market setting, Cφ : the solution to the
program in the incomplete markets case [PI ], must satisfy the budget constraints in C, formed
using all of the possible Arrow-Debreu state prices (including the Arrow-Debreu state price φ
given in Cφ ), as the constraint of Eq. (2.21) shows. Moreover, (c1 − w1 ) ∈ hV i. These remarks
suggest defining the following “min-max” Arrow-Debreu state price:
This is indeed the case, given some regularity conditions. For the characterization of φ∗ , suppose
there exists $\hat\phi$ : Val$(P_I)$ = Val$(P_{\hat\phi})$. Then, $\hat\phi = \phi^*$. Indeed, suppose the contrary, i.e. there exists
$\phi'$ : Val$(P_{\phi'})$ < Val$(P_{\hat\phi})$. Then, we would have,
a contradiction. Note, again, this is a characterization result about φ∗ , not an existence proof.
But as mentioned earlier, Eq. (2.22) holds true, as shown in a dynamic setting by He and
Pearson (1991). Chapter 4 provides general guidance about an even more general approach to
solving problems of this kind, arising in a broader context of market imperfections, including
incomplete markets as a special case.
We see that $\lim_{x \to 0} z(x) = \infty$, $\lim_{x \to \infty} z(x) = 0$ and $z'(x) < 0$. Therefore, there exists a unique
solution for $\lambda_j$:
$$
\lambda_j \equiv \Lambda_j \left[ w_{0j} + E(m \cdot w^1_j) \right],
$$
where $\Lambda_j(\cdot)$ denotes the inverse function of z. By replacing back into Eqs. (2.23), we obtain:
$$
\hat c_{0j} = I_j \left( \Lambda_j \left( w_{0j} + E(m \cdot w^1_j) \right) \right), \qquad \hat c_{sj} = H_j \left( \beta_j^{-1} m_s \Lambda_j \left( w_{0j} + E(m \cdot w^1_j) \right) \right).
$$
It remains to compute the general equilibrium. The kernel m must be determined. This means
that we have d unknowns ($m_s$, s = 1, · · ·, d). We have d + 1 equilibrium conditions (holding in
the d + 1 markets). By Walras’ law, only d of these are independent. Consider the equilibrium
conditions in the d markets at the second period:
$$
g_s \left( m_s; (m_{s'})_{s' \ne s} \right) \equiv \sum_{j=1}^n H_j \left( \beta_j^{-1} m_s \Lambda_j \left( w_{0j} + E(m \cdot w^1_j) \right) \right) = \sum_{j=1}^n w_{sj} \equiv w_s, \quad s = 1, \cdots, d.
$$
2.7 Consumption-CAPM
Consider the pricing equation (2.19). It states that for every asset with gross return $\tilde R \equiv S^{-1} \cdot \text{payoff}$,
$$
1 = E(m \cdot \tilde R), \tag{2.24}
$$
where m is some pricing kernel.
In the previous section, we learnt that in a complete markets economy, equilibrium leads to
the following identification of the pricing kernel,
$$
m_s = \beta \frac{\nu'(w_s)}{u'(w_0)}.
$$
For a riskless asset, $1 = E(m \cdot R)$. Combining this equality with Eq. (2.24) leaves $E[m \cdot (\tilde R - R)] = 0$. By rearranging terms,
$$
E(\tilde R) = R - \frac{\mathrm{cov}(\nu'(w^+), \tilde R)}{E[\nu'(w^+)]}. \tag{2.25}
$$
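The rearrangement leading to Eq. (2.25) uses only the definition of the covariance. A sketch of the steps, writing $m = \beta \nu'(w^+) / u'(w_0)$ for the pricing kernel identified above:

```latex
\begin{aligned}
0 &= E\left[ m \cdot (\tilde R - R) \right]
   = E(m) \left[ E(\tilde R) - R \right] + \mathrm{cov}(m, \tilde R)
  && \text{(since $R$ is not random)} \\
\Longrightarrow \quad
E(\tilde R) &= R - \frac{\mathrm{cov}(m, \tilde R)}{E(m)}
   = R - \frac{\mathrm{cov}(\nu'(w^+), \tilde R)}{E[\nu'(w^+)]},
\end{aligned}
```

where the last equality holds because the constant $\beta / u'(w_0)$ appears in both the numerator and the denominator and cancels out. An asset whose return is low when marginal utility is high (negative covariance) therefore commands an expected return above R.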
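If $\tilde R^p$ is perfectly correlated with m, i.e. if there exists $\gamma : \tilde R^p = -\gamma m$, then
$$
\beta_{\tilde R, m} = -\gamma \frac{\mathrm{cov}(\tilde R^p, \tilde R)}{\mathrm{var}(\tilde R^p)} \quad \text{and} \quad \beta_{\tilde R^p, m} = -\gamma
$$
and then
$$
E(\tilde R) - R = \beta_{\tilde R, \tilde R^p} \left[ E(\tilde R^p) - R \right] \tag*{[CAPM]}
$$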
This is not the only way the CAPM obtains. As we shall explain in Chapter 6, the CAPM also
obtains through the so-called “maximum correlation portfolio,” which is the portfolio that is
the most highly correlated with the pricing kernel m.
2.8. Infinite horizon c
°by A. Mele
The previous relation holds in a two-period economy. In a multiperiod economy, in the second
period (as in the following periods) agents save indefinitely for the future. In the appendix,
we show that,
$$
0 = E\left[ \sum_{t=0}^{\infty} m_{0,t} \cdot p_t \left( c_t - w_t \right) \right], \tag{2.28}
$$
where $m_{0,t}$ are the state prices. From the perspective of time 0, at time t there exist $d^t$ states
of nature and, thus, $d^t$ possible prices.
$$
0 = \sum_{j=1}^n e_{0j}(\hat p, \hat S), \qquad 0 = \sum_{j=1}^n e_{1j}(\hat p, \hat S), \qquad 0 = \sum_{j=1}^n \theta_j(\hat p, \hat S),
$$
where the previous functions are the results of optimal plans of the agents. This system has
m · (d + 1) + a equations and m · (d + 1) + a unknowns, where a ≤ d. Let us aggregate the
constraints of the agents,
$$
p_0 \sum_{j=1}^n e_{0j} = -S \sum_{j=1}^n \theta_j, \qquad p_1 \,\Box \sum_{j=1}^n e_{1j} = B \sum_{j=1}^n \theta_j.
$$
Suppose the financial markets clearing condition is satisfied, i.e. $\sum_{j=1}^n \theta_j = 0$. Then,
$$
\begin{cases}
0 = p_0 \sum_{j=1}^n e_{0j} \equiv p_0 e_0 = \sum_{\ell=1}^m p_0^{(\ell)} e_0^{(\ell)} \\[1ex]
0_d = p_1 \,\Box \sum_{j=1}^n e_{1j} \equiv p_1 \,\Box\, e_1 = \left[ \sum_{\ell=1}^m p_1^{(\ell)}(\omega_1)\, e_1^{(\ell)}(\omega_1), \cdots, \sum_{\ell=1}^m p_1^{(\ell)}(\omega_d)\, e_1^{(\ell)}(\omega_d) \right]^\top
\end{cases}
$$
Therefore, there is one redundant equation for each state of nature, or d + 1 redundant
equations, in total. As a result, the equilibrium has fewer independent equations (m · (d + 1) − 1)
than unknowns (m·(d+1)+d), i.e., an indeterminacy degree equal to d+1. This result does not
rely on whether markets are complete or not. In a sense, it is not even an indeterminacy result
when markets are complete, as we may always assume agents would organize the exchanges
at the beginning. In this case, only the suitably normalized Arrow-Debreu state prices would
matter for agents.
The previous indeterminacy can be reduced to d−1, as we may use two additional homogeneity
relations. To pin down these relations, let us consider the budget constraint of each agent
j,
$$
p_0 e_{0j} = -S\theta_j, \qquad p_1 \,\Box\, e_{1j} = B\theta_j.
$$
The first-period constraint is still the same if we multiply the spot price vector $p_0$ and the
financial price vector S by a positive constant, λ (say). In other words, if $(\hat p_0, \hat p_1, \hat S)$ is an equilibrium,
then, $(\lambda \hat p_0, \hat p_1, \lambda \hat S)$ is also an equilibrium, which delivers a first homogeneity relation.
To derive the second homogeneity relation, we multiply the spot prices of the second period by
a positive constant, λ, and increase at the same time the first period agents’ purchasing power,
by dividing each asset price by the same constant, as follows:
$$
p_0 e_{0j} = -\frac{S}{\lambda}\, \lambda\theta_j, \qquad \lambda p_1 \,\Box\, e_{1j} = B \lambda\theta_j.
$$
Therefore, if $(\hat p_0, \hat p_1, \hat S)$ is an equilibrium, then, $(\hat p_0, \lambda \hat p_1, \hat S / \lambda)$ is also an equilibrium.
where $A_s = [A^1_s, \cdots, A^a_s]$ is the m × a matrix of the real payoffs. The previous constraint
now reveals how to “recover” d + 1 homogeneity relations. For each strictly positive vector
$\lambda = [\lambda_0, \lambda_1, \cdots, \lambda_d]$, we have that if $[\hat p_0, S, p_1(\omega_1), \cdots, p_1(\omega_s), \cdots, p_1(\omega_d)]$ is an equilibrium, then,
$[\lambda_0 \hat p_0, \lambda_0 S, p_1(\omega_1), \cdots, p_1(\omega_s), \cdots, p_1(\omega_d)]$ is also an equilibrium, and so is
$[\hat p_0, S, p_1(\omega_1), \cdots, \lambda_s p_1(\omega_s), \cdots, p_1(\omega_d)]$, for $\lambda_s$, s = 1, · · ·, d.
As is clear, the distinction between nominal and real assets has a precise meaning, when
one considers a multi-commodity economy. Even in this case, however, such a distinction is
not very interesting without a suitable introduction of a unité de compte. These considerations
led Magill and Quinzii (1992) to solve the indeterminacy while still remaining in a framework
with nominal assets. They simply propose to introduce money as a means of exchange. The
indeterminacy can then be resolved by “fixing” the prices via the d + 1 equations defining the
money market equilibrium in all states of nature:
$$
M_s = p_s \cdot \sum_{j=1}^n w_{sj}, \quad s = 0, 1, \cdots, d.
$$
Magill and Quinzii showed that the monetary policy $(M_s)_{s=0}^d$ is generically nonneutral.
2.10 Appendix 1
In this appendix we prove that the program [P1] has a unique maximum. Indeed, suppose on the
contrary that we have two maxima:
$$\bar c = (\bar c_{1j}, \cdots, \bar c_{mj}) \quad\text{and}\quad c = (c_{1j}, \cdots, c_{mj}).$$
These two maxima would satisfy $u_j(\bar c) = u_j(c)$, with $\sum_{i=1}^m p_i \bar c_{ij} = \sum_{i=1}^m p_i c_{ij} = R_j$. To check that this
claim is correct, suppose on the contrary that $\sum_{i=1}^m p_i c_{ij} < R_j$. Then, the consumption bundle,
$$\tilde c = (c_{1j} + \varepsilon, \cdots, c_{mj}), \qquad \varepsilon > 0,$$
would be preferred to $c$, by Assumption 2.1, and, at the same time, it would hold that, for sufficiently
small $\varepsilon$,
$$\sum_{i=1}^m p_i \tilde c_{ij} = \varepsilon p_1 + \sum_{i=1}^m p_i c_{ij} < R_j.$$
[Indeed, we have, $A \equiv \sum_{i=1}^m p_i c_{ij}$. $A < R_j \Rightarrow \exists \varepsilon > 0 : A + \varepsilon p_1 < R_j$. E.g., $\varepsilon p_1 = R_j - A - \eta$, $\eta > 0$. The
condition is then: $\exists \eta > 0 : R_j - A > \eta$.] Hence, $\tilde c$ would be feasible in [P1] and strictly preferred to $c$, thereby contradicting
the optimality of $c$. Therefore, the existence of two optima would imply a full use of resources. Next,
consider a point $y$ lying between $\bar c$ and $c$, viz $y = \alpha\bar c + (1 - \alpha)c$, $\alpha \in (0, 1)$. By Assumption 2.1,
$$u_j(y) = u_j\big(\alpha\bar c + (1 - \alpha)c\big) > u_j(\bar c) = u_j(c).$$
Moreover,
$$\sum_{i=1}^m p_i y_i = \sum_{i=1}^m p_i\big(\alpha\bar c_{ij} + (1 - \alpha)c_{ij}\big) = \alpha\sum_{i=1}^m p_i \bar c_{ij} + \sum_{i=1}^m p_i c_{ij} - \alpha\sum_{i=1}^m p_i c_{ij} = \alpha R_j + R_j - \alpha R_j = R_j.$$
Hence, y ∈ Bj (p) and is also strictly preferred to c̄ and c, which means that c̄ and c are not optima,
as initially conjectured. This establishes uniqueness of the solution to [P1].
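The contradiction at the heart of this proof can be illustrated with a small numerical case. The sketch assumes a specific strictly concave utility, $u(c) = \ln c_1 + \ln c_2$, with prices $(1, 2)$ and income $4$; none of these choices comes from the text.

```python
import math

# Illustrative assumption: u(c) = ln(c1) + ln(c2), prices p = (1, 2), income R = 4.
def u(c):
    return math.log(c[0]) + math.log(c[1])

def cost(p, c):
    return sum(pi * ci for pi, ci in zip(p, c))

p, R = (1.0, 2.0), 4.0
c_bar = (1.0, 1.5)   # exhausts the budget: 1*1 + 2*1.5 = 4
c = (3.0, 0.5)       # exhausts the budget: 1*3 + 2*0.5 = 4, same utility as c_bar
alpha = 0.5
y = tuple(alpha * a + (1 - alpha) * b for a, b in zip(c_bar, c))

print(u(c_bar), u(c), u(y), cost(p, y))
```

The convex combination `y` costs exactly `R` and yields strictly higher utility than either candidate, so two distinct bundles with the same utility on the budget line cannot both be optimal.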
2.11 Appendix 2: Proofs of selected results
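$$\sum_{j=1}^{n} p^\top \bar c^{\,j} < \sum_{j=1}^{n} p^\top c^{j}. \tag{2A.1}$$
Next we show that $p > 0$. Let $\bar c_i = \sum_{j=1}^n \bar c_{ij}$, $i = 1, \cdots, m$, and partition $\bar c = (\bar c_1, \cdots, \bar c_m)$. Let us apply
the inequality in (2A.1) to $\bar c \in A$ and, for $\mu > 0$, to $c = (\bar c_1 + \mu, \cdots, \bar c_m) \in B$. We have $p_1 \mu > 0$, or
$p_1 > 0$. By reiterating the argument, $p_i > 0$ for all $i$. Finally, we choose $c^j = \bar c^{\,j} + \mathbf{1}_m \frac{\epsilon}{n}$, $j = 2, \cdots, n$,
$\epsilon > 0$ in (2A.1): $p^\top \bar c^{\,1} < p^\top c^{1} + p^\top \mathbf{1}_m \epsilon$,
for $\epsilon$ sufficiently small. This means that $u_1(c^1) > u_1(\bar c^{\,1}) \Rightarrow p^\top c^1 > p^\top \bar c^{\,1}$. Hence, $\bar c^{\,1} =
\arg\max_{c^1} u_1(c^1)$ s.t. $p^\top c^1 = p^\top \bar c^{\,1}$. By symmetry, $\bar c^{\,j} = \arg\max_{c^j} u_j(c^j)$ s.t. $p^\top c^j = p^\top \bar c^{\,j}$ for all $j$. ‖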
Proof of Theorem 2.10. The condition in (2.12) holds for any compact subset of $\mathbb{R}^{d+1}_+$, and
therefore it holds when it is restricted to the unit simplex $S^d$ in $\mathbb{R}^{d+1}_+$,
$$\langle W \rangle \cap S^d = \{0\}.$$
By Minkowski's separation theorem, $\exists \tilde\phi \in \mathbb{R}^{d+1} : w^\top \tilde\phi \le d_1 < d_2 \le \sigma^\top \tilde\phi$, $w \in \langle W \rangle$, $\sigma \in S^d$.
By walking along the simplex boundaries, one finds that $d_1 < \tilde\phi_s$, $s = 1, \cdots, d$. On the other hand,
Proof of Eq. (2.28). Let $S^{(2)(\ell)}_{s',s}$ be the price at $t = 2$ in state $s'$ if the state in $t = 1$ was $s$, for the
Arrow security promising 1 unit of numéraire in state $\ell$ at $t = 3$. Let $S^{(2)}_{s',s} = [S^{(2)(1)}_{s',s}, \cdots, S^{(2)(m)}_{s',s}]$. Let
$\theta^{(1)(s)}_i$ be the quantity purchased at $t = 1$ in state $i$ of Arrow securities promising 1 unit of numéraire
if $s$ at $t = 2$. Let $p^2_{s,i}$ be the price of the good at $t = 2$ in state $s$ if the previous state at $t = 1$ was $i$.
Let $S^{(0)(i)}$ and $S^{(1)(i)}_s$ correspond to $S^{(2)(\ell)}_{s',s}$; $S^{(0)}$ and $S^{(1)}_s$ correspond to $S^{(2)}_{s',s}$.
The budget constraint is
$$\begin{cases}
p_0 (c_0 - w_0) = -S^{(0)} \theta^{(0)} = -\displaystyle\sum_{i=1}^{m} S^{(0)(i)} \theta^{(0)(i)} \\[2ex]
p^1_s \left(c^1_s - w^1_s\right) = \theta^{(0)(s)} - S^{(1)}_s \theta^{(1)}_s = \theta^{(0)(s)} - \displaystyle\sum_{i=1}^{m} S^{(1)(i)}_s \theta^{(1)(i)}_s, \quad s = 1, \cdots, d,
\end{cases}$$
where $S^{(1)(i)}_s$ is the price to be paid at time 1 and in state $s$, for an Arrow security giving 1 unit of
numéraire if the state at time 2 is $i$.
By replacing the second equation of the budget constraint in the first one:
$$p_0 (c_0 - w_0) = -\sum_{i=1}^{m} S^{(0)(i)} \left[ p^1_i \left(c^1_i - w^1_i\right) + S^{(1)}_i \theta^{(1)}_i \right]$$
$\Longleftrightarrow$
$$\begin{aligned}
0 &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i \left(c^1_i - w^1_i\right) + \sum_{i=1}^{m} S^{(0)(i)} S^{(1)}_i \theta^{(1)}_i \\
&= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i \left(c^1_i - w^1_i\right) + \sum_{i=1}^{m} S^{(0)(i)} \sum_{j=1}^{m} S^{(1)(j)}_i \theta^{(1)(j)}_i \\
&= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i \left(c^1_i - w^1_i\right) + \sum_{i=1}^{m} \sum_{j=1}^{m} S^{(0)(i)} S^{(1)(j)}_i \theta^{(1)(j)}_i .
\end{aligned}$$
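At time 2,
$$p^2_{s,i} \left(c^2_{s,i} - w^2_{s,i}\right) = \theta^{(1)(s)}_i - S^{(2)}_{s,i} \theta^{(2)}_{s,i} = \theta^{(1)(s)}_i - \sum_{\ell=1}^{m} S^{(2)(\ell)}_{s,i} \theta^{(2)(\ell)}_{s,i}, \quad s = 1, \cdots, d.$$
Here $S^{(2)}_{s,i}$ is the price vector, to be paid at time 2 in state $s$ if the previous state was $i$, for the Arrow
securities expiring at time 3. The other symbols have a similar interpretation.
By plugging the time-2 constraint into the last expression obtained for the time-0 constraint,
$$\begin{aligned}
0 &= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i \left(c^1_i - w^1_i\right) + \sum_{i=1}^{m} \sum_{j=1}^{m} S^{(0)(i)} S^{(1)(j)}_i \left[ p^2_{j,i} \left(c^2_{j,i} - w^2_{j,i}\right) + S^{(2)}_{j,i} \theta^{(2)}_{j,i} \right] \\
&= p_0 (c_0 - w_0) + \sum_{i=1}^{m} S^{(0)(i)} p^1_i \left(c^1_i - w^1_i\right) + \sum_{i=1}^{m} \sum_{j=1}^{m} S^{(0)(i)} S^{(1)(j)}_i p^2_{j,i} \left(c^2_{j,i} - w^2_{j,i}\right) \\
&\quad + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{\ell=1}^{m} S^{(0)(i)} S^{(1)(j)}_i S^{(2)(\ell)}_{j,i} \theta^{(2)(\ell)}_{j,i} .
\end{aligned}$$
In the absence of arbitrage opportunities, $\exists \phi_{t+1,s'} \in \mathbb{R}^d_{++}$, the state prices vector for $t + 1$ if the
state in $t$ is $s'$, such that:
$$S^{(t)(\ell)}_{s',s} = \phi'_{t+1,s'} \cdot e_\ell, \quad \ell = 1, \cdots, m,$$
where $e_\ell \in \mathbb{R}^d_+$ and has all zeros except in the $\ell$-th component which is 1. Next, we restate the
previous relation in terms of the kernel $m_{t+1,s'} = (m^{(\ell)}_{t+1,s'})_{\ell=1}^{d}$ and the probability distribution $P_{t+1,s'} =
(P^{(\ell)}_{t+1,s'})_{\ell=1}^{d}$ of the events in $t + 1$ when the state in $t$ is $s'$:
$$S^{(t)(\ell)}_{s',s} = m^{(\ell)}_{t+1,s'} \cdot P^{(\ell)}_{t+1,s'}, \quad \ell = 1, \cdots, m.$$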
2.12 Appendix 3: The multicommodity case
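where
$$\underset{d \times d \cdot m_2}{E_1} = \begin{bmatrix}
e_1(\omega_1) & 0 & \cdots & 0 \\
0 & e_1(\omega_2) & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & e_1(\omega_d)
\end{bmatrix}, \qquad \text{each block of dimension } 1 \times m_2,$$
is the matrix of excess demands, $p_1 = (p_1(\omega_1), \cdots, p_1(\omega_d))$, with each $p_1(\omega_s)$ of dimension $m_2 \times 1$, is the matrix of spot prices, and
$$\underset{d \times a}{B} = \begin{bmatrix}
v_1(\omega_1) & \cdots & v_a(\omega_1) \\
\vdots & \ddots & \vdots \\
v_1(\omega_d) & \cdots & v_a(\omega_d)
\end{bmatrix}$$
is the payoffs matrix. We can rewrite the second period constraint as $p_1 \,\square\, e_{1j} = B \cdot \theta_j$, where $e_{1j}$ is
defined similarly as $e_{0j}$, and $p_1 \,\square\, e_{1j} \equiv (p_1(\omega_1) e_{1j}(\omega_1), \cdots, p_1(\omega_d) e_{1j}(\omega_d))'$. The budget constraints are
then,
$$p_0 e_{0j} = -S\theta_j, \qquad p_1 \,\square\, e_{1j} = B\theta_j.$$
Now suppose that markets are complete, i.e., $a = d$ and $B$ can be inverted. The second constraint
is then: $\theta_j = B^{-1} p_1 \,\square\, e_{1j}$. Consider without loss of generality Arrow securities, or $B = I$. We have
$\theta_j = p_1 \,\square\, e_{1j}$, and by replacing into the first constraint,
$$\begin{aligned}
0 &= p_0 e_{0j} + S\theta_j \\
&= p_0 e_{0j} + S\, p_1 \,\square\, e_{1j} \\
&= p_0 e_{0j} + S \cdot (p_1(\omega_1) e_{1j}(\omega_1), \cdots, p_1(\omega_d) e_{1j}(\omega_d))' \\
&= p_0 e_{0j} + \sum_{i=1}^{d} S_i \cdot p_1(\omega_i) e_{1j}(\omega_i) \\
&= \sum_{h=1}^{m_1} p_0^{(h)} e_{0j}^{(h)} + \sum_{i=1}^{d} \sum_{\ell=1}^{m_2} S_i \cdot p_1^{(\ell)}(\omega_i) e_{1j}^{(\ell)}(\omega_i) \\
&= \sum_{h=1}^{m_1} p_0^{(h)} e_{0j}^{(h)} + \sum_{i=1}^{d} \sum_{\ell=1}^{m_2} \tilde p_1^{(\ell)}(\omega_i) e_{1j}^{(\ell)}(\omega_i),
\end{aligned}$$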
where $\tilde p_1^{(\ell)}(\omega_i) \equiv S_i \cdot p_1^{(\ell)}(\omega_i)$. The price to be paid today to obtain good $\ell$ in state $i$ is equal
to the price of an Arrow asset written for state $i$ multiplied by the spot price $p_1^{(\ell)}(\omega_i)$ of this good in this
state; here the Arrow-Debreu state price is $\tilde p_1^{(\ell)}(\omega_i)$. The general equilibrium can be analyzed by making
reference to such state prices. From now on, we simplify and set $m_1 = m_2 \equiv m$. Then we are left with
determining $m(d + 1)$ equilibrium prices, i.e. $p_0 = (p_0^{(1)}, \cdots, p_0^{(m)})$, $\tilde p_1(\omega_1) = (\tilde p_1^{(1)}(\omega_1), \cdots, \tilde p_1^{(m)}(\omega_1))$,
$\cdots$, $\tilde p_1(\omega_d) = (\tilde p_1^{(1)}(\omega_d), \cdots, \tilde p_1^{(m)}(\omega_d))$. By exactly the same arguments of the previous chapter, there
exists one degree of indeterminacy. Therefore, there are only $m(d + 1) - 1$ relations that can determine
the $m(d + 1)$ prices. (Price normalization can be done by letting one of the first period commodities
be the numéraire.) On the other hand, in the initial economy we have to determine $m(d + 1) + d$ prices
$(\hat p, \hat S) \in \mathbb{R}^{m \cdot (d+1)}_{++} \times \mathbb{R}^{d}_{++}$ which are the solution to the system:
$$\sum_{j=1}^{n} e_{0j}(\hat p, \hat S) = 0, \qquad \sum_{j=1}^{n} e_{1j}(\hat p, \hat S) = 0, \qquad \sum_{j=1}^{n} \theta_j(\hat p, \hat S) = 0,$$
where the previous functions are obtained as solutions to the agents’ programs. When we solve for
Arrow-Debreu prices, in a second step we have to determine m(d + 1) + d prices starting from the
knowledge of m(d + 1) − 1 relations defining the Arrow-Debreu prices, which implies a price inde-
terminacy of the initial economy equal to d + 1. In fact, it is possible to show that the degree of
indeterminacy is only d − 1.
References
Arrow, K. J. (1953): “Le rôle des valeurs boursières pour la répartition la meilleure des
risques.” Econométrie 41-48. CNRS, Paris. Translated and reprinted in 1964: “The Role
of Securities in the Optimal Allocation of Risk-Bearing.” Review of Economic Studies 31,
91-96.
Debreu, G. (1954): “Valuation Equilibrium and Pareto Optimum.” Proceedings of the National
Academy of Sciences 40, 588-592.
Duffie, D. (2001): Dynamic Asset Pricing Theory. Princeton: Princeton University Press.
Hart, O. (1974): “On the Existence of Equilibrium in a Securities Model.” Journal of Economic
Theory 9, 293-311.
He, H. and N. Pearson (1991): “Consumption and Portfolio Policies with Incomplete Markets
and Short-Sales Constraints: The Infinite Dimensional Case.” Journal of Economic Theory
54, 259-304.
3
Infinite horizon economies
3.1 Introduction
We study asset prices in multiperiod economies, where agents either live forever, and have access
to a set of complete markets, or belong to overlapping generations. We consider models without
and with production, without and with money, and develop the fundamental tools we need in
subsequent chapters, to analyze financial frictions, bubbles and sunspots in capital markets.
By replacing the wealth constraint into the maximand, it is easily checked that the first-order
condition for $c$ leads to $u'(c_t) = \beta V'(w_{t+1}) R_{t+1}$. Therefore, the consumption policy is a function
of both wealth and the interest rate, which for sake of simplicity we denote as $c(w_t)$. The value
function and the first-order condition, then, can be written as:
$$V(w_t) = u(c(w_t)) + \beta V\big((w_t - c(w_t)) R_{t+1}\big), \qquad u'(c(w_t)) = \beta V'\big((w_t - c(w_t)) R_{t+1}\big) R_{t+1}.$$
Differentiating the value function with respect to $w_t$, and using the first-order condition, gives
$V'(w_t) = u'(c(w_t))$. Therefore, $V'(w_{t+1}) = u'(c(w_{t+1}))$ too, and by substituting back into the first-order condition,
$$\beta \frac{u'(c(w_{t+1}))}{u'(c(w_t))} = \frac{1}{R_{t+1}}. \tag{3.2}$$
The economic intuition underlying Eq. (3.2) is the same as that we saw in the two-period
economy analyzed in Chapter 2. Eq. (3.2) says that the present consumption I give up, at $t$,
to obtain additional consumption at $t + 1$, has to equal the price of a pure discount bond issued at $t$ and
expiring the next period, along an optimal consumption path.
We can arrive at the very same conclusions, following an alternative approach, based on
Lagrange multipliers. This approach is useful when dealing with more intricate issues relating
to production economies or economies with financial frictions, as we shall see in this and further
chapters. So consider the constraint in program [3.P1]. Savings at time t are savt ≡ wt − ct .
Using this definition, the constraint in [3.P1] is: ct+1 + savt+1 = Rt+1 savt , with sav−1 = w0 ,
given. Let $\lambda_t$ be a sequence of Lagrange multipliers associated to these constraints. Consider
the program,
$$L(sav_{-1}) \equiv \max_{(c_t, sav_t)_{t=0}^{\infty}} \sum_{t=0}^{\infty} \left[ \beta^t u(c_t) - \lambda_t (c_t + sav_t - R_t\, sav_{t-1}) \right].$$
The first-order condition for consumption $c_t$ is
$\beta^t u'(c_t) = \lambda_t$, and the first-order condition for savings $sav_t$ leads to $\lambda_t = \lambda_{t+1} R_{t+1}$. Putting all
together yields precisely Eq. (3.2). Note that the same program can be cast, and solved, in a
recursive format.
The first-order conditions for consumption and savings are then $u'(c_t) = \lambda_t$ and $\lambda_t = \beta L'(sav_t)$,
respectively. Replacing the first-order condition for $\lambda_t$, i.e. the budget constraint, and
differentiating $L(sav_{t-1})$, leaves $L'(sav_{t-1}) = \beta L'(sav_t) R_t$. These conditions lead to Eq. (3.2).
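For reference, one recursive program that is consistent with the first-order conditions just stated can be sketched as follows. The functional form is an assumption reconstructed from those conditions, not a quotation; the last line shows how the three conditions combine into the Euler equation.

```latex
L(sav_{t-1}) = \max_{c_t,\, sav_t}
  \big[\, u(c_t) + \beta L(sav_t)
        - \lambda_t \,( c_t + sav_t - R_t\, sav_{t-1} ) \big],
\qquad
L'(sav_{t-1}) = \lambda_t R_t = \beta L'(sav_t)\, R_t ,
\\[1ex]
u'(c_t) = \lambda_t = \beta L'(sav_t) = \beta \lambda_{t+1} R_{t+1}
        = \beta\, u'(c_{t+1})\, R_{t+1}
\;\Longleftrightarrow\;
\beta \frac{u'(c_{t+1})}{u'(c_t)} = \frac{1}{R_{t+1}} .
```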
As a simple example, consider the case of a logarithmic utility function, $u(c) = \ln c$. Let us
guess that the value function is $V(w_t) \equiv V(w_t; R_t) = a_t + b \ln w_t$. The first-order condition
then yields $c(w) = b^{-1} w$. By Eq. (3.2), then, $w_{t+1} = \beta w_t R_{t+1}$. Comparing the right hand
side of this equation with the right hand side of the constraint in the program [3.P1], leaves
$c(w_t) = (1 - \beta) w_t$; in other terms, $b = (1 - \beta)^{-1}$.¹
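The guess-and-verify argument can be checked numerically. The sketch below assumes a constant gross rate `R` and illustrative values for `beta` and wealth `w`; it uses the constant $a$ that solves $a = \ln(1-\beta) + \beta a + \frac{\beta}{1-\beta}\ln(\beta R)$.

```python
import math

beta, R, w = 0.95, 1.04, 10.0   # illustrative assumptions
b = 1.0 / (1.0 - beta)
a = (math.log(1 - beta) + beta / (1 - beta) * math.log(beta * R)) / (1 - beta)

def V(wealth):
    """Guessed value function a + b*ln(w)."""
    return a + b * math.log(wealth)

def bellman_rhs(share):
    """u(c) + beta*V(w') when a fraction `share` of wealth is consumed."""
    c = share * w
    return math.log(c) + beta * V((w - c) * R)

grid = [i / 1000 for i in range(1, 1000)]
best_share = max(grid, key=bellman_rhs)          # grid search over consumption shares

c_t = (1 - beta) * w
c_next = (1 - beta) * (w - c_t) * R
euler_gap = beta * c_t / c_next - 1 / R          # beta*u'(c_{t+1})/u'(c_t) - 1/R
bellman_gap = bellman_rhs(1 - beta) - V(w)       # zero if the guess is a fixed point
print(best_share, bellman_gap, euler_gap)
```

The grid search is only a crude check that the share $1 - \beta$ maximizes the right hand side of the Bellman equation; the two gaps confirm the fixed point and the Euler equation up to rounding.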
Next, we introduce uncertainty.
¹ To pin down the coefficient series $a_t$, use the definition of the value function, $V(w_t; R_t) \equiv u(c(w_t)) + \beta V(w_{t+1}; R_{t+1})$.
Plugging $V(w, R_t) = a_t + b \ln w$ and $c(w) = (1 - \beta) w$ into this definition leaves $a_t = \ln(1 - \beta) + \beta a_{t+1} + \frac{\beta}{1-\beta} \ln(\beta R_{t+1})$. If
$R$ is constant, $a_t$ is also constant, and equal to $\big(\ln(1 - \beta) + \frac{\beta}{1-\beta} \ln(\beta R)\big)/(1 - \beta)$.
3.2 Consumption-based asset evaluation
We consider markets for $m$ “trees,” and assume that the only source of risk stems from the
dividends related to these trees: $D = (D_1, \cdots, D_m)$. We assume $D$ is a Markov process and
denote its conditional distribution function with $P(D_{t+1} \mid D_t)$. A representative agent solves
the following program:
$$V(\theta_t) = \max_{(c_{t+i}, \theta_{t+i})_{i=0}^{\infty}} E\left[ \sum_{i=0}^{\infty} \beta^i u(c_{t+i}) \,\Big|\, \mathcal{F}_t \right] \quad \text{s.t. } c_t + S_t \theta_{t+1} = (S_t + D_t) \theta_t \tag*{[3.P2]}$$
where $\theta_{t+1} \in \mathbb{R}^m$ is $\mathcal{F}_t$-measurable, that is, $\theta_{t+1}$ needs to be chosen at time $t$. We can solve the
program [3.P2], using the same recursive approach in Section 3.2.1, once due account is made
of uncertainty. The Bellman equation is:
$$V(\theta_t, D_t) = \max_{c_t, \theta_{t+1}} E\left[ u(c_t) + \beta V(\theta_{t+1}, D_{t+1}) \mid \mathcal{F}_t \right] \quad \text{s.t. } c_t + S_t \theta_{t+1} = (S_t + D_t) \theta_t .$$
Similarly as we did for Eq. (3.1), let us replace the budget constraint into the maximand. The
following first-order condition holds for $\theta_i$:
$$0 = E\left[ -u'\big((S_t + D_t) \theta_t - S_t \theta_{t+1}\big) S_{i,t} + \beta V_{1i}(\theta_{t+1}, D_{t+1}) \right], \tag{3.4}$$
where the subscript in the value function on the right hand side denotes a partial derivative:
$V_{1i}(\theta, D) = \partial V(\theta, D)/\partial \theta_i$. The optimal policy, $\theta_{t+1}$, is a function of the current state, $(\theta_t, D_t)$,
say $\theta_{t+1} = T(\theta_t, D_t)$. Differentiating the value function with respect to $\theta_i$, and using the
previous first-order condition, leaves:
$$\begin{aligned}
V_{1i}(\theta_t, D_t) &= E\left[ u'(c_t) \left( S_{i,t} + D_{i,t} - \sum_{j=1}^{m} S_{j,t} T_{1ji}(\theta_t, D_t) \right) + \beta \sum_{j=1}^{m} V_{1j}(\theta_{t+1}, D_{t+1}) T_{1ji}(\theta_t, D_t) \right] \\
&= u'(c_t) (S_{i,t} + D_{i,t}),
\end{aligned}$$
where we have defined $T_{1ji}(\theta_t, D_t) = \partial T_j(\theta, D)/\partial \theta_i$ and $T_j$ is the $j$-th component of the vector
$T$. Substituting this result into Eq. (3.4) yields precisely the Lucas equation (3.3), holding for
each asset $i$:
$$u'(c_t) = \beta E\left[ u'(c_{t+1}) \frac{S_{i,t+1} + D_{i,t+1}}{S_{i,t}} \right]. \tag{3.5}$$
3.2.3.2 Rational expectations equilibrium
The asset market clears when for each $t$, $\theta_t = \mathbf{1}_m$ and $\theta^{(0)}_t = 0$, where $\theta^{(0)}$ denotes the amount
of the riskless asset. By the budget constraint, then, the market for goods also clears, $c_t =
\sum_{i=1}^{m} D_{it} \equiv \bar D_t$. A rational expectation equilibrium is a sequence of asset prices $(S_t)_{t=0}^{\infty}$ such that
the optimality condition in Eq. (3.5) holds, the markets clear, $c_t = \bar D_t$, and each asset price is
a function of the state, $S_{i,t} = S_i(D_t)$ say. All in all,
$$u'(\bar D_t) S_i(D_t) = \beta \int u'(\bar D_{t+1}) \big( S_i(D_{t+1}) + D_{i,t+1} \big)\, dP(D_{t+1} \mid D_t). \tag{3.6}$$
This is a functional equation in $S_i(\cdot)$. Let us focus, first, on the IID case: $P(D_{t+1} \mid D_t) =
P(D_{t+1})$.
IID shocks
Note that the right hand side of this equation is independent of $D$. Therefore, $u'(\bar D_t) S_i(D_t)$
equals some constant $\kappa_i$ (say), which we can easily find by substituting it back into the previous
equation, leaving:
$$\kappa_i = \frac{\beta}{1 - \beta} \int u'(\bar D_{t+1}) D_{i,t+1}\, dP(D_{t+1}).$$
The solution for $S_i(D)$ is then:
$$S_i(D_t) = \frac{\kappa_i}{u'(\bar D_t)}.$$
Note, the elasticity of the price to dividend equals $-\frac{u''(\bar D)}{u'(\bar D)} D_i$, which collapses to relative risk-aversion,
once we assume only one tree exists, as it is customary. For example, if relative
risk-aversion is constant and equal to $\eta$,
$$S(D_t) = \kappa \cdot D_t^{\eta}, \qquad \kappa \equiv \frac{\beta}{1 - \beta} \int D^{1-\eta}\, dP(D).$$
Figure 3.1 depicts the behavior of the asset price function $S(D)$, under the assumption that
$\kappa$ is not increasing in $\eta$.
Only when the representative agent is risk-neutral, $\eta = 0$, does the asset price collapse to
the constant $\beta (1 - \beta)^{-1} E(D)$.
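The closed-form price in the IID case can be verified against Eq. (3.6) on a discrete example. The sketch assumes one tree, CRRA utility with $u'(D) = D^{-\eta}$, and a three-point dividend distribution; the parameter values are illustrative only.

```python
beta, eta = 0.95, 2.0                       # illustrative assumptions
dividends = [0.8, 1.0, 1.2]                 # support of the IID dividend
probs = [1 / 3, 1 / 3, 1 / 3]

def marginal_utility(d):
    return d ** (-eta)

def expectation(f):
    return sum(p * f(d) for p, d in zip(probs, dividends))

kappa = beta / (1 - beta) * expectation(lambda d: d ** (1 - eta))

def price(d):
    return kappa * d ** eta

# Residual of Eq. (3.6) in each current state:
# u'(D) S(D) - beta * E[u'(D') (S(D') + D')].
residuals = [
    marginal_utility(d) * price(d)
    - beta * expectation(lambda x: marginal_utility(x) * (price(x) + x))
    for d in dividends
]
print(kappa, [price(d) for d in dividends], residuals)
```

Setting `eta = 0.0` in the same code gives a price that does not depend on the current dividend, in line with the risk-neutral case discussed above.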
Dependent shocks
Define $g_i(D) \equiv u'(\bar D) S_i(D)$ and $h_i(D) \equiv \beta \int u'(\bar D_{t+1}) D_{i,t+1}\, dP(D_{t+1} \mid D)$. In terms of these new
functions, Eq. (3.6) is:
$$g_i(D) = h_i(D) + \beta \int g_i(D_{t+1})\, dP(D_{t+1} \mid D).$$
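FIGURE 3.1. The asset pricing function $S(D_t)$ in the IID case and constant relative risk-aversion,
equal to $\eta$. [Plot of $S(D_t)$ against $D_t$, with one curve for each of the cases $0 < \eta < 1$, $\eta = 1$ and $\eta > 1$; the level $\beta(1-\beta)^{-1}$ is marked on the vertical axis and $D_t = 1$ on the horizontal axis.]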
It is a functional equation in $g_i$, which can be shown to admit a unique solution, under the
conditions contained in the celebrated Blackwell's theorem below:
Theorem 3.1. Let $B(X)$ be the Banach space of continuous bounded real functions on $X \subseteq \mathbb{R}^n$
endowed with the norm $\|f\| = \sup_X |f|$, $f \in B(X)$. Introduce an operator $T : B(X) \mapsto B(X)$
with the following properties:
(i) $T$ is monotone: $\forall x \in X$ and $f_1, f_2 \in B(X)$, $f_1(x) \le f_2(x) \Rightarrow T[f_1](x) \le T[f_2](x)$;
(ii) $\forall x \in X$ and $c \ge 0$, $\exists \beta \in (0, 1) : T[f + c](x) \le T[f](x) + \beta c$.
Then, $T$ is a $\beta$-contraction and, $\forall f_0 \in B(X)$, it has a unique fixed point $\lim_{\tau \to \infty} T^{\tau}[f_0] = f =
T[f]$.
The existence of $g_i$ and, hence, $S_i$, relies on the existence of a fixed point of $T$: $g_i = T[g_i]$.
It is easily checked that conditions (i) and (ii) in Theorem 3.1 hold here. To establish that
$T : B(D) \mapsto B(D)$ as well, it is sufficient to show that $h_i \in B(D)$. A sufficient condition given
by Lucas (1978) is that $u$ is bounded, with upper bound $\bar u$.²
² In this case, concavity of $u$ implies that for each $D$, $0 = u(0) \le u(D) + u'(D)(-D) \le \bar u - D u'(D)$, which implies that
for each $D$, $D u'(D) \le \bar u$ and, hence, $h_i(D) \le \beta \bar u$. Then, it is possible to show that the solution is in $B(D)$, which implies that
$T : B(D) \mapsto B(D)$.
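The contraction argument can be illustrated by iterating the operator $T[g](D) = h(D) + \beta E[g(D') \mid D]$ on a finite state space. The sketch assumes one tree, CRRA utility, and a hypothetical two-state Markov chain for dividends; the transition matrix and parameters are illustrative.

```python
beta, eta = 0.95, 2.0                      # illustrative assumptions
D = [0.9, 1.1]                             # two dividend states, one tree
P = [[0.8, 0.2], [0.3, 0.7]]               # P[s][n] = Prob(D' = D[n] | D = D[s])

def mu(d):
    """Marginal utility u'(d) under CRRA."""
    return d ** (-eta)

h = [beta * sum(P[s][n] * mu(D[n]) * D[n] for n in range(2)) for s in range(2)]

def T(g):
    """Operator T[g](D) = h(D) + beta * E[g(D') | D]."""
    return [h[s] + beta * sum(P[s][n] * g[n] for n in range(2)) for s in range(2)]

def sup_dist(f1, f2):
    return max(abs(x - y) for x, y in zip(f1, f2))

g, ratios = [0.0, 0.0], []
for _ in range(1000):
    g_next = T(g)
    gap_now, gap_next = sup_dist(g_next, g), sup_dist(T(g_next), g_next)
    if gap_now > 1e-6:
        ratios.append(gap_next / gap_now)  # contraction: never above beta
    g = g_next

# Direct solution of (I - beta*P) g = h by Cramer's rule, for comparison.
m00, m01 = 1 - beta * P[0][0], -beta * P[0][1]
m10, m11 = -beta * P[1][0], 1 - beta * P[1][1]
det = m00 * m11 - m01 * m10
g_exact = [(h[0] * m11 - m01 * h[1]) / det, (m00 * h[1] - m10 * h[0]) / det]
prices = [g[s] / mu(D[s]) for s in range(2)]
print(g, g_exact, prices, max(ratios))
```

Successive iterates shrink in sup norm by a factor of at most `beta`, and the limit coincides with the direct solution of the linear system, from which prices follow as $S(D) = g(D)/u'(D)$.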
3.3 Production: foundational issues
By using the same arguments as those in Section 2.6 of the previous chapter, we can show that
the Radon-Nikodym derivative of the risk-neutral probability, $P^*$, with respect to $P$, is:
$$\frac{dP^*}{dP}(D_{t+1} \mid D_t) = \frac{u'(D_{t+1})}{E\left[ u'(D_{t+1}) \mid D_t \right]}.$$
It is the price to pay, in state $D_t$, to obtain one unit of the good the next period in state $D_{t+1}$.
Finally, define the gross return $\tilde R$ as $\tilde R_{t+1} \equiv \frac{S_{t+1} + D_{t+1}}{S_t}$. Then, all the considerations made in
Section 2.7 of the previous chapter are also valid here.
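The change of measure above is easy to compute on a finite state space. The sketch assumes CRRA marginal utility and a hypothetical three-point conditional distribution for next period's dividend.

```python
eta = 2.0                                   # illustrative assumption: CRRA utility
D_next = [0.9, 1.0, 1.1]                    # possible dividends next period
p = [0.25, 0.5, 0.25]                       # P(D_{t+1} | D_t) for the current state

mu = [d ** (-eta) for d in D_next]          # marginal utilities u'(D_{t+1})
expected_mu = sum(pi * m for pi, m in zip(p, mu))
density = [m / expected_mu for m in mu]     # dP*/dP, state by state
p_star = [pi * z for pi, z in zip(p, density)]
print(density, p_star, sum(p_star))
```

The risk-neutral probabilities still sum to one, and they put more weight than $P$ on the low-dividend state, where marginal utility is highest.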
The $N_t$ consumers live forever. We assume each consumer offers inelastically one unit of labor,
and, for now, that $N_0 = 1$ and $n = 0$. The resource constraint for the consumer is:
$$c_t + s_t = R_t s_{t-1} + w_t N_t, \qquad N_t \equiv 1, \quad t = 1, 2, \cdots. \tag{3.7}$$
At each time $t - 1$, the consumer saves $s_{t-1}$ units of capital, which he lends to the firm. At time
$t$, the consumer receives the gross return on savings from the firm, $R_t s_{t-1}$, where $R_t = y'(k_t)$,
plus the wage receipts $w_t N_t$. Then, he uses these resources to consume $c_t$ and lend $s_t$ to the
firm. At time zero,
$$c_0 + s_0 = V_0 \equiv Y_1(K_0, N_0) K_0 + w_0 N_0, \qquad N_0 \equiv 1.$$
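Following the approach developed in Chapter 2, we can write down a single budget constraint,
obtained iterating Eq. (3.7):
$$0 = c_0 + \sum_{t=1}^{T} \frac{c_t - w_t N_t}{\prod_{i=1}^{t} R_i} + \frac{s_T}{\prod_{i=1}^{T} R_i} - V_0,$$
together with the transversality condition,
$$\lim_{T \to \infty} s_T \prod_{i=1}^{T} R_i^{-1} = 0, \tag{3.8}$$
so as to have:
$$\max_{(c_t)_{t=0}^{\infty}} \sum_{t=1}^{\infty} \beta^t u(c_t), \quad \text{s.t. } V_0 = c_0 + \sum_{t=1}^{\infty} \frac{c_t - w_t N_t}{\prod_{i=1}^{t} R_i}. \tag*{[3.P3]}$$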
The economic interpretation of the transversality condition (3.8) is the following. The first-
order conditions of the program [3.P3] are:
$$\beta^t u'(c_t) = l \, \frac{1}{\prod_{i=1}^{t} R_i}, \tag{3.9}$$
where $l$ is a Lagrange multiplier. In equilibrium, current savings equal next period capital, or
$k_{t+1} = s_t$. Therefore, by Eq. (3.9), the transversality condition (3.8) says that
the economic value of capital, that is, capital weighted by discounted marginal utility,
needs to be zero, eventually.
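The step behind this interpretation can be written out explicitly. The following is a sketch obtained by substituting Eq. (3.9) and $k_{T+1} = s_T$ into Eq. (3.8); it is presumably the form taken by the transversality condition referred to below as Eq. (3.10), although that label is an assumption here.

```latex
\lim_{T\to\infty} s_T \prod_{i=1}^{T} R_i^{-1}
  = \lim_{T\to\infty} k_{T+1}\,\frac{\beta^{T} u'(c_T)}{l} = 0
\quad\Longleftrightarrow\quad
\lim_{T\to\infty} \beta^{T} u'(c_T)\, k_{T+1} = 0 .
```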
The first-order condition (3.9) leads to the usual optimality condition in Eq. (3.2), where this
time, $R_{t+1} = y'(k_{t+1})$. In this economy, an equilibrium is a sequence $((\hat c, \hat k)_t)_{t=0}^{\infty}$ satisfying
$$\begin{cases}
k_{t+1} = y(k_t) - c_t \\[1ex]
\beta \dfrac{u'(c_{t+1})}{u'(c_t)} = \dfrac{1}{y'(k_{t+1})}
\end{cases} \tag{3.11}$$
and the transversality condition in Eq. (3.10). The first equation in this system is simply this:
capital available for producing the next period, $k_{t+1}$, is equal to savings, $s_t \equiv y(k_t) - c_t$.
The program in [3.P4] is easily solved. Replacing the constraint into the utility function,
and taking derivatives with respect to $k_t$, leads directly to the second equation in (3.11).
Alternatively, let us introduce the Lagrangian,
$$L(k_0) = \max_{(c_t, k_{t+1})_{t=0}^{\infty}} \sum_{t=0}^{\infty} \left[ \beta^t u(c_t) - \lambda_t (k_{t+1} - y(k_t) + c_t) \right].$$
The first-order condition with respect to consumption is $\lambda_t = \beta^t u'(c_t)$, and the condition for
capital is $\lambda_{t-1} = \lambda_t y'(k_t)$. Putting these conditions together leads to the second equation in
(3.11). The same argument can be made following a recursive approach.
The first-order condition for consumption is then $\lambda_t = u'(c_t)$, and that for capital is $\lambda_t = \beta L'(k_{t+1})$.
Replacing the first-order condition for $\lambda_t$ (i.e., the constraint in program [3.P4]), and
differentiating with respect to $k_t$, yields $L'(k_t) = \beta L'(k_{t+1}) y'(k_t)$. These three conditions lead,
again, to the second equation in (3.11).
Finally, consider the Bellman equation.
The first-order condition leads to $u'(c_t) = \beta V'(y(k_t) - c_t)$. Let us denote the policy with
$c_t = c(k_t)$. In terms of the policy function $c$, the value function and the first-order conditions
are:
$$V(k_t) = u(c(k_t)) + \beta V\big(y(k_t) - c(k_t)\big), \qquad u'(c(k_t)) = \beta V'\big(y(k_t) - c(k_t)\big).$$
By differentiating the value function, and
replacing back into the first-order condition, we obtain the second equation in (3.11).
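The recursive and dynamic programming steps described above can be sketched as follows. The recursive Lagrangian and the Bellman equation are written in an assumed form, reconstructed from the first-order conditions stated in the text; the last line is the envelope condition and its use in the first-order condition.

```latex
L(k_t) = \max_{c_t,\, k_{t+1}}
  \big[\, u(c_t) + \beta L(k_{t+1}) - \lambda_t\,( k_{t+1} - y(k_t) + c_t ) \big],
\qquad
L'(k_t) = \lambda_t\, y'(k_t) = \beta L'(k_{t+1})\, y'(k_t),
\\[1ex]
V(k_t) = \max_{c_t} \big[\, u(c_t) + \beta V\big( y(k_t) - c_t \big) \big],
\\[1ex]
V'(k_t) = u'(c(k_t))\, y'(k_t)
\;\Longrightarrow\;
u'(c_t) = \beta V'(k_{t+1}) = \beta\, u'(c_{t+1})\, y'(k_{t+1}) .
```

The envelope condition uses the first-order condition $u'(c(k_t)) = \beta V'(y(k_t) - c(k_t))$ to cancel the terms in $c'(k_t)$.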
3.3.3 Dynamics
We study the dynamics of the system in (3.11) in a small neighborhood of the stationary state,
defined as the pair $(c, k)$, solution to:
$$c = y(k) - k, \qquad \beta = \frac{1}{y'(k)}.$$
A first-order expansion of each equation in (3.11) around its stationary state, yields the
following linear system:
$$\begin{pmatrix} \hat k_{t+1} \\ \hat c_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat k_t \\ \hat c_t \end{pmatrix}, \qquad
A \equiv \begin{pmatrix} y'(k) & -1 \\[1ex] -\dfrac{u'(c)}{u''(c)} y''(k) & 1 + \beta \dfrac{u'(c)}{u''(c)} y''(k) \end{pmatrix}. \tag{3.12}$$
The solution to this system is obtained with the tools reviewed in Appendix 1 of this chapter.
It is:
$$\hat k_t = v_{11} \kappa_1 \lambda_1^t + v_{12} \kappa_2 \lambda_2^t, \qquad \hat c_t = v_{21} \kappa_1 \lambda_1^t + v_{22} \kappa_2 \lambda_2^t, \tag{3.13}$$
where: $\kappa_i$ are constants that depend on the initial state, $\lambda_i$ are the eigenvalues of $A$, and $\binom{v_{11}}{v_{21}}$,
$\binom{v_{12}}{v_{22}}$ are the eigenvectors associated with $\lambda_i$. In Appendix 1, we show that $\lambda_1 \in (0, 1)$ and
$\lambda_2 > 1$. The proof we provide in the appendix is important, as it illustrates precisely how the
neoclassical model reviewed in this section, needs to be modified to induce indeterminacy in
the dynamics of capital and consumption. A critical step in that proof relies on the assumption
of diminishing returns, i.e. $y''(k) < 0$.
Let us return to the equations in (3.13). First, we need to rule out an explosive behavior
of $\hat k_t$ and $\hat c_t$, for otherwise we would contradict (i) that $(c, k)$ is a stationary point, and (ii)
the optimality of the trajectories. Since $\lambda_2 > 1$, the only possibility is to “lock” the initial
state $(\hat k_0, \hat c_0)$ in such a way that $\kappa_2 = 0$, which yields the following set of initial conditions:
$\hat k_0 = v_{11} \kappa_1$ and $\hat c_0 = v_{21} \kappa_1$, or $\frac{\hat c_0}{\hat k_0} = \frac{v_{21}}{v_{11}}$.³ Therefore, the set of initial points that ensure a
non-explosive path must lie on the line $c_0 = c + \frac{v_{21}}{v_{11}} (k_0 - k)$. Since $k$ is a predetermined variable,
there exists one, and only one, value of $c_0$, which ensures a non-explosive path of the system
around its steady state, as Figure 3.2 illustrates. In this figure, $k^*$ is defined as the solution of
$1 = y'(k^*) \Leftrightarrow k^* = (y')^{-1}[1]$, and $k = (y')^{-1}[\beta^{-1}]$.
FIGURE 3.2. [Phase diagram in the $(k_t, c_t)$ plane, showing the locus $c = y(k) - k$, the line $c_0 = c + (v_{21}/v_{11})(k_0 - k)$, and the points $k_0$, $k$ and $k^*$ on the horizontal axis.]
The usual word of caution is in order. A linear approximation might turn out to be misleading.
We develop one example where the dynamics of the system could be quite different from those
analyzed here, when we start away from the stationary state. Let $y(k) = k^{\gamma}$, $u(c) = \ln c$. It is
easy to show that the exact solution is:
Figure 3.3 depicts the nonlinear manifold associated with this system, and its linear
approximation. For example, let $\beta = 0.99$ and $\gamma = 0.3$. Then, the (linear) saddlepath is, approximately,
³ In fact, Appendix 1 shows that the converse is also true, i.e. $\frac{\hat c_0}{\hat k_0} = \frac{v_{21}}{v_{11}} \Rightarrow \kappa_2 = 0$.
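The saddlepath in this example can be computed directly from the matrix $A$ in (3.12). The sketch below uses $\beta = 0.99$ and $\gamma = 0.3$ from the text, with hats read as deviations in levels from the steady state. For comparison it relies on the standard closed form for this special case (log utility, Cobb-Douglas output, no undepreciated capital), namely $c_t = (1 - \beta\gamma) k_t^{\gamma}$ and $k_{t+1} = \beta\gamma k_t^{\gamma}$; this closed form is supplied here as a known result and is not quoted from the text.

```python
import math

beta, gamma = 0.99, 0.3                      # parameter values used in the text
k = (gamma * beta) ** (1 / (1 - gamma))      # steady state: beta * y'(k) = 1
c = k ** gamma - k                           # steady state: c = y(k) - k

y1 = gamma * k ** (gamma - 1)                # y'(k), equal to 1/beta
y2 = gamma * (gamma - 1) * k ** (gamma - 2)  # y''(k) < 0
ratio = -c                                   # u'(c)/u''(c) for u = ln c

A = [[y1, -1.0],
     [-ratio * y2, 1.0 + beta * ratio * y2]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(trace ** 2 - 4 * det)
lam1, lam2 = (trace - root) / 2, (trace + root) / 2

slope = A[0][0] - lam1                       # v21/v11, from the first row of (A - lam1*I) v = 0

def c_linear(k0):
    """Linear saddlepath c0 = c + (v21/v11)(k0 - k)."""
    return c + slope * (k0 - k)

def c_exact(k0):
    """Closed-form policy for this special case (assumed known result)."""
    return (1 - beta * gamma) * k0 ** gamma

print(lam1, lam2, k, c, slope)
print(c_linear(0.5 * k), c_exact(0.5 * k))
```

In this case the stable root equals $\gamma$ and the unstable root equals $1/(\beta\gamma)$, so the linear saddlepath is tangent to the exact policy at the steady state and lies above it elsewhere, since the exact policy is concave in $k$.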
FIGURE 3.3. [Plot in the $(k_t, c_t)$ plane of the nonlinear manifold and its linear approximation, which meet at the steady state.]
In its simplest version, real business cycle theory is an extension of the neoclassical model
of Section 3.3.3, in which random productivity shocks are added. The engine of fluctuations,
then, comes from the real sphere of the economy. This approach is in contrast with the Lucas
approach of the 1970s, based on information and money, where fluctuations arise due to infor-
mation delays with which agents discover the nature of a shock (real or monetary). As further
reviewed in Chapter 9, the Lucas information-theoretic approach has been, instead, more suc-
cessful in inspiring work on the formation of asset prices, leading to the development of market
microstructure theory and, more generally, to information driven explanations of asset prices.
Despite the remarkable switch in the economic motivation, the paradigm underlying real
business cycle theory is the same as the information-based approach of Lucas, as it relies on
rational expectations: macroeconomic fluctuations and, then, as we shall explain, asset price
fluctuations, stem from the optimal response of the agents vis-à-vis exogenous shocks: agents
implement action plans that are state-contingent, i.e. they decide to consume, to work and to
invest according to the history of shocks as well as the present shocks they observe.
3.3.4.1 Basic model
We consider an economy with complete markets and no frictions, such that its equilibrium
allocations are Pareto-optimal. To characterize these allocations, we implement them through
the following program of a social planner:
$$V(k_0, s_0) = \max_{(c_t)_{t=0}^{\infty}} E\left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right], \tag{3.14}$$
subject to a capital accumulation constraint, with capital depreciation. Let It denote new
investment. It is:
It = Kt+1 − (1 − δ) Kt . (3.15)
At time t − 1, the available productive capital is Kt . At time t, a portion δKt of this capital is
lost, due to depreciation. Therefore, at time t, the productive system is left with (1 − δ) Kt units
of capital. The capital available at time t, Kt+1 , equals the capital already in place, (1 − δ) Kt ,
plus new investments, which is exactly Eq. (3.15).
Next, normalize population to one, such that $K_t = k_t$. The goods market clearing
condition is:
$$\tilde y(k_t, s_t) = c_t + I_t,$$
where $\tilde y(k_t, s_t)$ is the production function, which is $\mathcal{F}_t$-measurable, and $s$ is the source of
randomness, the engine for random fluctuations of the endogenous variables. By replacing
Eq. (3.15) into the equilibrium condition,
$$k_{t+1} = \tilde y(k_t, s_t) - c_t + (1 - \delta) k_t. \tag{3.16}$$
So the planner maximizes the utility in Eq. (3.14), under the capital accumulation constraint
in Eq. (3.16).
We assume that $\tilde y(k_t, s_t) \equiv s_t y(k_t)$, where $y$ is as in Section 3.2, and $(s_t)_{t=0}^{\infty}$ is solution to:
⁴ A stochastic equilibrium is the situation where there is a stationary measure (definition: $p(+) = \int \pi(+/-)\, dp(-)$, where $\pi$ is
the transition measure) generating $(c_t, k_t)_{t=1}^{\infty}$.
A solution is $\lambda_1 = \rho$. By the same arguments produced for the deterministic case of Section
3.3.3 (see Appendix 1), one finds that $\lambda_2 \in (0, 1)$ and $\lambda_3 > 1$.⁵ As for the deterministic case in
Section 3.3.3, we can diagonalize the system by rewriting $\Phi = P \Lambda P^{-1}$, where $\Lambda$ is a diagonal
matrix that has the eigenvalues of $\Phi$ on the diagonal, and $P$ is a matrix of the eigenvectors
associated to the roots of $\Phi$. The system in (3.19) is, then:
where $\hat y_t \equiv P^{-1} \hat z_t$ and $w_t \equiv P^{-1} R u_t$. The third equation of this system is:
and $\hat y_3$ explodes unless $\hat y_{3t} = 0$ for all $t$, which is only possible when $w_{3t} = 0$ for all $t$.⁶
The condition that ŷ3t ≡ 0 carries an interesting economic interpretation: it tells us that the
only sources of uncertainty in this system can stem from shocks to the fundamentals, or that
there can not be extraneous sources of noise, or “sunspots.” The reasons for this are easy to
explain. Let ŷt = P −1 ẑt ≡ Πẑt . We have:
Eq. (3.22) shows that the three state variables, $\hat k_t$, $\hat c_t$ and $\hat s_t$, are mutually linked through a
two-dimensional plane. This plane is the saddlepoint of the economy, where the state variables
do exhibit a stable behavior, and is formally defined as:
$$\mathcal{S} = \left\{ x \in \mathbb{R}^3 \,\middle|\, \pi_{3.}\, x = 0 \right\}, \qquad \pi_{3.} = (\pi_{31}, \pi_{32}, \pi_{33}).$$
Furthermore, Eq. (3.22) implies that a linear relation exists between the two expectational
errors:
$$\text{For all } t, \quad u_{ct} = -\frac{\pi_{33}}{\pi_{32}} u_{st} \quad \text{(“no-sunspots”)}. \tag{3.23}$$
Eq. (3.23) is a “no-sunspots” condition, as it says that the expectational error to consumption
can not be independent of the expectational shock on the fundamentals of the economy, which in
this simple economy relates to technological shock. In other words, the source of uncertainty we
have assumed in this economy, relates to the technological shock. The remaining expectational
errors can only be perfectly correlated to the expectational shock in technology or, there are
no sunspots.
The manifold S brings, mathematically, the same meaning as the stable relation depicted in
Figure 3.2, for the deterministic case. In this section, $\mathcal{S}$ is the convergent subspace, with $\dim(\mathcal{S}) = 2$,
which is the number of roots with modulus less than one. In other words, in this economy with
two predetermined variables, $\hat k_0$ and $\hat s_0$, there exists one, and only one, value of $\hat c_0$ in $\mathcal{S}$, which
ensures stability, and is given by $\hat c_0 = -\frac{\pi_{31} \hat k_0 + \pi_{33} \hat s_0}{\pi_{32}}$. This reasoning generalizes that we made
for the deterministic case in Section 3.3.3, and is generalized further in Appendix 1.
⁵ The linearized model in this section has state variables expressed in growth rates. However, we can always reformulate this
model in terms of first differences, by pre- and post-multiplying $\Phi$ by appropriate normalizing matrices. As an example, if $G$ is the
$3 \times 3$ matrix that has $\frac{1}{k}$, $\frac{1}{c}$ and $\frac{1}{s}$ on its diagonal, (3.19) can be written as: $E(z_{t+1} - z) = G^{-1} \Phi G \cdot (z_t - z)$, where $z_t = (k_t, c_t, s_t)$,
and we would arrive at the same conclusions. It is tedious but easy to check that the model in this section collapses to that in
Section 3.3.3, once we set $\epsilon_t = 1$, for each $t$, and $s_0 = 1$.
⁶ In other words, Eq. (3.21) implies that $\hat y_{3t} = \lambda_3^{-T} E_t(\hat y_{3,t+T})$, for all $T$. Because $\lambda_3 > 1$, this relation holds only when
$\hat y_{3t} = 0$ for all $t$.
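The formula for $\hat c_0$ can be illustrated on a hypothetical linear system. The sketch below builds a matrix $\Phi = P \Lambda P^{-1}$ from assumed eigenvalues and eigenvectors, with the state ordered as $(\hat k, \hat c, \hat s)$ and shocks switched off; none of these numbers comes from the model in the text.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

# Hypothetical eigen-structure; state ordering is z = (k, c, s).
lams = [0.9, 0.5, 1.5]                    # lam1 = rho, lam2 in (0, 1), lam3 > 1
P = [[1.0, 1.0, 1.0],
     [0.5, 1.0, 2.0],
     [1.0, 0.0, 0.0]]                     # columns are the eigenvectors v1, v2, v3
Pi = inv3(P)
Lam = [[lams[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
Phi = matmul(matmul(P, Lam), Pi)

def c0_on_saddle(k0, s0):
    """Initial consumption that sets the unstable component of y0 to zero."""
    pi31, pi32, pi33 = Pi[2]
    return -(pi31 * k0 + pi33 * s0) / pi32

def iterate(z, steps):
    for _ in range(steps):
        z = [sum(Phi[i][j] * z[j] for j in range(3)) for i in range(3)]
    return z

k0, s0 = 0.10, 0.05
c0 = c0_on_saddle(k0, s0)
on_path = iterate([k0, c0, s0], 40)
off_path = iterate([k0, c0 + 0.01, s0], 40)
print(c0, on_path, off_path)
```

Starting on the plane, the state decays toward zero; a small deviation of initial consumption from `c0` loads on the unstable root and grows without bound, which is why only one value of $\hat c_0$ is admissible given the two predetermined variables.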
The solution to the linearized model can be computed by generalizing the reasoning for the
deterministic case. First, by Eq. (3.20) $\hat y$ is:
$$\hat y_{it} = \lambda_i^t \hat y_{i0} + \zeta_{it}, \qquad \zeta_{it} \equiv \sum_{j=0}^{t-1} \lambda_i^j w_{i,t-j},$$
$$\hat z_t = P \hat y_t = (v_1\ v_2\ v_3)\, \hat y_t = \sum_{i=1}^{3} v_i \hat y_{it} = \sum_{i=1}^{3} v_i \hat y_{i0} \lambda_i^t + \sum_{i=1}^{3} v_i \zeta_{it}.$$
To pin down the components of $\hat y_0$, note that $\hat z_0 = P \hat y_0 \Rightarrow \hat y_0 = P^{-1} \hat z_0 \equiv \Pi \hat z_0$. The stability
condition then requires that the state variables be in $\mathcal{S}$, or $\hat y_0^{(3)} = 0$, which we now use to
implement the solution. We have:
In the neoclassical model that we are analyzing, the equilibrium is determinate. As explained,
this property arises because the number of predetermined variables equals the dimension of the
convergent subspace of the economy. If we managed to increase the dimension of the converging
subspace, the equilibrium would be indeterminate, as further formalized in Appendix 1. As it
turns out, indeterminacy goes hand in hand with sunspots, the expectational shocks extraneous
to those in the economic fundamentals, as we discussed earlier, just after Eq. (3.23).
Introducing sunspots in macroeconomics has been an approach pursued in detail by Farmer
in a series of articles (see Farmer, 1998, for an introductory account of this approach). The
idea is quite interesting, as we know that the basic real business cycle model of this section
needs many extensions in order not to be rejected, empirically, as originally shown by Watson
(1993). In other words, the basic model in this section offers little room for a rich propagation
mechanism, as it entirely relies on impulses, the productivity shocks, which “we hardly read
about in the Wall Street Journal,” as provocatively put by King and Rebelo (1999). Sunspots
offer an interesting route to enrich the propagation mechanism, although their asset pricing
implications in terms of the model analyzed in this section, have not been explored yet.
In a series of articles, David Cass showed that a Pareto-optimal economy cannot harbour
sunspot equilibria. On the other hand, any market imperfection has the potential to be a
source of sunspots. The typical example is the presence of incomplete markets. The neoclassical
model analyzed in this section cannot generate sunspots, as it relies on a system of perfectly
competitive markets and the absence of any sort of friction. To introduce sunspots in the economy
of this section, we need to think about some deviation from optimality. Two possibilities ana-
lyzed in the literature are the presence of imperfect competition and/or externality effects. We
provide an example of these effects, by working out the deterministic economy in Section 3.3.3.
(Generalizations to the stochastic economy in this section are easy, although more cumbersome.)
How is it that a deterministic economy might generate “stochastic outcomes,” that is, out-
comes driven by shocks entirely unrelated to the fundamentals of the economy? Let us imagine
this can be possible. Then, both optimal consumption and capital accumulation in Section
3.3.3 are necessarily random processes. The system in (3.12), then, must be rewritten in an
expectation format,
\[ E_t \begin{pmatrix} \hat k_{t+1} \\ \hat c_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat k_t \\ \hat c_t \end{pmatrix}. \]
Next, let us introduce the expectational error process uc,t ≡ ĉt − Et−1 (ĉt ), which we plug back
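into the previous system, to obtain:
\[ \begin{pmatrix} \hat k_{t+1} \\ \hat c_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat k_t \\ \hat c_t \end{pmatrix} + \begin{pmatrix} 0 \\ u_{c,t+1} \end{pmatrix}. \]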
Naturally, we still have λ1 ∈ (0, 1) and λ2 > 1, as in Section 3.3.3. Therefore, we decompose A
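as $P \Lambda P^{-1}$, and have:
\[ \hat y_{t+1} = \Lambda \hat y_t + P^{-1} (0 \ \ u_{c,t+1})^\top. \]
Moreover, for $\hat y_{2t} = \lambda_2^{-T} E_t(\hat y_{2,t+T})$ to hold for all T , we need to have ŷ2t = 0, for all t. Therefore,
the second element of the vector $P^{-1} (0 \ \ u_{c,t+1})^\top$ must be zero, or, for all t, $\pi_{22} u_{c,t+1} = 0$, where $\pi_{22}$
denotes the (2,2) element of $P^{-1}$, so that $u_{c,t+1} = 0$ whenever $\pi_{22} \neq 0$.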
There is no room for expectational errors and, hence, sunspots, in this model. The fact that
λ2 > 1 implies the dimension of the saddlepoint is less than the number of predetermined
variables. So a viable route to pursue here is to look for economies such that the saddlepoint
has a dimension larger than one, i.e. such that λ2 < 1. In these economies, indeterminacy
and sunspots will be two facets of the same coin. As shown in the appendix, the reasons for
which λ2 > 1 relate to the classical assumptions about the shape of the utility function u and
the production function y. We now modify the production function, to see the effect on the
eigenvalues of A.
[Economy with increasing returns]
[Asset pricing implications in further chapters]
where ỹ (Kt , Nt ) is the firm’s production at time t, obtained with capital Kt and labor Nt , and
subject to the same random productivity shocks as those in Section 3.3.4, wt is the real wage,
N (K) is the labor demand schedule, solution to the optimality condition, ỹN (Kt , N (Kt )) = wt
for all t, and pt is the real price of the investment goods, or uninstalled capital. Finally, the
adjustment-cost function satisfies φ ≥ 0, φ′ ≥ 0, φ″ ≥ 0. In words, capital adjustment is costly
when the adjustment is made quickly. Naturally, φ is zero in the absence of adjustment costs.
What is the value of the profit, from the perspective of time zero? This question can be
answered, by utilizing the Arrow-Debreu state prices introduced in Chapter 2. At time t, and
in state s, the profit Dt (s) (say) is worth,
φ0,t (s) D (Kt (s) , It (s)) = m0,t (s) Dt (Kt (s) , It (s)) P0,t (s) ,
We assume that in each period, the firm distributes all the profits it makes, and that for a given
capital K0 , it maximizes its cum-dividend value,
\[ V_c(K_0) = \max_{(K_t, I_{t-1})_{t=1}^{\infty}} \left[ D(K_0, I_0) + E\left( \sum_{t=1}^{\infty} m_{0,t} D(K_t, I_t) \right) \right], \]
where the expectation is taken with respect to the information set as of time t. The first-order
conditions for It lead to,
\[ -D_I(K_t, I_t) = E\left[ m_{t+1} V_c'(K_{t+1}) \right]. \tag{3.26} \]
That is, along the optimal capital accumulation path, the marginal cost of new installed capital
at time t, −DI , must equal the expected marginal return on the investment, i.e. the expected
value of the marginal contribution of capital to the value of the firm at time t + 1, $V_c'(K_{t+1})$.
By Eq. (3.26), optimal investment is a function I (Kt ), and the value of the firm satisfies,
Differentiating the value function in the previous equation, with respect to Kt , and using Eq.
(3.26), yields the following envelope condition:
\[ \begin{aligned} V_c'(K_t) &= D_K(K_t, I(K_t)) + D_I(K_t, I(K_t))\, I'(K_t) + E\left[ m_{t+1} V_c'(K_{t+1}) \left( (1-\delta) + I'(K_t) \right) \right] \\ &= D_K(K_t, I(K_t)) - (1-\delta)\, D_I(K_t, I(K_t)). \end{aligned} \]
Substituting this expression for the derivative of the value function back into Eq. (3.26) leaves:
−DI (Kt , I (Kt )) = E [mt+1 (DK (Kt+1 , I (Kt+1 )) − (1 − δ) DI (Kt+1 , I (Kt+1 )))] . (3.27)
Along the optimal capital accumulation path, the marginal cost of new installed capital
at time t, which by Eq. (3.26) is the expected marginal return on the investment, equals the
expected value of (i) the very same marginal cost at time t+1, corrected for capital depreciation,
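(1 − δ), and (ii) capital productivity, net of adjustment costs. Analytically,
\[ D_K(K_t, I(K_t)) \equiv \tilde y_K(K_t, N(K_t)) - w_t N'(K_t) - \frac{\partial}{\partial K_t}\left( \phi\left( \frac{I_t}{K_t} \right) K_t \right), \]
\[ -D_I(K_t, I(K_t)) \equiv p_t + \phi'\left( \frac{I_t}{K_t} \right). \]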
We now proceed to introduce a fundamental concept in investment theory.
3.4.1.2 q theory
Tobin’s marginal q is defined as the ratio of the expected marginal value of an additional
unit of capital over its replacement cost:
\[ TQ_t \equiv \text{Tobin's marginal } q \equiv \frac{E\left[ m_{t+1} V_c'(K_{t+1}) \right]}{p_t}. \]
It is easy to see that the numerator, $E[m_{t+1} V_c'(K_{t+1})]$, is simply the shadow price of installed
capital. Consider the Lagrangian at time t,
The first-order condition for investment, It , is $q_t = -D_I(K_t, I_t)$, and that for capital, Kt+1 ,
is $q_t = E(m_{t+1} L'(K_{t+1}))$. By Eq. (3.26), then, $L'(K_{t+1}) = V_c'(K_{t+1})$ and, therefore, qt is the
expected marginal return on the investment, that is, the shadow price of installed capital.
Therefore, Tobin’s marginal q is the ratio of the shadow price of installed capital to its
replacement cost:
\[ TQ_t = \frac{q_t}{p_t}. \]
Next, replace the first-order condition for qt , i.e. Eq. (3.24), into Eq. (3.28), differentiate L (Kt )
with respect to Kt , and use the first-order condition for Kt+1 , obtaining $L'(K_t) = D_K(K_t, I_t) + q_t (1 - \delta)$.
These conditions imply that qt satisfies the valuation equation (3.27):
The shadow price of installed capital, qt , has to equal the marginal cost of new installed capital,
and is larger than the price of uninstalled capital, pt . This is natural: installing new capital requires
some (marginal) adjustment costs, which add to the “raw” price of uninstalled capital, pt .
Therefore, in the presence of adjustment costs, Tobin’s marginal q is larger than one.
Eq. (3.29) can be solved forward, leaving:
\[ q_t = E\left[ \sum_{s=1}^{\infty} (1-\delta)^{s-1} m_{0,t+s} D_K(K_{t+s}, I_{t+s}) \right]. \]
The shadow price of installed capital is worth the sum of all its future marginal net productivity,
discounted at the depreciation rate. Moreover, Eq. (3.30) can be inverted for It /Kt , to deliver:
\[ \frac{I_t}{K_t} = \phi'^{-1}(q_t - p_t), \tag{3.31} \]
where $\phi'^{-1}$ denotes the inverse of φ′, and is increasing, since φ′ is increasing. Given Kt , and the
fact that Kt+1 is predetermined, the firm evaluates qt through Eq. (3.29), and then determines
the level of new investments through Eq. (3.31). These investments are increasing in the dif-
ference between the shadow price of installed capital, qt , and that of uninstalled capital, pt , as
originally assumed by Tobin (1969).
In the absence of adjustment costs, when qt = pt , Eq. (3.29) delivers the usual condition,
Empirically, however, the marginal productivity of capital, ỹK (Kt , N (Kt )), is not volatile
enough, to rationalize asset returns. Moreover, as we argue in a moment, Tobin’s marginal
q can be approximated by market-to-book ratios, which are typically time-varying. Therefore,
adjustment costs are important for asset pricing.
A difficulty with Tobin’s marginal q is that it is hard to estimate. Yet in the special
case we are analyzing in this section, where firms act competitively and have access to a
homogeneous production function and adjustment costs, Tobin’s marginal q can be proxied by
the market-to-book ratio of a given firm. Let V (Kt ) denote the ex-dividend value of the firm,
which is its stock market value, since it nets out the dividend it pays to its holder in the current
period. It is:
V (Kt ) ≡ Vc (Kt ) − D (Kt , I (Kt )) = E [mt+1 Vc (Kt+1 )] .
Tobin’s average q is defined as the ratio of the stock market value of the firm over the
replacement cost of the capital:
\[ \text{Tobin's average } q \equiv \frac{\text{Stock Mkt Value of the Firm}}{\text{Replacement Cost of Capital}} = \frac{V(K_t)}{p_t K_{t+1}}. \]
The next result was originally obtained by Hayashi (1982) in a continuous-time setting.
Theorem 3.2. Tobin’s marginal q and average q coincide. That is, we have,
V (Kt ) = qt Kt+1 .
Proof. By the homogeneity properties of the production function and the adjustment costs,
where the second line follows by Eq. (3.24). By Eq. (3.27), and the law of iterated expectations,
\[ E\left[ \sum_{t=1}^{\infty} m_{0,t} \left( D_K(K_t, I_t) - (1-\delta) D_I(K_t, I_t) \right) K_t \right] = -D_I(K_0, I_0) K_1 - E\left[ \sum_{t=1}^{\infty} m_{0,t} K_{t+1} D_I(K_t, I_t) \right]. \]
This result, in conjunction with that in Eq. (3.30), provides a simple rule of thumb for
investment decisions. Consider, for example, the case of quadratic adjustment costs, where
$\phi(x) = \frac{1}{2} \kappa^{-1} x^2$, for some κ > 0. Then, Eq. (3.31) is:
\[ I_t = \kappa (q_t - p_t) K_t = \kappa \left( \frac{\text{Stock Mkt Value of the Firm}}{\text{Replacement Cost of Capital}} - 1 \right) p_t K_t, \]
where the second equality follows by Theorem 3.2. Thus, according to q theory, we expect firms
with a market value larger than the cost of reproducing their capital to grow, and firms which
are not worth the cost of reproducing their capital to shrink. This basic observation provides
a first criterion for assessing a firm’s future development.
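The rule of thumb above can be illustrated numerically. The following Python sketch assumes the quadratic adjustment cost φ(x) = x²/(2κ) used in the text; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
# Sketch of the q-theory investment rule under quadratic adjustment costs.
# Assumption: phi(x) = x**2 / (2 * kappa); every number below is hypothetical.

def phi_prime(x, kappa):
    """Marginal adjustment cost, phi'(x) = x / kappa."""
    return x / kappa

def investment_from_q(q, p, K, kappa):
    """Eq. (3.31) with quadratic costs: I = kappa * (q - p) * K."""
    return kappa * (q - p) * K

def investment_from_market_value(V, p, K_next, K, kappa):
    """Same rule, with q replaced by V / K_next as in Theorem 3.2."""
    average_q = V / (p * K_next)  # market value over replacement cost
    return kappa * (average_q - 1.0) * p * K

kappa, p, K, K_next = 2.0, 1.0, 100.0, 105.0
V = 115.5                # hypothetical stock market value of the firm
q = V / K_next           # shadow price implied by V = q * K_next
I_q = investment_from_q(q, p, K, kappa)
I_v = investment_from_market_value(V, p, K_next, K, kappa)
print(I_q, I_v)
```

Both functions should return the same investment level, and the implied marginal adjustment cost φ′(I/K) should equal q − p, which is the first-order condition the rule is derived from.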
3.4.2 Consumers
We now generalize the budget constraint obtained in the program [3.P3], to the uncertainty
case. We claim that in this case, the relevant budget constraint is,
\[ V_0 = c_0 + E\left[ \sum_{t=1}^{\infty} m_{0,t} (c_t - w_t N_t) \right]. \tag{3.32} \]
We now have two optimality conditions, one intertemporal and another, intratemporal:
\[ m_{t+1} = \beta \frac{u_1(c_{t+1}, N_{t+1})}{u_1(c_t, N_t)} \ \text{(intertemporal)}; \qquad w_t = -\frac{u_2(c_t, N_t)}{u_1(c_t, N_t)} \ \text{(intratemporal)}. \]
3.4.3 Equilibrium
For all t,
\[ \tilde y(K_t, N_t) = c_t + p_t I_t + \phi\left( \frac{I_t}{K_t} \right) K_t. \tag{3.33} \]
It is easily seen that the condition θt = 1 in the financial market, implies that ct = Dt + wt Nt ,
which, upon substitution of the profits in Eq. (3.25), delivers the equilibrium condition in Eq.
(3.33). Implicit in this reasoning is the idea that adjustment costs are not paid to anyone. They
represent, so to speak, capital losses incurred along the growth path.
We initially assume the population is constant, and made up of one young and one old. The
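young agent maximizes his intertemporal utility subject to his budget constraint:
\[ \max_{(c_{1t}, c_{2,t+1})} \left[ u(c_{1t}) + \beta u(c_{2,t+1}) \right] \quad \text{subject to} \quad \begin{cases} \text{sav}_t + c_{1t} = w_{1t} \\ c_{2,t+1} = \text{sav}_t R_{t+1} + w_{2,t+1} \end{cases} \tag*{[3.P5]} \]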
where w1t and w2,t+1 are the endowments the agent receives at his young and old age.
The agent born at time t − 1, then, faces the constraints: savt−1 + c1,t−1 = w1,t−1 and c2t =
savt−1 Rt + w2t . By combining his second period constraint with the first period constraint of
the agent born at time t,
\[ \text{sav}_t = 0, \tag{3.35} \]
and implies that the goods market is also in equilibrium, in that $w_t = \sum_{i=1}^{2} c_{i,t}$, for all t.
Therefore, we can analyze the model, by just analyzing the autarkic equilibrium.
As Figure 3.4 illustrates, the first-order condition for the program [3.P5] requires that the
slope of the indifference curve be equal to the slope of the lifetime budget constraint,
$c_{2,t+1} = -R_{t+1} c_{1,t} + R_{t+1} w_{1t} + w_{2,t+1}$, and leads to:
\[ \beta \frac{u'(c_{2,t+1})}{u'(c_{1,t})} = \frac{1}{R_{t+1}}. \tag{3.36} \]
3.5. Money, production, asset prices, and overlapping generations models
[FIGURE 3.4. The lifetime budget line c2,t+1 = −Rt+1 c1,t + Rt+1 w1t + w2,t+1 in the (c1,t , c2,t+1 ) plane, passing through the endowment point (w1,t , w2,t+1 ).]
The equilibrium, then, is a sequence of gross returns Rt satisfying Eqs. (3.34), (3.35) and (3.36),
or:
\[ b_t \equiv \frac{1}{R_{t+1}} = \beta \frac{u'(w_{2,t+1})}{u'(w_{1t})}. \tag{3.37} \]
In this relation, bt is the shadow price of a bond issued at t, and promising one unit of numéraire
at t + 1: the sequence of prices, bt , satisfying Eq. (3.37), is such that agents are happy with not
being able to lend and borrow, intergenerationally.
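To make Eq. (3.37) concrete, the sketch below evaluates the autarkic bond price under an assumed CRRA utility, u′(c) = c^(−γ). The utility specification, the function name and the endowment values are illustrative assumptions, not part of the model above, which leaves u general.

```python
# Sketch of the autarkic bond price in Eq. (3.37).
# Assumption (not in the text): CRRA marginal utility u'(c) = c**(-gamma).

def autarkic_bond_price(w_young, w_old_next, beta, gamma):
    """b_t = beta * u'(w_{2,t+1}) / u'(w_{1t}) with u'(c) = c**(-gamma)."""
    return beta * (w_old_next ** (-gamma)) / (w_young ** (-gamma))

# Flat endowment profile: the bond price reduces to the discount factor.
b_flat = autarkic_bond_price(1.0, 1.0, beta=0.96, gamma=2.0)

# Hypothetical profile with a rich youth and a poor old age: agents would
# like to save, so the shadow bond price must be high (a low gross return).
b_rich_young = autarkic_bond_price(2.0, 1.0, beta=0.96, gamma=2.0)
R_rich_young = 1.0 / b_rich_young
print(b_flat, b_rich_young, R_rich_young)
```

The second case shows how the shadow return adjusts until no intergenerational lending is desired, which is what the no-trade condition savt = 0 requires.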
The previous model is easy to extend to the case where agents are heterogeneous. The program
each agent j solves is, now:
\[ \max_{(c_{1j,t}, c_{2j,t+1})} \left[ u_j(c_{1j,t}) + \beta_j u_j(c_{2j,t+1}) \right] \quad \text{subject to} \quad \begin{cases} \text{sav}_{j,t} + c_{1j,t} = w_{1j,t} \\ c_{2j,t+1} = \text{sav}_{j,t} R_{t+1} + w_{2j,t+1} \end{cases} \]
with obvious notation. The first-order condition is, for all time t and agent j,
\[ \beta_j \frac{u_j'(c_{2j,t+1})}{u_j'(c_{1j,t})} = \frac{1}{R_{t+1}} \equiv b_t, \]
and the equilibrium is a sequence of bond prices bt satisfying the previous relation and the
equilibrium in the intrageneration lending market:
\[ \sum_{j=1}^{J} \text{sav}_{j,t} = 0, \tag{3.38} \]
Suppose, next, that we introduce a tree, which yields a stochastic dividend Dt in each period.
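Each agent solves the following program:
\[ \max_{(c_{1t}, c_{2,t+1})} \left[ u(c_{1t}) + \beta E\left( u(c_{2,t+1}) \mid \mathcal{F}_t \right) \right] \quad \text{subject to} \quad \begin{cases} S_t \theta_t + c_{1t} = w_{1t} \\ c_{2,t+1} = (S_{t+1} + D_{t+1}) \theta_t + w_{2,t+1} \end{cases} \tag*{[3.P6]} \]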
where St denotes the asset price and θ the units of the asset the agent chooses in his young age.
The agent born at time t − 1 faces the constraints St−1 θt−1 + c1,t−1 = w1,t−1 and w2t + (St +
Dt )θt−1 = c2,t . By combining the second period constraint of the agent born at time t − 1 with
the first period constraint of the agent born at time t,
(St + Dt ) θt−1 − St θt + wt = c1,t + c2,t .
The clearing condition in the asset market, θt = 1, implies that the market for goods also clears,
for all t: Dt + w1t + w2t = c1,t + c2,t . A characterization of the solution to the program [3.P6]
can be obtained by eliminating c from the constraints,
\[ \max_{\theta} \left[ u(w_{1t} - S_t \theta) + \beta E\left( u\left( (S_{t+1} + D_{t+1}) \theta + w_{2,t+1} \right) \mid \mathcal{F}_t \right) \right]. \]
The equilibrium is one where θt = 1, implying that (i) c1t = w1t − St and (ii) c2,t+1 = St+1 +
Dt+1 + w2,t+1 . Using (i) and (ii), the first-order condition for the program [3.P6] leads to:
\[ u'(w_{1t} - S_t)\, S_t = \beta E\left[ u'(S_{t+1} + D_{t+1} + w_{2,t+1}) (S_{t+1} + D_{t+1}) \mid \mathcal{F}_t \right]. \]
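Consider, for example, the case where u (c) = ln c, and set $\tilde R_{t+1} = (S_{t+1} + D_{t+1}) / S_t$. We
have:
\[ \frac{1}{w_{1t} - \text{sav}_t^*} = \beta E\left[ \tilde R_{t+1} \frac{1}{\text{sav}_t^* \tilde R_{t+1} + w_{2,t+1}} \,\Big|\, \mathcal{F}_t \right], \quad \text{where } \text{sav}_t^* \equiv S_t \theta_t, \ \theta_t = 1. \tag{3.40} \]
In a deterministic setting,
\[ \frac{1}{w_{1t} - \text{sav}_t} = \beta R_{t+1} \frac{1}{\text{sav}_t R_{t+1} + w_{2,t+1}}, \quad \text{where } \text{sav}_t = 0, \tag{3.41} \]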
which leads to the equilibrium bond price in Eq. (3.39). Eqs. (3.40) and (3.41) are formally
equivalent. Their fundamental difference is that in the tree economy, savings have to stay
positive, as the tree must be held by the young agent, in equilibrium: sav∗t ≡ St ≥ 0. In an
economy without a tree, instead, the interest rate, Rt , has to be such that savings are zero for
all t, savt = 0.
Eq. (3.40) can be solved explicitly for the price of the tree, St , once we assume w2t = 0 for
all t. In the absence of a tree, we cannot assume endowments are zero in the old age, since
the autarkic economy in this case would be such that the old generation would not consume
anything. In the presence of a tree, instead, this assumption is innocuous, conceptually, as the
autarkic equilibrium in this case is such that the old generation could consume the fruits of the
tree, as well as the proceeds arising from selling the tree to the young generation. Solving
Eq. (3.40) for St when w2t = 0, then, leads to a price for the tree, equal to:
\[ S_t = \frac{\beta}{1+\beta} w_{1t}. \]
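The closed-form price can be checked against the first-order condition. With log utility and w2t = 0, the random return cancels inside the expectation in Eq. (3.40), which reduces to 1/(w1t − St) = β/St. The Python sketch below solves this reduced condition by bisection and compares the root with βw1t/(1 + β); the function names and parameter values are hypothetical.

```python
# Sketch: tree price with log utility and no old-age endowment (w2 = 0).
# Eq. (3.40) then reduces to 1 / (w1 - S) = beta / S, since the return
# (S' + D') / S cancels inside the expectation.

def foc_residual(S, w1, beta):
    """Left-hand side minus right-hand side of the reduced Eq. (3.40)."""
    return 1.0 / (w1 - S) - beta / S

def solve_tree_price(w1, beta, iterations=200):
    """Bisection on (0, w1); the residual is increasing in S on that interval."""
    lo, hi = 1e-12 * w1, w1 * (1.0 - 1e-12)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if foc_residual(mid, w1, beta) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

w1, beta = 1.0, 0.95                    # hypothetical values
S_numeric = solve_tree_price(w1, beta)
S_closed_form = beta / (1.0 + beta) * w1
print(S_numeric, S_closed_form)
```

The price depends only on the endowment of the young and on β: under log utility the young save a constant fraction β/(1 + β) of their endowment, regardless of the dividend process.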
absorb is that from the old generation, m̄t−1 , and that created by the “central bank,” μt m̄t−1 . One might consider an alternative
model in which transfers are made to old.
where now, we have set the real savings equal to a function of the interest rate, savt−1 ≡ sav (Rt ),
as it should be, by the solution to the program [3.P7].
Next, suppose that μt is independent of R, and that limt→∞ μt = μ, say, a constant. Eq.
(3.45) leads to two stationary equilibria:
(a) $R = \frac{1+n}{1+\mu}$. This stationary equilibrium relates to the “golden rule,” once we set μ = 0, as
we shall see in the next section. For μ ≠ 0, the price is, in this stationary equilibrium,
$p_t = \left( \frac{1+\mu}{1+n} \right)^t p_0$. Then, we have: (i) $\frac{\bar m_t}{p_t} = \frac{M_t}{N_t p_t} = \frac{M_0}{N_0 p_0}$, and (ii) $\frac{\bar m_t}{p_{t+1}} = \frac{M_0}{N_0 p_0} \frac{1+n}{1+\mu}$. All in all, the
agents’ budget constraints are bounded and the real value of money is strictly positive.
In this stationary equilibrium, agents “trust” money.
(b) $R_a$ : sav (Ra ) = 0. This stationary equilibrium relates to an autarkic state. Generally, we
have that Ra < R: prices increase more rapidly than per-capita money stocks. Analytically,
\[ R_a < R \iff \frac{p_{t+1}}{p_t} > \frac{1+\mu}{1+n} = \frac{M_{t+1}/M_t}{N_{t+1}/N_t} = \frac{\bar m_{t+1}}{\bar m_t} \iff \frac{\bar m_{t+1}}{p_{t+1}} < \frac{\bar m_t}{p_t}, \]
whence $\lim_{t\to\infty} \frac{\bar m_t}{p_t} = 0$. As for $\frac{\bar m_t}{p_{t+1}}$, we have that
$\frac{\bar m_t}{p_{t+1}} = \frac{\bar m_t}{p_t} R_a < \frac{\bar m_t}{p_t} R = \frac{\bar m_t}{p_t} \frac{1+n}{1+\mu}$, and since $\lim_{t\to\infty} \frac{\bar m_t}{p_t} = 0$, then
$\lim_{t\to\infty} \frac{\bar m_t}{p_{t+1}} = 0$. In this stationary equilibrium, agents do not “trust” money.
If sav(·) is differentiable and sav′(·) ≠ 0, the dynamics of $(R_t)_{t=0}^{\infty}$ can be studied through the
slope,
\[ \frac{dR_{t+1}}{dR_t} = \frac{\text{sav}'(R_t) R_t + \text{sav}(R_t)}{\text{sav}'(R_{t+1})} \cdot \frac{1+\mu_t}{1+n}. \tag{3.46} \]
There are three cases:
(i) sav′(R) > 0. Gross substitutability: the income effect is dominated by the substitution
effect.
(ii) sav′(R) = 0. Income and substitution effects offset each other.
(iii) sav′(R) < 0. Complementarity: the income effect dominates the substitution effect.
An example of gross substitutability was provided during the presentation of the introductory
examples of the present section (log utility functions). The second case can be obtained with
the same examples after imposing that agents have no endowments in the second period. The
equilibrium is seriously compromised in this case, however. Another example is obtained with
Cobb-Douglas utility functions: $u(c_{1t}, c_{2,t+1}) = c_{1t}^{l_1} \cdot c_{2,t+1}^{l_2}$, which generates a real savings function
\[ \text{sav}(R_{t+1}) = \frac{\frac{l_2}{l_1} w_{1t} - \frac{w_{2,t+1}}{R_{t+1}}}{1 + \frac{l_2}{l_1}}, \]
the derivative of which is nil when one assumes that w2,t = 0 for all t,
which also implies $\frac{\bar m_t}{p_t} = \text{sav}_t = \frac{1}{\nu} w_{1t}$, $\nu \equiv \frac{l_1 + l_2}{l_2}$, and, by reorganizing,
\[ \bar m_t \nu = p_t w_{1t}, \]
an equation supporting the view of the Quantity Theory of Money. In this case, the sequence
of gross returns is $R_{t+1} = \frac{p_t}{p_{t+1}} = \frac{\bar m_t}{\bar m_{t+1}} \frac{w_{1,t+1}}{w_{1,t}}$, or
\[ R_{t+1} = \frac{(1+n) \cdot (1+g_{t+1})}{1+\mu_{t+1}}, \]
where gt+1 denotes the growth rate of the endowments of the young between time t and time t + 1. The
inflation factor $R_t^{-1}$ is equal to the monetary creation factor corrected for the growth rate
of the economy.
[FIGURE 3.5. η = 2]
Another example is $u(c_{1t}, c_{2,t+1}) = \left( l\, c_{1t}^{(\eta-1)/\eta} + (1-l)\, c_{2,t+1}^{(\eta-1)/\eta} \right)^{\eta/(\eta-1)}$. Note that
$\lim_{\eta\to 1} u(c_{1t}, c_{2,t+1}) = c_{1t}^{l} \cdot c_{2,t+1}^{1-l}$, Cobb-Douglas. We have
\[ \begin{cases} c_{1t} = \dfrac{R_{t+1} w_{1t} + w_{2,t+1}}{R_{t+1} + K^{\eta} R_{t+1}^{\eta}}, \quad K \equiv \dfrac{1-l}{l} \\[2ex] c_{2,t+1} = \dfrac{R_{t+1} w_{1t} + w_{2,t+1}}{1 + K^{-\eta} R_{t+1}^{1-\eta}} \\[2ex] \text{sav}_t = \dfrac{K^{\eta} R_{t+1}^{\eta} w_{1t} - w_{2,t+1}}{R_{t+1} + K^{\eta} R_{t+1}^{\eta}} \end{cases} \]
To simplify, suppose that K = 1, and 0 = w2t = μt = n, and w1t = w1,t+1 ∀t. It is easily
checked that
\[ \text{sign}\left( \text{sav}'(R) \right) = \text{sign}(\eta - 1). \]
The interest factor dynamics is:
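As a quick numerical check of the sign condition, the following Python sketch codes the CES savings function reconstructed above and differentiates it by central differences. It assumes the simplifications stated in the text (K = 1, w2 = 0); the function names, the step size and the evaluation point R = 1 are illustrative choices.

```python
# Sketch: sign of sav'(R) for the CES savings function.
# sav(R) = (K**eta * R**eta * w1 - w2) / (R + K**eta * R**eta)

def ces_savings(R, w1, w2_next, eta, K=1.0):
    """Real savings of the young under CES preferences."""
    scaled = (K ** eta) * (R ** eta)
    return (scaled * w1 - w2_next) / (R + scaled)

def savings_slope(R, eta, w1=1.0, h=1e-6):
    """Central-difference approximation of sav'(R), with w2 = 0 and K = 1."""
    up = ces_savings(R + h, w1, 0.0, eta)
    down = ces_savings(R - h, w1, 0.0, eta)
    return (up - down) / (2.0 * h)

slope_substitutes = savings_slope(1.0, eta=2.0)    # eta > 1
slope_cobb_douglas = savings_slope(1.0, eta=1.0)   # eta = 1
slope_complements = savings_slope(1.0, eta=0.5)    # eta < 1
print(slope_substitutes, slope_cobb_douglas, slope_complements)
```

The three slopes should be positive, zero and negative respectively, matching cases (i) to (iii) in the classification of the savings function.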
Here are some hints concerning the general case. Figure 3.6 depicts the shape of the map
Rt ↦ Rt+1 in the case of gross substitutability (in fact, the following arguments can also be
adapted verbatim to the complementarity case whenever, for all x, $\frac{\text{sav}'(x)\, x}{\text{sav}(x)} < -1$: indeed, in this case
$\frac{dR_{t+1}}{dR_t} > 0$ since the numerator is negative and the denominator is also negative by assumption.
Such a case does not have any significant economic content, however). This is an increasing
function since the slope $\frac{dR_{t+1}}{dR_t} = \frac{\text{sav}'(R_t) R_t + \text{sav}(R_t)}{\text{sav}'(R_{t+1})} \frac{1+\mu_t}{1+n} > 0$. In addition, the slope (3.46) computed
in correspondence with the monetary state $R = \frac{1+n}{1+\mu}$ is:
\[ \left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1} = R_t = R} = 1 + \frac{\text{sav}(R)}{R\, \text{sav}'(R)}. \]
[FIGURE 3.6. The map Rt ↦ Rt+1 under gross substitutability, with the autarkic state A (at Ra ) and the monetary state M (at R).]
This is always greater than 1 if sav′(·) > 0, and in this case the monetary state M is unstable
and the autarchic state A is stable. In particular, all paths starting from the right of M are
unstable. They imply an increasing sequence of R, i.e. a decreasing sequence of p. This
cannot be an equilibrium because it contradicts the budget constraints (in fact, there would not
be a solution to the agents’ programs). The economy must therefore start between A and M,
but we are not endowed with additional pieces of information: there is a continuum of points
R1 ∈ [Ra , R) which are candidates for the beginning of the equilibrium sequence. Contrary to
the models of the previous sections, here we have indeterminacy of the equilibrium, which is
parametrized by p0 .
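The instability of the monetary state can be illustrated with a small simulation. The sketch below assumes the CES example with K = 1, w2 = 0, μ = n = 0 and η = 2, so that sav(R) = w1 R/(1 + R), and it assumes that the dynamics take the form sav(Rt+1 ) = Rt sav(Rt ), which is the relation used in the two-cycle footnote with μ = n = 0. Under these assumptions the map is Rt+1 = Rt² /(1 + Rt − Rt² ), the monetary state is R = 1, and autarky corresponds to the limiting case R → 0. Function names and starting values are hypothetical.

```python
# Sketch: forward dynamics of the gross return in the CES example, eta = 2.
# Assumed law of motion: sav(R_next) = R * sav(R), with sav(R) = w1 * R / (1 + R).

def next_return(R):
    """Solve sav(R_next) = R * sav(R) for R_next; None if no solution exists."""
    x = R * R / (1.0 + R)      # equals R * sav(R) / w1
    if x >= 1.0:               # sav(.) / w1 never reaches 1: no R_next exists
        return None
    return x / (1.0 - x)

def simulate(R0, periods=50):
    """Iterate the map from R0 until it breaks down or `periods` steps elapse."""
    path = [R0]
    for _ in range(periods):
        nxt = next_return(path[-1])
        if nxt is None:
            break
        path.append(nxt)
    return path

path_monetary = simulate(1.0)   # starts exactly at the monetary state
path_left = simulate(0.9)       # starts to the left of M
path_right = simulate(1.1)      # starts to the right of M
print(path_left[-1], len(path_right), path_right[-1])
```

Paths starting to the left of M drift towards the autarkic state, while paths starting to the right increase until the savings equation has no solution, consistent with the argument that such paths cannot be equilibria.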
Is the autarchic state the only possible stable configuration? The answer is no. It is sufficient
that the map Rt ↦ Rt+1 bends backwards and that $\left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1} = R_t = R} < -1$ to make M a stable
state. A condition for the curve to bend backward is that $\frac{\text{sav}'(R) R}{\text{sav}(R)} > -1$, and the condition for
$\left. \frac{dR_{t+1}}{dR_t} \right|_{R_{t+1} = R_t = R} < -1$ to hold is that $\frac{\text{sav}'(R) R}{\text{sav}(R)} > -\frac{1}{2}$. If $\frac{\text{sav}'(R) R}{\text{sav}(R)} > -\frac{1}{2}$, M is attained starting
from a sufficiently small neighborhood of M. Figure 3.7 shows the emergence of a cycle of
order two,8 in which $R_{**} = \frac{R^2}{R_*}$.9 Notice that in this case, the dynamics of the system has been
analyzed in a backward-looking manner, not in a forward-looking manner. The reason is that
there is an indeterminacy of the forward-looking dynamics, and it is thus necessary to analyze
the system dynamics in a backward-looking manner ... In any case, the condition sav′(R) < 0
is not appealing from an economic standpoint.
[FIGURE 3.7. The map Rt ↦ Rt+1 around the monetary state M, with a cycle of order two between R* and R**.]
8 There are more complicated situations in which cycles of order 3 may exist, whence the emergence of what is known as “chaotic”
trajectories.
9 Here is the proof. Starting from relation (3.40), we have that for a 2-cycle,
\[ \frac{1+\mu}{1+n} R_* \, s(R_*) = s(R_{**}), \qquad \frac{1+\mu}{1+n} R_{**} \, s(R_{**}) = s(R_*). \]
By multiplying the two sides of these equations, one recovers the desired result.
10 Lucas, R.E., Jr. (1972): “Expectations and the Neutrality of Money,” J. Econ. Theory, 4, 103-124.
11 Parts of this simplified version of the model are taken from Stokey and Lucas (1989, p. 504): Stokey, N.L. and R.E. Lucas (with
E.C. Prescott) (1989): Recursive Methods in Economic Dynamics, Harvard University Press.
By replacing the previous relation into the first-order condition, and simplifying,
As in the model of section 3.6, the rational expectation assumption consists in regarding all the
model’s variables as functions of the state variables. Here, the states of nature are generated
by ε, and we have:
\[ n = n(\epsilon). \]
By plugging n(ε) into (3.43) we get:
\[ v'(n(\epsilon))\, n(\epsilon) = \beta \int_{\text{supp}(\epsilon)} u'\left( \epsilon^{+} n(\epsilon^{+}) \right) \epsilon^{+} n(\epsilon^{+}) \, dP(\epsilon^{+} \mid \epsilon). \tag{3.49} \]
In this case, the r.h.s. of the previous relationship does not depend on ε, which implies that
the l.h.s. does not depend on ε either. Therefore, the only candidate for the solution for n is
a constant n̄:12
\[ n(\epsilon) = \bar n, \quad \forall \epsilon. \]
Provided such an n̄ exists, this is a result on money neutrality. More precisely, relation (3.45)
can be written as:
\[ v'(\bar n) = \beta \int_{\text{supp}(\epsilon)} u'(\epsilon^{+} \bar n)\, \epsilon^{+} \, dP(\epsilon^{+}), \]
and it is always possible to impose reasonable conditions on v and u that ensure existence and
uniqueness of a strictly positive solution for n̄, as in the following example.
Example. $v(x) = \frac{1}{2} x^2$ and $u(x) = \ln x$. The solution is $\bar n = \sqrt{\beta}$, $y(\epsilon) = \epsilon \sqrt{\beta}$ and $p(\epsilon) = \frac{m}{\epsilon \sqrt{\beta}}$.
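The example can be verified numerically. The Python sketch below solves v′(n̄) = β E[u′(ε n̄) ε] by bisection for a discrete shock distribution and compares the root with √β. The shock values, their probabilities, the bracketing interval and the function names are all hypothetical; the bisection also assumes that the excess of the left-hand side over the right-hand side is increasing in n, which holds for the functional forms of the example.

```python
import math

# Sketch: constant labor supply n_bar in the money-neutrality example.
# Solves v'(n) = beta * E[u'(eps * n) * eps] over a discrete shock distribution.

def solve_n_bar(v_prime, u_prime, beta, shocks, probs, lo=1e-9, hi=10.0):
    """Bisection; assumes the excess function below is increasing on [lo, hi]."""
    def excess(n):
        expected = sum(p * u_prime(e * n) * e for e, p in zip(shocks, probs))
        return v_prime(n) - beta * expected
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta = 0.9
shocks = [0.8, 1.0, 1.3]     # hypothetical support of the shock
probs = [0.3, 0.4, 0.3]      # hypothetical probabilities
n_bar = solve_n_bar(lambda n: n,          # v(x) = x**2 / 2
                    lambda x: 1.0 / x,    # u(x) = ln x
                    beta, shocks, probs)
print(n_bar, math.sqrt(beta))
```

With log utility the shock cancels inside the expectation, so the solution does not depend on the chosen distribution, which is the neutrality result in this example.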
Exercise. Extend the previous model when the money supply follows the stochastic process:
$\frac{\Delta m_t}{m_{t-1}} = \mu_t$, where $\{\mu_t\}_{t=0,1,\cdots}$ is an i.i.d. sequence of shocks.
3.6 Optimality
3.6.1 Models with productive capital
The starting point is the relation
12 A rigorous proof that n(ε) = n̄, ∀ε, is as follows. Let’s suppose the contrary, i.e. there exists a point ε0 and a neighborhood
of ε0 such that either n(ε0 + A) > n(ε0 ) or n(ε0 + A) < n(ε0 ), where the constant A > 0. Let’s consider the first case (the proof
of the second case being entirely analogous). Since the r.h.s. of (4.43) is constant for all ε, we have, v′(n(ε0 + A)) · n(ε0 + A) =
v′(n(ε0 )) · n(ε0 ) ≤ v′(n(ε0 + A)) · n(ε0 ), where the inequality is due to the assumption that v″ > 0 always holds. We have thus
shown that, v′(n(ε0 + A)) · [n(ε0 + A) − n(ε0 )] ≤ 0. Now v′ > 0 always holds, so that n(ε0 + A) < n(ε0 ), a contradiction with
the assumption n(ε0 + A) > n(ε0 ).
Theorem 3.2 ((weak version of the) Cass-Malinvaud theory). (a) A path $\{(k, c)_t\}_{t=0}^{\infty}$ is
consumption efficient if $\frac{y'(k_t)}{1+n} \geq 1$ ∀t. (b) A path $\{(k, c)_t\}_{t=0}^{\infty}$ is consumption inefficient if
$\frac{y'(k_t)}{1+n} < 1$ ∀t.
or
\[ \epsilon_{t+1} < \frac{y'(k_t)}{1+n}\, \epsilon_t. \]
13 Tirole, J. (1988): “Efficacité intertemporelle, transferts intergénérationnels et formation du prix des actifs: une introduction,”
in: Melanges économiques. Essais en l’honneur de Edmond Malinvaud. Paris: Editions Economica & Editions EHESS, p. 157-185.
14 The proof we present here appears in Touzé, V. (1999): Financement de la sécurité sociale et équilibre entre les générations,
[FIGURE 3.8. Non-necessity of the conditions of thm. 3.2 in the model with a representative agent. The plot shows y′(kt ) against kt , with the level β−1 and the capital stocks k and k* marked.]
Evaluating the previous inequality at t = 0 yields $\epsilon_1 < \frac{y'(k_0)}{1+n} \epsilon_0$, and since ε0 = 0, one has that
ε1 < 0. Since $\frac{y'(k_t)}{1+n} \geq 1$ ∀t, $\epsilon_t \to -\infty$ as t → ∞, which contradicts (3.46).
(b) The proof is nearly identical to the one of part (a) with the obvious exception that
$\liminf \epsilon_t > -\infty$ here. Furthermore, note that there are infinitely many such sequences that
allow for efficiency improvements. ‖
Are actual economies dynamically efficient? To address this issue, Abel et al. (1989)15 somewhat
modified the previous setup to include uncertainty, and concluded that the US economy
does satisfy their dynamic efficiency requirements.
The conditions of the previous theorem are somewhat restrictive. As an example, let us take the
model of section 3.2 and fix, as in section 3.2, n = 0 to simplify. As long as $k_0 < k = (y')^{-1}[\beta^{-1}]$,
per-capita capital is such that y′(kt ) > 1 ∀t since the dynamics here is of the saddlepoint type
and hence monotone (see figure 3.8). Therefore, the conditions of the theorem are fulfilled. Such
conditions also hold when k0 ∈ [k, k∗ ], again by the monotone dynamics of kt . Nevertheless, the
conditions of the theorem do not hold anymore when k0 > k∗ and yet, the capital accumulation
path is still efficient! While it is possible to show this with the tools of the valuation equilibria
of Debreu (1954), here we provide the proof with the same tools used to show thm. 3.2. Indeed,
let
\[ \tau = \inf \{ t : k_t \leq k^* \} = \inf \{ t : y'(k_t) \geq 1 \}. \]
We see that τ < ∞, and since the dynamics is monotone,
\[ \begin{cases} y'(k_t) < 1, & t = 0, 1, \cdots, \tau - 1 \\ y'(k_t) \geq 1, & t = \tau, \tau + 1, \cdots \end{cases} \]
By using again the same arguments used to show thm. 3.2, we see that since τ is finite,
$-\infty < \epsilon_{\tau+1} < 0$. From τ onwards, an explosive sequence starts unfolding, and $\epsilon_t \to -\infty$ as t → ∞.
15 Abel, A.B., N.G. Mankiw, L.H. Summers and R.J. Zeckhauser (1989): “Assessing Dynamic Efficiency: Theory and Evidence,”
lead towards the modified Golden Rule at the stationary state (modified by ϑ).
16 In our terminology, a second best optimum is the one in which the social planner makes the thought experiment to let the
market “play” first (with money) and then parametrizes such virtual equilibria by μt . The resulting indirect utility functions are
expressed in terms of such μt s and after creating an aggregator of such indirect utility functions, the social planner maximises such
an aggregator with respect to μt .
3.7. Appendix 1: Finite difference equations, with economic applications
zt+1 = A · zt , t = 0, 1, · · ·, (3A1.1)
\[ z_t = v_1 \kappa_1 \lambda_1^t + \cdots + v_d \kappa_d \lambda_d^t, \]
where λi and vi are the eigenvalues and the corresponding eigenvectors of A, and κi are constants
which will be determined below.
The classical method of proof is based on the so-called diagonalization of system (3A1.1). Let us
consider the system of characteristic equations for A, (A − λi I) vi = 0d×1 , λi scalar and vi a column
vector d × 1, i = 1, · · ·, n, or in matrix form, AP = P Λ, where P = (v1 , · · ·, vd ) and Λ is a diagonal
matrix with λi on the diagonal. By post-multiplying by P −1 one gets the decomposition17
A = P ΛP −1 . (3A1.2)
yt+1 = Λ · yt , yt ≡ P −1 zt .
z0 = (v1 , · · ·, vd )κ = P κ,
whence
κ̂ ≡ κ(P ) = P −1 z0 ,
where the columns of P are vectors in the space of the eigenvectors. Naturally, there is an infinity of
such matrices P , but the previous formula shows how κ(P ) must “adjust” to guarantee the stability of the
solution with respect to changes of P .
3A.1 Example. d = 2. Let us suppose that λ1 ∈ (0, 1), λ2 > 1. The system is unstable in cor-
respondence with any initial condition but a set of zero measure. This set gives rise to the so-called
saddlepoint path. Let us compute its coordinates. The strategy consists in finding the set of initial
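conditions for which κ2 = 0. Let us evaluate the solution at t = 0,
\[ \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = z_0 = P \kappa = (v_1, v_2) \begin{pmatrix} \kappa_1 \\ \kappa_2 \end{pmatrix} = \begin{pmatrix} v_{11} \kappa_1 + v_{12} \kappa_2 \\ v_{21} \kappa_1 + v_{22} \kappa_2 \end{pmatrix}, \]
where we have set $z = (x, y)^\top$. By replacing the second equation into the first one and solving for κ2 ,
\[ \kappa_2 = \frac{v_{11} y_0 - v_{21} x_0}{v_{11} v_{22} - v_{12} v_{21}}. \]
Hence κ2 = 0 if and only if $y_0 = \frac{v_{21}}{v_{11}} x_0$.
[FIGURE 3.9. The saddlepoint path y = (v21 /v11 ) x in the (x, y) plane, through the initial point (x0 , y0 ).]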
Here the saddlepoint is a line with slope equal to the ratio of the components of the eigenvector
associated with the root with modulus less than one. The situation is represented in figure 3.9, where
the “divergent” line has as equation $y_0 = \frac{v_{22}}{v_{12}} x_0$, and corresponds to the case κ1 = 0.
The economic content of the saddlepoint is the following one: if x is a predetermined variable, y
must “jump” to $y_0 = \frac{v_{21}}{v_{11}} x_0$ to make the system display a non-explosive behavior. Notice that there is
a major conceptual difficulty when the system includes two predetermined variables, since in this case
there are generically no stable solutions. Such a possibility is unusual in economics, however.
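The “jump” onto the saddlepoint path can be sketched in a few lines of Python. The matrix A below is hypothetical, chosen only so that λ1 ∈ (0, 1) and λ2 > 1; the helper names are illustrative. The eigenvectors are computed with the formula (b, λ − a), which is valid for a 2 × 2 matrix whose upper-right entry b is nonzero.

```python
import math

# Sketch: saddlepoint path of z_{t+1} = A z_t for a 2x2 matrix A.
# Assumptions: real eigenvalues and a nonzero upper-right entry of A.

def eigen_2x2(A):
    """Return ((lam1, v1), (lam2, v2)) with lam1 <= lam2."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4.0 * det)
    lam1, lam2 = (tr - root) / 2.0, (tr + root) / 2.0
    # For b != 0, (b, lam - a) is an eigenvector associated with lam.
    return (lam1, (b, lam1 - a)), (lam2, (b, lam2 - a))

def simulate(A, x0, y0, periods):
    """Iterate z_{t+1} = A z_t and return the final (x, y)."""
    (a, b), (c, d) = A
    x, y = x0, y0
    for _ in range(periods):
        x, y = a * x + b * y, c * x + d * y
    return x, y

A = ((1.2, -1.0), (-0.1, 1.1))           # hypothetical matrix
(lam1, v1), (lam2, v2) = eigen_2x2(A)
x0 = 1.0                                 # predetermined variable
y0 = (v1[1] / v1[0]) * x0                # jump: y0 = (v21 / v11) * x0
on_path = simulate(A, x0, y0, 40)
off_path = simulate(A, x0, y0 + 0.01, 40)
print(lam1, lam2, on_path, off_path)
```

Starting exactly on the line y0 = (v21 /v11 ) x0 sets κ2 = 0 and the path converges, while a small deviation in y0 loads on the unstable root and the path diverges.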
4A.2 Example. The previous example can be generated by the neoclassic growth model. In section
3.2.3, we showed that in a small neighborhood of the stationary values k, c, the dynamics of (k̂t , ĉt )t
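(deviations of capital and consumption from their respective stationary values k, c) is:
\[ \begin{pmatrix} \hat k_{t+1} \\ \hat c_{t+1} \end{pmatrix} = A \begin{pmatrix} \hat k_t \\ \hat c_t \end{pmatrix} \]
where
\[ A \equiv \begin{pmatrix} y'(k) & -1 \\ -\dfrac{u'(c)}{u''(c)} y''(k) & 1 + \beta \dfrac{u'(c)}{u''(c)} y''(k) \end{pmatrix}, \quad \beta \in (0, 1). \]
By using the relationship βy′(k) = 1, and the conditions imposed on u and y, we have
- $\det(A) = y'(k) = \beta^{-1} > 1$;
- $\text{tr}(A) = \beta^{-1} + 1 + \beta \frac{u'(c)}{u''(c)} y''(k) > 1 + \det(A)$.
The two eigenvalues are solutions of a quadratic equation, and are: $\lambda_{1/2} = \frac{\text{tr}(A) \mp \sqrt{\text{tr}(A)^2 - 4 \det(A)}}{2}$. Now,
\[ \begin{aligned} a &\equiv \text{tr}(A)^2 - 4 \det(A) \\ &= \left[ \beta^{-1} + 1 + \beta \frac{u'(c)}{u''(c)} y''(k) \right]^2 - 4 \beta^{-1} \\ &> \left( \beta^{-1} + 1 \right)^2 - 4 \beta^{-1} \\ &= \left( 1 - \beta^{-1} \right)^2 \\ &> 0. \end{aligned} \]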
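The determinant and trace conditions can be checked for a specific parametrization. The Python sketch below assumes log utility, a Cobb-Douglas technology y(k) = k^α, and a stationary state where c = y(k) − k; these functional forms and the parameter values are illustrative assumptions, not part of the general argument.

```python
import math

# Sketch: eigenvalues of A in the linearized neoclassical growth model.
# Assumptions (illustrative only): u(c) = ln c, y(k) = k**alpha, c = y(k) - k.

alpha, beta = 0.3, 0.95                       # hypothetical parameters
k = (alpha * beta) ** (1.0 / (1.0 - alpha))   # stationary capital: beta * y'(k) = 1
c = k ** alpha - k                            # stationary consumption
y1 = alpha * k ** (alpha - 1.0)               # y'(k)
y2 = alpha * (alpha - 1.0) * k ** (alpha - 2.0)   # y''(k) < 0
ratio = -c                                    # u'(c) / u''(c) under log utility

A = ((y1, -1.0),
     (-ratio * y2, 1.0 + beta * ratio * y2))

tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = tr * tr - 4.0 * det                    # the quantity a in the text
lam1 = (tr - math.sqrt(disc)) / 2.0
lam2 = (tr + math.sqrt(disc)) / 2.0
print(det, tr, lam1, lam2)
```

Under these assumptions the determinant equals 1/β, the discriminant is positive, and the two roots lie on opposite sides of one, so the stationary state is a saddlepoint as in Example 3A.1.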
17 The previous decomposition is known as the spectral decomposition if P > = P −1 . When it is not possible to diagonalize A,
Next, we wish to generalize the previous examples to the case d > 2. The counterpart of the
saddlepoint seen before is called the convergent, or stable subspace: it is the locus of points for which
the solution does not explode. (In the case of nonlinear systems, such a convergent subspace is termed
convergent, or stable manifold. In this appendix we only study linear systems.)
Let $\Pi \equiv P^{-1}$, and rewrite the system determining the solution for $\kappa$:
$$\hat{\kappa} = \Pi z_0.$$
We suppose that the elements of $z$ and matrix $A$ have been reordered in such a way that $\exists s : |\lambda_i| < 1$, for $i = 1, \dots, s$ and $|\lambda_i| > 1$ for $i = s+1, \dots, d$. Then we partition $\Pi$ in such a way that:
$$\hat{\kappa} = \begin{pmatrix} \underset{s \times d}{\Pi_s} \\ \underset{(d-s) \times d}{\Pi_u} \end{pmatrix} z_0.$$
As in example (3A.1), the objective is to make the system “stay prisoner” of the convergent space,
which requires that
κ̂s+1 = · · · = κ̂d = 0,
or, by exploiting the previous system,
$$\begin{pmatrix} \hat{\kappa}_{s+1} \\ \vdots \\ \hat{\kappa}_d \end{pmatrix} = \underset{(d-s) \times d}{\Pi_u} z_0 = 0_{(d-s) \times 1}.$$
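Let $d \equiv k + k^*$ ($k$ free and $k^*$ predetermined), and partition $\Pi_u$ and $z_0$ so as to distinguish the predetermined from the free variables:
$$0_{(d-s) \times 1} = \underset{(d-s) \times d}{\Pi_u} z_0 = \begin{pmatrix} \underset{(d-s) \times k}{\Pi_u^{(1)}} & \underset{(d-s) \times k^*}{\Pi_u^{(2)}} \end{pmatrix} \begin{pmatrix} \underset{k \times 1}{z_0^{\mathrm{free}}} \\ \underset{k^* \times 1}{z_0^{\mathrm{pre}}} \end{pmatrix} = \Pi_u^{(1)} z_0^{\mathrm{free}} + \Pi_u^{(2)} z_0^{\mathrm{pre}},$$
or,
$$\underset{(d-s) \times k}{\Pi_u^{(1)}} \; \underset{k \times 1}{z_0^{\mathrm{free}}} = - \underset{(d-s) \times k^*}{\Pi_u^{(2)}} \; \underset{k^* \times 1}{z_0^{\mathrm{pre}}}.$$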
The previous system has $d - s$ equations and $k$ unknowns (the components of $z_0^{\mathrm{free}}$): this is so because $z_0^{\mathrm{pre}}$ is known (it is the $k^*$-dimensional vector of the predetermined variables) and $\Pi_u^{(1)}$, $\Pi_u^{(2)}$ are primitive data of the economy (they depend on $A$). We assume that the $(d-s) \times k$ matrix $\Pi_u^{(1)}$ has full rank.
Therefore, there are three cases: 1) $s = k^*$; 2) $s < k^*$; and 3) $s > k^*$. Before analyzing these cases, a word on terminology. We shall refer to $s$ as the dimension of the convergent subspace ($S$). The reason is the following. Consider the solution:
$$\hat{\kappa}_{s+1} = \cdots = \hat{\kappa}_d = 0,$$
3.7. Appendix 1: Finite difference equations, with economic applications c
°by A. Mele
i.e.,
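$$\underset{d \times 1}{z_t} = \underset{d \times s}{\hat{V}} \cdot \underset{s \times 1}{\hat{\lambda}_t},$$
where $\hat{V} \equiv (v_1 \hat{\kappa}_1, \dots, v_s \hat{\kappa}_s)$ and $\hat{\lambda}_t \equiv (\lambda_1^t, \dots, \lambda_s^t)^\top$. Now for each $t$, introduce the vector subspace: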
(i) $d - s = k$, or $s = k^*$. The dimension of the divergent subspace is equal to the number of free variables or, equivalently, the dimension of the convergent subspace is equal to the number of predetermined variables. In this case, the system is determined. The previous conditions are easy to interpret. The predetermined variables identify one and only one point in the convergent space, which pins down the only jump of the free variables that keeps the system in the convergent space: $z_0^{\mathrm{free}} = -\left(\Pi_u^{(1)}\right)^{-1} \Pi_u^{(2)} z_0^{\mathrm{pre}}$. This is exactly the case of the previous examples, in which $d = 2$, $k = 1$, and the predetermined variable was $x$: there $x_0$ identified one and only one point in the saddlepoint path, and starting from such a point, there was one and only one $y_0$ guaranteeing that the system does not explode.
(ii) $d - s > k$, or $s < k^*$. There are generically no solutions in the convergent space. This case was already mentioned at the end of example 4A.1.
(iii) $d - s < k$, or $s > k^*$. There exists an infinite number of solutions in the convergent space, and such a phenomenon is typically referred to as indeterminacy. In the previous example, $s = 1$, and this case may emerge only in the absence of predetermined variables. This is also the case in which sunspots may arise.
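The three cases above amount to comparing the number of stable eigenvalues, $s$, with the number of predetermined variables, $k^*$. A minimal sketch of this counting rule follows; the function name `classify_system` and the returned labels are hypothetical, and eigenvalues exactly on the unit circle are not treated specially.

```python
def classify_system(eigenvalues, n_predetermined):
    """Compare s = #{i : |lambda_i| < 1} with k* = n_predetermined."""
    s = sum(1 for lam in eigenvalues if abs(lam) < 1.0)
    if s == n_predetermined:
        return "determined"                        # case (i):   s = k*
    if s < n_predetermined:
        return "no stable solution (generically)"  # case (ii):  s < k*
    return "indeterminate"                         # case (iii): s > k*

# d = 2, one stable root, one predetermined variable: the saddlepoint case
print(classify_system([0.76, 1.39], n_predetermined=1))
```

Because `abs` also works on complex numbers, the same rule can be applied to systems with complex eigenvalues.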
3.8. Appendix 2: Neoclassic growth model - continuous time c
°by A. Mele
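where $\bar{n}$ is an instantaneous rate, and $\ell = \frac{t}{h}$ is the number of subperiods into which we have chopped a given time period $t$. The solution is $N_{h\ell} = (1 + \bar{n} h)^{\ell} N_0$, or
$$N_t = (1 + \bar{n} \cdot h)^{t/h} N_0.$$
By taking limits:
$$N(t) = \lim_{h \downarrow 0} (1 + \bar{n} \cdot h)^{t/h} N(0) = e^{\bar{n} t} N(0).$$
Similarly, $N(t - \Delta) = e^{\bar{n}(t - \Delta)} N(0)$, so that
$$\frac{N(t)}{N(t - \Delta)} = e^{\bar{n} \Delta} \equiv 1 + n_\Delta \iff \bar{n} = \frac{1}{\Delta} \ln(1 + n_\Delta).$$
E.g., $\Delta = 1$, $n_\Delta = n_1 \equiv n$: $\bar{n} = \ln(1 + n)$.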
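The passage to the limit can be illustrated numerically. The sketch below, with hypothetical function names and illustrative values, compares $(1 + \bar{n} h)^{t/h}$ with $e^{\bar{n} t}$ as the subperiod length $h$ shrinks, and converts a per-period growth rate into the corresponding instantaneous rate.

```python
import math

def compound(n_bar, t, h):
    """Growth factor (1 + n_bar*h)^(t/h) with subperiods of length h."""
    return (1.0 + n_bar * h) ** (t / h)

def instantaneous_rate(n_delta, delta=1.0):
    """n_bar = ln(1 + n_delta) / delta."""
    return math.log(1.0 + n_delta) / delta

# A 2% growth rate per period (illustrative) and its instantaneous counterpart
n_bar = instantaneous_rate(0.02)
for h in (1.0, 0.1, 0.001):
    print(h, compound(n_bar, t=10.0, h=h), math.exp(n_bar * 10.0))
```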
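Now let's try to do the same thing for the capital accumulation law:
$$K_{h(k+1)} = \left(1 - \bar{\delta} \cdot h\right) K_{hk} + I_{h(k+1)} \cdot h, \quad k = 0, \dots, \ell - 1,$$
$$K_{h\ell} = \left(1 - \bar{\delta} \cdot h\right)^{\ell} K_0 + \sum_{j=1}^{\ell} \left(1 - \bar{\delta} \cdot h\right)^{\ell - j} I_{hj} \cdot h, \quad \ell = \frac{t}{h},$$
or
$$K_t = \left(1 - \bar{\delta} \cdot h\right)^{t/h} K_0 + \sum_{j=1}^{t/h} \left(1 - \bar{\delta} \cdot h\right)^{t/h - j} I_{hj} \cdot h.$$
As $h \downarrow 0$ we get:
$$K(t) = e^{-\bar{\delta} t} K_0 + e^{-\bar{\delta} t} \int_0^t e^{\bar{\delta} u} I(u) \, du,$$
or in differential form:
$$\dot{K}(t) = -\bar{\delta} K(t) + I(t),$$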
and starting from the IS equation:
Y (t) = C(t) + I(t),
we obtain the capital accumulation law:
Discretization issues
An exact discretization gives:
$$K(t+1) = e^{-\bar{\delta}} K(t) + e^{-\bar{\delta}(t+1)} \int_t^{t+1} e^{\bar{\delta} u} I(u) \, du.$$
By identifying with the standard capital accumulation law in the discrete time setting:
$$K_{t+1} = (1 - \delta) K_t + I_t,$$
we get:
$$\bar{\delta} = \ln \frac{1}{1 - \delta}.$$
It follows that
$$\delta \in (0, 1) \Rightarrow \bar{\delta} > 0 \quad \text{and} \quad \delta = 0 \Rightarrow \bar{\delta} = 0.$$
Hence, while $\delta$ can only take values in $[0, 1)$, $\bar{\delta}$ can take any value in $[0, \infty)$.
An important restriction arises in the continuous time model when we note that:
$$\lim_{\delta \to 1^-} \ln \frac{1}{1 - \delta} = \infty.$$
It is impossible to think about a “maximal rate of capital depreciation” in a continuous time model because this would imply an infinite depreciation rate!
Finally, substitute $\delta$ into the exact discretization (?.?):
$$K(t+1) = (1 - \delta) K(t) + e^{-\bar{\delta}(t+1)} \int_t^{t+1} e^{\bar{\delta} u} I(u) \, du,$$
so that we have to interpret investments in $t+1$ as $e^{-\bar{\delta}(t+1)} \int_t^{t+1} e^{\bar{\delta} u} I(u) \, du$.
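As an illustration of the mapping between $\delta$ and $\bar{\delta}$, the sketch below converts a discrete-time depreciation rate into its continuous-time counterpart. It also evaluates the investment term of the exact discretization under the additional assumption, made here only for illustration, that the investment flow $I(u)$ is constant over $[t, t+1]$; in that case the integral has the closed form $I \cdot (1 - e^{-\bar{\delta}}) / \bar{\delta} = I \cdot \delta / \bar{\delta}$. Function names are hypothetical.

```python
import math

def continuous_depreciation(delta):
    """delta_bar = ln(1 / (1 - delta)), defined for delta in [0, 1)."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return math.log(1.0 / (1.0 - delta))

def effective_investment(delta, flow):
    """exp(-db*(t+1)) * int_t^{t+1} exp(db*u) * flow du for a constant flow.

    Closed form: flow * (1 - exp(-db)) / db, with db = delta_bar.
    """
    db = continuous_depreciation(delta)
    if db == 0.0:
        return flow  # no depreciation: the whole flow accumulates
    return flow * (1.0 - math.exp(-db)) / db

for delta in (0.0, 0.1, 0.5, 0.999999):
    print(delta, continuous_depreciation(delta), effective_investment(delta, 1.0))
```

The last value of $\delta$ in the loop shows $\bar{\delta}$ growing without bound as $\delta$ approaches one.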
Per capita dynamics
where all variables are expressed in per-capita terms. We suppose that there is no capital depreciation
(in the discrete time model, we supposed a total capital depreciation). More general results can be
obtained with just a change in notation.
$$\dot{c}(t) = \frac{u'(c(t))}{u''(c(t))} \left[\rho + \bar{\delta} + \bar{n} - y'(k(t))\right]. \quad (3A2.3)$$
The equilibrium is the solution of the system consisting of the constraint of (4A2.1), and (4A2.3).
As in section 3.2.3, here we analyze the equilibrium dynamics of the system in a small neighborhood of
the stationary state.18 Denote the stationary state as the solution (c, k) of the constraint of program
(4A2.1), and (4A2.3) when ċ(t) = k̇(t) = 0,
$$\begin{cases} c = y(k) - \left(\bar{\delta} + \bar{n}\right) k \\ \rho + \bar{\delta} + \bar{n} = y'(k) \end{cases}$$
Warning! These are instantaneous figures, so do not worry if they are not such that $y'(k) \geq 1 + n$! A first-order approximation of both sides of the constraint of program (4A2.1) and (4A2.3) near $(c, k)$ yields:
$$\begin{cases} \dot{c}(t) = -\frac{u'(c)}{u''(c)} y''(k) \, (k(t) - k) \\ \dot{k}(t) = \rho \cdot (k(t) - k) - (c(t) - c) \end{cases}$$
where we used the equality $\rho + \bar{\delta} + \bar{n} = y'(k)$. By setting $x(t) \equiv c(t) - c$ and $y(t) \equiv k(t) - k$ the previous system can be rewritten as:
$$\dot{z}(t) = A \cdot z(t), \quad (3A2.4)$$
where $z \equiv (x, y)^\top$, and
$$A \equiv \begin{pmatrix} 0 & -\frac{u'(c)}{u''(c)} y''(k) \\ -1 & \rho \end{pmatrix}.$$
Warning! There must be some mistake somewhere. Let us diagonalize system (4A2.4) by
setting A = P ΛP −1 , where P and Λ have the same meaning as in the previous appendix. We have:
ν̇(t) = Λ · ν(t),
18 In addition to the theoretical results that are available in the literature, the general case can also be treated numerically with
where ν ≡ P −1 z.
The eigenvalues are solutions of the following quadratic equation:
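$$0 = \lambda^2 - \rho \lambda - \frac{u'(c)}{u''(c)} y''(k). \quad (3A2.5)$$
We see that $\lambda_1 < 0 < \lambda_2$, and $\lambda_1 \equiv \frac{\rho}{2} - \frac{1}{2} \sqrt{\rho^2 + 4 \frac{u'(c)}{u''(c)} y''(k)}$. The solution for $\nu(t)$ is:
$$\nu_i(t) = \kappa_i e^{\lambda_i t}, \quad i = 1, 2,$$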
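whence
$$z(t) = P \cdot \nu(t) = v_1 \kappa_1 e^{\lambda_1 t} + v_2 \kappa_2 e^{\lambda_2 t},$$
where the $v_i$s are $2 \times 1$ vectors. We have,
$$\begin{cases} x(t) = v_{11} \kappa_1 e^{\lambda_1 t} + v_{12} \kappa_2 e^{\lambda_2 t} \\ y(t) = v_{21} \kappa_1 e^{\lambda_1 t} + v_{22} \kappa_2 e^{\lambda_2 t} \end{cases}$$
$$\kappa_2 = 0 \iff \frac{y(0)}{x(0)} = \frac{v_{21}}{v_{11}}.$$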
As in the discrete time model, the saddlepoint path is located along a line whose slope is the ratio of the components of the eigenvector associated with the negative root. We can compute this ratio explicitly. By definition, $A \cdot v_1 = \lambda_1 v_1 \iff$
$$\begin{cases} -\frac{u'(c)}{u''(c)} y''(k) \, v_{21} = \lambda_1 v_{11} \\ -v_{11} + \rho v_{21} = \lambda_1 v_{21} \end{cases}$$
i.e., $\frac{v_{21}}{v_{11}} = -\frac{\lambda_1}{\frac{u'(c)}{u''(c)} y''(k)}$ and, simultaneously, $\frac{v_{21}}{v_{11}} = \frac{1}{\rho - \lambda_1}$, which can be verified with the help of (3A2.5).
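The consistency of the two expressions for the slope can be checked numerically. In the sketch below, `theta` is a hypothetical shorthand for $\frac{u'(c)}{u''(c)} y''(k) > 0$, and the parameter values are illustrative assumptions.

```python
import math

def saddle_path(rho, theta):
    """Roots of (3A2.5) and the saddlepoint slope v21/v11, computed two ways.

    theta is a hypothetical shorthand for (u'(c)/u''(c)) * y''(k) > 0.
    """
    root = math.sqrt(rho ** 2 + 4.0 * theta)
    lam1 = 0.5 * (rho - root)            # negative (stable) root
    lam2 = 0.5 * (rho + root)            # positive (unstable) root
    slope_first_row = -lam1 / theta      # from the first row of A v1 = lam1 v1
    slope_second_row = 1.0 / (rho - lam1)  # from the second row
    return lam1, lam2, slope_first_row, slope_second_row

lam1, lam2, slope_a, slope_b = saddle_path(rho=0.03, theta=0.05)
print(lam1, lam2, slope_a, slope_b)
```

The two slopes coincide because $\lambda_1$ solves the quadratic equation (3A2.5), i.e. $-\lambda_1 (\rho - \lambda_1) = \frac{u'(c)}{u''(c)} y''(k)$.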
References
Farmer, R. (1998): The Macroeconomics of Self-Fulfilling Prophecies. Boston: MIT Press.
Kamihigashi, T. (1996): “Real Business Cycles and Sunspot Fluctuations are Observationally
Equivalent.” Journal of Monetary Economics 37, 105-117.
King, R. G. and S. T. Rebelo (1999): “Resuscitating Real Business Cycles.” In: J. B. Taylor
and M. Woodford (Editors): Handbook of Macroeconomics, Elsevier.
Lucas, R. E. Jr. (1978): “Asset Prices in an Exchange Economy.” Econometrica 46, 1429-1445.
Lucas, R. E. Jr. (1994): “Money and Macroeconomics.” In: General Equilibrium 40th Anniver-
sary Conference, CORE DP no. 9482, 184-187.
Prescott, E. (1991): “Real Business Cycle Theory: What Have We Learned?” Revista de Anal-
isis Economico 6, 3-19.
Watson, M. (1993): “Measures of Fit for Calibrated Models.” Journal of Political Economy
101, 1011-1041.
4
Continuous time models
Finally, let us assume that $\lim_{T \to \infty} E_t[\xi(T) S(T)] = 0$. Then, provided it exists, the price $S(t)$ of an infinitely lived asset satisfies,
$$\xi(t) S(t) = E_t \left[\int_t^\infty \xi(u) D(u) \, du\right]. \quad (4.3)$$
4.1. Lambdas and betas in continuous time c
°by A. Mele
where y is a vector of state variables that are suggested by economic theory. In other words, we
assume that the price-dividend ratio p is independent of the dividends D. Indeed, this “scale-
invariant” property of asset prices arises in many model economies, as we shall discuss in detail
4.2. An introduction to arbitrage and equilibrium in continuous time models c
°by A. Mele
In this case, expected returns and risk-adjusted discount rates are the same thing, as in the
simple one-factor Lucas economy of Section 2.
If, instead, the price-dividend ratio is not constant, the last term in Eq. (4.7) introduces a
wedge between expected returns and risk-adjusted discount rates. As we shall see, the risk-
adjusted discount rates play an important role in explaining returns volatility, i.e. the beta
related to the fluctuations of the price-dividend ratio. Intuitively, this is because risk-adjusted
discount rates affect prices through rational evaluation and, hence, price-dividend ratios and
price-dividend ratio volatility. To illustrate these properties, note that Eq. (4.3) can be rewritten as,
$$p(y(t)) = E_t \left[\left. \int_t^\infty \frac{D^*(\tau)}{D(t)} \cdot e^{-\int_t^\tau \mathrm{Disc}(y(u)) \, du} \, d\tau \, \right| \, y(t)\right], \quad (4.8)$$
where the expectation is taken under the risk-neutral probability, but the expected dividend growth $\frac{D^*(\tau)}{D(t)}$ is not risk-adjusted (that is, $E\left(\frac{D^*(\tau)}{D(t)}\right) = e^{g_0 (\tau - t)}$). Eq. (4.8) reveals that risk-adjusted discount rates play an important role in shaping the price function $p$ and, hence, the volatility of the price-dividend ratio $p$. These points are developed in detail in Chapter 7.
where $D(\tau)$ is the dividend process, $\pi \equiv S \theta^{(1)}$, and $\theta^{(1)}$ is the number of trees in the portfolio of the representative agent.
We assume that the dividend process, $D(\tau)$, is the solution to the following stochastic differential equation,
$$\frac{dD}{D} = \mu_D \, d\tau + \sigma_D \, dW,$$
for two positive constants $\mu_D$ and $\sigma_D$. Under rational expectations, the price function $S$ is such that $S = S(D)$. By Itô's lemma,
$$\frac{dS}{S} = \mu_S \, d\tau + \sigma_S \, dW,$$
where
$$\mu_S = \frac{\mu_D D S'(D) + \frac{1}{2} \sigma_D^2 D^2 S''(D)}{S(D)}; \quad \sigma_S = \frac{\sigma_D D S'(D)}{S(D)}.$$
Then, by Eq. (4.9), the value of wealth satisfies,
$$dV = \left[\pi \left(\mu_S + \frac{D}{S} - r\right) + rV - c\right] d\tau + \pi \sigma_S \, dW.$$
Below, we shall show that in the absence of arbitrage, there must be some process $\lambda$, the “unit risk-premium”, such that,
$$\mu_S + \frac{D}{S} - r = \lambda \sigma_S. \quad (4.10)$$
Let us assume that the short-term rate, $r$, and the risk-premium, $\lambda$, are both constant. Below, we shall show that such an assumption is compatible with a general equilibrium economy. By the definition of $\mu_S$ and $\sigma_S$, Eq. (4.10) can be written as,
$$0 = \frac{1}{2} \sigma_D^2 D^2 S''(D) + (\mu_D - \lambda \sigma_D) D S'(D) - r S(D) + D. \quad (4.11)$$
Eq. (4.11) is a second order differential equation. Its solution, provided it exists, is the rational price of the asset. To solve Eq. (4.11), we initially assume that the solution, $S_F$ say, takes the following simple form,
$$S_F(D) = K \cdot D, \quad (4.12)$$
where $K$ is a constant to be determined. Next, we verify that this is indeed one solution to Eq. (4.11). Indeed, if Eq. (4.12) holds, then plugging this guess and its derivatives into Eq. (4.11) leaves $K = (r - \mu_D + \lambda \sigma_D)^{-1}$ and, hence,
$$S_F(D) = \frac{1}{r + \lambda \sigma_D - \mu_D} D. \quad (4.13)$$
This is a Gordon-type formula. It merely states that prices are risk-adjusted expectations of
future expected dividends, where the risk-adjusted discount rate is given by r + λσD . Hence,
in a comparative statics sense, stock prices are inversely related to the risk-premium, a quite
intuitive conclusion.
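The verification step can be reproduced numerically. The following sketch, with hypothetical function names and illustrative parameter values, evaluates the right hand side of Eq. (4.11) at the linear guess $S_F(D) = K \cdot D$, for which $S_F'(D) = K$ and $S_F''(D) = 0$.

```python
def gordon_price(D, r, lam, mu_D, sigma_D):
    """S_F(D) = D / (r + lam*sigma_D - mu_D), as in Eq. (4.13)."""
    discount = r + lam * sigma_D  # risk-adjusted discount rate
    if discount <= mu_D:
        raise ValueError("need r + lam*sigma_D > mu_D for a finite price")
    return D / (discount - mu_D)

def ode_residual(D, r, lam, mu_D, sigma_D):
    """Right hand side of Eq. (4.11) evaluated at S_F(D) = K*D."""
    K = gordon_price(1.0, r, lam, mu_D, sigma_D)
    S, S1, S2 = K * D, K, 0.0  # price, first and second derivatives
    return (0.5 * sigma_D ** 2 * D ** 2 * S2
            + (mu_D - lam * sigma_D) * D * S1 - r * S + D)

params = dict(r=0.03, lam=0.3, mu_D=0.02, sigma_D=0.1)  # illustrative values
print(gordon_price(2.5, **params), ode_residual(2.5, **params))
```

Raising `lam` while holding the other parameters fixed lowers the price, in line with the comparative statics stated above.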
Eq. (4.13) can be thought of as the Feynman-Kac representation of the solution to Eq. (4.11), viz
$$S_F(D(t)) = E_t \left[\int_t^\infty e^{-r(\tau - t)} D(\tau) \, d\tau\right], \quad (4.14)$$
where $E_t[\cdot]$ is the conditional expectation taken under the risk-neutral probability $Q$ (say), the dividend process follows,
$$\frac{dD}{D} = (\mu_D - \lambda \sigma_D) \, d\tau + \sigma_D \, d\tilde{W},$$
and $\tilde{W}(\tau) = W(\tau) + \lambda (\tau - t)$ is another standard Brownian motion defined under $Q$. Formally, the true probability, $P$, and the risk-neutral probability, $Q$, are tied up by the Radon-Nikodym derivative,
$$\eta = \frac{dQ}{dP} = e^{-\lambda (W(\tau) - W(t)) - \frac{1}{2} \lambda^2 (\tau - t)}. \quad (4.15)$$
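Comparing Eq. (4.14) with Eq. (4.18) reveals that the equilibrium in the real markets, $D = c$, also implies that $S = V$. Next, rewrite (4.18) as,
$$V(t) = E_t \left[\int_t^\infty e^{-r(\tau - t)} c(\tau) \, d\tau\right] = E_t \left[\int_t^\infty m_t(\tau) c(\tau) \, d\tau\right],$$
where the first expectation is taken under $Q$, the second under $P$, and
$$m_t(\tau) \equiv \frac{\xi(\tau)}{\xi(t)} = e^{-\left(r + \frac{1}{2} \lambda^2\right)(\tau - t) - \lambda (W(\tau) - W(t))}.$$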
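We assume that a representative agent solves the following intertemporal optimization problem,
$$\max_c E_t \left[\int_t^\infty e^{-\rho (\tau - t)} u(c(\tau)) \, d\tau\right] \quad \text{s.t.} \quad V(t) = E_t \left[\int_t^\infty m_t(\tau) c(\tau) \, d\tau\right] \quad [\mathrm{P1}]$$
for some instantaneous utility function $u(c)$ and some subjective discount rate $\rho$.
To solve the program [P1], we form the Lagrangean
$$L = E_t \left[\int_t^\infty e^{-\rho (\tau - t)} u(c(\tau)) \, d\tau\right] + \ell \cdot \left[V(t) - E_t \left(\int_t^\infty m_t(\tau) c(\tau) \, d\tau\right)\right],$$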
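Next, let us define the right hand side of Eq. (A14) as $U(\tau) \equiv \ell \cdot e^{-\left(r + \frac{1}{2} \lambda^2 - \rho\right)(\tau - t) - \lambda (W(\tau) - W(t))}$.
By Itô's lemma, again,
$$\frac{dU}{U} = (\rho - r) \, d\tau - \lambda \, dW. \quad (4.21)$$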
By Eq. (A14), drift and volatility components of Eq. (4.20) and Eq. (4.21) have to be the same.
This is possible if
Let us assume that $\lambda$ is constant. After integrating the second of these relations two times, we obtain that, up to some irrelevant integration constant,
$$u(D) = \frac{D^{1 - \eta} - 1}{1 - \eta}, \quad \eta \equiv \frac{\lambda}{\sigma_D},$$
$$r = \rho + \eta \mu_D - \frac{\eta (\eta + 1)}{2} \sigma_D^2, \quad \lambda = \eta \sigma_D.$$
Finally, replacing these expressions for the short-term rate and the risk-premium into Eq. (4.13) leaves,
$$S(D) = \frac{1}{\rho - (1 - \eta) \left(\mu_D - \frac{1}{2} \eta \sigma_D^2\right)} D,$$
We are only left to check that the transversality condition (4.17) holds at the equilibrium $S = V$. We have that under the previous inequality,
$$\lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} V(\tau)\right] = \lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} S(\tau)\right] = 0. \quad (4.23)$$
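The equilibrium expressions for the short-term rate, the unit risk-premium and the price-dividend ratio can be cross-checked against the Gordon-type formula (4.13). The sketch below uses a hypothetical function name and illustrative parameter values; it simply evaluates the closed forms above.

```python
def crra_equilibrium(rho, eta, mu_D, sigma_D):
    """Short-term rate, unit risk-premium and price-dividend ratio with CRRA utility."""
    r = rho + eta * mu_D - 0.5 * eta * (eta + 1.0) * sigma_D ** 2
    lam = eta * sigma_D
    denom = rho - (1.0 - eta) * (mu_D - 0.5 * eta * sigma_D ** 2)
    if denom <= 0.0:
        raise ValueError("price-dividend ratio is not finite and positive")
    return r, lam, 1.0 / denom

rho, eta, mu_D, sigma_D = 0.04, 2.0, 0.02, 0.1  # illustrative values
r, lam, pd_ratio = crra_equilibrium(rho, eta, mu_D, sigma_D)
# The same ratio from Eq. (4.13): 1 / (r + lam*sigma_D - mu_D)
print(r, lam, pd_ratio, 1.0 / (r + lam * sigma_D - mu_D))
```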
4.2.3 Bubbles
The transversality condition in Eq. (4.17) is often referred to as a no-bubble condition. To
illustrate the reasons underlying this definition, note that Eq. (4.11) admits an infinite number
of solutions. Each of these solutions takes the following form,
Indeed, plugging Eq. (4.24) into Eq. (4.11) reveals that Eq. (4.24) holds if and only if the following conditions hold true:
$$0 = K (r + \lambda \sigma_D - \mu_D) - 1, \quad \text{and} \quad 0 = \delta (\mu_D - \lambda \sigma_D) + \frac{1}{2} \delta (\delta - 1) \sigma_D^2 - r. \quad (4.25)$$
The first condition implies that $K$ equals the price-dividend ratio in Eq. (4.13), i.e. $K = S_F(D) / D$. The second condition leads to a quadratic equation in $\delta$, with the two solutions,
It satisfies:
To rule out an explosive behavior of the price as the dividend level, D, gets small, we must set
A1 = 0, which leaves,
The component, SF (D), is the fundamental value of the asset, as by Eq. (4.14), it is the
risk-adjusted present value of the expected dividends. The second component, B (D), is simply
the difference between the market value of the asset, S (D), and the fundamental value, SF (D).
Hence, it is a bubble.
We seek conditions under which Eq. (4.26) satisfies the transversality condition in Eq. (4.17). We have,
$$\lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} S(\tau)\right] = \lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} S_F(D(\tau))\right] + \lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} B(D(\tau))\right].$$
By Eq. (4.23), the fundamental value of the asset satisfies the transversality condition, under the condition given in Eq. (4.22). As regards the bubble, we have,
$$\begin{aligned} \lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} B(D(\tau))\right] &= A_2 \cdot \lim_{\tau \to \infty} E_t \left[e^{-r(\tau - t)} D(\tau)^{\delta_2}\right] \\ &= A_2 \cdot D(t)^{\delta_2} \cdot \lim_{\tau \to \infty} E_t \left[e^{\left(\delta_2 (\mu_D - \lambda \sigma_D) + \frac{1}{2} \delta_2 (\delta_2 - 1) \sigma_D^2 - r\right)(\tau - t)}\right] \\ &= A_2 \cdot D(t)^{\delta_2}, \quad (4.27) \end{aligned}$$
where the last line holds as $\delta_2$ satisfies the second condition in Eq. (4.25). Therefore, the bubble cannot satisfy the transversality condition, except in the trivial case in which $A_2 = 0$. In other words, in this economy, the transversality condition in Eq. (4.17) holds if and only if there are no bubbles.
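The second condition in Eq. (4.25) is a quadratic equation in $\delta$, namely $\frac{1}{2} \sigma_D^2 \delta^2 + \left(\mu_D - \lambda \sigma_D - \frac{1}{2} \sigma_D^2\right) \delta - r = 0$. The following sketch solves it for illustrative parameter values. The labeling of the roots as $\delta_1 < 0 < \delta_2$ is an assumption, inferred from the remark that $A_1 = 0$ rules out an explosion as $D$ gets small; function names are hypothetical.

```python
import math

def bubble_exponents(r, lam, mu_D, sigma_D):
    """Roots of 0.5*s^2*d^2 + (mu_D - lam*s - 0.5*s^2)*d - r = 0, smallest first."""
    a = 0.5 * sigma_D ** 2
    b = mu_D - lam * sigma_D - 0.5 * sigma_D ** 2
    c = -r
    root = math.sqrt(b * b - 4.0 * a * c)  # positive whenever r > 0
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

def exponent_condition(delta, r, lam, mu_D, sigma_D):
    """Right hand side of the second condition in Eq. (4.25)."""
    return (delta * (mu_D - lam * sigma_D)
            + 0.5 * delta * (delta - 1.0) * sigma_D ** 2 - r)

params = dict(r=0.03, lam=0.3, mu_D=0.02, sigma_D=0.1)  # illustrative values
delta1, delta2 = bubble_exponents(**params)
print(delta1, delta2, exponent_condition(delta2, **params))
```

With $r > 0$, the product of the two roots equals $-2r / \sigma_D^2 < 0$, so the roots have opposite signs. The exponent appearing in Eq. (4.27) is zero at $\delta_2$, which is why the discounted bubble does not vanish.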
$$S'(D) = 0. \quad (4.28)$$
This condition is in fact a no-arbitrage condition. Indeed, after hitting the barrier D, the divi-
dend is reflected back for the part exceeding D. Since the reflection takes place with probability
one, the asset is locally riskless at the barrier D. However, the dynamics of the asset price is,
$$\frac{dS}{S} = \mu_S \, d\tau + \underbrace{\frac{\sigma_D D S'}{S}}_{\sigma_S} \, dW.$$
4.3. Martingales and arbitrage in a diffusion model c
°by A. Mele
Therefore, the local risklessness of the asset at $D$ is ensured if $S'(D) = 0$. [Warning: We need to add some local time component here.] Furthermore, rewrite Eq. (4.10) as,
$$\mu_S + \frac{D}{S} - r = \lambda \sigma_S = \lambda \frac{\sigma_D D S'(D)}{S(D)}.$$
This example illustrates how the relation in Eq. (4.10) works to preclude arbitrage opportunities.
Finally, we solve the model. We have, K ≡ SF (D)/ D, and
where the second condition is the value matching condition, which needs to be imposed to ensure continuity of the pricing function with respect to $D$ and, hence, absence of arbitrage. The previous system can be solved to yield¹
$$Q = \frac{1 - \delta_1}{-\delta_1} K D \quad \text{and} \quad A_1 = \frac{K}{-\delta_1} D^{1 - \delta_1}.$$
Note, the price is an increasing and convex function of the fundamentals, D.
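The solution for $Q$ and $A_1$ can be verified numerically. The sketch below assumes that the price takes the form $S(D) = K D + A_1 D^{\delta_1}$ with $\delta_1 < 0$, which is an inference from the surrounding text rather than a formula reproduced from it. The name `D_bar` is a hypothetical label for the reflecting barrier (written $D$ in the text), and the parameter values are illustrative.

```python
def barrier_solution(K, delta1, D_bar):
    """A1 and Q implied by S'(D_bar) = 0 and S(D_bar) = Q."""
    if delta1 >= 0.0:
        raise ValueError("delta1 is assumed to be the negative exponent")
    A1 = K / (-delta1) * D_bar ** (1.0 - delta1)
    Q = (1.0 - delta1) / (-delta1) * K * D_bar
    return A1, Q

def price(D, K, delta1, A1):
    """Assumed price function S(D) = K*D + A1*D^delta1."""
    return K * D + A1 * D ** delta1

def price_slope(D, K, delta1, A1):
    """S'(D) for the assumed price function."""
    return K + A1 * delta1 * D ** (delta1 - 1.0)

K, delta1, D_bar = 25.0, -1.37, 1.0  # illustrative values
A1, Q = barrier_solution(K, delta1, D_bar)
print(A1, Q, price(D_bar, K, delta1, A1), price_slope(D_bar, K, delta1, A1))
```

At the barrier the slope is zero and the price equals $Q$; above the barrier the slope is positive, consistent with an increasing and convex price function.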
1 In this model, we take the barrier $D$ as given. In other contexts, we might be interested in “controlling” the dividend $D$ in such a way that as soon as the price, $q$, hits a level $Q$, the dividend level $D$ is activated to induce the price $q$ to increase. The solution for $Q$ reveals that this situation is possible when $D = \frac{-\delta_1}{1 - \delta_1} K^{-1} Q$, where $Q$ is an exogenously given constant.
$(S_0(\tau), \dots, S_m(\tau))^\top\}_{\tau \in [t,T]}$ be the positive $F(\tau)$-adapted asset price process. The accumulation factor does not distribute dividends. Its price satisfies:
$$S_0(\tau) = \exp\left(\int_t^\tau r(u) \, du\right),$$
where $r(\tau)$ is an $F(\tau)$-adapted process satisfying $E\left(\int_t^T r(u) \, du\right) < \infty$. We assume the dynamics of the last components of $S_+$, i.e. $S \equiv (S_1, \dots, S_m)^\top$, satisfy:
where âi (τ ) and σ i (τ ) are processes satisfying the same properties as r, with σ i (τ ) ∈ Rd . We
assume that rank(σ(τ ; ω)) = m ≤ d a.s., where σ(τ ) ≡ (σ 1 (τ ), · · ·, σ m (τ ))> .
We assume that Di is solution to
4.3.2 Viability
Let $\bar{g}_i = \frac{S_i}{S_0} + \bar{z}_i$, $i = 1, \dots, m$, where $d\bar{z}_i = \frac{1}{S_0} dz_i$ and $z_i(\tau) = \int_t^\tau D_i(u) \, du$. Let us generalize the definition of the risk-neutral probability in Eq. (4.15), and introduce the set $\mathcal{Q}$ of risk-neutral, or equivalent martingale, probabilities, defined as:
$$\mathcal{Q} \equiv \{Q \approx P : \bar{g}_i \text{ is a } Q\text{-martingale}\}.$$
The aim of this section is to show the equivalent of Theorem 2.8 in Chapter 2: $\mathcal{Q}$ is not empty if and only if there are no arbitrage opportunities.
Associated to every $F(t)$-adapted process $\{\lambda(t)\}_{t \in [0,T]}$ satisfying some basic regularity conditions (essentially, the Novikov's condition),
$$W_0(\tau) = W(\tau) + \int_t^\tau \lambda(u) \, du, \quad \tau \in [t, T], \quad (4.32)$$
The process $\{\eta(\tau)\}_{\tau \in [t,T]}$ is a martingale under $P$. This result is the celebrated Girsanov's theorem. Now let us rewrite Eq. (4.29) under such a new probability by plugging $W_0$ in it. Under $Q$,
We also have
$$d\bar{g}_i(\tau) = d\left(\frac{S_i}{S_0}\right)(\tau) + d\bar{z}_i(\tau) = \frac{S_i(\tau)}{S_0(\tau)} \left[(a_i(\tau) - r(\tau)) \, d\tau + \sigma_i(\tau) \, dW(\tau)\right].$$
Therefore, by Eqs. (4.30), (4.32) and (4.35), we have that, for $\tau \in [t, T]$,
$$\frac{V^{x,\pi,c}(\tau)}{S_0(\tau)} = x - \int_t^\tau \frac{c(u)}{S_0(u)} \, du + \int_t^\tau \frac{\pi^\top(u) \sigma(u)}{S_0(u)} \, dW_0(u). \quad (4.36)$$
We have:
Theorem 4.2. There are no arbitrage opportunities if and only if Q is not empty.
A proof of this theorem is in the Appendix. The if part follows easily, by Eq. (4.36). The only if part is more elaborate, but its basic structure can be understood as follows. By Girsanov's theorem, the statement “absence of arbitrage opportunities ⇒ $\exists Q \in \mathcal{Q}$” is equivalent to “absence of arbitrage opportunities ⇒ $\exists \lambda$ satisfying Eq. (4.35).” If Eq. (4.35) didn't hold, one could implement an arbitrage, and find a nonzero $\pi$ : $\pi^\top \sigma = 0$ and $\pi^\top (a - 1_m r) \neq 0$. One could then use $\pi$ when $\pi^\top (a - 1_m r) > 0$ and $-\pi$ when $\pi^\top (a - 1_m r) < 0$, and obtain an appreciation rate of $V$ greater than $r$ in spite of having zeroed uncertainty through $\pi^\top \sigma = 0$. If Eq. (4.35) holds, such an arbitrage opportunity would never occur, as in this case for each $\pi$, $\pi^\top (a - 1_m r) = \pi^\top \sigma \lambda$.
Let
$$\langle \sigma^\top \rangle^\perp \equiv \left\{x \in L^2_{t,T,m} : \sigma^\top x = 0_d\right\}$$
and
$$\langle \sigma \rangle \equiv \left\{z \in L^2_{t,T,m} : z = \sigma u, \text{ for } u \in L^2_{t,T,d}\right\}.$$
Then, we may formalize the previous reasoning as follows. The excess return vector, $a - 1_m r$, must be orthogonal to all vectors in $\langle \sigma^\top \rangle^\perp$, and since $\langle \sigma \rangle$ and $\langle \sigma^\top \rangle^\perp$ are orthogonal, $a - 1_m r \in \langle \sigma \rangle$, or $\exists \lambda \in L^2_{t,T,d} : a - 1_m r = \sigma \lambda$.²
Definition 4.3 (Market completeness). Markets are dynamically complete if for each ran-
dom variable Y ∈ L2 (Ω, F, P ), we can find a portfolio process π : V x,π,0 (T ) = Y a.s.
The previous definition is the natural continuous-time counterpart to that we gave in the
discrete-time case (see Chapter 2). In analogy with the conclusions in Chapter 2, we shall prove
that in continuous-time, markets are dynamically complete if and only if (i) m = d and (ii) the
price volatility matrix of the available assets (primitives and derivatives) is nonsingular. We shall
provide a sketch of the proof for the sufficiency part of this statement (see, e.g., Karatzas (1997
pp. 8-9) for the converse), which relates to the existence of fully spanning dynamic strategies.
So given a Y ∈ L2 (Ω, F, P ), let m = d and suppose the volatility matrix σ is nonsingular. Let
us consider the Q-martingale:
$$M(\tau) \equiv E^Q\left(\left. S_0(T)^{-1} \cdot Y \, \right| \, F(\tau)\right). \quad (4.37)$$
We wish to find a portfolio process $\pi$ such that the discounted wealth process, net of consumption, $S_0^{-1}(\tau) V^{x,\pi,0}(\tau)$ equals $M(\tau)$ under $P$ (or, equivalently, under $Q$) a.s. By Eq. (4.36),
$$\frac{V^{x,\pi,0}(\tau)}{S_0(\tau)} = x + \int_t^\tau \frac{\pi^\top(u) \sigma(u)}{S_0(u)} \, dW_0(u),$$
and so, by identifying, the portfolio we are looking for is $\hat{\pi}^\top = S_0 \varphi^\top \sigma^{-1}$. Set, then, $x = M(t)$. Then, $M(\tau) = S_0^{-1}(\tau) V^{M(t),\hat{\pi},0}(\tau)$, and in particular, $M(T) = S_0^{-1}(T) V^{M(t),\hat{\pi},0}(T)$ a.s. By comparing with Eq. (4.37), $V^{M(t),\hat{\pi},0}(T) = Y$.
Armed with this result, we can now easily state:
Under the usual regularity conditions, $\hat{\lambda}$ can be interpreted as the process of unit risk-premia. In fact, all processes belonging to the set:
$$Z = \left\{\lambda : \lambda(t) = \hat{\lambda}(t) + \eta(t), \ \eta \in \langle \sigma \rangle^\perp\right\}$$
are bounded and, hence, can be interpreted as unit risk-premia processes. More precisely, define the Radon-Nikodym derivative of $Q$ with respect to $P$ on $F(T)$:
$$\hat{\eta}(T) \equiv \frac{dQ}{dP} = \exp\left(-\frac{1}{2} \int_0^T \left\|\hat{\lambda}(t)\right\|^2 dt - \int_0^T \hat{\lambda}^\top(t) \, dW(t)\right),$$
a strictly positive $P$-martingale. We have the following result, which follows for example from He and Pearson (1991, Proposition 1 p. 271) or Shreve (1991, Lemma 3.4 p. 429):
Proposition 4.5. $Q \in \mathcal{Q}$ if and only if it is of the form: $Q(A) = E(1_A \eta(T))$, $\forall A \in F(T)$.
To summarize, we have that $\dim(\langle \sigma \rangle^\perp) = d - m$. The previous result shows quite clearly that market incompleteness implies the existence of an infinity of risk-neutral probabilities. Such a result was shown in great generality by Harrison and Pliska (1983).³
3 The so-called Föllmer and Schweizer (1991) measure, or minimal equivalent martingale measure, is defined as: P̂ ∗ (A) ≡
The first approach to solve this problem was introduced by Merton, which we shall see later.
We wish to present another approach, which makes use of Arrow-Debreu state prices, similarly
as in Chapter 2. Our first task is to derive a budget constraint paralleling the budget constraint in Chapter 2:
$$0 = c_0 - w_0 + E\left[m \cdot \left(c_1 - w_1\right)\right], \quad (4.38)$$
where $c_\cdot$ and $w_\cdot$ are consumption and endowments, and $m$ is the discount factor. In Chapter 2, such a budget constraint arises after having multiplied the initial budget constraint by the Arrow-Debreu state prices,
$$\phi_s = m_s \cdot P_s, \quad m_s \equiv (1 + r)^{-1} \eta_s, \quad \eta_s = \frac{Q_s}{P_s},$$
and after “having taken the sum over all the states of nature”. We wish to apply the same logic here. First, we define Arrow-Debreu state price densities:
$$\phi_{t,T} \equiv m_{t,T} \cdot dP, \quad m_{t,T} = S_0(T)^{-1} \eta(T), \quad \eta(T) = \frac{dQ}{dP}. \quad (4.39)$$
As in the finite state space of Chapter 2, we multiply the budget constraint in Eq. (4.31) by these Arrow-Debreu densities, and then, we “take the integral over all states of nature.” The original problem, one with an infinity of trajectory constraints, will then be reduced to one with only one constraint, just as for the budget constraint in Eq. (4.38). Accordingly, multiply both sides in Eq. (4.31) by $\phi_{0,T} = S_0(T)^{-1} \cdot dQ$, and rearrange terms, to obtain:
$$0 = \left[\frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)} \, du - x\right] dQ - \left[\int_t^T \frac{\left(\pi^\top (a - 1_m r)\right)(u) \, du + (\pi^\top \sigma)(u) \, dW(u)}{S_0(u)}\right] dQ.$$
Next, take the integral over all states of nature. By Girsanov's theorem,
$$0 = E^Q\left[\frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)} \, du - x\right].$$
We can retrieve the budget constraint under the probability $P$. We have, by a change of measure and computations in the Appendix, that:
$$x = E^Q\left[\frac{V^{x,\pi,c}(T)}{S_0(T)} + \int_t^T \frac{c(u)}{S_0(u)} \, du\right] = E\left[m_{t,T} \cdot V^{x,\pi,c}(T) + \int_t^T m_{t,u} \cdot c(u) \, du\right]. \quad (4.40)$$
4 Moreover, we assume that the agent only considers the choice space in which the control functions satisfy the elementary
Because of its emphasis on the equivalent martingale measure, this approach to solve the original
problem is known as relying on martingale methods. Critically, market completeness is needed
to use these methods, as in this case, there is one and only one Arrow-Debreu density process.
However, the same martingale methods can be applied in the presence of portfolio constraints
(which include incomplete markets as a special case) too, although in a slightly modified manner,
as we shall see in Section 4.5.
To solve the problem, consider the Lagrangean,
$$\max_{(c,v)} E\left[\int_t^T \left[u(\tau, c(\tau)) - \psi \cdot m_{t,\tau} \cdot c(\tau)\right] d\tau + U(v) - \psi \cdot m_{t,T} \cdot v + \psi \cdot x\right],$$
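To compute the portfolio-consumption policy, note that for $c(\tau) \equiv 0$, the proof is just that leading to Theorem 4.4. In the general case, define,
$$M(\tau) \equiv E^Q\left[\left. S_0(T)^{-1} \cdot \hat{v} + \int_t^T S_0(u)^{-1} \hat{c}(u) \, du \, \right| \, F(\tau)\right].$$
Notice that:
$$M(\tau) = E^Q\left[\left. S_0^{-1}(T) \cdot \hat{v} + \int_t^T S_0(u)^{-1} \hat{c}(u) \, du \, \right| \, F(\tau)\right] = E\left[\left. m_{t,T} \cdot \hat{v} + \int_t^T m_{t,u} \cdot \hat{c}(u) \, du \, \right| \, F(\tau)\right].$$
By identifying,
$$\pi^\top(\tau) = \left[V^{x,\pi,c}(\tau) \lambda(\tau) + \frac{\phi^\top(\tau)}{m_{t,\tau}}\right] \sigma^{-1}(\tau), \quad (4.43)$$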
4.4. Equilibrium with a representative agent c
°by A. Mele
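which shows that $\phi = 0$ in the representation of Eq. (4.43). So by replacing $\phi = 0$ into (4.43),
$$\pi^\top(\tau) = V^{x,\pi,\hat{c}}(\tau) \lambda(\tau) \sigma^{-1}(\tau).$$
$$J(x) = A \frac{x^{1-\eta} - B}{1 - \eta},$$
where $A$, $B$ are constants to be determined. Using the first condition in (4.45) leaves $c = A^{-1/\eta} V$. By plugging this expression into Eq. (4.46), and using the conjectured analytical form of $J$, we obtain:
$$0 = A V^{1-\eta} \left(\frac{\eta}{1 - \eta} A^{-1/\eta} + \frac{1}{2} \frac{\mathrm{Sh}}{\eta} + r - \frac{\rho}{1 - \eta}\right) - \frac{1}{1 - \eta} (1 - \rho A B).$$
This equation must hold for every $V$. Therefore
$$A = \left(\frac{\rho - r(1 - \eta)}{\eta} - \frac{(1 - \eta) \mathrm{Sh}}{2 \eta^2}\right)^{-\eta}, \quad B = \frac{1}{\rho} \left(\frac{\rho - r(1 - \eta)}{\eta} - \frac{(1 - \eta) \mathrm{Sh}}{2 \eta^2}\right)^{\eta}$$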
4.4.3 Equilibrium
In a complete markets setting, an equilibrium is (i) a consumption plan satisfying the first order
conditions (4.42); (ii) a portfolio process having the form in Eq. (4.43), and (iii) the following
market clearing conditions:
$$c(\tau) = D(\tau) \equiv \sum_{i=1}^m D_i(\tau), \ \text{for } \tau \in [t, T), \quad q(T) \equiv \sum_{i=1}^m S_i(T) \quad (4.47)$$
$$\theta_0(\tau) = 0, \quad \pi(\tau) = S(\tau), \ \text{for } \tau \in [t, T]. \quad (4.48)$$
We now derive equilibrium allocations and Arrow-Debreu state price densities. First, note
that the dividend process, D, satisfies:
where the first equality holds in an equilibrium, the second equality follows by the first order
conditions in (4.42), and the third equality is true by the definition of mt,τ in Eq. (4.41).
Finally, by Itô's lemma, $\log u_c(\tau, D(\tau))$ is the solution to:
$$d \log u_c = \left[\frac{u_{\tau c}}{u_c} + a_D D \frac{u_{cc}}{u_c} + \frac{1}{2} \sigma_D^2 D^2 \left(\frac{u_{ccc}}{u_c} - \left(\frac{u_{cc}}{u_c}\right)^2\right)\right] dt + \frac{u_{cc}}{u_c} D \sigma_D \, dW. \quad (4.50)$$
By identifying drifts and diffusion terms in Eqs. (4.49)-(4.50), we obtain, after a few simplifications, the expression for the equilibrium short term rate and the prices of risk:
$$r(\tau) = -\left[\frac{u_{\tau c}(\tau, D(\tau))}{u_c(\tau, D(\tau))} + a_D(\tau) D(\tau) \frac{u_{cc}(\tau, D(\tau))}{u_c(\tau, D(\tau))} + \frac{1}{2} \sigma_D(\tau)^2 D(\tau)^2 \frac{u_{ccc}(\tau, D(\tau))}{u_c(\tau, D(\tau))}\right]$$
$$\lambda^\top(\tau) = -\frac{u_{cc}(\tau, D(\tau))}{u_c(\tau, D(\tau))} \sigma_D(\tau) D(\tau).$$
For example, consider the CRRA utility function, $u(\tau, c) = e^{-(\tau - t) \rho} \left(c^{1-\eta} - 1\right) / (1 - \eta)$, and $m = 1$. Then,
$$r(\tau) = \rho + \eta \cdot a_D(\tau) - \frac{1}{2} \eta (\eta + 1) \sigma_D(\tau)^2, \quad \lambda(\tau) = \eta \sigma_D(\tau).$$
Appendix 2 performs Walras’s consistency tests: Eq. (4.47) ⇐⇒ Eq. (4.48).
where the second line follows by the same arguments leading to Eq. (4.40). Replacing the first order condition in (4.42), and the equilibrium conditions in Eq. (4.47), we obtain the consumption CAPM evaluation of each asset:
$$S_i(\tau) = E\left[\left. \frac{u'(q(T))}{u'(D(\tau))} S_i(T) + \int_\tau^T \frac{u'(D(s))}{u'(D(\tau))} D_i(s) \, ds \, \right| \, F(\tau)\right], \quad i = 0, 1, \dots, m.$$
As an example, consider a pure discount bond, with price $b$. We have that its dividend is zero and that $b(T) = 1$. Therefore,
$$b(\tau) = E\left[\left. \frac{u'(q(T))}{u'(D(\tau))} \, \right| \, F(\tau)\right] = E\left[\left. \frac{m_{t,T}}{m_{t,\tau}} \, \right| \, F(\tau)\right],$$
and optimal consumption and portfolio choices for this unconstrained problem are exactly those
chosen by the investor constrained to have p ∈ K. Appendix 4 provides an informal sketch of
the arguments leading to Eq. (4.54).
4.6. Jumps c
°by A. Mele
Examples of the support function $\zeta$ in Eq. (4.51) are the unconstrained case: $K = \mathbb{R}^d$, in which case $\tilde{K} = \{0\}$ and $\zeta = 0$ on $\tilde{K}$; prohibition of short-selling: $K = [0, \infty)^d$, in which case $\tilde{K} = K$ and $\zeta = 0$ on $\tilde{K}$; or incomplete markets: $K = \{p \in \mathbb{R}^d : p_{M+1} = \cdots = p_D = 0\}$ (i.e. only the first $M$ assets can be traded), in which case $\tilde{K} = \{\nu \in \mathbb{R}^d : \nu_1 = \cdots = \nu_M = 0\}$ and $\zeta = 0$ on $\tilde{K}$.
In the context of log-utility functions, we have that,
$$\hat{\nu} = \arg\min_{\nu \in \tilde{K}} \left(2 \zeta(\nu) + \left\|\lambda + \sigma^{-1} \nu\right\|^2\right),$$
where $\lambda = \sigma^{-1} (a - 1_d r)$. Applications of this will be worked out in Part II on “Asset pricing and reality.”
4.6 Jumps
Brownian motions are well suited to model the price behavior of liquid assets or assets issued by
names or Governments not subject to default risk. There is, however, a fair amount of interest in
modeling discontinuous changes in asset prices. Fixed income instruments may undergo liquidity
dry-ups, or even default, causing price discontinuities that we wish to model. This section is
an introduction to Poisson models, a class of processes that is particularly useful in addressing
these issues.
(i) The random numbers of event arrivals on any disjoint time intervals of $(t, T)$ are independent.
(ii) Given two arbitrary disjoint but equal time intervals in $(t, T)$, the probability of a given random number of event arrivals is the same in each interval.
(iii) The probability that at least two events occur simultaneously in any time interval is zero.
Next, let $P_k(\tau - t)$ be the probability that $k$ events arrive during the time interval $\tau - t$. We make use of the previous three properties to determine the functional form of $P_k(\tau - t)$. First, $P_0(\tau - t)$ must satisfy:
$$P_0(\tau + d\tau - t) = P_0(\tau - t) P_0(d\tau), \quad (4.55)$$
and we impose
$$P_0(0) = 1, \quad P_k(0) = 0 \ \text{for } k \geq 1. \quad (4.56)$$
Eq. (4.55) and the first condition in (4.56) are satisfied by $P_0(\tau) = e^{-v\tau}$, for some constant $v$, which we take to be positive, so as to ensure that $P_0 \in [0, 1]$. Furthermore, we have that:
$$\begin{cases} P_1(\tau + d\tau - t) = P_0(\tau - t) P_1(d\tau) + P_1(\tau - t) P_0(d\tau) \\ \quad \vdots \\ P_k(\tau + d\tau - t) = P_{k-1}(\tau - t) P_1(d\tau) + P_k(\tau - t) P_0(d\tau) \\ \quad \vdots \end{cases} \quad (4.57)$$
4.6.2 Interpretation
A Poisson model is one of rare events. Moreover, by:
E (event arrival in dτ ) = P1 (dτ ) = vdτ .
For this reason, we usually refer to the parameter v as the intensity of event arrivals.
To provide additional intuition about the mathematics of rare events, consider the expression
for the probability of k “arrivals” in n trials, predicted by a binomial distribution:
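\[ P_{n,k} = \binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}, \quad p, q > 0,\ p + q = 1, \]
where $p$ is the probability of arrival for each trial. We want to model the probability $p$ as a function of $n$, with the feature that $\lim_{n\to\infty} p(n) = 0$, so as to make each arrival “rare.” One possible choice is $p(n) = \frac{a}{n}$, for some constant $a > 0$. Under this assumption, we have:
\[
\begin{aligned}
P_{n,k} &= \frac{n!}{k!\,(n-k)!}\, p(n)^k \left(1 - p(n)\right)^{n-k} \\
&= \frac{n!}{k!\,(n-k)!} \left(\frac{a}{n}\right)^k \left(1 - \frac{a}{n}\right)^{n-k} \\
&= \frac{n!}{k!\,(n-k)!} \left(\frac{a}{n}\right)^k \left(1 - \frac{a}{n}\right)^{n} \left(1 - \frac{a}{n}\right)^{-k} \\
&= \frac{n!}{n^k\,(n-k)!}\, \frac{a^k}{k!} \left(1 - \frac{a}{n}\right)^{n} \left(1 - \frac{a}{n}\right)^{-k} \\
&= \underbrace{\frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-k+1}{n}}_{k \text{ times}}\, \frac{a^k}{k!} \left(1 - \frac{a}{n}\right)^{n} \left(1 - \frac{a}{n}\right)^{-k},
\end{aligned}
\]
leaving,
\[ \lim_{n\to\infty} P_{n,k} \equiv P_k = \frac{a^k}{k!}\, e^{-a}. \]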
Next, we split the interval $(\tau - t)$ into $n$ subintervals of length $\frac{\tau - t}{n}$, and then make the probability of one arrival in each sub-interval proportional to each sub-interval length, as illustrated in Figure 4.1,
\[ p(n) = v\,\frac{\tau - t}{n} \equiv \frac{a}{n}, \quad a \equiv v(\tau - t). \]
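The convergence of the binomial probabilities to the Poisson limit can be checked numerically. The sketch below is only an illustration: the function names and the value $a = 2$ are assumptions made here, not part of the text. It compares $P_{n,k}$ with $p(n) = a/n$ to $\frac{a^k}{k!}e^{-a}$ and reports the largest gap over the first few values of $k$.

```python
import math

def binomial_pmf(n, k, p):
    # Probability of k arrivals in n trials, each with arrival probability p.
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, a):
    # Limiting probability a^k e^{-a} / k!.
    return a**k * math.exp(-a) / math.factorial(k)

def max_abs_gap(n, a, k_max=10):
    # Largest absolute difference between the two laws for k = 0, ..., k_max.
    return max(
        abs(binomial_pmf(n, k, a / n) - poisson_pmf(k, a))
        for k in range(k_max + 1)
    )

for n in (10, 100, 10000):
    print(n, max_abs_gap(n, a=2.0))
```

The gap shrinks as $n$ grows, which is the content of the limit derived above.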
[Figure 4.1: the interval from $t$ to $\tau$, split into $n$ subintervals, each of length $n^{-1}(\tau - t)$.]
The Poisson model in the previous section is thus the limit, as $n \to \infty$, of the model we consider here. The limit is a continuous-time model, as each sub-interval in Figure 4.1 shrinks to $d\tau$. The probability that there is one arrival in $d\tau$ is $v\,d\tau$, which is also the expected number of events in $d\tau$, as shown below:
E (# arrivals in dτ )
= Pr (one arrival in dτ ) × one arrival + Pr (zero arrivals in dτ ) × zero arrivals
= Pr (one arrival in dτ ) × 1 + Pr (zero arrivals in dτ ) × 0
= vdτ .
The heuristic construction in this section suggests a way to simulate Poisson processes. We can just simulate a Uniform random variable $U(0, 1)$, with the continuous-time process being approximated by $Y$, where:
\[
Y = \begin{cases} 0 & \text{if } 0 \le U < 1 - vh \\ 1 & \text{if } 1 - vh \le U < 1 \end{cases}
\]
where $h$ is a discretization interval.
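The discretization above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the horizon, and the parameter values are hypothetical choices made here. Each step of length $h$ draws one uniform and records an arrival when $U \ge 1 - vh$; the average number of arrivals over a horizon should then be close to $v$ times the horizon.

```python
import random

def simulate_arrivals(v, horizon, h, seed=0):
    # Count arrivals on [0, horizon] with one Bernoulli draw per step of length h.
    rng = random.Random(seed)
    n_steps = int(round(horizon / h))
    count = 0
    for _ in range(n_steps):
        u = rng.random()                     # U ~ Uniform[0, 1)
        y = 1 if u >= 1.0 - v * h else 0     # Y = 1 with probability v * h
        count += y
    return count

def mean_count(v, horizon, h, n_paths=2000):
    # Average number of arrivals across independent simulated paths.
    total = sum(simulate_arrivals(v, horizon, h, seed=s) for s in range(n_paths))
    return total / n_paths

print(mean_count(v=2.0, horizon=1.0, h=0.01))  # should be near v * horizon = 2
```

The approximation requires $vh$ to be small, so that the chance of two arrivals within one step is negligible.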
A related distribution is the exponential (or Erlang) distribution. Remember, the probability
of zero arrivals in $\tau - t$ predicted by the Poisson model is $P_0(\tau - t) = e^{-v(\tau - t)}$, from which it follows that:
\[ G(\tau - t) \equiv 1 - P_0(\tau - t) = 1 - e^{-v(\tau - t)} \]
is the probability of at least one arrival in $\tau - t$. The function $G$ can also be interpreted as the probability that the first arrival occurs before $\tau$, starting from $t$. The density function of $G$ is:
\[ g(\tau - t) = \frac{\partial}{\partial\tau} G(\tau - t) = v e^{-v(\tau - t)}. \]
The first two moments of the exponential distribution are:
\[ \text{Mean} = \int_0^\infty x\, v e^{-vx}\, dx = v^{-1}, \qquad \text{Variance} = \int_0^\infty \left(x - v^{-1}\right)^2 v e^{-vx}\, dx = v^{-2}. \]
The expected time to the first arrival, starting from $t$, equals $v^{-1}$. More generally, $v^{-1}$ can be interpreted as the average time from one arrival to the next.5
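As an illustration, waiting times can be drawn by inverting $G$: if $U$ is uniform on $[0,1)$, then $-\log(1 - U)/v$ has distribution function $G$. The sketch below (function name and parameter values are assumptions made for this example) checks that the sample mean and variance are close to $v^{-1}$ and $v^{-2}$.

```python
import math
import random
import statistics

def draw_waiting_times(v, n, seed=0):
    # Inverse-transform sampling from G(x) = 1 - exp(-v x).
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / v for _ in range(n)]

times = draw_waiting_times(v=4.0, n=20000)
print(statistics.mean(times))      # should be near 1 / v = 0.25
print(statistics.variance(times))  # should be near 1 / v**2 = 0.0625
```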
A more general distribution than the exponential is the Gamma distribution with density:
\[ g_\gamma(\tau - t) = v e^{-v(\tau - t)}\, \frac{\left[v(\tau - t)\right]^{\gamma - 1}}{(\gamma - 1)!}. \]
The exponential distribution obtains when $\gamma = 1$.
5 Suppose arrivals are generated by Poisson processes, and consider the random variable “time interval elapsing from one arrival
to next one.” Let τ 0 be the instant at which the last arrival occurred. Then, the probability the time τ − τ 0 which will elapse from
the last arrival to the next is less than ∆ is the same as the probability that during the time interval τ − τ 0 , there is at least one
arrival.
6 For simplicity, we take $v$ to be constant. If $v$ is a deterministic function of time, we have that
\[ \Pr\left(Z(\tau) - Z(t) = k\right) = \frac{\left(\int_t^\tau v(u)\,du\right)^k}{k!} \exp\left(-\int_t^\tau v(u)\,du\right), \quad k = 0, 1, \cdots \]
and there is also the possibility to model $v$ as a function of the state, $v = v(q)$ for example, which leads to Cox processes.
4.7. Continuous-time Markov chains
The first two terms are the usual Itô’s lemma terms, with $\left(\frac{\partial}{\partial\tau} + L\right)\cdot$ denoting the infinitesimal
generator for diffusions. The third term accounts for jumps. If there are no jumps from time τ −
to time τ (where dτ = τ − τ − ), then dZ(τ ) = 0. If there is a jump then dZ(τ ) = 1, and in this
case f, as a “rational” function, needs also instantaneously jump to f (S(τ ) + (S(τ )) · S, τ ).
The jump will be exactly f (S(τ ) + (S(τ )) · S, τ ) − f (S(τ ), τ ), where S is another random
variable with a fixed probability measure. Clearly, if f (S, τ ) = S, we are back to the initial
jump-diffusion model in Eq. (4.58).
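To derive the infinitesimal generator for jump-diffusions, $L^J f$ say, write $\mathcal{S}$ for the random jump size and note that:
\[
\begin{aligned}
E(df) &= \left(\frac{\partial}{\partial\tau} + L\right) f\, d\tau + E\left[\left(f(S + \mathcal{S}, \tau) - f(S, \tau)\right)\cdot dZ(\tau)\right] \\
&= \left(\frac{\partial}{\partial\tau} + L\right) f\, d\tau + E\left[\left(f(S + \mathcal{S}, \tau) - f(S, \tau)\right)\cdot v\cdot d\tau\right],
\end{aligned}
\]
or
\[ L^J f = Lf + v\cdot\int_{\mathrm{supp}(\mathcal{S})} \left[f(S + \mathcal{S}, \tau) - f(S, \tau)\right] p(d\mathcal{S}), \]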
4.8. Appendix 1: Convergence issues
Now let $\Delta \downarrow 0$ and assume that $\theta^{(1)}$ and $\theta^{(2)}$ are approximately constant between $t$ and $t - \Delta$. We have:
\[ dV(\tau) = \left(dS(\tau) + D(\tau)\,d\tau\right)\theta^{(1)}(\tau) + db(\tau)\,\theta^{(2)}(\tau) - c(\tau)\,d\tau. \]
Assume that
\[ \frac{db(\tau)}{b(\tau)} = r\,d\tau. \]
The budget constraint can then be written as:
4.9. Appendix 2: Proofs of selected results
where we used the fact that c is adapted, the law of iterated expectations, the martingale property of
η, and the definition of m0,t .
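That is,
\[
\begin{aligned}
&\frac{\theta_0(\tau)S_0(\tau) + \left(\pi^\top(\tau) - S^\top(\tau)\right)1_m}{S_0(\tau)} + \frac{S^\top(\tau)1_m}{S_0(\tau)} \\
&\quad = 1_m^\top S(t) + \int_t^\tau \frac{\left(\pi^\top(u) - S^\top(u)\right)\sigma(u)}{S_0(u)}\,dW_0(u) - \int_t^\tau \frac{c(u)}{S_0(u)}\,du + \int_t^\tau \frac{S^\top(u)\sigma(u)}{S_0(u)}\,dW_0(u).
\end{aligned}
\]
Plugging the solution $\frac{S_i}{S_0}(\tau) = S_i(t) + \int_t^\tau \left(S_0^{-1}S_i\right)(u)\,\sigma_i(u)\,dW_0(u) - \int_t^\tau \left(S_0^{-1}D_i\right)(u)\,du$ in the previous relation,
\[ \frac{\theta_0(T)S_0(T) + \left(\pi^\top(T) - S^\top(T)\right)1_m}{S_0(T)} = \int_t^T \frac{\pi^\top(u) - S^\top(u)}{S_0(u)}\,\sigma(u)\,dW_0(u) + \int_t^T \frac{D(u) - c(u)}{S_0(u)}\,du. \tag{4A.1} \]
When Eq. (4.47) holds, we have that $V^{x,\pi,c}(T) = \theta_0(T)S_0(T) + \pi^\top(T)1_m = q(T) = S^\top(T)1_m$, and $D = c$, and Eq. (4A.1) becomes:
\[ 0 = x(T) \equiv \int_t^T \frac{\pi^\top(u) - S^\top(u)}{S_0(u)}\,\sigma(u)\,dW_0(u), \]
\[ dx(\tau) = \frac{\pi^\top(\tau) - S^\top(\tau)}{S_0(\tau)}\,\sigma(\tau)\,dW_0(\tau) = 0. \]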
Since ker(σ) = {∅} then, we have that π(τ ) = S(τ ) a.s. for τ ∈ [t, T ] and, hence, π(τ ) = S(τ ) a.s. for
τ ∈ [t, T ]. It is easily checked that this implies θ0 (T ) = 0 P -a.s. and that in fact, θ0 (τ ) = 0 a.s.
Next, we show that Eq. (4.48) ⇒ Eq. (4.47). When Eq. (4.48) holds, Eq. (4A.1) becomes:
\[ 0 = y(T) \equiv \int_t^T \frac{D(u) - c(u)}{S_0(u)}\,du, \]
a martingale starting at zero. We conclude by the same arguments used in the proof of the previous part. ‖
4.10. Appendix 3: The Green’s function
Let
\[ a(t', t'') = \frac{S_0(t')}{S_0(t'')}. \]
In terms of $a$, Eq. (4A.3) is:
\[ V(x(\tau), \tau) = E\left[a(\tau, T)\,V(x(T), T) + \int_\tau^T a(\tau, s)\,h(x(s), s)\,ds\right]. \]
Define the augmented state vector,
\[ y(u) \equiv \left(a(\tau, u), x(u)\right), \quad \tau \le u \le T, \]
and let $P(y(t'')\,|\,y(\tau))$ be the density function of the augmented state vector under the risk-neutral probability. We have,
\[
\begin{aligned}
V(x(\tau), \tau) &= E\left[a(\tau, T)\,V(x(T), T) + \int_\tau^T a(\tau, s)\,h(x(s), s)\,ds\right] \\
&= \int a(\tau, T)\,V(x(T), T)\,P(y(T)\,|\,y(\tau))\,dy(T) + \int_\tau^T\!\!\int a(\tau, s)\,h(x(s), s)\,P(y(s)\,|\,y(\tau))\,dy(s)\,ds.
\end{aligned}
\]
where:
\[ G(\tau, T) \equiv \int_A a(\tau, T)\,P(y(T)\,|\,y(\tau))\,dy(T). \]
It is the value in state x ∈ Rd as of time t of a unit of numéraire at > t if future states lie in a
neighborhood (in Rd ) of ξ. It is thus the Arrow-Debreu state-price density.
For example, a pure discount bond has $V(x, T) = 1$ $\forall x$, and $h(x, s) = 0$ $\forall x, s$, and
\[ V(x(\tau), \tau) = \int_X G(x(\tau), \tau; \xi, T)\,d\xi, \]
with
\[ \lim_{\tau\uparrow T} G(x(\tau), \tau; \xi, T) = \delta(x(\tau) - \xi), \]
where δ is the Dirac delta.
4.11. Appendix 4: Portfolio constraints
Next, define the standard Brownian motion under the probability $Q^\nu$, defined through the Radon-Nikodym derivative in Eq. (4.53):
\[ W_\nu(t) = W(t) + \int_0^t \left(\lambda(u) + \sigma^{-1}(u)\,\nu(u)\right)du \equiv W_0(t) + \int_0^t \sigma^{-1}(u)\,\nu(u)\,du, \]
where $\lambda = \sigma^{-1}(a - 1_d r)$, and $W_0$ is the usual Brownian motion under the risk-neutral probability in a market without any frictions. If the price system is as in Eqs. (4.52), then, for any unconstrained portfolio-consumption $(p, c)$, the dynamics of wealth, $V_\nu^{x,p,c}$ say, are easily seen to be:
\[ dV_\nu^{x,p,c} = \left(p^\top\nu + \zeta(\nu)\,V_\nu^{x,p,c} + rV_\nu^{x,p,c} - c\right)dt + p^\top\sigma\,dW_0. \]
Therefore, for any normalized portfolio-consumption $(p, c)$, we have that the wealth difference, $\Delta(t) \equiv \frac{V_\nu^{x,\pi,c}(T) - V^{x,\pi,c}(T)}{S_0(T)}$, satisfies:
Because $m(t) \ge 0$ by Eq. (4A.7), then, by a comparison theorem (e.g., Karatzas and Shreve (1991, p. 291-295)), $\Delta(t) \ge \bar\Delta(t) = 0$, where the last equality follows because the solution to Eq. (4A.8) is $\bar\Delta(t) = \bar\Delta(0)\,L(t)$, for some positive process $L(t)$. Therefore, we have,
\[ V_\nu^{x,p,c}(t) \ge V^{x,p,c}(t), \text{ with an equality if } \zeta(\nu(t)) + p^\top\nu(t) = 0 \text{ for all } t. \tag{4A.9} \]
Finally, suppose there is a constrained portfolio-consumption pair (pν̂ , cν̂ ), such that
Naturally, we have that Val (x; K) ≤ Valν (x) for all ν and, hence,
Moreover, we have,
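\[
\begin{aligned}
\mathrm{Val}(x; K) &= E\left[\int_0^T u(t, c(t))\,dt + U\left(V^{x,p,c}(T)\right)\right], \quad p(t) \in K \\
&\ge E\left[\int_0^T u(t, c_{\hat\nu}(t))\,dt + U\left(V^{x,p_{\hat\nu},c_{\hat\nu}}(T)\right)\right] \\
&= E\left[\int_0^T u(t, c_{\hat\nu}(t))\,dt + U\left(V_{\hat\nu}^{x,p_{\hat\nu},c_{\hat\nu}}(T)\right)\right] \\
&= \mathrm{Val}_{\hat\nu}(x), 
\end{aligned}
\tag{4A.12}
\]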
where the second line follows, because the value of the unconstrained problem is, of course, the largest
we may have, once we consider any arbitrary constrained portfolio-consumption (pν̂ , cν̂ ). The third
line follows by Eq. (4A.10) and (4A.9). The fourth line is the definition of Valν (x). Combining (4A.11)
with (4A.12) leaves,
Val (x; K) = Valν̂ (x) .
The converse, namely “if there exists a ν̂ ∈ K̃ that minimizes Valν̂ (x), then, the corresponding
portfolio-consumption process (pν̂ , cν̂ ) is optimal for the constrained problem,” is also true, but its
arguments (even informal) are omitted here.
4.12. Appendix 5: Models with final consumption only
Even if markets are incomplete, agents can solve the sequence of problems $\{P_t\}_{t=1}^T$ as time unfolds. Each problem can be written as:
\[ \max_{\theta_t} E\left[\left. u\left(V_1 + \sum_{t=1}^T \left(r_t V_{t-1} - r_t S_{t-1}\theta_t + \Delta S_t\,\theta_t\right)\right)\right| F_{t-1}\right]. \]
4.13. Appendix 6: Further topics on jumps
and to
\[ v^Q(\tau_{i-1})\,e^{-\int_{\tau_{i-1}}^{\tau_i} v^Q(u)\,du} = v(\tau_{i-1})\,\lambda^J(\tau_{i-1})\,e^{-\int_{\tau_{i-1}}^{\tau_i} v(u)\lambda^J(u)\,du} \]
under the probability $Q$.
As explained in section 6.9.3 (see also formula # 6.76)), these are in fact densities of time intervals
elapsing from one arrival to the next one.
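Next let $A$ be the event of marks at time $\tau_1, \tau_2, \cdots, \tau_n$. The Radon-Nikodym derivative is the likelihood ratio of the two probabilities $Q$ and $P$ of $A$:
\[
\frac{Q(A)}{P(A)} = \frac{e^{-\int_t^{\tau_1} v(u)\lambda^J(u)\,du}\cdot v(\tau_1)\lambda^J(\tau_1)\,e^{-\int_{\tau_1}^{\tau_2} v(u)\lambda^J(u)\,du}\cdot v(\tau_2)\lambda^J(\tau_2)\,e^{-\int_{\tau_2}^{\tau_3} v(u)\lambda^J(u)\,du}\cdots}{e^{-\int_t^{\tau_1} v(u)\,du}\cdot v(\tau_1)\,e^{-\int_{\tau_1}^{\tau_2} v(u)\,du}\cdot v(\tau_2)\,e^{-\int_{\tau_2}^{\tau_3} v(u)\,du}\cdots},
\]
where we have used the fact that, given that at $\tau_0 = t$ there are no jumps, the probability of no jumps from $t$ to $\tau_1$ is $e^{-\int_t^{\tau_1} v(u)\,du}$ under $P$ and $e^{-\int_t^{\tau_1} v(u)\lambda^J(u)\,du}$ under $Q$, respectively. Simple algebra then yields,
\[
\begin{aligned}
\frac{Q(A)}{P(A)} &= \lambda^J(\tau_1)\cdot\lambda^J(\tau_2)\cdot e^{-\int_t^{\tau_1} v(u)(\lambda^J(u) - 1)\,du}\cdot e^{-\int_{\tau_1}^{\tau_2} v(u)(\lambda^J(u) - 1)\,du}\, e^{-\int_{\tau_2}^{\tau_3} v(u)(\lambda^J(u) - 1)\,du}\cdots \\
&= \prod_{i=1}^n \lambda^J(\tau_i)\cdot e^{-\int_t^{\tau_n} v(u)(\lambda^J(u) - 1)\,du} \\
&= \exp\left[\log\left(\prod_{i=1}^n \lambda^J(\tau_i)\cdot e^{-\int_t^{\tau_n} v(u)(\lambda^J(u) - 1)\,du}\right)\right] \\
&= \exp\left[\sum_{i=1}^n \log\lambda^J(\tau_i) - \int_t^{\tau_n} v(u)\left(\lambda^J(u) - 1\right)du\right] \\
&= \exp\left[\int_t^T \log\lambda^J(u)\,dZ(u) - \int_t^T v(u)\left(\lambda^J(u) - 1\right)du\right],
\end{aligned}
\]
where the last equality follows from the definition of the Stieltjes integral.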
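A Monte Carlo sanity check of this likelihood ratio is sketched below for the special case of constant $v$ and constant $\lambda^J$, in which the ratio reduces to $(\lambda^J)^{N}e^{-v(\lambda^J - 1)(T - t)}$, with $N$ the number of jumps. The function names and parameter values are assumptions made for this illustration. Simulating under $P$, the ratio should average to one, and the ratio-weighted jump count should average to $v\lambda^J(T - t)$, the expected number of jumps under $Q$.

```python
import math
import random

def poisson_count(v, horizon, rng):
    # Number of arrivals on (0, horizon] for a Poisson process with intensity v.
    t, n = 0.0, 0
    while True:
        t += -math.log(1.0 - rng.random()) / v  # exponential waiting time
        if t > horizon:
            return n
        n += 1

def likelihood_ratio(n_jumps, v, lam_j, horizon):
    # Q(A)/P(A) with constant v and lambda^J: lam_j**N * exp(-v (lam_j - 1) T).
    return lam_j ** n_jumps * math.exp(-v * (lam_j - 1.0) * horizon)

def check_measure_change(v=1.0, lam_j=1.5, horizon=1.0, n_paths=20000, seed=0):
    rng = random.Random(seed)
    sum_l, sum_ln = 0.0, 0.0
    for _ in range(n_paths):
        n = poisson_count(v, horizon, rng)        # simulated under P
        ratio = likelihood_ratio(n, v, lam_j, horizon)
        sum_l += ratio
        sum_ln += ratio * n
    return sum_l / n_paths, sum_ln / n_paths

mean_l, mean_ln = check_measure_change()
print(mean_l)   # should be near 1
print(mean_ln)  # should be near v * lam_j * horizon = 1.5
```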
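The previous results can be used to say something substantive from an economic standpoint. Before doing so, we need to simplify both presentation and notation. With $\mathcal{S}$ denoting the jump size, we have:
\[
\begin{aligned}
\frac{dS}{S} &= b\,d\tau + \sigma\,dW + \mathcal{S}\,dZ \\
&= b\,d\tau + \sigma\,dW + \mathcal{S}\left(dZ - v\,d\tau\right) + \mathcal{S}v\,d\tau \\
&= \left(b + \mathcal{S}v\right)d\tau + \sigma\,dW + \mathcal{S}\left(dZ - v\,d\tau\right).
\end{aligned}
\]
Next, define
\[ d\tilde Z = dZ - v^Q d\tau \quad \left(v^Q = v\lambda^J\right); \qquad d\tilde W = dW + \lambda\,d\tau. \]
Then:
\[ \frac{dS}{S} = \left(b + \mathcal{S}v^Q - \sigma\lambda\right)d\tau + \sigma\,d\tilde W + \mathcal{S}\,d\tilde Z. \]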
S
The characterization of the equivalent martingale measure for the discounted price is given by the following Radon-Nikodym density of $Q$ with respect to $P$:
\[ \frac{dQ}{dP} = \mathcal{E}\left(-\int_t^T \lambda(\tau)\,dW(\tau) + \int_t^T \left(\lambda^J(\tau) - 1\right)\left(dZ(\tau) - v(\tau)\,d\tau\right)\right), \]
Clearly, markets are incomplete here. It is possible to show that if the jump size $\mathcal{S}$ is deterministic, a representative agent with utility function $u(x) = \frac{x^{1-\eta} - 1}{1-\eta}$ makes $\lambda^J(\mathcal{S}) = (1 + \mathcal{S})^{-\eta}$.
The objective here is to use Itô’s lemma for jump processes to express $L$ in differential form. Define the jump process $y$ as:
\[ y(\tau) \equiv -\int_t^\tau v(u)\left(\lambda^J(u) - 1\right)du + \int_t^\tau \log\lambda^J(u)\,dZ(u). \]
or,
\[ \frac{dL(\tau)}{L(\tau)} = -v(\tau)\left(\lambda^J(\tau) - 1\right)d\tau + \left(\lambda^J(\tau) - 1\right)dZ(\tau) = \left(\lambda^J(\tau) - 1\right)\left(dZ(\tau) - v(\tau)\,d\tau\right). \]
The general case (with stochastic distribution) is covered in the following subsection.
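\[
\begin{aligned}
\frac{du(x(\tau), \tau)}{u(x(\tau-), \tau)} &= \mu^u(x(\tau-), \tau)\,d\tau + \sigma^u(x(\tau-), \tau)\,dW(\tau) + J^u(\Delta x, \tau)\,dZ(\tau) \\
&= \left(\mu^u(x(\tau-), \tau) + v(x(\tau-))\,J^u(\Delta x, \tau)\right)d\tau + \sigma^u(x(\tau-), \tau)\,dW(\tau) + J^u(\Delta x, \tau)\,dM(\tau),
\end{aligned}
\]
where $\mu^u = \left[\left(\frac{\partial}{\partial t} + L\right)u\right]\big/u$, $\sigma^u = \left(\frac{\partial u}{\partial x}\cdot\sigma\right)\big/u$, $\frac{\partial}{\partial t} + L$ is the generator for pure diffusion processes and, finally:
\[ J^u(\Delta x, \tau) \equiv \frac{u(x(\tau), \tau) - u(x(\tau-), \tau)}{u(x(\tau-), \tau)}. \]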
Next generalize the steps made some two subsections ago, and let
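The objective is to find restrictions on both $\lambda$ and $v^Q$ such that both $\tilde W$ and $\tilde Z$ are $Q$-martingales. Below, we show that there is a precise connection between $v^Q$ and $J^\eta$, where $J^\eta$ is the jump component in the differential representation of $\eta$:
\[ \frac{d\eta(\tau)}{\eta(\tau-)} = -\lambda(x(\tau-))\,dW(\tau) + J^\eta(\Delta x, \tau)\,dM(\tau), \quad \eta(t) = 1. \]
The relationship is
\[ v^Q = v\left(1 + J^\eta\right), \]
and a proof of these facts will be provided below. What has to be noted here, is that in this case,
\[ \frac{d\eta(\tau)}{\eta(\tau-)} = -\lambda(x(\tau-))\,dW(\tau) + \left(\lambda^J - 1\right)dM(\tau), \quad \eta(t) = 1, \]
\[
\begin{aligned}
\frac{du}{u} &= \left(\mu^u + vJ^u\right)d\tau + \sigma^u\,dW + J^u\left(dZ - v\,d\tau\right) \\
&= \left(\mu^u + v^Q J^u - \sigma^u\lambda\right)d\tau + \sigma^u\,d\tilde W + J^u\,d\tilde Z \\
&= \left(\mu^u + v\left(1 + J^\eta\right)J^u - \sigma^u\lambda\right)d\tau + \sigma^u\,d\tilde W + J^u\,d\tilde Z.
\end{aligned}
\]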
where E∆x is taken with respect to the jump-size distribution, which is the same under Q and P .
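\[ \frac{d\eta(\tau)}{\eta(\tau-)} = -\lambda(x(\tau-))\,dW(\tau) + J^\eta(\Delta x, \tau)\,dM(\tau), \quad \eta(t) = 1. \]
i.e.,
\[ E(\tilde Z(t)) = \frac{E\left(\eta(T)\cdot\tilde Z(T)\right)}{\eta(t)} = \tilde Z(t) \;\Leftrightarrow\; \eta(t)\tilde Z(t) = E[\eta(T)\tilde Z(T)], \]
i.e., $\eta(t)\tilde Z(t)$ is a $P$-martingale.
By Itô’s lemma,
But
\[ d\eta\cdot d\tilde Z = \eta\left(-\lambda\,dW + J^\eta\,dM\right)\left(dZ - v^Q d\tau\right) = \eta\left[-\lambda\,dW + J^\eta\left(dZ - v\,d\tau\right)\right]\left(dZ - v^Q d\tau\right), \]
which implies
\[ v^Q(\tau) = v(\tau)\left(1 + J^\eta(\Delta x)\right), \quad \text{a.s.} \]
‖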
References
Cvitanić, J. and I. Karatzas (1993): “Convex Duality in Constrained Portfolio Optimization.”
Annals of Applied Probability 2, 767-818.
Harrison, J.M. and S. Pliska (1983): “A Stochastic Calculus Model of Continuous Trading:
Complete Markets.” Stochastic Processes and Their Applications 15, 313-316.
He, H. and N. Pearson (1991): “Consumption and Portfolio Policies with Incomplete Markets
and Short-Sales Constraints: The Infinite Dimensional Case.” Journal of Economic Theory
54, 259-304.
Föllmer, H. and M. Schweizer (1991): “Hedging of Contingent Claims under Incomplete Infor-
mation.” In: Davis, M. and R. Elliott (Editors): Applied Stochastic Analysis. New York:
Gordon & Breach, 389-414.
Karatzas, I. and S.E. Shreve (1991): Brownian Motion and Stochastic Calculus. Springer Verlag, Berlin.
Shreve, S. (1991): “A Control Theorist’s View of Asset Pricing.” In: Davis, M. and R. Elliott (Editors): Applied Stochastic Analysis. New York: Gordon & Breach, 415-445.
5
Taking models to data
5.1 Introduction
This chapter surveys methods to estimate and test dynamic models of asset prices. It begins
with foundational issues on identification, specification and testing. Then, it surveys classical
estimation and testing methodologies such as the Method of Moments, in which the number of
moment conditions equals the dimension of the parameter vector (Pearson (1894)); Maximum
Likelihood (ML) (Gauss (1816), Fisher (1912)); the Generalized Method of Moments (GMM),
in which the number of moment conditions exceeds the dimension of the parameter vector,
and thus leads to minimum chi-squared (Neyman and Pearson (1928), Hansen (1982)); and
finally the recent developments based on simulations, which aim at implementing ML and
GMM estimation for models that are analytically quite complex - but that can be simulated.
The chapter concludes with an illustration of how asset pricing can help achieve asymptotic efficiency in the estimation of dynamic models. The chapter emphasizes the asymptotic theory, that is, what happens when the sample size is large.
where xt = (yt−1 , zt ), and 0 denotes the conditional density of the data, the true law. Then,
we have three basic definitions. First, we define a parametric model as a set of conditional laws
for yt , indexed by a parameter vector θ ∈ Θ ⊆ Rp ,
(M) = { (yt | xt ; θ) , θ ∈ Θ ⊆ Rp } .
5.2. Data generating processes
Third, we say that the model (M) is identifiable if θ0 is unique. The main concern in this
chapter is to draw inference about the true parameter θ0 , given the observations.
• Restrictions related to the heterogeneity of the stochastic process - which pave the way
to the concept of stationarity.
• Restrictions related to the memory of the stochastic process - which pave the way to the
concept of ergodicity.
5.2.2.1 Stationarity
Stationary processes describe phenomena that approach a sort of long-run equilibrium in a statistical sense: as time unfolds, the probability generating the observations settles down to some “long-run” probability density, a time-invariant probability. In the early 1980s, economic theory even defined a long-run equilibrium as a well-defined stationary, or “invariant,” probability distribution generating economic outcomes. We have two notions of stationarity.
Even with a stationary DGP, we might encounter situations where the number of parameters to be estimated increases with the sample size. For example, consider two stochastic processes: (i) one for which $\mathrm{cov}(y_t, y_{t+\tau}) = \tau^2$, and (ii) another for which $\mathrm{cov}(y_t, y_{t+\tau}) = \exp(-|\tau|)$. In both cases, the DGP is stationary. Yet for the first process, the dependence increases with $\tau$, and for the second, the dependence decreases with $\tau$. This issue relates to the “memory” of the process: as this simple example reveals, a stationary stochastic process may have “long memory.” “Ergodicity” further restricts the DGP, so as to make this memory play a more limited role.
5.2.2.2 Ergodicity
We shall consider situations in which the dependence between yt1 and yt2 decreases with |t2 − t1 |.
Let’s introduce some concepts and notation. Two events A and B are independent when P (A ∩
B) = P (A)P (B). A stochastic process is asymptotically independent if, for some function β τ ,
We say that (i) $y$ is strongly mixing, or $\alpha$-mixing, if $\lim_{\tau\to\infty}\alpha_\tau = 0$; (ii) $y$ is uniformly mixing if $\lim_{\tau\to\infty}\varphi_\tau = 0$. Clearly, a uniformly mixing process is also strongly mixing. A second order stationary process is ergodic if $\lim_{T\to\infty}\sum_{\tau=1}^T \mathrm{cov}(y_t, y_{t+\tau}) < \infty$. If a second order stationary process is strongly mixing, it is also ergodic.
Naturally, any estimator does necessarily depend on the sample size, which we write as θ̂T ≡
tT (y). Of a given estimator θ̂T , we say that it is:
• Correct (or unbiased), if $E(\hat\theta_T) = \theta_0$. The difference $E(\hat\theta_T) - \theta_0$ is the distortion, or bias.
• Weakly consistent if $\mathrm{plim}\,\hat\theta_T = \theta_0$, and strongly consistent if $\hat\theta_T \overset{a.s.}{\to} \theta_0$.
Finally, an estimator $\hat\theta_T^{(1)}$ is more efficient than another estimator $\hat\theta_T^{(2)}$ if, for any vector of constants $c$, we have that $c^\top\cdot\mathrm{var}(\hat\theta_T^{(1)})\cdot c < c^\top\cdot\mathrm{var}(\hat\theta_T^{(2)})\cdot c$.
Now suppose that the support of $y$ doesn’t depend on $\theta$. Under regularity conditions,
\[ \nabla_\theta\int f(y; \theta)\,dy = \int\nabla_\theta f(y; \theta)\,dy = 0_p, \]
1 Therefore, we follow a classical perspective. A Bayesian statistician would view the sample as given. We do not review Bayesian
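Finally we have,
\[
\begin{aligned}
0_{p\times p} &= \nabla_\theta\int\left[\nabla_\theta\log f(y; \theta)\right] f(y; \theta)\,dy \\
&= \int\left[\nabla_{\theta\theta}\log f(y; \theta)\right] f(y; \theta)\,dy + \int\left|\nabla_\theta\log f(y; \theta)\right|_2 f(y; \theta)\,dy,
\end{aligned}
\]
where $|x|_2$ denotes the outer product, i.e. $|x|_2 = x\cdot x^\top$. Hence, by Eq. (5.1),
\[ E_\theta\left[\nabla_{\theta\theta}\log f(y; \theta)\right] = -E_\theta\left|\nabla_\theta\log f(y; \theta)\right|_2 = -\mathrm{var}_\theta\left[\nabla_\theta\log f(y; \theta)\right] \equiv -\mathcal{J}(\theta), \quad \forall\theta\in\Theta. \]
The matrix $\mathcal{J}$ is known as Fisher’s information matrix.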
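By the basic inequality, $\mathrm{cov}(x, y)^2 \le \mathrm{var}(x)\,\mathrm{var}(y)$,
\[ \left[\mathrm{cov}\left(t(y), \nabla_\theta\log f(y; \theta)\right)\right]^2 \le \mathrm{var}[t(y)]\cdot\mathrm{var}\left[\nabla_\theta\log f(y; \theta)\right]. \]
Therefore, since $\mathrm{cov}\left(t(y), \nabla_\theta\log f(y; \theta)\right) = \nabla_\theta E(t(y))$,
\[ \left[\nabla_\theta E(t(y))\right]^2 \le \mathrm{var}[t(y)]\cdot\mathrm{var}\left[\nabla_\theta\log f(y; \theta)\right] = -\mathrm{var}[t(y)]\cdot E\left[\nabla_{\theta\theta}\log f(y; \theta)\right]. \]
But if $t(y)$ is unbiased, or $E[t(y)] = \theta$,
\[ \mathrm{var}[t(y)] \ge \left[-E\left(\nabla_{\theta\theta}\log f(y; \theta)\right)\right]^{-1} \equiv \mathcal{J}(\theta)^{-1}. \]
This is the celebrated Cramér-Rao bound. The same result holds, with a change in notation, in the multidimensional case (see, e.g., Amemiya (1985, p. 14-17)).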
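As a worked illustration, consider the exponential density of Section 4.6 with the intensity as the parameter; this example is an addition chosen here for concreteness, not part of the original derivation. For a single observation $y$ with density $f(y; v) = v e^{-vy}$:

```latex
\begin{aligned}
\log f(y; v) &= \log v - v y, \\
\nabla_v \log f(y; v) &= \frac{1}{v} - y, \qquad E_v\!\left[\nabla_v \log f(y; v)\right] = \frac{1}{v} - \frac{1}{v} = 0, \\
\nabla_{vv} \log f(y; v) &= -\frac{1}{v^{2}}, \\
\mathcal{J}(v) &= -E_v\!\left[\nabla_{vv} \log f(y; v)\right] = \frac{1}{v^{2}}
 = \mathrm{var}_v\!\left[\nabla_v \log f(y; v)\right] = \mathrm{var}_v(y).
\end{aligned}
```

Both expressions for the information agree, as the general identity requires, and the bound for an unbiased estimator of $v$ based on one observation is $\mathcal{J}(v)^{-1} = v^2$.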
5.3.2 Factorizations
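Consider a series of events $\{A_i\}$. In the Appendix, we show that,
\[ \Pr\left(\bigcap_{i=1}^n A_i\right) = \prod_{i=1}^n \Pr\left(A_i\,\middle|\,\bigcap_{j=1}^{i-1} A_j\right). \tag{5.2} \]
Next, consider the previous definition of the MLE. Given Eq. (5.2), we have that the MLE satisfies:
\[ \hat\theta_T = \arg\max_{\theta\in\Theta} L_T(\theta) = \arg\max_{\theta\in\Theta}\left(\frac{1}{T}\log L_T(\theta)\right), \]
where
\[ \log L_T(\theta) \equiv \log\prod_{t=1}^T f\left(y_t\,\middle|\,y_1^{t-1}; \theta\right) = \sum_{t=1}^T \log f\left(y_t\,\middle|\,y_1^{t-1}; \theta\right) \equiv \sum_{t=1}^T \log f(y_t; \theta) \equiv \sum_{t=1}^T \ell_t(\theta), \]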
5.3. Maximum likelihood estimation
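Next, consider again the asymptotic expansion in Eq. (5.3), which can be elaborated, with $\ell_t(\theta)$ denoting the log-likelihood contribution of observation $t$, so as to have,
\[
\begin{aligned}
\sqrt{T}(\hat\theta_T - \theta_0) &\overset{d}{=} -\left[\frac{1}{T}\nabla_{\theta\theta}\log L_T(\theta_0)\right]^{-1}\frac{1}{\sqrt{T}}\nabla_\theta\log L_T(\theta_0) \\
&= -\left[\frac{1}{T}\sum_{t=1}^T\nabla_{\theta\theta}\ell_t(\theta_0)\right]^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta\ell_t(\theta_0).
\end{aligned}
\]
By the law of large numbers reviewed in the Appendix (weak law no. 1),
\[ \frac{1}{T}\sum_{t=1}^T\nabla_{\theta\theta}\ell_t(\theta_0) \overset{p}{\to} E_{\theta_0}\left[\nabla_{\theta\theta}\ell_t(\theta_0)\right] = -\mathcal{J}(\theta_0). \]
Therefore, asymptotically,
\[ \sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{=} \mathcal{J}(\theta_0)^{-1}\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta\ell_t(\theta_0). \]
We also have,
\[ \frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_\theta\ell_t(\theta_0) \overset{d}{\to} N\left(0, \mathcal{J}(\theta_0)\right). \]
Indeed, by the central limit theorem reviewed in the Appendix, let $\overline{\nabla_\theta\ell(\theta_0)}_T = \frac{1}{T}\sum_{t=1}^T\nabla_\theta\ell_t(\theta_0)$, and note that $E(\nabla_\theta\ell_t(\theta_0)) = 0$. Then,
\[ \frac{1}{\sqrt{T}}\,\frac{\sum_{t=1}^T\nabla_\theta\ell_t(\theta_0)}{\sqrt{\mathrm{var}\left[\nabla_\theta\ell_t(\theta_0)\right]}} = \frac{\sqrt{T}\left(\overline{\nabla_\theta\ell(\theta_0)}_T - E\left(\nabla_\theta\ell_t(\theta_0)\right)\right)}{\sqrt{\mathrm{var}\left[\nabla_\theta\ell_t(\theta_0)\right]}}, \]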
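The asymptotic result can be illustrated by simulation. The sketch below is a hedged example built on assumptions made here: i.i.d. exponential data with intensity $v_0$, for which the MLE is the reciprocal of the sample mean and the Fisher information is $v_0^{-2}$; the function names and sample sizes are hypothetical. The variance of $\sqrt{T}(\hat v_T - v_0)$ across replications should then be close to $\mathcal{J}(v_0)^{-1} = v_0^2$.

```python
import math
import random
import statistics

def exponential_mle(sample):
    # For f(y; v) = v * exp(-v * y), the MLE of v is 1 / sample mean.
    return len(sample) / sum(sample)

def scaled_errors(v0, T, n_rep, seed=0):
    # Draw n_rep samples of size T and return sqrt(T) * (v_hat - v0) for each.
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        sample = [rng.expovariate(v0) for _ in range(T)]
        out.append(math.sqrt(T) * (exponential_mle(sample) - v0))
    return out

v0 = 2.0
errors = scaled_errors(v0=v0, T=400, n_rep=1000)
ratio = statistics.variance(errors) / v0**2  # should be close to 1
print(ratio)
```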
5.4 M-estimators
Consider a function $g$ of the unknown parameters $\theta$. Given a function $\Psi$, an M-estimator of the function $g(\theta)$ is the solution to,
\[ \max_{g\in G}\sum_{t=1}^T \Psi(x_t, y_t; g), \]
where $y$ and $x$ are as in Section 5.2.1. We assume that a solution to this problem exists, that it is interior and that it is unique. Let us denote the M-estimator with $\hat g_T(x_1^T, y_1^T)$. Naturally, the M-estimator satisfies the following first order conditions,
\[ 0 = \frac{1}{T}\sum_{t=1}^T\nabla_g\Psi\left(y_t, x_t; \hat g_T(x_1^T, y_1^T)\right). \]
To simplify the presentation, we assume that $(x, y)$ are independent in time, and that they have the same law. By the law of large numbers,
\[ \frac{1}{T}\sum_{t=1}^T\Psi(y_t, x_t; g) \overset{p}{\to} \iint\Psi(y, x; g)\,dF(x, y) = \iint\Psi(y, x; g)\,dF(y\,|\,x)\,dZ(x) \equiv E_x E_0\left[\Psi(y, x; g)\right], \]
where $E_0$ is the expectation operator taken with respect to the true conditional law of $y$ given $x$ and $E_x$ is the expectation operator taken with respect to the true marginal law of $x$. The limit problem is,
\[ g_\infty = g_\infty(\theta_0) = \arg\max_{g\in G} E_x E_0\left[\Psi(y, x; g)\right]. \]
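Under standard regularity conditions,2 there exists a sequence of M-estimators $\hat g_T(x, y)$ converging a.s. to $g_\infty = g_\infty(\theta_0)$. Under some additional regularity conditions, the M-estimator is asymptotically normal:
Theorem 5.1: Let $\mathcal{I} \equiv E_x E_0\left(\nabla_g\Psi(y, x; g_\infty(\theta_0))\left[\nabla_g\Psi(y, x; g_\infty(\theta_0))\right]^\top\right)$ and assume that the matrix $\mathcal{J} \equiv E_x E_0\left[-\nabla_{gg}\Psi(y, x; g)\right]$ exists and has an inverse. We have,
\[ \sqrt{T}\left(\hat g_T - g_\infty(\theta_0)\right) \overset{d}{\to} N\left(0, \mathcal{J}^{-1}\mathcal{I}\mathcal{J}^{-1}\right). \]
Sketch of the proof. The M-estimator satisfies the following first order conditions,
\[
\begin{aligned}
0 &= \frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_g\Psi(y_t, x_t; \hat g_T) \\
&\overset{d}{=} \frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_g\Psi(y_t, x_t; g_\infty) + \sqrt{T}\left[\frac{1}{T}\sum_{t=1}^T\nabla_{gg}\Psi(y_t, x_t; g_\infty)\right]\cdot\left(\hat g_T - g_\infty\right).
\end{aligned}
\]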
2 $G$ is compact; $\Psi$ is continuous with respect to $g$ and integrable with respect to the true law, for each $g$; $\frac{1}{T}\sum_{t=1}^T\Psi(y_t, x_t; g) \overset{a.s.}{\to} E_x E_0\left[\Psi(y, x; g)\right]$ uniformly on $G$; the limit problem has a unique solution $g_\infty = g_\infty(\theta_0)$.
5.5. Pseudo (or quasi) maximum likelihood
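By rearranging terms,
\[
\begin{aligned}
\sqrt{T}\left(\hat g_T - g_\infty\right) &\overset{d}{=} -\left[\frac{1}{T}\sum_{t=1}^T\nabla_{gg}\Psi(y_t, x_t; g_\infty)\right]^{-1}\left[\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_g\Psi(y_t, x_t; g_\infty)\right] \\
&\overset{d}{=} \left[E_x E_0\left(-\nabla_{gg}\Psi(y, x; g)\right)\right]^{-1}\cdot\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_g\Psi(y_t, x_t; g_\infty) \\
&\overset{d}{=} \mathcal{J}^{-1}\cdot\frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_g\Psi(y_t, x_t; g_\infty).
\end{aligned}
\]
By the limiting problem, $E_x E_0\left[\nabla_g\Psi(y, x; g_\infty)\right] = 0$. Since $\mathrm{var}(\nabla_g\Psi) = E\left(\nabla_g\Psi\cdot[\nabla_g\Psi]^\top\right) = \mathcal{I}$, then,
\[ \frac{1}{\sqrt{T}}\sum_{t=1}^T\nabla_g\Psi(y_t, x_t; g_\infty) \overset{d}{\to} N(0, \mathcal{I}). \]
The result follows by Slutzky’s theorem and the symmetry of $\mathcal{J}$. ‖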
3 That is, $\theta_0^*$ is, clearly, the solution to some misspecified limiting problem. It is possible to show that $\theta_0^*$ has an appealing interpretation in terms of some entropy distance minimizer.
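matrix $\mathcal{J}^{-1}\mathcal{I}\mathcal{J}^{-1}$ depends on the unknown law of $(y_t, x_t)$. To assess the precision of the estimates of $\hat g_T$, one needs to estimate such a variance-covariance matrix. A common practice is to use the following a.s. consistent estimators,
\[ \hat{\mathcal{J}} = -\frac{1}{T}\sum_{t=1}^T\nabla_{gg}\Psi(y_t, x_t; \hat g_T), \quad\text{and}\quad \hat{\mathcal{I}} = \frac{1}{T}\sum_{t=1}^T\left(\nabla_g\Psi(y_t, x_t; \hat g_T)\left[\nabla_g\Psi(y_t, x_t; \hat g_T)\right]^\top\right). \]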
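To make the estimated variance concrete, the sketch below works through a scalar least-squares case. Everything in it is an assumption made for illustration: the criterion $\Psi(y, x; g) = -(y - gx)^2$, the simulated data, and the function name. With unit-variance regressors and errors, the sandwich $\hat{\mathcal{I}}/\hat{\mathcal{J}}^2$ should be close to one.

```python
import random

def sandwich_variance(x, y):
    # M-estimator with Psi(y, x; g) = -(y - g * x)**2 (least squares, scalar g).
    T = len(x)
    g_hat = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    # J_hat = -(1/T) * sum of d2Psi/dg2, where d2Psi/dg2 = -2 x^2.
    j_hat = sum(2.0 * a * a for a in x) / T
    # I_hat = (1/T) * sum of (dPsi/dg)^2, where dPsi/dg = 2 x (y - g x).
    i_hat = sum((2.0 * a * (b - g_hat * a)) ** 2 for a, b in zip(x, y)) / T
    return g_hat, i_hat / j_hat**2

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
y = [1.5 * a + rng.gauss(0.0, 1.0) for a in x]
g_hat, avar = sandwich_variance(x, y)
print(g_hat, avar)  # g_hat near 1.5, avar near 1
```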
5.6 GMM
Economic theory often places restrictions on models that have the following format,
E [h (yt ; θ0 )] = 0q , (5.4)
Definition (GMM estimator): The GMM estimator is the sequence $\hat\theta_T$ satisfying,
\[ \hat\theta_T = \arg\min_{\theta\in\Theta\subseteq\mathbb{R}^p}\ \underbrace{\bar h\left(y_1^T; \theta\right)^\top}_{1\times q}\cdot\underbrace{W_T}_{q\times q}\cdot\underbrace{\bar h\left(y_1^T; \theta\right)}_{q\times 1}, \]
where $\{W_T\}$ is a sequence of weighting matrices, with elements that may depend on the observations.
The simplest situation arises in the so-called just-identified case $p = q$, in which case, the GMM estimator is simply:
\[ \hat\theta_T:\ \bar h(y_1^T; \hat\theta_T) = 0_q. \]
In general, $p \le q$. If $p < q$, we say that the GMM estimator imposes overidentifying restrictions.
We analyze the i.i.d. case only. Under regularity conditions, there exists a matrix $W_T$ that minimizes the asymptotic variance of the GMM estimator, which satisfies asymptotically,
\[ W = \left[\lim_{T\to\infty} T\cdot E\left(\bar h(y_1^T; \hat\theta_T)\cdot\bar h(y_1^T; \hat\theta_T)^\top\right)\right]^{-1} \equiv \Sigma_0^{-1}. \]
\[ \Sigma_T = \frac{1}{T}\sum_{t=1}^T\left[h(y_t; \hat\theta_T)\cdot h(y_t; \hat\theta_T)^\top\right]. \]
Note that $\hat\theta_T$ depends on the weighting matrix $\Sigma_T$ and vice versa. Therefore, we need to implement an iterative procedure. The more one iterates, the less the final results will depend on the initial weighting matrix $\Sigma_T^{(0)}$. For example, one can start with $\Sigma_T^{(0)} = I_q$.
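A two-step version of this procedure can be sketched as follows. The setup is an assumption chosen for illustration, not the text's application: i.i.d. exponential data with intensity $v$, with $q = 2$ moment conditions $E[y - 1/v] = 0$ and $E[y^2 - 2/v^2] = 0$ for $p = 1$ parameter, a grid search in place of a numerical optimizer, and hypothetical function names. The first step uses the identity weighting matrix; the second uses the inverse of $\Sigma_T$ evaluated at the first-step estimate.

```python
import random

def h_bar(sample_moments, v):
    # Sample averages of the two moment conditions at parameter v.
    m1, m2 = sample_moments
    return (m1 - 1.0 / v, m2 - 2.0 / v**2)

def sigma_hat(data, v):
    # (1/T) * sum of h(y_t; v) h(y_t; v)^T, a 2x2 matrix.
    n = len(data)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for y in data:
        h = (y - 1.0 / v, y * y - 2.0 / v**2)
        for i in range(2):
            for j in range(2):
                s[i][j] += h[i] * h[j] / n
    return s

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(h, w):
    return sum(h[i] * w[i][j] * h[j] for i in range(2) for j in range(2))

def argmin_on_grid(objective, lo, hi, n_grid=4000):
    grid = [lo + (hi - lo) * i / n_grid for i in range(n_grid + 1)]
    return min(grid, key=objective)

def two_step_gmm(data, lo=0.2, hi=10.0):
    n = len(data)
    sample_moments = (sum(data) / n, sum(y * y for y in data) / n)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Step 1: identity weighting matrix.
    v1 = argmin_on_grid(lambda v: quad_form(h_bar(sample_moments, v), identity), lo, hi)
    # Step 2: weighting matrix from the step-1 estimate.
    w = inverse_2x2(sigma_hat(data, v1))
    v2 = argmin_on_grid(lambda v: quad_form(h_bar(sample_moments, v), w), lo, hi)
    c_stat = n * quad_form(h_bar(sample_moments, v2), w)  # overidentification statistic
    return v1, v2, c_stat

rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(5000)]
v1, v2, c_stat = two_step_gmm(data)
print(v1, v2, c_stat)
```

Further iterations would recompute the weighting matrix at `v2` and minimize again; with one overidentifying restriction, `c_stat` is the sample counterpart of a statistic with $q - p = 1$ degree of freedom.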
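We have:
Theorem 5.2: Suppose we are given a sequence of GMM estimators $\hat\theta_T$ such that $\hat\theta_T \overset{p}{\to} \theta_0$. We have,
\[ \sqrt{T}(\hat\theta_T - \theta_0) \overset{d}{\to} N\left(0_p, \left[E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right]^{-1}\right), \quad\text{where } h_\theta \equiv \nabla_\theta h(y; \theta_0). \]
Sketch of the proof: The assumption that $\hat\theta_T \overset{p}{\to} \theta_0$ is easy to check under mild regularity conditions. Moreover, the GMM satisfies,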
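Eq. (5.5) confirms that if $p = q$ the GMM satisfies $\hat\theta_T:\ \bar h(y_1^T; \hat\theta_T) = 0$. This is so because if $p = q$, then $\nabla_\theta\bar h\,\Sigma_T^{-1}$ is full-rank, and Eq. (5.5) can only be satisfied with $\bar h = 0$. In the general case, $q > p$, we have,
\[ \underbrace{\sqrt{T}\,\bar h(y_1^T; \hat\theta_T)}_{q\times 1} = \underbrace{\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right)}_{q\times 1} + \underbrace{\left[\nabla_\theta\bar h\left(y_1^T; \theta_0\right)\right]^\top}_{q\times p}\sqrt{T}(\hat\theta_T - \theta_0) + o_p(1). \]
Premultiplying both sides by $\nabla_\theta\bar h(y_1^T; \hat\theta_T)\,\Sigma_T^{-1}$, the l.h.s. of the resulting equality is zero by the first order conditions in Eq. (5.5). By rearranging terms,
\[
\begin{aligned}
\sqrt{T}(\hat\theta_T - \theta_0) &\overset{d}{=} -\left(\nabla_\theta\bar h\left(y_1^T; \theta_0\right)\Sigma_T^{-1}\left[\nabla_\theta\bar h\left(y_1^T; \theta_0\right)\right]^\top\right)^{-1}\nabla_\theta\bar h(y_1^T; \hat\theta_T)\,\Sigma_T^{-1}\cdot\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) \\
&= -\left(\frac{1}{T}\sum_{t=1}^T\nabla_\theta h(y_t; \hat\theta_T)\,\Sigma_T^{-1}\,\frac{1}{T}\sum_{t=1}^T\left[\nabla_\theta h(y_t; \hat\theta_T)\right]^\top\right)^{-1}\frac{1}{T}\sum_{t=1}^T\nabla_\theta h(y_t; \hat\theta_T)\,\Sigma_T^{-1}\,\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) \\
&\overset{d}{=} -\left(E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right)^{-1}E(h_\theta)\,\Sigma_0^{-1}\cdot\frac{1}{\sqrt{T}}\sum_{t=1}^T h(y_t; \theta_0).
\end{aligned}
\]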
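Next, consider the term $\frac{1}{\sqrt{T}}\sum_{t=1}^T h(y_t; \theta_0)$, which satisfies a central limit theorem:
\[ \frac{1}{\sqrt{T}}\sum_{t=1}^T h(y_t; \theta_0) \overset{d}{\to} N\left(E(h), \mathrm{var}(h)\right), \]
where, by Eq. (5.4), $E(h) = 0$, and $\mathrm{var}(h) = E\left(h\cdot h^\top\right) = \Sigma_0$. Then, we have:
\[ \frac{1}{\sqrt{T}}\sum_{t=1}^T h(y_t; \theta_0) \overset{d}{\to} N(0, \Sigma_0). \]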
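Therefore, $\sqrt{T}(\hat\theta_T - \theta_0)$ is asymptotically normal with expectation $0_p$, and variance,
\[ \left(E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right)^{-1}E(h_\theta)\,\Sigma_0^{-1}\,\Sigma_0\,\Sigma_0^{-1\top}E(h_\theta)^\top\left(E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right)^{-1\top} = \left(E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right)^{-1}. \]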
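A global specification test is that of the celebrated “overidentifying restrictions.” First, consider the behavior of the statistic,
\[ \sqrt{T}\,\bar h\left(y_1^T; \theta_0\right)^\top\Sigma_0^{-1}\,\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) \overset{d}{\to} \chi^2(q). \]
One might be led to think that the same result applies when $\theta_0$ is replaced with $\hat\theta_T$ (which is a consistent estimator of $\theta_0$). Wrong. Consider,
\[ C_T = \sqrt{T}\,\bar h(y_1^T; \hat\theta_T)^\top\,\Sigma_T^{-1}\cdot\sqrt{T}\,\bar h(y_1^T; \hat\theta_T). \]
We have,
\[
\begin{aligned}
\sqrt{T}\,\bar h(y_1^T; \hat\theta_T) &\overset{d}{=} \sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) + \left[\nabla_\theta\bar h\left(y_1^T; \theta_0\right)\right]^\top\sqrt{T}(\hat\theta_T - \theta_0) \\
&\overset{d}{=} \sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) - \left[\nabla_\theta\bar h\left(y_1^T; \theta_0\right)\right]^\top\left[E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right]^{-1}E(h_\theta)\,\Sigma_0^{-1}\cdot\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) \\
&\overset{d}{=} \sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) - E(h_\theta)^\top\left[E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right]^{-1}E(h_\theta)\,\Sigma_0^{-1}\cdot\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) \\
&= \underbrace{\left(I_q - P_q\right)}_{q\times q}\,\underbrace{\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right)}_{q\times 1},
\end{aligned}
\]
and
\[ P_q \equiv E(h_\theta)^\top\left[E(h_\theta)\,\Sigma_0^{-1}E(h_\theta)^\top\right]^{-1}E(h_\theta)\,\Sigma_0^{-1} \]
is the orthogonal projector in the space generated by the columns of $E(h_\theta)$ by the inner product $\Sigma_0^{-1}$. We have thus shown that,
\[ C_T \overset{d}{=} \sqrt{T}\,\bar h\left(y_1^T; \theta_0\right)^\top\left(I_q - P_q\right)^\top\Sigma_T^{-1}\left(I_q - P_q\right)\sqrt{T}\,\bar h\left(y_1^T; \theta_0\right). \]
But,
\[ \sqrt{T}\,\bar h\left(y_1^T; \theta_0\right) \overset{d}{\to} N(0, \Sigma_0), \]
and by a classical result,
\[ C_T \overset{d}{\to} \chi^2(q - p). \]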
Hansen and Singleton (1982, 1983) started the literature on the estimation and testing of dy-
namic asset pricing models within a fully articulated rational expectations framework. Consider
the classical system of Euler equations arising in the Lucas tree,
\[ E\left[\left.\beta\,\frac{u'(c_{t+1})}{u'(c_t)}\left(1 + r_{i,t+1}\right) - 1\,\right| F_t\right] = 0, \quad i = 1, \cdots, m, \]
where u is the utility function, ri is the return on asset i, β is the time-discount factor, Ft is
the information set as of time $t$, and $m$ is the number of assets. Consider the CRRA utility function, $u(x) = x^{1-\eta}/(1 - \eta)$. If the model is well-specified, there exist some $\beta_0$ and $\eta_0$ such that:
\[ E\left[\left.\beta_0\left(\frac{c_{t+1}}{c_t}\right)^{-\eta_0}\left(1 + r_{i,t+1}\right) - 1\,\right| F_t\right] = 0, \quad i = 1, \cdots, m. \]
To sum up, the dimension of the parameter vector is $p = 2$. To estimate the true parameter vector $\theta_0 \equiv (\beta_0, \eta_0)$, we may build up a system of orthogonality conditions. This system can be based on projecting observable variables predicted by the model onto other variables, which we call “instruments,” and that are included in the information set $F_t$,
\[ E\left[h(y_t; \theta_0)\right] = 0, \]
where, for some vector of $n$ instruments, say, $\mathrm{Instr}_t = [i_{1,t}, \cdots, i_{n,t}]^\top$,
\[
\underbrace{h(y_t; \theta)}_{m\times n} = \begin{pmatrix}
\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}\left(1 + r_{1,t+1}\right) - 1\right]\cdot\mathrm{Instr}_t \\
\vdots \\
\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\eta}\left(1 + r_{m,t+1}\right) - 1\right]\cdot\mathrm{Instr}_t
\end{pmatrix}.
\]
The instruments used to produce the orthogonality restrictions may include constants, past values of consumption growth, $\frac{c_{t+1}}{c_t}$, and past returns, $r_i$.
\[ \bar h\left(y_1^T; \theta\right) = \frac{1}{T}\sum_{t=1}^T\left[f_t^* - E\left(f(z_t, \theta)\right)\right], \tag{5.7} \]
where,
\[ f_t^* = f(z_t, \theta_0), \]
is a vector-valued moment function, or “observation function,” a function that summarizes the data satisfactorily, so to speak. The GMM estimator is unfeasible if we are not able to compute the expectation $E\left(f(z_t, \theta)\right)$ in closed form, for each $\theta$. Simulation-based methods can make the method of moments feasible in such cases.
5.7. Simulation-based estimators
and S (T ) is the simulated sample size, which is written as a function of the sample size T , for
the purpose of the asymptotic theory.
The estimator θT , also known as the Simulated Method of Moments (SMM) estimator, aims to
match the sample properties of the actual and simulated processes ft∗ and ftθ . It was introduced
in a series of works, by McFadden (1989), Pakes and Pollard (1989), Lee and Ingram (1991)
and Duffie and Singleton (1993). The simulated pseudo-maximum likelihood method of Laroque
and Salanié (1989, 1993, 1994) can also be interpreted as a SMM estimator.
Another approach to statistical inference based on simulations relies on the indirect inference
principle (IIP), and was initiated by Gouriéroux, Monfort and Renault (1993) and Smith (1993).
An IIP-based estimator works as follows. Instead of minimizing the distance of some moment conditions, the IIP relies on minimizing the distance between the parameters of an auxiliary, possibly misspecified model estimated on the observations and on model-simulated data. For example, consider the following auxiliary parameter estimator,
\[ \beta_T = \arg\max_\beta\log L\left(y_1^T; \beta\right), \tag{5.9} \]
where $L$ is the likelihood of some (possibly misspecified) model. Consider simulating $S$ times the process $y_t$ in Eq. (5.6), and computing,
where $y^s(\theta)_1^T = (y_t^{\theta,s})_{t=1}^T$ are the simulated variables (for $s = 1, \cdots, S$) when the parameter vector is $\theta$. The IIP-based estimator is defined similarly as $\theta_T$ in Eq. (5.8), but with the function $G_T$ given by,
\[ G_T(\theta) = \beta_T - \frac{1}{S}\sum_{s=1}^S\beta_T^s(\theta). \tag{5.10} \]
The diagram in Figure 5.1 illustrates the main ideas underlying the IIP.
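A toy version of the principle is sketched below. All modelling choices are assumptions made for this illustration and do not come from the text: the "true" model is an MA(1), $y_t = \epsilon_t + \theta\epsilon_{t-1}$, the auxiliary model is an AR(1) fitted by least squares, the same simulated shocks are reused for every candidate $\theta$, and the minimization is a simple grid search with a scalar auxiliary parameter (so the choice of norm is immaterial).

```python
import random

def simulate_ma1(theta, shocks):
    # y_t = e_t + theta * e_{t-1}, for t = 1, ..., len(shocks) - 1.
    return [shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]

def ar1_coefficient(y):
    # Auxiliary parameter: OLS slope of y_t on y_{t-1}, without intercept.
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def draw_shocks(n, seed):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def indirect_inference(y_obs, n_sims=4, grid_size=46):
    b_obs = ar1_coefficient(y_obs)
    # Fixed shocks across candidate values of theta (common random numbers).
    shock_sets = [draw_shocks(len(y_obs) + 1, seed=1000 + s) for s in range(n_sims)]

    def distance(theta):
        b_sim = sum(ar1_coefficient(simulate_ma1(theta, e)) for e in shock_sets) / n_sims
        return (b_obs - b_sim) ** 2

    grid = [0.9 * i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=distance)

y_obs = simulate_ma1(0.5, draw_shocks(3001, seed=1))  # stand-in for observed data
theta_hat = indirect_inference(y_obs)
print(theta_hat)  # should be near the value 0.5 used to generate y_obs
```

The auxiliary AR(1) model is misspecified for MA(1) data, yet the mapping from $\theta$ to the auxiliary coefficient is monotone on the grid used here, which is what allows $\theta$ to be recovered.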
FIGURE 5.1. The Indirect Inference principle. Given the true model $y_t = f(y_{t-1}, \epsilon_t; \theta)$, an estimator of $\theta$ based on the indirect inference principle ($\hat\theta_T$ say) makes the parameters of some auxiliary model $\tilde b_T(\hat\theta_T)$ as close as possible to the parameters $b_T$ of the same auxiliary model estimated on the observations. That is, $\hat\theta_T = \arg\min_{\theta\in\Theta}\left\|\tilde b_T(\theta) - b_T\right\|_\Omega$, for some norm $\Omega$. [Diagram: the model generates simulated data $\tilde y(\theta) = (\tilde y_1(\theta), \cdots, \tilde y_T(\theta))$, on which the auxiliary model is estimated to give $\tilde b_T(\theta)$; the same auxiliary model estimated on the observed data $y = (y_1, \cdots, y_T)$ gives $b_T$.]
Finally, Gallant and Tauchen (1996) propose a simulation-based estimation method they
label efficient method of moments (EMM). Their estimator sets,
$$G_T(\theta) = \frac{1}{N}\sum_{n=1}^{N} \frac{\partial}{\partial\beta}\log f\left(y_n^\theta \mid z_{n-1}^\theta; \beta_T\right) \equiv m_T(\theta, \beta_T),$$
where $\frac{\partial}{\partial\beta}\log f(y \mid z; \beta)$ is the score of some auxiliary model f, βT is the Pseudo ML estimator
of the auxiliary model, and $(y_n^\theta)_{n=1}^N$ is a long simulation (i.e. N is very large) of Eq. (5.6), with
parameter vector set equal to θ. Finally, the weighting matrix WT in Eq. (5.8) is taken to be
any matrix $I_T^{-1}$, with $I_T$ converging in probability to:
$$I = E\left[\left|\frac{\partial}{\partial\beta}\log f(y_2 \mid z_1; \beta)\right|_2\right]. \tag{5.11}$$
To motivate this choice, note that the auxiliary score, $\frac{\partial}{\partial\beta}\log f(y_t \mid z_{t-1}; \beta_T)$, satisfies the
following first order conditions:
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\partial}{\partial\beta}\log f(y_t \mid z_{t-1}; \beta_T) = 0,$$
for some β ∗ . Likewise, we must have that with θ = θ0 , mT (θ0 , β T ) = 0, for large N. Moreover,
by the theory of pseudo maximum likelihood reviewed in Section 5.5, we know that under
regularity conditions,
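$$\sqrt{T}\,(\beta_T - \beta^*) \xrightarrow{d} N\left(0, J^{-1} I J^{-1}\right), \tag{5.12}$$
where I is as in Eq. (5.11) and $J = E\left(\frac{\partial^2}{\partial\beta\,\partial\beta^\top}\log f(y_2 \mid y_1; \beta)\right)$. Moreover, for large N,
$$\sqrt{T}\, m_T(\theta_0, \beta_T) \overset{d}{=} J\sqrt{T}\,(\beta_T - \beta^*) \xrightarrow{d} N(0, I),$$
where the last convergence follows by (5.12). By the usual MM theory, then,
$$\sqrt{T}\,(\theta_T - \theta_0) \xrightarrow{d} N\left(0, \left(D_0^\top \Sigma_0^{-1} D_0\right)^{-1}\right),$$
where $D_0 = \nabla_\theta m$ and
$$\Sigma_0 = \sum_{j=-\infty}^{\infty} E\left[\frac{\partial}{\partial\beta}\log f(y_t \mid z_{t-1}; \beta^*) \cdot \left(\frac{\partial}{\partial\beta}\log f(y_{t-j} \mid z_{t-j-1}; \beta^*)\right)^\top\right].$$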
5.7.2.1 SMM
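Let,
$$\Sigma_0 = \sum_{j=-\infty}^{\infty} E\left[\left(f_t^* - E(f_t^*)\right)\left(f_{t-j}^* - E(f_{t-j}^*)\right)^\top\right],$$
where $\tau = \lim_{T\to\infty} \frac{T}{S(T)}$, $D_0 = E(\nabla_\theta G_\infty(\theta_0)) = E(\nabla_\theta f_\infty^{\theta_0})$, and the notation G∞ means that
G. is drawn from its stationary distribution.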
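Indeed, the first order conditions satisfied by the SMM in Eq. (5.8) are,
$$0_p = [\nabla_\theta G_T(\theta_T)]^\top W_T G_T(\theta_T) = [\nabla_\theta G_T(\theta_T)]^\top W_T \cdot [G_T(\theta_0) + \nabla_\theta G_T(\theta_0)(\theta_T - \theta_0)] + o_p(1).$$
That is,
$$\begin{aligned}
\sqrt{T}\,(\theta_T - \theta_0) &\overset{d}{=} -\left([\nabla_\theta G_T(\theta_T)]^\top W_T \nabla_\theta G_T(\theta_0)\right)^{-1} [\nabla_\theta G_T(\theta_T)]^\top W_T \cdot \sqrt{T}\, G_T(\theta_0) \\
&\overset{d}{=} -\left(D_0^\top W_0 D_0\right)^{-1} D_0^\top W_0 \cdot \sqrt{T}\, G_T(\theta_0) \\
&= -\left(D_0^\top \Sigma_0^{-1} D_0\right)^{-1} D_0^\top \Sigma_0^{-1} \cdot \sqrt{T}\, G_T(\theta_0).
\end{aligned} \tag{5.14}$$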
We have,
$$\begin{aligned}
\sqrt{T}\, G_T(\theta_0) &= \sqrt{T} \cdot \left(\frac{1}{T}\sum_{t=1}^{T} f_t^* - \frac{1}{S(T)}\sum_{s=1}^{S(T)} f_s^{\theta_0}\right) \\
&= \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(f_t^* - E(f_\infty^*)\right) - \frac{\sqrt{T}}{\sqrt{S(T)}} \cdot \frac{1}{\sqrt{S(T)}}\sum_{s=1}^{S(T)}\left(f_s^{\theta_0} - E(f_\infty^{\theta_0})\right) \\
&\xrightarrow{d} N\left(0, (1 + \tau)\,\Sigma_0\right),
\end{aligned}$$
where we have used the fact that $E(f_\infty^*) = E(f_\infty^{\theta_0})$. Substituting this result into Eq. (5.14)
produces the convergence in Eq. (5.13). If $\tau = \lim_{T\to\infty} \frac{T}{S(T)} = 0$ (i.e. if the number of simulations
grows faster than the sample size), the SMM estimator is as efficient as the GMM
estimator. Moreover, inspection of (5.13) reveals that we must have $\tau = \lim_{T\to\infty} \frac{T}{S(T)} < \infty$.
This condition means that the number of simulations S (T ) cannot grow more slowly than the
sample size.
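As a hedged illustration of the SMM idea, the following Python sketch estimates the coefficient of an AR(1) model by matching two sample moments to their simulated counterparts. The AR(1) model, the choice of moments, the identity weighting matrix, the grid search and all function names are assumptions made for this example only; they are not taken from the text. The simulated sample is five times longer than the data, so that T/S(T) = 0.2.

```python
import numpy as np

def simulate_ar1(theta, shocks):
    """Simulate y_t = theta * y_{t-1} + eps_t for one theta or a vector of thetas.

    Returns an array of shape (len(shocks), number of thetas). The same shocks
    are reused for every theta (common random numbers)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    y = np.zeros_like(theta)
    path = np.empty((shocks.size, theta.size))
    for t, eps in enumerate(shocks):
        y = theta * y + eps
        path[t] = y
    return path

def moments(path):
    """Moment vector f: mean of y_t^2 and of y_t * y_{t-1}, one column per path."""
    return np.stack([(path[1:] ** 2).mean(axis=0),
                     (path[1:] * path[:-1]).mean(axis=0)])

def smm_estimate(data, grid, sim_shocks):
    """Grid-search SMM with identity weighting (an illustrative simplification)."""
    sample_m = moments(data.reshape(-1, 1))           # shape (2, 1)
    sim_m = moments(simulate_ar1(grid, sim_shocks))   # shape (2, len(grid))
    g = sample_m - sim_m                              # G_T(theta) on the grid
    objective = (g ** 2).sum(axis=0)                  # G' W G with W = identity
    return grid[int(np.argmin(objective))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = simulate_ar1(0.5, rng.standard_normal(2000))[:, 0]   # T = 2000
    grid = np.linspace(-0.9, 0.9, 91)
    print(smm_estimate(data, grid, rng.standard_normal(10000)))  # S(T) = 5T
```

Reusing the same simulated shocks for every candidate θ keeps the objective smooth in θ, which is a common practical choice rather than a requirement of the theory.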
5.7.2.2 Indirect inference
The IIP-based estimator works slightly differently. For this estimator, even if the number of
simulations S is fixed, asymptotic normality obtains without the need to impose that S goes
to infinity faster than the sample size. Basically, what really matters here is that ST goes
to infinity.
By Eq. (5.14), and the discussion in Section 5.7.1, we know that asymptotically, the first
order conditions satisfied by the IIP-based estimator are,
$$\sqrt{T}\,(\theta_T - \theta_0) \overset{d}{=} -\left(D_0^\top W_0 D_0\right)^{-1} D_0^\top W_0 \cdot \sqrt{T}\, G_T(\theta_0),$$
where GT is as in Eq. (5.10), D0 = ∇θ b (θ), and b (θ) is the solution to the limiting problem
corresponding to the estimator in (5.9), viz,
$$b(\theta) = \arg\max_{\beta}\left(\lim_{T\to\infty} \frac{1}{T}\log L\left(y_1^T; \beta\right)\right).$$
We have,
$$\begin{aligned}
\sqrt{T}\, G_T(\theta_0) &= \frac{1}{S}\sum_{s=1}^{S} \sqrt{T}\left(\beta_T - \beta_T^s(\theta_0)\right) \\
&= \frac{1}{S}\sum_{s=1}^{S} \sqrt{T}\left[(\beta_T - \beta_0) - (\beta_T^s(\theta_0) - \beta_0)\right] \\
&= \sqrt{T}\,(\beta_T - \beta_0) - \frac{1}{S}\sum_{s=1}^{S} \sqrt{T}\left(\beta_T^s(\theta_0) - \beta_0\right),
\end{aligned}$$
where β0 = b (θ0). Hence, given the independence of the sample and the simulations,
$$\sqrt{T}\, G_T(\theta_0) \xrightarrow{d} N\left(0, \left(1 + \frac{1}{S}\right) \cdot \mathrm{Asy.Var}\left(\sqrt{T}\,\beta_T\right)\right).$$
That is, asymptotically S can be fixed with respect to T .
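The following Python sketch illustrates indirect inference with a fixed number of simulations S. It is only an example under stated assumptions: the structural model is taken to be an MA(1) process, the auxiliary model is an AR(1) regression estimated by least squares, and the distance is minimized by grid search. None of these choices, nor the function names, come from the text.

```python
import numpy as np

def ar1_coefficient(y):
    """Auxiliary parameter: least-squares slope of y_t on y_{t-1} (no intercept)."""
    return float(y[1:] @ y[:-1] / (y[:-1] @ y[:-1]))

def simulate_ma1(theta, shocks):
    """Structural model (assumed): y_t = eps_t + theta * eps_{t-1}."""
    return shocks[1:] + theta * shocks[:-1]

def indirect_inference(data, grid, sim_shocks):
    """Minimize |beta_T - (1/S) sum_s beta_T^s(theta)| over a grid of theta.

    sim_shocks has shape (S, T + 1); the same shocks are reused for every theta."""
    beta_data = ar1_coefficient(data)
    distances = []
    for theta in grid:
        beta_sim = np.mean([ar1_coefficient(simulate_ma1(theta, e))
                            for e in sim_shocks])
        distances.append(abs(beta_data - beta_sim))
    return grid[int(np.argmin(distances))]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, S = 5000, 5                       # S stays fixed while T is large
    data = simulate_ma1(0.5, rng.standard_normal(T + 1))
    grid = np.linspace(-0.9, 0.9, 91)    # invertible region of the MA(1)
    print(indirect_inference(data, grid, rng.standard_normal((S, T + 1))))
```

The grid is restricted to |θ| < 1 because the map from θ to the first-order autocorrelation θ/(1 + θ²) is one-to-one only on that region.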
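We have,
$$\theta_T = \arg\min_{\theta}\, m_T(\theta, \beta_T)^\top W_T\, m_T(\theta, \beta_T), \qquad m_T(\theta, \beta_T) = \frac{1}{N}\sum_{n=1}^{N} \frac{\partial}{\partial\beta}\log f\left(y_n^\theta \mid z_{n-1}^\theta; \beta_T\right).$$
or
$$\sqrt{T}\,(\theta_T - \theta_0) \overset{d}{=} -\left(\nabla_\theta m_T(\theta_0, \beta_T)^\top W_T \nabla_\theta m_T(\theta_0, \beta_T)\right)^{-1} \nabla_\theta m_T(\theta_0, \beta_T)^\top W_T \sqrt{T}\, m_T(\theta_0, \beta_T).$$
Hence,
$$\sqrt{T}\,(\theta_T - \theta_0) \xrightarrow{d} N(0, V),$$
where,
$$V = \left(\nabla_\theta m^\top W \nabla_\theta m\right)^{-1} \nabla_\theta m^\top W I W^\top \nabla_\theta m \left(\nabla_\theta m^\top W \nabla_\theta m\right)^{-1}.$$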
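A small numerical check of the sandwich formula for V may help. The Python sketch below uses randomly generated matrices (purely illustrative, with D standing in for ∇θ m) to show that choosing W equal to the inverse of I collapses V to (D' I⁻¹ D)⁻¹, and that any other weighting, here the identity, gives a variance at least as large in the matrix sense.

```python
import numpy as np

def emm_asymptotic_variance(D, W, I):
    """V = (D'WD)^{-1} D'W I W'D (D'WD)^{-1}, with D playing the role of grad_theta m."""
    bread = np.linalg.inv(D.T @ W @ D)
    meat = D.T @ W @ I @ W.T @ D
    return bread @ meat @ bread

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((3, 2))          # 3 auxiliary scores, 2 parameters
    A = rng.standard_normal((3, 3))
    I = A @ A.T + 3.0 * np.eye(3)            # a symmetric positive definite "I"
    V_opt = emm_asymptotic_variance(D, np.linalg.inv(I), I)
    V_id = emm_asymptotic_variance(D, np.eye(3), I)
    print(np.allclose(V_opt, np.linalg.inv(D.T @ np.linalg.inv(I) @ D)))
    print(np.linalg.eigvalsh(V_id - V_opt))  # nonnegative up to rounding error
```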
5.7.3 Advances
The three estimators that we have examined in Sections 5.7.1-5.7.2 are general-purpose but,
in general, they do not lead to asymptotic efficiency unless the true score belongs to the
span of the moment conditions, as we shall explain in Section 5.8. There exist other
simulation-based methods, which aim to approximate the likelihood function through simulations (e.g.,
Lee (1995), Hajivassiliou and McFadden (1998)). While these methods lead to asymptotically
efficient estimators, they address specific estimation problems.
There exist estimators that are both general purpose and that can lead to asymptotic effi-
ciency. Fermanian and Salanié (2004) consider an estimator that relies on approximating the
likelihood function through kernel estimates obtained simulating the model of interest. Carrasco,
Chernov, Florens and Ghysels (2007) rely on a “continuum of moment conditions” matching
model-based (simulated) characteristic functions to data-based characteristic functions. Al-
tissimo and Mele (2009) propose an estimator that minimizes a certain distance between con-
ditional densities on the observations and the conditional densities from data simulated from
the model, both estimated through kernel methods.
5.8. Spanning scores c
°by A. Mele
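we have,
$$\mathrm{var}_{MLE}^{-1} = \mathrm{var}(s) = B\,\mathrm{var}(s_f)\,B^\top + \mathrm{var}(s \mid s_f) = \mathrm{cov}(s, s_f)\,\mathrm{var}(s_f)^{-1}\,\mathrm{cov}(s, s_f)^\top + \mathrm{var}(s \mid s_f), \tag{5.17}$$
where varMLE denotes the asymptotic variance of the MLE. We claim that:
Indeed,
$$\begin{aligned}
\nabla_\theta m(\theta_0, \beta^*) &= \frac{\partial}{\partial\theta}\left[\int \frac{\partial}{\partial\beta}\log f(y; \beta^*)\, p(y, \theta)\, dy\right]_{\theta=\theta_0} \\
&= \int \frac{\partial}{\partial\beta}\log f(y; \beta^*)\, \frac{\partial}{\partial\theta}\, p(y, \theta_0)\, dy \\
&= \int \frac{\partial}{\partial\beta}\log f(y; \beta^*) \left(\frac{\partial}{\partial\theta}\log p(y, \theta_0)\right) p(y, \theta_0)\, dy \\
&= \mathrm{cov}(s, s_f)^\top,
\end{aligned}$$
where p (y, θ) is the true density. Next, substitute Eq. (5.18) into Eq. (5.17),
$$\mathrm{var}_{MLE}^{-1} = \nabla_\theta m^\top\, \mathrm{var}(s_f)^{-1}\, \nabla_\theta m + \mathrm{var}(s \mid s_f) = \mathrm{var}_{EMM}^{-1} + \mathrm{var}(s \mid s_f).$$
Therefore, the EMM estimator achieves the Cramer-Rao lower bound under the spanning
condition in Eq. (5.16).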
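The variance decomposition in Eq. (5.17) can be checked numerically. The Python sketch below treats var(s | sf) as the variance of the residual from the linear projection of s on sf, which is an interpretation assumed for this example; the simulated "scores" are arbitrary random vectors, not scores of any particular model. When s lies exactly in the span of sf, the residual term vanishes, which is the spanning case in which the two variances coincide.

```python
import numpy as np

def score_variance_decomposition(s, sf):
    """Split var(s) into the part explained by sf and a residual part.

    s: (n, p) array of true scores; sf: (n, q) array of auxiliary scores.
    Returns (var_s, explained, residual) with var_s = explained + residual."""
    n = s.shape[0]
    s_c = s - s.mean(axis=0)
    sf_c = sf - sf.mean(axis=0)
    var_s = s_c.T @ s_c / n
    var_sf = sf_c.T @ sf_c / n
    cov_s_sf = s_c.T @ sf_c / n
    B = cov_s_sf @ np.linalg.inv(var_sf)      # projection coefficients
    resid = s_c - sf_c @ B.T
    explained = B @ var_sf @ B.T              # cov(s,sf) var(sf)^{-1} cov(s,sf)'
    residual = resid.T @ resid / n            # stands in for var(s | sf)
    return var_s, explained, residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sf = rng.standard_normal((500, 3))
    s = sf[:, :2] + 0.5 * rng.standard_normal((500, 2))   # not spanned by sf
    var_s, explained, residual = score_variance_decomposition(s, sf)
    print(np.allclose(var_s, explained + residual))
```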
where W is a multidimensional process and (b, Σ) satisfy some regularity conditions we single
out below. This appendix analyzes situations in which the original partially observed system
(5.19) can be estimated by augmenting it with a number of observable deterministic functions
of the state. In many situations of interest, such deterministic functions are suggested by asset
pricing theories in a natural way. Typical examples include derivative asset price functions or
any deterministic function(als) of asset prices (e.g., asset returns, bond yields, implied volatility,
etc.). The idea to use predictions of asset pricing theories to improve the fit of models
with unobservable factors is not new (see, e.g., Christensen (1992), Pastorello, Renault and
Touzi (2000), Chernov and Ghysels (2000), Singleton (2001, sections 3.2 and 3.3), and
Pastorello, Patilea and Renault (2003)). In this appendix, we provide a theoretical description of
the mechanism leading to efficiency within the class of our estimators.
We consider a standard Markov pricing setting. For fixed t ≥ 0, we let M be the expiration
date of a contingent claim with rational price process c = {c(y(τ ), M − τ )}τ ∈[t,M) , and let
{z(y(τ ))}τ ∈[t,M] and Π(y) be the associated intermediate payoff process and final payoff function,
respectively. Let ∂/ ∂τ + L be the usual infinitesimal generator of (5.19) taken under the risk-
neutral measure. In a frictionless economy without arbitrage opportunities, c is the solution to
the following partial differential equation:
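$$\begin{cases}
0 = \left(\dfrac{\partial}{\partial\tau} + L - R\right) c(y, M - \tau) + z(y), & \forall (y, \tau) \in Y \times [t, M) \\
c(y, 0) = \Pi(y), & \forall y \in Y
\end{cases} \tag{5.20}$$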
where R ≡ R(y) is the short-term rate. We call prediction function any continuous and twice
differentiable function c (y; M − τ ) that solves the partial differential equation (5.20).
We now augment system (5.19) with d − q ∗ prediction functions. Precisely, we let:
where Γ ⊂ Rpγ is a compact parameter set containing additional parameters. These new pa-
rameters arise from the change of measure leading to the pricing model (5.20), and are now
part of our estimation problem.
We assume that the pricing model (5.20) is correctly specified. That is, all contingent claim
prices in the economy are taken to be generated by the prediction function c(y, M − τ ) for some
(θ0 , γ 0 ) ∈ Θ × Γ. For simplicity, we also consider a stylized situation in which all contingent
claims have the same contractual characteristics specified by C ≡ (z, Π). More generally, one
may define a series of classes of contingent claims {Cj }Jj=1 , where class of contingent claims j
has contractual characteristics specified by Cj ≡ (zj , Πj ).4 The number of prediction functions
that we would introduce in this case would be equal to $d - q^* = \sum_{j=1}^{J} M_j$, where Mj is the
number of prediction functions within class of assets j. To keep the presentation simple, we do
not consider such a more general situation here.
4 As an example, assets belonging to class C1 can be European options; assets belonging to class C2 can be bonds; and so on.
5.9. Asset pricing, prediction functions, and statistical inference c
°by A. Mele
[Figure 5.2: diagram of the mapping from the domain Y to the domain Φ.]
FIGURE 5.2. Asset pricing, the Markov property, and statistical efficiency. Y is the domain on which
the partially observed primitive state process $y \equiv (y^o\ y^u)^\top$ takes values, Φ is the domain on which
the observed system $\phi \equiv (y^o\ C(y))^\top$ takes values in Markovian economies, and C(y) is a contingent
claim price process in $R^{d-q^*}$. Let $\phi^c = (y^o, c(y, \ell_1), \cdots, c(y, \ell_{d-q^*}))$, where $\{c(y, \ell_j)\}_{j=1}^{d-q^*}$ forms an
intertemporal cohort of contingent claim prices, as in Definition 5.3. If the local restrictions of φ are
one-to-one and onto, statistical inference about θ and γ can be made, using information about the price
of derivative contracts, $\phi^c$. If φ is also globally invertible, statistical inference can lead to first-order
asymptotic efficiency, once conditioned upon $\phi^c$.
Our objective is to provide estimators of the parameter vector (θ0 , γ 0 ) under which observa-
tions were generated. In exactly the same spirit as for the estimators considered in the main
text, we want any estimator of (θ0 , γ 0 ) to make the finite dimensional distributions of φ implied
by model (5.19) and (5.20) as close as possible to their sample counterparts. Let Φ ⊆ Rd be the
domain on which φ takes values. As illustrated in Figure 5.2, our program is to move from the
“unfeasible” domain Y of the original state variables in y (observables and not) to the domain Φ
on which all observable variables take values. Ideally, we would like to implement such a change
in domain in order to recover as much information as possible on the original unobserved process
in (5.19). Clearly, φ is fully revealing whenever it is globally invertible. However, we will
show that estimation is feasible even when φ is only locally one-to-one.
An important feature of the theory in this section is that it does not hinge upon the availability
of contingent prices data covering the same sample period covered by the observables in (5.19).
First, the price of a given contingent claim is typically not available for a long sample period.
As an example, available option data often include option prices with a life span smaller than
the usual sample span of the underlying asset prices; in contrast, it is common to observe long
time series of option prices having the same maturity. Second, the price of a single contingent
claim depends on time-to-maturity of the claim; therefore, it does not satisfy the stationarity
assumptions maintained in this paper. To address these issues, we deal with data on assets
having the same characteristics at each point in time. Precisely, consider the data generated by
the following random processes:
in Definition 5.3, they would be deterministic functions of y, and hence stationary. We now
develop conditions ensuring both feasibility and first-order efficiency of the class of simulation-
based estimators, as applied to this kind of data. Let ā denote the matrix having the first q ∗
rows of Σ, where a is the diffusion matrix in (5.??). Let ∇C denote the Jacobian of C with
respect to y. We have:
Theorem 5.4. (Asset pricing and Cramer-Rao lower bound) Suppose we observe an intertemporal
$(\ell, d - q^*)$-cohort of contingent claim prices $c(\tau, \ell)$, and that there exist prediction functions
C in $R^{d-q^*}$ with the property that for θ = θ0 and γ = γ0,
$$\begin{pmatrix} \bar{a}(\tau) \cdot \Sigma(\tau)^{-1} \\ \nabla C(\tau) \end{pmatrix} \neq 0, \quad P \otimes d\tau\text{-a.s. all } \tau \in [t, t+1], \tag{5.21}$$
where C satisfies the initial condition $C(t) = c(t, \ell) \equiv (c(y(t), \ell_1), \cdots, c(y(t), \ell_{d-q^*}))$. Let
$\phi_t^c = (y^o(t), c(y(t), \ell_1), \cdots, c(y(t), \ell_{d-q^*}))$. Then, any simulation-based estimator applied to $\phi_t^c$
is feasible. Moreover, assume $\phi_t^c$ is also Markov. Then, any estimator with a span of moment
conditions for $\phi_t^c$ that also spans the true score attains the Cramer-Rao lower bound, with
respect to the fields generated by $\phi_t^c$.
According to Theorem 5.4, any estimator is feasible, whenever φ is locally invertible for a
time span equal to the sampling interval. As Figure 5.2 illustrates, condition (5.21) is satisfied
whenever φ is locally one-to-one and onto.5 If φ is also globally invertible for the same time span,
φc is Markov. The last part of this theorem says that in this case, any estimator is obviously
asymptotically efficient. We emphasize that this conclusion is about first-order efficiency in the
joint estimation of θ and γ given the observations on φc . It is not a claim about some estimator
being first-order efficient in the estimation of θ, when y is fully observable.
Naturally, condition (5.21) does not ensure that φ is globally one-to-one and onto. In other
terms, φ might have many locally invertible restrictions.6 In practice, φ might fail to be globally
invertible because monotonicity properties of φ may break down in multidimensional diffusion
models. In models with stochastic volatility, for example, option prices can be decreasing in the
underlying asset price (see Bergman, Grundy and Wiener (1996)); and in the corresponding
stochastic volatility yield curve models, medium-long term bond prices can be increasing in the
short-term rate (see Mele (2003)). Intuitively, these pathologies may arise because there is no
guarantee that the solution to a stochastic differential system is nondecreasing in the initial
condition of one of its components, as is instead the case in the scalar setting.
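The gap between local and global invertibility can be made concrete with the mapping φ(y1, y2) = (e^{y1} cos y2, e^{y1} sin y2) mentioned in the footnotes to this section. The Python sketch below is only an illustration (the function names are hypothetical): it evaluates the Jacobian determinant, which equals e^{2 y1} and is therefore never zero, and then shows two distinct points with the same image.

```python
import numpy as np

def phi(y1, y2):
    """The mapping (y1, y2) -> (exp(y1) cos(y2), exp(y1) sin(y2))."""
    return np.array([np.exp(y1) * np.cos(y2), np.exp(y1) * np.sin(y2)])

def jacobian_det(y1, y2):
    """Determinant of the analytical Jacobian of phi at (y1, y2)."""
    jac = np.array([
        [np.exp(y1) * np.cos(y2), -np.exp(y1) * np.sin(y2)],
        [np.exp(y1) * np.sin(y2),  np.exp(y1) * np.cos(y2)],
    ])
    return float(np.linalg.det(jac))

if __name__ == "__main__":
    # Locally invertible everywhere: det J = exp(2 * y1) > 0.
    print(jacobian_det(0.3, 1.2), np.exp(0.6))
    # Not globally invertible: two different inputs, same output.
    print(phi(0.0, 0.0), phi(0.0, 2.0 * np.pi))
```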
When all components of vector y o represent the prices of assets actively traded in frictionless
markets, (5.21) corresponds to a condition ensuring market completeness in the sense of Harrison
and Pliska (1983). As an example, condition (5.21) for Heston’s (1993) model is ∂c/∂σ ≠ 0
P ⊗ dτ -a.s., where σ denotes instantaneous volatility of the price process. This condition is
satisfied by Heston’s model. In fact, Romano and Touzi (1997) showed that within a fairly
general class of stochastic volatility models, option prices are always strictly increasing in σ
whenever they are convex in Q. Theorem 5.4 can be used to implement efficient estimators in
other complex multidimensional models. Consider for example a three-factor model of the yield
curve. Consider a state-vector (r, σ, ℓ), where r is the short-term rate and σ, ℓ are additional
5 Local invertibility of φ means that for every y ∈ Y , there exists an open set Y∗ containing y such that the restriction of φ to
Y∗ is invertible. And φ is locally invertible on Y∗ if det Jφ ≠ 0 (where Jφ is the Jacobian of φ), which is condition (5.21).
6 As an example, consider the mapping R² ↦ R² defined as φ(y1, y2) = (e^{y1} cos y2, e^{y1} sin y2). The Jacobian satisfies
det Jφ(y1, y2) = e^{2 y1}, yet φ is 2π-periodic with respect to y2. For example, φ(0, 2π) = φ(0, 0).
factors (such as, say, instantaneous short-term rate volatility and a central tendency factor). Let
$u^{(i)} = u(r(\tau), \sigma(\tau), \ell(\tau); M_i - \tau)$ be the time τ rational price of a pure discount bond expiring
at Mi ≥ τ , i = 1, 2, and take M1 < M2 . Let $\phi \equiv (r, u^{(1)}, u^{(2)})$. Condition (5.21) for this model
is then,
$$u_\sigma^{(1)} u_\ell^{(2)} - u_\ell^{(1)} u_\sigma^{(2)} \neq 0, \quad P \otimes dt\text{-a.s. } \tau \in [t, t+1], \tag{5.22}$$
where subscripts denote partial derivatives. It is easily checked that this same condition must be
satisfied by models with correlated Brownian motions and by yet more general models. Classes
of models of the short-term rate for which condition (5.22) holds are more intricate to identify
than in the European option pricing case seen above (see Mele (2003)).
5.10. Appendix 1: Proof of selected results c
°by A. Mele
That is,
$$\Pr\left(\bigcap_{i=1}^{3} A_i\right) = \Pr(A_1 \cap A_2) \cdot \Pr(A_3 \mid A_1 \cap A_2) = \Pr(A_1) \cdot \Pr(A_2 \mid A_1) \cdot \Pr(A_3 \mid A_1 \cap A_2).$$
5.11. Appendix 2: Collected notions and results c
°by A. Mele
Almost sure convergence. A sequence of random vectors {xT } converges almost surely to the
random vector x̃ if, for each i = 1, 2, · · ·, N , we have:
$$\Pr\left(\omega : x_{Ti}(\omega) \to \tilde{x}_i\right) = 1,$$
where ω denotes the entire random sequence xT i . This is succinctly written as $x_T \xrightarrow{a.s.} \tilde{x}$.
Next, assume that the second order moments of all xi are finite. We have:
Convergence in distribution. Let {fT (·)}T be the sequence of probability distributions (that is,
fT (x) = Pr (xT ≤ x)) of the sequence of the random vectors {xT }. Let x̃ be a random vector with
probability distribution f (x). A sequence {xT } converges in distribution to x̃ if, for each i = 1, 2, ···, N ,
we have:
$$\lim_{T\to\infty} f_T(x) = f(x).$$
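This is succinctly written as $x_T \xrightarrow{d} \tilde{x}$.
The following two results are useful to the purpose of this chapter:
Slutzky’s theorem. If $y_T \xrightarrow{p} \bar{y}$ and $x_T \xrightarrow{d} \tilde{x}$, then:
$$y_T \cdot x_T \xrightarrow{d} \bar{y} \cdot \tilde{x}.$$
The following example illustrates the Cramer-Wold device. If $\lambda^\top \cdot x_T \xrightarrow{d} N\left(0; \lambda^\top \Sigma \lambda\right)$, then
$x_T \xrightarrow{d} N(0; \Sigma)$.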
We now state two laws about convergence in probability.
Weak law (No. 1) (Khinchine). Let {xT } be an i.i.d. sequence satisfying E(xT ) = μ < ∞ ∀T . We
have:
$$\bar{x}_T \equiv \frac{1}{T}\sum_{t=1}^{T} x_t \xrightarrow{p} \mu.$$
We now state and provide a proof of the central limit theorem in a simple setting.
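Central Limit Theorem. Let {xT } be an i.i.d. sequence, satisfying E(xT ) = μ < ∞ and $E\left[(x_T - \mu)^2\right] = \sigma^2 < \infty$ ∀T .
Let $\bar{x}_T \equiv \frac{1}{T}\sum_{t=1}^{T} x_t$. We have,
$$\frac{\sqrt{T}\,(\bar{x}_T - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
The multidimensional version of this theorem requires a mere change in notation. For the proof, the
classic method relies on the characteristic functions. Let:
$$\varphi(t) \equiv E\left(e^{itx}\right) = \int e^{itx} f(x)\, dx, \quad i \equiv \sqrt{-1}.$$
We have $\left.\frac{\partial^r}{\partial t^r}\varphi(t)\right|_{t=0} = i^r m^{(r)}$, where $m^{(r)}$ is the r-th order moment. By a Taylor’s expansion,
$$\varphi(t) = \varphi(0) + \left.\frac{\partial}{\partial t}\varphi(t)\right|_{t=0} t + \frac{1}{2}\left.\frac{\partial^2}{\partial t^2}\varphi(t)\right|_{t=0} t^2 + \cdots = 1 + i\, m^{(1)} t - m^{(2)} \frac{1}{2} t^2 + \cdots.$$
Next, let $\bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t$, and consider the random variable,
$$Y_T \equiv \frac{\sqrt{T}\,(\bar{x}_T - \mu)}{\sigma} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T} \frac{x_t - \mu}{\sigma}.$$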
The characteristic function of YT is the product of the characteristic functions of $a_t \equiv \frac{x_t - \mu}{\sqrt{T}\,\sigma}$,
which are all the same: $\varphi_{Y_T}(t) = (\varphi_a(t))^T$, where $\varphi_a(t) = 1 - \frac{t^2}{2T} + \cdots$. Therefore,
$$\varphi_{Y_T}(t) = \varphi\left(\frac{t}{\sqrt{T}}\right)^T = \left(1 - \frac{1}{2}\frac{t^2}{T} + o\left(T^{-1}\right)\right)^T.$$
Clearly, $\lim_{T\to\infty} \varphi_{Y_T}(t) = e^{-\frac{1}{2}t^2}$, which is the characteristic function of a standard Gaussian variable.
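The two steps of this argument can be checked numerically. The Python sketch below is an illustration under assumptions of its own choosing: it draws i.i.d. exponential variables (for which μ = σ = 1), forms the standardized mean YT across many replications, and separately evaluates the approximation (1 − t²/(2T))^T to the characteristic function for a large T.

```python
import numpy as np

def standardized_means(rng, n_rep, T):
    """Draw n_rep samples of size T from Exp(1) and return Y_T for each sample."""
    x = rng.exponential(scale=1.0, size=(n_rep, T))   # mu = 1, sigma = 1
    return np.sqrt(T) * (x.mean(axis=1) - 1.0) / 1.0

def cf_approximation(t, T):
    """Leading terms of the characteristic function of Y_T: (1 - t^2 / (2T))^T."""
    return (1.0 - t ** 2 / (2.0 * T)) ** T

if __name__ == "__main__":
    y = standardized_means(np.random.default_rng(0), n_rep=5000, T=500)
    print(y.mean(), y.var())                      # close to 0 and 1
    print(cf_approximation(1.0, 10 ** 6), np.exp(-0.5))
```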
5.12. Appendix 3: Theory for maximum likelihood estimation c
°by A. Mele
Consider the c-parametrized curves θ(c) = c ⊙ (θ0 − θ̂T ) + θ̂T where, for all c ∈ (0, 1)^p and θ ∈ Θ, c ⊙ θ
denotes a vector in Θ where the ith element is c(i) θ(i) . By the intermediate value theorem, there exists
then a c∗ in (0, 1)^p such that we have almost surely:
where the supremum is taken over the set of all the observations. Since $\hat\theta_T \xrightarrow{a.s.} \theta_0$, we also have that
$\theta_T^* \xrightarrow{a.s.} \theta_0$. Moreover, by the law of large numbers,
$$H_T(\theta_0) = \frac{1}{T}\sum_{t=1}^{T} H(\theta_0 \mid y_t) \xrightarrow{p} E\left[H(\theta_0 \mid y_t)\right] = -J(\theta_0). \tag{5A.2}$$
Since H is continuous in θ uniformly in y, the inequality in (5A.1), and (5A.2) both imply that:
$$H_T(\theta_T^*) \xrightarrow{a.s.} -J(\theta_0).$$
Therefore, as T → ∞,
$$\sqrt{T}\left(\hat\theta_T - \theta_0\right) = -H_T^{-1}(\theta_0) \cdot s_T(\theta_0)\sqrt{T} = J^{-1} \cdot \sqrt{T}\, s_T(\theta_0).$$
By the central limit theorem, and E (sT ) = 0, the score, $s_T(\theta_0) = \frac{1}{T}\sum_{t=1}^{T} s(\theta_0, y_t)$, is such that
$$\sqrt{T} \cdot s_T(\theta_0) \xrightarrow{d} N\left(0, \mathrm{var}(s(\theta_0, y_t))\right),$$
where
$$\mathrm{var}(s(\theta_0, y_t)) = J.$$
The result follows by Slutzky’s theorem and the symmetry of J .
Finally, one should show the existence of a sequence θ̂T converging a.s. to θ0 . Proofs on this type
of convergence can be found in Amemiya (1985), or in Newey and McFadden (1994).
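As a hedged numerical illustration of this result, the Python sketch below uses an exponential density p(y; θ) = θ e^{−θy}, which is an example chosen here and not taken from the text. For this density the score is 1/θ − y, the information is J = 1/θ², and the MLE is the reciprocal of the sample mean, so the scaled estimation error should have variance close to J⁻¹ = θ0² in large samples.

```python
import numpy as np

def mle_rate(sample):
    """MLE of the rate of an exponential density: one over the sample mean."""
    return 1.0 / sample.mean(axis=-1)

def scaled_errors(rng, theta0, n_rep, T):
    """sqrt(T) * (theta_hat - theta0) across n_rep independent samples of size T."""
    y = rng.exponential(scale=1.0 / theta0, size=(n_rep, T))
    return np.sqrt(T) * (mle_rate(y) - theta0)

if __name__ == "__main__":
    theta0 = 2.0
    z = scaled_errors(np.random.default_rng(0), theta0, n_rep=4000, T=400)
    print(z.var(), theta0 ** 2)   # Monte Carlo variance versus J^{-1}
```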
5.13. Appendix 4: Dependent processes c
°by A. Mele
we say that {xt } is weakly dependent. Of a process, we say it is “nonergodic,” when it exhibits such a
strong dependence that it does not even satisfy the law of large numbers.
• Stationarity
• Weak dependence
• Ergodicity
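$$x_t \equiv c^\top \nabla_\theta \ell_t(\theta_0),$$
and because xt is a martingale difference, E (xt xt−i ) = E [E ( xt · xt−i | Ft−i )] = E [E ( xt | Ft−i ) · xt−i ] =
0, for all i. That is, xt and xt−i are mutually uncorrelated. It follows that,
$$\begin{aligned}
\mathrm{var}\left(\sum_{t=1}^{T} x_t\right) &= \sum_{t=1}^{T} E\left(x_t^2\right) \\
&= \sum_{t=1}^{T} c^\top E_{\theta_0}\left(|\nabla_\theta \ell_t(\theta_0)|_2\right) c \\
&= \sum_{t=1}^{T} c^\top E_{\theta_0}\left[E_{\theta_0}\left(|\nabla_\theta \ell_t(\theta_0)|_2 \mid F_{t-1}\right)\right] c \\
&= -\sum_{t=1}^{T} c^\top E_{\theta_0}\left[J_{t-1}(\theta_0)\right] c \\
&= -c^\top \left[\sum_{t=1}^{T} E_{\theta_0}\left(J_{t-1}(\theta_0)\right)\right] c.
\end{aligned}$$
Next, define:
$$\bar{x}_T \equiv \frac{1}{T}\sum_{t=1}^{T} x_t \quad \text{and} \quad \bar\sigma_T^2 \equiv \frac{1}{T}\sum_{t=1}^{T} E\left(x_t^2\right) = -c^\top \left[\frac{1}{T}\sum_{t=1}^{T} E_{\theta_0}\left(J_{t-1}(\theta_0)\right)\right] c.$$
Under the conditions underlying the central limit theorem for weakly dependent processes provided
earlier, to be spelled out below,
$$\frac{\sqrt{T}\,\bar{x}_T}{\bar\sigma_T} \xrightarrow{d} N(0, 1).$$
By the Cramer-Wold device,
$$\left[\frac{1}{T}\sum_{t=1}^{T} E_{\theta_0}\left(J_{t-1}(\theta_0)\right)\right]^{-1/2} \frac{1}{\sqrt{T}}\sum_{t=1}^{T} \nabla_\theta \ell_t(\theta_0) \xrightarrow{d} N(0, I_p).$$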
5.14. Appendix 5: Proof of Theorem 5.4 c
°by A. Mele
By Σ(τ ) full rank P ⊗ dτ -a.s., and Itô’s lemma, φ satisfies, for τ ∈ [t, t + 1],
$$\begin{cases}
dy^o(\tau) = b^o(\tau)\,d\tau + F(\tau)\Sigma(\tau)\,dW(\tau) \\
dc(\tau) = b^c(\tau)\,d\tau + \nabla c(\tau)\Sigma(\tau)\,dW(\tau)
\end{cases}$$
where $b^o$ and $b^c$ are, respectively, q∗-dimensional and (d − q∗)-dimensional measurable functions, and
$F(\tau) \equiv \bar{a}(\tau) \cdot \Sigma(\tau)^{-1}$ P ⊗ dτ -a.s. Under condition (5.21), π t is not degenerate. Furthermore, $C(y(t); \ell) \equiv C(t)$
is deterministic in $\ell \equiv (\ell_1, \cdots, \ell_{d-q^*})$. That is, for all $(\bar{c}, \bar{c}^+) \in R^d \times R^d$, there exists a function μ
such that for any neighbourhood $N(\bar{c}^+)$ of $\bar{c}^+$, there exists another neighbourhood $N(\mu(\bar{c}^+))$ of $\mu(\bar{c}^+)$
such that,
$$\begin{aligned}
&\left\{\omega \in \Omega : \phi\left(y(t+1), M - (t+1)1_{d-q^*}\right) \in N(\bar{c}^+) \mid \phi\left(y(t), M - t1_{d-q^*}\right) = \bar{c}\right\} \\
&= \left\{\omega \in \Omega : \left(y^o(t+1), c(y(t+1), M_1 - t), \cdots, c(y(t+1), M_{d-q^*} - t)\right) \in N(\mu(\bar{c}^+)) \mid \phi\left(y(t), M - t1_{d-q^*}\right) = \bar{c}\right\} \\
&= \left\{\omega \in \Omega : \left(y^o(t+1), c(y(t+1), M_1 - t), \cdots, c(y(t+1), M_{d-q^*} - t)\right) \in N(\mu(\bar{c}^+)) \mid \left(y^o(t), c(y(t), M_1 - t), \cdots, c(y(t), M_{d-q^*} - t)\right) = \bar{c}\right\}
\end{aligned}$$
where the last equality follows by the definition of φ. In particular, the transition laws of $\phi_t^c$ given
$\phi_{t-1}^c$ are not degenerate; and $\phi_t^c$ is stationary. The feasibility of simulation based method of moments
estimation is proved. The efficiency claim follows by the Markov property of φ, and the usual score
martingale difference argument. ∎
References
Altissimo, F. and A. Mele (2009): “Simulated Nonparametric Estimation of Dynamic Models.”
Review of Economic Studies 76, 413-450.
Amemiya, T. (1985): Advanced Econometrics. Cambridge, Mass.: Harvard University Press.
Bergman, Y. Z., B. D. Grundy, and Z. Wiener (1996): “General Properties of Option Prices.”
Journal of Finance 51, 1573-1610.
Carrasco, M., M. Chernov, J.-P. Florens and E. Ghysels (2007): “Efficient Estimation of Gen-
eral Dynamic Models with a Continuum of Moment Conditions.” Journal of Econometrics
140, 529-573.
Chernov, M. and E. Ghysels (2000): “A Study towards a Unified Approach to the Joint Esti-
mation of Objective and Risk-Neutral Measures for the Purpose of Options Valuation.”
Journal of Financial Economics 56, 407-458.
Christensen, B. J. (1992): “Asset Prices and the Empirical Martingale Model.” Working paper,
New York University.
Duffie, D. and K. J. Singleton (1993): “Simulated Moments Estimation of Markov Models of
Asset Prices.” Econometrica 61, 929-952.
Fermanian, J.-D. and B. Salanié (2004): “A Nonparametric Simulated Maximum Likelihood
Estimation Method.” Econometric Theory 20, 701-734.
Fisher, R. A. (1912): “On an Absolute Criterion for Fitting Frequency Curves.” Messenger of
Mathematics 41, 155-157.
Gallant, A. R. and G. Tauchen (1996): “Which Moments to Match?” Econometric Theory 12,
657-681.
Gauss, C. F. (1816): “Bestimmung der Genauigkeit der Beobachtungen.” Zeitschrift für Astronomie
und Verwandte Wissenschaften 1, 185-196.
Gouriéroux, C., A. Monfort and E. Renault (1993): “Indirect Inference.” Journal of Applied
Econometrics 8, S85-S118.
Hajivassiliou, V. and D. McFadden (1998): “The Method of Simulated Scores for the Estima-
tion of Limited-Dependent Variable Models.” Econometrica 66, 863-896.
Hansen, L. P. (1982): “Large Sample Properties of Generalized Method of Moments Estima-
tors.” Econometrica 50, 1029-1054.
Hansen, L. P. and K. J. Singleton (1982): “Generalized Instrumental Variables Estimation of
Nonlinear Rational Expectations Models.” Econometrica 50, 1269-1286.
Hansen, L. P. and K. J. Singleton (1983): “Stochastic Consumption, Risk Aversion, and the
Temporal Behavior of Asset Returns.” Journal of Political Economy 91, 249-265.
Harrison, J. M. and S. R. Pliska (1983): “A Stochastic Calculus Model of Continuous Trading:
Complete Markets.” Stochastic Processes and their Applications 15, 313-316.
Heston, S. (1993): “A Closed-Form Solution for Options with Stochastic Volatility with Ap-
plications to Bond and Currency Options.” Review of Financial Studies 6, 327-343.
Laroque, G. and B. Salanié (1994): “Estimating the Canonical Disequilibrium Model: Asymp-
totic Theory and Finite Sample Properties.” Journal of Econometrics 62, 165-210.
Lee, B-S. and B. F. Ingram (1991): “Simulation Estimation of Time-Series Models.” Journal
of Econometrics 47, 197-207.
Mele, A. (2003): “Fundamental Properties of Bond Prices in Models of the Short-Term Rate.”
Review of Financial Studies 16, 679-716.
Newey, W. K. and D. L. McFadden (1994): “Large Sample Estimation and Hypothesis Test-
ing.” In: Engle, R. F. and D. L. McFadden (Editors): Handbook of Econometrics, Vol. 4,
Chapter 36, 2111-2245. Amsterdam: Elsevier.
Neyman, J. and E. S. Pearson (1928): “On the Use and Interpretation of Certain Test Criteria
for Purposes of Statistical Inference.” Biometrika 20A, 175-240, 263-294.
Pakes, A. and D. Pollard (1989): “Simulation and the Asymptotics of Optimization Estima-
tors.” Econometrica 57, 1027-1057.
Pastorello, S., E. Renault and N. Touzi (2000): “Statistical Inference for Random-Variance
Option Pricing.” Journal of Business and Economic Statistics 18, 358-367.
Pastorello, S., V. Patilea, and E. Renault (2003): “Iterative and Recursive Estimation in
Structural Non Adaptive Models.” Journal of Business and Economic Statistics 21, 449-
509.
Romano, M. and N. Touzi (1997): “Contingent Claims and Market Completeness in a Stochas-
tic Volatility Model.” Mathematical Finance 7, 399-412.
Singleton, K. J. (2001): “Estimation of Affine Asset Pricing Models Using the Empirical Char-
acteristic Function.” Journal of Econometrics 102, 111-141.
Smith, A. (1993): “Estimating Nonlinear Time Series Models Using Simulated Vector Autore-
gressions.” Journal of Applied Econometrics 8, S63-S84.
Part II
Asset pricing and reality
6
On kernels and puzzles
This chapter discusses theoretical restrictions that can be used to perform statistical validation
of asset pricing models. We reconsider Lucas’ model, and give more structure to the data
generating process. We present a simple setting which allows us to obtain closed-form solutions.
We then discuss how the model’s predictions can be used to test the validity of the model.
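where
$$Z_{t+1} = \log\left(\beta\left(\frac{D_{t+1}}{D_t}\right)^{-\eta}\right); \qquad Q_{t+1} = \log\left(\frac{S_{t+1} + D_{t+1}}{S_t}\right).$$
In fact, eq. (8.6) holds for any asset. In particular, it holds for a one-period bond with price
$S_t^b \equiv b_t$, $S_{t+1}^b \equiv 1$ and $D_{t+1}^b \equiv 0$. Define $Q_{t+1}^b \equiv \log\left(b_t^{-1}\right) \equiv \log R_t$. By replacing this into eq.
(8.6), one gets $R_t^{-1} = E\left[e^{Z_{t+1}} \mid \mathcal{F}_t\right]$. We are left with the following system:
$$\begin{cases}
\dfrac{1}{R_t} = E\left[e^{Z_{t+1}} \mid \mathcal{F}_t\right] \\
1 = E\left[e^{Z_{t+1} + Q_{t+1}} \mid \mathcal{F}_t\right]
\end{cases} \tag{6.3}$$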
The equilibrium interest rate thus satisfies,
$$\log R_t = -\log\beta + \eta\mu_D - \frac{\eta(\eta+1)}{2}\sigma_D^2, \text{ a constant.} \tag{6.4}$$
The ημD term reflects “intertemporal substitution” effects; the last term reflects “precautionary”
motives.
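The closed form in Eq. (6.4) can be checked by simulation. The Python sketch below assumes that log dividend growth is log(D_{t+1}/D_t) = μD − σD²/2 + ε with ε normal with mean zero and variance σD²; this specification is inferred from the formulas in this section rather than quoted from it. The parameter values μD = 0.0183 and σD = 0.0328 are those used in the caption of Figure 6.1, while β = 0.95 and η = 5 are arbitrary choices for the example.

```python
import numpy as np

def log_rate_closed_form(beta, eta, mu_d, sigma_d):
    """log R = -log(beta) + eta * mu_D - eta * (eta + 1) / 2 * sigma_D^2, as in Eq. (6.4)."""
    return -np.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1.0) * sigma_d ** 2

def log_rate_monte_carlo(beta, eta, mu_d, sigma_d, rng, n_draws):
    """-log of the simulated mean of exp(Z), with Z = log(beta) - eta * log dividend growth."""
    eps = sigma_d * rng.standard_normal(n_draws)
    growth = mu_d - 0.5 * sigma_d ** 2 + eps      # assumed dividend growth process
    z = np.log(beta) - eta * growth
    return float(-np.log(np.mean(np.exp(z))))

if __name__ == "__main__":
    args = (0.95, 5.0, 0.0183, 0.0328)
    print(log_rate_closed_form(*args))
    print(log_rate_monte_carlo(*args, np.random.default_rng(0), 200000))
```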
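The second equation in (6.3) can be written as,
$$1 = E\left[\exp(Z_{t+1} + Q_{t+1}) \mid \mathcal{F}_t\right] = e^{\log\beta - \eta\left(\mu_D - \frac{1}{2}\sigma_D^2\right) + \mu_S - \frac{1}{2}\sigma_S^2} \cdot E\left[e^{\tilde{n}_{t+1}} \mid \mathcal{F}_t\right],$$
where $\tilde{n}_{t+1} \equiv \epsilon_{S,t+1} - \eta\,\epsilon_{D,t+1} \sim N(0, \sigma_S^2 + \eta^2\sigma_D^2 - 2\eta\sigma_{SD})$. The above expectation can be computed
through Lemma 6.1. The result is,
$$0 = \underbrace{\log\beta - \eta\mu_D + \frac{\eta(\eta+1)}{2}\sigma_D^2}_{-\log R_t} + \mu_S - \eta\sigma_{SD}.$$
That is, with r ≡ log Rt ,
$$\underbrace{\mu_S - r}_{\text{risk premium}} = \eta\sigma_{SD}.$$
To sum up,
$$\begin{cases}
\mu_S = r + \eta\sigma_{SD} \\
r_t = -\log\beta + \eta\mu_D - \dfrac{\eta(\eta+1)}{2}\sigma_D^2
\end{cases}$$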
6.1. A single factor model c
°by A. Mele
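Let us compute other interesting objects. The expected gross return on the risky asset is,
$$E\left[\left.\frac{S_{t+1} + D_{t+1}}{S_t}\,\right|\, \mathcal{F}_t\right] = e^{\mu_S - \frac{1}{2}\sigma_S^2} \cdot E\left[e^{\epsilon_{S,t+1}} \mid \mathcal{F}_t\right] = e^{\mu_S} = e^{r + \eta\sigma_{SD}}.$$
Therefore, if σ SD > 0, then $E\left[(S_{t+1} + D_{t+1})/S_t \mid \mathcal{F}_t\right] > E\left[b_t^{-1} \mid \mathcal{F}_t\right]$, as expected.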
Next, we test the internal consistency of the model. The coefficients of the model must satisfy
some restrictions. In particular, the asset price volatility must be determined endogenously.
We first conjecture that the following “no-sunspots” condition holds,
We will demonstrate below that this is indeed the case. Under the previous condition,
$$\mu_S = r + \lambda\sigma_D; \qquad \lambda \equiv \eta\sigma_D,$$
and
$$Z_{t+1} = -\left(r + \frac{1}{2}\lambda^2\right) - \lambda u_{D,t+1}; \qquad u_{D,t+1} \equiv \frac{\epsilon_{D,t+1}}{\sigma_D}.$$
Under condition (6.5), we have a very instructive way to write the pricing kernel. Precisely,
define recursively,
$$m_{t+1} = \frac{\xi_{t+1}}{\xi_t} \equiv \exp(Z_{t+1}); \qquad \xi_0 = 1.$$
This is reminiscent of the continuous time representation of Arrow-Debreu state prices (see
Chapter 4).
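Next, let’s iterate the asset price equation (8.6),
$$\begin{aligned}
S_t &= E\left[\left.\left(\prod_{j=1}^{n} e^{Z_{t+j}}\right) \cdot S_{t+n}\,\right|\, \mathcal{F}_t\right] + \sum_{i=1}^{n} E\left[\left.\left(\prod_{j=1}^{i} e^{Z_{t+j}}\right) \cdot D_{t+i}\,\right|\, \mathcal{F}_t\right] \\
&= E\left[\left.\frac{\xi_{t+n}}{\xi_t} \cdot S_{t+n}\,\right|\, \mathcal{F}_t\right] + \sum_{i=1}^{n} E\left[\left.\frac{\xi_{t+i}}{\xi_t} \cdot D_{t+i}\,\right|\, \mathcal{F}_t\right].
\end{aligned}$$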
This is a version of the celebrated Gordon’s formula. It predicts that price-dividend ratios are
constant, a counterfactual feature addressed in Chapter 8.
To find the final restrictions of the model, notice that eq. (6.7) and the second equation in
(6.1) imply that
$$\log(S_t + D_t) - \log S_{t-1} = -\log k + \mu_D - \frac{1}{2}\sigma_D^2 + \epsilon_{D,t}.$$
By the first equation in (6.1),
$$\begin{cases}
\mu_S - \dfrac{1}{2}\sigma_S^2 = \mu_D - \dfrac{1}{2}\sigma_D^2 - \log k \\
\epsilon_{S,t} = \epsilon_{D,t}, \quad \forall t
\end{cases}$$
The second condition confirms condition (6.5). It also reveals that σ²S = σ SD = σ²D . Substituting
this into the first condition delivers back μS = μD − log k = r + σ D λ.
6.1.2 Extensions
In Chapter 3 we showed that in an i.i.d. environment, prices are convex (resp. concave) in the
dividend rate whenever η > 1 (resp. η < 1). The pricing formula (6.7) reveals that in a dynamic
environment, such a property is lost. In this formula, prices are always linear in the dividends’
rate. It would be possible to show with the techniques developed in the next chapter that in
a dynamic context, convexity properties of the price function would be inherited by properties
of the dividend process in the following sense: if the expected dividend growth under the risk-
neutral measure is a convex (resp. concave) function of the initial dividend rate, then prices are
convex (resp. concave) in the initial dividend rate. In the model analyzed here, the expected
dividend growth under the risk-neutral measure is linear in the dividends’ rate, and this explains
the linear formula (6.7).
Even if we dismiss the idea that η = 30 is implausible, there is another puzzle, the interest
rate puzzle. As we showed in eq. (6.4), very high values of η can make the interest rate very
high (see Figure 6.1).
In the next section, we show how this failure of the model can be “detected” with a general
methodology that can be applied to a variety of related models - more general models.
6.3. The Hansen-Jagannathan cup
FIGURE 6.1. The risk-free rate puzzle: the two curves depict the graph $\eta \mapsto r(\eta) = -\log\beta + 0.0183\cdot\eta - (0.0328)^2\cdot\frac{\eta(\eta+1)}{2}$, with $\beta = 0.95$ (top curve) and $\beta = 1.05$ (bottom curve); the horizontal axis is $\eta$. Even if we accept the idea that risk aversion is as high as $\eta = 30$, we would obtain a resulting equilibrium interest rate as high as 10%. The only way to make low $r$ consistent with high values of $\eta$ is to make $\beta > 1$.
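The numbers quoted in the caption of Figure 6.1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming the growth parameters printed in the caption (0.0183 and 0.0328); the function name `risk_free_rate` is hypothetical and only used for illustration.

```python
import math

MU_D = 0.0183     # mean growth rate, as printed in the caption of Figure 6.1
SIGMA_D = 0.0328  # growth volatility, as printed in the caption of Figure 6.1


def risk_free_rate(eta, beta, mu_d=MU_D, sigma_d=SIGMA_D):
    """r(eta) = -log(beta) + eta*mu_D - sigma_D^2 * eta*(eta+1)/2."""
    return -math.log(beta) + eta * mu_d - sigma_d ** 2 * eta * (eta + 1) / 2.0


if __name__ == "__main__":
    # Tabulate the two curves of Figure 6.1 at a few values of eta.
    for beta in (0.95, 1.05):
        for eta in (2, 10, 30):
            r = risk_free_rate(eta, beta)
            print(f"beta={beta:.2f}  eta={eta:2d}  r={r:.4f}")
```

With $\beta = 0.95$ and $\eta = 30$ the rate is close to 10%, while $\beta = 1.05$ brings it close to zero, which is the point made in the caption.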
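$$1 = E[m_{t+1}(1 + R_{j,t+1}) \mid F_t], \quad j = 1, \cdots, n.$$
By taking the unconditional expectation of the previous equation, and defining $R_t = (R_{1,t}, \cdots, R_{n,t})^\top$,
$$1_n = E[m_t(1_n + R_t)].$$
Let $\bar m \equiv E(m_t)$. We create a family of stochastic discount factors $m_t^*$ parametrized by $\bar m$ by projecting $m$ on to the asset returns,
$$m_t^*(\bar m) = \bar m + (R_t - E(R_t))^\top \beta_{\bar m},$$
where
$$\underbrace{\beta_{\bar m}}_{n\times 1} = \underbrace{\Sigma^{-1}}_{n\times n}\,\underbrace{\operatorname{cov}(m, 1_n + R_t)}_{n\times 1} = \Sigma^{-1}\left[1_n - \bar m E(1_n + R_t)\right],$$
and $\Sigma \equiv E\left[(R_t - E(R_t))(R_t - E(R_t))^\top\right]$. As shown in the appendix, we also have that,
$$1_n = E[m_t^*(\bar m)(1_n + R_t)].$$
We have,
$$\sqrt{\operatorname{var}(m_t^*(\bar m))} = \sqrt{\beta_{\bar m}^\top \Sigma \beta_{\bar m}} = \sqrt{(1_n - \bar m E(1_n + R_t))^\top \Sigma^{-1} (1_n - \bar m E(1_n + R_t))}.$$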
This is the celebrated Hansen-Jagannathan “cup” (Hansen and Jagannathan (1991)). The
interest of this object lies in the following theorem.
Theorem 6.2: Among all stochastic discount factors with fixed expectation m̄, m∗t (m̄) is the
one with the smallest variance.
Proof: Consider another discount factor indexed by m̄, i.e. mt (m̄). Naturally, mt (m̄) satisfies
1n = E [mt (m̄) (1n + Rt )]. And since it also holds that 1n = E [m∗t (m̄) (1n + Rt )], we deduce
that
where the third line follows from the fact that E [mt (m̄)] = E [m∗t (m̄)] = m̄, and the fourth line
follows because E [(mt (m̄) − m∗t (m̄))] = 0. But m∗t (m̄) is a linear combination of Rt . By the
previous equation, it must then be the case that,
Hence,
The previous bound can be improved by using conditioning information as in Gallant, Hansen
and Tauchen (1990) and the relatively more recent work by Ferson and Siegel (2003). Moreover,
these bounds typically display a finite sample bias: they typically overstate the true bounds and
thus they reject too often a given model. Finite sample corrections are considered by Ferson
and Siegel (2003).
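To make the construction of the bound concrete, the following sketch computes $\beta_{\bar m} = \Sigma^{-1}[1_n - \bar m E(1_n + R_t)]$ and the implied lower bound $\sqrt{\beta_{\bar m}^\top \Sigma \beta_{\bar m}}$ for a given $\bar m$. It is only an illustration under stated assumptions: the function names (`solve`, `hj_bound`) and the input numbers are hypothetical, and no conditioning information or finite sample correction is applied.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hj_bound(mean_net_returns, cov, m_bar):
    """Return (beta, bound): beta = Sigma^{-1} [1_n - m_bar E(1_n + R)],
    bound = sqrt(beta' Sigma beta), the smallest s.d. of a kernel with mean m_bar."""
    gap = [1.0 - m_bar * (1.0 + mu) for mu in mean_net_returns]
    beta = solve(cov, gap)
    bound = math.sqrt(sum(g * b for g, b in zip(gap, beta)))
    return beta, bound


if __name__ == "__main__":
    # Hypothetical moments for two assets (net returns), for illustration only.
    means = [0.08, 0.02]
    cov = [[0.04, 0.0], [0.0, 0.01]]
    for m_bar in (0.95, 0.97, 0.99):
        _, bound = hj_bound(means, cov, m_bar)
        print(f"m_bar={m_bar:.2f}  bound={bound:.4f}")
```

Tracing the bound over a grid of values of $\bar m$ produces the cup-shaped region discussed in the text.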
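For example, let us consider an application of the Hansen-Jagannathan testing methodology to the model in Section 6.1. That model has the following stochastic discount factor,
$$m_{t+1} = \frac{\xi_{t+1}}{\xi_t} = \exp(Z_{t+1}); \qquad Z_{t+1} = -\left(r + \tfrac{1}{2}\lambda^2\right) - \lambda u_{D,t+1}; \qquad u_{D,t+1} \equiv \frac{\epsilon_{D,t+1}}{\sigma_D}.$$
First, we have to compute the first two moments of the stochastic discount factor. By Lemma 6.1 we have,
$$\bar m = E(m_t) = e^{-r} \quad \text{and} \quad \bar\sigma_m = \sqrt{\operatorname{var}(m_t(\bar m))} = e^{-r + \frac{1}{2}\lambda^2}\sqrt{1 - e^{-\lambda^2}} \qquad (6.8)$$
where
$$r = -\log\beta + \eta\mu_D - \frac{\eta(\eta+1)}{2}\sigma_D^2 \quad \text{and} \quad \lambda = \eta\sigma_D.$$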
For given μD and σ 2D , system (6.8) forms a η-parametrized curve in the space (m̄-σ̄ m ). The
objective is to see whether there are plausible values of η for which such a η-parametrized
6.4. Multifactor extensions
curve enters the Hansen-Jagannathan cup. Typically, this is not the case. Rather, one has the
situation depicted in Figure 6.2 below.
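The $\eta$-parametrized curve defined by system (6.8) is easy to trace numerically. The sketch below is an illustration only: it borrows the growth parameters printed in the caption of Figure 6.1 and $\beta = 0.95$ as default values (an assumption, not a calibration taken from this section), and the function name `lucas_sdf_moments` is hypothetical.

```python
import math


def lucas_sdf_moments(eta, beta=0.95, mu_d=0.0183, sigma_d=0.0328):
    """Return (m_bar, sigma_m) from system (6.8) for risk aversion eta."""
    lam = eta * sigma_d
    r = -math.log(beta) + eta * mu_d - 0.5 * eta * (eta + 1) * sigma_d ** 2
    m_bar = math.exp(-r)
    sigma_m = math.exp(-r + 0.5 * lam ** 2) * math.sqrt(1.0 - math.exp(-lam ** 2))
    return m_bar, sigma_m


if __name__ == "__main__":
    # Each eta gives one point (m_bar, sigma_m) of the curve.
    for eta in (1, 5, 10, 20, 30):
        m_bar, sigma_m = lucas_sdf_moments(eta)
        print(f"eta={eta:2d}  m_bar={m_bar:.4f}  sigma_m={sigma_m:.4f}")
```

Comparing these points with a Hansen-Jagannathan bound computed from return data at the same $\bar m$ shows whether the curve enters the cup.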
The general message is that models can be consistent with the data if they generate highly volatile pricing kernels (for a fixed $\bar m$). One route is to dismiss the idea of a representative agent with a CRRA utility function, and to consider instead models with heterogeneous agents (by generalizing some ideas in Constantinides and Duffie (1996)); and/or models with more realistic preferences - such as, for example, the habit preferences considered in Campbell and Cochrane (1999); and/or combinations of these. These things will be analyzed in depth in the next chapter.
Lemma 6.3 (Stein’s lemma): Suppose that two random variables $x$ and $y$ are jointly normal. Then,
$$\operatorname{cov}[g(x), y] = E[g'(x)] \cdot \operatorname{cov}(x, y),$$
for any function $g$ such that $E(|g'(x)|) < \infty$.
$$E_t(\tilde R^M_{t+1}) - \frac{1}{E_t(m_{t+1})} = -\frac{\sigma_t(m_{t+1})}{E_t(m_{t+1})}\,\sigma_t(\tilde R^M_{t+1}).$$
In more general setups than the ones considered in this introductory example, both $\sigma_t(m_{t+1})/E_t(m_{t+1})$ and $\sigma_t(\tilde R^M_{t+1})$ should be time-varying.
We now suppose that $\tilde R$ is normally distributed. This assumption is inconsistent with the model in Section 6.1. In the model of Section 6.1, $\tilde R$ is lognormally distributed in equilibrium because $\log\tilde R = \mu_D - \frac{1}{2}\sigma_D^2 + \epsilon_S$, with $\epsilon_S$ normal. But let’s explore the asset pricing implications of this tilting assumption. Because $\tilde R_{t+1}$ and $Z_{t+1}$ are normal, and $m_{t+1} = m(Z_{t+1}) = \exp(Z_{t+1})$, we may apply Lemma 6.3 and obtain,
We wish to extend the previous observations to more general situations. Clearly, the pricing kernel is some function of $K$ factors $m(\epsilon_{1t}, \cdots, \epsilon_{Kt})$. A particularly convenient analytical assumption is to make $m$ exponential-affine and the factors $(\epsilon_{i,t})_{i=1}^K$ normal, as in the following definition:
$$Z_t \equiv \phi_0 + \sum_{i=1}^{K} \phi_i \epsilon_{i,t}.$$
An EAPK is a function
$$m_t = m(Z_t) = \exp(Z_t).$$
If $(\epsilon_{i,t})_{i=1}^K$ are jointly normal, and each $\epsilon_{i,t}$ has mean zero and variance $\sigma_i^2$, $i = 1, \cdots, K$, the EAPK is called a Normal EAPK (NEAPK).
In the previous definition, we assumed that each $\epsilon_{i,t}$ has mean zero. This entails no loss of generality insofar as $\phi_0 \neq 0$.
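Now suppose that $\tilde R$ is normally distributed. By Lemma 6.3 and the NEAPK structure,
$$\operatorname{cov}(m_{t+1}, \tilde R_{t+1}) = \operatorname{cov}[\exp(Z_{t+1}), \tilde R_{t+1}] = R^{-1}\operatorname{cov}(Z_{t+1}, \tilde R_{t+1}) = R^{-1}\sum_{i=1}^{K}\phi_i \operatorname{cov}(\epsilon_{i,t+1}, \tilde R_{t+1}).$$
Replacing this into eq. (6.9) leaves the linear factor representation,
$$E(\tilde R_{t+1}) - R = -\sum_{i=1}^{K}\phi_i \underbrace{\operatorname{cov}(\epsilon_{i,t+1}, \tilde R_{t+1})}_{\text{“betas”}}. \qquad (6.10)$$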
Proposition 6.5: Suppose that R̃ is normally distributed. Then, NEAPK ⇒ linear factor
representation for asset returns.
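The step that drives Proposition 6.5 is Stein’s lemma applied to $g(x) = \exp(x)$, for which $g' = g$ and hence $\operatorname{cov}(e^{x}, y) = E(e^{x})\operatorname{cov}(x, y)$. The following Monte Carlo sketch checks this identity on simulated jointly normal draws. All parameter values and the function name `stein_check` are hypothetical and chosen only for illustration.

```python
import math
import random


def stein_check(n=100_000, seed=0, mu_x=-0.05, sd_x=0.3, sd_y=0.2, rho=0.5):
    """Simulate jointly normal (x, y) and return both sides of Stein's lemma
    for g(x) = exp(x): (cov(exp(x), y), E[exp(x)] * cov(x, y))."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sd_x * z1)
        ys.append(sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))

    def mean(v):
        return sum(v) / len(v)

    def cov(u, v):
        mu, mv = mean(u), mean(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

    gx = [math.exp(x) for x in xs]
    return cov(gx, ys), mean(gx) * cov(xs, ys)


if __name__ == "__main__":
    lhs, rhs = stein_check()
    print(f"cov(exp(x), y) = {lhs:.5f}   E[exp(x)] cov(x, y) = {rhs:.5f}")
```

The two sides agree up to sampling error, which is what allows the covariance with the kernel to be rewritten as a linear combination of covariances with the factors.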
The APT representation in eq. (6.10) is close to one result in Cochrane (1996).3 Cochrane (1996) assumed that $m$ has a linear structure, i.e. $m(Z_t) = Z_t$ where $Z_t$ is as in Definition 5.1.
3 To recall why eq. (6.10) is indeed an APT equation, suppose that $\tilde R$ is an $n$-(column) vector of returns and that $\tilde R = a + bf$, where $f$ is a $K$-(column) vector with zero mean and unit variance and $a$, $b$ are some given vector and matrix with appropriate dimension. Then clearly, $b = \operatorname{cov}(\tilde R, f)$. A portfolio $\pi$ delivers $\pi^\top\tilde R = \pi^\top a + \pi^\top\operatorname{cov}(\tilde R, f)f$. Arbitrage opportunity is: $\exists\pi : \pi^\top\operatorname{cov}(\tilde R, f) = 0$ and $\pi^\top a \neq r$. To rule that out, we may show as in Part I of these Lectures that there must exist a $K$-(column) vector $\lambda$ s.t. $a = \operatorname{cov}(\tilde R, f)\lambda + r$. This implies $\tilde R = a + bf = r + \operatorname{cov}(\tilde R, f)\lambda + bf$. That is, $E(\tilde R) = r + \operatorname{cov}(\tilde R, f)\lambda$.
This assumption implies that $\operatorname{cov}(m_{t+1}, \tilde R_{t+1}) = \sum_{i=1}^{K}\phi_i\operatorname{cov}(\epsilon_{i,t+1}, \tilde R_{t+1})$. Replacing this into eq. (6.9),
$$E(\tilde R_{t+1}) - R = -R\sum_{i=1}^{K}\phi_i\operatorname{cov}(\epsilon_{i,t+1}, \tilde R_{t+1}), \quad \text{where } R = \frac{1}{E(m)} = \frac{1}{\phi_0}.$$
The advantage of using NEAPKs is that the pricing kernel is automatically guaranteed to be strictly positive - a condition needed to rule out arbitrage opportunities.
Consider first the case $K = 1$ and let $y_t = \log\tilde R_t$ be normally distributed. The previous equation can be written as,
$$e^{-\phi_0} = E\left[e^{\phi_1\epsilon_{t+1} + y_{t+1}}\right] = e^{E(y_{t+1}) + \frac{1}{2}(\phi_1^2\sigma_\epsilon^2 + \sigma_y^2 + 2\phi_1\sigma_{\epsilon y})}.$$
That is,
$$E(y_{t+1}) = -\left[\phi_0 + \tfrac{1}{2}(\phi_1^2\sigma_\epsilon^2 + \sigma_y^2 + 2\phi_1\sigma_{\epsilon y})\right].$$
By applying the pricing equation (6.11) to a bond price,
$$e^{-\phi_0} = E\left(e^{\phi_1\epsilon_{t+1}} e^{\log R_{t+1}}\right) = e^{\log R_{t+1} + \frac{1}{2}\phi_1^2\sigma_\epsilon^2},$$
and then
$$\log R_{t+1} = -\left(\phi_0 + \tfrac{1}{2}\phi_1^2\sigma_\epsilon^2\right).$$
The expected excess return is,
$$E(y_{t+1}) - \log R_{t+1} + \tfrac{1}{2}\sigma_y^2 = -\phi_1\sigma_{\epsilon y}.$$
This equation reveals how to derive the simple theory in Section 6.1 in an alternate way. Apart from Jensen’s inequality effects ($\frac{1}{2}\sigma_y^2$), this is indeed the Lucas model of Section 6.1 once $\phi_1 = -\eta$. As is clear, this is a poor model because we are constrained to explain returns with only one “stochastic discount-factor parameter” (i.e. with $\phi_1$).
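Next consider the general case. Assume as usual that dividends are as in (6.1). To find the price function in terms of the state variable $\epsilon$, we may proceed as in Section 6.1. In the absence of bubbles,
$$S_t = \sum_{i=1}^{\infty} E\left[\frac{\xi_{t+i}}{\xi_t}\cdot D_{t+i}\right] = D_t\cdot\sum_{i=1}^{\infty} e^{\left(\mu_D + \phi_0 + \frac{1}{2}\sum_{j=1}^{K}\phi_j(\phi_j\sigma_j^2 + 2\sigma_{j,D})\right)\cdot i}, \qquad \sigma_{j,D} \equiv \operatorname{cov}(\epsilon_j, \epsilon_D).$$
Thus, if
$$\hat k \equiv \mu_D + \phi_0 + \frac{1}{2}\sum_{j=1}^{K}\phi_j\left(\phi_j\sigma_j^2 + 2\sigma_{j,D}\right) < 0,$$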
6.5. Pricing kernels, Sharpe ratios and the market portfolio
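then, summing the geometric series $\sum_{i=1}^{\infty} e^{\hat k i}$,
$$\frac{S_t}{D_t} = \frac{e^{\hat k}}{1 - e^{\hat k}}.$$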
Even in this multi-factor setting, price-dividend ratios are constant - which is counterfactual.
Note that the various parameters can be calibrated so as to make the pricing kernel satisfy
the Hansen-Jagannathan theoretical test conditions in Section 6.3. But the resulting model
always makes the boring prediction that price-dividend ratios are constant. This multifactor
model doesn’t work even if the variance of the implied pricing kernel is high - and lies inside
the Hansen-Jagannathan cup. Living inside the cup doesn’t necessarily imply that the resulting
model is a good one. We need other theoretical test conditions. The next chapter develops
such theoretical test conditions (When are price-dividend ratios procyclical? When is returns
volatility countercyclical? Etc.).
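The constancy of the price-dividend ratio in the multifactor model comes from a plain geometric series: with $\hat k < 0$, $\sum_{i\ge 1} e^{\hat k i} = e^{\hat k}/(1 - e^{\hat k})$, which does not depend on the state. The sketch below evaluates $\hat k$ and compares the closed form with a truncated sum. Parameter values and function names are hypothetical, and the factors are assumed mutually uncorrelated, as the expression for $\hat k$ implicitly requires.

```python
import math


def k_hat(mu_d, phi0, phis, sigmas, sigma_ds):
    """k = mu_D + phi_0 + 0.5 * sum_j phi_j (phi_j sigma_j^2 + 2 sigma_{j,D})."""
    return mu_d + phi0 + 0.5 * sum(
        p * (p * s ** 2 + 2.0 * c) for p, s, c in zip(phis, sigmas, sigma_ds)
    )


def price_dividend_ratio(k, n_terms=None):
    """Closed form e^k / (1 - e^k) if n_terms is None, else the truncated sum."""
    if k >= 0:
        raise ValueError("the sum diverges unless k < 0")
    if n_terms is None:
        x = math.exp(k)
        return x / (1.0 - x)
    return sum(math.exp(k * i) for i in range(1, n_terms + 1))


if __name__ == "__main__":
    # Hypothetical two-factor parametrization, for illustration only.
    k = k_hat(mu_d=0.02, phi0=-0.06, phis=[-2.0, -1.0],
              sigmas=[0.03, 0.05], sigma_ds=[0.001, 0.0005])
    print(f"k_hat = {k:.5f}")
    print(f"P/D closed form   = {price_dividend_ratio(k):.4f}")
    print(f"P/D truncated sum = {price_dividend_ratio(k, 5000):.4f}")
```

Whatever values the factor loadings take, the ratio is a single number, which is why raising the kernel's variance by adding factors does not help with the dynamics of price-dividend ratios.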
As explained in Chapter 2, the Sharpe ratio $E_t(r^e_{M,t+1})\big/\sqrt{Var_t(r^e_{M,t+1})}$ has also the interpretation of unit market risk-premium. Therefore:
$$\Pi \equiv \text{unit market risk premium} = \frac{\sqrt{Var_t(m_{t+1})}}{E_t(m_{t+1})}.$$
For example, the Lucas model in Section 6.1 has,
$$\frac{\sqrt{Var_t(m_{t+1})}}{E_t(m_{t+1})} = \sqrt{e^{\eta^2\sigma_D^2} - 1} \approx \eta\sigma_D.$$
In Section 6.1, we also obtained that $(\mu_S - r)/\sigma_D = \eta\sigma_D$. As the previous relation reveals, $\Pi$ is only approximately equal to $\eta\sigma_D$ because the asset in Section 6.1 is simply not a β-CAPM generating portfolio. For example, suppose that the economy in Section 6.1 has only a single risky asset. It would then be very natural to refer to this asset as the “market portfolio”. Yet this asset wouldn’t be β-CAPM generating.
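In Section 6.1, we found that $E(\tilde R) = e^{\mu_S}$, $R = e^{-\log\beta + \eta(\mu_D - \frac{1}{2}\sigma_D^2) - \frac{1}{2}\eta^2\sigma_D^2}$ and $\operatorname{var}(\tilde R) = e^{2\mu_S}(e^{\sigma_D^2} - 1)$. Therefore, $S \equiv E(\tilde R - R)\big/\sqrt{\operatorname{var}(\tilde R)}$ is:
$$S \equiv \text{Sharpe Ratio} = \frac{1 - e^{-\eta\sigma_D^2}}{\sqrt{e^{\sigma_D^2} - 1}}.$$
Indeed, by simple computations,
$$\rho = -\frac{1 - e^{-\eta\sigma_D^2}}{\sqrt{e^{\eta^2\sigma_D^2} - 1}\,\sqrt{e^{\sigma_D^2} - 1}}.$$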
This is not precisely “minus one”. Yet in practice ρ ≈ −1 when σ D is low. However, consumption
claims are not acting as market portfolios - in the sense of Chapter 2. If that consumption claim
is very highly correlated with the pricing kernel, then it is also a good approximation to the β-
CAPM generating portfolio. But as the previous simple example demonstrates, that is only an
approximation. To summarize, the fact that everyone is using an asset (or in general a portfolio
in a 2-funds separation context) doesn’t imply that the resulting return is perfectly correlated with
the pricing kernel. In other terms, a market portfolio is not necessarily β-CAPM generating.5
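A quick numerical check shows how close the correlation $\rho$ is to minus one in the Lucas model, and how the Sharpe ratio $S$ sits just below the kernel bound $\Pi$. The sketch uses the closed-form expressions above; the function name `lucas_ratios` and the parameter values (taken from the caption of Figure 6.1 for $\sigma_D$) are assumptions made for illustration.

```python
import math


def lucas_ratios(eta, sigma_d):
    """Return (Pi, S, rho) for the Lucas model:
    Pi  = sqrt(exp(eta^2 sigma_D^2) - 1)                     (kernel bound)
    S   = (1 - exp(-eta sigma_D^2)) / sqrt(exp(sigma_D^2) - 1) (Sharpe ratio)
    rho = -S / Pi                                             (corr. with the kernel)"""
    s2 = sigma_d ** 2
    pi = math.sqrt(math.exp(eta ** 2 * s2) - 1.0)
    sharpe = (1.0 - math.exp(-eta * s2)) / math.sqrt(math.exp(s2) - 1.0)
    return pi, sharpe, -sharpe / pi


if __name__ == "__main__":
    for eta in (2, 10, 30):
        pi, sharpe, rho = lucas_ratios(eta, 0.0328)
        print(f"eta={eta:2d}  Pi={pi:.5f}  S={sharpe:.5f}  rho={rho:.5f}")
```

For low $\sigma_D$ the correlation is very close to, but never exactly, minus one, so the single risky asset attains the bound only approximately.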
We now describe a further complication: a β-CAPM generating portfolio is not necessarily
the tangency portfolio. We show the existence of another portfolio producing the same β-pricing
relationship as the tangency portfolio. For reasons developed below, such a portfolio is usually
referred to as the maximum correlation portfolio.
Let $\bar R = \frac{1}{E(m)}$. By the CCAPM (see Chapter 3),
$$E(R^i) - \bar R = \frac{\beta_{R^i,m}}{\beta_{R^p,m}}\left(E(R^p) - \bar R\right),$$
where $R^p$ is a portfolio return. Next, let
$$R^p = R^m \equiv \frac{m}{E(m^2)}.$$
5 As is well-known, things are the same in economies with one agent with quadratic utility. This fact can be seen at work in the
previous formulae (just take η = −1). You should also be able to show this claim with more general quadratic utility functions - as
in chapter 3.
This is clearly perfectly correlated with the kernel, and by the analysis in Chapter 3,
$$E(R^i) - \bar R = \beta_{R^i,R^m}\left[E(R^m) - \bar R\right].$$
This is not yet the β-representation of the CAPM, because we have yet to show that there is a way to construct $R^m$ as a portfolio return. In fact, there is a natural choice: pick $m = m^*$, where $m^*$ is the minimum-variance kernel leading to the Hansen-Jagannathan bounds. Since $m^*$ is linear in all asset returns, $R^{m^*}$ can be thought of as a return that can be obtained by investing in all assets. Furthermore, in the appendix we show that $R^{m^*}$ satisfies,
$$1 = E\left(m\cdot R^{m^*}\right).$$
Where is this portfolio located? As shown in the appendix, there is no portfolio yielding the same expected return with lower variance (i.e., $R^{m^*}$ is mean-variance efficient). In addition, in the appendix we show that,
$$E\left(R^{m^*}\right) - 1 = \frac{r - Sh}{1 + Sh} = r - \frac{1+r}{1+Sh}Sh < r.$$
Mean-variance efficiency of $R^{m^*}$ and the previous inequality imply that this portfolio lies in the lower branch of the mean-variance efficient portfolios. And this is so because this portfolio is positively correlated with the true pricing kernel. Naturally, the fact that this portfolio is β-CAPM generating doesn’t necessarily imply that it is also perfectly correlated with the true pricing kernel. As shown in the appendix, $R^{m^*}$ has only the maximum possible correlation with all possible $m$. Perfect correlation occurs exactly in correspondence of the pricing kernel $m = m^*$ (i.e. when the economy exhibits a pricing kernel exactly equal to $m^*$).
Proof that $R^{m^*}$ is β-CAPM generating. The relations $1 = E(m^* R^i)$ and $1 = E(m^* R^{m^*})$ imply
$$E(R^i) - R = -R\cdot\operatorname{cov}\left(m^*, R^i\right)$$
$$E(R^{m^*}) - R = -R\cdot\operatorname{cov}\left(m^*, R^{m^*}\right)$$
and,
$$\frac{E(R^i) - R}{E(R^{m^*}) - R} = \frac{\operatorname{cov}(m^*, R^i)}{\operatorname{cov}(m^*, R^{m^*})}.$$
By construction, $R^{m^*}$ is perfectly correlated with $m^*$. Precisely, $R^{m^*} = m^*/E(m^{*2}) \equiv \gamma^{-1}m^*$, $\gamma \equiv E(m^{*2})$. Therefore,
$$\frac{\operatorname{cov}(m^*, R^i)}{\operatorname{cov}(m^*, R^{m^*})} = \frac{\operatorname{cov}\left(\gamma R^{m^*}, R^i\right)}{\operatorname{cov}\left(\gamma R^{m^*}, R^{m^*}\right)} = \frac{\gamma\cdot\operatorname{cov}\left(R^{m^*}, R^i\right)}{\gamma\cdot\operatorname{var}\left(R^{m^*}\right)} = \beta_{R^i,R^{m^*}}.$$
the data–Sharpe ratio on the market portfolio) and inside the Hansen-Jagannathan bounds.
Typically, very high values of η are required to enter the Hansen-Jagannathan bounds.
There is a beautiful connection between these things and the familiar mean-variance portfolio frontier described in Chapter 1. As shown in Figure 6.3, every asset or portfolio must lie inside the wedged region bounded by two straight lines with slopes $\mp\sigma(m)/E(m)$. This is so because, for any asset (or portfolio) that is priced with a kernel $m$, we have that
$$\left|E(R^i) - R\right| \le \frac{\sigma(m)}{E(m)}\cdot\sigma\left(R^i\right).$$
As seen in the previous section, the equality is only achieved by asset (or portfolio) returns that
are perfectly correlated with m. The point here is that a tangency portfolio such as T doesn’t
necessarily attain the kernel volatility bounds. Also, there is no reason for a market portfolio
to lie on the kernel volatility bound. In the simple Lucas-Breeden economy considered in the
previous section, for example, the (only existing) asset has a Sharpe ratio that doesn’t lie on
the kernel volatility bounds. In a sense, the CCAPM doesn’t necessarily imply the CAPM, i.e.
there is not necessarily an asset acting at the same time as a market portfolio and β-CAPM
generating that is also priced consistently with the true kernel of the economy. These conditions
simultaneously hold if the (candidate) market portfolio is perfectly negatively correlated with
the true kernel of the economy, but this is very particular (it is in this sense that one may
say that the CAPM is a particular case of the CCAPM). A good research question is to find
conditions on families of kernels consistent with the previous considerations.
[Figure: the Hansen-Jagannathan bounds and the Sharpe ratio, plotted with $\sigma(m)$ on the vertical axis and $E(m)$ on the horizontal axis.]
On the other hand, we know that there exists another portfolio, the maximum correlation
portfolio, that is also β-CAPM generating. In other terms, if ∃R∗ : R∗ = −γm, for some positive
constant γ, then the β-CAPM representation holds, but this doesn’t necessarily mean that R∗
is also a market portfolio. More generally, if there is a return $R^*$ that is β-CAPM generating, then
$$\rho_{i,R^*} = \frac{\rho_{i,m}}{\rho_{R^*,m}}, \quad \text{all } i. \qquad (6.12)$$
Therefore, we don’t need an asset or portfolio return that is perfectly correlated with m to
make the CCAPM shrink to the CAPM. In other terms, the existence of an asset return that is
perfectly negatively correlated with the price kernel is a sufficient condition for the CCAPM to
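shrink to the CAPM, not a necessary condition. The proof of eq. (6.12) is easy. By the CCAPM,
$$E(R^i) - R = -\rho_{i,m}\frac{\sigma(m)}{E(m)}\sigma(R^i); \quad \text{and} \quad E(R^*) - R = -\rho_{R^*,m}\frac{\sigma(m)}{E(m)}\sigma(R^*).$$
That is,
$$\frac{E(R^i) - R}{E(R^*) - R} = \frac{\rho_{i,m}}{\rho_{R^*,m}}\frac{\sigma(R^i)}{\sigma(R^*)} \qquad (6.13)$$
But if $R^*$ is β-CAPM generating,
$$\frac{E(R^i) - R}{E(R^*) - R} = \frac{\operatorname{cov}(R^i, R^*)}{\sigma(R^*)^2} = \rho_{i,R^*}\frac{\sigma(R^i)}{\sigma(R^*)}. \qquad (6.14)$$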
Comparing eq. (6.13) with eq. (6.14) produces (6.12).
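[Figure: the tangency portfolio in the mean-variance plane, with $E(R)$ on the vertical axis, $\sigma(R)$ on the horizontal axis, and intercept $1/E(m)$.]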
A final thought. Many recent applied research papers have important results but also a surprising motivation. They often state that because we observe time-varying Sharpe ratios on (proxies of) the market portfolio, one should also model the market risk-premium $\sqrt{Var_t(m_{t+1})}\big/E_t(m_{t+1})$ as time-varying. However, this is not a rigorous motivation. The Sharpe ratio of the market portfolio is generally less than $\sqrt{Var_t(m_{t+1})}\big/E_t(m_{t+1})$, which is only a bound. From a strictly theoretical point of view, a time-varying $\sqrt{Var_t(m_{t+1})}\big/E_t(m_{t+1})$ is neither a necessary nor a sufficient condition to observe time-varying Sharpe ratios. Figure 6.3 illustrates this point.
proxy of the market portfolio will incorrectly support the model if such a proxy is more or
less the same as the tangency portfolio. On the other hand, if the proxy is not mean-variance
efficient, the CAPM can be rejected even if the CAPM is right. All in all, any test of the CAPM
is a joint test of the model itself and of the closeness of the proxy to the market portfolio.
6.6 Appendix
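Proof of the Equation, $1_n = E[m_t^*(\bar m)\cdot(1_n + R_t)]$. We have,
$$\begin{aligned}
E[m_t^*(\bar m)\cdot(1_n + R_t)] &= E\left[\left(\bar m + (R_t - E(R_t))^\top\beta_{\bar m}\right)(1_n + R_t)\right] \\
&= \bar m E(1_n + R_t) + E\left[(R_t - E(R_t))^\top\beta_{\bar m}\,(1_n + R_t)\right] \\
&= \bar m E(1_n + R_t) + E\left[(1_n + R_t)(R_t - E(R_t))^\top\right]\beta_{\bar m} \\
&= \bar m E(1_n + R_t) + E\left[\left((1_n + E(R_t)) + (R_t - E(R_t))\right)(R_t - E(R_t))^\top\right]\beta_{\bar m} \\
&= \bar m E(1_n + R_t) + E\left[(R_t - E(R_t))(R_t - E(R_t))^\top\right]\beta_{\bar m} \\
&= \bar m E(1_n + R_t) + \Sigma\beta_{\bar m} \\
&= \bar m E(1_n + R_t) + 1_n - \bar m E(1_n + R_t),
\end{aligned}$$
$$E(m\cdot R^{m^*}) = \frac{1}{E[(m^*)^2]}E(m\cdot m^*),$$
where
$$\begin{aligned}
E(m\cdot m^*) &= \bar m^2 + E\left[m(R_t - E(R_t))^\top\beta_{\bar m}\right] \\
&= \bar m^2 + E\left[m(1 + R_t)^\top\beta_{\bar m}\right] - E\left[m(1 + E(R_t))^\top\beta_{\bar m}\right] \\
&= \bar m^2 + 1_n^\top\beta_{\bar m} - E(m)\left[1 + E(R_t)\right]^\top\beta_{\bar m} \\
&= \bar m^2 + \left[1_n - \bar m(1 + E(R_t))\right]^\top\beta_{\bar m} \\
&= \bar m^2 + \left[1_n - \bar m(1 + E(R_t))\right]^\top\Sigma^{-1}\left[1_n - \bar m(1_n + E(R_t))\right] \\
&= \bar m^2 + \operatorname{var}(m^*),
\end{aligned}$$
$p^\top 1_{n+1} = 1$.
The returns we consider are $r_t = \left(\bar m^{-1} - 1, r_{1,t}, \cdots, r_{n,t}\right)^\top$. We denote our “benchmark” portfolio return as $r_{bt} = r^{m^*} - 1$. Next, we build up an arbitrary portfolio yielding the same expected return $E(r_{bt})$ and then we show that this has a variance greater than the variance of $r_{bt}$. Since this portfolio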
is arbitrary, the proof will be complete. Let $r_{pt} = p^\top r_t$ such that $E(r_{pt}) = E(r_{bt})$. We have:
The first line follows by construction since E(rpt ) = E(rbt ). The last line follows because
Given this, the claim follows directly from the fact that
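$$\operatorname{var}(R_{pt}) = \operatorname{var}[R_{bt} + (R_{pt} - R_{bt})] = \operatorname{var}(R_{bt}) + \operatorname{var}(R_{pt} - R_{bt}) \ge \operatorname{var}(R_{bt}).$$
Proof of the Equation, $E\left(R^{m^*}\right) - 1 = r - \frac{1+r}{1+Sh}Sh$. We have,
$$E(R^{m^*}) - 1 = \frac{\bar m}{E[(m^*)^2]} - 1.$$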
In terms of the notation introduced in Section 6.8, m∗ is:
We have,
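$$\begin{aligned}
E[(m^*)^2] &= E\left[\bar m + (a\epsilon)^\top\beta_{\bar m}\right]^2 \\
&= \bar m^2 + E\left[(a\epsilon)^\top\beta_{\bar m}\right]^2 \\
&= \bar m^2 + E\left[(a\epsilon)^\top\beta_{\bar m}\cdot(a\epsilon)^\top\beta_{\bar m}\right] \\
&= \bar m^2 + E\left[\left(\beta_{\bar m}^\top a\epsilon\right)\left(\epsilon^\top a^\top\beta_{\bar m}\right)\right] \\
&= \bar m^2 + \beta_{\bar m}^\top\cdot\sigma\cdot\beta_{\bar m} \\
&= \bar m^2 + \left[1_n^\top - \bar m\left(1_n^\top + b^\top\right)\right]\sigma^{-1}\left[1_n - \bar m(1_n + b)\right] \\
&= \bar m^2 + 1_n^\top\sigma^{-1}1_n - \bar m\left(1_n^\top\sigma^{-1}1_n + 1_n^\top\sigma^{-1}b\right) \\
&\quad - \bar m\left\{1_n^\top\sigma^{-1}1_n + b^\top\sigma^{-1}1_n - \bar m\left(1_n^\top\sigma^{-1}1_n + b^\top\sigma^{-1}1_n + 1_n^\top\sigma^{-1}b + b^\top\sigma^{-1}b\right)\right\}
\end{aligned}$$
This is positive if $r - Sh > 0$, i.e. if $b^\top\sigma^{-1}b - (2\beta + 1)r + \gamma r^2 < 0$, which is possible for sufficiently low (or sufficiently high) values of $r$.
Proof that $R^{m^*}$ is the $m$-maximum correlation portfolio. We have to show that for any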
price kernel m, |corr(m, Rbt )| ≥ |corr(m, Rpt )|. Define a -parametrized portfolio such that:
We have
The first line follows because (1 − )Ro + Rpt is a nonstochastic affine translation of Rpt . The last
equality follows because
where the first line follows because E((1 − )Ro + Rpt ) = E(Rbt ).
Therefore,
cov (m, Rbt ) cov (m, Rbt )
corr (m, Rpt ) = p ≤ p = corr (m, Rbt ) ,
σ(m) · var ((1 − )Ro + Rpt ) σ(m) · var(Rbt )
where the inequality follows because Rbt is mean-variance efficient (i.e. there are no feasible portfolios with the
same expected return as Rbt and variance less than var(Rbt )), and then var((1 − )Ro + Rpt ) ≥
var(Rbt ), all Rpt .
References
Campbell, J. Y. and J. Cochrane (1999): “By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior.” Journal of Political Economy 107, 205-251.
Cecchetti, S., Lam, P-S. and N. C. Mark (1994): “Testing Volatility Restrictions on Intertemporal Rates of Substitution Implied by Euler Equations and Asset Returns.” Journal of Finance 49, 123-152.
Ferson, W. E. and A. F. Siegel (2003): “Stochastic Discount Factor Bounds with Conditioning Information.” Review of Financial Studies 16, 567-595.
Gallant, R. A., L. P. Hansen and G. Tauchen (1990): “Using the Conditional Moments of Asset Payoffs to Infer the Volatility of Intertemporal Marginal Rates of Substitution.” Journal of Econometrics 45, 141-179.
Hansen, L. P. and R. Jagannathan (1991): “Implications of Security Market Data for Models of Dynamic Economies.” Journal of Political Economy 99, 225-262.
Mehra, R. and E. C. Prescott (1985): “The Equity Premium: A Puzzle.” Journal of Monetary Economics 15, 145-161.
Roll, R. (1977): “A Critique of the Asset Pricing Theory’s Tests Part I: On Past and Potential Testability of the Theory.” Journal of Financial Economics 4, 129-176.
7
Aggregate stock market fluctuations
7.1 Introduction
This chapter reviews the progress made to address the empirical puzzles relating to the neoclas-
sical asset pricing model. We first provide a succinct overview of the main empirical regularities
of aggregate stock market fluctuations. For example, we emphasize that price-dividend ratios
and returns are procyclical, and that returns volatility and risk-premia are both time-varying
and countercyclical. Then, we discuss the extent to which these empirical features can be ex-
plained by rational models. For example, many models with state dependent preferences predict
that Sharpe ratios are time-varying and that stock market volatility is countercyclical. Are these
appealing properties razor-edge? Or are they general properties of all conceivable models with
state-dependent preferences? Moreover, would we expect that these properties show up in
other related models in which asset prices are related to the economic conditions? The final
part of this chapter aims at providing answers to these questions, and develops theoretical test
conditions on the pricing kernel and other primitive state processes that make the resulting
models consistent with sets of qualitative predictions given in advance.
necessarily related to the business cycle conditions. As an example, during the “roaring” 1960s,
price-dividend ratios experienced two major drops having the same magnitude as the decline
at the very beginning of the “chaotic” 1970s. Ex-post returns follow approximately the same
pattern, but they are more volatile than price-dividend ratios (see Figure 7.2).1
A second set of stylized facts is related to the first two moments of the returns distribution:
Fact 2. Returns volatility, the equity premium, risk-adjusted discount rates, and Sharpe ratios
are strongly countercyclical. Again, business cycle conditions are not the only factor ex-
plaining both short-run and long-run movements in these variables.
Figures 7.3 through 7.5 are informally very suggestive of the previous statement. For example,
volatility is markedly higher during recessions than during expansions. (It also appears that the
volatility of volatility is countercyclical.) Yet it rocketed to almost 23% during the 1987 crash -
a crash occurring during one of the most enduring post-war expansions period. As we will see
later in this chapter, countercyclical returns volatility is a property that may emerge when the
volatility of the P/D ratio changes is countercyclical. Table 7.1 reveals indeed that the P/D
ratios variations are more volatile in bad times than in good times. Table 7.1 also reveals that
the P/D ratio (in levels) is more volatile in good times than in bad times. Finally, P/E ratios
behave in a different manner.
A third set of very intriguing stylized facts regards the asymmetric behavior of some important
variables over the business cycle:
Fact 3. P/D ratio changes, risk-adjusted discount rates changes, equity premium changes, and
Sharpe ratio changes behave asymmetrically over the business cycle. In particular, the
deepest variations of these variables occur during the negative phase of the business cycle.
As an example, not only are risk-adjusted discount rates counter-cyclical. On average, risk-
adjusted discount rates increase more during NBER recessions than they decrease during NBER
expansions. Analogously, not only are P/D ratios procyclical. On average, P/D increase less
during NBER expansions than they decrease during NBER recessions. Furthermore, the order
of magnitude of this asymmetric behavior is very high. As an example, the average of P/D
percentage (negative) changes during recessions is almost twice the average of P/D percentage
(positive) changes during expansions. It is one objective in this chapter to connect this sort of
“concavity” of P/D ratios (“with respect to the business cycle”) to “convexity” of risk-adjusted
discount rates.2
1 We use “smoothed” ex-post returns to eliminate the noise inherent to high frequency movements in the stock-market.
2 Volatility of changes in risk-premia related objects appears to be higher during expansions. This is probably a conservative view because recessions have occurred only 16% of the time. Yet during recessions, these variables have moved on average more than they have done during good times. In other terms, “economic time” seems to move faster during a recession than during an expansion. For this reason, the “physical calendar time”-based standard deviations in Table 7.1 should be rescaled to reflect the unfolding of “economic calendar time”. In this case, a more appropriate concept of volatility would be the standard deviation ÷ average of expansions/recessions time.
7.2. The empirical evidence
TABLE 7.1. P/D and P/E are the S&P Comp. price-dividends and price-earnings ratios. Smooth returns as of time $t$ are defined as $\sum_{i=1}^{12}(\tilde R_{t-i} - R_{t-i})$, where $\tilde R_t = \log\left(\frac{S_t + D_t}{S_{t-1}}\right)$, and $R$ is the risk-free rate. Volatility is the excess returns volatility. With the exception of the P/D and P/E ratios, all figures are annualized percent. Data are sampled monthly and cover the period from January 1954 through December 2002. Time series estimates of equity premium $\pi_t$ (say), excess return volatility $\sigma_t$ (say) and Sharpe ratios $\pi_t/\sigma_t$ are obtained through Maximum Likelihood estimation (MLE) of the following model,
$$\tilde R_t - R_t = \pi_t + \varepsilon_t, \quad \varepsilon_t \mid F_{t-1} \sim N\left(0, \sigma_t^2\right)$$
$$\pi_t = \underset{(0.004)}{0.162} + \underset{(0.005)}{0.766}\,\pi_{t-1} - \underset{(0.015)}{0.146}\,\mathrm{IP}^*_{t-1}; \qquad \sigma_t = \underset{(0.089)}{0.218} + \underset{(0.010)}{0.106}\,|\varepsilon_{t-1}| + \underset{(0.029)}{0.868}\,\sigma_{t-1}$$
where (robust) standard errors are in parentheses; IP is the US real, seasonally adjusted industrial production rate; and $\mathrm{IP}^*$ is generated by $\mathrm{IP}^*_t = 0.2\cdot\mathrm{IP}_{t-1} + 0.8\cdot\mathrm{IP}^*_{t-1}$. Analogously, time series estimates of the risk-adjusted discount rate Disc (say) are obtained by MLE of the following model,
$$\tilde R_t - \mathrm{infl}_t = \mathrm{Disc}_t + u_t, \quad u_t \mid F_{t-1} \sim N\left(0, v_t^2\right)$$
$$\mathrm{Disc}_t = \underset{(0.036)}{0.191} + \underset{(0.042)}{0.767}\,\mathrm{Disc}_{t-1} - \underset{(0.081)}{0.152}\,\mathrm{IP}^*_{t-1}; \qquad v_t = \underset{(0.012)}{0.214} + \underset{(0.004)}{0.105}\,|u_{t-1}| + \underset{(0.003)}{0.869}\,v_{t-1}$$
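The two recursions behind the fitted equity premium can be iterated mechanically once a path for the industrial production rate is given. The sketch below shows the filtering step $\mathrm{IP}^*_t = 0.2\cdot\mathrm{IP}_{t-1} + 0.8\cdot\mathrm{IP}^*_{t-1}$ and the recursion for $\pi_t$ with the point estimates reported in the caption. The input series, the initial values and the function names are hypothetical; this does not reproduce the estimation itself.

```python
def smooth_ip(ip, ip_star0):
    """Long-averaged rate: IP*_t = 0.2 * IP_{t-1} + 0.8 * IP*_{t-1}.
    Takes IP_0..IP_{T-1} and IP*_0, returns IP*_0..IP*_T."""
    out = [ip_star0]
    for x in ip:
        out.append(0.2 * x + 0.8 * out[-1])
    return out


def equity_premium_path(ip_star, pi0, c=0.162, rho=0.766, b=-0.146):
    """pi_t = c + rho * pi_{t-1} + b * IP*_{t-1}, with the point estimates
    reported in the caption of Table 7.1 as defaults. Returns pi_0..pi_T."""
    out = [pi0]
    for s in ip_star[:-1]:
        out.append(c + rho * out[-1] + b * s)
    return out


if __name__ == "__main__":
    # Hypothetical input: a constant production rate followed by a slump.
    ip = [1.0] * 24 + [-1.0] * 12
    ip_star = smooth_ip(ip, ip_star0=1.0)
    premium = equity_premium_path(ip_star, pi0=0.0)
    print(f"premium before the slump: {premium[24]:.4f}")
    print(f"premium after the slump:  {premium[-1]:.4f}")
```

Because the coefficient on $\mathrm{IP}^*_{t-1}$ is negative, the fitted premium rises when the long-averaged production rate falls, which is the countercyclical pattern described in Fact 2.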
[Figure: P/D ratio and P/E ratio, 1954-2002.]
FIGURE 6.4. Equity premium and long-averaged real industrial production rate
Stylized fact 1 has a simple and very intuitive consequence: price-dividend ratios are some-
what related to, or “predict”, future medium-term returns. The economic content of this pre-
diction is simple. After all, expansions are followed by recessions. Therefore in good times the
stock market predicts that in the future, returns will be negative. Indeed, define the excess
return as R̃te ≡ R̃t − Rt . Consider the following regressions,
$$\tilde R^e_{t+n} = a_n + b_n \times P/D_t + u_{n,t}, \quad n \ge 1,$$
where $u$ is a residual term. Typically, the estimates of $b_n$ are significantly negative, and the $R^2$ on these regressions increases with $n$. In turn, the previous regressions imply that $E[\tilde R^e_{t+n} \mid P/D_t] = a_n + b_n \times P/D_t$. They thus suggest that price-dividend ratios are driven by expected excess
procyclical price-dividend ratios (stated in stylized fact 1) seem to be the two sides of the same
coin.
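The predictive regressions above are ordinary least squares regressions of the excess return at $t+n$ on the price-dividend ratio at $t$. The following sketch shows the alignment of the two series and the computation of $a_n$, $b_n$ and $R^2$. The function names and the synthetic data are hypothetical; no actual return data are used.

```python
def ols(x, y):
    """Univariate OLS of y on a constant and x. Returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def predictive_regression(pd_ratio, excess_returns, horizon):
    """Regress the excess return dated t + horizon on the P/D ratio dated t."""
    x = pd_ratio[:-horizon]       # P/D_t
    y = excess_returns[horizon:]  # excess return at t + horizon
    return ols(x, y)


if __name__ == "__main__":
    # Synthetic series in which high P/D ratios are followed by low returns.
    pd_ratio = [float(20 + (3 * i) % 7) for i in range(30)]
    n = 2
    excess = [0.0] * n + [0.5 - 0.01 * p for p in pd_ratio[:-n]]
    a, b, r2 = predictive_regression(pd_ratio, excess, n)
    print(f"a_n={a:.4f}  b_n={b:.4f}  R2={r2:.4f}")
```

In actual data the fit is far from perfect; the point of the sketch is only the timing convention, with the regressor dated $t$ and the regressand dated $t+n$.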
There is also one apparently puzzling feature: price-dividend ratios do not predict future
dividend growth. Let gt ≡ log(Dt / Dt−1 ). In regressions of the following form,
the predictive content of price-dividend ratios is very poor, and estimates of bn even come with
a wrong sign.
The previous simple regressions thus suggest that: 1) price-dividend ratios are driven by time-
varying expected returns (i.e. by time-varying risk-premia); and 2) the role played by expected
dividend growth seems to be somewhat limited. As we will see later in this chapter, this view
can however be challenged along several dimensions. First, it seems that expected earning
7.3. Understanding the empirical evidence
growth does help predicting price-dividend ratios. Second, the fact that expected dividend
growth doesn’t seem to affect price-dividend ratios can in fact be a property to be expected in
equilibrium.
The previous formula reveals that properties of returns can be understood through the corre-
sponding properties of dividend growth gt and price-dividend ratios pt . The empirical evidence
discussed in the previous section suggests that our models should take into account at least
the following two features. First, we need volatile price-dividend ratios. Second we need that
price-dividend ratios be on average more volatile in bad times than in good times. For exam-
ple, consider a model in which prices are affected by some key state variables related to the
business cycle conditions (see Section 7.4 for examples of models displaying this property). A
basic property that we should require from this particular model is that the price-dividend
ratio be increasing and concave in the state variables related to the business cycle conditions.
In particular, the concavity property ensures that returns volatility increases on the downside,
which is precisely the definition of countercyclical returns volatility. One of the ultimate
goals of this chapter is to search for classes of promising models ensuring this and related
properties.
Gordon's model in Chapter 6 predicts that price-dividend ratios are constant, which is
counterfactual. It is thus unsuitable for the purposes we are pursuing here. We need to think of
multidimensional models. However, not all multidimensional models will work. As an example,
in the previous chapter we showed how to arbitrarily increase the variance of the kernel of the
Lucas model by adding more and more factors. We also showed that the resulting model is
one in which price-dividend ratios are constant. We need to impose some discipline on how to
increase the dimension of a model.
$$S_i = S_i(y), \quad y \in \mathbb{R}^d, \quad i = 1, \cdots, m \quad (m \le d),$$
where $y = [y_1, \cdots, y_d]^\top$ is the vector of factors affecting asset prices, and $S_i$ is the rational pricing
function. We assume that asset i pays off an instantaneous dividend rate $D_i$, i = 1, ···, m, and
that $D_i = D_i(y)$, i = 1, ···, m. We also assume that y is a multidimensional diffusion process,
viz
$$dy_t = \varphi(y_t)\,dt + v(y_t)\,dW_t,$$
7.4. The asset pricing model c
°by A. Mele
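$$\begin{bmatrix} \dfrac{LS_1}{S_1} - r + \dfrac{D_1}{S_1} \\ \vdots \\ \dfrac{LS_m}{S_m} - r + \dfrac{D_m}{S_m} \end{bmatrix} = \underbrace{\sigma}_{m \times d}\,\underbrace{\lambda}_{d \times 1}, \quad \text{where } \sigma = \begin{bmatrix} \dfrac{\nabla S_1}{S_1} \\ \vdots \\ \dfrac{\nabla S_m}{S_m} \end{bmatrix} \cdot v. \tag{7.1}$$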
The usual interpretation of λ is the vector of unit prices of risk associated with the fluctuations
of the d factors. To simplify the structure of the model, we suppose that,
The previous assumptions impose a series of severe restrictions on the dimension of the model.
We emphasize that these restrictions are arbitrary, and that they are only imposed for
simplicity's sake.
Eqs. (7.1) constitute a system of m uncoupled partial differential equations. The solution to
it is an equilibrium price system. For example, Gordon's model in Chapter 6 is a special case
of this setting.$^{3}$ We do not discuss transversality conditions and bubbles in this chapter. Nor
do we discuss issues related to market completeness.$^{4}$ Instead, we implement a reverse-engineering
approach and search over families of models guaranteeing that long-lived asset prices exhibit
some properties given in advance. In particular, we wish to impose conditions on the primitives
P ≡ (a, b, r, λ) such that the aggregate stock market behavior exhibits the same patterns
surveyed in the previous section. For example, model (7.1) predicts that returns volatility is,
$$\text{volatility}\left(\frac{dS_{i,t}}{S_{i,t}}\right) \equiv \mathcal{V}(y_t)\,dt \equiv \frac{1}{S_{i,t}^2}\,\left\| \nabla S_i(y_t)\,v(y_t) \right\|^2 dt.$$
In this model volatility is thus typically time-varying. But we also wish to answer questions such
as: which restrictions may we impose on P to ensure that volatility $\mathcal{V}(y_t)$ is countercyclical?
Naturally, an important and challenging subsequent step is to find models guaranteeing that
the restrictions on P we are looking for are economically and quantitatively sensible. The most
natural models we will look at are models which innovate over the Gordon’s model due to
time-variation in the expected returns and/or in the expected dividend growth. These issues
are analyzed in a simplified version of the model.
3 Let m = d = 1, ζ = y, ϕ(y) = μy and ξ(y) = $\sigma_0$ y, and assume that λ and r are constant. Substituting these into Eq. (7.1)
and assuming no bubbles yields the (constant) price-dividend ratio predicted by Gordon's model, $q_t / \zeta_t = (r + \sigma_0 \lambda - \mu)^{-1}$.
4 As we explained in chapter 4, in this setting markets are complete if and only if m = d.
long-lived asset (see below). We also assume that d = 2, and take as given the consumption
endowment process D and a second state variable y. We assume that D, y are solution to,
$$\begin{cases} dD(\tau) = m(y(\tau))\,D(\tau)\,d\tau + \sigma_0 D(\tau)\,dW_1(\tau) \\ dy(\tau) = \varphi(D(\tau), y(\tau))\,d\tau + v_1(y(\tau))\,dW_1(\tau) + v_2(y(\tau))\,dW_2(\tau) \end{cases}$$
where W1 and W2 are independent standard Brownian motions. By the connection between
conditional expectations and solutions to partial differential equations (the Feynman-Kac rep-
resentation theorem) (see Chapter 4), we may re-state the FTAP in (7.1) in terms of conditional
expectations in the following terms. By (7.1), we know that S is solution to,
Under regularity conditions,$^{5}$ the Feynman-Kac representation of the solution to Eq. (7.3) is:
$$S(D, y) = \int_0^\infty C(D, y, \tau)\,d\tau, \quad C(D, y, \tau) \equiv \mathbb{E}\left[ \exp\left( -\int_0^\tau r(D(t), y(t))\,dt \right) \cdot D(\tau) \,\middle|\, D, y \right], \tag{7.4}$$
where $\mathbb{E}$ is the expectation operator taken under the risk-neutral probability Q (say). Finally,
(D, y) are solutions to
$$\begin{cases} dD(\tau) = \hat{m}(y(\tau))\,D(\tau)\,d\tau + \sigma_0 D(\tau)\,d\hat{W}_1(\tau) \\ dy(\tau) = \hat{\varphi}(D(\tau), y(\tau))\,d\tau + v_1(y(\tau))\,d\hat{W}_1(\tau) + v_2(y(\tau))\,d\hat{W}_2(\tau) \end{cases} \tag{7.5}$$
where $\hat{W}_1$ and $\hat{W}_2$ are two independent Q-Brownian motions, and $\hat{m}$ and $\hat{\varphi}$ are risk-adjusted drift
functions defined as $\hat{m}(D, y) \equiv m(y) D - \sigma_0 D \lambda_1(D, y)$ and $\hat{\varphi}(D, y) \equiv \varphi(D, y) - v_1(y) \lambda_1(D, y) - v_2(y) \lambda_2(D, y)$.$^{6}$ Naturally, Eq. (7.4) can also be rewritten under the physical measure. We have,
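$$C(D, y, \tau) = \mathbb{E}\left[ \exp\left( -\int_0^\tau r(D(t), y(t))\,dt \right) \cdot D(\tau) \,\middle|\, D, y \right] = E\left[ \mu(\tau) \cdot D(\tau) \mid D, y \right],$$
where
$$\mu(\tau) = \frac{\xi(\tau)}{\xi(0)}; \quad \xi(0) = 1.$$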
Given the previous assumptions on the information structure of the economy, ξ necessarily
satisfies,
$$\frac{d\xi(\tau)}{\xi(\tau)} = -\left[ r(D(\tau), y(\tau))\,d\tau + \lambda_1(D(\tau), y(\tau))\,dW_1(\tau) + \lambda_2(D(\tau), y(\tau))\,dW_2(\tau) \right]. \tag{7.6}$$
In the appendix (“Markov pricing kernels”), we provide an example of pricing kernel generating
interest rates and risk-premia having the same functional form as in (7.2).
5 See, for example, Huang and Pagès (1992) (thm. 3, p. 53) or Wang (1993) (lemma 1, p. 202), for a series of regularity conditions
underlying the Feynman-Kac theorem in infinite horizon settings arising in typical financial applications.
6 See, for example, Huang and Pagès (1992) (prop. 1, p. 41) for mild regularity conditions ensuring that Girsanov’s theorem
7.4.3 Issues
We analyze general properties of long-lived asset prices that can be streamlined into three cate-
gories: “monotonicity properties”, “convexity properties”, and “dynamic stochastic dominance
properties”. We now produce examples illustrating the economic content of such a categoriza-
tion.
• Monotonicity. Consider a model predicting that S(D, y) = D·p(y), for some positive function
$p \in C^2(\mathbb{Y})$. By Itô's lemma, returns volatility is $\mathrm{vol}(D) + \frac{p'(y)}{p(y)}\,\mathrm{vol}(y)$, where vol(D) > 0
is consumption growth volatility and vol(y) has a similar interpretation. As explained in
the previous chapter, actual returns volatility is too high to be explained by consumption
volatility. Naturally, additional state variables may increase the overall returns volatility.
In this simple example, state variable y inflates returns volatility whenever the price-
dividend ratio p is increasing in y. At the same time, such a monotonicity property would
ensure that asset returns volatility be strictly positive. Eventually, strictly positive volatil-
ity is one crucial condition guaranteeing that dynamic constraints of optimizing agents
are well-defined.
• Concavity. Next, suppose that y is some state variable related to the business cycle con-
ditions. Another robust stylized fact is that stock market volatility is countercyclical. If
S(D, y) = D · p(y) and vol(y) is constant, returns volatility is countercyclical whenever p
is a concave function of y. Even in this simple example, second-order properties (or “non-
linearities”) of the price-dividend ratio are critical to the understanding of time variation
in returns volatility.
• Dynamic stochastic dominance. An old issue in financial economics is about the relation
between long-lived asset prices and volatility of fundamentals.7 The traditional focus of the
literature has been the link between dividend (or consumption) volatility and stock prices.
Another interesting question is the relationship between the volatility of additional state
variables (such as the dividend growth rate) and stock prices. In some models, volatility
of these additional state variables is endogenously determined. For example, it may be
inversely related to the quality of signals about the state of the economy.8 In many other
circumstances, producing a probabilistic description of y is as arbitrary as specifying the
preferences of a representative agent. In fact, y is in many cases related to the dynamic
specification of agents’ preferences. The issue is then to uncover stochastic dominance
properties of dynamic pricing models where state variables are possibly nontradable.
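The monotonicity and concavity bullets can be checked numerically. The sketch below is only an illustration under stated assumptions: the price-dividend function p(y) = 10 + 5·log(1 + y) and the values vol(D) = 0.02 and vol(y) = 0.10 are hypothetical choices, not taken from the text. It evaluates vol(D) + p'(y)/p(y)·vol(y) on a grid where low y stands for bad times.

```python
import math

VOL_D = 0.02   # assumed consumption growth volatility (illustrative)
VOL_Y = 0.10   # assumed constant volatility of the state variable y

def p(y):
    """Hypothetical price-dividend ratio: positive, increasing and concave in y."""
    return 10.0 + 5.0 * math.log(1.0 + y)

def p_prime(y):
    """Derivative of the hypothetical price-dividend ratio."""
    return 5.0 / (1.0 + y)

def returns_vol(y):
    """Returns volatility vol(D) + p'(y)/p(y) * vol(y) when S = D * p(y)."""
    return VOL_D + p_prime(y) / p(y) * VOL_Y

# y runs from 0 (bad times) to 2 (good times)
grid = [0.1 * k for k in range(0, 21)]
vols = [returns_vol(y) for y in grid]

for y, vol in zip(grid[::5], vols[::5]):
    print(f"y = {y:.1f}  returns volatility = {vol:.4f}")
```

Because p is increasing, every entry of `vols` exceeds vol(D); because p is also concave, the ratio p'/p falls as y rises, so volatility is highest in bad times.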
7 See, for example, Malkiel (1979), Pindyck (1984), Poterba and Summers (1985), Abel (1988) and Barsky (1989).
8 See, for example, David (1997) and Veronesi (1999, 2000).
In the next section, we provide a simple characterization of the previous properties. To achieve
this task, we extend some general ideas in the recent option pricing literature. This literature
attempts to explain the qualitative behavior of a contingent claim price function C(D, y, τ) with
as few assumptions as possible on D and y. Unfortunately, some of the conceptual foundations
in this literature are not well-suited to pursue the purposes of this chapter. As an example,
many available results are based on the assumption that at least one state variable is tradable.
This is not the case in the “European-type option” pricing problem (7.4). In the next section,
we introduce an abstract asset pricing problem which is appropriate to our purposes. Many
existing results are specific cases of the general framework developed in the next section (see
Theorems 7.1 and 7.2). In sections 7.6 and 7.7, we apply this framework of analysis to study
basic model examples of long-lived asset prices.

7.5 Analyzing qualitative properties of models
$$c(x) = E\left[ \psi(x \cdot G(T)) \right], \quad G(T) \equiv \frac{x(T)}{x}, \quad x > 0.$$
As this simple formula reveals, standard stochastic dominance arguments still apply: c decreases
(increases) after a mean-preserving spread in G whenever ψ is concave (convex) - consistently
for example with the prediction of the Black and Scholes (1973) formula. This point was first
made by Jagannathan (1984) (p. 429-430). In two independent papers, Bergman, Grundy and
Wiener (1996) (BGW) and El Karoui, Jeanblanc-Picqué and Shreve (1998) (EJS) generalized
these results to any diffusion process (i.e., not necessarily a proportional process).9,10 But one
crucial assumption of these extensions is that X must be the price of a traded asset that
does not pay dividends. This assumption is crucial because it makes the risk-neutralized drift
function of X proportional to x. As a consequence of this fact, c inherits convexity properties of
ψ, as in the proportional process case. As we demonstrate below, the presence of nontradable
9 The proofs in these two articles are markedly distinct but are both based on price function convexity. An alternate proof
directly based on payoff function convexity can be obtained through a direct application of Hajek's (1985) theorem. This theorem
states that if ψ is increasing and convex, and $X_1$ and $X_2$ are two diffusion processes (both starting off from the same origin) with
integrable drifts $m_1$ and $m_2$ and volatilities $a_1$ and $a_2$, then $E[\psi(x_1(T))] \le E[\psi(x_2(T))]$ whenever $m_1(\tau) \le m_2(\tau)$ and $a_1(\tau) \le a_2(\tau)$
for all τ ∈ (0, ∞). Note that this approach is more general than the approach in BGW and EJS insofar as it allows for shifts in
both m and a. As we argue below, both shifts are important to account for when X is nontradable.
10 Bajeux-Besnainou and Rochet (1996) (section 5) and Romano and Touzi (1997) contain further extensions pertaining to
state variables makes interesting nonlinearities emerge. As an example, Proposition 7.1 reveals
that in general, convexity of ψ is neither a necessary nor a sufficient condition for convexity of
c.$^{11}$ Furthermore, “dynamic” stochastic dominance properties are more intricate than in the
classical second order stochastic dominance theory (see Proposition 7.1).
To substantiate these claims, we now introduce a simple, abstract pricing problem.
In this pricing problem, X can be the price of a traded asset. In this case b(x) = xρ(x). If in
addition, ρ' = 0, the problem collapses to the classical European option pricing problem with
constant discount rate. If instead, X is not a traded risk, $b(x) = b_0(x) - a(x)\lambda(x)$, where $b_0$ is the
physical drift function of X and λ is a risk-premium. The previous framework then encompasses
a number of additional cases. As an example, set ψ(x) = x. Then, one may 1) interpret X as
a consumption process; 2) restrict a long-lived asset price S to be driven by consumption only,
and set $S = \int_0^\infty c(x, \tau)\,d\tau$. As another example, set ψ(x) = 1 and ρ(x) = x. Then, c is a zero-
coupon bond price as predicted by a simple univariate short-term rate model. The importance
of these specific cases will be clarified in the following sections.
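The zero-coupon bond case (ψ(x) = 1, ρ(x) = x) lends itself to a small numerical sketch of the pricing problem, taken here in the form of the expectation E[exp(−∫ρ(x(t))dt)·ψ(x(T)) | x] used in the appendix proofs. The code below is an assumption-laden illustration rather than the author's method: it picks a Vasicek-type risk-neutral drift b(x) = κ(θ − x) with constant volatility a(x) = σ, uses a plain Euler scheme, and the function names and parameter values are hypothetical. The Monte Carlo estimate is compared with the standard Vasicek closed form.

```python
import math
import random

def canonical_price_mc(x0, T, b, a, rho, psi, n_paths=4000, n_steps=50, seed=0):
    """Monte Carlo estimate of E[exp(-int_0^T rho(x(t)) dt) * psi(x(T)) | x(0) = x0],
    where dx = b(x) dt + a(x) dW, discretized with an Euler scheme."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, integral = x0, 0.0
        for _ in range(n_steps):
            integral += rho(x) * dt          # left-point rule for the discount integral
            x += b(x) * dt + a(x) * sqrt_dt * rng.gauss(0.0, 1.0)
        total += math.exp(-integral) * psi(x)
    return total / n_paths

# Illustrative (assumed) Vasicek parameters for the risk-neutral short rate
KAPPA, THETA, SIGMA = 0.5, 0.05, 0.01

def vasicek_bond(x0, T):
    """Closed-form Vasicek zero-coupon bond price, used as a benchmark."""
    big_b = (1.0 - math.exp(-KAPPA * T)) / KAPPA
    log_a = ((THETA - SIGMA ** 2 / (2.0 * KAPPA ** 2)) * (big_b - T)
             - SIGMA ** 2 * big_b ** 2 / (4.0 * KAPPA))
    return math.exp(log_a - big_b * x0)

mc = canonical_price_mc(
    0.05, 1.0,
    b=lambda x: KAPPA * (THETA - x),   # linear drift, so b'' = 0
    a=lambda x: SIGMA,
    rho=lambda x: x,                   # discount at the short rate
    psi=lambda x: 1.0,                 # unit payoff: zero-coupon bond
)
exact = vasicek_bond(0.05, 1.0)
print(f"Monte Carlo: {mc:.5f}   closed form: {exact:.5f}")
```

With a linear drift the closed-form price is exponential-affine in x and hence convex in the short rate, which can be confirmed with a second difference of `vasicek_bond`.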
In the appendix (see Proposition 7.A.1), we provide a result linking the volatility of the state
variable x to the price c. Here I characterize slope (cx ) and convexity (cxx ) properties of c. We
have:
11 Kijima (2002) recently produced a counterexample in which option price convexity may break down in the presence of convex
payoff functions. His counterexample was based on an extension of the Black-Scholes model in which the underlying asset price had
a concave drift function. (The source of this concavity was due to the presence of dividend issues.) Among other things, the proof
of proposition 2 reveals the origins of this counterexample.
The last part of Proposition 7.1-b) then says that convexity of ψ propagates to convexity of
c. This result reproduces the findings in the literature surveyed earlier. Proposition 7.1-
b) characterizes option price convexity within more general contingent claims models. As an
example, suppose that ψ'' = ρ' = 0 and that X is not a traded risk. Then, Proposition 7.1-b)
reveals that c inherits the same convexity properties of the instantaneous drift of X. As a final
example, Proposition 7.1-b) extends one (scalar) bond pricing result in Mele (2003). Precisely,
let ψ(x) = 1 and ρ(x) = x; accordingly, c is the price of a zero-coupon bond as predicted by
a standard short-term rate model. By Proposition 7.1-b), c is convex in x whenever b''(x) < 2.
This corresponds to Eq. (8) (p. 688) in Mele (2003).$^{12}$ In analyzing properties of long-lived asset
prices, both discounting and drift nonlinearities play a prominent role.
An intuition of the previous result can be obtained through a Taylor-type expansion of c (x, T )
in Eq. (7.7). To simplify, suppose that in Eq. (7.7), ψ ≡ 1, and that
The economic interpretation of the previous decomposition is that g is the growth rate of some
underlying “dividend process” and Disc is some “risk-adjusted” discount rate. Consider the
following discrete-time counterpart of Eq. (7.7):
$$c(x_0, N) \equiv \bar{E}\left\{ e^{\sum_{i=0}^{N} [g(x_i) - \mathrm{Disc}(x_i)]} \,\middle|\, x_0 \right\}.$$
Expanding the exponential to first order,
$$c(x, N) \approx 1 + [g(x) - \mathrm{Disc}(x)] \times N + \sum_{i=0}^{N} \sum_{j=1}^{i} \bar{E}\left[ \Delta g(x_j) - \Delta \mathrm{Disc}(x_j) \mid x \right], \tag{7.8}$$
where $\Delta g(x_j) \equiv g(x_j) - g(x_{j-1})$. The second term of the r.h.s. of Eq. (7.8) makes clear that
convexity of g can potentially translate to convexity of c w.r.t. x; and that convexity of Disc
can potentially translate to concavity of c w.r.t. x. But Eq. (7.8) reveals that higher order terms
are important too. Precisely, the expectation Ē [ ∆g (xj ) − ∆Disc (xj )| x] plays some role. Intu-
itively, convexity properties of c w.r.t x also depend on convexity properties of this expectation.
In discrete time, these things are difficult to see. But in continuous time, this simple observation
translates to a joint restriction on the law of movement of x. Precisely, convexity properties
of c w.r.t x will be somehow inherited by convexity properties of the drift function of x. In
continuous time, Eq. (7.8) becomes, for small T ,13
We aim at writing the solution in the canonical pricing problem format of Section 7.5, and
then at applying Proposition 7.1. Our starting point is the evaluation formula (7.4). To apply
it here, we might note that interest rates are constant. Yet to gain in generality we continue
to assume that they are state dependent, but that they only depend on s. Therefore Eq. (7.4)
becomes,
$$\frac{S(D, s)}{D} = \int_0^\infty \frac{C(D, s, \tau)}{D}\,d\tau = \int_0^\infty \mathbb{E}\left[ e^{-\int_0^\tau r(s(u))\,du} \cdot \frac{D(\tau)}{D} \,\middle|\, D, s \right] d\tau. \tag{7.14}$$
To compute the inner expectation, we have to write the dynamics of D under the risk-neutral
probability measure. By Girsanov's theorem,
$$\frac{D(\tau)}{D} = e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)} \cdot e^{g_0 \tau - \int_0^\tau \sigma_0 \lambda(s(u))\,du},$$
where $\hat{W}$ is a Brownian motion under the risk-neutral measure. By replacing this into Eq.
(7.14),
$$\frac{S(D, s)}{D} = \int_0^\infty e^{g_0 \tau} \cdot \mathbb{E}\left[ e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)} \cdot e^{-\int_0^\tau \mathrm{Disc}(s(u))\,du} \,\middle|\, D, s \right] d\tau, \tag{7.15}$$
where
$$\mathrm{Disc}(s) \equiv r(s) + \sigma_0 \lambda(s)$$
is the “risk-adjusted” discount rate. Note also that, under the risk-neutral probability measure,
$$ds(\tau) = \hat{\varphi}(s(\tau))\,d\tau + v(s(\tau))\,d\hat{W}(\tau),$$
where $\hat{\varphi}(s) = \varphi(s) - v(s)\lambda(s)$, $\varphi(s) = s\left[ (1 - \phi)(\bar{s} - \log s) + \frac{1}{2}\sigma_0^2\, l(s)^2 \right]$ and $v(s) = \sigma_0\, s\, l(s)$.
Eq. (7.15) reveals that the price-dividend ratio p(D, s) ≡ S(D, s)/D is independent of D.
Therefore, p(D, s) = p(s). To obtain a neat formula, we should also get rid of the $e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)}$
term. Intuitively this term arises because consumption and habit are correlated. A convenient
change of measure will do the job. Precisely, define a new probability measure $\bar{P}$ (say) through
the Radon-Nikodym derivative $d\bar{P} / d\hat{P} = e^{-\frac{1}{2}\sigma_0^2 \tau + \sigma_0 \hat{W}(\tau)}$. Under this new probability measure,
the price-dividend ratio p(s) satisfies,
$$p(s) = \int_0^\infty e^{g_0 \tau} \cdot \bar{E}\left[ e^{-\int_0^\tau \mathrm{Disc}(s(u))\,du} \,\middle|\, s \right] d\tau, \tag{7.16}$$
and
$$ds(\tau) = \bar{\varphi}(s(\tau))\,d\tau + v(s(\tau))\,d\bar{W}(\tau),$$
where $\bar{W}(\tau) = \hat{W}(\tau) - \sigma_0 \tau$ is a $\bar{P}$-Brownian motion, and $\bar{\varphi}(s) = \varphi(s) - v(s)\lambda(s) + \sigma_0 v(s)$.
The inner expectation in Eq. (7.16) comes in exactly the same format as in the canonical
pricing problem of Section 7.5. Therefore, we are now ready to apply Proposition 7.1. We have,
1. Suppose that risk-adjusted discount rates are countercyclical, viz $\frac{d}{ds}\mathrm{Disc}(s) \le 0$. Then
price-dividend ratios are procyclical, viz $\frac{d}{ds} p(s) > 0$.
2. Suppose that price-dividend ratios are procyclical. Then price-dividend ratios are concave
in s whenever risk-adjusted discount rates are convex in s, viz $\frac{d^2}{ds^2}\mathrm{Disc}(s) > 0$, and
$\frac{d^2}{ds^2}\bar{\varphi}(s) \le 2\,\frac{d}{ds}\mathrm{Disc}(s)$.
7.6. Time-varying discount rates and equilibrium volatility c
°by A. Mele
So we have found joint restrictions on the primitives such that the pricing function p is
consistent with certain properties given in advance. What is the economic interpretation related
to the convexity of risk-adjusted discount rates? If price-dividend ratios are concave in some
state variable Y tracking the business cycle conditions, returns volatility increases on the
downside, and it is thus countercyclical (see Figure 7.6). According to the previous predictions,
price-dividend ratios are concave in Y whenever risk-adjusted discount rates are decreasing
and sufficiently convex in Y . The economic significance of convexity in this context is that in
good times, risk-adjusted discount rates are substantially stable; consequently, the evaluation
of future dividends does not vary too much, and price-dividend ratios are relatively stable. And
in bad times risk-adjusted discount rates increase sharply, thus making price-dividend ratios
more responsive to changes in the economic conditions.
Heuristically, the mathematics behind the previous results can be explained as follows. For
small τ, Eq. (7.9) is,
$$p(s, \tau) \equiv \bar{E}\left[ e^{\int_0^\tau [g_0 - \mathrm{Disc}(s(u))]\,du} \,\middle|\, s \right] \approx 1 + [g_0 - \mathrm{Disc}(s)] \times \tau + \text{h.o.t.}$$
Hence convexity of Disc(s) translates to concavity of p(s, τ). But as pointed out earlier, the
additional higher order terms matter too. The problem with these heuristic arguments is how
well the approximation works for small τ. Furthermore, p(s, τ) is not the price-dividend ratio.
The price-dividend ratio is instead $p(s) = \int_0^\infty p(s, \tau)\,d\tau$. Anyway the previous predictions
confirm that the intuition is indeed valid.
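The first-order part of this heuristic is easy to see numerically. The sketch below is illustrative only: the discount-rate function Disc(s) = 0.03 + 0.05·exp(−2s), the growth rate g0 = 0.02 and the horizon τ = 0.25 are assumptions made for the example, and the higher order terms that the text warns about are deliberately ignored.

```python
import math

G0 = 0.02    # assumed constant dividend growth rate (illustrative)
TAU = 0.25   # assumed small horizon

def disc(s):
    """Hypothetical risk-adjusted discount rate: decreasing and convex in s."""
    return 0.03 + 0.05 * math.exp(-2.0 * s)

def p_first_order(s, tau):
    """First-order approximation 1 + [g0 - Disc(s)] * tau, dropping h.o.t."""
    return 1.0 + (G0 - disc(s)) * tau

def second_difference(f, s, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at s."""
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / h ** 2

grid = [0.2 * k for k in range(1, 11)]
disc_curv = [second_difference(disc, s) for s in grid]
p_curv = [second_difference(lambda z: p_first_order(z, TAU), s) for s in grid]

for s, dc, pc in zip(grid[::3], disc_curv[::3], p_curv[::3]):
    print(f"s = {s:.1f}  Disc'' = {dc:+.5f}  p'' (first order) = {pc:+.5f}")
```

To first order the curvature of p(s, τ) is just −τ times the curvature of Disc(s), so a convex discount rate gives a concave approximation; the approximation is also increasing in s because Disc is decreasing.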
[Figure 7.6: two panels plotted against the state variable Y. Left panel: risk-adjusted discount rates. Right panel: price-dividend ratio. Good times and bad times are marked along each curve.]
FIGURE 7.6. Countercyclical return volatility
What does empirical evidence suggest? To date no empirical work has been done on this.
Here is a simple exploratory analysis. First it seems that real risk-adjusted discount rates Disct
are convex in some very natural index summarizing the economic conditions (see Figure 7.7).
In Table 7.2, we also run Least Absolute Deviations (LAD) regressions to explore whether price-dividend
(P/D) ratios are concave functions of IP.$^{14}$ And we run LAD regressions over
three sample periods to better understand the role of the exceptional (yet persistent) increase
in the P/D ratio during the late 1990s. Figure 7.8 depicts scatter plots of data (along with
fitted regressions) related to these three sampling periods.
14 We run LAD regressions because this methodology is known to be more robust to the presence of outliers than Ordinary Least
Squares.
[Figure 7.7: two panels, each with “Industrial Production Growth Rate (%)” on the horizontal axis.]
FIGURE 7.7. The left-hand side of this picture plots estimates of the expected returns (annualized,
percent) ($\hat{E}_t$ say) against one-year moving averages of the industrial production growth ($\mathrm{IP}_t$). The
expected returns are estimated through the predictive regression of S&P returns on the default premium,
term premium and return volatility, defined as $\widehat{\mathrm{Vol}}_t \equiv \sqrt{\frac{\pi}{2}} \sum_{i=1}^{12} \frac{|\mathrm{Exc}_{t+1-i}|}{\sqrt{12}}$, where $\mathrm{Exc}_t$ is the return
in excess of the 1-month bill return as of month t. The one-year moving average of the industrial
production growth is computed as $\mathrm{IP}_t \equiv \frac{1}{12} \sum_{i=1}^{12} \mathrm{Ind}_{t+1-i}$, where $\mathrm{Ind}_t$ is the real, seasonally adjusted
industrial production growth as of month t. The right-hand side of this picture depicts the prediction
of the static Least Absolute Deviations regression: $\hat{E}_t = 8.56 - 4.05 \cdot \mathrm{IP}_t + 1.18 \cdot \mathrm{IP}_t^2 + w_t$, where $w_t$ is a
residual term, and the standard errors of the three coefficients are (0.15), (0.30) and (0.31), respectively.
Data are sampled monthly, and span the period from January 1948 to December 2002.
TABLE 7.2. Price-dividend ratios and economic conditions. Results of the LAD regression
P/D = a + b·IP + c·IP² + w, where P/D is the S&P Comp. price-dividend ratio, $\mathrm{IP}_t = (I_t + \cdots + I_{t-11})/12$; $I_t$
is the real, seasonally adjusted US industrial production growth rate, and $w_t$ is a residual term. Data
are sampled monthly, and cover the period from January 1948 through December 2002.

        1948:01 - 1991:12      1948:01 - 1996:12      1948:01 - 2002:12
        estimate   std dev     estimate   std dev     estimate   std dev
a        27.968     0.311       29.648     0.329       30.875     0.709
b         2.187     0.419        2.541     0.475        3.059     1.074
c        −2.428     0.429       −3.279     0.480       −3.615     1.091
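As a quick reading aid for Table 7.2, the sketch below takes the reported point estimates at face value and checks, for each sample period, whether the fitted quadratic is concave (c < 0) and where it peaks, at IP = −b/(2c). The dictionary and function names are hypothetical, and nothing here re-estimates the regressions.

```python
# (a, b, c) LAD point estimates copied from Table 7.2, keyed by sample period
TABLE_7_2 = {
    "1948:01-1991:12": (27.968, 2.187, -2.428),
    "1948:01-1996:12": (29.648, 2.541, -3.279),
    "1948:01-2002:12": (30.875, 3.059, -3.615),
}

def fitted_pd(ip, a, b, c):
    """Fitted price-dividend ratio a + b*IP + c*IP^2."""
    return a + b * ip + c * ip ** 2

def turning_point(b, c):
    """Value of IP at which the fitted quadratic reaches its extremum."""
    return -b / (2.0 * c)

summary = {
    period: {"concave": c < 0.0, "peak_ip": turning_point(b, c)}
    for period, (a, b, c) in TABLE_7_2.items()
}

for period, info in summary.items():
    print(f"{period}: concave = {info['concave']}, peak at IP = {info['peak_ip']:.3f}")
```

In all three samples the quadratic coefficient is negative, so the fitted P/D ratio is concave in IP, with the peak located at an IP value of roughly 0.4 to 0.45.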
7.7. Large price swings as a learning induced phenomenon c
°by A. Mele
g ≡ E (θ = A| D)
is linear in Pr (θ = A| D), the same qualitative conclusions are also valid for g.
D_i (observable state)      D_1 = 2A     D_2 = 0     D_3 = −2A
Pr(D_i)                     (1/2)·p      1/2         (1/2)·(1 − p)
Pr(θ = A | D = D_i)         1            p           0
To understand in detail how we computed the values in Table 7.3, let us recall Bayes’ Theo-
rem. Let (Ei )i be a partition of the state space Ω. (This partition can be finite or uncountable,
i.e. the set of indexes i can be finite or uncountable - it really doesn’t matter.) Then Bayes’
Theorem says that,
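$$\Pr(E_i \mid F) = \Pr(E_i) \cdot \frac{\Pr(F \mid E_i)}{\Pr(F)} = \Pr(E_i) \cdot \frac{\Pr(F \mid E_i)}{\sum_j \Pr(F \mid E_j) \Pr(E_j)}. \tag{7.17}$$
$$\Pr(\theta = A \mid D = D_1) = \Pr(\theta = A)\,\frac{\Pr(D = D_1 \mid \theta = A)}{\Pr(D = D_1)} = p\,\frac{\Pr(D = D_1 \mid \theta = A)}{\Pr(D = D_1)}.$$
$$\pi(D) = \Pr(\theta = A) \cdot \frac{\Pr(D \in dD \mid \theta = A)}{\Pr(D \in dD \mid \theta = A) \Pr(\theta = A) + \Pr(D \in dD \mid \theta = -A) \Pr(\theta = -A)}.$$
$$\pi(D) - p = p(1 - p)\,\frac{\phi(D - A) - \phi(D + A)}{p\,\phi(D - A) + (1 - p)\,\phi(D + A)}. \tag{7.18}$$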
That is, the variance of the “probability changes” π(D) − p is proportional to p2 (1 − p)2 .
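The posterior probability and Eq. (7.18) can be verified numerically. The sketch below assumes, as in the appendix, that φ is the standard Gaussian density; the values of A and p and the function names are illustrative assumptions. It also checks the logistic form p/(p + (1 − p)·exp(−2AD)) that follows from the Gaussian assumption.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def posterior(D, A, p):
    """pi(D) = Pr(theta = A | D), by Bayes' theorem with prior Pr(theta = A) = p."""
    num = p * phi(D - A)
    return num / (num + (1.0 - p) * phi(D + A))

def posterior_logistic(D, A, p):
    """Equivalent closed form p / (p + (1 - p) * exp(-2 A D))."""
    return p / (p + (1.0 - p) * math.exp(-2.0 * A * D))

def revision(D, A, p):
    """Right-hand side of Eq. (7.18): the probability change pi(D) - p."""
    return (p * (1.0 - p) * (phi(D - A) - phi(D + A))
            / (p * phi(D - A) + (1.0 - p) * phi(D + A)))

A, p = 0.5, 0.3   # illustrative values
for D in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"D = {D:+.1f}  pi(D) = {posterior(D, A, p):.4f}  "
          f"pi(D) - p = {revision(D, A, p):+.4f}")
```

An observation D = 0 is uninformative, so the posterior equals the prior p; positive observations raise the probability that θ = A and negative ones lower it.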
To add more structure to the problem, we now assume that w is Brownian motion and set
A ≡ Adτ . Let D0 ≡ D(0) = 0. In appendix, we show that by an application of Itô’s lemma to
π(D),
dπ(τ ) = 2A · π(τ )(1 − π(τ ))dW (τ ), π(D0 ) ≡ p, (7.19)
where dW (τ ) ≡ dD(τ ) − g(τ )dτ and g(τ ) ≡ E (θ| D (τ )) = [Aπ(τ ) − A(1 − π(τ ))]. Naturally,
this construction is heuristic. Nevertheless, the result is correct.16 Importantly, it is possible to
show that W is a Brownian motion with respect to the agents’ information set σ (D(t), t ≤ τ ).17
Therefore, the equilibrium in the original economy with incomplete information is isomorphic
in its pricing implications to the equilibrium in a full information economy in which,
$$\begin{cases} dD(\tau) = [g(\tau) - \lambda(\tau)\sigma_0]\,d\tau + \sigma_0\,d\hat{W}(\tau) \\ dg(\tau) = -\lambda(\tau)\,v(g(\tau))\,d\tau + v(g(\tau))\,d\hat{W}(\tau) \end{cases} \tag{7.20}$$
16 See, for example, Liptser and Shiryaev (2001a) (theorem 8.1 p. 318; and example 1 p. 371).
17 See Liptser and Shiryaev (2001a) (theorem 7.12 p. 273).
18 Such an isomorphic property has been pointed out for the first time by Veronesi (1999) in a related model.
$$\frac{dD(\tau)}{D(\tau)} = \hat{g}(\tau)\,d\tau + \sigma_0\,dw_1(\tau),$$
where Ĝ = {ĝ(τ )}τ >0 is unobserved, but now it does not evolve on a countable number of
states. Rather, it follows an Ornstein-Uhlenbeck process:
where ḡ, σ 1 and σ 2 are positive constants. Suppose now that the agent implements a learning
procedure similar as before. If she has a Gaussian prior on ĝ(0) with variance γ 2∗ (defined below),
the nonarbitrage price takes the form S(D, g), where (Z, G) are now solution to Eq. (7.6), with
2. Suppose that the price-dividend ratio is increasing in the dividend growth rate. Then it is
convex whenever $\frac{d^2}{dg^2} R(g) > 0$, and $\frac{d^2}{dg^2}\left[ \varphi_0(g) + (\sigma_0 - \lambda)\,v(g) \right] \ge -2 + 2\,\frac{d}{dg} R(g)$.
For example, if the riskless asset is constant (because for example it is infinitely elastically
supplied), then the price-dividend ratio is always increasing and it is convex whenever,
$$\frac{d^2}{dg^2}\left[ \varphi_0(g) + (\sigma_0 - \lambda)\,v(g) \right] \ge -2.$$
The reader can now use these conditions to check predictions made by all models with stochastic
dividend growth presented before.
20 In their article, Brennan and Xia considered a slightly more general model in which consumption and dividends differ. They
obtain a reduced-form model which is identical to the one in this example. In the calibrated model, Brennan and Xia found that
the variance of the filtered ĝ is higher than the variance of the expected dividend growth in an economy with complete information.
The results on γ ∗ in this example can be obtained through an application of theorem 12.1 in Liptser and Shiryaev (2001) (Vol. II, p.
22). They generalize results in Gennotte (1986) and are a special case of results in Detemple (1986). Both Gennotte and Detemple
did not emphasize the impact of learning on the pricing function.
7.8 Appendix 1
7.8.1 Markov pricing kernels
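Let
$$\xi(\tau) \equiv \xi(D(\tau), y(\tau), \tau) = e^{-\int_0^\tau \delta(D(s), y(s))\,ds}\,\Upsilon(D(\tau), y(\tau)), \tag{7.25}$$
for some bounded positive function δ, and some positive function $\Upsilon(D, y) \in C^{2,2}(\mathbb{Z} \times \mathbb{Y})$. By the
assumed functional form for ξ, and Itô's lemma,
$$\begin{aligned} R(D, y) &= \delta(D, y) - \frac{L\Upsilon(D, y)}{\Upsilon(D, y)} \\ \lambda_1(D, y) &= -\sigma_0 D\,\frac{\partial}{\partial D} \log \Upsilon(D, y) - v_1(D, y)\,\frac{\partial}{\partial y} \log \Upsilon(D, y) \\ \lambda_2(D, y) &= -v_2(D, y)\,\frac{\partial}{\partial y} \log \Upsilon(D, y) \end{aligned}$$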
Example A1 below is an important special case of this setting. Finally, to derive Eq. (7.3) in this
setting, let us define the (undiscounted) “Arrow-Debreu adjusted” asset price process as:
By the results in Section 7.4.2, we know that the following price representation holds true:
$$S(\tau)\,\xi(\tau) = E\left[ \int_\tau^\infty \xi(s) D(s)\,ds \right], \quad \tau \ge 0.$$
Under usual regularity conditions, the previous equation can then be understood as the unique
Feynman-Kac stochastic representation of the solution to the following partial differential equation
Example A1 (Infinite horizon, complete markets economy.) Consider an infinite horizon, complete
markets economy in which total consumption Z is solution to Eq. (7.6), with v2 ≡ 0. Let a (single)
agent’s program be:
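$$\max E\left[ \int_0^\infty e^{-\delta \tau} u(c(\tau), x(\tau))\,d\tau \right] \quad \text{s.t.} \quad V_0 = E\left[ \int_0^\infty \xi(\tau) c(\tau)\,d\tau \right], \quad V_0 > 0,$$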
where δ > 0, the instantaneous utility u is continuous and thrice continuously differentiable in its
arguments, and x is solution to
[Figure: two panels plotting a path from x(t) at time t to x(T) at time T against τ, one labelled φ < 0 and the other φ > 0.]
FIGURE 6.9. Illustration of the maximum principle for ordinary differential equations
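Let
$$y(\tau) \equiv e^{-\int_t^\tau k(u)\,du}\,u(\tau) + \int_t^\tau e^{-\int_t^u k(s)\,ds}\,\zeta(u)\,du.$$
I claim that if (7.29) holds, then y is a martingale under some regularity conditions. Indeed,
$$\begin{aligned} dy(\tau) &= -k(\tau)\,e^{-\int_t^\tau k(u)\,du}\,u(\tau)\,d\tau + e^{-\int_t^\tau k(u)\,du}\,du(\tau) + e^{-\int_t^\tau k(u)\,du}\,\zeta(\tau)\,d\tau \\ &= -k(\tau)\,e^{-\int_t^\tau k(u)\,du}\,u(\tau)\,d\tau + e^{-\int_t^\tau k(u)\,du}\left[ \left( \frac{\partial}{\partial \tau} + L \right) u(\tau) \right] d\tau + e^{-\int_t^\tau k(u)\,du}\,\zeta(\tau)\,d\tau + \text{local martingale} \\ &= e^{-\int_t^\tau k(u)\,du}\left[ -k(\tau)\,u(\tau) + \left( \frac{\partial}{\partial \tau} + L \right) u(\tau) + \zeta(\tau) \right] d\tau + \text{local martingale} \\ &= \text{local martingale, because } \left( \frac{\partial}{\partial \tau} + L - k \right) u + \zeta = 0. \end{aligned}$$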
and starting from this relationship, you can adapt the previous reasoning on deterministic differential
equations to the stochastic differential case. The case with jumps is entirely analogous.
Proposition 7.A.1. (Dynamic Stochastic Dominance) Consider two economies A and B with two
fundamental volatilities $a_A$ and $a_B$, and let $\pi_i(x) \equiv a_i(x) \cdot \lambda_i(x)$ and $\rho_i(x)$ (i = A, B) be the corresponding
risk-premium and discount rate. If $a_A > a_B$, the price $c^A$ in economy A is lower than the price
$c^B$ in economy B whenever for all (x, τ) ∈ R × [0, T],
$$V(x, \tau) \equiv -[\rho_A(x) - \rho_B(x)]\,c^B(x, \tau) - [\pi_A(x) - \pi_B(x)]\,c^B_x(x, \tau) + \frac{1}{2}\left[ a_A^2(x) - a_B^2(x) \right] c^B_{xx}(x, \tau) < 0. \tag{7.30}$$
7.8.4 Proofs
Proof of proposition 7.A.1. Function $c(x, T - s) \equiv E\left[ \exp\left( -\int_s^T \rho(x(t))\,dt \right) \cdot \psi(x(T)) \,\middle|\, x(s) = x \right]$ is
solution to the following partial differential equation:
$$\begin{cases} 0 = -c_2(x, T - s) + L^* c(x, T - s) - \rho(x)\,c(x, T - s), & \forall (x, s) \in \mathbb{R} \times [0, T) \\ c(x, 0) = \psi(x), & \forall x \in \mathbb{R} \end{cases} \tag{7.31}$$
where $L^* c(x, u) = \frac{1}{2} a(x)^2 c_{xx}(x, u) + b(x)\,c_x(x, u)$ and subscripts denote partial derivatives. Clearly, $c^A$
and $c^B$ are both solutions to the partial differential equation (7.31), but with different coefficients. Let
$b_A(x) \equiv b_0(x) - \pi_A(x)$. The price difference $\Delta c(x, \tau) \equiv c^A(x, \tau) - c^B(x, \tau)$ is solution to the following
partial differential equation: ∀(x, s) ∈ R × [0, T),
$$0 = -\Delta c_2(x, T - s) + \frac{1}{2} a_A(x)^2\,\Delta c_{xx}(x, T - s) + b_A(x)\,\Delta c_x(x, T - s) - \rho_A(x)\,\Delta c(x, T - s) + V(x, T - s),$$
with ∆c(x, 0) = 0 for all x ∈ R, and V is as in Eq. (7.30) of the proposition. The result follows by the
maximum principle for partial differential equations. ∎
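Proof of proposition 7.1. By differentiating twice the partial differential equation (7.31) with
respect to x, I find that $c^{(1)}(x, \tau) \equiv c_x(x, \tau)$ and $c^{(2)}(x, \tau) \equiv c_{xx}(x, \tau)$ are solutions to the following
partial differential equations: ∀(x, s) ∈ R++ × [0, T),
$$0 = -c^{(1)}_2(x, T - s) + \frac{1}{2} a(x)^2\,c^{(1)}_{xx}(x, T - s) + \left[ b(x) + \frac{1}{2}\left( a(x)^2 \right)' \right] c^{(1)}_x(x, T - s) - \left[ \rho(x) - b'(x) \right] c^{(1)}(x, T - s) - \rho'(x)\,c(x, T - s),$$
with $c^{(1)}(x, 0) = \psi'(x)$ ∀x ∈ R; and
$$0 = -c^{(2)}_2(x, T - s) + \frac{1}{2} a(x)^2\,c^{(2)}_{xx}(x, T - s) + \left[ b(x) + \left( a(x)^2 \right)' \right] c^{(2)}_x(x, T - s) - \left[ \rho(x) - 2 b'(x) - \frac{1}{2}\left( a(x)^2 \right)'' \right] c^{(2)}(x, T - s) - \left[ 2\rho'(x) - b''(x) \right] c^{(1)}(x, T - s) - \rho''(x)\,c(x, T - s),$$
with $c^{(2)}(x, 0) = \psi''(x)$ ∀x ∈ R. By the maximum principle for partial differential equations, $c^{(1)}(x, T - s) > 0$
(resp. < 0) ∀(x, s) ∈ R × [0, T) whenever ψ'(x) > 0 (resp. < 0) and ρ'(x) < 0 (resp. > 0) ∀x ∈ R.
This completes the proof of part a) of the proposition. The proof of part b) is obtained similarly. ∎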
dD = gdτ + dW,
$$\pi(D) = \frac{p\,\phi(D - A)}{p\,\phi(D - A) + (1 - p)\,\phi(D + A)} = \frac{p}{p + (1 - p)\,e^{-2AD}},$$
where the second equality follows by the Gaussian distribution assumption $\phi(x) \propto e^{-\frac{1}{2} x^2}$, and
straightforward simplifications. By simple computations,
$$d\pi = 2A\,\pi(1 - \pi)\,dW. \;\; ∎$$
As pointed out in Section 7.6, a restricted version of Proposition 7.1-b) implies that in all scalar
(diffusion) models of the short-term rate, u11 (r0 , T ) < 0 whenever b′′ < 2, where b is the risk-neutralized
drift of r. This specific result was originally obtained in Mele (2003). Both the theory in Mele (2003)
and the proof of Proposition 7.1-b) rely on the Feynman-Kac representation of u11 . Here we provide
a more intuitive derivation under a set of simplifying assumptions.
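By Mele (2003) (Eq. (6) p. 685),
\[
u_{11}(r_0, T) = E\left\{\left[\left(\int_0^T \frac{\partial r}{\partial r_0}(\tau)\,d\tau\right)^2 - \int_0^T \frac{\partial^2 r}{\partial r_0^2}(\tau)\,d\tau\right]\exp\left(-\int_0^T r(\tau)\,d\tau\right)\right\}.
\]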
7.9 Appendix 2
In their original article, Campbell and Cochrane considered a discrete-time model in which consumption
is a Gaussian process. The diffusion limit of their model is simply Eq. (8.17) given in the main
text. By example A1 (Eq. (7.27)),
\[
\lambda(D, x) = \frac{\eta}{s}\left[\sigma_0 - \frac{1}{D}\gamma(D, x)\right]. \tag{7.34}
\]
To find the diffusion function γ of x, notice that $x = D(1-s)$, where s is the solution to Eq. (7.12). By
Itô's lemma, then, $\gamma = [1 - s - s\,l(s)]\,D\sigma_0$. We then replace this function into (7.34), and obtain
$\lambda(s) = \eta\sigma_0[1 + l(s)]$, as we claimed in the main text. (This result holds approximately in the original
discrete time framework.) Finally, the real interest rate is found by an application of formula (7.26),
\[
R(s) = \delta + \eta\left(g_0 - \frac{1}{2}\sigma_0^2\right) + \eta(1-\phi)(\bar{s} - \log s) - \frac{1}{2}\eta^2\sigma_0^2\,[1 + l(s)]^2.
\]
Campbell and Cochrane choose l so as to make the real interest rate constant. They took
$l(s) = \bar{S}^{-1}\sqrt{1 + 2(\bar{s} - \log s)} - 1$, where $\bar{S} = \sigma_0\sqrt{\eta/(1-\phi)} = \exp(\bar{s})$. With this choice,
$\frac{1}{2}\eta^2\sigma_0^2[1 + l(s)]^2 = \frac{1}{2}\eta(1-\phi) + \eta(1-\phi)(\bar{s} - \log s)$, so the terms in $\bar{s} - \log s$ cancel, which leaves
$R = \delta + \eta\left(g_0 - \frac{1}{2}\sigma_0^2\right) - \frac{1}{2}\eta(1-\phi)$.
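The cancellation behind the constant real rate can be verified numerically. The sketch below is illustrative only: the function names are hypothetical and the parameter values are assumptions picked for the check, not a calibration taken from the text. It evaluates R(s) with the Campbell and Cochrane sensitivity function at several values of s and compares it with the constant δ + η(g0 − σ0²/2) − η(1 − φ)/2.

```python
import math


def steady_state_surplus(eta, sigma0, phi):
    """S_bar = sigma0 * sqrt(eta / (1 - phi)), so that s_bar = log(S_bar)."""
    return sigma0 * math.sqrt(eta / (1.0 - phi))


def sensitivity(s, eta, sigma0, phi):
    """l(s) = sqrt(1 + 2 (s_bar - log s)) / S_bar - 1."""
    S_bar = steady_state_surplus(eta, sigma0, phi)
    s_bar = math.log(S_bar)
    return math.sqrt(1.0 + 2.0 * (s_bar - math.log(s))) / S_bar - 1.0


def real_rate(s, delta, eta, g0, sigma0, phi):
    """R(s) from the formula in the text, with l(s) as above."""
    s_bar = math.log(steady_state_surplus(eta, sigma0, phi))
    l = sensitivity(s, eta, sigma0, phi)
    return (delta
            + eta * (g0 - 0.5 * sigma0 ** 2)
            + eta * (1.0 - phi) * (s_bar - math.log(s))
            - 0.5 * eta ** 2 * sigma0 ** 2 * (1.0 + l) ** 2)


def constant_rate(delta, eta, g0, sigma0, phi):
    """The constant that R(s) should reduce to."""
    return delta + eta * (g0 - 0.5 * sigma0 ** 2) - 0.5 * eta * (1.0 - phi)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    params = dict(delta=0.12, eta=2.0, g0=0.0189, sigma0=0.015, phi=0.87)
    for s in (0.01, 0.03, 0.06, 0.09):
        print(s, real_rate(s, **params), constant_rate(**params))
```

The check only makes sense on the range of s for which 1 + 2(s̄ − log s) is non-negative, which holds for the values used here.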
A numerical solution can be implemented as follows. Create a grid and define $p_j = p(s_j)$, $j = 1, \cdots, N$,
for some N. We have,
\[
\begin{bmatrix} p_1 \\ \vdots \\ p_N \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix}
+
\begin{bmatrix} a_{11} & \cdots & a_{N1} \\ \vdots & \ddots & \vdots \\ a_{1N} & \cdots & a_{NN} \end{bmatrix}
\begin{bmatrix} p_1 \\ \vdots \\ p_N \end{bmatrix},
\]
\[
b_i = \sum_{j=1}^{N} a_{ji}, \quad a_{ji} = g_{ji}\cdot p_{ji}, \quad g_{ji} = g(s_j, s_i), \quad p_{ji} = \Pr(s_j \mid s_i)\cdot\Delta s,
\]
where $\Delta s$ is the integration step; $s_1 = s_{\min}$, $s_N = s_{\max}$; $s_{\min}$ and $s_{\max}$ are the boundaries in the
approximation; and $\Pr(s_j \mid s_i)$ is the transition density from state i to state j (in this case, a Gaussian
transition density). Let $p = [p_1 \cdots p_N]^\top$, $b = [b_1 \cdots b_N]^\top$, and let A be the matrix with elements $a_{ji}$ displayed above.
The solution is,
\[
p = (I - A)^{-1} b. \tag{7.35}
\]
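The linear system in Eq. (7.35) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the author's implementation: the AR(1) Gaussian transition density, the function g and all parameter values are hypothetical placeholders, since the text does not specify them here. The system is solved with a plain Gaussian elimination so that the block is self-contained.

```python
import math


def build_system(grid, g, density):
    """Row i of A collects a_ji = g(s_j, s_i) * Pr(s_j | s_i) * ds; b_i is its sum."""
    ds = grid[1] - grid[0]
    A = [[g(s_j, s_i) * density(s_j, s_i) * ds for s_j in grid] for s_i in grid]
    b = [sum(row) for row in A]
    return A, b


def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def price_dividend(grid, g, density):
    """Return p = (I - A)^{-1} b together with A and b."""
    A, b = build_system(grid, g, density)
    n = len(grid)
    i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
                 for i in range(n)]
    return solve_linear(i_minus_a, b), A, b


def ar1_density(s_to, s_from, persistence=0.9, vol=0.05):
    """Hypothetical Gaussian transition density of an AR(1) state with mean zero."""
    z = (s_to - persistence * s_from) / vol
    return math.exp(-0.5 * z * z) / (vol * math.sqrt(2.0 * math.pi))


def growth_discount(s_to, s_from, beta=0.95, loading=0.5):
    """Hypothetical g(s_j, s_i): discounted dividend growth between two states."""
    return beta * math.exp(loading * (s_to - s_from))


N = 81
grid = [-0.5 + i * (1.0 / (N - 1)) for i in range(N)]
p, A, b = price_dividend(grid, growth_discount, ar1_density)
```

With these placeholder inputs the row sums of A stay below one, so I − A is invertible. A library routine such as NumPy's `numpy.linalg.solve` could replace `solve_linear` on larger grids.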
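The model can be simulated in the following manner. Let $\underline{s}$ and $\bar{s}$ be the boundaries of the underlying
state process. Fix $\Delta s = (\bar{s} - \underline{s})/N$. Draw states. Suppose state $s^*$ is drawn. Then,
1. If $\min(s^* - \underline{s}, \bar{s} - s^*) = s^* - \underline{s}$, let k be the smallest integer close to $(s^* - \underline{s})/\Delta s$. Let $s_{\min} = s^* - k\Delta s$,
and $s_{\max} = s_{\min} + N\cdot\Delta s$.
2. If $\min(s^* - \underline{s}, \bar{s} - s^*) = \bar{s} - s^*$, let k be the biggest integer close to $(s^* - \underline{s})/\Delta s$. Let $s_{\max} = s^* + k\Delta s$,
and $s_{\min} = s_{\max} - N\cdot\Delta s$.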
The previous algorithm avoids interpolations. Importantly, it ensures that during the simulations,
p is computed in correspondence of exactly the state s∗ that is drawn. Precisely, once s∗ is drawn,
1 ) create the corresponding grid s1 = smin , s2 = smin + ∆s, · · ·, sN = smax according to the previous
rules; 2 ) compute the solution from Eq. (7.35). In this way, one has p (s∗ ) at hand - the simulated
P/D ratio when state s∗ is drawn.
7.10. Appendix 3: Simulation of discrete-time pricing models
References
Abel, A. B. (1988): “Stock Prices under Time-Varying Dividend Risk: An Exact Solution
in an Infinite-Horizon General Equilibrium Model.” Journal of Monetary Economics 22,
375-393.
Bajeux-Besnainou, I. and J.-C. Rochet (1996): “Dynamic Spanning: Are Options an Appro-
priate Instrument?” Mathematical Finance 6, 1-16.
Bansal, R. and A. Yaron (2004): “Risks for the Long Run: A Potential Resolution of Asset
Pricing Puzzles.” Journal of Finance 59, 1481-1509.
Barberis, N., M. Huang and T. Santos (2001): “Prospect Theory and Asset Prices.” Quarterly
Journal of Economics 116, 1-53.
Barsky, R. B. (1989): “Why Don’t the Prices of Stocks and Bonds Move Together?” American
Economic Review 79, 1132-1145.
Barsky, R. B. and J. B. De Long (1990): “Bull and Bear Markets in the Twentieth Century.”
Journal of Economic History 50, 265-281.
Barsky, R. B. and J. B. De Long (1993): “Why Does the Stock Market Fluctuate?” Quarterly
Journal of Economics 108, 291-311.
Bergman, Y. Z., B. D. Grundy, and Z. Wiener (1996): “General Properties of Option Prices.”
Journal of Finance 51, 1573-1610.
Black, F. and M. Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal
of Political Economy 81, 637-659.
Brennan, M. J. and Y. Xia (2001): “Stock Price Volatility and Equity Premium.” Journal of
Monetary Economics 47, 249-283.
David, A. (1997): “Fluctuating Confidence in Stock Markets: Implications for Returns and
Volatility.” Journal of Financial and Quantitative Analysis 32, 427-462.
El Karoui, N., M. Jeanblanc-Picqué and S. E. Shreve (1998): “Robustness of the Black and
Scholes Formula.” Mathematical Finance 8, 93-126.
Fama, E. F. and K. R. French (1989): “Business Conditions and Expected Returns on Stocks
and Bonds.” Journal of Financial Economics 25, 23-49.
Huang, C.-F. and Pagès, H. (1992): “Optimal Consumption and Portfolio Policies with an
Infinite Horizon: Existence and Convergence.” Annals of Applied Probability 2, 36-64.
Jagannathan, R. (1984): “Call Options and the Risk of Underlying Securities.” Journal of
Financial Economics 13, 425-434.
Malkiel, B. (1979): “The Capital Formation Problem in the United States.” Journal of Finance
34, 291-306.
Mele, A. (2003): “Fundamental Properties of Bond Prices in Models of the Short-Term Rate.”
Review of Financial Studies 16, 679-716.
Pindyck, R. (1984): “Risk, Inflation and the Stock Market.” American Economic Review 74,
335-351.
Poterba, J. and L. Summers (1985): “The Persistence of Volatility and Stock Market Fluctu-
ations.” American Economic Review 75, 1142-1151.
Romano, M. and N. Touzi (1997): “Contingent Claims and Market Completeness in a Stochas-
tic Volatility Model.” Mathematical Finance 7, 399-412.
Veronesi, P. (1999): “Stock Market Overreaction to Bad News in Good Times: A Rational
Expectations Equilibrium Model.” Review of Financial Studies 12, 975-1007.
Veronesi, P. (2000): “How Does Information Quality Affect Stock Returns?” Journal of Finance
55, 807-837.
Wang, S. (1993): “The Integrability Problem of Asset Prices.” Journal of Economic Theory
59, 199-213.
8
Tackling the puzzles
vt = W (ct , v̂t+1 ) ,
where W is the “aggregator function” and v̂t+1 is the certainty-equivalent utility at t+1 defined
as,
h (v̂t+1 ) = E [h (vt+1 )] ,
where h is a von Neumann - Morgenstern utility function. That is, the certainty equivalent
depends on some agent’s risk-attitudes encoded into h. Therefore,
\[
v_t = W\left(c_t, h^{-1}\left[E\left(h(v_{t+1})\right)\right]\right).
\]
for three positive constants ρ, η and δ. In this formulation, risk-attitudes for static wealth
gambles have still the classical CRRA flavor. More precisely, we say that η is the RRA for
static wealth gambles and ψ ≡ (1 − ρ)−1 is the IES.
8.1. Non-expected utility
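We have,
\[
\hat{v}_{t+1} = h^{-1}\left[E\left(h(v_{t+1})\right)\right] = h^{-1}\left[E\left(v_{t+1}^{1-\eta}\right)\right] = \left[E\left(v_{t+1}^{1-\eta}\right)\right]^{\frac{1}{1-\eta}}.
\]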
The previous parametrization of the aggregator function then implies that,
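\[
v_t = \left[c_t^{\rho} + e^{-\delta}\left(E\left(v_{t+1}^{1-\eta}\right)\right)^{\frac{\rho}{1-\eta}}\right]^{1/\rho}. \tag{8.1}
\]
This collapses to the standard intertemporal additively separable case when $\rho = 1 - \eta \Leftrightarrow$
RRA = IES$^{-1}$. Indeed, it is straightforward to show that in this case,
\[
v_t = \left[E\left(\sum_{n=0}^{\infty} e^{-\delta n} c_{t+n}^{1-\eta}\right)\right]^{\frac{1}{1-\eta}}.
\]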
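The collapse of the recursion (8.1) to the additive case can be illustrated on a small example. The sketch below makes two assumptions that are not in the text: the horizon is finite, with terminal utility equal to terminal consumption, and consumption follows a hypothetical binomial tree with equally likely up and down moves. Under ρ = 1 − η, the recursive value and the additive value coincide; for other values of ρ they generally differ.

```python
import math


def epstein_zin_value(c, periods_left, rho, eta, delta, up=1.05, down=0.98):
    """Recursive utility (8.1) on a binomial consumption tree.

    Assumption: finite horizon, with terminal utility equal to terminal consumption.
    """
    if periods_left == 0:
        return c
    v_up = epstein_zin_value(c * up, periods_left - 1, rho, eta, delta, up, down)
    v_down = epstein_zin_value(c * down, periods_left - 1, rho, eta, delta, up, down)
    # E(v_{t+1}^{1 - eta}) with probability 1/2 on each branch.
    expected_power = 0.5 * v_up ** (1.0 - eta) + 0.5 * v_down ** (1.0 - eta)
    inner = c ** rho + math.exp(-delta) * expected_power ** (rho / (1.0 - eta))
    return inner ** (1.0 / rho)


def additive_value(c, periods_left, eta, delta, up=1.05, down=0.98):
    """[E sum_n exp(-delta n) c_{t+n}^{1-eta}]^{1/(1-eta)} on the same tree."""
    def expected_sum(c_now, left):
        if left == 0:
            return c_now ** (1.0 - eta)
        cont = 0.5 * expected_sum(c_now * up, left - 1) \
            + 0.5 * expected_sum(c_now * down, left - 1)
        return c_now ** (1.0 - eta) + math.exp(-delta) * cont
    return expected_sum(c, periods_left) ** (1.0 / (1.0 - eta))


if __name__ == "__main__":
    eta, delta, horizon = 0.5, 0.03, 8  # hypothetical values
    print(epstein_zin_value(1.0, horizon, 1.0 - eta, eta, delta),
          additive_value(1.0, horizon, eta, delta))
```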
Eq. (8.6) obviously holds for the market portfolio and the risk-free asset. Therefore, taking
logs in Eq. (8.6) for i = M, and for the risk-free asset, i = 0 (say), yields the following two
conditions:
\[
0 = \log E\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\log\left(\frac{c'}{c}\right) + \theta R_M\right)\right], \quad R_M = \log(1 + r_M), \tag{8.7}
\]
and,
\[
-R_f = -\log(1 + r_0) = \log E\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\log\left(\frac{c'}{c}\right) + (\theta - 1)R_M\right)\right]. \tag{8.8}
\]
Next, suppose that consumption growth, $\log(c'/c)$, and the market portfolio return, $R_M$, are
jointly normally distributed. In the appendix, we show that the expected excess return on the
market portfolio is given by,
\[
E(R_M) - R_f + \frac{1}{2}\sigma^2_{R_M} = \frac{\theta}{\psi}\sigma_{R_M,c} + (1-\theta)\,\sigma^2_{R_M}, \tag{8.9}
\]
where $\sigma^2_{R_M} = var(R_M)$ and $\sigma_{R_M,c} = cov(R_M, \log(c'/c))$, and the term $\frac{1}{2}\sigma^2_{R_M}$ in the left hand
side is a Jensen's inequality term. Note, Eq. (8.9) is a mixture of the Consumption CAPM (for
the part $\frac{\theta}{\psi}\sigma_{R_M,c}$) and the CAPM (for the part $(1-\theta)\,\sigma^2_{R_M}$).
The risk-free rate is given by,
\[
R_f = \delta + \frac{1}{\psi}E\left[\log\left(\frac{c'}{c}\right)\right] - \frac{1}{2}(1-\theta)\,\sigma^2_{R_M} - \frac{1}{2}\frac{\theta}{\psi^2}\sigma_c^2, \tag{8.10}
\]
\[
\sigma^2_{R_M} = \sigma_c^2 + \sigma_*^2; \quad \sigma_{R_M,c} = \sigma_c^2, \tag{8.11}
\]
where σ 2∗ is a positive constant that may arise when the asset return is driven by some additional
state variable. (This is the case, for example, in the Bansal and Yaron (2004) model described
below.) Under the assumption that the asset return volatility is as in Eq. (8.11), the equity
premium in Eq. (8.9) is,
\[
E(R_M) - R_f + \frac{1}{2}\sigma^2_{R_M} = \eta\sigma_c^2 + (1-\theta)\,\sigma_*^2 = \eta\sigma_c^2 + \frac{\eta - \frac{1}{\psi}}{1 - \frac{1}{\psi}}\sigma_*^2.
\]
As we can see, disentangling risk-aversion from intertemporal substitution does not suffice per
se to resolve the equity premium puzzle. To increase the equity premium, it is important that
$\sigma_*^2 > 0$, i.e. that some additional state variable drives the variation of the asset return.
On the other hand, in this framework, the volatility of these state variables can only affect the
asset return if risk-aversion is distinct from the inverse of the elasticity of intertemporal substitution.
In particular, suppose that σ 2∗ does not depend on η and ψ. If ψ > 1, then, the equity premium
increases with σ 2∗ whenever η > ψ−1 .
Next, let us derive the risk-free rate. Assume that $E[\log(c'/c)] = g_0 - \frac{1}{2}\sigma_c^2$, where $g_0$ is the
expected consumption growth, a constant. Furthermore, use the assumptions in Eq. (8.11) to
obtain that the risk-free rate in Eq. (8.10) is,
\[
R_f = \delta + \frac{1}{\psi}g_0 - \frac{1}{2}\eta\left(1 + \frac{1}{\psi}\right)\sigma_c^2 - \frac{1}{2}\frac{\eta - \frac{1}{\psi}}{1 - \frac{1}{\psi}}\sigma_*^2.
\]
As we can see, we may increase the level of relative risk-aversion, η, without substantially
affecting the level of the risk-free rate, Rf . This is because the effects of η on Rf are of a
second-order importance (they multiply variances, which are orders of magnitude less than the
expected consumption growth, g0 ).
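The closed-form expressions for the equity premium and the risk-free rate can be cross-checked against Eqs. (8.9) and (8.10). The sketch below is illustrative and relies on one assumption that is not stated in this excerpt: θ is taken to be (1 − η)/(1 − 1/ψ), which is the definition implied by the identity 1 − θ = (η − 1/ψ)/(1 − 1/ψ) used above. All function names and parameter values are hypothetical.

```python
def theta(eta, psi):
    """Assumed definition: theta = (1 - eta) / (1 - 1/psi), with psi != 1."""
    return (1.0 - eta) / (1.0 - 1.0 / psi)


def equity_premium(eta, psi, var_c, var_star):
    """Right hand side of Eq. (8.9) under the assumptions in Eq. (8.11)."""
    th = theta(eta, psi)
    var_rm = var_c + var_star   # sigma^2_RM = sigma^2_c + sigma^2_*
    cov_rm_c = var_c            # sigma_RM,c = sigma^2_c
    return th / psi * cov_rm_c + (1.0 - th) * var_rm


def equity_premium_closed_form(eta, psi, var_c, var_star):
    """eta * sigma^2_c + (eta - 1/psi) / (1 - 1/psi) * sigma^2_*."""
    return eta * var_c + (eta - 1.0 / psi) / (1.0 - 1.0 / psi) * var_star


def risk_free_rate(delta, g0, eta, psi, var_c, var_star):
    """Eq. (8.10) with E[log(c'/c)] = g0 - sigma^2_c / 2 and Eq. (8.11)."""
    th = theta(eta, psi)
    mean_log_growth = g0 - 0.5 * var_c
    var_rm = var_c + var_star
    return (delta + mean_log_growth / psi
            - 0.5 * (1.0 - th) * var_rm
            - 0.5 * th / psi ** 2 * var_c)


def risk_free_rate_closed_form(delta, g0, eta, psi, var_c, var_star):
    """Closed form reported in the text."""
    return (delta + g0 / psi
            - 0.5 * eta * (1.0 + 1.0 / psi) * var_c
            - 0.5 * (eta - 1.0 / psi) / (1.0 - 1.0 / psi) * var_star)


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    args = dict(eta=10.0, psi=1.5, var_c=0.02 ** 2, var_star=0.15 ** 2)
    print(equity_premium(**args), equity_premium_closed_form(**args))
    print(risk_free_rate(0.02, 0.018, **args),
          risk_free_rate_closed_form(0.02, 0.018, **args))
```

Setting `var_star` to zero in this sketch shows the point made in the text: the premium then reduces to η σ_c², whatever the value of ψ.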
Let us conjecture that the log of the price-dividend ratio takes the simple form, $z_t = a_0 + a_1 x_t$,
where $a_0$ and $a_1$ are two coefficients to be determined. Substituting this guess into Eq. (8.15),
and identifying terms, one finds that there exists a constant $a_0$ such that,
\[
z_t = a_0 + \frac{1 - \frac{1}{\psi}}{1 - \kappa_1\rho}\,x_t. \tag{8.16}
\]
The appendix provides details on these calculations.
Next, use RM,t+1 ≈ κ0 + κ1 zt+1 − zt + gt+1 (or alternatively, the stochastic discount factor)
to compute σ 2∗ , volatility, risk-premium, etc. [In progress]
Therefore, x (τ ) satisfies,
where f (η)−1 is the marginal utility of income of the agent η. (See Chapter 2 in Part I, for the
theoretical foundations of this program.)
8.3. Incomplete markets
In the appendix, we show that the solution to the static program [P1] leads to the following
expression for the utility function u (D, x),
\[
u(D, x) = \int_1^{\infty} \frac{1}{1-\eta}\,f(\eta)^{\frac{1}{\eta}}\,V(s)^{\frac{\eta-1}{\eta}}\,d\eta,
\]
where V is a Lagrange multiplier, which satisfies,
\[
e^{s} = \int_1^{\infty} f(\eta)^{\frac{1}{\eta}}\,V(s)^{-\frac{1}{\eta}}\,d\eta.
\]
The appendix also shows that the unit risk-premium predicted by this model is,
\[
\lambda(s) = \sigma_0\,\frac{\exp(s)}{\int_1^{\infty} \frac{1}{\eta}\,f(\eta)^{\frac{1}{\eta}}\,V(s)^{-\frac{1}{\eta}}\,d\eta}. \tag{8.19}
\]
This economy collapses to an otherwise identical homogeneous economy if the social weighting
function f (η) = δ (η − η 0 ), the Dirac’s mass at η0 . In this case, λ (s) = σ 0 η0 , a constant.
A crucial assumption in this model is that the standard of living X is a process with bounded
variation (see Eq. (8.18)). By this assumption, the standard of living of others is not a risk which
agents require to be compensated for. The unit risk-premium in Eq. (8.19) is driven by s through
nonlinearities induced by agents' heterogeneity. By calibrating their model to US data, Chan
and Kogan find that the risk-premium, λ (s), is decreasing and convex in s.1 The mechanism at
the heart of this result is an endogenous wealth redistribution in the economy. Clearly, the less
risk-averse individuals put a higher proportion of their wealth in the risky assets, compared to
the more risk-averse agents. In the poor states of the world, stock prices decrease, the wealth
of the less risk-averse lowers more than that of the more risk-averse agents, which reduces the
fraction of wealth held by the less risk-averse individuals in the whole economy. Thus, in bad
times, the contribution of these less risk-averse individuals to aggregate risk-aversion decreases
and, hence, the aggregate risk-aversion increases in the economy.
1 Their numerical results also revealed that in their model, the log of the price-dividend ratio is increasing and concave in s.
Finally, their lemma 5 (p. 1281) establishes that in a homogeneous economy, the price-dividend ratio is increasing and convex in s.
8.4. Limited stock market participation
where $w_p$, $w_n$ are two constants, and ξ is the usual pricing kernel process, solution to,
\[
\frac{d\xi(\tau)}{\xi(\tau)} = -R(\tau)\,d\tau - \lambda(\tau)\cdot dW(\tau). \tag{8.21}
\]
Let
\[
u(D, x) \equiv \max_{c_p + c_n = D}\left[u_p(c_p) + x\cdot u_n(c_n)\right],
\]
where
\[
x \equiv \frac{u_p'(\hat{c}_p)}{u_n'(\hat{c}_n)} = u_p'(\hat{c}_p)\,\hat{c}_n \tag{8.22}
\]
is a stochastic social weight. By the definition of ξ, x (τ) is solution to,
\[
dx(\tau) = -x(\tau)\,\lambda(\tau)\,dW(\tau). \tag{8.23}
\]
Then, the equilibrium price system in this economy is supported by a fictitious representative
agent with utility u (D, x). Intuitively, the representative agent “allocations” satisfy, by
construction,
\[
\frac{u_p'(c_p^*(\tau))}{u_n'(c_n^*(\tau))} = \frac{u_p'(\hat{c}_p(\tau))}{u_n'(\hat{c}_n(\tau))} = x(\tau),
\]
where starred allocations are the representative agent’s “allocations”. In other words, the trick
underlying this approach is to find a stochastic social weight process x (τ ) such that the first
order conditions of the representative agent lead to the market allocations. This is shown more
rigorously in the Appendix.
Guvenen (2005) makes an interesting extension of the Basak and Cuoco model. He considers
two agents, of which only the “rich” invests in the stock market, and who are such that IESrich >
IESpoor . He shows that for the rich, a low IES is needed to match the equity premium. However,
US data show that the rich have a high IES, which cannot deliver the equity premium. (Guvenen
considers an extension of the model in which we can disentangle IES and CRRA for the rich.)
8.5. Appendix on non-expected utility
Optimality. Consider Eq. (8.4). The first order condition for c yields,
\[
W_1\left(c, E\left(V(x', y')\right)\right) = W_2\left(c, E\left(V(x', y')\right)\right)\cdot E\left[V_1(x', y')\left(1 + r_M(y')\right)\right], \tag{A1}
\]
where subscripts denote partial derivatives. Thus, optimal consumption is some function c (x, y). Hence,
\[
x' = \left(x - c(x, y)\right)\left(1 + r_M(y')\right).
\]
We have,
\[
V(x, y) = W\left(c(x, y), E\left(V(x', y')\right)\right).
\]
By differentiating the value function with respect to x,
\[
V_1(x, y) = W_1\left(c(x, y), E\left(V(x', y')\right)\right)c_1(x, y)
+ W_2\left(c(x, y), E\left(V(x', y')\right)\right)E\left[V_1(x', y')\left(1 + r_M(y')\right)\right]\left(1 - c_1(x, y)\right),
\]
where subscripts denote partial derivatives. By replacing Eq. (A1) into the previous equation we get
the Envelope Equation for this dynamic programming problem,
\[
V_1(x, y) = W_1\left(c(x, y), E\left(V(x', y')\right)\right). \tag{A2}
\]
Below, we show that by a similar argument the same Euler equation applies to any asset i,
\[
E\left[\frac{W_2(c(x, y), \nu(x, y))}{W_1(c(x, y), \nu(x, y))}\,W_1\left(c(x', y'), \nu(x', y')\right)\left(1 + r_i(y')\right)\right] = 1, \quad i = 1, \cdots, m. \tag{A3}
\]
Optimal consumption is c (x, y). Let $\nu(x, y) \equiv E(V(x', y'))$, as in the main text. By replacing Eq.
(A2) into the previous equation,
\[
E\left[\frac{W_2(c(x, y), \nu(x, y))}{W_1(c(x, y), \nu(x, y))}\,W_1\left(c(x', y'), \nu(x', y')\right)\frac{p_i' + D_i'}{p_i}\right] = 1, \quad i = 1, \cdots, m. \quad ∎
\]
Derivation of Eq. (8.5). We need to compute explicitly the stochastic discount factor in Eq.
(A3),
\[
m(x, y; x', y') = \frac{W_2(c(x, y), \nu(x, y))}{W_1(c(x, y), \nu(x, y))}\,W_1\left(c(x', y'), \nu(x', y')\right).
\]
We have,
1 h ρ ρ i ρ
1−η
and,
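\[
W_1(c', \nu') = \left[c'^{\rho} + e^{-\delta}\left((1-\eta)\,\nu'\right)^{\frac{\rho}{1-\eta}}\right]^{\frac{1-\eta}{\rho} - 1} c'^{\rho-1}
= W(c', \nu')^{\frac{1-\eta-\rho}{1-\eta}}\,(1-\eta)^{\frac{1-\eta-\rho}{1-\eta}}\,c'^{\rho-1}. \tag{A4}
\]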
\[
V_1(x, y) = W_1\left(c(x, y), E\left(V(x', y')\right)\right)
= W(c, \nu)^{\frac{1-\eta-\rho}{1-\eta}}\,(1-\eta)^{\frac{1-\eta-\rho}{1-\eta}}\,c^{\rho-1}
= V(x, y)^{\frac{1-\eta-\rho}{1-\eta}}\,(1-\eta)^{\frac{1-\eta-\rho}{1-\eta}}\,c^{\rho-1},
\]
where the first equality follows by Eq. (A2), the second equality follows by Eq. (A4), and the last
equality follows by optimality. By making use of the conjecture on V , and rearranging terms,
\[
c(x, y) = a(y)\,x, \quad a(y) \equiv b(y)^{\frac{\rho}{(1-\eta)(\rho-1)}}. \tag{A6}
\]
Hence, $V(x', y') = b(y')\,x'^{1-\eta}/(1-\eta)$, where
\[
x' = \left(1 - a(y)\right)x\left(1 + r_M(y')\right), \tag{A7}
\]
and
\[
\frac{E\left(V(x', y')\right)}{V(x', y')} = \frac{E\left[\psi(y')\left(1 + r_M(y')\right)^{1-\eta}\right]}{\psi(y')\left(1 + r_M(y')\right)^{1-\eta}}. \tag{A8}
\]
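Along any optimal path, $V(x, y) = W(c(x, y), E(V(x', y')))$. By plugging in W (from Eq. (8.4)) and
the conjecture for V ,
\[
E\left[\psi(y')\left(1 + r_M(y')\right)^{1-\eta}\right] = \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left(\frac{a(y)}{1 - a(y)}\right)^{\frac{(1-\eta)(\rho-1)}{\rho}}. \tag{A9}
\]
Moreover,
\[
\psi(y')\left(1 + r_M(y')\right)^{1-\eta} = \left[a(y')\left(1 + r_M(y')\right)^{\frac{\rho}{\rho-1}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}}. \tag{A10}
\]
By plugging Eqs. (A9)-(A10) into Eq. (A8),
\begin{align*}
\frac{E\left(V(x', y')\right)}{V(x', y')}
&= \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left[\frac{a(y)}{\left(1 - a(y)\right)a(y')\left(1 + r_M(y')\right)^{\frac{\rho}{\rho-1}}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}} \\
&= \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left[\left(\frac{c'}{c}\right)^{-1}\frac{x'}{\left(1 - a(y)\right)x\left(1 + r_M(y')\right)^{\frac{\rho}{\rho-1}}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}} \\
&= \left(e^{-\delta}\right)^{-\frac{1-\eta}{\rho}}\left[\left(\frac{c'}{c}\right)^{-1}\frac{1}{\left(1 + r_M(y')\right)^{\frac{1}{\rho-1}}}\right]^{\frac{(1-\eta)(\rho-1)}{\rho}},
\end{align*}
where the second line follows by Eq. (A6), and the third line follows by Eq. (A7). The result
follows by replacing this into Eq. (A5). ∎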
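Proof of Eqs. (8.9) and (8.10). By using the standard property that $\log E(e^{\tilde{y}}) = E(\tilde{y}) + \frac{1}{2}var(\tilde{y})$,
for $\tilde{y}$ normally distributed, in Eq. (8.7), we obtain,
\begin{align*}
0 &= \log E\left[\exp\left(-\delta\theta - \frac{\theta}{\psi}\log\left(\frac{c'}{c}\right) + \theta R_M\right)\right] \\
&= -\delta\theta - \frac{\theta}{\psi}E\left[\log\left(\frac{c'}{c}\right)\right] + \theta E(R_M) + \frac{1}{2}\left[\left(\frac{\theta}{\psi}\right)^2\sigma_c^2 + \theta^2\sigma^2_{R_M} - 2\frac{\theta^2}{\psi}\sigma_{R_M,c}\right]. \tag{A11}
\end{align*}
By replacing Eq. (A12) into Eq. (A11), we obtain Eq. (8.9) in the main text.
To obtain the risk-free rate Rf in Eq. (8.10), we replace the expression for E(RM ) in Eq. (8.9) into
Eq. (A12). ∎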
where the second equality follows by Eqs. (8.13) and (8.14). Note, then, that this equality can only
hold if the two constants, const1 and const2 are both zero. Imposing const2 = 0 yields,
\[
a_1 = \frac{1 - \frac{1}{\psi}}{1 - \kappa_1\rho},
\]
as in Eq. (8.16) in the main text. Imposing const1 = 0, and using the solution for a1 , yields the solution
for the constant a0 . ¥
Here (f, A) is the aggregator. A is a variance multiplier: it places a penalty proportional to utility
volatility $\|\sigma_{vt}\|^2$. (f, A) somehow corresponds to (W, v̂) in the discrete time case.
The solution to the previous “stochastic differential utility” is,
\[
v_t = E\left\{\int_t^T\left[f(c_s, v_s) + \frac{1}{2}A(v_s)\,\|\sigma_{vs}\|^2\right]ds\right\}.
\]
8.6. Appendix on economies with heterogenous agents
where D is the aggregate endowment in the economy. Then, the equilibrium price system can be
computed as the Arrow-Debreu state price density in an economy with a single agent endowed with
the aggregate endowment D, instantaneous utility function u (c, x), and where for a ∈ A, the social
weighting function f (a) equals the reciprocal of the marginal utility of income of the agent a.
The practical merit of this approach is that while the marginal utility of income is unobservable, the
thusly constructed Arrow-Debreu state price density depends on the “infinite dimensional parameter”,
f , which can be calibrated to reproduce the main quantitative features of consumption and asset price
data.
We now apply this approach to indicate how to derive the Chan and Kogan (2002) equilibrium
conditions.
“Catching up with the Joneses” (Chan and Kogan (2002)). In this model, markets are
complete, and we have that A = [1, ∞] and $u_\eta(c_\eta, x) = (c_\eta/x)^{1-\eta}/(1-\eta)$. The static optimization
problem for the social planner in [Soc-Pl] can be written as,
\[
u(D, x) = \max_{c_\eta}\int_1^{\infty} f(\eta)\,\frac{(c_\eta/x)^{1-\eta}}{1-\eta}\,d\eta, \quad \text{s.t.} \quad \int_1^{\infty}(c_\eta/x)\,d\eta = D/x. \tag{A13}
\]
which is obtained by replacing Eq. (A14) into the budget constraint of the social planner.
The general equilibrium allocations and prices can be obtained by setting f (η) equal to the marginal
utility of income for agent η. Then, the expression for the unit risk-premium in Eq. (8.19) follows by,
\[
\lambda(s) = -\left(\frac{\partial^2 u(D, x)}{\partial D^2}\bigg/\frac{\partial u(D, x)}{\partial D}\right)\sigma_0 D,
\]
and lengthy computations, after setting $D/x = e^{s}$. The short-term rate can be computed by calculating
the expectation of the pricing kernel in this fictitious representative agent economy.
It is instructive to compare the first order conditions of the social planner in Eq. (A14) with those
in the decentralized economy. Since markets are complete, we have that the first order conditions in
the decentralized economy satisfy:
\[
e^{-\delta t}\left(c_\eta(t)/x(t)\right)^{-\eta} = \kappa(\eta)\,\xi(t)\,x(t), \tag{A15}
\]
where κ (η) is the marginal utility of income for the agent η, and ξ (t) is the usual pricing kernel.
By aggregating the market equilibrium allocations in Eq. (A15),
\[
D(t) = \int_1^{\infty} c_\eta(t)\,d\eta = x(t)\int_1^{\infty}\left[e^{\delta t}\xi(t)\,x(t)\right]^{-\frac{1}{\eta}}\kappa(\eta)^{-\frac{1}{\eta}}\,d\eta.
\]
Restricted stock market participation (Basak and Cuoco (1998)). We first show that
Eq. (8.23) holds true. Indeed, by the definition of the stochastic social weight in Eq. (8.22), we have
that
\[
x(\tau) = u_p'(\hat{c}_p(\tau))\,\hat{c}_n(\tau) = \frac{w_p}{w_n}\,\xi(\tau)\,e^{\int_0^{\tau}R(s)\,ds},
\]
where the second equality follows by the first order conditions in Eq. (8.20). Eq. (8.23) follows by the
previous expression for x and the dynamics for the pricing kernel in Eq. (8.21).
By Chapter 7 (Appendix 1), the unit risk premium λ satisfies,
\[
\lambda(D, x) = -\frac{u_{11}(D, x)}{u_1(D, x)}\,\sigma_0 D + \frac{u_{12}(D, x)\,x}{u_1(D, x)}\,\lambda(D, x).
\]
This is:
\begin{align*}
\lambda(D, x) &= -\frac{u_1(D, x)\,u_{11}(D, x)}{u_1(D, x) - u_{12}(D, x)\,x}\cdot\frac{\sigma_0 D}{u_1(D, x)} \\
&= -\frac{u_a''(\hat{c}_a)}{u_1(D, x)}\,\sigma_0 D \\
&= -\frac{u_a''(\hat{c}_a)\,\hat{c}_a}{u_a'(\hat{c}_a)}\,\sigma_0\,s^{-1},
\end{align*}
where the second line follows by Basak and Cuoco (identity (33), p. 331) and the third line follows by
the definition of u(D, x) and s. The Sharpe ratio reported in the main text follows by the definition
of ua . The interest rate is also found through Chapter 7 (Appendix 1). We have,
\[
R(s) = \delta + \frac{\eta g_0}{\eta - (\eta-1)s} - \frac{1}{2}\frac{\eta(\eta+1)\,\sigma_0^2}{s\left(\eta - (\eta-1)s\right)}.
\]
Finally, by applying Itô's lemma to $s = c_a/D$, and using the optimality conditions for agent a, we find
that drift and diffusion functions of s are given by:
\[
\phi(s) = g_0\left[\frac{(1-\eta)(1-s)}{\eta + (1-\eta)s}\right]s - \frac{1}{2}\frac{(\eta+1)\,\sigma_0^2}{\eta + (1-\eta)s} + \frac{1}{2}\frac{(\eta+1)\,\sigma_0^2}{s} + \sigma_0^2\,(s-1),
\]
References
Bansal, R. and A. Yaron (2004): “Risks for the Long Run: A Potential Resolution of Asset
Pricing Puzzles.” Journal of Finance 59, 1481-1509.
Basak, S. and D. Cuoco (1998): “An Equilibrium Model with Restricted Stock Market Par-
ticipation.” Review of Financial Studies 11, 309-341.
Campbell, J. and R. Shiller (1988): “The Dividend-Price Ratio and Expectations of Future
Dividends and Discount Factors.” Review of Financial Studies 1, 195—228.
Chan, Y.L. and L. Kogan (2002): “Catching Up with the Joneses: Heterogeneous Preferences
and the Dynamics of Asset Prices.” Journal of Political Economy 110, 1255-1285.
Duffie, D. and L.G. Epstein (1992a): “Asset Pricing with Stochastic Differential Utility.” Re-
view of Financial Studies 5, 411-436.
Duffie, D. and L.G. Epstein (with C. Skiadas) (1992b): “Stochastic Differential Utility.” Econo-
metrica 60, 353-394.
Epstein, L.G. and S.E. Zin (1989): “Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: A Theoretical Framework.” Econometrica 57, 937-969.
Epstein, L.G. and S.E. Zin (1991): “Substitution, Risk-Aversion and the Temporal Behavior of
Consumption and Asset Returns: An Empirical Analysis.” Journal of Political Economy
99, 263-286.
Guvenen, F. (2005): “A Parsimonious Macroeconomic Model for Asset Pricing: Habit forma-
tion or Cross-Sectional Heterogeneity.” Working paper, University of Rochester.
Huang, C.-f. (1987): “An Intertemporal General Equilibrium Asset Pricing Model: the Case
of Diffusion Information.” Econometrica 55, 117-142.
Weil, Ph. (1989): “The Equity Premium Puzzle and the Risk-Free Rate Puzzle.” Journal of
Monetary Economics 24, 401-421.
9
Information and other market frictions
9.1 Introduction
The assumption that agents have imperfect information about the fundamentals of the economy was
first used by Phelps (1970) and Lucas (1972), to explain the relation between monetary policy
and the business cycle. This information-based approach to the business cycle, summarized
in Lucas (1981), was, in fact, abandoned in favour of the real business cycle theory, reviewed
in Chapter 3, partly because imperfect information can not be considered as the sole engine
of macroeconomic fluctuations. Instead, it is widely acknowledged that the merit of Lucas’
approach was the introduction of a systematic way of thinking about fluctuations, in a context
of rational expectations. Moreover, his information approach has inspired work in financial
economics, where imperfect information is likely to play a quite fundamental role. In Section
9.2, we provide a succinct account of the Lucas framework, and solve a model relying on a
simplified version of Lucas (1973). We solve this model, following the perspective we think a
finance theorist would typically have. It is quite useful to present this model, as this is very
simple and at the same time, contributes to give us a big picture of where imperfect information
can lead us, in general. Sections 9.2 through 9.7 review the many models in financial economics
that have been used to explain the price formation mechanism in contexts with imperfect
information, be it asymmetric or differential, as we shall make precise below.
Sections 9.7 and 9.9 conclude this chapter, and present additional market frictions that are
potentially apt to explain certain features in the asset price formation process.
The previous equation can be easily derived, once we assume p is common knowledge, as for
example in the model of monopolistic competition of Blanchard and Kiyotaki (1987). If, instead,
p is not common knowledge, it is more problematic to derive the exact functional form assumed
for yis , although this describes a quite plausible decision mechanism.
Information is disseminated differentially, not asymmetrically, in that producers in the i-th
island do not know the price in the remaining islands, and guess economic developments in
the other islands with the same precision. We assume and, later, verify, that all variables,
exogenous and endogenous, are normally distributed. Under this presumption, we shall show,
the price index p gathers all the available information in the economy efficiently, i.e. it is a
sufficient statistic for all that information.
We have, by the Projection theorem,
where we have used the fact that information is symmetrically disseminated and, then, (i)
the expectation E (pi ) = E (pj ) = E (p) for every i and j, and (ii) both the numerator and
denominator of the ratio, $\beta \equiv \frac{cov(p_i - p,\, p_i)}{var(p_i)}$, are the same across all islands. This coefficient will
be determined below, as a result of the equilibrium.
Aggregating across all islands yields the celebrated Lucas supply equation:
\[
y^s \equiv \frac{1}{n}\sum_{j=1}^{n} y_j^s = \beta\left(p - E(p)\right). \tag{9.1}
\]
Next, assume the demand for the good produced in the i-th island is given by:
\[
y_i^d = m - p + u_i - \theta\,(p_i - p), \quad \text{where } u_i \sim N\left(0, \sigma_u^2\right),
\]
where money is
\[
m = E(m) + \epsilon, \quad \text{where } \epsilon \sim N\left(0, \sigma_\epsilon^2\right). \tag{9.2}
\]
Finally, we assume that $E(u_i) = 0$, and that the $u_i$ are sectoral shocks, in that $\sum_{j=1}^{n} u_j = 0$.
The functional form assumed for the demand function, $y_i^d$, can be easily derived, assuming the
goods in the islands are imperfect substitutes, as for example in Blanchard and Kiyotaki (1987).
In this context, the equilibrium price in the islands plays two roles. A first, standard role, is
to clear the market in each island, being such that $y_i^s = y_i^d$, or:
\[
\beta\left(p_i - E(p)\right) = m - p + u_i - \theta\,(p_i - p). \tag{9.3}
\]
Its second role is to convey information about the two shocks, the macroeconomic, monetary
shock, $\epsilon$, and the real shocks in all the islands, $u_j$, $j = 1, \cdots, n$. Let us assume, then, that the
only real shock that matters for the price in the i-th island is $u_i$. Below, we shall verify this
conjecture holds, in equilibrium. Then, the price is a function $p_i = P(\epsilon, u_i)$, which we conjecture
to be affine, in $\epsilon$ and $u_i$, viz
\[
P(\epsilon, u_i) = a + b\,\epsilon + c\,u_i, \tag{9.4}
\]
where the coefficients a, b and c have to be determined, in equilibrium. Under these conditions,
the average price is a function $p = \bar{P}(\epsilon)$, equal to:
\[
\bar{P}(\epsilon) = a + b\,\epsilon. \tag{9.5}
\]
9.2. Prelude: imperfect information in macroeconomics
Let us replace Eqs. (9.4), (9.5) and (9.2) into Eq. (9.3). By rearranging terms, we obtain:
\[
a = E(m),
\]
and the coefficients for $\epsilon$ and $u_i$ must both equal zero, leading to the following expressions for
b and c:
\[
b = \frac{1}{1+\beta}, \quad c = \frac{1}{\theta+\beta}. \tag{9.6}
\]
We are left with determining β, which given Eqs. (9.4)-(9.5), and Eq. (9.6), is easily shown to
equal:
\[
\beta = \frac{\sigma_u^2}{\sigma_u^2 + \left(\frac{\theta+\beta}{1+\beta}\right)^2\sigma_\epsilon^2}. \tag{9.7}
\]
The positive fixed point to this equation, which is easily shown to exist, delivers β, which can
then be replaced back into Eqs. (9.6), to yield the solutions for b and c, which are both positive.
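The fixed point in Eq. (9.7) has no closed form in general, but it is easy to compute. The sketch below is one possible approach under stated assumptions, not a method taken from the text: it uses bisection on the interval [0, 1], where the right hand side of (9.7) minus β changes sign when both variances are positive, and then recovers b and c from Eq. (9.6). The function names and parameter values are hypothetical.

```python
def beta_map(beta, theta, var_u, var_eps):
    """Right hand side of Eq. (9.7), as a function of a candidate beta."""
    ratio = (theta + beta) / (1.0 + beta)
    return var_u / (var_u + ratio ** 2 * var_eps)


def solve_beta(theta, var_u, var_eps, tol=1e-14):
    """Bisection for the fixed point of Eq. (9.7) on [0, 1].

    Assumes theta > 0 and strictly positive variances, so that the map lies
    above beta at beta = 0 and below it at beta = 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_map(mid, theta, var_u, var_eps) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def price_coefficients(theta, var_u, var_eps):
    """Return (beta, b, c) with b and c from Eq. (9.6)."""
    beta = solve_beta(theta, var_u, var_eps)
    b = 1.0 / (1.0 + beta)
    c = 1.0 / (theta + beta)
    return beta, b, c


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    beta, b, c = price_coefficients(theta=2.0, var_u=1.0, var_eps=0.5)
    print(beta, b, c)
    # Output response to a unit monetary shock implied by the supply equation.
    print(beta * b)
```

Bisection is used here instead of iterating the map directly because it does not rely on the map being a contraction.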
We can now figure out the implications of this equilibrium. Replacing Eqs. (9.4)-(9.5) into
the Lucas supply equation (9.1) leaves:
\[
y^s = \beta b\,\epsilon.
\]
This is Lucas's celebrated neutrality result. Anticipated monetary policy, E (m), does not affect
the equilibrium outcome, $y^s$. Instead, it is the monetary shock $\epsilon$ that affects $y^s$. Agents in any
one island do not observe the price in the remaining islands and, hence, the aggregate price
level, p. Therefore, they are unable to tell whether an increase in the price of the good they
produce, $p_i$, is due to a real shock, $u_i$, or to a monetary shock, $\epsilon$. In other words, they cannot
disentangle a monetary shock from a real shock. If the agents were informed about the
real shocks in the other islands, they would of course infer $\epsilon$, and a monetary shock would not
exert any effect on the equilibrium production. Formally, in equilibrium, the price difference is
$p_i - p = c\,u_i$, which does not depend on $\epsilon$, a standard “dichotomy” prediction reminiscent of
classical theory. But $p_i - p$ is not observed, as p is not observed. Instead, the producers in the i-th
island can only guess $p_i - E(p \mid p_i) = b\,\epsilon + c\,u_i$, which co-varies positively with the observed price,
$p_i$, $cov(p_i - p, p_i) = c^2\sigma_u^2$. This covariance is zero precisely when we remove the assumption of
imperfect knowledge about the real shocks, so that $\sigma_u^2 = 0$, in which case β = 0. By contrast,
with imperfect knowledge, producers act so as to compensate for their partial lack of knowledge,
and produce to the maximum extent they can justify, on the basis of the positive statistical
co-movements, $cov(p_i - p, p_i) > 0$. Note, if $E(m) = m_{-1}$, i.e. money supply in the previous
period, then from Eq. (9.5), the inflation rate is $p - p_{-1} = b\,\epsilon + (1-b)\,\epsilon_{-1}$. Therefore, output and
inflation are positively correlated, and generate a Phillips curve, which policy makers cannot
exploit anyway, as anticipated monetary policy, E (m), is rationally “factored out,” and does
not affect output. This is the essence of the Lucas critique (Lucas, 1977).
In the next sections, we present a number of models that work due to a similar mechanism.
9.3. Grossman-Stiglitz paradox
Why should we ever purchase an asset from anyone else who insists on selling it to the
market? Trading seems to be a difficult phenomenon to explain, in a world with imperfect
information. Yet trading does occur, if imperfect information has the same nature as that of
the Phelps-Lucas model. Agents might well be imperfectly informed about the nature of, say,
unusually high market orders. For example, huge sell orders might arrive to the market, either
because the asset is a lemon or because the agents selling it are hit by a liquidity shock. In
the models of this section, an equilibrium with rational expectations exists precisely because of
this “noise” (liquidity, in this example). There is a chance the sell order arrives in the market
simply because the agents selling the asset are hit by a liquidity shock. Imperfectly informed agents,
therefore, might be willing to buy, if it is in their interest to do so.
References
Blanchard, O. and N. Kiyotaki (1987): “Monopolistic Competition and the Effects of Aggregate
Demand.” American Economic Review 77, 647-666.
Lucas, R. E., Jr. (1972): “Expectations and the Neutrality of Money.” Journal of Economic
Theory 4, 103-124.
Lucas, R. E., Jr. (1973): “Some International Evidence on Output-Inflation Tradeoffs.” American
Economic Review 63, 326-334.
Lucas, R. E., Jr. (1981): Studies in Business-Cycle Theory. Boston, MIT Press.
Part III
Applied asset pricing theory
10
Options and volatility
10.1 Introduction
This chapter is under construction. I shall include material on futures, American options, exotic
options, and evaluation of contingent claims through trees. I shall illustrate more systemati-
cally the general ideas underlying implied trees, and cover details on how to deal with market
imperfections.
FIGURE 10.1. Net profits at expiration, plotted against S(T), for six positions: buy share, buy call, short-sell share, buy put, sell call, sell put.
10.2 General properties of options
FIGURE 10.2. Net profits at expiration, plotted against S(T), from buying the share (lines AA and BB) and from buying the call.
Figure 10.2 illustrates this. It depicts two cases. In the first case, the share price is
less than the option price. The net profits generated by the purchase of the share,
the AA line, would then strictly dominate the net profits generated by the purchase of the option,
π (C)T , an arbitrage opportunity. Therefore, we need C (t) < S (t) to rule out this arbitrage
opportunity. In the second case, the share and option prices
are such that the net profits generated by the purchase of the share, the BB line, are always
dominated by the net profits generated by the purchase of the option, π(C)T , again an arbitrage
opportunity. Therefore, the price of the share and the price of the option must be such that
S (t) < K + C (t), or C (t) > S (t) − K. Finally, note that the option price is always strictly
positive as the option payoff is nonnegative. Therefore, we have that (S (t) − K)+ < C (t) <
S (t). Theorem 10.2 below provides a formal derivation of this result in the more general case in
which the short-term rate is positive.
Let us go back to Figure 10.1. This figure suggests that the price of the call and the price
of the put are intimately related. Indeed, by overlapping the first two panels, and using the
same arguments used in Figure 10.2, we see that for two call and put contracts with the same
exercise price K and expiration date T , we must have that C (t) > p (t) −K. The put-call parity
provides the exact relation between the two prices C (t) and p (t). Let P (t, T ) be the price as
of time t of a pure discount bond expiring at time T . We have:
Theorem 10.1 (Put-call parity). Consider a put and a call option with the same exercise
price K and the same expiration date T . Their prices p(t) and C(t) satisfy, p (t) = C (t) −
S (t) + KP (t, T ).
Proof. Consider two portfolios: (A) Long one call, short one underlying asset, and invest
KP (t, T ); (B) Long one put. The table below gives the value of the two portfolios at time t
and at time T .
              Value at t                  Value at T
                                          S(T) ≤ K        S(T) > K
Portfolio A   C(t) − S(t) + KP(t, T)      −S(T) + K       S(T) − K − S(T) + K
Portfolio B   p(t)                        K − S(T)        0
The two portfolios have the same value in each state of nature at time T . Therefore, their values
at time t must be identical to rule out arbitrage. ∎
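The state-by-state comparison in the proof can be checked numerically. The snippet below is a minimal sketch: the function names, the strike of 100 and the grid of terminal prices are illustrative assumptions, not part of the original text. It evaluates both portfolios at time T and records the difference between them.

```python
def portfolio_a_payoff(s_T, strike):
    """Time-T value of Portfolio A: long one call, short one share,
    plus a discount bond position that pays `strike` at T."""
    call_payoff = max(s_T - strike, 0.0)
    return call_payoff - s_T + strike


def portfolio_b_payoff(s_T, strike):
    """Time-T value of Portfolio B: long one put."""
    return max(strike - s_T, 0.0)


strike = 100.0  # illustrative exercise price (assumption)
terminal_prices = [0.0, 50.0, 99.0, 100.0, 101.0, 150.0, 300.0]

# Difference between the two portfolios in each terminal state
gaps = [portfolio_a_payoff(s, strike) - portfolio_b_payoff(s, strike)
        for s in terminal_prices]
print(gaps)
```

Every entry of `gaps` should be zero, which is the payoff equality the proof relies on.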
By the put-call parity in Theorem 10.1, the properties of European put prices can be mechanically
deduced from those of the corresponding call prices. We focus the discussion on European
call options. The following result gathers some basic properties of European call option prices
before the expiration date, and generalizes the reasoning underlying Figure 10.2.
Theorem 10.2. The rational option price C (t) = C (S (t) ; K; T − t) satisfies the following
properties: (i) C (S (t) ; K; T − t) ≥ 0; (ii) C (S (t) ; K; T − t) ≥ S(t) − KP (t, T ); and (iii)
C (S (t) ; K; T − t) ≤ S (t).
Proof. Part (i) holds because Pr {C (S (T ) ; K; 0) > 0} > 0, which implies that C must be
nonnegative at time t to preclude arbitrage opportunities. As regards Part (ii), consider two
portfolios: Portfolio A, buy one call; and Portfolio B, buy one underlying asset and issue debt
for an amount of KP (t, T ). The table below gives the value of the two portfolios at time t and
at time T .
              Value at t            Value at T
                                    S(T) ≤ K        S(T) > K
Portfolio A   C(t)                  0               S(T) − K
Portfolio B   S(t) − KP(t, T)       S(T) − K        S(T) − K
At time T , Portfolio A dominates Portfolio B. Therefore, in the absence of arbitrage, the value
of Portfolio A must dominate the value of Portfolio B at time t. To show Part (iii), suppose the
contrary, i.e. C (t) > S (t), which is an arbitrage opportunity. Indeed, at time t, we could sell
m options (m large) and buy m of the underlying assets, thus making a sure profit equal to
m · (C (t) − S (t)). At time T , the option will be exercised if S (T ) > K, in which case we shall
sell the underlying assets and obtain m · K. If S (T ) < K, the option will not be exercised, and
we will still hold the asset or sell it and make a profit equal to m · S (T ). ∎
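FIGURE 10.3. Top panel: the AA and BB lines bounding the call price c(t), plotted against S(t), with K b(t,T) marked on the horizontal axis. Bottom panels: two possible shapes of c(t) as a function of S(t).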
The previous results provide the basic properties, or arbitrage bounds, that option prices
satisfy. First, consider the top panel of Figure 10.3. Eq. (10.1) tells us that C (t) must lie inside
the AA and the BB lines. Second, Corollary 10.3(i) tells us that the rational option price starts
from the origin. Third, Eq. (10.1) also reveals that as S → ∞, the option price also goes to
infinity; but because C cannot lie outside the region bounded by the AA and BB
lines, C will go to infinity by “sliding up” through the BB line.
How does the option price behave within the bounds AA and BB? That is impossible to
tell. Given the boundary behavior of the call option price, we only know that if the option
price is strictly convex in S, then it is also increasing in S. In this case, the option price could
be as in the left-hand side of the bottom panel of Figure 10.3. This case is the most relevant,
empirically. It is predicted by the celebrated Black and Scholes (1973) formula. However, this
property is not a general property of option prices. Indeed, Bergman, Grundy and Wiener
(1996) show that in one-dimensional diffusion models, the price of a contingent claim written
on a tradable asset is convex in the underlying asset price if the payoff of the claim is convex
in the underlying asset price (as in the case of a European call option). In our context, the
boundary conditions guarantee that the price of the option is then increasing and convex in the
price of the underlying asset. However, Bergman, Grundy and Wiener provide several counter-
examples in which the price of a call option can be decreasing over some range of the price of
the asset underlying the option contract. These counter-examples include models with jumps,
or the models with stochastic volatility that we shall describe later in this chapter. Therefore,
there are no reasons to exclude that the option price behavior could be as in the right-hand
side of the bottom panel of Figure 10.3.
FIGURE 10.4. Call price c(t) plotted against S(t) for three maturity dates T1, T2 and T3, with K b(t,T) marked on the horizontal axis.
The economic content of convexity in this context is very simple. When the price S is small,
it is unlikely that the option will be exercised. Therefore, changes in the price S produce little
effect on the price of the option, C. However, when S is large, it is likely that the option
will be exercised. In this case, an increase in S is followed by almost the same increase in C.
Furthermore, the elasticity of the option price with respect to S is larger than one,
\[ \epsilon \equiv \frac{dC}{dS} \cdot \frac{S}{C} > 1, \]
as for a convex function, the first derivative is always higher than the secant. Overall, the option
price is more volatile than the price of the asset underlying the contract. Finally, call options
are also known as “wasting assets”, as their value decreases over time. Figure 10.4 illustrates
this property, in correspondence of the maturity dates T1 > T2 > T3 .
These properties illustrate very simply the general principles underlying a portfolio that
“mimics” the option price. For example, investment banks sell options that they want to
hedge against, to avoid the exposure to losses illustrated in Figure 10.1. At the very least, the
portfolio that “mimics” the option price must exhibit the previous general properties. For
example, suppose we wish our portfolio to exhibit the behavior in the left-hand side of the
bottom panel of Figure 10.3, which is the most relevant, empirically. We require the portfolio
to exhibit a number of properties.
(p-i) The portfolio value, V , must be increasing in the underlying asset price, S.
(p-ii) The sensitivity of the portfolio value with respect to the underlying asset price must be
strictly positive and bounded by one, $0 < \frac{dV}{dS} < 1$.
(p-iii) The elasticity of the portfolio value with respect to the underlying asset price must be
strictly greater than one, $\frac{dV}{dS} \cdot \frac{S}{V} > 1$.
(c-i) The portfolio includes the asset underlying the option contract.
(c-ii) The number of assets underlying the option contract is less than one.
(c-iii) The portfolio includes debt to create a sufficiently large elasticity. Indeed, let V = θS − D,
where θ is the number of assets underlying the option contract, with θ ∈ (0, 1), and D is
debt. Then, $\frac{dV}{dS} = \theta$ and $\frac{dV}{dS} \cdot \frac{S}{V} = \theta \cdot \frac{S}{V} > 1 \Leftrightarrow \theta S > V = \theta S - D$, which holds if and only
if D > 0.
In fact, the hedging problem is dynamic in nature, and we would expect θ to be a function
of the underlying asset price, S, and time to expiration. Therefore, we require the portfolio to
display the following additional property:
(p-iv) The number of assets underlying the option contract must increase with S. Moreover,
when S is low, the value of the portfolio must be virtually insensitive to changes in
S. When S is high, the portfolio must include mainly the assets underlying the option
contract, to make the portfolio value “slide up” through the BB line in Figure 10.3.
The previous property holds under the following condition:
(c-iv) θ is an increasing function of S, with $\lim_{S \to 0} \theta(S) = 0$ and $\lim_{S \to \infty} \theta(S) = 1$.
Finally, the purchase of the option does not entail any additional inflows or outflows until time
to expiration. Therefore, we require that the “mimicking” portfolio display a similar property:
(p-v) The portfolio must be implemented as follows: (i) any purchase of the asset underlying
the option contract must be financed by issue of new debt; and (ii) the proceeds from any sale of the asset
underlying the option contract must be used to shrink the existing debt.
The previous property of the portfolio just says that the portfolio has to be self-financing, in
the sense described in the first Part of these lectures.
(c-v) The portfolio is implemented through a self-financing strategy.
We now proceed to add more structure to the problem.
10.3 Evaluation
We consider a continuous-time model in which asset prices are driven by a d-dimensional Brownian
motion W.¹ We consider a multivariate state process
\[ dY^{(h)}(t) = \varphi_h(y(t))\,dt + \sum_{j=1}^{d} J_{hj}(y(t))\,dW^{(j)}(t), \]
for some functions $\varphi_h$ and $J_{hj}(y)$, satisfying the usual regularity conditions.
The price of the primitive assets satisfies the regularity conditions in Chapter 4. The value of
a portfolio strategy, V , is V (t) = θ (t) · S+ (t). We consider a self-financing portfolio. Therefore,
V is solution to
\[ dV(t) = \left[ \pi(t)^\top (\mu(t) - \mathbf{1}_m r(t)) + r(t) V(t) - C(t) \right] dt + \pi(t)^\top \sigma(t)\,dW(t), \qquad (10.2) \]
where $\pi \equiv (\pi^{(1)}, \ldots, \pi^{(m)})^\top$, $\pi^{(i)} \equiv \theta^{(i)} S^{(i)}$, $\mu \equiv (\mu^{(1)}, \ldots, \mu^{(m)})^\top$, $S^{(i)}$ is the price of the i-th asset,
$\mu^{(i)}$ is its drift and σ (t) is the volatility matrix of the price process. We impose that V satisfy
the same regularity conditions in Chapter 4.
where F is a process with finite variation, and $\tilde{\gamma} \in L^2_{0,T,d}(\Omega, \mathcal{F}, P)$. We wish to replicate A
through a portfolio. First, then, we must look for a portfolio π satisfying
\[ \frac{dF(t)}{dt} = \pi(t)^\top (\mu(t) - \mathbf{1}_m r(t)) + r(t) V(t) = \pi(t)^\top (\mu(t) - \mathbf{1}_m r(t)) + r(t) F(t). \qquad (10.4) \]
The second equality holds because if drift and diffusion terms of F and V are identical, then
F (t) = V (t).
Clearly, if m < d, there are no solutions for π in Eq. (10.3). The economic interpretation is that
in this case, the number of assets is so small that we can not create a portfolio able to replicate
all possible events in the future. Mathematically, if m < d, then $V^{x,\pi}(T) \in M \subset L^2(\Omega, \mathcal{F}, P)$.
As Chapter 4 emphasizes, there is also a converse to this result, which motivates the definition
of market incompleteness given in Chapter 4 (Definition 4.5).
10.3.2 Pricing
Eq. (10.4) is all we need to price any derivative asset which promises to pay off some F(T )-
measurable random variable. The first step is to cast Eq. (10.4) in a less abstract format. Let
us consider a semimartingale in the context of the diffusion model of this section. For example,
let us consider the price process H (t) of a European call option. This price process is rationally
formed if H (t) = C (t, y (t)), for some $C \in C^{1,2}([0, T) \times \mathbb{R}^k)$. By Itô’s lemma,
\[ dC = \bar{\mu}^C C\,dt + \frac{\partial C}{\partial Y} J\,dW, \]
where
\[ \bar{\mu}^C C = \frac{\partial C}{\partial t} + \sum_{l=1}^{k} \frac{\partial C}{\partial y_l} \varphi_l(t, y) + \frac{1}{2} \sum_{l=1}^{k} \sum_{j=1}^{k} \frac{\partial^2 C}{\partial y_l \partial y_j} \mathrm{cov}(y_l, y_j); \]
∂C/∂Y is 1 × d; and J is d × d. Finally,
\[ C(T, y) = \tilde{X} \in L^2(\Omega, \mathcal{F}, P). \]
In this context, $\bar{\mu}^C C$ and (∂C/∂Y ) · J play the same roles as dF/dt and $\tilde{\gamma}$ in the previous
section. In particular, the volatility identification $\frac{\partial C}{\partial Y} J = \pi^\top \sigma$ corresponds to Eq. (10.3).
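As an example, let m = d = 1, and suppose that the only state variable of the economy is
the price of a share, with ϕ(s) = μs and cov(s) = σ²s², where μ and σ² are constants. Then $(\partial C / \partial Y) \cdot J = C_S \sigma S$,
$\pi = C_S S$, and by Eq. (10.4),
\[ \frac{\partial C}{\partial t} + \mu S \frac{\partial C}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} = \pi (\mu - r) + rC = \frac{\partial C}{\partial S} S (\mu - r) + rC. \qquad (10.5) \]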
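By the boundary condition, C (T, s) = (s − K)+ , one obtains that the solution is the celebrated
Black and Scholes (1973) formula,
\[ C(t, S) = S\,\Phi(d_1) - K e^{-r(T-t)}\,\Phi(d_2), \qquad (10.6) \]
where
\[ d_1 = \frac{\log\left(\frac{S}{K}\right) + \left(r + \frac{1}{2}\sigma^2\right)(T - t)}{\sigma\sqrt{T - t}}, \qquad d_2 = d_1 - \sigma\sqrt{T - t}, \qquad \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\,du. \]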
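As an illustration, the Black and Scholes formula can be coded in a few lines and used to check the properties discussed in Section 10.2: the arbitrage bounds of Theorem 10.2 and an elasticity larger than one. This is a minimal sketch; the function names and the parameter values (K = 100, r = 3%, σ = 25%, six months to expiration) are assumptions made for the example only.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_d1(s, k, r, sigma, tau):
    """d1 in the Black-Scholes formula; tau = T - t must be positive."""
    return (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes price of a European call."""
    d1 = bs_d1(s, k, r, sigma, tau)
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def bs_delta(s, k, r, sigma, tau):
    """Sensitivity dC/dS of the call price, equal to Phi(d1)."""
    return norm_cdf(bs_d1(s, k, r, sigma, tau))


# Illustrative parameters (assumptions for this example)
k, r, sigma, tau = 100.0, 0.03, 0.25, 0.5

checks = []
for s in [60.0, 80.0, 100.0, 120.0, 160.0]:
    c = bs_call(s, k, r, sigma, tau)
    lower = max(s - k * math.exp(-r * tau), 0.0)    # Theorem 10.2, (i)-(ii)
    within_bounds = lower <= c <= s                 # Theorem 10.2, (iii)
    elasticity = bs_delta(s, k, r, sigma, tau) * s / c
    checks.append((s, within_bounds, elasticity > 1.0))
    print(s, round(c, 4), round(elasticity, 2))
```

The elasticity exceeds one at every price level because C = SΦ(d1) − Ke^{−r(T−t)}Φ(d2) is smaller than SΦ(d1) = S · dC/dS.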
Note that we can obtain Eq. (10.6) even without assuming that a market exists for the
option,2 and that the pricing function C (t, S) is differentiable. As it turns out, the option price is
differentiable, but this can be shown to be a result, not just an assumption. Indeed, let us define
the function C (t, S) that solves Eq. (10.5), with boundary condition C (T, S) = (S − K)+ .
Note, we are not assuming this function is the option price. Rather, we shall show this is the
option price. Consider a self-financed portfolio of bonds and stocks, with π = CS S. Its value
satisfies,
dV = [CS S(μ − r) + rV ] dt + CS σSdW.
Moreover, by Itô’s lemma, C (t, s) is solution to
\[ dC = \left( C_t + \mu S C_S + \frac{1}{2} \sigma^2 S^2 C_{SS} \right) dt + C_S \sigma S\,dW. \]
Hence, we have that V (τ ) − C (τ , S (τ )) = [V (0) − C (0, S (0))] exp(rτ ), for all τ ∈ [0, T ].
Next, assume that V (0) = C (0, S (0)). Then, V (τ ) = C (τ , S (τ )) and V (T ) = C (T, S (T )) =
(S (T ) − K)+ . That is, the portfolio π = CS S replicates the payoff underlying the option
contract. Therefore, V (τ ) equals the market price of the option. But V (τ ) = C (τ , S (τ )), and
we are done.
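The step leading to the exponential expression for V (τ ) − C (τ , S (τ )) can be spelled out as follows. This is only a sketch of the intermediate algebra, using the dynamics of V and C given above. Subtracting the two equations, the diffusion terms cancel:
\[ d(V - C) = \left[ C_S S (\mu - r) + rV - C_t - \mu S C_S - \frac{1}{2} \sigma^2 S^2 C_{SS} \right] d\tau . \]
By Eq. (10.5), evaluated at π = CS S,
\[ C_t + \mu S C_S + \frac{1}{2} \sigma^2 S^2 C_{SS} = C_S S (\mu - r) + rC , \]
so that
\[ d(V - C) = r (V - C)\,d\tau , \]
a linear ordinary differential equation whose solution is $V(\tau) - C(\tau, S(\tau)) = [V(0) - C(0, S(0))]\,e^{r\tau}$.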
2 The original derivation of Black and Scholes (1973) and Merton (1973) relies on the assumption that an option market exists
3 Moreover, Eqs. (10.8), (10.9) and (10.10) below can be seen as particular cases of the general results given in Chapter 6.
10.4 Properties of models
By the maximum principle, again, ∆C > 0 whenever CSS > 0. Therefore, it follows that if
option prices are convex in the underlying asset price, then they are also always increasing in
the volatility of the underlying asset prices. Economically, this result follows because volatility
changes are mean-preserving spread in this context. We are left to show that CSS > 0. Let
us differentiate Eq. (10.8) with respect to S. The result is that Z ≡ HS = CSS satisfies the
following partial differential equation,
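\[
\begin{cases}
0 = Z_\tau + (r + 2v'(S))\,Z_S + Z_{SS}\,\sigma(S) - (r - \sigma''(S))\,Z & \text{for all } (\tau, S) \in [t, T) \times \mathbb{R}_{++} \\
H(S, T, T) = \psi''(S) & \text{for all } S \in \mathbb{R}_{++}
\end{cases}
\qquad (10.10)
\]
By the maximum principle, we have that
\[ H(S, \tau, T) > 0 \text{ for all } (\tau, S) \in [t, T] \times \mathbb{R}_{++}, \text{ whenever } \psi''(S) > 0 \ \forall S \in \mathbb{R}_{++}. \]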
That is, we have that in the scalar diffusion setting, the option price is always convex in the
underlying asset price if the terminal payoff is convex in the underlying asset price. In other
terms, the convexity of the terminal payoff propagates to the convexity of the pricing function.
Therefore, if the terminal payoff is convex in the underlying asset price, then the option price
is always increasing in the volatility of the underlying asset price.
where Q is the risk-neutral measure and q(x+ | x)dx ≡ dQ(x+ | x). This is indeed a very general
formula, as it does not rely on any parametric assumptions for the dynamics of the price
underlying the option contract. Let us differentiate the previous formula with respect to K,
\[ e^{r(T-t)} \frac{\partial C(S(t), t, T; K)}{\partial K} = -\int_{K}^{\infty} q(x \mid S(t))\,dx, \]
\[ e^{r(T-t)} \frac{\partial^2 C(S(t), t, T; K)}{\partial K^2} = q(K \mid S(t)). \qquad (10.11) \]
Eq. (10.11) provides a means to “recover” the risk-neutral density using option prices. The
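Arrow-Debreu state density, AD (S + = u| S(t)), is the discounted risk-neutral density. By Eq. (10.11), it is given by,
\[ AD\left(S^+ = u \mid S(t)\right) = e^{-r(T-t)}\, q\left(S^+ \mid S(t)\right)\Big|_{S^+ = u} = \frac{\partial^2 C(S(t), t, T; K)}{\partial K^2}\bigg|_{K = u}. \]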
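Eq. (10.11) can be illustrated numerically. The sketch below assumes, purely for the sake of the example, that option prices are generated by the Black-Scholes model, in which case the risk-neutral density is log-normal and known in closed form. It approximates the second derivative of the call price with respect to K by a central finite difference and compares the recovered density with the exact one. Function names, parameter values and the step size h are assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price, used here as the source of option prices."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def lognormal_density(x, s, r, sigma, tau):
    """Exact risk-neutral density of S(T) at x under Black-Scholes."""
    mean = math.log(s) + (r - 0.5 * sigma ** 2) * tau
    std = sigma * math.sqrt(tau)
    z = (math.log(x) - mean) / std
    return math.exp(-0.5 * z * z) / (x * std * math.sqrt(2.0 * math.pi))


def recovered_density(k, s, r, sigma, tau, h=0.05):
    """Eq. (10.11): exp(r * tau) times the second derivative of C in K,
    approximated by a central finite difference with step h."""
    second_derivative = (bs_call(s, k + h, r, sigma, tau)
                         - 2.0 * bs_call(s, k, r, sigma, tau)
                         + bs_call(s, k - h, r, sigma, tau)) / h ** 2
    return math.exp(r * tau) * second_derivative


# Illustrative parameters (assumptions for this example)
s, r, sigma, tau = 100.0, 0.03, 0.2, 1.0
strikes = [80.0, 100.0, 120.0]
recovered = [recovered_density(k, s, r, sigma, tau) for k in strikes]
exact = [lognormal_density(k, s, r, sigma, tau) for k in strikes]
for k, q_hat, q in zip(strikes, recovered, exact):
    print(k, q_hat, q)
```

With market data, the call prices would come from quotes across strikes rather than from a formula, and the finite difference would typically need smoothing.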
\[ y_t = a + \epsilon_t, \qquad \epsilon_t \mid F_{t-1} \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = w + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \qquad (10.12) \]
where a, w, α and β are parameters and Ft denotes the information set as of time t. This model
is known as the GARCH(1,1) model (Generalized ARCH). It was introduced by Bollerslev
(1986), and collapses to the ARCH(1) model introduced by Engle (1982) once we set β = 0.
ARCH models have played a prominent role in the analysis of many aspects of financial
econometrics, such as the term structure of interest rates, the pricing of options, or the presence
of time varying risk premia in the foreign exchange market. A classic survey is that in Bollerslev,
Engle and Nelson (1994).
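To fix ideas, the GARCH(1,1) recursion in Eq. (10.12) can be simulated as follows. This is a minimal sketch: the function name, the parameter values and the choice to start the recursion at the unconditional variance w/(1 − α − β) are assumptions of the example, and the latter requires α + β < 1.

```python
import random


def simulate_garch11(n, a, w, alpha, beta, seed=0):
    """Simulate n observations from the GARCH(1,1) model of Eq. (10.12).

    Returns the observations y, the forecast errors eps and the
    conditional variances sigma2. Setting beta = 0 gives ARCH(1).
    """
    if not (w > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need w > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    rng = random.Random(seed)
    # Assumption: initialise at the unconditional variance
    sigma2 = [w / (1.0 - alpha - beta)]
    eps = [rng.gauss(0.0, sigma2[0] ** 0.5)]
    for t in range(1, n):
        # sigma_t^2 is known given information at t - 1
        s2 = w + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        sigma2.append(s2)
        eps.append(rng.gauss(0.0, s2 ** 0.5))
    y = [a + e for e in eps]
    return y, eps, sigma2


y, eps, sigma2 = simulate_garch11(n=1000, a=0.0, w=0.05, alpha=0.10, beta=0.85)
print(min(sigma2), max(sigma2))
```

The conditional variance at time t is computed before the time-t shock is drawn, which is the sense in which volatility is predetermined in this class of models.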
The quintessence of ARCH models is to make volatility dependent on the variability of past
observations. An alternative formulation, initiated by Taylor (1986), makes volatility driven
by some unobserved components. This formulation gives rise to the stochastic volatility model.
Consider, for example, the following stochastic volatility model,
\[ y_t = a + \epsilon_t; \qquad \epsilon_t \mid F_{t-1} \sim N(0, \sigma_t^2); \]
\[ \log \sigma_t^2 = w + \alpha \log \epsilon_{t-1}^2 + \beta \log \sigma_{t-1}^2 + \eta_t; \qquad \eta_t \mid F_{t-1} \sim N(0, \sigma_\eta^2), \]
where a, w, α, β and $\sigma_\eta^2$ are parameters. The main difference between this model and the
GARCH(1,1) model in Eq. (10.12) is that the volatility as of time t, $\sigma_t^2$, is not predetermined
by the past forecast error, $\epsilon_{t-1}$. Rather, this volatility depends on the realization of the stochastic
volatility shock η t at time t. This makes the stochastic volatility model considerably richer than
a simple ARCH model. As for the ARCH models, SV models have also been intensively used,
especially following the progress accomplished in the corresponding estimation techniques. The
seminal contributions related to the estimation of this kind of models are mentioned in Mele
and Fornari (2000). Early contributions that relate changes in volatility of asset returns to
economic intuition include Clark (1973) and Tauchen and Pitts (1983), who assume that a
stochastic process of information arrival generates a random number of intraday changes of the
asset price.
Merton (1973) formulae was too restrictive. The Black-Scholes model assumes that the price of
the asset underlying the option contract follows a geometric Brownian motion,
\[ \frac{dS_t}{S_t} = \mu\,d\tau + \sigma\,dW_t, \]
where W is a Brownian motion, and μ, σ are constants. As explained earlier, σ is the only
parameter to enter the Black-Scholes-Merton formulae.
The assumption that σ is constant is inconsistent with the empirical evidence reviewed in
the previous section. This assumption is also inconsistent with the empirical evidence on the
cross-section of option prices. Let CBS (St , t; K, T, σ) be the option price predicted by the Black-Scholes
formula, when the stock price is St , the option contract has a strike price equal to K,
and the maturity is T , and let the market price be Ct$ (K, T ). Then, empirically, the “implied
volatility”, i.e. the value of σ that equates the Black-Scholes formula to the market price of the
option, IV say,
\[ C_{BS}(S_t, t; K, T, \mathrm{IV}) = C_t^{\$}(K, T) \qquad (10.13) \]
depends on the “moneyness of the option,” defined as,
\[ mo \equiv \frac{S_t e^{r(T-t)}}{K}, \]
where r is the short-term rate, K is the strike of the option, and T is the maturity date of
the option contract. By the results in Section 10.4.1, we know the Black-Scholes option price is
strictly increasing in σ. Therefore, the previous definition makes sense, in that there exists a
unique value IV such that Eq. (10.13) holds true. In fact, the market practice is to quote option
prices in terms of their implied volatilities - rather than in terms of prices. Moreover, this same
implied volatility relates to both the call and the put option prices. Consider the put-call parity
in Theorem 10.1,
Pt (K, T ) = Ct (K, T ) − St + Ke−r(T −t) ,
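Naturally, for each σ, this same equation must necessarily hold for the Black-Scholes model, i.e.
PBS (St , t; K, T, σ) = CBS (St , t; K, T, σ) − St + Ke−r(T −t) . Subtracting this equation from the
previous one gives Pt (K, T ) − PBS (St , t; K, T, σ) = Ct (K, T ) − CBS (St , t; K, T, σ), so the value of σ
that matches the call price also matches the put price: the implied volatilities of a call and a put
option with the same strike and maturity are the same.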
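The definition in Eq. (10.13) and the equality of call and put implied volatilities can be illustrated as follows. This is a sketch under stated assumptions: the market call price of 12, the other parameter values, and the bisection routine with its bracketing interval are all hypothetical choices for the example. The put price is obtained from the call price through put-call parity, as in the text.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(s, k, r, sigma, tau):
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def bs_call(s, k, r, sigma, tau):
    d1, d2 = _d1_d2(s, k, r, sigma, tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def bs_put(s, k, r, sigma, tau):
    d1, d2 = _d1_d2(s, k, r, sigma, tau)
    return k * math.exp(-r * tau) * norm_cdf(-d2) - s * norm_cdf(-d1)


def implied_vol(price, pricer, lo=1e-6, hi=5.0, tol=1e-12):
    """Solve pricer(sigma) = price by bisection.

    Relies on the pricer being strictly increasing in sigma, so that
    the solution is unique when it lies inside [lo, hi].
    """
    if not (pricer(lo) < price < pricer(hi)):
        raise ValueError("price is outside the range spanned by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pricer(mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical market data (assumptions for this example)
s, k, r, tau = 100.0, 100.0, 0.02, 1.0
market_call = 12.0
market_put = market_call - s + k * math.exp(-r * tau)  # put-call parity

iv_call = implied_vol(market_call, lambda v: bs_call(s, k, r, v, tau))
iv_put = implied_vol(market_put, lambda v: bs_put(s, k, r, v, tau))
print(iv_call, iv_put)
```

Bisection is used only because it needs nothing beyond monotonicity in σ; faster root finders would work equally well.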
The crucial empirical point is that the IV exhibits a pattern. Before 1987, it did not display
a clear pattern, or a ∪-shaped pattern in 1/mo at best, a “smile.” After the 1987 crash, the smile
turned into a “smirk,” also referred to as “volatility skew.”
Why a smile? There are many explanations. The first is that options (be they calls or puts)
that are deep-in-the-money and options (be they calls or puts) that are deep-out-of-the-money are
relatively less liquid and therefore command a liquidity risk-premium. Since the Black-Scholes
option price is increasing in volatility, the implied volatility is then ∪-shaped in 1/mo.
A second explanation relates to the Black-Scholes assumption that asset returns are log-
normally distributed. This assumption may not be correct, as the market might be pricing using
an alternative distribution. One possibility is that such an alternative distribution puts more
weight on the tails, as a result of the market fears about the occurrence of extreme outcomes.
For example, the market might fear the stock price will decrease under a certain level, say K.
As a result, the market density should then have a left tail thicker than that of the log-normal
density, for values of S < K. This implies that the probability that deep-out-of-the-money puts (i.e.,
those with low strike prices) will be exercised is higher under the market density than under the
log-normal density. In other words, the volatility needed to price deep-out-of-the-money puts is
larger than that needed to price at-the-money calls and puts.
At the other extreme, if the market fears that the stock price will be above some K̄, then the
market density should exhibit a right tail thicker than that of the log-normal density, for values
of S > K̄, which implies a larger probability (compared to the log-normal) that deep-out-of-the-money
calls (i.e., those with high strike prices) will be exercised. Then, the implied volatility
needed to price deep-out-of-the-money calls is larger than that needed to price at-the-money
calls and puts. The second effect has disappeared since the 1987 crash, leaving the “smirk.”
Ball and Roma (1994) and Renault and Touzi (1996) were the first to note that a smile effect
arises when the asset return exhibits stochastic volatility. In continuous time,4
\[
\begin{aligned}
\frac{dS(t)}{S(t)} &= \mu\,dt + \sigma(t)\,dW(t) \\
d\sigma^2(t) &= b(S(t), \sigma(t))\,dt + a(S(t), \sigma(t))\,dW^{\sigma}(t)
\end{aligned}
\qquad (10.14)
\]
where W σ is another Brownian motion, and b and a are some functions satisfying the usual
regularity conditions. In other words, let us suppose that Eqs. (10.14) constitute the data
generating process. Then, the fundamental theorem of asset pricing (FTAP, henceforth) tells
us that there is a probability equivalent to P , Q say (the risk-neutral probability), such that
the rational option price C(S(t), σ(t)2 , t, T ) is given by,
\[ C(S(t), \sigma(t)^2, t, T) = e^{-r(T-t)}\, E\left[ (S(T) - K)^+ \mid S(t), \sigma(t)^2 \right], \]
where E [·] is the expectation taken under the probability Q. Next, if we continue to assume
that option prices are really given by the previous formula, then inverting the Black-Scholes
formula produces a “constant” volatility that is ∪-shaped with respect to K.
The first option pricing models with stochastic volatility are developed by Hull and White
(1987), Scott (1987) and Wiggins (1987). Explicit solutions have always proved hard to derive.
If we exclude the approximate solution provided by Hull and White (1987) or the analytical
solution provided by Heston (1993b),5 we typically need to derive the option price through
some numerical methods based on Monte Carlo simulation or the numerical solution to partial
differential equations.
In addition to these important computational details, models with stochastic volatility raise
serious economic concerns. Typically, the presence of stochastic volatility generates market
incompleteness. As we pointed out earlier, market incompleteness means that we can not hedge
against future contingencies. In our context, market incompleteness arises because the number
of the assets available for trading (one) is less than the sources of risk (i.e. the two Brownian
motions).6 In our option pricing problem, there are no portfolios including only the underlying
4 In an important paper, Nelson (1990) shows that under regularity conditions, the GARCH(1,1) model converges in distribution
a square root process, the instantaneous variance of the process is proportional to the level reached by that process: in model
(10.14), for instance, a(S, σ) = a · σ, where a is a constant. In this case, it is possible to show that the characteristic function is
exponential-affine in the state variables S and σ. Given a closed-form solution for the characteristic function, the option price is
obtained through standard Fourier methods.
6 Naturally, markets can be “completed” by the presence of the option. However, in this case the option price is not preference
free.
asset and a money market account that could replicate the value of the option at the expiration
date. Precisely, let C be the rationally formed price at time τ, i.e. $C(\tau) = C(S(\tau), \sigma(\tau)^2, \tau, T)$,
where σ(τ )2 is driven by a Brownian motion W σ , which is different from W . The value of the
portfolio that only includes the underlying asset is only driven by the Brownian motion driving
the underlying asset price, i.e. it does not include W σ . Therefore, the value of the portfolio does
not factor in all the random fluctuations that move the return volatility, σ (τ )2 . Instead, the
option price depends on this return volatility, as we have assumed that the option price, C (τ ),
is rationally formed, i.e. $C(\tau) = C(S(\tau), \sigma(\tau)^2, \tau, T)$.
In other words, trading with only the underlying asset can not lead to a perfect replication of
the option price, C. In turn, remember, a perfect replication of C is the condition we need to
obtain a unique preference-free price for the option. To summarize, the presence of stochastic
volatility introduces two inextricable consequences:7
• There is an infinity of option prices that are consistent with the requirement that there
are no arbitrage opportunities.
• Perfect hedging strategies are impossible. Instead, we might, alternatively, either (i) use
a strategy, which is not self-financed, but that allows for a perfect replication of the claim
or (ii) a self-financed strategy for some misspecified model. In case (i), the strategy leads
to a hedging cost process. In case (ii), the strategy leads to a tracking error process, but
there can be situations in which the claim can be “super-replicated”, as we explain below.
Next, let us consider a self-financed portfolio that includes (i) one call, (ii) −α shares, and
(iii) −β units of the money market account (MMA, henceforth). The value of this portfolio is
V = C − αS − βP , and satisfies
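\[
\begin{aligned}
dV &= dC - \alpha\,dS - \beta\,dP \\
&= \left[ \frac{\partial C}{\partial t} + \mu S (C_S - \alpha) + b\,C_{\sigma^2} + \frac{1}{2} \sigma^2 S^2 C_{SS} + \frac{1}{2} a^2 C_{\sigma^2 \sigma^2} - r \beta P \right] d\tau + \sigma S (C_S - \alpha)\,dW + a\,C_{\sigma^2}\,dW^{\sigma}.
\end{aligned}
\]
As is clear, only when a = 0, we could zero the volatility of the portfolio value. In this case,
we could set α = CS and βP = C − αS − V , leaving
\[ dV = \left( \frac{\partial C}{\partial t} + b\,C_{\sigma^2} + \frac{1}{2} \sigma^2 S^2 C_{SS} - rC + rSC_S + rV \right) d\tau, \]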
7 The mere presence of stochastic volatility is not necessarily a source of market incompleteness. Mele (1998) (p. 88) considers
a “circular” market with m asset prices, in which (i) the asset price no. i exhibits stochastic volatility, and (ii) this stochastic
volatility is driven by the Brownian motion driving the (i − 1)-th asset price. Therefore, in this market, each asset price is solution
to the Eqs. (10.14) and yet, by the previous circular structure, markets are complete.
where we have used the equality V = C. The previous equation shows that the portfolio is
locally riskless. Therefore, by the FTAP, its drift must equal rV ,
\[ \frac{\partial C}{\partial t} + b\,C_{\sigma^2} + \frac{1}{2} \sigma^2 S^2 C_{SS} - rC + rSC_S + rV = rV, \]
that is,
\[ 0 = \frac{\partial C}{\partial t} + b\,C_{\sigma^2} + \frac{1}{2} \sigma^2 S^2 C_{SS} + rSC_S - rC. \]
The previous equation generalizes the Black-Scholes equation to the case in which volatility
is time-varying and non-stochastic, as a result of the assumption that a = 0. If a ≠ 0, return
volatility is stochastic and, hence, there are no hedging portfolios to use to derive a unique
option price. However, we still have the possibility to characterize the price of the option.
Indeed, consider a self-financed portfolio with (i) two calls with different strike prices and
maturity dates (with weights 1 and γ), (ii) −α shares, and (iii) −β units of the MMA. We
denote the price processes of these two calls with C 1 and C 2 . The value of this portfolio is
V = C 1 + γC 2 − αS − βP , and satisfies,
where the second equality follows by the definition of α, and by rearranging terms. Finally, by
using the definition of γ, and by rearranging terms,
These ratios agree. So they must be equal to some process a · Λσ (say) independent of both the
strike prices and the maturity of the options. Therefore, we obtain that,
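\[ \frac{\partial C}{\partial t} + rSC_S + \left[ b - a\Lambda^{\sigma} \right] C_{\sigma^2} + \frac{1}{2} \sigma^2 S^2 C_{SS} + \frac{1}{2} a^2 C_{\sigma^2 \sigma^2} = rC. \qquad (10.16) \]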
The economic interpretation of Λσ is that of the unit risk-premium required to face the risk
of stochastic fluctuations in the return volatility. The problem is that the requirement of absence of
arbitrage opportunities does not suffice to recover a unique Λσ . In other words, by the Feynman-Kac
stochastic representation of a solution to a PDE, we have that the solution to Eq. (10.16)
is,
\[ C(S(t), \sigma(t)^2, t, T) = e^{-r(T-t)}\, E_{Q^{\Lambda}}\left[ (S(T) - K)^+ \mid S(t), \sigma(t)^2 \right], \qquad (10.17) \]
where β S is the beta related to the volatility of the option price induced by fluctuations in
the stock price, S, and β σ2 is the beta related to the volatility of the option price induced by
fluctuations in the return volatility.
Therefore, the tracking error, defined as the difference between the Black-Scholes price and the
portfolio value,
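$$e_t \equiv C_{BS}(S_t, t; K, T, IV_0) - V_t,$$
satisfies,
$$de_t = \left[ r e_t + \frac{1}{2}\left(\sigma_t^2 - IV_0^2\right) S_t^2 \frac{\partial^2 C_{BS}}{\partial S^2} \right] dt.$$
At maturity T:
$$\begin{aligned}
e_T &\equiv C_{BS}(S_T, T; K, T, IV_0) - V_T \\
&= \max\{S_T - K, 0\} - V_T \\
&= \frac{1}{2} e^{rT} \int_0^T e^{-rt} \left(\sigma_t^2 - IV_0^2\right) S_t^2 \frac{\partial^2 C_{BS}}{\partial S^2}\, dt.
\end{aligned} \tag{10.18}$$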
We know the Black-Scholes price is convex in the stock price, so that $\partial^2 C_{BS}/\partial S^2 > 0$. Hence,
Eq. (10.18) says that even if we do not exactly know the law of movement for volatility, but still hold
the view that it will stay above the initial implied volatility $IV_0$, we could do the following: (i) buy
a call option; (ii) Black-Scholes hedge it. Eq. (10.18) shows that, if this view is correct, the strategy
leads to a positive profit with probability one. Naturally, this is not an arbitrage opportunity. The
critical assumption we are making is that future volatility will exceed $IV_0$.
Although we can always implement a strategy like this, the relevant profits (if any) are “price-
dependent.” Moreover, the strategy is costly, as it relies on expensive ∆-hedging. Volatility
contracts overcome this difficulty and are described in Section 10.6 below.
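The tracking error in Eq. (10.18) can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it assumes a constant true volatility above the implied volatility used for hedging, a risk-neutral drift for the stock, and rebalancing on a discrete grid. All parameter values and function names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, vol, tau):
    """Black-Scholes call price and delta for time-to-maturity tau > 0."""
    d1 = (math.log(S / K) + (r + 0.5 * vol ** 2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    price = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)

def tracking_error(S0, K, r, T, iv0, true_vol, n_steps, rng):
    """e_T = max(S_T - K, 0) - V_T for a hedge run at the implied vol iv0."""
    dt = T / n_steps
    S = S0
    V, delta = bs_call(S0, K, r, iv0, T)      # V_0 equals the Black-Scholes price
    for i in range(n_steps):
        cash = V - delta * S                  # money market position
        z = rng.gauss(0.0, 1.0)
        S *= math.exp((r - 0.5 * true_vol ** 2) * dt + true_vol * math.sqrt(dt) * z)
        V = delta * S + cash * math.exp(r * dt)   # self-financing update
        if i + 1 < n_steps:                   # re-hedge except at maturity
            _, delta = bs_call(S, K, r, iv0, T - (i + 1) * dt)
    return max(S - K, 0.0) - V

rng = random.Random(0)
params = dict(S0=100.0, K=100.0, r=0.02, T=0.5, iv0=0.20, true_vol=0.30)
errors = [tracking_error(n_steps=200, rng=rng, **params) for _ in range(1000)]
mean_error = sum(errors) / len(errors)

# Benchmark: e^{rT} [C_BS(true_vol) - C_BS(iv0)], the risk-neutral mean of e_T.
benchmark = math.exp(0.02 * 0.5) * (
    bs_call(100.0, 100.0, 0.02, 0.30, 0.5)[0] - bs_call(100.0, 100.0, 0.02, 0.20, 0.5)[0]
)
print(f"mean tracking error: {mean_error:.3f}, benchmark: {benchmark:.3f}")
```

With discrete rebalancing, individual paths carry some hedging noise, so the sign result of Eq. (10.18) should be read as holding in the continuous-time limit; the average across paths is nonetheless close to the benchmark.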
where $BS(S(t), t, T; \tilde{V})$ is the Black-Scholes formula obtained by replacing the constant $\sigma^2$ with
$\tilde{V}$, and
$$\tilde{V} = \frac{1}{T-t} \int_t^T \sigma(\tau)^2\, d\tau.$$
This formula tells us that the option price is simply the Black-Scholes formula averaged over
all the possible “values” taken by the future average volatility Ṽ . A proof of this equation is
given in the appendix.8
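As a rough illustration of this averaging, the sketch below assumes a hypothetical three-point distribution for the average variance Ṽ (the values and probabilities are made up) and that volatility is independent of the shocks to the asset price, as required by the result. The option price is then the probability-weighted average of Black-Scholes prices.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, avg_var, tau):
    """Black-Scholes call with the constant variance replaced by avg_var."""
    vol = math.sqrt(avg_var)
    d1 = (math.log(S / K) + (r + 0.5 * avg_var) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Hypothetical distribution of the average variance: (value, probability).
scenarios = [(0.02, 0.25), (0.04, 0.50), (0.09, 0.25)]
S, K, r, tau = 100.0, 100.0, 0.02, 1.0

# Option price = Black-Scholes formula averaged over the average variance.
mixing_price = sum(p * bs_call(S, K, r, v, tau) for v, p in scenarios)

# For comparison: Black-Scholes evaluated at the mean of the average variance.
mean_var = sum(p * v for v, p in scenarios)
bs_at_mean_var = bs_call(S, K, r, mean_var, tau)
print(f"mixing price: {mixing_price:.4f}, BS at mean variance: {bs_at_mean_var:.4f}")
```

In this near-the-money example the averaged price lies below the Black-Scholes price computed at the mean variance, because the Black-Scholes formula is concave in the variance for strikes close to the forward price.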
The most widely used formula is Heston’s (1993b) formula, which holds when the return
volatility is a square-root process.
8 The result does not hold in the general case in which the asset price and volatility are correlated. However, Romano and Touzi
(1997) prove that a similar result holds in such a more general case.
10.6 Local volatility
• Obviously, stochastic volatility models do not allow for a perfect hedge. Their main drawback
is that they cannot perfectly fit the smile.
• Towards the end of the 1980s and the beginning of the 1990s, interest rate modelers invented
models that allow a perfect fit of the initial yield curve.
• Important for interest rate derivatives.
• In 1993 and 1994, Derman & Kani, Dupire and Rubinstein came up with a technology
that could be applied to options on tradables.
• Why is it important to exactly fit the structure of already existing plain vanilla options?
• Plain vanilla versus exotics. Suppose you wish to price exotic, or illiquid, options.
• The model you use to price the illiquid option must predict that the plain vanilla option
prices are identical to those your company is selling! How can we trust a model that is
not even able to pin down all outstanding contracts? - Arbitrage opportunities for quants
and traders?
1. Start with a set of actively traded (i.e. liquid) European options. Let K and T be strikes
and time-to-maturity. Let us be given a collection of prices:
$$C_{\$}(K, T) \equiv C(K, T), \quad K, T \text{ varying}.$$
• It turns out, empirically, that $\sigma_{loc}(x, t)$ is typically decreasing in x for fixed t, a
phenomenon known as the Black-Christie-Nelson leverage effect. This fact leads some
practitioners to assume from the outset that $\sigma(x, t) = x^{\alpha} f(t)$, for some function f and some
constant α < 0. This gives rise to the so-called CEV (Constant Elasticity of Variance)
model.
• More recently, practitioners use models that combine “local vols” with “stoch vol”, such
as
$$\begin{cases}
\dfrac{dS_t}{S_t} = r\,dt + \sigma(S_t, t) \cdot v_t \cdot d\hat{W}_t \\[1ex]
dv_t = \phi(v_t)\,dt + \psi(v_t)\,d\hat{W}_t^v
\end{cases} \tag{10.20}$$
where $\hat{W}^v$ is another Brownian motion, and φ, ψ are some functions (φ includes a
risk-premium). It is possible to show that in this specific case,
$$\tilde{\sigma}_{loc}(K, T) = \frac{\sigma_{loc}(K, T)}{\sqrt{E\left(v_T^2 \mid S_T = K\right)}} \tag{10.21}$$
would be able to pin down the initial structure of European options prices. (Here σ loc (K, T )
is as in Eq. (10.19).)
• In this case, we simulate
$$\begin{cases}
\dfrac{dS_t}{S_t} = r\,dt + \tilde{\sigma}_{loc}(S_t, t) \cdot v_t \cdot d\hat{W}_t \\[1ex]
dv_t = \phi(v_t)\,dt + \psi(v_t)\,d\hat{W}_t^v
\end{cases}$$
• The previous developments can be used to address very important issues. Define the total
“integrated” variance within the time interval $[T_1, T_2]$ ($T_1 > t$) to be
$$IV_{T_1,T_2} \equiv \int_{T_1}^{T_2} \sigma_u^2\, du.$$
For reasons developed below, let us compute the risk-neutral expectation of such a
“realized” variance. This can easily be done. If r = 0, then by Eq. (10.22),
$$E\left(IV_{T_1,T_2}\right) = 2 \int \frac{C_t(K, T_2) - C_t(K, T_1)}{K^2}\, dK, \tag{10.23}$$
where Ct (K, T ) is the price as of time t of a call option expiring at T and struck at K.
A proof of Eq. (10.23) is in the appendix.
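Eq. (10.23) can be checked numerically in a case where the answer is known. The sketch below assumes, purely for illustration, that call prices come from the Black-Scholes formula with r = 0 and a constant volatility σ, so that the expected integrated variance must equal σ²(T₂ − T₁). The strike integral is computed with a trapezoid rule on a log-strike grid; the grid bounds and function names are hypothetical choices.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_zero_rate(S, K, vol, T):
    """Black-Scholes call price with r = 0."""
    d1 = (math.log(S / K) + 0.5 * vol ** 2 * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    return S * norm_cdf(d1) - K * norm_cdf(d2)

def expected_integrated_variance(call, T1, T2, k_min, k_max, n=4000):
    """2 * int [C(K,T2) - C(K,T1)] / K^2 dK, integrated in u = log K."""
    a, b = math.log(k_min), math.log(k_max)
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        K = math.exp(a + i * h)
        integrand = (call(K, T2) - call(K, T1)) / K   # dK = K du
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * integrand
    return 2.0 * total * h

S0, vol, T1, T2 = 100.0, 0.20, 0.25, 1.0

def call(K, T):
    return bs_call_zero_rate(S0, K, vol, T)

ivar = expected_integrated_variance(call, T1, T2, S0 * math.exp(-3.0), S0 * math.exp(3.0))
print(f"from option prices: {ivar:.6f}, sigma^2 (T2 - T1): {vol ** 2 * (T2 - T1):.6f}")
```

In practice only a finite set of strikes is traded, so the integral has to be approximated from the available quotes; the dense grid here is a convenience of the example.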
where F(t) is the forward price: $F(t) = e^{r(T-t)} S(t)$, and $P_t(K, T)$ is the price as of time
t of a put option expiring at T and struck at K. A proof of Eq. (10.24) is in the appendix.
• In September 2003, the Chicago Board Options Exchange (CBOE) changed its
volatility index, the VIX, to approximate the variance swap rate of the S&P 500 index return
(for 30 days). In March 2004, the CBOE launched the CBOE Futures Exchange for trading
futures on the new VIX. Options on VIX are also forthcoming.
A variance swap is a contract that has zero value at entry (at t). At maturity T , the
buyer of the swap receives,
where $SW_{t,T}$ is the swap rate established at t and paid off at time T. Therefore, if r is
deterministic,
$$SW_{t,T} = E\left(IV_{t,T}\right),$$
where $E(IV_{t,T})$ is given by Eq. (10.24). Therefore, (10.24) is used to evaluate these variance
swaps.
• Finally, it is worth mentioning that the previous contracts rely on some notion of
realized volatility, since a continuous record of returns is obviously unavailable.
10.7 American options
10.10 Appendix 1: Additional details on the Black & Scholes formula
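$$\begin{aligned}
dV &= \bar{n}_S\, dS + n_C\, dC \\
&= \bar{n}_S\, dS + n_C \left[ C_S\, dS + \left( C_\tau + \frac{1}{2}\sigma^2 S^2 C_{SS} \right) d\tau \right] \\
&= \left( \bar{n}_S + n_C C_S \right) dS + n_C \left( C_\tau + \frac{1}{2}\sigma^2 S^2 C_{SS} \right) d\tau
\end{aligned}$$
where the second line follows from Itô’s lemma. Therefore, the portfolio is locally riskless whenever
$$n_C = -\bar{n}_S \frac{1}{C_S},$$
in which case V must appreciate at the r-rate
$$\frac{dV}{V} = \frac{n_C \left( C_\tau + \frac{1}{2}\sigma^2 S^2 C_{SS} \right) d\tau}{\bar{n}_S S + n_C C} = \frac{-\frac{1}{C_S} \left( C_\tau + \frac{1}{2}\sigma^2 S^2 C_{SS} \right)}{S - \frac{1}{C_S} C}\, d\tau = r\, d\tau.$$
That is,
$$\begin{cases}
0 = C_\tau + \dfrac{1}{2}\sigma^2 S^2 C_{SS} + rSC_S - rC, & \text{for all } (\tau, S) \in [t, T) \times \mathbb{R}_{++} \\[1ex]
C(x, T, T) = (x - K)^+, & \text{for all } x \in \mathbb{R}_{++}
\end{cases}$$
which is the Black-Scholes partial differential equation.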
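As a quick sanity check of the equation above, the following sketch evaluates the left-hand side of the partial differential equation on the closed-form Black-Scholes call price, using central finite differences. The step sizes, parameter values, and function names are illustrative assumptions; the residual should be zero up to discretization error.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, t, K, r, vol, T):
    """Black-Scholes call price at calendar time t < T."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * vol ** 2) * tau) / (vol * math.sqrt(tau))
    d2 = d1 - vol * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(S, t, K, r, vol, T, hS=0.05, ht=1e-5):
    """C_t + 0.5 vol^2 S^2 C_SS + r S C_S - r C, by central differences."""
    C = bs_call(S, t, K, r, vol, T)
    C_t = (bs_call(S, t + ht, K, r, vol, T) - bs_call(S, t - ht, K, r, vol, T)) / (2 * ht)
    C_S = (bs_call(S + hS, t, K, r, vol, T) - bs_call(S - hS, t, K, r, vol, T)) / (2 * hS)
    C_SS = (bs_call(S + hS, t, K, r, vol, T) - 2 * C
            + bs_call(S - hS, t, K, r, vol, T)) / hS ** 2
    return C_t + 0.5 * vol ** 2 * S ** 2 * C_SS + r * S * C_S - r * C

residuals = [pde_residual(S, 0.3, 100.0, 0.03, 0.25, 1.0) for S in (80.0, 100.0, 125.0)]
print(residuals)
```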
10.11 Appendix 2: Stochastic volatility
where $\Pr(\tilde{V} \mid \sigma(t)^2)$ is the density of $\tilde{V}$ conditional on the current volatility value $\sigma(t)^2$.
In other terms, the price of an option on an asset with stochastic volatility is the expectation of
the Black-Scholes formula over the distribution of the average (random) volatility $\tilde{V}$. To better
understand this result, all we have to understand is that conditionally on the volatility path
$\{\sigma(\tau)^2\}_{\tau \in [t,T]}$, $\log \frac{S(T)}{S(t)}$ is normally distributed under the risk-neutral probability measure.
To see this, note that under the risk-neutral probability measure,
$$\log \frac{S(T)}{S(t)} = r(T - t) - \frac{1}{2} \int_t^T \sigma(\tau)^2\, d\tau + \int_t^T \sigma(\tau)\, dW(\tau).$$
This shows the claim. It also shows that the Black-Scholes formula can be applied to compute the
inner expectation of the second line of Eq. (10A.1). And this produces the third line of Eq. (10A.1).
The fourth line is trivial to obtain. Given the result of the third line, the only thing that matters in
the remaining conditional distribution is the conditional probability Pr(Ṽ | σ(t)2 ), and we are done.
10.12 Appendix 3: Technical details for local volatility models
Proof of Eqs. (10.21) and (10.22). We first derive Eq. (10.21), a result encompassing Eq.
(10.19). By assumption,
$$\frac{dS_t}{S_t} = r\,dt + \sigma_t\, d\hat{W}_t,$$
where $\sigma_t$ is some $\mathcal{F}_t$-adapted process. For example, $\sigma_t \equiv \sigma(S_t, t) \cdot v_t$, all t, where $v_t$ is solution to the
2nd equation in (10.20). Next, by assumption we are observing a set of option prices C(K, T) with a
continuum of strikes K and maturities T. We have,
$$C(K, T) = e^{-r(T-t)} E\left(S_T - K\right)^+, \tag{10A.2}$$
and
$$\frac{\partial}{\partial K} C(K, T) = -e^{-r(T-t)} E\left(I_{S_T \geq K}\right). \tag{10A.3}$$
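For fixed K,
$$d_T (S_T - K)^+ = \left[ I_{S_T \geq K}\, r S_T + \frac{1}{2} \delta(S_T - K)\, \sigma_T^2 S_T^2 \right] dT + I_{S_T \geq K}\, \sigma_T S_T\, d\hat{W}_T,$$
where δ is Dirac’s delta. Hence, by the decomposition $(S_T - K)^+ + K I_{S_T \geq K} = S_T I_{S_T \geq K}$,
$$\frac{dE(S_T - K)^+}{dT} = r \left[ E(S_T - K)^+ + K E\left(I_{S_T \geq K}\right) \right] + \frac{1}{2} E\left[ \delta(S_T - K)\, \sigma_T^2 S_T^2 \right].$$
By multiplying throughout by $e^{-r(T-t)}$, and using (10A.2)-(10A.3),
$$e^{-r(T-t)} \frac{dE(S_T - K)^+}{dT} = r \left[ C(K, T) - K \frac{\partial C(K, T)}{\partial K} \right] + \frac{1}{2} e^{-r(T-t)} E\left[ \delta(S_T - K)\, \sigma_T^2 S_T^2 \right]. \tag{10A.4}$$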
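We have,
$$\begin{aligned}
E\left[ \delta(S_T - K)\, \sigma_T^2 S_T^2 \right] &= \iint \delta(S_T - K)\, \sigma_T^2 S_T^2\, \underbrace{\phi_T(\sigma_T \mid S_T)\, \phi_T(S_T)}_{\equiv\ \text{joint density of } (\sigma_T, S_T)}\, dS_T\, d\sigma_T \\
&= \int \sigma_T^2 \left[ \int \delta(S_T - K)\, S_T^2\, \phi_T(S_T)\, \phi_T(\sigma_T \mid S_T)\, dS_T \right] d\sigma_T \\
&= K^2 \phi_T(K) \int \sigma_T^2\, \phi_T(\sigma_T \mid S_T = K)\, d\sigma_T \\
&\equiv K^2 \phi_T(K)\, E\left[ \sigma_T^2 \mid S_T = K \right].
\end{aligned}$$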
By replacing this result into Eq. (10A.4), and using the famous relation
$$\frac{\partial^2 C(K, T)}{\partial K^2} = e^{-r(T-t)} \phi_T(K) \tag{10A.5}$$
(which easily follows by differentiating once again Eq. (10A.3)), we obtain
$$e^{-r(T-t)} \frac{dE(S_T - K)^+}{dT} = r \left[ C(K, T) - K \frac{\partial C(K, T)}{\partial K} \right] + \frac{1}{2} K^2 \frac{\partial^2 C(K, T)}{\partial K^2} E\left[ \sigma_T^2 \mid S_T = K \right]. \tag{10A.6}$$
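We also have,
$$\frac{\partial}{\partial T} C(K, T) = -rC(K, T) + e^{-r(T-t)} \frac{\partial E(S_T - K)^+}{\partial T}.$$
Therefore, by replacing the previous equality into Eq. (10A.6), and by rearranging terms,
$$\frac{\partial}{\partial T} C(K, T) = -rK \frac{\partial C(K, T)}{\partial K} + \frac{1}{2} K^2 \frac{\partial^2 C(K, T)}{\partial K^2} E\left[ \sigma_T^2 \mid S_T = K \right].$$
This is,
$$E\left[ \sigma_T^2 \mid S_T = K \right] = 2\, \frac{\dfrac{\partial C(K, T)}{\partial T} + rK \dfrac{\partial C(K, T)}{\partial K}}{K^2 \dfrac{\partial^2 C(K, T)}{\partial K^2}} \equiv \sigma_{loc}(K, T)^2. \tag{10A.7}$$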
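Eq. (10A.7) can be tested on a price surface whose local volatility is known. The sketch below assumes, for illustration only, that the observed call prices are generated by the Black-Scholes formula with a constant volatility, and recovers that volatility from finite-difference approximations to the partial derivatives of C(K, T). The step sizes and names are hypothetical; with market data, the derivatives would have to be estimated from a smoothed, arbitrage-free interpolation of quotes.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, r, vol, T):
    """Black-Scholes call price as of time t = 0 for maturity T."""
    d1 = (math.log(S0 / K) + (r + 0.5 * vol ** 2) * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def dupire_local_vol(call, K, T, r, hK=0.05, hT=1e-4):
    """sigma_loc(K, T) from Eq. (10A.7), with central finite differences."""
    C_T = (call(K, T + hT) - call(K, T - hT)) / (2 * hT)
    C_K = (call(K + hK, T) - call(K - hK, T)) / (2 * hK)
    C_KK = (call(K + hK, T) - 2 * call(K, T) + call(K - hK, T)) / hK ** 2
    return math.sqrt(2.0 * (C_T + r * K * C_K) / (K ** 2 * C_KK))

S0, r, true_vol = 100.0, 0.03, 0.25

def surface(K, T):
    """Hypothetical observed call price surface C(K, T)."""
    return bs_call(S0, K, r, true_vol, T)

local_vols = [dupire_local_vol(surface, K, 1.0, r) for K in (90.0, 100.0, 115.0)]
print(local_vols)   # each entry should be close to true_vol
```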
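As an example, let $\sigma_t \equiv \sigma(S_t, t) \cdot v_t$, where $v_t$ is solution to the 2nd equation in (10.20). Then,
$$\begin{aligned}
\sigma_{loc}(K, T)^2 &= E\left[ \sigma_T^2 \mid S_T = K \right] \\
&= E\left[ \sigma(S_T, T)^2 \cdot v_T^2 \mid S_T = K \right] = \sigma(K, T)^2\, E\left[ v_T^2 \mid S_T = K \right] \\
&\equiv \tilde{\sigma}_{loc}(K, T)^2\, E\left[ v_T^2 \mid S_T = K \right].
\end{aligned}$$
Next, we have,
$$\begin{aligned}
E\left(\sigma_T^2\right) &= \int E\left[ \sigma_T^2 \mid S_T = K \right] \phi_T(K)\, dK \\
&= 2 \int \frac{\dfrac{\partial C(K,T)}{\partial T} + rK \dfrac{\partial C(K,T)}{\partial K}}{K^2 \dfrac{\partial^2 C(K,T)}{\partial K^2}}\, \phi_T(K)\, dK \\
&= 2 e^{r(T-t)} \int \frac{\dfrac{\partial C(K,T)}{\partial T} + rK \dfrac{\partial C(K,T)}{\partial K}}{K^2}\, dK,
\end{aligned}$$
where the 2nd line follows by Eq. (10A.7), and the third line follows by Eq. (10A.5). This proves Eq.
(10.22). ∎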
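$$E\left(\sigma_T^2\right) = 2 \int \frac{\partial C(K,T) / \partial T}{K^2}\, dK.$$
Then, we have,
$$E\left(IV_{T_1,T_2}\right) = \int_{T_1}^{T_2} E\left(\sigma_u^2\right) du = 2 \int \frac{1}{K^2} \left[ \int_{T_1}^{T_2} \frac{\partial C(K, u)}{\partial T}\, du \right] dK = 2 \int \frac{C(K, T_2) - C(K, T_1)}{K^2}\, dK.$$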
Proof of Eq. (10.24). By the standard Taylor expansion with remainder, we have that for any
function f smooth enough,
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \int_{x_0}^{x} (x - t)\, f''(t)\, dt.$$
Remark A1. The previous proof holds in the case of a constant instantaneous interest rate, r. If
the instantaneous interest rate is stochastic, the formula would be different, for
$$C_t(K, T) = P(t, T)\, E\left[ P(t, T)^{-1} \exp\left( -\int_t^T r_s\, ds \right) (S_T - K)^+ \right] = P(t, T)\, E_{Q_F^T}\left(S_T - K\right)^+,$$
Remark A2. Eqs. (10A.8) and (10A.9) reveal that variance swaps can be hedged!
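Remark A3. Set for simplicity r = 0. In the previous proofs, it was argued that if
$\frac{dC(K,T)}{dT} = \frac{dE(S_T - K)^+}{dT}$, then volatility must be restricted in a way to make
$\sigma^2 = 2\, \frac{\partial C(K,T) / \partial T}{K^2\, \partial^2 C(K,T) / \partial K^2}$. The converse is also true. By Fokker-Planck,
$$\frac{1}{2} \frac{\partial^2}{\partial x^2} \left( x^2 \sigma^2 \phi \right) = \frac{\partial}{\partial t} \phi, \quad t, x \text{ forward}.$$
If we ignore ill-posedness issues related to Eq. (10A.5), such as those dealt with in Tikhonov and
Arsenin (1977), then, we have that $\phi = \frac{\partial^2 C}{\partial x^2}$. Replacing
$\sigma^2 = 2\, \frac{\partial C(x,T) / \partial T}{x^2\, \partial^2 C(x,T) / \partial x^2}$ into the Fokker-Planck equation,
$$\frac{\partial^2}{\partial x^2} \left( \frac{\partial C(x,T) / \partial T}{\partial^2 C(x,T) / \partial x^2}\, \phi \right) = \frac{\partial}{\partial t} \phi,$$
which works for $\phi = \frac{\partial^2 C}{\partial x^2}$.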
References
Ball, C.A. and A. Roma (1994): “Stochastic Volatility Option Pricing.” Journal of Financial
and Quantitative Analysis 29, 589-607.
Bergman, Y. Z., B. D. Grundy, and Z. Wiener (1996): “General Properties of Option Prices.”
Journal of Finance 51, 1573-1610.
Black, F. and M. Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal
of Political Economy 81, 637-654.
Bollerslev, T. (1986): “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of
Econometrics 31, 307-327.
Bollerslev, T., Engle, R. and D. Nelson (1994): “ARCH Models.” In: McFadden, D. and R.
Engle (Editors): Handbook of Econometrics (Volume 4), 2959-3038. Amsterdam:
North-Holland.
Clark, P. K. (1973): “A Subordinated Stochastic Process Model with Fixed Variance for Spec-
ulative Prices.” Econometrica 41, 135-156.
Corradi, V. (2000): “Reconsidering the Continuous Time Limit of the GARCH(1,1) Process.”
Journal of Econometrics 96, 145-153.
El Karoui, N., M. Jeanblanc-Picqué and S. Shreve (1998): “Robustness of the Black and
Scholes Formula.” Mathematical Finance 8, 93-126.
Engle, R.F. (1982): “Autoregressive Conditional Heteroskedasticity with Estimates of the Vari-
ance of United Kingdom Inflation.” Econometrica 50, 987-1008.
Fama, E. (1965): “The Behaviour of Stock Market Prices.” Journal of Business 38, 34-105.
Heston, S.L. (1993a): “Invisible Parameters in Option Prices.” Journal of Finance 48, 933-947.
Heston, S.L. (1993b): “A Closed Form Solution for Options with Stochastic Volatility with
Application to Bond and Currency Options.” Review of Financial Studies 6, 327-344.
Hull, J. and A. White (1987): “The Pricing of Options with Stochastic Volatilities.” Journal
of Finance 42, 281-300.
Mandelbrot, B. (1963): “The Variation of Certain Speculative Prices.” Journal of Business
36, 394-419.
Mele, A. (1998): Dynamiques non linéaires, volatilité et équilibre. Paris: Editions Economica.
Mele, A. and F. Fornari (2000): Stochastic Volatility in Financial Markets. Crossing the Bridge
to Continuous Time. Boston: Kluwer Academic Publishers.
Merton, R. (1973): “Theory of Rational Option Pricing.” Bell Journal of Economics and
Management Science 4, 141-183.
Nelson, D.B. (1990): “ARCH Models as Diffusion Approximations.” Journal of Econometrics
45, 7-38.
Renault, E. (1997): “Econometric Models of Option Pricing Errors.” In: Kreps, D., Wallis, K.
(Editors): Advances in Economics and Econometrics (Volume 3), 223-278. Cambridge:
Cambridge University Press.
Romano, M. and N. Touzi (1997): “Contingent Claims and Market Completeness in a Stochas-
tic Volatility Model.” Mathematical Finance 7, 399-412.
Scott, L. (1987): “Option Pricing when the Variance Changes Randomly: Theory, Estimation,
and an Application.” Journal of Financial and Quantitative Analysis 22, 419-438.
Tikhonov, A. N. and V. Y. Arsenin (1977): Solutions to Ill-Posed Problems. Wiley, New York.
Wiggins, J. (1987): “Option Values and Stochastic Volatility: Theory and Empirical Esti-
mates.” Journal of Financial Economics 19, 351-372.
11
Interest rates
rates and markets. Section 11.2.3 develops the two basic representations of bond prices: one in
terms of the short-term rate; and the other in terms of forward rates. Section 11.2.4 develops
the foundations of the so-called forward martingale probability, which is a probability measure
under which forward interest rates are martingales. The forward martingale probability is an
important tool of analysis in the interest rate derivatives literature.
There are three main types of markets for interest rates: (i) LIBOR; (ii) Treasury rate; (iii)
Repo rate (or repurchase agreement rate).
LIBOR (London Interbank Offered Rate)
Many large financial institutions trade deposits with each other, for maturities ranging from
just overnight to one year, in a given currency. The LIBOR is an average indicative quote of
the interbank lending market, and it is calculated by Thomson Reuters for ten currencies,
and published daily by the British Bankers’ Association. It is the rate at which these financial
institutions are willing to lend money, on average. Instead, the LIBID (London Interbank Bid
Rate) is the rate that these financial institutions are prepared to pay to borrow money, on
average. Normally, LIBID < LIBOR. The LIBOR is a fundamental point of reference. Typically,
financial institutions look at the LIBOR as an opportunity cost of capital. Many fixed income
instruments are indexed on the LIBOR: forward rate agreements, interest rate swaps, variable
mortgage rates, etc. We can safely say that the LIBOR corresponds to the US Federal Funds
rate, which is the overnight rate at which banks lend reserves to each other. (Banks have to
maintain reserves at the Federal Reserve to back deposits and to clear financial transactions.
Transactions involve loans from banks with excess reserves at the Fed, which earn no interest,
to banks with reserve deficiencies.)
Treasury rate
This is the rate at which a given Government can borrow in a given currency.
Repo rate (or repurchase agreement rate)
A Repo agreement is a contract by which one counterparty sells some assets to the other one,
with the obligation to buy these assets back at some future date. The assets act as collateral.
The rate at which such a transaction is made is the repo rate. One day repo agreements give
rise to overnight repos. Longer-term agreements give rise to term repos.
We now introduce an important piece of notation that will be used in Section 11.7. Given a
non decreasing sequence of dates {Ti }i=0,1,··· , we define:
Let Q be a risk-neutral probability. Let E[·] denote the expectation operator taken
under Q. By the FTAP, there are no arbitrage opportunities if and only if P(τ, T) satisfies:
$$P(\tau, T) = E\left[ e^{-\int_\tau^T r(\ell)\, d\ell} \right], \quad \text{all } \tau \in [t, T]. \tag{11.3}$$
A sketch of the if-part (there is no arbitrage if bond prices are as in Eq. (11.3)) is provided in
Appendix 1. The proof is standard and, in fact, similar to that in Chapter 4, but it is offered
again here, as it allows us to highlight a few key issues arising in the term-structure domain.
A widely used concept is the yield-to-maturity R(t, T ), defined by,
It’s a sort of “average rate” for investing from time t to time T > t. The function,
$$T \mapsto R(t, T)$$
In a forward rate agreement (FRA, henceforth), two counterparties agree that the interest
rate on a given principal in a future time-interval [T, S] will be fixed at some level K. Let
the principal be normalized to one. The FRA works as follows: at time T , the first counter-
party receives $1 from the second counterparty; at time S > T , the first counterparty pays
back $ [1 + 1 · (S − T ) K] to the second counterparty. The amount K is agreed upon at time
t. Therefore, the FRA makes it possible to lock-in future interest rates. We consider simply
compounded interest rates because this is the standard market practice.
The amount K for which the current value of the FRA is zero is called the simply-compounded
forward rate as of time t for the time-interval [T, S], and is usually denoted as F (t, T, S). A
simple argument can be used to express F (t, T, S) in terms of bond prices. Consider the following
portfolio implemented at time t. Long one bond maturing at T and short P (t, T )/ P (t, S) bonds
maturing at S, for the time period [t, S]. The initial cost of this portfolio is zero because,
$$-P(t, T) + \frac{P(t, T)}{P(t, S)}\, P(t, S) = 0.$$
At time T , the portfolio yields $1 (originated from the bond purchased at time t). At time S,
P (t, T )/ P (t, S) bonds maturing at S (that were shorted at t) must be purchased. But at time
S, the cost of purchasing P (t, T )/ P (t, S) bonds maturing at S is obviously $ P (t, T )/ P (t, S).
The portfolio, therefore, is acting as a FRA: it pays $1 at time T , and −$ P (t, T )/ P (t, S) at
time S. In addition, the portfolio costs nothing at time t. Therefore, the interest rate implicitly
paid in the time-interval [T, S] must be equal to the forward rate F (t, T, S), and we have:
$$\frac{P(t, T)}{P(t, S)} = 1 + (S - T)\, F(t, T, S). \tag{11.5}$$
Clearly,
L(T, S) = F (T, T, S).
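The replication argument behind Eq. (11.5) can be traced with a few lines of arithmetic. In the sketch below, the two bond prices and the dates are hypothetical inputs; the code builds the zero-cost portfolio described above, lists its cash flows at T and S, and checks that the interest rate implicit in those cash flows coincides with the forward rate given by Eq. (11.5).

```python
# Hypothetical inputs: P(t, T), P(t, S), and the two dates (in years from t).
P_T, P_S, T, S = 0.97, 0.95, 1.0, 1.5

# Portfolio at t: long one T-bond, short P(t,T)/P(t,S) units of the S-bond.
units_short = P_T / P_S
cost_at_t = P_T - units_short * P_S     # outlay for the long leg minus short proceeds

# Cash flows of the portfolio: +$1 at T, -P(t,T)/P(t,S) at S.
cash_at_T = 1.0
cash_at_S = -units_short

# Forward rate from Eq. (11.5), and the simple rate implicit in the cash flows.
forward_rate = (P_T / P_S - 1.0) / (S - T)
implied_rate = (-cash_at_S / cash_at_T - 1.0) / (S - T)

print(f"initial cost: {cost_at_t:.2e}")
print(f"F(t,T,S) = {forward_rate:.6f}, implied by cash flows = {implied_rate:.6f}")
```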
Next, we derive the value of the FRA in the general case in which K ≠ F(t, T, S). Consider
the following strategy. At time t, enter a FRA for the time-interval [T, S] as a future lender.
Come time T , honour the FRA by borrowing $1 for the time-interval [T, S] at the random
interest rate L(T, S). The time S payoff deriving from this strategy is:
(S − T ) [K − L(T, S)] .
The value of the FRA, which we denote as IRS(t, T, S; K), is the current market value of this
future, random payoff. By the FTAP,
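$$\begin{aligned}
IRS(t, T, S; K) &= E\left[ e^{-\int_t^S r(\tau)\, d\tau}\, (S - T)\left[ K - L(T, S) \right] \right] \\
&= (S - T)\, P(t, S)\, K - (S - T)\, E\left[ e^{-\int_t^S r(\tau)\, d\tau}\, L(T, S) \right] \\
&= \left[ 1 + (S - T) K \right] P(t, S) - E\left[ \frac{e^{-\int_t^S r(\tau)\, d\tau}}{P(T, S)} \right] \\
&= \left[ 1 + (S - T) K \right] P(t, S) - P(t, T),
\end{aligned}$$
where the third line holds by the definition of L, $(S - T) L(T, S) = 1/P(T, S) - 1$, and the last line
follows by the following relation:²
$$P(t, T) = E\left[ \frac{e^{-\int_t^S r(\tau)\, d\tau}}{P(T, S)} \right]. \tag{11.7}$$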
2 To show that Eq. (11.7) holds, suppose that at time t, $P (t, T ) are invested in a bond maturing at time T . At time T , this
investment will obviously pay off $1. And at time T , $1 can be further rolled over another bond maturing at time S, thus yielding
$ 1/ P (T, S) at time S. Therefore, it is always possible to invest $P (t, T ) at time t and obtain a “payoff” of $ 1/ P (T, S) at time
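S. By the FTAP, there are no arbitrage opportunities if and only if Eq. (11.7) holds true. Alternatively, use the law of iterated
expectations to obtain
$$E\left[ \frac{e^{-\int_t^S r(\tau)\, d\tau}}{P(T, S)} \right] = E\left[ \frac{e^{-\int_t^T r(\tau)\, d\tau}}{P(T, S)}\, E\left( e^{-\int_T^S r(\tau)\, d\tau} \,\middle|\, \mathcal{F}(T) \right) \right] = E\left[ e^{-\int_t^T r(\tau)\, d\tau} \right] = P(t, T).$$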
As is clear, IRS can take on any sign, and is exactly zero when K = F (t, T, S), where
F (t, T, S) solves Eq. (11.5).
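A small numerical illustration of this sign pattern follows. It is a sketch under the assumption that the value of the FRA for the future lender reduces to [1 + (S − T)K] P(t, S) − P(t, T), as implied by Eq. (11.7); the bond prices, dates, and function names are hypothetical.

```python
def simple_forward_rate(P_T, P_S, T, S):
    """F(t, T, S) from Eq. (11.5)."""
    return (P_T / P_S - 1.0) / (S - T)

def fra_value(P_T, P_S, T, S, K):
    """Value at t of the FRA for the future lender: [1 + (S-T)K] P(t,S) - P(t,T)."""
    return (1.0 + (S - T) * K) * P_S - P_T

# Hypothetical bond prices P(t, T), P(t, S) and dates (in years from t).
P_T, P_S, T, S = 0.97, 0.95, 1.0, 1.5
F = simple_forward_rate(P_T, P_S, T, S)

# The value is negative for K < F, zero at K = F, and positive for K > F.
values = {K: fra_value(P_T, P_S, T, S, K) for K in (0.03, F, 0.06)}
for K, v in values.items():
    print(f"K = {K:.4f}: value = {v:+.6f}")
```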
Bond prices can be expressed in terms of these forward interest rates, namely in terms of the
“instantaneous” forward rates. First, rearrange terms in Eq. (11.5) so as to obtain:
$$F(t, T, S) = -\frac{P(t, S) - P(t, T)}{(S - T)\, P(t, S)}.$$
The instantaneous forward rate f(t, T) is defined as
$$f(t, T) \equiv \lim_{S \downarrow T} F(t, T, S) = -\frac{\partial \ln P(t, T)}{\partial T}. \tag{11.9}$$
It can be interpreted as the marginal rate of return from committing a bond investment for an
additional instant. To express bond prices in terms of f, integrate Eq. (11.9)
$$f(t, \ell) = -\frac{\partial \ln P(t, \ell)}{\partial \ell}$$
with respect to maturity date ℓ, use the condition that P(t, t) = 1, and obtain:
$$P(t, T) = e^{-\int_t^T f(t, \ell)\, d\ell}. \tag{11.10}$$
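The two representations in Eqs. (11.9) and (11.10) can be checked against each other numerically. The sketch below assumes a hypothetical smooth forward curve observed at t = 0, integrates it with the trapezoid rule to obtain bond prices, and then recovers the forward rate by differentiating the log bond price. The curve and the step sizes are illustrative choices only.

```python
import math

def forward_curve(T):
    """Hypothetical instantaneous forward curve f(0, T), observed at t = 0."""
    return 0.03 + 0.01 * (1.0 - math.exp(-T))

def bond_price(T, n=2000):
    """P(0, T) = exp(-int_0^T f(0, l) dl), integral by the trapezoid rule."""
    if T == 0.0:
        return 1.0
    h = T / n
    integral = 0.5 * (forward_curve(0.0) + forward_curve(T))
    integral += sum(forward_curve(i * h) for i in range(1, n))
    return math.exp(-integral * h)

def implied_forward(T, h=1e-4):
    """f(0, T) = -d ln P(0, T) / dT, by a central finite difference."""
    return -(math.log(bond_price(T + h)) - math.log(bond_price(T - h))) / (2 * h)

# Closed-form integral of the assumed curve, used only as a benchmark.
exact_price = math.exp(-(0.04 * 5.0 - 0.01 * (1.0 - math.exp(-5.0))))
print(f"P(0,5): numerical {bond_price(5.0):.8f}, exact {exact_price:.8f}")
print(f"f(0,2): recovered {implied_forward(2.0):.6f}, input {forward_curve(2.0):.6f}")
```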
11.1.3.3 More on the “marginal revenue” nature of forward rates
The expectation theory holds that forward rates equal expected future short-term rates, or
f (t, T ) = E [r (T )] ,
where E(·) denotes expectation under the physical probability. So by Eq. (11.11), the
expectation theory implies that,
$$R(t, T) = \frac{1}{T - t} \int_t^T E\left[ r(\tau) \right] d\tau.$$
The question whether f(t, T) is higher than E[r(T)] is very old. The oldest intuition we
have is that only risk-averse investors may induce f(t, T) to be higher than the short-term
rate they expect to prevail at T, viz,
In other terms, (11.12) never holds true if all investors are risk-neutral. (11.13)
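$$e^{-\int_t^T f(t, \tau)\, d\tau} \equiv P(t, T) = E\left[ e^{-\int_t^T r(\tau)\, d\tau} \right] \geq e^{-\int_t^T E[r(\tau)]\, d\tau}.$$
By taking logs,
$$\int_t^T E\left[ r(\tau) \right] d\tau \geq \int_t^T f(t, \tau)\, d\tau.$$
Equivalently, in terms of the yield-to-maturity,
$$R(t, T) \leq \frac{1}{T - t} \int_t^T E\left[ r(\tau) \right] d\tau.$$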
As an example, suppose that the short-term rate is a martingale under the risk-neutral
probability, viz. E[r(τ)] = r(t). The previous relation then collapses to:
$$R(t, T) \leq r(t),$$
which means that the yield curve never lies above the current short-term rate and, hence, cannot
slope upwards from its short end. Observing yield curves that are increasing in T implies that the
short-term rate is not a martingale under the risk-neutral probability.
In some cases, this feature can be attributed to risk-aversion.
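To make the martingale example concrete, the following sketch assumes the simplest such short-term rate, dr = σ dW under the risk-neutral probability (a driftless Gaussian model, chosen here only for illustration). In that case the integral of r over [t, T] is Gaussian with mean r(t)(T − t) and variance σ²(T − t)³/3, so that R(t, T) = r(t) − σ²(T − t)²/6. The code tabulates this yield curve and compares one bond price with a Monte Carlo estimate; all parameter values are hypothetical.

```python
import math
import random

def yield_martingale_rate(r0, sigma, tau):
    """Yield when dr = sigma dW under Q: R = r0 - sigma^2 tau^2 / 6."""
    return r0 - sigma ** 2 * tau ** 2 / 6.0

def mc_bond_price(r0, sigma, tau, n_steps, n_paths, rng):
    """Monte Carlo estimate of E[exp(-int_0^tau r(u) du)] for dr = sigma dW."""
    dt = tau / n_steps
    total = 0.0
    for _ in range(n_paths):
        r, integral = r0, 0.0
        for _ in range(n_steps):
            r_next = r + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            integral += 0.5 * (r + r_next) * dt   # trapezoid rule along the path
            r = r_next
        total += math.exp(-integral)
    return total / n_paths

r0, sigma = 0.03, 0.02
yields = {tau: yield_martingale_rate(r0, sigma, tau) for tau in (1.0, 2.0, 5.0, 10.0)}
for tau, R in yields.items():
    print(f"maturity {tau:4.1f}: yield {R:.5f} (short rate {r0:.5f})")

p_closed = math.exp(-5.0 * yields[5.0])
p_mc = mc_bond_price(r0, sigma, 5.0, 50, 2000, random.Random(0))
print(f"P(t, t+5): closed form {p_closed:.4f}, Monte Carlo {p_mc:.4f}")
```

The yields lie below the short-term rate at every maturity, which is the Jensen's inequality effect derived above.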
Finally, a recurrent definition. The difference
$$R(t, T) - \frac{1}{T - t} \int_t^T E\left[ r(\tau) \right] d\tau$$
3 According to the normal backwardation (contango) hypothesis, forward prices are lower (higher) than future expected spot
prices. Here the normal backwardation hypothesis is formulated with respect to interest rates.
Let ϕ(t, T ) be the T -forward price of a claim S(T ) at T . That is, ϕ(t, T ) is the price agreed
at t, that will be paid at T for delivery of the claim at T . Nothing has to be paid at t. By the
FTAP, there are no arbitrage opportunities if and only if:
$$0 = E\left[ e^{-\int_t^T r(u)\, du} \cdot \left( S(T) - \varphi(t, T) \right) \right].$$
Now use the bond pricing equation (11.3), and rearrange terms in the previous equality, to
obtain
$$\varphi(t, T) = E\left[ \frac{e^{-\int_t^T r(u)\, du}}{P(t, T)} \cdot S(T) \right] = E\left[ \eta_T(T) \cdot S(T) \right], \tag{11.14}$$
where⁴
$$\eta_T(T) \equiv \frac{e^{-\int_t^T r(u)\, du}}{P(t, T)}.$$
Eq. (11.14) suggests that we can define a new probability $Q_F^T$, as follows,
$$\eta_T(T) = \frac{dQ_F^T}{dQ} \equiv \frac{e^{-\int_t^T r(u)\, du}}{E\left[ e^{-\int_t^T r(u)\, du} \right]}. \tag{11.15}$$
where $E_{Q_F^T}[\cdot]$ denotes the expectation taken under $Q_F^T$. For reasons that will be clear in a
moment, $Q_F^T$ is referred to as the T-forward martingale probability. The forward martingale
probability is a practical tool to price interest-rate derivatives, as we shall explain in Section
11.7. It was introduced by Geman (1989) and Jamshidian (1989), and further analyzed by
Geman, El Karoui and Rochet (1995). Appendix 3 provides a few mathematical details on
the forward martingale probability. In Appendix 2, we relate forward prices to their certainty
equivalent.
4 As an example, suppose that S is the price process of a traded asset. By the FTAP, there are no arbitrage opportunities if and
only if $e^{-\int_t^\tau r(u)\, du} S(\tau)$ is a Q-martingale. In this case, $E[e^{-\int_t^T r(u)\, du} S(T)] = S(t)$, and Eq. (11.14) collapses to the well-known
formula: ϕ(t, T)P(t, T) = S(t). As is also well-known, entering the forward contract established at t at a later date τ > t costs.
Apply the FTAP to prove that the value of a forward contract as of time τ ∈ [t, T ] is given by P (τ , T ) · [ϕ(τ , T ) − ϕ(t, T )]. [Hint:
Notice that the final payoff is S(T ) − ϕ(t, T ) and that the discount has to be made at time τ .]
= EQTF [r(T )] .
Finally, the same result is also valid for the simply-compounded forward rate:
Fi (τ ) = EQTi+1 [L(Ti )] = EQTi+1 [Fi (Ti )] , τ ∈ [t, Ti ]
F F
where the second equality follows from Eq. (11.66). To show the previous relation, note that
by definition, the simply-compounded forward rate F (t, T, S) satisfies:
IRS(t, T, S; F (t, T, S)) = 0,
where IRS(t, T, S; K) is the value as of time t of a FRA struck at K for the time-interval [T, S].
By rearranging terms in the first line of Eq. (11.6),
$$F(t, T, S)\, P(t, S) = E\left[ e^{-\int_t^S r(\tau)\, d\tau}\, L(T, S) \right].$$
By the definition of $\eta_S(S)$,
$$F(t, T, S) = E_{Q_F^S}\left[ L(T, S) \right].$$
Now use the definitions of L(Ti ) and Fi (τ ) in Eq. (11.1) and Eq. (11.64) to conclude.
These relations show that it is only under the forward martingale probability that the expec-
tation theory holds true. Consider, for instance, Eq. (13.14). We have,
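$$\begin{aligned}
f(t, T) = E_{Q_F^T}\left[ r(T) \right] &= E_Q\left[ \eta_T(T)\, r(T) \right] \\
&= \underbrace{E_Q\left[ \eta_T(T) \right]}_{=1} E_Q\left[ r(T) \right] + cov_Q\left[ \eta_T(T), r(T) \right] \\
&= E\left[ r(T) \right] + cov\left[ Ker(T), r(T) \right] + cov_Q\left[ \eta_T(T), r(T) \right],
\end{aligned}$$
where Ker(T) denotes the pricing kernel in the economy, and the expectation and the first
covariance in the last line are taken under the physical probability. That is, forward rates in general
deviate from the future expected spot rates because of risk-aversion corrections (the second
term in the last equality) and because interest rates are stochastic (the third term in the last
equality).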
11.2 Common factors affecting the yield curve
where $R_t$ is the vector of returns, $F_t$ is the zero-mean vector of common factors affecting the
returns, $\bar{R}$ is the vector of unconditional expected returns, $\epsilon_t$ is a vector
of idiosyncratic components of the return generating process, and B is a matrix containing the
factor loadings. Each row of B contains the factor loadings for all the common factors affecting
a given return, i.e. the sensitivities of a given return with respect to a change of the factors.
Each column of B contains the term-structure of factor loadings, i.e. how a change of a given
factor affects the term-structure of excess returns.
5 Suppose that in Eq. (11.18), $F \sim N(0, I)$, and that $\epsilon \sim N(0, \Psi)$, where Ψ is diagonal. Then, $R \sim N(\bar{R}, \Sigma)$, where $\Sigma = BB^\top + \Psi$.
The assumptions that $F \sim N(0, I)$ and that Ψ is diagonal are necessary to identify the model, but not sufficient. Indeed, any
orthogonal rotation of the factors yields a new set of factors which also satisfies Eq. (11.18). Precisely, let T be an orthonormal
matrix. Then, $(BT)(BT)^\top = BTT^\top B^\top = BB^\top$. Hence, the factor loadings B and BT have the same ability to generate the matrix
Σ. To obtain a unique solution, one needs to impose extra constraints on B. For example, Jöreskog (1967) develops a maximum
likelihood approach in which the log-likelihood function is $-\frac{1}{2} N \left[ \log |\Sigma| + \mathrm{Tr}\left( S\Sigma^{-1} \right) \right]$, where S is the sample covariance matrix of
R, and the constraint is that $B^\top \Psi B$ be diagonal with elements arranged in descending order. The algorithm is: (i) for a given Ψ,
maximize the log-likelihood with respect to B, under the constraint that $B^\top \Psi B$ be diagonal with elements arranged in descending
order, thereby obtaining $\hat{B}$; (ii) given $\hat{B}$, maximize the log-likelihood with respect to Ψ, thereby obtaining $\hat{\Psi}$, which is fed back into
step (i), etc. Knez, Litterman and Scheinkman (1994) describe this approach in their paper. Note that the identification device they
describe at p. 1869 (Step 3) roughly corresponds to the requirement that $B^\top \Psi B$ be diagonal with elements arranged in descending
order. Such a constraint is clearly related to principal component analysis.
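The rotation argument in the footnote above is easy to verify numerically. The sketch below uses NumPy with a hypothetical loading matrix B (five returns, two factors), a hypothetical diagonal Ψ, and a plane rotation as the orthonormal matrix T; it confirms that B and BT generate the same covariance matrix even though the loadings themselves differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factor loadings (5 returns, 2 factors) and idiosyncratic variances.
B = rng.normal(size=(5, 2))
Psi = np.diag(rng.uniform(0.1, 0.5, size=5))

# An orthonormal matrix T: rotation of the factor space by an arbitrary angle.
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

Sigma = B @ B.T + Psi                       # covariance implied by loadings B
Sigma_rotated = (B @ T) @ (B @ T).T + Psi   # covariance implied by loadings BT

print("same covariance matrix:", np.allclose(Sigma, Sigma_rotated))
print("same loadings:", np.allclose(B, B @ T))
```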
such that, for p vectors $C_i^\top$ of dimension 1 × p, (i) the new variables $Y_i$ are uncorrelated, and (ii)
their variances are arranged in decreasing order. The logic behind PCA is to ascertain whether
a few components of $Y = [Y_1 \cdots Y_p]^\top$ account for the bulk of variability of the original data.
Let $C^\top = [C_1^\top \cdots C_p^\top]$ be a p × p matrix such that we can write Eq. (11.19) in matrix format,
$Y_t = C^\top (R_t - \bar{R})$ or, by inverting,
$$R_t - \bar{R} = C^{\top -1}\, Y_t. \tag{11.20}$$
Next, suppose that the vector $Y^{(k)} = [Y_1 \cdots Y_k]^\top$ accounts for most of the variability in the
original data,⁶ and let $C^{\top (k)}$ denote a p × k matrix extracted from the matrix $C^{\top -1}$ through
the first k columns of $C^{\top -1}$. Since the components of $Y^{(k)}$ are uncorrelated and they are deemed
largely responsible for the variability of the original data, it is natural to “disregard” the last
p − k components of Y in Eq. (11.20),
$$\underset{p \times 1}{R_t - \bar{R}} \approx \underset{p \times k}{C^{\top (k)}}\ \underset{k \times 1}{Y_t^{(k)}}.$$
If the vector $Y_t^{(k)}$ really accounts for most of the movements of $R_t$, the previous approximation
to Eq. (11.20) should be fairly good.
Let us make more precise what the concept of variability is in the context of PCA. Suppose
that the variance-covariance matrix of the returns, Σ, has p distinct eigenvalues, ordered from
the highest to the lowest, as follows: λ1 > · · · > λp . Then, the vector Ci in Eq. (11.19) is the
eigenvector corresponding to the i-th eigenvalue. Moreover,
var (Yi ) = λi , i = 1, · · · , p.
Finally, we have that
$$R_{PCA} = \frac{\sum_{i=1}^{k} var(Y_i)}{\sum_{i=1}^{p} var(R_i)} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}. \tag{11.21}$$
(Appendix 4 provides technical details and proofs of the previous formulae.) It is in the sense
of Eq. (11.21) that in the context of PCA, we say that the first k principal components account
for RPCA % of the total variation of the data.
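The construction can be sketched in a few lines of NumPy. The data below are simulated from three hypothetical factors with level-like, slope-like, and curvature-like loadings plus small noise; they are not actual yield data, and all variable names are illustrative. The code computes the eigen-decomposition of the sample covariance matrix, forms the principal components, and evaluates the ratio in Eq. (11.21) for k = 2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated returns on p = 5 maturities driven by three hypothetical factors.
loadings = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],          # level-like
                     [-1.0, -0.5, 0.0, 0.5, 1.0],        # slope-like
                     [0.5, -0.25, -0.5, -0.25, 0.5]]).T  # curvature-like, shape (5, 3)
factors = rng.normal(size=(5000, 3)) * np.array([1.0, 0.4, 0.2])
R = factors @ loadings.T + 0.05 * rng.normal(size=(5000, 5))

# Eigen-decomposition of the sample covariance matrix, eigenvalues in descending order.
Sigma = np.cov(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
lam, C = eigvals[order], eigvecs[:, order]    # column i of C is the eigenvector C_i

# Principal components Y_t = C^T (R_t - R_bar), one row per observation.
R_bar = R.mean(axis=0)
Y = (R - R_bar) @ C

# Share of total variance explained by the first k components, Eq. (11.21).
k = 2
r_pca = lam[:k].sum() / lam.sum()

# Rank-k approximation: R_t - R_bar is approximated by the first k columns of C times Y^(k).
R_approx = R_bar + Y[:, :k] @ C[:, :k].T
print(f"variance explained by first {k} components: {100 * r_pca:.1f}%")
```

Because C is orthonormal, the inverse of its transpose is C itself, which is why the approximation uses the first k columns of C.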
6 There are no rigorous criteria to say what “most of the variability” means in this context. Instead, a likelihood-ratio test is
most informative in the context of the estimation of Eq. (11.18) by means of the methods explained in the previous footnote.
FIGURE 11.1. Changes in the term-structure of interest rates generated by changes in the “level”,
“slope” and “curvature” factors.
• The second factor is called a “steepness” factor as its variations induce changes in the
slope of the term-structure of interest rates. After a shock in this steepness factor, the
short-end and the long-end of the yield curve move in opposite directions. The movements
of this factor explain approximately 15% of the total variation of the yield curve.
• The third factor is called a “curvature” factor as its changes lead to changes in the
curvature of the yield curve. That is, following a shock in the curvature factor, the middle
of the yield curve and both the short-end and the long-end of the yield curve move in
opposite directions. This curvature factor accounts for approximately 5% of the total
variation of the yield curve.
Understanding the origins of these three factors is still a challenge to financial economists and
macroeconomists. For example, macroeconomists explain that central banks affect the short-
end of the yield curve, e.g. by inducing variations in the Federal Funds rate in the US. However, the
Federal Reserve decisions rest on the current macroeconomic conditions. Therefore, we should
expect that the short-end of the yield-curve is related to the development of macroeconomic
factors. Instead, the development of the long-end of the yield curve should largely depend on the
market average expectation and risk-aversion surrounding future interest rates and economic
conditions. Financial economists, then, should expect to see the long-end of the yield curve as
being driven by expectations of future economic activity, and by risk-aversion. Indeed, Ang and
Piazzesi (2003) demonstrate that macroeconomic factors such as inflation and real economic
activity are able to explain movements at the short-end and the middle of the yield curve.
However, they show that the long-end of the yield curve is driven by unobservable factors.
It is not clear, though, whether such unobservable factors are driven by time-varying risk-
aversion or changing expectations.
The compelling lesson for practitioners is that reduced-form models with only one factor are
unlikely to perform well, in practice.
11.3. Models of the short-term rate c
°by A. Mele
11.3.1 Introduction
The fundamental bond pricing equation in Eq. (11.3),
$$P(t, T) = E\left[ e^{-\int_t^T r(u)\, du} \right], \qquad (11.22)$$
suggests modeling the arbitrage-free bond price P by using as an input an exogenously given
short-term rate process r. In the Brownian information structure considered in this chapter, r
would then be the solution to a stochastic differential equation. As an example,
$$dr(\tau) = b(r(\tau))\, d\tau + a(r(\tau))\, dW(\tau), \qquad (11.23)$$
where b and a are well-behaved functions guaranteeing the existence of a strong-form solution
to the previous equation.
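As a rough illustration of this approach, the expectation in Eq. (11.22) can be approximated by simulation once drift and diffusion functions are chosen. The sketch below is a minimal Euler scheme, not a method taken from the text. It assumes that the drift passed to it is the one relevant under the probability used in Eq. (11.22), and the function name, the parameter values and the linear drift are hypothetical.

```python
import math
import random

def mc_bond_price(r0, drift, diffusion, maturity, n_steps=250, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[exp(-integral of r)] with an Euler scheme for r."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        r, integral = r0, 0.0
        for _ in range(n_steps):
            integral += r * dt  # left-point rule for the integral of r
            r += drift(r) * dt + diffusion(r) * sqrt_dt * rng.gauss(0.0, 1.0)
        total += math.exp(-integral)
    return total / n_paths

# Hypothetical inputs: linear drift 0.03 - 0.5 r and constant diffusion 0.02.
price = mc_bond_price(0.05, lambda r: 0.03 - 0.5 * r, lambda r: 0.02, maturity=5.0)
print(f"simulated 5-year bond price: {price:.4f}")
```

With a zero diffusion and a zero drift the scheme collapses to exp(-r0 × maturity), which is a convenient sanity check of the discretization.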
Historically, such a modeling approach was the first to emerge. It was initiated in the seminal
papers of Merton (1973)7 and Vasicek (1977), and it is now widely used. This section illustrates
the main modeling and empirical challenges related to this approach. We examine one-factor
“models of the short-term rate”, such as that in Eqs. (11.22)-(11.23), and also multifactor
models, in which the short-term rate is a function of a number of factors, r (τ ) = R(y (τ )),
where R is some function and y is solution to a multivariate diffusion process.
Two fundamental issues for the model’s users are that the models they deal with be (i)
fast to compute, and (ii) accurate. As regards the first point, the obvious target would be to
look for models with a closed form solution, such as for example, the so-called “affine” models
(see Section 11.3.6). The second point is more subtle. Indeed, “perfect” accuracy can never be
achieved with models such as that in Eqs. (11.22)-(11.23), even when this model is extended
to a multifactor diffusion. After all, the model in Eqs. (11.22)-(11.23) can only be taken for what it
really is: a model of determination of the observed yield curve. As such, the model in Eqs.
(11.22)-(11.23) cannot exactly fit the observed term structure of interest rates.
As we shall explain, the requirement to exactly fit the initial term-structure of interest rates
is important when the model’s user is concerned with the pricing of options or other derivatives
written on the bonds. The good news is that such a perfect fit can be obtained, once we augment
Eq. (11.23) with an infinite dimensional parameter calibrated to the observed term-structure.
The bad news is that such a calibration device often leads to “intertemporal inconsistencies”
that we will also illustrate.
The models leading to perfect accuracy are often referred to as “no-arbitrage” models. These
models work by making the short-term rate process exactly pin down the term-structure that
we observe at a given instant. As we shall illustrate, intertemporal inconsistencies arise because
the parameters of the short-term rate pinning down the term structure today are generally
different from the parameters of the short-term rate process which will pin down the term
structure tomorrow. As is clear, this methodology goes to the opposite extreme of the initial
approach, in which the short-term rate was taken as the input of all subsequent movements of
the term-structure of interest rates. However, such an initial approach was consistent with the
standard rational expectations paradigm permeating modern economic analysis. The rationale
behind this approach is that economically admissible (i.e. no-arbitrage) bond prices are ratio-
nally formed. That is, they move as a consequence of random changes in the state variables.
Economists try to explain broad phenomena with the help of a few inputs, a science reduction
principle. Practitioners, instead, implement models to solve pricing problems that constantly
arise in their trading rooms. Both activities are important, and the choice of the “right” model
to use rests on the role that we are playing within a given institution.
8 We call σ λ a term-premium because under the good conditions, the bond price P (t, T ) decreases with λ uniformly in T , by
a comparison result in Mele (2003).
Eq. (11.26) shows that the bond price, P , depends on both the drift of the short-term rate, b,
and the risk-aversion correction, λ. This circumstance occurs as the initial asset market structure
is incomplete, in the following sense. In the Black-Scholes model, the option is redundant, given
the initial market structure. In the context we analyze here, the short-term rate r is not a
traded asset. In other words, the initial market structure has one untraded risk (r) and zero
assets - the factor generating uncertainty in the economy, r, is not traded. Therefore, the drift
of the short-term rate, b, cannot equal r · r = r² under the risk-neutral probability, and the bond
price depends on b, a and λ.
This dependence is, perhaps, a kind of hindrance to practitioners. Instead, it can be viewed
as a good piece of news to policy-makers. Indeed, starting from observations and (b, a), one
may back out information about λ, which contains information about agents’ risk-appetite.
Information about agents’ risk-appetite, then, can help central bankers to take decisions about
the interest rate to set.
By specifying the drift and diffusion functions b and a, and by identifying the risk-premium
λ, the partial differential equation (PDE, henceforth) (11.26) can explicitly be solved, either
analytically or numerically. Choices concerning the exact functional form of b, a and λ are often
made on the basis of either analytical or empirical reasons. In the next section, we will examine
the first, famous short-term rate models in which b, a and λ have a particularly simple form.
We will discuss the analytical advantages of these models, but we will also highlight the major
empirical problems associated with these models. In Section 11.3.4 we provide a very succinct
description of models exhibiting jump (and default) phenomena. In Section 11.3.5, we introduce
multifactor models: we will explain why we need such more complex models, and show that
even in this more complex case, arbitrage-free bond prices are still solutions to PDEs such as
(11.26). In Section 11.3.6, we will present a class of analytically tractable multidimensional
models, known as affine models. We will discuss their historical origins, and highlight their
importance as regards the econometric estimation of bond pricing models. Finally, Section
11.3.7 presents the “perfectly fitting” models, and Appendix 5 provides a few technical details
about the solution of one of these models.
The article of Vasicek (1977) is considered to be the seminal contribution to the short-term rate
models literature. The model proposed by Vasicek assumes that the short-term rate is solution
to:
dr(τ ) = (θ̄ − κr(τ ))dτ + σdW (τ ), τ ∈ (t, T ], (11.27)
where θ̄, κ and σ are positive constants. This model generalizes the one proposed by Merton
(1973) in which κ ≡ 0. The intuition behind model (11.27) is very simple. Suppose first that
σ = 0. In this case, the solution is:
$$r(\tau) = \frac{\bar\theta}{\kappa} + e^{-\kappa(\tau - t)} \left[ r(t) - \frac{\bar\theta}{\kappa} \right].$$
The previous equation reveals that if the current level of the short-term rate r(t) = θ̄/κ, it
will be “locked-in” at θ̄/κ forever. If, instead, r(t) < θ̄/κ, then, for all τ > t, r(τ ) < θ̄/κ
too, but |r(τ ) − θ̄/κ| will eventually shrink to zero as τ → ∞. An analogous property holds
when r(t) > θ̄/κ. In all cases, the “speed” of convergence of r to its “long-term” value θ̄/κ is
determined by κ: the higher is κ, the higher is the speed of convergence to θ̄/κ. In other terms,
θ̄/κ is the long-term value towards which r tends to converge, and κ determines the speed of
such a convergence.
Eq. (11.27) generalizes the previous ideas to the stochastic differential case. It can be shown
that a “solution” to Eq. (11.27) can be written in the following format:
$$r(\tau) = \frac{\bar\theta}{\kappa} + e^{-\kappa(\tau - t)} \left[ r(t) - \frac{\bar\theta}{\kappa} \right] + \sigma e^{-\kappa \tau} \int_t^{\tau} e^{\kappa s}\, dW(s),$$
where the integral is understood in the Itô sense. The interpretation of this solution
is similar to the one given above. The short-term rate tends to a sort of “central tendency”
θ̄/κ. Actually, it will have the tendency to fluctuate around it. In other terms, there is always
the tendency for shocks to be absorbed with a speed dictated by the value of κ. In this case,
the short-term rate process r is said to exhibit a mean-reverting behavior. In fact, it can be
shown that the expected future value of r will be given by the solution given above for the
deterministic case, viz.
$$E[r(\tau) \mid r(t)] = \frac{\bar\theta}{\kappa} + e^{-\kappa(\tau - t)} \left[ r(t) - \frac{\bar\theta}{\kappa} \right].$$
Of course, that is only the expected value, not the actual value that r will take at time τ . As a
result of the presence of the Brownian motion in Eq. (11.27), r can not be predicted, and it is
possible to show that the variance of the value taken by r at time τ is:
$$\mathrm{var}[r(\tau) \mid r(t)] = \frac{\sigma^2}{2\kappa} \left[ 1 - e^{-2\kappa(\tau - t)} \right].$$
Finally, it can be shown that r is normally distributed (with expectation and variance given by
the two functions given above).
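The conditional mean and variance above, together with normality, are enough to sample r(τ) given r(t) without discretization error. The following minimal sketch codes the two formulas and one exact draw; the function names and parameter values are hypothetical and serve only as an illustration.

```python
import math
import random

def vasicek_mean(r_t, theta_bar, kappa, horizon):
    """E[r(tau) | r(t)] for the model in Eq. (11.27), with horizon = tau - t."""
    long_run = theta_bar / kappa
    return long_run + math.exp(-kappa * horizon) * (r_t - long_run)

def vasicek_variance(sigma, kappa, horizon):
    """var[r(tau) | r(t)] for the model in Eq. (11.27), with horizon = tau - t."""
    return sigma ** 2 / (2.0 * kappa) * (1.0 - math.exp(-2.0 * kappa * horizon))

def vasicek_draw(r_t, theta_bar, kappa, sigma, horizon, rng):
    """One draw of r(tau) given r(t), using the Gaussian transition distribution."""
    mean = vasicek_mean(r_t, theta_bar, kappa, horizon)
    std = math.sqrt(vasicek_variance(sigma, kappa, horizon))
    return rng.gauss(mean, std)

# Hypothetical parameter values: long-term value theta_bar / kappa = 6%.
theta_bar, kappa, sigma = 0.03, 0.5, 0.02
for horizon in (0.0, 1.0, 5.0, 50.0):
    m = vasicek_mean(0.02, theta_bar, kappa, horizon)
    v = vasicek_variance(sigma, kappa, horizon)
    print(f"horizon {horizon:5.1f}: mean {m:.4f}, variance {v:.6f}")
```

As the horizon grows, the mean approaches θ̄/κ and the variance approaches σ²/(2κ), so a larger κ both speeds up convergence and lowers the long-horizon variance.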
The previous properties of r are certainly instructive. Yet the main objective here is to find
the price of a bond. As it turns out, the assumption that the risk premium process λ is a
constant allows one to obtain a closed-form solution. Indeed, replace this constant and the
functions b(r) = θ̄ − κr and a(r) = σ into the PDE (11.26). The result is that the bond price
P is solution to the following partial differential equation:
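$$0 = \frac{\partial P}{\partial \tau} + \left[ (\bar\theta - \lambda\sigma) - \kappa r \right] P_r + \frac{1}{2} \sigma^2 P_{rr} - rP, \quad \text{for all } (r, \tau) \in \mathbb{R} \times [t, T), \qquad (11.28)$$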
with the usual boundary condition. It is now instructive to see how this kind of PDE can be
solved. Guess a solution of the form:
$$P(r, \tau, T) = e^{A(\tau, T) - B(\tau, T)\, r}, \qquad (11.29)$$
where A and B have to be found. The boundary condition is P (r, T, T ) = 1, which implies that
the two functions A and B must satisfy:
$$A(T, T) = 0, \qquad B(T, T) = 0. \qquad (11.30)$$
By the definition of the yield curve given in Section 11.1 (see Eq. (13.1)),
$$R(t, T) \equiv -\frac{\ln P(r, t, T)}{T - t} = \frac{-A(t, T)}{T - t} + \frac{B(t, T)}{T - t}\, r.$$
It is possible to show the existence of a finite “asymptotic” spot rate, i.e. $\lim_{T \to \infty} R(t, T) = \lim_{T \to \infty} \frac{-A(t, T)}{T - t} < \infty$.
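The yield formula above can be evaluated numerically. The sketch below assumes the exponential-affine form ln P = A − B r implied by that formula. Substituting it into the PDE (11.28) and matching terms gives, in terms of time to maturity x = T − t, dB/dx = 1 − κB and dA/dx = −(θ̄ − λσ)B + ½σ²B², with A(0) = B(0) = 0. This matching step is worked out here as an illustration and is not quoted from the text; function names and parameter values are hypothetical.

```python
import math

def vasicek_B(kappa, x):
    """Solution of dB/dx = 1 - kappa * B with B(0) = 0."""
    return (1.0 - math.exp(-kappa * x)) / kappa

def vasicek_A(theta_bar, kappa, sigma, lam, x, n=2000):
    """A(x) obtained by integrating dA/dx over [0, x] with the trapezoid rule."""
    theta_rn = theta_bar - lam * sigma  # constant of the risk-adjusted drift in Eq. (11.28)

    def integrand(u):
        b = vasicek_B(kappa, u)
        return -theta_rn * b + 0.5 * sigma ** 2 * b ** 2

    h = x / n
    total = 0.5 * (integrand(0.0) + integrand(x))
    for i in range(1, n):
        total += integrand(i * h)
    return total * h

def vasicek_price(r, theta_bar, kappa, sigma, lam, x):
    """Bond price exp(A - B r) for time to maturity x."""
    return math.exp(vasicek_A(theta_bar, kappa, sigma, lam, x) - vasicek_B(kappa, x) * r)

def vasicek_yield(r, theta_bar, kappa, sigma, lam, x):
    """Spot rate R = -ln P / x for time to maturity x > 0."""
    return -math.log(vasicek_price(r, theta_bar, kappa, sigma, lam, x)) / x

# Hypothetical parameter values.
params = dict(theta_bar=0.03, kappa=0.5, sigma=0.02, lam=-0.1)
for x in (1.0, 5.0, 10.0, 30.0):
    print(f"maturity {x:4.1f}: yield {vasicek_yield(0.05, x=x, **params):.4f}")
```

Under these assumptions the yields at very long maturities settle near (θ̄ − λσ)/κ − σ²/(2κ²), which is one way to see that the asymptotic spot rate is finite.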
The model has a number of features that can describe quite a few aspects of reality. Many
textbooks show the typical shapes of the yield-curve that can be generated with the above
formula (see, for example, Hull (2003)). However, this model is known to suffer from two main
drawbacks. The first drawback is that the short-term rate is Gaussian and, hence, can take on
negative values with positive probability. That is a counterfactual feature of the model. However,
it should be stressed that from a practical standpoint, this feature is largely irrelevant. If σ is
low compared to θ̄/κ, this probability is really very small. The second drawback is tightly related
to the first one. It refers to the fact that the short-term rate diffusion is independent of the
level of the short-term rate. That is another counterfactual feature of the model. It is well-known
that short-term rate changes become more and more volatile as the level of the short-term rate
increases. In the empirical literature, this phenomenon is usually referred to as the level-effect.
The model proposed by Cox, Ingersoll and Ross (1985) (CIR, henceforth) addresses these
two drawbacks at once, as it assumes that the short-term rate is solution to,
$$dr(\tau) = (\bar\theta - \kappa r(\tau))\, d\tau + \sigma \sqrt{r(\tau)}\, dW(\tau), \quad \tau \in (t, T].$$
The CIR model is also referred to as “square-root” process to emphasize that the diffusion
function is proportional to the square-root of r. This feature makes the model address the level-
effect phenomenon. Moreover, this property prevents r from taking negative values. Intuitively,
when r wanders just above zero, it is pulled back to the strictly positive region at a strength
of the order dr = θ̄dτ .9 The transition density of r is noncentral chi-square. The stationary
density of r is a gamma distribution.
The expected value is as in Vasicek.10 However, the variance is different, although its exact
expression is really not important here.
CIR formulated a set of assumptions on the primitives of the economy (e.g., preferences) that
led to a risk-premium function λ = ℓ√r, where ℓ is a constant. By replacing this, b(r) = θ̄ − κr
and a(r) = σ√r into the PDE (11.26), one gets (similarly as in the Vasicek model), that the
bond price function takes the form in Eq. (11.29), but with functions A and B satisfying the
following differential equations:
$$0 = A_1 - \bar\theta B, \qquad 0 = -B_1 + (\kappa + \ell\sigma) B + \frac{1}{2} \sigma^2 B^2 - 1,$$
subject to the boundary conditions (11.30).
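The two differential equations for A and B can be integrated numerically. The sketch below is a hand-written fourth-order Runge-Kutta scheme in time to maturity x = T − τ, so that dA/dx = −θ̄B and dB/dx = 1 − (κ + ℓσ)B − ½σ²B², starting from A = B = 0 at x = 0. It assumes the risk-premium function λ = ℓ√r, with ℓ denoting the constant, and the exponential-affine form ln P = A − B r. The function name and parameter values are hypothetical.

```python
import math

def cir_AB(theta_bar, kappa, sigma, ell, x, n_steps=1000):
    """Integrate the CIR equations for (A, B) from 0 to x with a Runge-Kutta (RK4) scheme."""
    k = kappa + ell * sigma  # risk-adjusted mean-reversion coefficient

    def rhs(A, B):
        # Right-hand sides in time to maturity: (dA/dx, dB/dx).
        return (-theta_bar * B, 1.0 - k * B - 0.5 * sigma ** 2 * B ** 2)

    h = x / n_steps
    A = B = 0.0
    for _ in range(n_steps):
        a1, b1 = rhs(A, B)
        a2, b2 = rhs(A + 0.5 * h * a1, B + 0.5 * h * b1)
        a3, b3 = rhs(A + 0.5 * h * a2, B + 0.5 * h * b2)
        a4, b4 = rhs(A + h * a3, B + h * b3)
        A += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        B += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return A, B

# Hypothetical parameter values.
A5, B5 = cir_AB(theta_bar=0.03, kappa=0.5, sigma=0.1, ell=-0.5, x=5.0)
price = math.exp(A5 - B5 * 0.05)
print(f"A = {A5:.6f}, B = {B5:.6f}, 5-year bond price at r = 5%: {price:.4f}")
```

A numerical scheme is used here only to keep the sketch short; the equations are of the Riccati type and also admit a well-known closed-form solution, which can be used to check the output.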
In their article, CIR also showed how to compute options on bonds. They even provided
hints on how to “invert the term-structure”, a popular technique that we describe in detail in
Section 11.3.6. For all these features, the CIR model and paper have been used in the industry
for many years. And many of the more modern models are mere multidimensional extensions
of the basic CIR model. (See Section 11.3.6).
9 This is only intuition. The exact condition under which the zero boundary is unattainable by r is θ̄ > ½σ². See Karlin and
Taylor (1981, vol II chapter 15) for a general analysis of attainability of boundaries for scalar diffusion processes.
10 The expected value of linear mean-reverting processes is always as in Vasicek, independently of the functional form of the
diffusion coefficient. This property follows by a direct application of a general result for diffusion processes given in Chapter 6
(Appendix A).
[Figure 11.2 comprises two panels, Panel A and Panel B, each plotting the drift (vertical axis) against the short-term rate r (horizontal axis).]
FIGURE 11.2. Nonlinear mean reversion?
The importance of the nonlinear effects in Figure 11.2 is related to the convexity effects in
Mele (2003). Mele (2003) showed that bond prices may be concave in the short-term rate if the
risk-neutralized drift function is sufficiently convex. While the results in this Figure relate to
the physical drift functions, the point is nevertheless important, as risk-premium terms would
have to look very strange to completely destroy the nonlinearities of the short-term rate under the
physical probability.
The main lesson is that under the “nonlinear drift dynamics”, the short-term rate behaves in
a way that is at least roughly comparable with the way it would behave under the “linear drift
dynamics”. However, the behavior at the extremes is dramatically different. As the short-term
rate moves to the extremes, it is pulled back to the “center” in a very abrupt way. At the
moment, it is not clear whether these preliminary empirical results are reliable or not. New
econometric techniques are currently being developed to address this and related issues.
One possibility is that such single factor models of the short-term rate are simply misspecified.
For example, there is strong empirical evidence that the volatility of the short-term rate is time-
varying, as we shall discuss in the next section. Moreover, the term-structure implications of
a single factor model are counterfactual, since we know that a single factor can not explain
the entire variation of the yield curve, as we explained in Section 11.2. We now describe more
realistic models driven by more than one factor.
Which empirical regularities would the short-term rate model in Eqs. (11.31)-(11.32) address?
Which sign of the correlation coefficient ρ would be consistent with historical episodes such as
the Monetary Experiment of the Federal Reserve System between October 1979 and October
1982?
The short-term rate model in Eqs. (11.31)-(11.32) would address two empirical regularities.
1) The volatility of the short-term rate is not constant over time. Rather, it seems to be
driven by an additional source of randomness. All in all, the short-term process seems to be
generated by the stochastic volatility model in Eqs. (11.31)-(11.32), in which the volatility
component v (t) is driven by a source of randomness only partially correlated with the source
of randomness driving the short-term rate process itself.
2) The volatility of the short-term rate is increasing in the level of the short-term rate.
This phenomenon is known as the “level effect”. Perhaps, periods of high interest rates arise
because of erratic liquidity. (Erratic liquidity would command a high risk-premium and so a
high LIBOR rate say.) But precisely because of erratic liquidity, interest rates are also very
volatile. The short-term rate model in Eqs. (11.31)-(11.32) is a very useful reduced form able
to capture these effects through the two parameters: η and ρ. If the parameter η is greater than
zero, the instantaneous interest rate volatility increases with the level of the interest rate. If the
“correlation” coefficient ρ > 0, the interest rate volatility is also partly related to the sources
of interest rate volatility not directly related to the level of the interest rate.
During the Monetary Experiment, the FED decided to target money supply, rather than
interest rates. So the high volatility of money demand mechanically translated to high interest
rate volatility as a result of the market clearing. Moreover, the quantity of monetary base was
kept deliberately low - to fight against inflation. So the US experienced both high interest rate
volatility and high interest rates (see for example Andersen and Lund (1997, J. Econometrics)
for an empirical study). Moreover, high nominal interest rates may be so because they might be
compensating for high inflation volatility, that is, not only high inflation. There is no empirical
study about the issues related to the sign of the correlation coefficient ρ. Here is a suggestion.
A rolling-window estimation suggesting that the level of ρ changed a lot around the Monetary
Experiment would mean that the bulk of interest rate volatility was not entirely due to the
mechanical effects related to the FED behavior.
Next, suppose we wish to estimate the parameter vector θ = [κ, μ, η, β, α, ξ, ϑ, ρ]⊤ of the model
in Eqs. (11.31)-(11.32). Under which circumstances would Maximum Likelihood be a feasible
estimation method?
The ML estimator would be feasible under two sets of conditions. First, the model in Eqs.
(11.31)-(11.32) should not have stochastic volatility at all, viz, β = ξ = 0; in this case, the
short-term rate would be solution to,
where σ̄ is now a constant. Second, the value of the elasticity parameter η is important. If η = 0,
the short-term rate process is the Gaussian one proposed by Vasicek (1977, J. Fin. Economics).
If η = 1/2, we obtain the square-root process introduced in Financial Economics by Cox, Ingersoll
and Ross (1985, Econometrica) (CIR henceforth). In the Vasicek case, the transition density of
r is Gaussian, and in the CIR case, the transition density of r is a noncentral chi-square. So in
both the Vasicek and CIR, we may write down the likelihood function of the diffusion process.
Therefore, ML estimation is possible in these two cases.
In the more general case, we have to go for simulation methods described in Chapter 5.
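For the Gaussian case, the likelihood can be written down explicitly because the transition density is normal. The minimal sketch below uses the notation of the Vasicek model in Eq. (11.27) rather than the parameter vector θ above. It simulates a path from the exact transition and then maximizes the conditional Gaussian likelihood, which for equally spaced data amounts to a least-squares regression of r at one date on r at the previous date. Function names, the sampling interval and the parameter values are hypothetical.

```python
import math
import random

def simulate_vasicek(r0, theta_bar, kappa, sigma, dt, n, seed=0):
    """Simulate n steps of the Vasicek model using its exact Gaussian transition."""
    rng = random.Random(seed)
    phi = math.exp(-kappa * dt)
    long_run = theta_bar / kappa
    std = math.sqrt(sigma ** 2 / (2.0 * kappa) * (1.0 - phi ** 2))
    path = [r0]
    for _ in range(n):
        path.append(long_run + phi * (path[-1] - long_run) + std * rng.gauss(0.0, 1.0))
    return path

def vasicek_mle(path, dt):
    """Conditional ML estimates of (theta_bar, kappa, sigma) from equally spaced data."""
    x, y = path[:-1], path[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx                    # estimate of exp(-kappa * dt)
    c = my - phi * mx                  # estimate of (theta_bar / kappa) * (1 - phi)
    resid_var = sum((b - c - phi * a) ** 2 for a, b in zip(x, y)) / n
    kappa = -math.log(phi) / dt        # requires 0 < phi < 1
    theta_bar = kappa * c / (1.0 - phi)
    sigma = math.sqrt(2.0 * kappa * resid_var / (1.0 - phi ** 2))
    return theta_bar, kappa, sigma

# Hypothetical monthly data generated with theta_bar = 0.03, kappa = 0.5, sigma = 0.02.
path = simulate_vasicek(0.05, 0.03, 0.5, 0.02, dt=1.0 / 12.0, n=20000, seed=0)
theta_hat, kappa_hat, sigma_hat = vasicek_mle(path, dt=1.0 / 12.0)
print(f"theta_bar: {theta_hat:.4f}, kappa: {kappa_hat:.4f}, sigma: {sigma_hat:.4f}")
```

The mapping from the regression coefficients back to (θ̄, κ, σ) relies on the invariance of ML estimators to reparametrization. In the CIR case the same idea applies, but with the noncentral chi-square transition density in place of the Gaussian one.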
11.3.4.3 More general models
Estimating the model in Eqs. (11.31)-(11.32) is certainly instructive. Yet a more important
question is to examine the term-structure implications of this model. More generally, how would
the estimation procedure outlined in the previous subsection change if the task is to estimate
a Markov model of the term-structure of interest rates? There are three steps.
Step 1
Collect data on the term structure of interest rates. We will need to use data on two maturities
(say a time series of riskless 6 months and 5 years interest rates).
Step 2
Let us consider a model of the entire term-structure of interest rates. By the fundamental
theorem of asset pricing, and the Markov property of the diffusion, the price of a riskless bond
predicted by the model is,
$$P^j(r(t), v(t)) \equiv P(r(t), v(t), N_j - t) = E\left( \left. e^{-\int_t^{N_j} r(s)\, ds} \,\right|\, r(t), v(t) \right), \qquad (11.33)$$
where E (.) is the conditional expectation taken under the risk-neutral probability, and Nj is a
sequence of expiration dates. Naturally, the previous formula relies on some assumptions about
risk-aversion correction. (Some of these assumptions may be of a reduced-form nature; others
may rely on the specification of preferences, beliefs, markets and technology. But we do not
need to be more precise at this level of generality.) In turn, these assumptions entail that the
pricing formula in Eq. (11.33) depends on some additional risk-adjustment parameter vector,
say λ. Precisely, the Radon-Nikodym derivative of the risk-neutral probability with respect to
the physical probability is given by $\exp\left( -\frac{1}{2} \int \|\Lambda(t)\|^2\, dt - \int \Lambda(t)\, dZ(t) \right)$, where Z = [W U ]⊤,
W and U are the two Brownian motions in Eqs. (11.31)-(11.32), and Λ (t) is some process
adapted to Z, which is taken to be of the form Λ (t) ≡ Λm (r (t) , v (t) ; λ), for some vector
valued function Λm and some parameter vector λ. The function Λm makes risk-adjustment
corrections dependent on the current value of the state vector (r (t) , v (t)), and thus makes the
model Markov.
So the estimation problem is actually one in which we have to estimate both the “physical”
parameter vector θ = [κ, μ, η, β, α, ξ, ϑ, ρ]⊤ and the “risk-adjustment” parameter vector λ.
Next, compute interest rates corresponding to two maturities,
$$R^j(r(t), v(t); \theta, \lambda) = -\frac{1}{N_j} \ln P^j(r(t), v(t)), \quad j = 1, 2, \qquad (11.34)$$
where the bond prices are computed through Eq. (11.33), and where the notation Rj (r, v; θ, λ)
emphasizes that the theoretical term-structure depends on the parameter vector (θ, λ). We can
now use the data ($R^j_{\$}$, say) and the model predictions about the data ($R^j$), create moment
conditions, and proceed to estimate the parameter vector (θ, λ) through some method of moments
(provided the moments are enough to make (θ, λ) identifiable). But there are two difficulties.
First, the volatility process v (t) is not observable by the econometrician. Second, the bond
pricing formula in Eq. (11.33) does not generally admit a closed-form.
The first difficulty can be overcome through inference methods based on simulations. Here is
an outline of these methods that could be used here. Simulate the system in Eqs. (11.31)-(11.32)
for a given value of the parameter vector (θ, λ). For each simulation, compute a time series of
interest rates Rj from Eq. (11.34). Use these simulated data to create moment conditions.
The parameter estimator is the value of (θ, λ) which minimizes some norm of these moment
conditions obtained through the simulations.
The next step discusses how to address the second difficulty.
Step 3
The use of affine models would considerably simplify the analysis. Affine models place restric-
tions on the data generating process in Eqs. (11.31)-(11.32) and in the risk-aversion corrections
in Eq. (11.33) in such a way that the term structure in Eq. (11.34) is,
$$R^j(r(t), v(t); \theta, \lambda) = A(j; \theta, \lambda) + B(j; \theta, \lambda) \cdot y(t),$$
where A (j; θ, λ) and B (j; θ, λ) are some functions of the maturity Nj (B is vector valued),
and generally depend on the parameter vector (θ, λ); and finally the state vector y = [r v]⊤.
(Namely, an affine model obtains once η = 0, ϑ = 1/2, and the function Λm is affine.) So once Eqs.
(11.31)-(11.32) are simulated, the computation of a time series of interest rates Rj is
straightforward.
In the CIR model, the instantaneous short-term rate volatility is stochastic, as it depends
on the level of the short-term rate, which is obviously stochastic. However, there is empirical
evidence, surveyed by Mele and Fornari (2000), which suggests that the short-term rate volatility
depends on some additional factors. A natural extension of the CIR model is one in which the
instantaneous volatility of the short-term rate depends on (i) the level of the short-term rate,
similarly as in the CIR model, and (ii) some additional random component. Such an additional
random component is what we shall refer to as the “stochastic volatility” of the short-term
rate. It is the term-structure counterpart to the stochastic volatility extension of the Black and
Scholes (1973) model (see Chapter 10).
Fong and Vasicek (1991) write the first paper in which the volatility of the short-term rate
is stochastic. They consider the following model:
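$$\begin{aligned} dr(\tau) &= \kappa_r (\bar r - r(\tau))\, d\tau + \sqrt{v(\tau)}\, r(\tau)^{\gamma}\, dW_1(\tau) \\ dv(\tau) &= \kappa_v (\theta - v(\tau))\, d\tau + \xi_v \sqrt{v(\tau)}\, dW_2(\tau) \end{aligned} \qquad (11.35)$$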
in which κr , r̄, κv , θ and ξ v are constants, and [W1 W2 ] is a vector Brownian motion. To obtain
a closed-form solution, Fong and Vasicek set γ = 0. The authors also make assumptions about
risk aversion corrections. Namely, they assume that the unit-risk-premia for the stochastic
fluctuations of the short-term rate, λr , and the short-term rate volatility, λv , are both proportional
to √v(τ), and then they find a closed-form solution for the bond price as of time t and maturing
at time T , P (r (t) , v (t) , T − t).
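To get a feel for the dynamics in Eq. (11.35), the system can be simulated with a simple Euler scheme. The sketch below makes several assumptions that are not in the text: W1 and W2 are treated as independent, the variance v is floored at zero after each step so that its square root stays defined, and r is floored at zero inside the power r^γ. Function names and parameter values are hypothetical.

```python
import math
import random

def simulate_fong_vasicek(r0, v0, kappa_r, r_bar, kappa_v, theta, xi_v,
                          horizon, n_steps, gamma=0.0, seed=0):
    """Euler simulation of the two-factor system in Eq. (11.35); returns paths of r and v."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    r, v = r0, v0
    rs, vs = [r], [v]
    for _ in range(n_steps):
        dw1 = sqrt_dt * rng.gauss(0.0, 1.0)  # increment of W1
        dw2 = sqrt_dt * rng.gauss(0.0, 1.0)  # increment of W2, drawn independently (assumption)
        vol = math.sqrt(v)
        level = max(r, 0.0) ** gamma         # equals 1 when gamma = 0
        r_next = r + kappa_r * (r_bar - r) * dt + vol * level * dw1
        v_next = v + kappa_v * (theta - v) * dt + xi_v * vol * dw2
        r, v = r_next, max(v_next, 0.0)      # floor the variance at zero (assumption)
        rs.append(r)
        vs.append(v)
    return rs, vs

# Hypothetical parameter values, with gamma = 0 as in the closed-form case.
rs, vs = simulate_fong_vasicek(r0=0.05, v0=0.0004, kappa_r=0.5, r_bar=0.06,
                               kappa_v=1.0, theta=0.0004, xi_v=0.02,
                               horizon=10.0, n_steps=2500)
print(f"final r: {rs[-1]:.4f}, final v: {vs[-1]:.6f}")
```

Flooring the variance is only one of several ways of handling negative values in a discretized square-root process, and the choice affects the simulated distribution near zero.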
Longstaff and Schwartz (1992) propose another model of the short-term rate in which the
volatility of the short-term rate is stochastic. The remarkable feature of their model is that
it is a general equilibrium model. Naturally, the Longstaff-Schwartz model predicts, as the
Fong-Vasicek model, that the bond price is a function of both the short-term rate and its
instantaneous volatility.
Note, then, the important feature of these models. The pricing function, P (r (t) , v (t) , T − t)
and, hence, the yield curve R (r (t) , v (t) , T − t) ≡ − (T − t)−1 ln P (r (t) , v (t) , T − t), depends
on the level of the short-term rate, r (t), and one additional factor, the instantaneous variance
of the short-term rate, v (t). Hence, these models predict that we now have two factors that
help explain the term-structure of interest rates, R (r (t) , v (t) , T − t).
What is the relation between the volatility of the short-term rate and the term-structure of
interest rates? Does this volatility help “track” one of the factors driving the variations of the
yield curve? Consider, first, the basic Vasicek (1977) model discussed in Section 11.3.3. This
model is simply a one-factor model, as it assumes that the volatility of the short-term rate is
constant. Yet this model can be used to develop intuition about the full stochastic volatility
models, such as the Fong and Vasicek (1991) in Eqs. (11.35).
Using the solution for the Vasicek model, we find that,
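$$\frac{\partial R(r(t), T - t)}{\partial \sigma} = -\frac{1}{T - t} \left[ \sigma \int_t^T B(T - s)^2\, ds + \lambda \int_t^T B(T - s)\, ds \right], \qquad (11.36)$$
where we recall that $B(T - s) = \frac{1}{\kappa} \left[ 1 - e^{-\kappa(T - s)} \right]$.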
The previous expression reveals that if λ ≥ 0, the term-structure of interest rates (and, hence,
the bond price) is always decreasing (increasing) in the volatility of the short-term rate. This
conclusion parallels a famous result in the option pricing literature, by which the option price is
always increasing in the volatility of the price of the asset underlying the contract. As explained
in Chapter 10, this property arises from the convexity of the price with respect to the state
variable of which we are contemplating changes in volatility.
The intriguing feature of the model arises when λ < 0, which is the empirically relevant
case to consider.11 In this case, the sign of $\partial R(t, T)/\partial\sigma$ is the result of a conflict between
“convexity” and “slope” effects. “Convexity” effects arise through the term $\sigma \int_t^T B(T - s)^2\, ds$,
and are referred to in this way because $\partial^2 P(r, T - t)/\partial r^2 = P(r, T - t)\, B(T - t)^2$. “Slope”
effects, instead, arise through the term $\int_t^T B(T - s)\, ds$, and are referred to in this way as
$\partial P(r, T - t)/\partial r = -P(r, T - t)\, B(T - t)$. If λ is negative, and large in absolute value, slope effects
can dominate convexity effects, and the term-structure can actually increase in the volatility
parameter σ.
For intermediate values of λ, the term-structure can be both increasing and decreasing in the
volatility parameter σ. Typically, at short maturity dates, the convexity effects in Eq. (11.36)
are dominated by the slope effects, and the short-end of the term-structure can be increasing in
σ. At longer maturity dates, convexity effects should be magnified and can sometimes dominate
the slope effects. As a result, the long-end of the term-structure can be decreasing in σ.
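The sign pattern just described can be checked numerically from Eq. (11.36). The following minimal sketch evaluates the two integrals with a midpoint rule, using B(x) = (1 − e^{−κx})/κ, and reports the derivative of the yield with respect to σ at a short and a long maturity for three values of λ. The function names and the parameter values are hypothetical and chosen only to display the three regimes.

```python
import math

def vasicek_B(kappa, x):
    """B(x) = (1 - exp(-kappa x)) / kappa, with x the time to maturity."""
    return (1.0 - math.exp(-kappa * x)) / kappa

def dR_dsigma(kappa, sigma, lam, maturity, n=2000):
    """Right-hand side of Eq. (11.36), with both integrals computed by a midpoint rule."""
    h = maturity / n
    int_B = 0.0   # "slope" integral of B
    int_B2 = 0.0  # "convexity" integral of B squared
    for i in range(n):
        b = vasicek_B(kappa, (i + 0.5) * h)
        int_B += b * h
        int_B2 += b * b * h
    return -(sigma * int_B2 + lam * int_B) / maturity

# Hypothetical parameter values.
kappa, sigma = 0.2, 0.02
for lam in (0.05, -0.05, -0.5):
    short_end = dR_dsigma(kappa, sigma, lam, 1.0)
    long_end = dR_dsigma(kappa, sigma, lam, 30.0)
    print(f"lambda {lam:+.2f}: dR/dsigma at 1y = {short_end:+.4f}, at 30y = {long_end:+.4f}")
```

With these values, the derivative is negative at both maturities for λ ≥ 0, positive at the short end and negative at the long end for the intermediate negative λ, and positive at both maturities when λ is negative and large in absolute value.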
Naturally, the previous conclusions are based on comparative statics for a simple model in
which the short-term rate has constant volatility. However, these comparative statics illustrate
well a general theory of bond price fluctuations in the more interesting case in which the short-
term rate volatility is stochastic, as for example in model (11.35) (see Mele (2003)). To develop
further intuition about the conflict between convexity and slope effects, consider the following
binomial example. In the next period, the short-term rate is either r+ = r +d or r+ = r −d with
equal probability, where r is the current interest rate level and d > 0. The price of a two-period
bond is P (r, d) = m(r, d)/ (1+r), where m(r, d) = E [1/ (1 + r+ )] is the expected discount factor
of the next period. By Jensen’s inequality, m(r, d) > 1/ (1 + E [r+ ]) = 1/ (1 + r) = m(r, 0).
Therefore, two-period bond prices increase upon activation of randomness. More generally, two-
period bond prices are always increasing in the “volatility” parameter d in this example (see
Figure 11.3).
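The binomial example is small enough to be computed directly. The sketch below codes the expected discount factor m(r, d) and the two-period bond price P(r, d), and evaluates the price on a grid of values of d. The function names and the numerical inputs are hypothetical.

```python
def expected_discount(r, d):
    """m(r, d) = E[1 / (1 + r_plus)], with r_plus = r + d or r - d with equal probability."""
    return 0.5 * (1.0 / (1.0 + r + d) + 1.0 / (1.0 + r - d))

def two_period_price(r, d):
    """P(r, d) = m(r, d) / (1 + r)."""
    return expected_discount(r, d) / (1.0 + r)

# Hypothetical current rate of 5% and a grid for the "volatility" parameter d.
r = 0.05
grid = [0.0, 0.01, 0.02, 0.03, 0.04]
prices = [two_period_price(r, d) for d in grid]
for d, p in zip(grid, prices):
    print(f"d = {d:.2f}: two-period bond price = {p:.6f}")
```

Algebraically, m(r, d) = (1 + r)/((1 + r)² − d²), which makes the monotonicity in d transparent for d < 1 + r.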
This property relates to an important result derived by Jagannathan (1984, p. 429-430) in the
option pricing area, and discussed in Chapter 10. Jagannathan’s insight is that in a two-period
economy with identical initial underlying asset prices, a terminal underlying asset price ỹ is a
mean preserving spread of another terminal underlying asset price x̃ (in the Rothschild and
Stiglitz (1970) sense) if and only if the price of a call option on ỹ is higher than the price of a
call option on x̃. This is because if ỹ is a mean preserving spread of x̃, then E [f (ỹ)] > E [f(x̃)]
for f increasing and convex.12
These arguments go through as we assumed that the expected short-term rate is independent
of d. Consider, instead, a multiplicative setting in which either r+ = r (1 + d) or r+ = r/ (1 + d)
with equal probability. Litterman, Scheinkman and Weiss (1991) show that in such a setting,
bond prices are decreasing in volatility at short maturity dates and increasing in volatility
at long maturity dates. This is because expected future interest rates increase over time at
a strength positively related to d. That is, the expected variation of the short-term rate is
increasing in the volatility of the short-term rate, d, a property that can be re-interpreted
as one arising in an economy with risk-averse agents. At short maturity dates, such an effect
11 In this simple model, it is more reasonable to assume λ < 0 rather than λ > 0. This is because positive risk-premia are observed
more frequently than negative risk-premia, and in this model, ur < 0. Together with λ < 0, ur < 0 ensures that the model generates
positive term-premia.
12 To make such a connection more transparent in terms of the Rothschild and Stiglitz (1970) theory, let $\tilde{m}_d(i^+) = 1/(1 + i^+)$ denote the random discount factor when $i^+ = i \mp d$. Clearly $x \mapsto -\tilde{m}_d(x)$ is increasing and concave, and so we must have: $E[-\tilde{m}_{d''}(x)] < E[-\tilde{m}_{d'}(x)] \Leftrightarrow d' < d''$, which is what Figure 11.3 demonstrates. In Jagannathan (1984), $f$ is increasing and convex, and so we must have: $E[f(\tilde{y})] > E[f(\tilde{x})] \Leftrightarrow \tilde{y}$ is riskier than (or a mean preserving spread of) $\tilde{x}$.
dominates the convexity effect illustrated in Figure 11.3. At longer maturity dates, the convexity
effect dominates.
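The short-maturity part of this argument can be illustrated with the same two-period bond as in the additive example. The sketch below uses hypothetical values for r and d and covers only the two-period case; it does not reproduce the long-maturity result of Litterman, Scheinkman and Weiss (1991).

```python
def expected_rate_multiplicative(r, d):
    """E[r+] when r+ = r (1 + d) or r+ = r / (1 + d), with equal odds."""
    return 0.5 * (r * (1.0 + d) + r / (1.0 + d))


def expected_discount_multiplicative(r, d):
    """E[1 / (1 + r+)] in the multiplicative setting."""
    up = r * (1.0 + d)
    down = r / (1.0 + d)
    return 0.5 * (1.0 / (1.0 + up) + 1.0 / (1.0 + down))


r = 0.05                       # hypothetical current short-term rate
spreads = [0.0, 0.2, 0.5]      # hypothetical values of d
expected_rates = [expected_rate_multiplicative(r, d) for d in spreads]
prices = [expected_discount_multiplicative(r, d) / (1.0 + r) for d in spreads]
```

Here the expected short-term rate rises with d, and for these values the resulting slope effect outweighs convexity, so the two-period price falls as d increases.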
This simple example illustrates the yield-curve / volatility relation in the Vasicek model
summarized by Eq. (11.36). As is clear, volatility changes do not generally represent a mean
preserving spread for the risk-neutral distribution in the term-structure framework considered
here. The seminal contribution of Jagannathan (1984) suggests that this is generally the case in
the option pricing domain. In models of the short-term rate, the short-term rate is not a traded
asset. Therefore, the risk-neutral drift function of the short-term rate does in general depend
on the short-term volatility. For example, in the simple and scalar Vasicek model, this property
activates slope effects, as Eq. (11.36) reveals. In this case, and in the more complex stochastic
volatility cases, it can be shown that if the risk-premium required to bear the interest rate
risk is negative and sufficiently large in absolute value, slope effects dominate convexity effects
at any finite maturity date, thus making bond prices decrease with volatility at any arbitrary
maturity date.
[Figure 11.3: the discount factor 1/(1 + r+) as a convex function of r+, with r − d′, r − d, r, r + d and r + d′ marked on the horizontal axis and the points a, b, B and A marked on the curve; m(r, d′) = (a + A)/2 and m(r, d) = (b + B)/2.]
FIGURE 11.3. A connection with the Rothschild-Stiglitz-Jagannathan theory: the simple case in which convexity of the discount factor induces bond prices to be increasing in volatility. If the risk-neutralized interest rate of the next period is either r+ = r + d or r+ = r − d with equal probability, the random discount factor 1/(1 + r+) is either B or b with equal probability. Hence m(r, d) = E[1/(1 + r+)] is the midpoint of bB. Similarly, if volatility is d′ > d, m(r, d′) is the midpoint of aA. Since ab > BA, it follows that m(r, d′) > m(r, d). Therefore, the two-period bond price P(r, d) = m(r, d)/(1 + r) satisfies: P(r, d′) > P(r, d) for d′ > d.
What are the implications of these results in terms of the classical factor analysis of the term-structure reviewed in Section 11.2? Clearly, the very short end of the yield curve is not affected by movements of the volatility, as $\lim_{T \to t} R(r(t), v(t), T - t) = r(t)$, for all possible values of $v(t)$. Also, in these models, we have that $\lim_{T \to \infty} R(r(t), v(t), T - t) = \bar{R}$, where $\bar{R}$ is a constant and, hence, independent of $v(t)$. Therefore, movements in the short-term volatility can only produce their effects on the middle of the yield curve. For example, if the risk-premium required to bear the interest rate risk is negative and sufficiently large, an upward movement in
v (t) can produce an effect on the yield curve qualitatively similar to that depicted in Figure 11.1
(“Curvature” panel), and would thus roughly mimic the “curvature” factor that we reviewed
in Section 11.2.
We need at least three factors to explain the entire variation in the yield-curve. A model in
which the interest rate volatility is stochastic may be far from being exhaustive in this respect. A
natural extension is a model in which the drift of the short-term rate contains some predictable
component, r̄ (τ ), which acts as a third factor, as in the following model:
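$$
\begin{aligned}
dr(\tau) &= \kappa_r \left(\bar{r}(\tau) - r(\tau)\right) d\tau + \sqrt{v(\tau)}\, r(\tau)^{\gamma}\, dW_1(\tau) \\
dv(\tau) &= \kappa_v \left(\theta - v(\tau)\right) d\tau + \xi_v \sqrt{v(\tau)}\, dW_2(\tau) \\
d\bar{r}(\tau) &= \kappa_{\bar{r}} \left(\bar{\imath} - \bar{r}(\tau)\right) d\tau + \xi_{\bar{r}} \sqrt{\bar{r}(\tau)}\, dW_3(\tau)
\end{aligned}
\tag{11.37}
$$
where $\kappa_r$, $\gamma$, $\kappa_v$, $\theta$, $\xi_v$, $\kappa_{\bar{r}}$, $\bar{\imath}$ and $\xi_{\bar{r}}$ are constants, and $[W_1\ W_2\ W_3]$ is a vector Brownian motion.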
Balduzzi et al. (1996) develop the first model in which the drift of the short-term rate changes
stochastically, as in Eqs. (11.37). Dai and Singleton (2000) consider a number of models that gen-
eralize that in Eqs. (11.37). The term-structure implications of these models can be understood
very simply. First, the bond price now has the form $P(r(t), \bar{r}(t), v(t), T - t)$ and, hence, the yield curve is, under reasonable assumptions on the risk-premia, $R(r(t), \bar{r}(t), v(t), T - t) \equiv -(T - t)^{-1} \ln P(r(t), \bar{r}(t), v(t), T - t)$. Second, and intuitively, changes in the new factor $\bar{r}(t)$
should primarily affect the long-end of the yield curve. This is because empirically, the usual
finding is that the short-term rate reverts relatively quickly to the long-term factor r̄ (τ ) (i.e.
κr is relatively high), where r̄ (τ ) mean-reverts slowly (i.e. κr̄ is relatively low). Ultimately,
the slow mean-reversion of r̄ (τ ) means that changes in r̄ (τ ) last for the relevant part of
the term-structure we are usually interested in (i.e. up to 30 years), despite the fact that $\lim_{T \to \infty} R(r(t), \bar{r}(t), v(t), T - t)$ is independent of the movements of the three factors $r(t)$, $\bar{r}(t)$ and $v(t)$.
However, it is difficult to see how to reconcile such a behavior of the long-end of the yield curve with the existence of any of the factors discussed in Section 11.2. First, the short-term rate cannot be taken as a “level factor”, since we know its effects die off relatively quickly. Instead, a joint change in both the short-term rate, r(t), and the “long-term” rate, r̄(t), would really be needed to mimic the “Level” panel of Figure 11.1 in Section 11.2. However, this interpretation is at odds with the assumption that the factors discussed in Section 11.2 are uncorrelated! Moreover, and crucially, the empirical results in Dai and Singleton (2000) reveal that, if anything, r(t) and r̄(t) are negatively correlated.
Finally, to emphasize how exacerbated these puzzles are, consider the effects of changes in
the short-term rate r (t). We know that the long-end of the term-structure is not affected by
movements of the short-term rate. Hence, the short-term rate acts as a “steepness” factor, as
in Figure 11.1 (“Slope” panel). However, this interpretation is restrictive, as factor analysis
reveals that the short-end and the long-end of the yield curve move in opposite directions after
a change in the steepness factor. Here, instead, a change in the short-term rate only modifies the short-end (and, perhaps, the middle) of the yield curve and, hence, does not produce any variation in the long-end of the curve.
The Vasicek and CIR models predict that the bond price is exponential-affine in the short-term
rate r. This property is the expression of a general phenomenon. Indeed, it is possible to show
that bond prices are exponential-affine in r if, and only if, the functions $b$ and $a^2$ are affine in r. Models that satisfy these conditions are known as affine models. More generally, these basic
results extend to multifactor models, in which bond prices are exponential-affine in the state
variables.13 In these models, the short-term rate is a function r (y) such that
r (y) = r0 + r1 · y,
where $r_0$ is a constant, $r_1$ is a vector, and y is a multidimensional diffusion in $\mathbb{R}^n$, solution to
$$
dy(\tau) = \kappa \left(\mu - y(\tau)\right) d\tau + \Sigma V(y(\tau))\, dW(\tau),
$$
where W is a d-dimensional Brownian motion, Σ is a full rank n × d matrix, and V is a full rank d × d diagonal matrix with elements,
$$
V(y)_{(ii)} = \sqrt{\alpha_i + \beta_i^{\top} y}, \quad i = 1, \cdots, d, \tag{11.38}
$$
for some scalars αi and vectors β i . Langetieg (1980) develops the first multifactor model of this
kind, in which β i = 0. Under the assumption that the risk-premia Λ are
Λ (y) = V (y) λ1 ,
for some d-dimensional vector λ1 , Duffie and Kan (1996) show that the bond price is exponential-
affine in the state variables y. That is, the price of the zero has the following functional form,
P (y, T − t) = exp [A (T − t) + B (T − t) · y] ,
for some functions A and B of time to maturity, T − t (B is vector-valued), such that A (0) = 0
and B (0)(i) = 0.
The clear advantage of affine models is that they considerably simplify the econometric
estimation, as explained previously.
11.3.6.2 Quadratic
Affine models are known to impose tight conditions on the structure of the volatility of the
state variables. These restrictions arise to keep the square root in Eq. (11.38) real valued. But
these constraints may hinder the actual performance of the models. There exists another class
of models, known as quadratic models, that partially overcome these difficulties.
13 More generally, we say that affine models are those that make the characteristic function exponential-affine in the state variables.
In the case of the multifactor interest rate models of the previous section, this condition is equivalent to the condition that bond
prices are exponential affine in the state variables.
where the previous equation is written under the risk-neutral probability, and bJ is thus a jump-
adjusted risk-neutral drift. For all (r, τ ) ∈ R++ × [t, T ), the bond price P (r, τ , T ) is then the
solution to,
$$
0 = \left(\frac{\partial}{\partial \tau} + L - r\right) P(r, \tau, T) + v^{Q} \int_{\mathrm{supp}(S)} \left[P(r + S, \tau, T) - P(r, \tau, T)\right] p(dS), \tag{11.39}
$$
where N is the number of jump types, but here for simplicity we just set N = 1.
As regards the risk-neutral distribution, the important thing as usual is to identify the risk-
premia. Here we simply have:
vQ = v · λJ ,
where v is the intensity of the short-term rate jump under the physical distribution, and λJ is
the risk-premium demanded by agents to be compensated for the presence of jumps.15
Bonds subject to default-risk can be modeled through partial differential equations. This is particularly the case when default is considered as an exogenously given rare event modeled as a Poisson process. This is the so-called “reduced-form” approach. Precisely, assume that the
event of default at each instant of time is a Poisson process Z with intensity v,16 and assume
that in the event of default at point τ , the holder of the bond receives a recovery payment
P̄ (τ ) which can be a deterministic function of time (e.g., a constant) or more generally, a
σ (r(s) : t ≤ s ≤ τ )-adapted process satisfying some basic regularity conditions.
Next, let τ̂ be the random default time, and let’s create an auxiliary state variable g with
the following features:
$$
g = \begin{cases} 0 & \text{if } t \le \tau < \hat{\tau} \\ 1 & \text{otherwise} \end{cases}
$$
The relevant information for an investor is thus given by the following risk-neutral dynamics:
$$
\begin{cases}
dr(\tau) = b(r(\tau))\, d\tau + a(r(\tau))\, dW(\tau) \\
dg(\tau) = S \cdot dN(\tau), \quad \text{where } S \equiv 1, \text{ with probability one}
\end{cases}
\tag{11.40}
$$
Denote the rational bond price function as P(r, g, τ, T), τ ∈ [t, T]. It is assumed that ∀τ ∈ [t, T] and ∀v ∈ (0, ∞), P(r, 1, τ, T) = P̄(τ) < P(r, 0, τ, T) a.s. As shown below, such an assumption, plus the assumption that P̄(τ; v′) ≥ P̄(τ; v) ⇔ v′ ≥ v, is sufficient to guarantee that default-free bond prices are higher than defaultable bond prices.
14 Just use y(τ ) ≡ b(τ )−1 u(r(τ ), τ , T ), where b solves db(τ ) = r(τ )b(τ )dτ (in differential form), for the connection between Eq.
By the usual absence of arbitrage opportunities arguments, the following equation is satisfied by the pre-default bond price $P(r, 0, \tau, T) = P^{\mathrm{pre}}(r, \tau, T)$:
$$
\begin{aligned}
0 &= \left(\frac{\partial}{\partial \tau} + L - r\right) P(r, 0, \tau, T) + v(r) \cdot \left[P(r, 1, \tau, T) - P(r, 0, \tau, T)\right] \\
&= \left(\frac{\partial}{\partial \tau} + L - (r + v(r))\right) P(r, 0, \tau, T) + v(r) \bar{P}(\tau), \quad \tau \in [t, T),
\end{aligned}
\tag{11.41}
$$
where E∗ [·] is the expectation operator taken with reference to only the first equation of system
(11.40). This coincides with Duffie and Singleton (1999, Eq. (10) p. 696) when we define a
percentage loss process l in [0, 1] so as to have P̄ = (1 − l) · P . Indeed, inserting P̄ = (1 − l) · P
into Eq. (11.41) gives:
$$
0 = \left(\frac{\partial}{\partial \tau} + L - (r + l(\tau) v(r))\right) P(r, 0, \tau, T), \quad \forall (r, \tau) \in \mathbb{R}_{++} \times [t, T),
$$
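The default-adjusted discounting in the previous equation is easiest to see in a special case. The following is a minimal sketch under assumptions that are not in the text: the short-term rate r, the default intensity v and the loss rate l are all constant, so that the equation reduces to an ordinary differential equation with solution exp(−(r + l·v)(T − τ)). All numerical values are hypothetical.

```python
import math


def pre_default_price(r, v, loss, horizon):
    """Pre-default price of a zero when r, v and the loss rate are constant.

    Under these assumed simplifications, the pricing equation reads
    dP/dtau = (r + loss * v) * P, with P = 1 at maturity.
    """
    return math.exp(-(r + loss * v) * horizon)


r, loss, horizon = 0.04, 0.6, 5.0     # hypothetical values
default_free = pre_default_price(r, 0.0, loss, horizon)
low_intensity = pre_default_price(r, 0.01, loss, horizon)
high_intensity = pre_default_price(r, 0.03, loss, horizon)

# Yield spread of the riskier bond over the default-free bond.
credit_spread = -math.log(high_intensity / default_free) / horizon
```

In this special case the yield spread over the default-free bond equals l·v, and the price falls as the intensity rises.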
To validate the claim that the bond price is decreasing with v, consider two economies A and B
in which the corresponding default-intensities are v A and vB , and assume that the coefficients
of L don’t depend on default-intensity. The pre-default bond price function in economy i is
P i (r, τ , T ), i = A, B, and satisfies:
$$
0 = \left(\frac{\partial}{\partial \tau} + L - r\right) P^{i} + v^{i} \cdot \left(\bar{P}^{i} - P^{i}\right), \quad i = A, B,
$$
with the usual boundary condition. Subtracting these two equations and rearranging terms reveals that the price difference $\Delta P(r, \tau, T) \equiv P^{A}(r, \tau, T) - P^{B}(r, \tau, T)$ satisfies, $\forall (r, \tau) \in \mathbb{R}_{++} \times [t, T)$,
$$
0 = \left(\frac{\partial}{\partial \tau} + L - (r + v^{A})\right) \Delta P(r, \tau, T) + \left(v^{A} - v^{B}\right) \cdot \left[\bar{P}^{B}(\tau) - P^{B}(r, \tau, T)\right] + v^{A} \left[\bar{P}^{A}(\tau) - \bar{P}^{B}(\tau)\right],
$$
with $\Delta P(r, T, T) = 0$, $\forall r \in \mathbb{R}_{++}$. Given the previous assumptions, the proof is complete by an application of the maximum principle (see Appendix ? in Chapter 6).
11.4. No-arbitrage models
where the symbol $(a)^{+}$ denotes max(0, a). As an example, in affine models, P is lognormal whenever r is normally distributed. This happens precisely for the Vasicek model. The intuition developed for the Black and Scholes (1973) (BS) formula suggests that in this case, the previous expectation is a nonlinear function of the current bond price P(r(t), t, T). This claim cannot be shown with the simple risk-neutral tools used to show the BS formula. One of the troubles is due to the presence of the $e^{-\int_t^T r(\tau) d\tau}$ term inside the brackets, which is obviously unknown at the time of evaluation t. But the problem is tractable, thanks to the forward martingale probability introduced in Section 11.2.4. Precisely, let $1_{\mathrm{ex}}$ be the indicator of all events s.t. the option is exercised, i.e., that P(r(T), T, S) ≥ K. We have:
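$$
\begin{aligned}
C^{b}(r(t), t, T, S)
&= E\left[e^{-\int_t^T r(\tau) d\tau} P(r(T), T, S) \cdot 1_{\mathrm{ex}}\right] - K \cdot E\left[e^{-\int_t^T r(\tau) d\tau} \cdot 1_{\mathrm{ex}}\right] \\
&= P(r(t), t, S) \cdot E\left[\frac{e^{-\int_t^S r(\tau) d\tau}}{P(r(t), t, S)} \cdot 1_{\mathrm{ex}}\right] - K P(r(t), t, T) \cdot E\left[\frac{e^{-\int_t^T r(\tau) d\tau}}{P(r(t), t, T)} \cdot 1_{\mathrm{ex}}\right]
\end{aligned}
$$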
where the first term in the second equality has been derived by an argument nearly identical to that produced in Section 11.1 (see footnote 2);17 $Q_F^{i}$ (i = T, S) is the i-forward probability; and finally, $E_{Q_F^{i}}[\cdot]$ is the expectation taken under the i-forward martingale probability (see Section 11.1 for more details).
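The role of the forward probabilities in this decomposition can be spelled out as follows. This is only a sketch: it assumes that the i-forward probability is defined through the usual Radon-Nikodym derivative (the precise definition is the one given in Section 11.1), and that the resulting expression is the one the text labels Eq. (11.42).

```latex
% Assumed definition of the i-forward probability, i = T, S:
\frac{dQ_F^{i}}{dQ} = \frac{e^{-\int_t^{i} r(\tau)\,d\tau}}{P(r(t), t, i)},
\qquad\text{so that}\qquad
E\!\left[\frac{e^{-\int_t^{i} r(\tau)\,d\tau}}{P(r(t), t, i)} \cdot 1_{\mathrm{ex}}\right]
= E_{Q_F^{i}}\!\left[1_{\mathrm{ex}}\right]
= Q_F^{i}\!\left[P(r(T), T, S) \ge K\right].

% Substituting into the second equality for C^b:
C^{b}(r(t), t, T, S)
= P(r(t), t, S) \cdot Q_F^{S}\!\left[P(r(T), T, S) \ge K\right]
- K\, P(r(t), t, T) \cdot Q_F^{T}\!\left[P(r(T), T, S) \ge K\right].
```

Under this reading, pricing the option reduces to computing two exercise probabilities, one under each forward probability, which parallels the structure of the BS formula.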
In Section 11.7, we will learn how to compute the two probabilities in Eq. (11.42). For now, it
should be clear that the bond option price does depend on theoretical bond prices P (r(t), t, T )
and P (r(t), t, S) which in turn, are generally never equal to the current, observed market prices.
This is so because P (r(t), t, T ) is only the output of a standard rational expectations model.
Naturally, this is not a source of concern to those who simply wish to predict future term-
structure movements with the help of a few, key state variables (as in the multifactor models
discussed before). But practitioners concerned with bond option pricing need a model that
perfectly matches the observed term-structure they face at the time of evaluation. The aim
of the models studied in this section is to exactly fit the initial term-structure, which for this
reason we call “perfectly fitting models.”
We do not develop a general model-building principle. Rather, we present specific models that effectively deserve the “perfectly fitting” qualification. We shall focus on two
celebrated models: the Ho and Lee (1986) model, and one generalization of it, introduced by
Hull and White (1990). In all cases, the general modeling principle is to generate bond prices
expiring at some date S that are of course random at time T < S, but also exactly equal to
the current observed bond prices (at time t). Finally, these prices must be arbitrage-free. As
we show, these conditions can be met by augmenting the models seen in the previous sections
with a set of “infinite dimensional parameters”.
A final remark. In Section 11.7, we will show that at least for the Vasicek model, Eq. (11.42) does not explicitly depend on r because it only “depends” on P(r(t), t, T) and P(r(t), t, S). That is the essence of the celebrated Jamshidian (1989) formula. So why do we look for perfectly fitting models in the first place? After all, it would be sufficient to use Jamshidian’s formulae in Section 11.7, and replace P(r(t), t, T) and P(r(t), t, S) with the corresponding market values, P$(t, T) and P$(t, S) (say). This way, the model is perfectly fitting. Apart from being theoretically inconsistent (you would have a model predicting something generically different from prices), this way of thinking also leads to some practical drawbacks. As we will show in Section 11.7, Jamshidian’s bond option formula agrees “in notation” with that obtained with the corresponding perfectly fitting model. But as we move to more complex
interest rate derivatives, the situation becomes dramatically different. This is the case, for
example, of options on coupon bonds and swaption contracts (see Section 11.7.3 for precise
details on this). Finally, it may be the case that some maturity dates are actually not traded at
some point in time. As an example, it may happen that P $ (t, T ) is not observed. (Furthermore,
it may happen that one may wish to price more “exotic,” or less liquid bonds or options
on these bonds.) An intuitive procedure to face up to this difficulty is to “interpolate” the
observed, traded maturities. In fact, the objective of perfectly fitting models is to allow for such
an “interpolation” while preserving absence of arbitrage opportunities.
Clearly, Eq. (11.43) gives rise to an affine model. Therefore, the bond price takes the following form,
$$
P(r(\tau), \tau, T) = e^{A(\tau, T) - B(\tau, T) \cdot r(\tau)}, \tag{11.44}
$$
for two functions A and B to be determined below. It is easy to show that,
$$
A(\tau, T) = \int_\tau^T \theta(s)(s - T)\, ds + \frac{1}{6} \sigma^2 (T - \tau)^3, \quad B(\tau, T) = T - \tau.
$$
The instantaneous forward rate f(τ, T) predicted by the model is then,
$$
f(\tau, T) = -\frac{\partial \ln P(r(\tau), \tau, T)}{\partial T} = -A_2(\tau, T) + B_2(\tau, T) \cdot r(\tau), \tag{11.45}
$$
where $A_2(\tau, T) \equiv \partial A(\tau, T)/\partial T$ and $B_2(\tau, T) \equiv \partial B(\tau, T)/\partial T$.
On the other hand, let $f_{\$}(t, \tau)$ denote the instantaneous, observed forward rate. Matching $f(t, \tau)$ to $f_{\$}(t, \tau)$ yields:
$$
f_{\$}(t, \tau) = f(t, \tau) = \int_t^\tau \theta(s)\, ds - \frac{1}{2} \sigma^2 (\tau - t)^2 + r(t), \tag{11.46}
$$
where we have evaluated the two partial derivatives $A_2(t, \tau)$ and $B_2(t, \tau)$. Because $P(t, T) = \exp\left(-\int_t^T f(t, \tau)\, d\tau\right)$, the drift term θ(τ) satisfying Eq. (11.46) also guarantees an exact fit of the term-structure. By differentiating the previous equation with respect to τ, one obtains the solution for θ,19
$$
\theta(\tau) = \frac{\partial}{\partial \tau} f_{\$}(t, \tau) + \sigma^2 (\tau - t). \tag{11.47}
$$
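The check suggested in footnote 19 can also be done numerically. The sketch below takes t = 0 and assumes a purely hypothetical observed forward curve and volatility (neither is taken from the text), builds θ from Eq. (11.47), and then verifies that the right hand side of Eq. (11.46) returns the assumed curve.

```python
import math

SIGMA = 0.01  # hypothetical short-term rate volatility


def market_forward(tau):
    """Hypothetical observed forward curve f$(0, tau)."""
    return 0.03 + 0.01 * (1.0 - math.exp(-0.5 * tau))


def market_forward_slope(tau):
    """Derivative of the hypothetical forward curve with respect to tau."""
    return 0.005 * math.exp(-0.5 * tau)


def theta(tau):
    """Eq. (11.47) with t = 0: slope of the forward curve plus sigma^2 * tau."""
    return market_forward_slope(tau) + SIGMA ** 2 * tau


def model_forward(tau, steps=2000):
    """Right hand side of Eq. (11.46) with t = 0 (trapezoid rule for the integral)."""
    h = tau / steps
    integral = sum(0.5 * h * (theta(k * h) + theta((k + 1) * h))
                   for k in range(steps))
    return integral - 0.5 * SIGMA ** 2 * tau ** 2 + market_forward(0.0)


maturities = [0.5, 1.0, 2.0, 5.0, 10.0]
gaps = [abs(model_forward(tau) - market_forward(tau)) for tau in maturities]
```

The entries of `gaps` differ from zero only by the discretization error of the trapezoid rule, in line with Eq. (11.46) holding as an identity once θ is chosen as in Eq. (11.47).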
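and
$$
B(\tau, T) = \frac{1}{\kappa} \left[1 - e^{-\kappa (T - \tau)}\right]. \tag{11.50}
$$
By reiterating the same reasoning produced to show (11.47), one shows that the solution for θ is:
$$
\theta(\tau) = \frac{\partial}{\partial \tau} f_{\$}(t, \tau) + \kappa f_{\$}(t, \tau) + \frac{\sigma^2}{2\kappa} \left[1 - e^{-2\kappa (\tau - t)}\right]. \tag{11.51}
$$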
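As a consistency check between Eqs. (11.47) and (11.51), one may let the mean-reversion parameter κ go to zero. The short computation below is a sketch that uses only the two formulas above and a first-order expansion of the exponential.

```latex
% First-order expansion: 1 - e^{-2\kappa(\tau - t)} = 2\kappa(\tau - t) + O(\kappa^2).
\lim_{\kappa \to 0} \frac{\sigma^2}{2\kappa}\left[1 - e^{-2\kappa(\tau - t)}\right] = \sigma^2 (\tau - t),
\qquad
\lim_{\kappa \to 0} \kappa f_{\$}(t, \tau) = 0,

% so that Eq. (11.51) collapses to Eq. (11.47):
\lim_{\kappa \to 0} \theta(\tau) = \frac{\partial}{\partial \tau} f_{\$}(t, \tau) + \sigma^2 (\tau - t).
```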
19 To check that θ is indeed the solution, replace Eq. (11.47) into Eq. (11.46) and verify that Eq. (11.46) holds as an identity.
11.4.4 Critiques
Two important critiques to these models:
- As we shall see in Section 11.6, closed-form solutions for options on bond prices are easy to implement when the short-term rate is Gaussian. We will use the T-forward probability machinery to show this. In principle, these models could also be used to price caps, floors and swaptions. But in general, no closed-form solutions are available that reproduce the standard market practice. This difficulty is overcome by a class of models known as “market models”, which is built upon the modelling principles of the HJM models examined in Section 11.4.
- Intertemporal inconsistencies: the θ functions have to be re-calibrated every single day. (As Eq. (11.47) demonstrates, at time t, θ(τ) depends on the slope of f$, which can change every day.) This kind of problem is present in HJM-type models.
- Stochastic string shocks models.
$$
P(\tau, T) = e^{-\int_\tau^T f(\tau, \ell)\, d\ell}, \quad \text{all } \tau \in [t, T], \tag{11.52}
$$
is the starting point of a now popular modeling approach originally developed by Heath, Jarrow
and Morton (1992) (HJM, henceforth). Given (11.52), the modeling strategy of this approach
is to take as primitive the τ -stochastic evolution of the entire structure of forward rates (not
only the short-term rate $r(t) = \lim_{\ell \downarrow t} f(t, \ell) \equiv f(t, t)$).20 Given (11.52) and the initial, observed structure of forward rates $\{f(t, \ell)\}_{\ell \in [t, T]}$, no-arbitrage “cross-equations” relations arise to restrict the stochastic behavior of $\{f(\tau, \ell)\}_{\tau \in (t, \ell]}$ for any $\ell \in [t, T]$.
20 One of the many checks of the internal consistency of any model consists in checking that the given model produces: $\partial P(t, T)/\partial T = -E[r(T) \exp(-\int_t^T r(u)\, du)] = -f(t, T) P(t, T)$ and by continuity, $\lim_{T \downarrow t} \partial P(t, T)/\partial T = -r(t) = -f(t, t)$.
11.5. The Heath-Jarrow-Morton model
By construction, the HJM approach allows for a perfect fit of the initial term-structure. This
point may be grasped very simply by noticing that the bond price P (τ , T ) is,
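$$
\begin{aligned}
P(\tau, T) &= e^{-\int_\tau^T f(\tau, \ell)\, d\ell} \\
&= \frac{P(t, T)}{P(t, \tau)} \cdot \frac{P(t, \tau)}{P(t, T)}\, e^{-\int_\tau^T f(\tau, \ell)\, d\ell} \\
&= \frac{P(t, T)}{P(t, \tau)} \cdot e^{-\int_t^\tau f(t, \ell)\, d\ell + \int_t^T f(t, \ell)\, d\ell - \int_\tau^T f(\tau, \ell)\, d\ell} \\
&= \frac{P(t, T)}{P(t, \tau)} \cdot e^{\int_\tau^T f(t, \ell)\, d\ell - \int_\tau^T f(\tau, \ell)\, d\ell} \\
&= \frac{P(t, T)}{P(t, \tau)} \cdot e^{-\int_\tau^T \left[f(\tau, \ell) - f(t, \ell)\right] d\ell}.
\end{aligned}
$$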
The key point of the HJM methodology is to take the current forward rates structure $f(t, \ell)$ as given, and to model the future forward rate movements,
$$
f(\tau, \ell) - f(t, \ell).
$$
Therefore, the HJM methodology takes the current term-structure as given and, hence, perfectly fitted, as we observe both P(t, T) and P(t, τ). In contrast, the other approach to interest rate
modeling is to model the current bond price P (t, T ) by means of a model for the short-term
rate (see Section 11.3) and, hence, does not fit the initial term structure. As we explained in
the previous section, fitting the initial term-structure is an important issue when the model’s
user is concerned with pricing interest-rate derivatives.
The primitive is still a Brownian information structure. Therefore, if we want to model future
movements of {f (τ , T )}τ ∈[t,T ] , we also have to accept that for every T , {f (τ , T )}τ ∈[t,T ] is F(τ )-
adapted. Under the Brownian information structure, there thus exist functionals α and σ such
that, for any T ,
dτ f (τ , T ) = α(τ , T )dτ + σ(τ , T )dW (τ ), τ ∈ (t, T ], (11.53)
where f(t, T) is given. The solution is thus
$$
f(\tau, T) = f(t, T) + \int_t^\tau \alpha(s, T)\, ds + \int_t^\tau \sigma(s, T)\, dW(s), \quad \tau \in (t, T]. \tag{11.54}
$$
The next step is to derive restrictions on α that are consistent with absence of arbitrage opportunities. Let $X(\tau) \equiv -\int_\tau^T f(\tau, \ell)\, d\ell$. We have
$$
dX(\tau) = f(\tau, \tau)\, d\tau - \int_\tau^T \left(d_\tau f(\tau, \ell)\right) d\ell = \left[r(\tau) - \alpha^{I}(\tau, T)\right] d\tau - \sigma^{I}(\tau, T)\, dW(\tau),
$$
where
$$
\alpha^{I}(\tau, T) \equiv \int_\tau^T \alpha(\tau, \ell)\, d\ell; \quad \sigma^{I}(\tau, T) \equiv \int_\tau^T \sigma(\tau, \ell)\, d\ell.
$$
$$
\alpha^{I}(\tau, T) = \frac{1}{2} \left\| \sigma^{I}(\tau, T) \right\|^2 + \sigma^{I}(\tau, T) \lambda(\tau). \tag{11.55}
$$
Differentiating the previous relation with respect to T gives us the arbitrage restriction that we were looking for:
$$
\alpha(\tau, T) = \sigma(\tau, T) \int_\tau^T \sigma(\tau, \ell)^{\top} d\ell + \sigma(\tau, T) \lambda(\tau). \tag{11.56}
$$
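To see the restriction (11.56) at work, the sketch below evaluates it in the scalar case for an exponentially decaying forward-rate volatility, σ(τ, T) = σ·exp(−κ(T − τ)). This volatility function and all parameter values are assumptions made for illustration. The drift is computed once by numerical integration of Eq. (11.56) and once from the closed form that the integral admits in this case.

```python
import math

KAPPA, SIGMA = 0.2, 0.01   # hypothetical parameters


def sigma_fwd(tau, maturity):
    """Assumed forward-rate volatility: sigma * exp(-kappa * (maturity - tau))."""
    return SIGMA * math.exp(-KAPPA * (maturity - tau))


def hjm_drift(tau, maturity, lam=0.0, steps=2000):
    """Eq. (11.56) in the scalar case, with the integral done by the trapezoid rule."""
    h = (maturity - tau) / steps
    integral = sum(
        0.5 * h * (sigma_fwd(tau, tau + k * h) + sigma_fwd(tau, tau + (k + 1) * h))
        for k in range(steps)
    )
    return sigma_fwd(tau, maturity) * (integral + lam)


def hjm_drift_closed_form(tau, maturity, lam=0.0):
    """Same drift, using the exact integral of the assumed volatility function."""
    x = maturity - tau
    integral = SIGMA * (1.0 - math.exp(-KAPPA * x)) / KAPPA
    return sigma_fwd(tau, maturity) * (integral + lam)


numeric = hjm_drift(0.0, 5.0, lam=-0.1)
exact = hjm_drift_closed_form(0.0, 5.0, lam=-0.1)
```

The point of the exercise is that, once σ(·, ·) and λ are chosen, the drift of every forward rate is pinned down; nothing else is left to specify.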
where
$$
\alpha_2(s, \tau) = \sigma_2(s, \tau) \int_s^\tau \sigma(s, \ell)^{\top} d\ell + \sigma(s, \tau) \sigma(s, \tau)^{\top} + \sigma_2(s, \tau) \lambda(s).
$$
As is clear, the short-term rate is in general non-Markov. However, the short-term rate can be
“risk-neutralized” and used to price exotics through simulations.
11.5.4 Embedding
At first glance, it might be guessed that HJM models are quite distinct from the models of the short-term rate introduced in Section 11.3. However, there exist “embeddability” conditions turning HJM into short-term rate models, and vice versa, a property known as “universality” of HJM models.
11.5.4.1 Markovianity
One natural question to ask is whether there are conditions under which HJM-type models
predict the short-term rate to be a Markov process. The question is natural insofar as it re-
lates to the early literature in which the entire term-structure was driven by a scalar Markov
process representing the dynamics of the short-term rate. The answer to this question is in
the contribution of Carverhill (1994). Another important contribution in this area is due to
Ritchken and Sankarasubramanian (1995), who studied conditions under which it is possible to enlarge the original state vector in such a manner that the resulting “augmented” state vector is Markov and, at the same time, includes the short-term rate as a component. The resulting model closely resembles some of the short-term rate models surveyed in Section 11.3. In these models, the short-term rate is not Markov, yet it is part of a system that is Markov. Here we
only consider the simple Markov scalar case.
Assume the forward-rate volatility structure is deterministic and takes the following form:
σ(t, T ) = g1 (t)g2 (T ) all t, T . (11.58)
Done. This is Markov. Condition (11.58) is then a condition for the HJM model to predict
that the short-term rate is Markov.
Mean-reversion is ensured by the condition that $g_2' < 0$, uniformly. As an example, take λ = constant, and:
$$
g_1(t) = \sigma \cdot e^{\kappa t}, \ \sigma > 0, \quad g_2(t) = e^{-\kappa t}, \ \kappa > 0.
$$
This is the Hull-White model discussed in Section 11.3.
11.5.4.2 Short-term rate reductions
We prove everything in the Markov case. Let the short-term rate be solution to:
dr(τ ) = b̄(τ , r(τ ))dτ + a(τ , r(τ ))dW̃ (τ ),
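By Itô’s lemma,
$$
df = \left[\frac{\partial}{\partial t} f + \bar{b} f_r + \frac{1}{2} a^2 f_{rr}\right] d\tau + a f_r\, d\tilde{W}.
$$
But for f(r, t, T) to be consistent with the solution to Eq. (11.54), it must be the case that
$$
\begin{aligned}
\alpha(t, T) - \sigma(t, T) \lambda(t) &= \frac{\partial}{\partial t} f(r, t, T) + \bar{b}(t, r) f_r(r, t, T) + \frac{1}{2} a(t, r)^2 f_{rr}(r, t, T) \\
\sigma(t, T) &= a(t, r) f_r(r, t, T)
\end{aligned}
\tag{11.59}
$$
and
$$
f(t, T) = f(r, t, T). \tag{11.60}
$$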
In particular, the last condition can only be satisfied if the short-term rate model under con-
sideration is of the perfectly fitting type.
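and,
$$
c(\tau, T_1, T_2) \equiv \mathrm{corr}\left[df(\tau, T_1), df(\tau, T_2)\right] = \frac{\sum_{i=1}^{N} \sigma_i(\tau, T_1)\, \sigma_i(\tau, T_2)}{\left\|\sigma(\tau, T_1)\right\| \cdot \left\|\sigma(\tau, T_2)\right\|}. \tag{11.61}
$$
By replacing this result into Eq. (11.56),
$$
\begin{aligned}
\alpha(\tau, T) &= \sigma(\tau, T) \cdot \int_\tau^T \sigma(\tau, \ell)^{\top} d\ell + \sigma(\tau, T) \lambda(\tau) \\
&= \int_\tau^T \left\|\sigma(\tau, \ell)\right\| \left\|\sigma(\tau, T)\right\| c(\tau, \ell, T)\, d\ell + \sigma(\tau, T) \lambda(\tau).
\end{aligned}
$$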
One drawback of this model is that the correlation matrix of any (N + M)-dimensional vector of forward rates is degenerate for M ≥ 1. Stochastic string models overcome this difficulty by modeling in an independent way the correlation structure $c(\tau, \tau_1, \tau_2)$ for all $\tau_1$ and $\tau_2$ rather than implying it from a given N-factor model (as in Eq. (11.61)). In other terms, the HJM methodology uses the functions $\sigma_i$ to accommodate both the volatility and the correlation structure of forward rates. This is unlikely to be a good model in practice. As we will now see, stochastic string models have two separate functions with which to model volatility and correlation.
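The degeneracy just mentioned is easy to verify in a small case. The sketch below builds the correlation matrix of Eq. (11.61) for N = 2 factors and N + M = 3 forward rates, using hypothetical factor loadings, and computes its determinant.

```python
import math


def forward_correlation(loadings):
    """Correlation matrix of forward-rate changes implied by Eq. (11.61).

    loadings[i][k] plays the role of sigma_i(tau, T_k): the exposure of the
    forward rate with maturity T_k to factor i.
    """
    n_factors = len(loadings)
    n_maturities = len(loadings[0])
    norms = [math.sqrt(sum(loadings[i][k] ** 2 for i in range(n_factors)))
             for k in range(n_maturities)]
    return [[sum(loadings[i][j] * loadings[i][k] for i in range(n_factors))
             / (norms[j] * norms[k])
             for k in range(n_maturities)]
            for j in range(n_maturities)]


def det3(m):
    """Determinant of a 3 x 3 matrix, by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical loadings: two factors, three forward rates.
loadings = [[0.010, 0.010, 0.010],     # a "level"-type factor
            [-0.005, 0.000, 0.005]]    # a "slope"-type factor
corr = forward_correlation(loadings)
determinant = det3(corr)
```

The determinant is zero up to rounding error: with two factors, any three forward rates have a singular correlation matrix, whatever the loadings.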
The starting point is a model in which the forward rate is solution to,
dτ f (τ , T ) = α (τ , T ) dτ + σ (τ , T ) dτ Z (τ , T ) ,
where the string Z satisfies the following five properties:
11.6. Stochastic string shocks models
Properties (iii), (iv) and (v) make Z Markovian. The functional form for ψ is crucially impor-
tant to guarantee this property. Given the previous properties, we can deduce a key property
of the forward rates. We have,
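$$
\sqrt{\mathrm{var}\left[df(\tau, T)\right]} = \sigma(\tau, T)
$$
$$
c(\tau, T_1, T_2) \equiv \mathrm{corr}\left[df(\tau, T_1), df(\tau, T_2)\right] = \frac{\sigma(\tau, T_1)\, \sigma(\tau, T_2)\, \psi(T_1, T_2)}{\sigma(\tau, T_1)\, \sigma(\tau, T_2)} = \psi(T_1, T_2)
$$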
As claimed before, we now have two separate functions with which to model volatility and
correlation.
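where as usual, $\alpha^{I}(\tau, T) \equiv \int_\tau^T \alpha(\tau, \ell)\, d\ell$. But $P(\tau, T) = \exp(X(\tau))$. Therefore,
$$
\begin{aligned}
\frac{dP(\tau, T)}{P(\tau, T)} &= dX(\tau) + \frac{1}{2} \mathrm{var}\left[dX(\tau)\right] \\
&= \left[r(\tau) - \alpha^{I}(\tau, T) + \frac{1}{2} \int_\tau^T \int_\tau^T \sigma(\tau, \ell_1)\, \sigma(\tau, \ell_2)\, \psi(\ell_1, \ell_2)\, d\ell_1\, d\ell_2\right] d\tau \\
&\quad - \int_\tau^T \left[\sigma(\tau, \ell)\, d_\tau Z(\tau, \ell)\right] d\ell.
\end{aligned}
$$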
where T denotes the set of all “risks” spanned by the string Z, and φ is the corresponding
family of “unit risk-premia”.
By absence of arbitrage opportunities,
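$$
0 = E\left[d(P\xi)\right] = E\left[P\xi \cdot \left(\mathrm{drift}\left(\frac{dP}{P}\right) + \mathrm{drift}\left(\frac{d\xi}{\xi}\right) + \mathrm{cov}\left(\frac{dP}{P}, \frac{d\xi}{\xi}\right)\right)\right].
$$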
11.7. Interest rate derivatives
11.7.2 Notation
We introduce notation that will prove useful to price interest rate derivatives. For a given
non-decreasing sequence of dates $\{T_i\}_{i = 0, 1, \cdots}$, we set,
$$
F_i(\tau) \equiv F(\tau, T_i, T_{i+1}). \tag{11.64}
$$
where the last equality follows by the same argument leading to Eq. (11.42). Therefore, we have
the put-call parity relation:
where Put (t, T ; P (t, S) , K) is the price of a European put written on a zero expiring at time
S, expiring at time T < S, and struck at K, and Call (·) denotes the corresponding call price.
where K is the strike of the option. In terms of PDEs, C b is solution to Eq. (11.63) with π ≡ 0
and boundary condition C b (r, T, T, S) = (P (r, T, S) − K)+ , where P (r, τ , S) is also the solution
to Eq. (11.63) with π ≡ 0, but with boundary condition P (r, S, S) = 1. In terms of PDEs, the
situation seems hopeless. As we show below, the problem can be considerably simplified with
the help of the T -forward martingale probability introduced in Section 11.1. In fact, we shall
show that under the assumption that the short-term rate is a Gaussian process, Eq. (11.68) has
a closed-form expression. We now present two models enabling this. The first one was developed
in a seminal paper by Jamshidian (1989), and the second one is, simply, its perfectly fitting
extension.
11.7.4.1 Jamshidian & Vasicek
Suppose that the short-term rate is solution to the Vasicek model considered in Section 11.3 (see Eq. (11.27)):
$$
dr(\tau) = (\theta - \kappa r(\tau))\, d\tau + \sigma\, d\tilde{W},
$$
where $\tilde{W}$ is a Q-Brownian motion and $\theta \equiv \bar{\theta} - \sigma\lambda$. As shown in Section 11.3 (see Eq. (11.29)), the bond price takes the following form:
$$
P(r(\tau), \tau, S) = e^{A(\tau, S) - B(\tau, S) r(\tau)},
$$
for some function A; and for $B(t, T) = \frac{1}{\kappa}\left[1 - e^{-\kappa (T - t)}\right]$ (see formula (11.50)).
In Section 11.3 (see formula (11.42)), it was also shown that
$$
E\left[e^{-\int_t^T r(\tau)\, d\tau} \left(P(r(T), T, S) - K\right)^{+}\right]
$$
We now consider the perfectly fitting extension of the previous results. Namely, we consider
model (11.48) in Section 11.3, viz
dr(τ ) = (θ(τ ) − κr(τ ))dτ + σdW̃ (τ ),
where θ(τ ) is now the infinite dimensional parameter that is used to “invert the term-structure”.
The solution to Eq. (11.68) is the same as in the previous section. However, in Section 11.7.3
we shall argue that the advantage of using such a perfectly fitting extension arises as soon as
one is concerned with the evaluation of more complex options on fixed coupon bonds.
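For concreteness, the closed-form price of a call on a zero in the Gaussian case can be sketched as follows. The code is an illustration, not the derivation of this chapter: it uses the textbook Vasicek bond price under the risk-neutral dynamics dr = (θ − κr)dτ + σdW̃, together with the formula commonly attributed to Jamshidian (1989), and takes t = 0. The function names and all parameter values are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def vasicek_B(kappa, horizon):
    """B = (1 - exp(-kappa * horizon)) / kappa, as in formula (11.50)."""
    return (1.0 - math.exp(-kappa * horizon)) / kappa


def vasicek_bond(r, horizon, kappa, theta, sigma):
    """Zero price exp(A - B r) under dr = (theta - kappa r) dtau + sigma dW."""
    B = vasicek_B(kappa, horizon)
    A = ((B - horizon) * (theta / kappa - sigma ** 2 / (2.0 * kappa ** 2))
         - sigma ** 2 * B ** 2 / (4.0 * kappa))
    return math.exp(A - B * r)


def bond_call(r, T, S, K, kappa, theta, sigma):
    """Call expiring at T, struck at K, written on the zero maturing at S > T."""
    P_T = vasicek_bond(r, T, kappa, theta, sigma)
    P_S = vasicek_bond(r, S, kappa, theta, sigma)
    # Standard deviation of ln P(T, S) under the T-forward probability.
    vol = (sigma * vasicek_B(kappa, S - T)
           * math.sqrt((1.0 - math.exp(-2.0 * kappa * T)) / (2.0 * kappa)))
    d1 = math.log(P_S / (K * P_T)) / vol + 0.5 * vol
    return P_S * norm_cdf(d1) - K * P_T * norm_cdf(d1 - vol)


# Hypothetical inputs.
r0, kappa, theta, sigma = 0.05, 0.2, 0.01, 0.01
T, S, K = 1.0, 5.0, 0.80
P_T = vasicek_bond(r0, T, kappa, theta, sigma)
P_S = vasicek_bond(r0, S, kappa, theta, sigma)
call = bond_call(r0, T, S, K, kappa, theta, sigma)
```

Note that r enters the option price only through the two bond prices P_T and P_S. In a perfectly fitting version, these two inputs would be the observed market prices rather than model outputs.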
11.7.4.3 Bond price volatility and the persistence of the short-term rate
The implied volatility of options on bonds is typically very large in real markets, comparable to that on stocks. Why is this implied volatility so large when, in fact, the volatility of the short-term rate is one order of magnitude less than that on stock markets? The answer is that the short-term rate is very persistent, and it is “a risk for the long-run,” pretty much in the same spirit of the explanations of the equity premium puzzle reviewed in Chapter 7. Let us make the point here. First, define the term-structure of volatility. It is the function $\tau \mapsto \mathrm{Vol}(R(\tau))$, where R(τ) is the spot rate for the maturity τ, and Vol(R(τ)) is the standard deviation of this spot rate. By the definition of R(τ), the term-structure of volatility can also be written as the function
$$
\tau \mapsto \mathrm{Vol}\left(-\frac{1}{\tau} \ln P(\tau)\right),
$$
where P (τ ) is the price of a zero with maturity equal to τ . It is instructive to see what this
volatility looks like, for a concrete model. Consider again the Vasicek model. This model assumes
that the short-term rate is solution to,
$$dr_t = \kappa(\mu - r_t)\,dt + \sigma\,dW_t,$$
where $W_t$ is a Brownian motion, and $\kappa$, $\mu$ and $\sigma$ are three positive constants. By previous results given in this chapter, we know that for this model,
$$R(\tau) = \frac{A(\tau)}{\tau} + \frac{1}{\tau}B(\tau)\,r, \qquad B(\tau) = \frac{1 - e^{-\kappa\tau}}{\kappa},$$
for some function $A(\tau)$. Therefore, we have that,
$$\mathrm{Vol}[R(\tau)] = \frac{1}{\tau}B(\tau)\,\mathrm{Vol}_\infty(r), \qquad (11.71)$$
where $\mathrm{Vol}_\infty(r)$ is the “ergodic” volatility of the short-term rate, defined as $\mathrm{Vol}_\infty(r) = \sqrt{\sigma^2/(2\kappa)}$.
For example, if $\kappa = 0.2$ and $\sigma = 0.03$, then $\mathrm{Vol}_\infty(r) \approx 4.7\%$, which is higher than what we observe in the data (1.5%), although of a comparable order of magnitude. (These parameter values are illustrative and do not result from a realistic parameter search.) Given the previous values for $\kappa$ and $\sigma$, the picture below depicts the term-structure of volatility, i.e. Eq. (11.71).
[Figure: the term-structure of volatility, Vol(R) against maturity (years), Eq. (11.71) with κ = 0.2 and σ = 0.03.]
As we can see, the term-structure of volatility is decreasing in the maturity of the zero, and attains its maximum, $\mathrm{Vol}_\infty(r) \approx 4.7\%$, as the maturity shrinks to zero.
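The numbers quoted above can be reproduced with a few lines of Python. This is only a sketch of Eq. (11.71); the function names are illustrative and not part of the text.

```python
import math

def vasicek_B(tau, kappa):
    """B(tau) = (1 - exp(-kappa * tau)) / kappa, as defined in the text."""
    return (1.0 - math.exp(-kappa * tau)) / kappa

def ergodic_vol(kappa, sigma):
    """Ergodic volatility of the short-term rate, sqrt(sigma^2 / (2 kappa))."""
    return math.sqrt(sigma ** 2 / (2.0 * kappa))

def spot_rate_vol(tau, kappa, sigma):
    """Term-structure of volatility, Eq. (11.71)."""
    return vasicek_B(tau, kappa) / tau * ergodic_vol(kappa, sigma)

kappa, sigma = 0.2, 0.03  # parameter values used in the text
print("ergodic vol:", round(ergodic_vol(kappa, sigma), 4))
for tau in (0.25, 1.0, 2.0, 5.0):
    print(tau, round(spot_rate_vol(tau, kappa, sigma), 4))
```

The printed values fall from about 4.7% at very short maturities to about 3% at five years, in line with the picture.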
Despite this, the volatility of bond returns can be much higher, as we now illustrate. We need to figure out what the dynamics of the bond price are, for the Vasicek model. By Itô’s lemma,
$$\frac{dP(\tau)}{P(\tau)} = [\cdots]\,dt + [-\sigma \cdot B(\tau)]\,dW_t,$$
so that the volatility of bond returns is
$$\mathrm{Vol}\left(\frac{dP}{P}\right) = \sigma \cdot B(\tau). \qquad (11.72)$$
Compare Eq. (11.72) with Eq. (11.71). The main difference between the two equations is that the right hand side of Eq. (11.71) is divided by $\tau$, which makes $\mathrm{Vol}[R(\tau)]$ decreasing with $\tau$. (Otherwise, $\mathrm{Vol}_\infty(r)$ and $\sigma$ have roughly the same order of magnitude.)
The previous calculation reveals that even if $\sigma$ is very small, bond return volatility, $\mathrm{Vol}\left(\frac{dP}{P}\right)$, can be quite high. For example, if $\kappa$ is close to zero, then $\mathrm{Vol}\left(\frac{dP}{P}\right) \approx \sigma \cdot \tau$, which is 15% for a 5Y zero when $\sigma = 0.03$. This fact is illustrated by the next picture, which depicts Eq. (11.72), evaluated at the previous parameter values, $\kappa = 0.2$ and $\sigma = 0.03$.
[Figure: bond return volatility, Vol(dP/P) against maturity (years), Eq. (11.72) with κ = 0.2 and σ = 0.03.]
Intuitively, it is the high persistence of the short-term rate (measured by the low value of κ) that makes the bond price so volatile at long maturities. High persistence means that a shock to the short-term rate remains embedded in its future path for a long time, i.e. it has persistent consequences. This makes the short-term rate very volatile in the long run, which in turn makes the value of a long maturity zero very volatile as well.
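The role of persistence can be checked numerically. The sketch below evaluates the diffusion coefficient of bond returns in the Vasicek model, σ·B(τ), for the parameter values of the text and for κ close to zero; the function name is illustrative.

```python
import math

def bond_return_vol(tau, kappa, sigma):
    """sigma * B(tau), with B(tau) = (1 - exp(-kappa * tau)) / kappa."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small kappa
    return sigma * (-math.expm1(-kappa * tau)) / kappa

sigma = 0.03
for kappa in (0.2, 0.05, 1e-9):
    print(kappa, [round(bond_return_vol(tau, kappa, sigma), 4) for tau in (1.0, 5.0, 8.0)])
```

With κ = 0.2 the volatility of a 5Y zero is about 9.5%, and it approaches σ·τ = 15% as κ goes to zero, i.e. as the short-term rate becomes more persistent.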
Given a set of dates $\{T_i\}_{i=0}^n$, a fixed coupon bond pays off a fixed coupon $c_i$ at $T_i$, $i = 1, \cdots, n$, and one unit of numéraire at time $T_n$. Ideally, one generic coupon at time $T_i$ pays off for the time-interval $T_i - T_{i-1}$. It is assumed that the various coupons are known at time $t < T_0$. By the FTAP, the value of a fixed coupon bond is
$$P_{fcb}(t, T_n) = P(t, T_n) + \sum_{i=1}^n c_i\,P(t, T_i).$$
A floating rate bond works as a fixed coupon bond, with the important exception that the coupon payments are defined as:
$$c_i = \delta_{i-1}L(T_{i-1}) = \frac{1}{P(T_{i-1}, T_i)} - 1, \qquad (11.73)$$
where $\delta_i \equiv T_{i+1} - T_i$, and where the second equality is the definition of the simply-compounded LIBOR rates introduced in Section 11.1 (see Eq. (11.2)). By the FTAP, the price $p_{frb}$ as of time $t$ of a floating rate bond is:
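$$\begin{aligned}
p_{frb}(t) &= P(t, T_n) + \sum_{i=1}^n E\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}L(T_{i-1})\right] \\
&= P(t, T_n) + \sum_{i=1}^n E\left[\frac{e^{-\int_t^{T_i} r(\tau)d\tau}}{P(T_{i-1}, T_i)}\right] - \sum_{i=1}^n P(t, T_i) \\
&= P(t, T_n) + \sum_{i=1}^n P(t, T_{i-1}) - \sum_{i=1}^n P(t, T_i) \\
&= P(t, T_0),
\end{aligned}$$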
where the second line follows from Eq. (11.73) and the third line from Eq. (11.7) given in Section
11.1.
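The telescoping sum in the last two lines can be verified on any discount curve. The following sketch uses made-up discount factors; the list layout and the function name are assumptions made for the example.

```python
def floating_rate_bond_value(discount_factors):
    """Value of a floating rate bond from P(t, T_i), i = 0, ..., n.

    discount_factors[i] is P(t, T_i). Each floating coupon is worth
    P(t, T_{i-1}) - P(t, T_i) at time t, as in the third line of the derivation.
    """
    n = len(discount_factors) - 1
    coupons = sum(discount_factors[i - 1] - discount_factors[i] for i in range(1, n + 1))
    return discount_factors[n] + coupons

# Hypothetical discount factors P(t, T_0), ..., P(t, T_3)
curve = [0.99, 0.97, 0.94, 0.90]
print(floating_rate_bond_value(curve), curve[0])
```

Whatever the curve, the value collapses to P(t, T0), the first entry of the list.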
The same result can be obtained by assuming an economy in which floating rate bonds continuously pay off the instantaneous short-term rate r. Let $T_0 = t$ for simplicity. In this case, $p_{frb}$ is solution to the PDE (11.63), with $\pi(r) = r$, and boundary condition $p_{frb}(T) = 1$. As can be verified, $p_{frb} = 1$, for all r and τ, is indeed a solution to the PDE (11.63).
11.7.5.3 Options on fixed coupon bonds
At first glance, the expectation of the payoff in Eq. (11.74) seems very difficult to evaluate. Indeed, even if we end up with a model that predicts bond prices at time $T_0$, $P(T_0, T_i)$, to be lognormal, we know that a sum of lognormals is not lognormal. There is an elegant way to deal with this issue. Suppose we wish to model the bond price $P(t, T)$ through any one of the models of the short-term rate reviewed in Section 11.3. In this case, the pricing function is obviously $P(t, T) = P(r, t, T)$. Assume, further, that
$$\text{For all } t, T, \quad \frac{\partial P(r, t, T)}{\partial r} < 0, \qquad (11.75)$$
and that
$$\text{For all } t, T, \quad \lim_{r \to 0} P(r, t, T) > K \quad \text{and} \quad \lim_{r \to \infty} P(r, t, T) = 0. \qquad (11.76)$$
Under conditions (11.75) and (11.76), there is one and only one value of r, say $r^*$, that solves the following equation:
$$P(r^*, T_0, T_n) + \sum_{i=1}^n c_i\,P(r^*, T_0, T_i) = K. \qquad (11.77)$$
Then, the payoff in Eq. (11.74) can be written as:
$$\left[\sum_{i=1}^n \bar{c}_i\,P(r(T_0), T_0, T_i) - K\right]^+ = \left[\sum_{i=1}^n \bar{c}_i\left(P(r(T_0), T_0, T_i) - P(r^*, T_0, T_i)\right)\right]^+,$$
Next, note that each term of the sum in Eq. (11.78) can be evaluated as an option on a
pure discount bond with strike price equal to P (r∗ , T0 , Ti ). Typically, the threshold r∗ must be
found with some numerical method. The device to reduce the problem of an option on a fixed
coupon bond to a problem involving the sum of options on zero coupon bonds was invented by
Jamshidian (1989).22 The price of the call on the fixed coupon bond is, therefore,
$$\mathrm{Call}(t, T_0; P_{fcb}(t, T_n), K, v) = \sum_{i=1}^n \bar{c}_i\,\mathrm{Call}(t, T_0; P(t, T_i), P(r^*, T_0, T_i), v_i),$$
21 Suppose that P (r(T0 ), T0 , T1 ) > P (r∗ , T0 , T1 ). By Eq. (11.75), r(T0 ) < r∗ . Hence P (r(T0 ), T0 , T2 ) > P (r∗ , T0 , T2 ), etc.
22 The conditions in Eqs. (11.75) and (11.76) hold, within the Vasicek’s model that Jamshidian considered in his paper. In fact,
the condition in Eq. (11.75) holds for all one-factor stationary, Markov models of the short-term rate. However, the condition in
Eq. (11.75) is not a general property of bond prices in multi-factor models (see Mele (2003)).
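Jamshidian's device can be illustrated numerically. The sketch below is not the author's implementation: it assumes the standard closed form of the Vasicek function A (the text only says "some function A"), illustrative parameter values, a hypothetical three-coupon bond, and a bisection search for r*. Because the Vasicek model allows negative rates, the search bracket starts below zero.

```python
import math

# Illustrative risk-neutral Vasicek parameters (assumed, not calibrated)
KAPPA, THETA, SIGMA = 0.2, 0.01, 0.03

def zero_price(r, tau):
    """P = exp(A - B r) for time to maturity tau (standard Vasicek A, assumed here)."""
    B = (1.0 - math.exp(-KAPPA * tau)) / KAPPA
    A = (B - tau) * (THETA / KAPPA - SIGMA ** 2 / (2 * KAPPA ** 2)) \
        - SIGMA ** 2 * B ** 2 / (4 * KAPPA)
    return math.exp(A - B * r)

def coupon_bond(r, taus, cbar):
    """Sum of cbar_i * P(r, T0, T_i); the last cbar includes the principal."""
    return sum(c * zero_price(r, tau) for c, tau in zip(cbar, taus))

def find_r_star(taus, cbar, K, lo=-1.0, hi=5.0):
    """Bisection for r* in Eq. (11.77); relies on the bond price decreasing in r."""
    if not coupon_bond(lo, taus, cbar) > K > coupon_bond(hi, taus, cbar):
        raise ValueError("strike is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if coupon_bond(mid, taus, cbar) > K:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

taus = [1.0, 2.0, 3.0]      # T_i - T_0, hypothetical
cbar = [0.05, 0.05, 1.05]   # hypothetical coupons, principal included in the last one
K = 1.0
r_star = find_r_star(taus, cbar, K)
strikes = [zero_price(r_star, tau) for tau in taus]  # P(r*, T0, T_i)

def payoff_on_bond(r):
    """Payoff of the call on the coupon bond."""
    return max(coupon_bond(r, taus, cbar) - K, 0.0)

def payoff_as_sum(r):
    """Sum of payoffs of calls on zeros struck at P(r*, T0, T_i)."""
    return sum(c * max(zero_price(r, tau) - k, 0.0)
               for c, tau, k in zip(cbar, taus, strikes))

for r in (0.0, r_star - 0.01, r_star + 0.01, 0.10):
    print(round(r, 4), payoff_on_bond(r), payoff_as_sum(r))
```

The two payoffs coincide for every level of the short-term rate, which is the argument of footnote 21: all zeros are in the money, or out of the money, at the same time.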
An interest rate swap is an exchange of interest rate payments. Typically, one counterparty exchanges a fixed against a floating interest rate payment. For example, the counterparty receiving a floating interest rate payment has “good” (or only) access to markets for “variable” interest rates, but wishes to pay fixed interest rates, and vice versa. The counterparty receiving a floating interest rate payment and paying a fixed interest rate $K_{irs}$ has a payoff equal to,
at time $T_i$, $i = 1, \cdots, n$. By the FTAP, the value $p_{irs}$ as of time $t$ of an interest rate swap is:
$$p_{irs}(t) = \sum_{i=1}^n E\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}\left(L(T_{i-1}) - K_{irs}\right)\right] = -\sum_{i=1}^n \mathrm{IRS}(t, T_{i-1}, T_i; K_{irs}),$$
where IRS is the value of a forward-rate agreement and is, by Eq. (11.8) in Section 11.1,
The forward swap rate $R_{swap}$ is the value of $K_{irs}$ such that $p_{irs}(t) = 0$. Simple computations yield:
$$R_{swap}(t) = \frac{\sum_{i=1}^n \delta_{i-1}F(t, T_{i-1}, T_i)P(t, T_i)}{\sum_{i=1}^n \delta_{i-1}P(t, T_i)} = \frac{P(t, T_0) - P(t, T_n)}{\sum_{i=1}^n \delta_{i-1}P(t, T_i)}, \qquad (11.79)$$
where the last equality is due to the definition of $F(t, T_{i-1}, T_i)$ given in Section 11.1 (see Eq. (11.65)).
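As a quick check of Eq. (11.79), the sketch below computes the forward swap rate in both ways from a hypothetical discount curve, using the relation $\delta_{i-1}F(t, T_{i-1}, T_i) = P(t, T_{i-1})/P(t, T_i) - 1$. Function names and data are illustrative only.

```python
def swap_rate_from_bonds(P, deltas):
    """(P(t,T0) - P(t,Tn)) / sum_i delta_{i-1} P(t,T_i), with P[i] = P(t, T_i)."""
    annuity = sum(d * P[i] for i, d in enumerate(deltas, start=1))
    return (P[0] - P[-1]) / annuity

def swap_rate_from_forwards(P, deltas):
    """Weighted average of forward rates, the first expression in Eq. (11.79)."""
    n = len(deltas)
    forwards = [(P[i - 1] / P[i] - 1.0) / deltas[i - 1] for i in range(1, n + 1)]
    num = sum(deltas[i - 1] * forwards[i - 1] * P[i] for i in range(1, n + 1))
    den = sum(deltas[i - 1] * P[i] for i in range(1, n + 1))
    return num / den

P = [0.98, 0.96, 0.94, 0.92]   # hypothetical P(t, T_0), ..., P(t, T_3)
deltas = [0.5, 0.5, 0.5]       # delta_{i-1} = T_i - T_{i-1}
print(swap_rate_from_bonds(P, deltas), swap_rate_from_forwards(P, deltas))
```

Both functions return the same number, about 4.26% for this curve.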
Finally, note that this case could have also been solved by casting it in the format of the
PDE (11.63). It suffices to consider continuous time swap exchanges, to set pirs (T ) ≡ 0 as a
boundary condition, and to set π(r) = r − k, where k plays the same role as K above. It is easy
to see that if the bond price P (τ ) is solution to (11.63) with its usual boundary condition, the
following function
$$p_{irs}(\tau) = 1 - P(\tau) - k\int_\tau^T P(s)\,ds$$
A cap works as an interest rate swap, with the important exception that the exchange of interest rate payments takes place only if actual interest rates are higher than K. A cap protects against upward movements of the interest rates. Therefore, we have that the payoff as of time $T_i$ is
at time $T_i$, $i = 1, \cdots, n$.
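We will only focus on caps. By the FTAP, the value $p_{cap}$ of a cap as of time $t$ is:
$$\begin{aligned}
p_{cap}(t) &= \sum_{i=1}^n E\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}\left(L(T_{i-1}) - K\right)^+\right] \\
&= \sum_{i=1}^n E\left[e^{-\int_t^{T_i} r(\tau)d\tau}\,\delta_{i-1}\left(F(T_{i-1}, T_{i-1}, T_i) - K\right)^+\right]. \qquad (11.80)
\end{aligned}$$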
Models of the short-term rate can be used to give an explicit solution to this pricing problem. First, we use the standard definition of simply compounded rates given in Section 11.1 (see formula (11.2)), viz $\delta_{i-1}L(T_{i-1}) = \frac{1}{P(T_{i-1}, T_i)} - 1$, and rewrite the caplet payoff as follows:
$$\left[\delta_{i-1}L(T_{i-1}) - \delta_{i-1}K\right]^+ = \frac{1}{P(T_{i-1}, T_i)}\left[1 - (1 + \delta_{i-1}K)P(T_{i-1}, T_i)\right]^+.$$
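The rewriting of the caplet payoff is easy to check for any realized bond price. In the sketch below, `caplet_payoff_rate_form` is the left hand side, and `caplet_payoff_bond_form` is the right hand side expressed through the strike $K_i = (1 + \delta_{i-1}K)^{-1}$; both names and the numbers are illustrative.

```python
def caplet_payoff_rate_form(P, delta, K):
    """[delta * L - delta * K]^+ with delta * L = 1/P - 1, paid at T_i."""
    libor = (1.0 / P - 1.0) / delta
    return delta * max(libor - K, 0.0)

def caplet_payoff_bond_form(P, delta, K):
    """(1/P) [1 - (1 + delta K) P]^+ = (K_i - P)^+ / (K_i P), paid at T_i."""
    K_i = 1.0 / (1.0 + delta * K)
    return max(K_i - P, 0.0) / (K_i * P)

# Hypothetical realization of P(T_{i-1}, T_i) with a half-year accrual and a 3% cap rate
P, delta, K = 0.98, 0.5, 0.03
print(caplet_payoff_rate_form(P, delta, K), caplet_payoff_bond_form(P, delta, K))
```

Multiplying the second form by P, i.e. discounting from $T_i$ back to $T_{i-1}$, leaves $(K_i - P)^+ / K_i$, the payoff of $1/K_i$ puts on the zero struck at $K_i$.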
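We have,
$$\begin{aligned}
p_{cap}(t) &= \sum_{i=1}^n E\left[\frac{e^{-\int_t^{T_i} r(\tau)d\tau}}{P(T_{i-1}, T_i)}\left(1 - (1 + \delta_{i-1}K)P(T_{i-1}, T_i)\right)^+\right] \\
&= \sum_{i=1}^n \frac{1}{K_i}\,E\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\left(K_i - P(T_{i-1}, T_i)\right)^+\right], \qquad K_i = (1 + \delta_{i-1}K)^{-1}, \qquad (11.81)
\end{aligned}$$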
where the last equality follows by a simple computation.23 If bond prices are as in Jamshidian
or in Hull and White, the cap price in Eq. (11.81) can be expressed in closed-form. Indeed, as
Eq. (11.81) makes clear, a cap is a basket of puts on zero coupon bonds, with strikes Ki , and
can be priced through the models in Sections 11.7.4.1 and 11.7.4.2. We have:
$$p_{cap}(t) = \sum_{i=1}^n \frac{1}{K_i}\,\mathrm{Put}(t, T_{i-1}; P(t, T_i), K_i, v), \qquad (11.82)$$
where Put (·) satisfies the put-call parity in Eq. (11.67), and, by the pricing formulae in Section
11.7.4.1,
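$$\begin{aligned}
&= E\left[E\left(\left. e^{-\int_t^{T_i} r(\tau)d\tau}\,e^{\int_{T_{i-1}}^{T_i} r(\tau)d\tau}\left(1 - (1 + \delta_{i-1}K)P(T_{i-1}, T_i)\right)^+ \right| F(T_i)\right)\right] \\
&= E\left[E\left(\left. e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\left(1 - (1 + \delta_{i-1}K)P(T_{i-1}, T_i)\right)^+ \right| F(T_i)\right)\right] \\
&= E\left[e^{-\int_t^{T_{i-1}} r(\tau)d\tau}\left(1 - (1 + \delta_{i-1}K)P(T_{i-1}, T_i)\right)^+\right]
\end{aligned}$$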
Naturally, caps on interest rates, which are nothing but baskets of calls on interest rates, are portfolios of puts on zero coupon bonds, due to the inverse relation between prices and interest rates.
Finally, note that we could also price caps and floors through the PDE (11.63), after setting $\pi(r) = (r - k)^+$ (caps) and $\pi(r) = (k - r)^+$ (floors), where k plays the same role played by K above. However, contracts of this type, where payoffs are continuous, are highly stylized and, obviously, do not exist in the markets.
11.7.5.6 Swaptions
Swaptions are options to enter a swap contract on a future date. Let the maturity date of this option be $T_0$. Then, at time $T_0$, the payoff of the swaption is the maximum between zero and the value of an interest rate swap at $T_0$, $p_{irs}(T_0)$, viz
$$(p_{irs}(T_0))^+ = \left[-\sum_{i=1}^n \mathrm{IRS}(T_0, T_{i-1}, T_i; K_{irs})\right]^+ = \left[\sum_{i=1}^n \delta_{i-1}\left(F(T_0, T_{i-1}, T_i) - K_{irs}\right)P(T_0, T_i)\right]^+. \qquad (11.84)$$
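By the FTAP, the value of a swaption at time $t$ is:
$$\begin{aligned}
p_{swaption}(t) &= E\left[e^{-\int_t^{T_0} r(\tau)d\tau}\left(\sum_{i=1}^n \delta_{i-1}\left(F(T_0, T_{i-1}, T_i) - K_{irs}\right)P(T_0, T_i)\right)^+\right] \\
&= E\left[e^{-\int_t^{T_0} r(\tau)d\tau}\left(1 - P(T_0, T_n) - \sum_{i=1}^n c_i\,P(T_0, T_i)\right)^+\right], \qquad (11.85)
\end{aligned}$$
where $c_i \equiv \delta_{i-1}K_{irs}$, and where we used the relation $\delta_{i-1}F(T_0, T_{i-1}, T_i) = \frac{P(T_0, T_{i-1})}{P(T_0, T_i)} - 1$.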
Eq. (11.85) is the expression of the price of a put option on a fixed coupon bond struck at one. Therefore, we can price this contract in closed-form, through the models in Sections 11.7.4.1 and 11.7.4.2, as we did in the previous section for caps. We have,
where Put (·) satisfies the put-call parity in Eq. (11.67). By the pricing formulae in Section 11.7.4.1,
$$\mathrm{Call}(t, T_0; P_{fcb}(t, T_n), K_{irs}, v) = \sum_{i=1}^n \bar{c}_i\,\mathrm{Call}(t, T_0; P(t, T_i), P_i^*, v),$$
where $\bar{c}_i = \delta_{i-1}K_{irs}$, for $i = 1, \cdots, n-1$, and $\bar{c}_n = 1 + \delta_{n-1}K_{irs}$, and $\mathrm{Call}(t, T_0; P(t, T_i), P_i^*, v)$ is as in Eq. (11.83), with $P_i^* = P(r^*, T_0, T_i)$, and $r^*$ solution to Eq. (11.77) for $K = 1$.
As demonstrated in the previous sections, models of the short-term rate can be used to obtain closed-form solutions for virtually every important product of the interest rate derivatives business. The typical examples are the Vasicek model and its perfectly fitting extension. Yet practitioners have been evaluating caps through Black’s (1976) formula for years. The assumption underlying the market practice is that the simply-compounded forward rate is lognormally distributed. As it turns out, the analytically tractable (Gaussian) short-term rate models are not consistent with this assumption. Clearly, the (Gaussian) Vasicek model does not predict that the simply-compounded forward rates are Geometric Brownian motions.24
Can a non-Markovian HJM model address this problem? Yes. However, a practical difficulty
arising with the HJM approach is that instantaneous forward rates are not observed. Does this
compromise the practical appeal of the HJM methodology to the pricing of caps and floors -
which constitute an important portion of the interest rates business? No. Brace, Gatarek and
Musiela (1997), Jamshidian (1997) and Miltersen, Sandmann and Sondermann (1997) observed
that the general HJM framework can be “forced” to address some of the previous difficulties.
The key feature of the models identified by these authors is the emphasis on the dynamics of
the simply-compounded forward rates. An additional, and technical, assumption is that these
simply-compounded forward rates are lognormal under the risk-neutral probability Q. That is,
given a non-decreasing sequence of reset times {Ti }i=0,1,··· , each simply-compounded rate, Fi , is
solution to the following stochastic differential equation:25
$$\frac{dF_i(\tau)}{F_i(\tau)} = m_i(\tau)\,d\tau + \gamma_i(\tau)\,d\tilde{W}(\tau), \qquad \tau \in [t, T_i], \quad i = 0, \cdots, n-1, \qquad (11.86)$$
where $F_i(\tau) \equiv F(\tau, T_i, T_{i+1})$, and $m_i$ and $\gamma_i$ are some deterministic functions of time ($\gamma_i$ is vector valued). From a mathematical point of view, the assumption that $F_i$ follows Eq. (11.86) is innocuous.26
As we shall show, this simple framework allows us to use Black’s (1976) formula to price caps and floors. However, we need to emphasize that there is nothing wrong with the short-term rate models analyzed in previous sections. The real advance of the so-called market model is to give a rigorous foundation to the standard market practice of pricing caps and floors by means of Black’s (1976) formula.
11.7.6.2 Simply-compounded forward rate dynamics
Step 1: Let $P_i \equiv P(\tau, T_i)$, and assume that under the risk-neutral probability Q, $P_i$ is solution to:
$$\frac{dP_i}{P_i} = r\,d\tau + \sigma_{bi}\,d\tilde{W}.$$
24 Indeed, $1 + \delta_i F_i(\tau) = \frac{P(\tau, T_i)}{P(\tau, T_{i+1})} = \exp\left[\Delta A_i(\tau) - \Delta B_i(\tau)\,r(\tau)\right]$, where $\Delta A_i(\tau) = A(\tau, T_i) - A(\tau, T_{i+1})$, and $\Delta B_i(\tau) = B(\tau, T_i) - B(\tau, T_{i+1})$. Hence, $F_i(\tau)$ is not a Geometric Brownian motion, despite the fact that the short-term rate r is Gaussian and, hence, the bond price is log-normal. Black ’76 cannot be applied in this context.
25 Brace, Gatarek and Musiela (1997) derived their model by specifying the dynamics of the spot simply-compounded Libor
interest rates. Since Fi (Ti ) = L(Ti ) (see Eq. (11.66)), the two derivations are essentially the same.
26 It is well-known that lognormal instantaneous forward rates create mathematical problems to the money market account (see, for
example, Sandmann and Sondermann (1997) for a succinct overview on how this problem is easily handled with simply-compounded
forward rates).
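where $\sigma(\tau, \ell)$ is the instantaneous volatility of the instantaneous $\ell$-forward rate as of time $\tau$. By Itô’s lemma,
$$d\ln\left[\frac{P(\tau, T_i)}{P(\tau, T_{i+1})}\right] = -\frac{1}{2}\left[\|\sigma_{bi}\|^2 - \|\sigma_{b,i+1}\|^2\right]d\tau + \left(\sigma_{bi} - \sigma_{b,i+1}\right)d\tilde{W}. \qquad (11.89)$$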
Step 3: By Eq. (11.87), the diffusion terms in Eqs. (11.89) and (11.90) have to be the same. Therefore,
$$\sigma_{bi}(\tau) - \sigma_{b,i+1}(\tau) = \frac{\delta_i F_i(\tau)}{1 + \delta_i F_i(\tau)}\,\gamma_i(\tau), \qquad \tau \in [t, T_i].$$
By summing over i, we get the following no-arbitrage restriction for bond price volatility:
$$\sigma_{bi}(\tau) - \sigma_{b,0}(\tau) = -\sum_{j=0}^{i-1}\frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau). \qquad (11.91)$$
It should be clear that this relation is merely a restriction to the general HJM framework.
In other words, assume the instantaneous forward rates are as in Eq. (11.53) of Section 11.4.
As we demonstrated in Section 11.4, then, the bond prices volatility is given by Eq. (11.88).
But if we also assume that simply-compounded forward rates are solution to Eq. (11.86), then,
the bond prices volatility is also equal to Eq. (11.91). Comparing Eq. (11.88) with Eq. (11.91)
produces,
$$\int_{T_0}^{T_i}\sigma(\tau, \ell)\,d\ell = \sum_{j=0}^{i-1}\frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau).$$
The practical interest to restrict the forward-rate volatility dynamics in this way lies in the
possibility to obtain closed-form solutions for some of the interest rates derivatives surveyed in
Section 11.7.3.
11.7.6.3 Pricing formulae
Caps & Floors
where $E_{Q_F^{T_i}}[\cdot]$ denotes, as usual, the expectation taken under the $T_i$-forward martingale probability $Q_F^{T_i}$; the first equality is Eq. (11.80); and the second equality has been obtained through the usual change of probability technique introduced in Section 11.1.4.
The key point is that
A proof of this statement is in Section 11.1. By Eq. (11.86), this means that $F_{i-1}(\tau)$ is solution to:
$$\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\,dW^{Q_F^{T_i}}(\tau), \qquad \tau \in [t, T_{i-1}], \quad i = 1, \cdots, n,$$
under $Q_F^{T_i}$. Therefore, the cap price in Eq. (11.92) reduces to that of Black (1976), once we assume $\gamma$ is deterministic:
where
$$d_{1,i-1} = \frac{\ln\frac{F_{i-1}(t)}{K} + \frac{1}{2}s_i^2}{s_i}, \qquad s_i^2 = \int_t^{T_{i-1}}\gamma_{i-1}(\tau)^2\,d\tau.$$
For the sake of completeness, Appendix 8 provides the derivation of Black’s formula.
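For illustration, the pricing of a cap as a sum of Black (1976) caplets can be sketched as follows. The sketch assumes the standard undiscounted Black expression $F\Phi(d_1) - K\Phi(d_1 - s)$, constant scalar volatilities so that $s_i^2 = \gamma_{i-1}^2(T_{i-1} - t)$, and hypothetical market data; none of the function names come from the text.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black76(F, K, s):
    """Undiscounted Black (1976) call: F * N(d1) - K * N(d1 - s), s = total volatility."""
    if s <= 0.0:
        return max(F - K, 0.0)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * norm_cdf(d1) - K * norm_cdf(d1 - s)

def cap_price(P, deltas, forwards, K, gammas, expiries):
    """Sum over i of delta_{i-1} P(t, T_i) Black76(F_{i-1}(t); K, s_i).

    P[i] = P(t, T_i) for i = 0..n; the other lists are indexed by i - 1;
    expiries[i - 1] = T_{i-1} - t; gammas are assumed constant over time.
    """
    total = 0.0
    for i in range(1, len(P)):
        s_i = gammas[i - 1] * math.sqrt(expiries[i - 1])
        total += deltas[i - 1] * P[i] * black76(forwards[i - 1], K, s_i)
    return total

# Hypothetical inputs
P = [0.98, 0.96, 0.94, 0.92]
deltas = [0.5, 0.5, 0.5]
forwards = [(P[i - 1] / P[i] - 1.0) / deltas[i - 1] for i in range(1, 4)]
expiries = [0.5, 1.0, 1.5]
print(cap_price(P, deltas, forwards, 0.04, [0.2, 0.2, 0.2], expiries))
```

Setting all volatilities to zero returns the discounted intrinsic value of the caplets, a convenient sanity check.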
Swaptions
By Eq. (11.84), and the equation defining the forward swap rate $R_{swap}$ (see Eq. (11.79)), the payoff of the swaption can be written as:
$$\left[\sum_{i=1}^n \delta_{i-1}\left(F(T_0, T_{i-1}, T_i) - K\right)P(T_0, T_i)\right]^+ = \sum_{i=1}^n \delta_{i-1}P(T_0, T_i)\left(R_{swap}(T_0) - K\right)^+.$$
This can be dealt with through the so-called forward swap probability. Define the forward swap probability $Q_{swap}$ by:
$$\frac{dQ_{swap}}{dQ_F^{T_0}} = \frac{\sum_{i=1}^n \delta_{i-1}P(T_0, T_i)}{E_{Q_F^{T_0}}\left[\sum_{i=1}^n \delta_{i-1}P(T_0, T_i)\right]} = P(t, T_0)\,\frac{\sum_{i=1}^n \delta_{i-1}P(T_0, T_i)}{\sum_{i=1}^n \delta_{i-1}P(t, T_i)},$$
where the last equality follows from the following elementary facts: $P(t, T_i) = E\left[e^{-\int_t^{T_0} r(\tau)d\tau}P(T_0, T_i)\right] = P(t, T_0)\,E_{Q_F^{T_0}}\left[P(T_0, T_i)\right]$ (the first equality is the FTAP, the second follows from the change of probability to $Q_F^{T_0}$).
Therefore,
$$\begin{aligned}
p_{swaption}(t) &= P(t, T_0) \cdot E_{Q_F^{T_0}}\left[\sum_{i=1}^n \delta_{i-1}P(T_0, T_i)\left(R_{swap}(T_0) - K\right)^+\right] \\
&= \left[\sum_{i=1}^n \delta_{i-1}P(t, T_i)\right] \cdot E_{Q_{swap}}\left(R_{swap}(T_0) - K\right)^+.
\end{aligned}$$
Furthermore, the forward swap rate $R_{swap}$ is a $Q_{swap}$-martingale.27 And naturally, it is positive. Therefore, it must satisfy:
$$\frac{dR_{swap}(\tau)}{R_{swap}(\tau)} = \gamma_{swap}(\tau)\,dW_{swap}(\tau), \qquad \tau \in [t, T_0],$$
where $W_{swap}$ is a $Q_{swap}$-Brownian motion, and $\gamma_{swap}(\tau)$ is adapted. We can use Black 76 to price the swaption in closed-form, once we assume that $\gamma_{swap}(\tau)$ is deterministic.
Inconsistencies
An issue is that if F is solution to Eq. (11.86), $\gamma_{swap}$ cannot be deterministic. And as you may easily conjecture, if you assume that forward swap rates are lognormal, then you do not end up with Eq. (11.86). Therefore, you may use Black 76 to price either caps or swaptions, not both. This limits considerably the importance of market models. A couple of tricks seem to work in practice. The best known is based on a suggestion by Rebonato (1998), to replace the true pricing problem with an approximating pricing problem in which $\gamma_{swap}$ is deterministic. That works in practice, but in a world with stochastic volatility, we should expect that trick to generate unstable results in periods when volatility is itself highly volatile. See, also, Rebonato (1999) for an essay on related issues. The next section suggests using a numerical approximation based on Monte Carlo techniques.
11.7.6.4 Numerical approximations
Suppose forward rates are lognormal. Then you price caps with Black 76. As regards swaptions, you may wish to implement Monte Carlo integration as follows.
By a change of measure,
$$\begin{aligned}
p_{swaption}(t) &= E\left[e^{-\int_t^{T_0} r(\tau)d\tau}\left(\sum_{i=1}^n \delta_{i-1}\left(F(T_0, T_{i-1}, T_i) - K\right)P(T_0, T_i)\right)^+\right] \\
&= P(t, T_0)\,E_{Q_F^{T_0}}\left[\sum_{i=1}^n \delta_{i-1}\left(F(T_0, T_{i-1}, T_i) - K\right)P(T_0, T_i)\right]^+,
\end{aligned}$$
where the second line follows from Eq. (11.91) in the main text. Replacing this into Eq. (11.94)
leaves:
$$\frac{dF_{i-1}(\tau)}{F_{i-1}(\tau)} = \gamma_{i-1}(\tau)\sum_{j=0}^{i-1}\frac{\delta_j F_j(\tau)}{1 + \delta_j F_j(\tau)}\,\gamma_j(\tau)\,d\tau + \gamma_{i-1}(\tau)\,dW^{Q_F^{T_0}}(\tau), \qquad i = 1, \cdots, n.$$
These can easily be simulated with the methods described in any standard textbook such as
Kloeden and Platen (1992).
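A minimal simulation sketch of the previous system is given below. It is one possible scheme, not the one prescribed by the text: it assumes a single Brownian motion, constant scalar volatilities γ_j, time measured from t = 0, and a log-Euler step in which the drift is frozen at the beginning of each time step. The function name and all inputs are hypothetical.

```python
import math
import random

def swaption_mc(F0, deltas, gammas, K, T0, P_t_T0, n_paths=2000, n_steps=10, seed=0):
    """Monte Carlo swaption price under the T0-forward probability.

    F0[j] = F_j(t) = F(t, T_j, T_{j+1}), j = 0..n-1; gammas[j] is constant (assumption).
    """
    rng = random.Random(seed)
    dt = T0 / n_steps
    n = len(F0)
    total = 0.0
    for _ in range(n_paths):
        F = list(F0)
        for _ in range(n_steps):
            dW = rng.gauss(0.0, math.sqrt(dt))   # one-factor shock (assumption)
            cum, new_F = 0.0, []
            for j in range(n):
                # running sum over k <= j of delta_k F_k / (1 + delta_k F_k) * gamma_k
                cum += deltas[j] * F[j] / (1.0 + deltas[j] * F[j]) * gammas[j]
                drift = gammas[j] * cum
                new_F.append(F[j] * math.exp((drift - 0.5 * gammas[j] ** 2) * dt
                                             + gammas[j] * dW))
            F = new_F
        # payoff at T0, with P(T0, T_i) = prod_{j < i} 1 / (1 + delta_j F_j(T0))
        disc, payoff = 1.0, 0.0
        for j in range(n):
            disc /= 1.0 + deltas[j] * F[j]
            payoff += deltas[j] * (F[j] - K) * disc
        total += max(payoff, 0.0)
    return P_t_T0 * total / n_paths

price = swaption_mc([0.04] * 4, [0.5] * 4, [0.2] * 4, K=0.04, T0=1.0, P_t_T0=0.95)
print(price)
```

With all volatilities set to zero the forward rates stay at their initial values, so the function returns the discounted intrinsic value of the swap, which gives a simple check of the payoff code.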
11.7.6.5 Volatility surfaces
The market practice relies on the models of this section, rather than those of Sections 11.7.4.1-11.7.4.2, in providing volatility surfaces. In the models of Sections 11.7.4.1-11.7.4.2, volatility surfaces might be produced, but only indirectly, after calibration of the two parameters κ and σ, as Eq. (11.82) indicates. It is easier, however, to provide volatility surfaces directly, through the models of this section. Quite simply, traders use Eq. (11.93) and quote volatilities such that the market price of a cap equals the value predicted by Eq. (11.93) using the desired implied volatility $s_i$. In Eq. (11.93),
$$s_i = \sqrt{T_{i-1} - t} \cdot \gamma(i),$$
for some $\gamma(i)$, although traders simply quote the value $\hat{\gamma}_n$ that satisfies:
$$\hat{\gamma}_n : \ p^{\$}_{cap}(t; n) = \sum_{i=1}^n \delta_{i-1}P(t, T_i) \cdot \mathrm{Black76}\left(F_{i-1}(t); K, \hat{s}_{i,n}\right),$$
Given n, we can bootstrap $\hat{\gamma}(i)$, i.e. we can recursively solve for $\hat{\gamma}(i)$, as follows:
$$0 = \sum_{i=1}^n \delta_{i-1}P(t, T_i) \cdot \left[\mathrm{Black76}\left(F_{i-1}(t); K, \hat{s}_{i,n}\right) - \mathrm{Black76}\left(F_{i-1}(t); K, \hat{s}_i\right)\right], \qquad n = 1, \cdots, N,$$
where N is the latest available maturity, and $\hat{s}_i = \sqrt{T_{i-1} - t} \cdot \hat{\gamma}(i)$. The values of $\hat{\gamma}(i)$ constitute what is known as the term structure of caps volatilities.
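The recursion can be sketched in Python as follows. The sketch assumes that the quoted flat volatility of the n-th cap enters each of its caplets as $\hat{s}_{i,n} = \sqrt{T_{i-1} - t}\cdot\hat{\gamma}_n$, that all caps share the same strike, and that every reset date is strictly later than t; the bisection solver, the function names and the data are illustrative.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black76(F, K, s):
    """Undiscounted Black (1976) call with total volatility s > 0."""
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return F * norm_cdf(d1) - K * norm_cdf(d1 - s)

def caplet(i, gamma, P, deltas, forwards, K, expiries):
    """delta_{i-1} P(t, T_i) Black76(F_{i-1}(t); K, sqrt(T_{i-1} - t) * gamma), i = 1..N."""
    s = gamma * math.sqrt(expiries[i - 1])
    return deltas[i - 1] * P[i] * black76(forwards[i - 1], K, s)

def bootstrap_spot_vols(flat_vols, P, deltas, forwards, K, expiries):
    """Recursively solve for gamma_hat(i) from quoted flat vols gamma_hat_n, n = 1..N."""
    args = (P, deltas, forwards, K, expiries)
    spot = []
    for n in range(1, len(flat_vols) + 1):
        target = sum(caplet(i, flat_vols[n - 1], *args) for i in range(1, n + 1))
        known = sum(caplet(i, spot[i - 1], *args) for i in range(1, n))
        residual = target - known          # value the n-th caplet must have
        lo, hi = 1e-6, 5.0                 # bracket for the volatility (assumption)
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if caplet(n, mid, *args) < residual:
                lo = mid
            else:
                hi = mid
        spot.append(0.5 * (lo + hi))
    return spot

# Hypothetical market data
P = [0.98, 0.96, 0.94, 0.92]
deltas = [0.5, 0.5, 0.5]
forwards = [(P[i - 1] / P[i] - 1.0) / deltas[i - 1] for i in range(1, 4)]
expiries = [0.5, 1.0, 1.5]
print(bootstrap_spot_vols([0.20, 0.22, 0.23], P, deltas, forwards, 0.04, expiries))
```

When the quoted flat volatilities are all equal, the bootstrapped term structure is flat at the same level; with rising quotes, the last bootstrapped volatility exceeds the last quote.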
327
11.8. Appendix 1: The FTAP for bond prices c
°by A. Mele
$$\frac{dP_i}{P_i} = \mu_{bi} \cdot d\tau + \sigma_{bi} \cdot dW, \qquad i = 1, \cdots, m, \qquad (11A.1)$$
where W is a Brownian motion in $\mathbb{R}^d$, and $\mu_{bi}$ and $\sigma_{bi}$ are progressively $F(\tau)$-measurable functions guaranteeing the existence of a strong solution to the previous system ($\sigma_{bi}$ is vector-valued). The value process V of a self-financing portfolio in these m bonds and a money market technology satisfies:
$$dV = \left[\pi^\top(\mu_b - 1_m r) + rV\right]d\tau + \pi^\top\sigma_b\,dW,$$
Next, suppose that there exists a portfolio $\pi$ such that $\pi^\top\sigma_b = 0$. This is an arbitrage opportunity if there exist events for which at some time, $\mu_b - 1_m r \neq 0$ (use $\pi$ when $\mu_b - 1_m r > 0$, and $-\pi$ when $\mu_b - 1_m r < 0$: the drift of V will then be appreciating at a deterministic rate that is strictly greater than r). Therefore, arbitrage opportunities are ruled out if:
In other terms, arbitrage opportunities are ruled out when every vector in the null space of $\sigma_b$ is orthogonal to $\mu_b - 1_m r$, or when there exists a $\lambda$ taking values in $\mathbb{R}^d$ satisfying some basic integrability conditions, and such that
$$\mu_b - 1_m r = \sigma_b\lambda$$
or,
$$\mu_{bi} - r = \sigma_{bi}\lambda, \qquad i = 1, \cdots, m. \qquad (11A.2)$$
In this case,
$$\frac{dP_i}{P_i} = (r + \sigma_{bi}\lambda) \cdot d\tau + \sigma_{bi} \cdot dW, \qquad i = 1, \cdots, m.$$
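Now define $\tilde{W} = W + \int\lambda\,d\tau$, $\frac{dQ}{dP} = \exp\left(-\int_t^T\lambda^\top dW - \frac{1}{2}\int_t^T\|\lambda\|^2 d\tau\right)$. The Q-martingale property of the “normalized” bond price processes now easily follows by Girsanov’s theorem. Indeed, define for a generic i, $P(\tau, T) \equiv P(\tau, T_i) \equiv P_i$, and:
$$g(\tau) \equiv e^{-\int_t^\tau r(u)du} \cdot P(\tau, T), \qquad \tau \in [t, T].$$
or
$$P(\tau, T) = e^{\int_t^\tau r(u)du} \cdot E\left[\left. e^{-\int_t^T r(u)du}\right| F(\tau)\right] = E\left[\left. e^{-\int_\tau^T r(u)du}\right| F(\tau)\right], \qquad \text{all } \tau \in [t, T],$$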
• In Section 11.3, it is assumed that the primitive of the economy is the short-term rate, solution
of a multidimensional diffusion process, and μbi and σ bi will be derived via Itô’s lemma.
• In Section 11.4, μbi and σ bi are restricted through a model describing the evolution of the forward
rates.
329
11.9. Appendix 2: Certainty equivalent interpretation of forward prices c
°by A. Mele
But in the applications we have in mind, $S(T)$ is random. Define then its certainty equivalent by the number $\bar{S}(T)$ that solves:
$$P(t, T) \cdot \bar{S}(T) = E\left[e^{-\int_t^T r(\tau)d\tau} \cdot S(T)\right],$$
or
$$\bar{S}(T) = E\left[\eta_T(T) \cdot S(T)\right], \qquad (11A.3)$$
where $\eta_T(T)$ has been defined in (11.15).
Comparing Eq. (11A.3) with Eq. (11.14) reveals that forward prices can be interpreted in terms of the previously defined certainty equivalent.
330
11.10. Appendix 3: Additional results on T -forward martingale probabilities c
°by A. Mele
By the FTAP, $\left\{\exp\left(-\int_t^\tau r(u)du\right) \cdot P(\tau, T)\right\}_{\tau \in [t,T]}$ is a Q-martingale (see Appendix 1 to this chapter). Therefore, $E\left[\left.\frac{dQ_F^T}{dQ}\right| F_\tau\right] = E\left[\left.\eta_T(T)\right| F_\tau\right] = \eta_T(\tau)$ all $\tau \in [t, T]$, and in particular, $\eta_T(t) = 1$. We now verify this and, at the same time, derive a representation of $\eta_T(\tau)$ that can be used to find “forward premia”.
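We begin with the dynamic representation (11A.1) given for a generic bond price #i, $P(\tau, T) \equiv P(\tau, T_i) \equiv P_i$:
$$\frac{dP}{P} = \mu \cdot d\tau + \sigma \cdot dW,$$
where we have defined $\mu \equiv \mu_{bi}$ and $\sigma \equiv \sigma_{bi}$.
Under the risk-neutral probability Q,
$$\frac{dP}{P} = r \cdot d\tau + \sigma \cdot d\tilde{W},$$
where $\tilde{W} = W + \int\lambda$ is a Q-Brownian motion.
By Itô’s lemma,
$$\frac{d\eta_T(\tau)}{\eta_T(\tau)} = -\left[-\sigma(\tau, T)\right] \cdot d\tilde{W}(\tau), \qquad \eta_T(t) = 1.$$
The solution is:
$$\eta_T(\tau) = \exp\left[-\frac{1}{2}\int_t^\tau\|\sigma(u, T)\|^2 du - \int_t^\tau\left(-\sigma(u, T)\right) \cdot d\tilde{W}(u)\right].$$
Under the usual integrability conditions, we can now use Girsanov’s theorem and conclude that
$$W^{Q_F^T}(\tau) \equiv \tilde{W}(\tau) + \int_t^\tau\left(-\sigma(u, T)^\top\right)du \qquad (11A.4)$$
is a Brownian motion under the T-forward martingale probability $Q_F^T$. Therefore,
$$W^{Q_F^{T_i}}(\tau) = W^{Q_F^{T_{i-1}}}(\tau) - \int_t^\tau\left[\sigma(u, T_i)^\top - \sigma(u, T_{i-1})^\top\right]du, \qquad i = 1, 2, \cdots, \qquad (11A.5)$$
is a Brownian motion under the $T_i$-forward martingale probability $Q_F^{T_i}$. Eqs. (11A.5) and (11A.4) are used in Section 11.7 on interest rate derivatives.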
331
11.11. Appendix 4: Principal components analysis c
°by A. Mele
where $\mathrm{var}(Y_1) = C_1^\top\Sigma C_1$, and the constraint is an identification constraint. The first order conditions lead to,
$$(\Sigma - \lambda I)\,C_1 = 0,$$
where $\lambda$ is a Lagrange multiplier. The previous condition tells us that $\lambda$ must be one eigenvalue of the matrix $\Sigma$, and that $C_1$ must be the corresponding eigenvector. Moreover, we have $\mathrm{var}(Y_1) = C_1^\top\Sigma C_1 = \lambda$, which is clearly maximized by the largest eigenvalue. Suppose that the eigenvalues of $\Sigma$ are distinct, and let us arrange them in descending order, i.e. $\lambda_1 > \cdots > \lambda_p$. Then,
$$\mathrm{var}(Y_1) = \lambda_1.$$
Therefore, the first principal component is $Y_1 = C_1^\top\left(R - \bar{R}\right)$, where $C_1$ is the eigenvector corresponding to the largest eigenvalue, $\lambda_1$.
Next, consider the second principal component. The program is, now,
where $\mathrm{var}(Y_2) = C_2^\top\Sigma C_2$. The first constraint, $C_2^\top C_2 = 1$, is the usual identification constraint. The second constraint, $C_2^\top C_1 = 0$, is needed to ensure that $Y_1$ and $Y_2$ are orthogonal, i.e. $E(Y_1 Y_2) = 0$.
The first order conditions for this problem are,
where $\lambda$ is the Lagrange multiplier associated with the first constraint, and $\nu$ is the Lagrange multiplier associated with the second constraint. By pre-multiplying the first order conditions by $C_1^\top$,
$$0 = C_1^\top\Sigma C_2 - \nu,$$
where we have used the two constraints $C_1^\top C_2 = 0$ and $C_1^\top C_1 = 1$. Post-multiplying the previous expression by $C_1^\top$, one obtains, $0 = C_1^\top\Sigma C_2 C_1^\top - \nu C_1^\top = -\nu C_1^\top$, where the last equality follows by $C_1^\top C_2 = 0$. Hence, $\nu = 0$. So the first order conditions can be rewritten as,
$$(\Sigma - \lambda I)\,C_2 = 0.$$
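The first principal component can be computed numerically in many ways. The sketch below uses power iteration, a technique that is not discussed in the text and is chosen here only because it needs no external library; the covariance matrix is made up for the example.

```python
import math

def first_principal_component(Sigma, n_iter=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    p = len(Sigma)
    c = [1.0 / math.sqrt(p)] * p                      # starting vector with c'c = 1
    for _ in range(n_iter):
        v = [sum(Sigma[i][j] * c[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        c = [x / norm for x in v]                     # re-impose the constraint C1'C1 = 1
    lam = sum(c[i] * Sigma[i][j] * c[j] for i in range(p) for j in range(p))
    return lam, c

Sigma = [[4.0, 2.0], [2.0, 3.0]]   # hypothetical covariance matrix
lam1, C1 = first_principal_component(Sigma)
print(lam1, C1)
```

The returned value equals var(Y1) = C1'ΣC1 and coincides with the largest eigenvalue of Σ, here (7 + √17)/2.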
11.12 Appendix 5: A few analytics for the Hull and White model
As in the Ho and Lee model, the instantaneous forward rate $f(\tau, T)$ predicted by the Hull and White model is as in (11.45), where functions $A_2$ and $B_2$ can be easily computed from Eqs. (11.49) and (11.50) as:
$$A_2(\tau, T) = \sigma^2\int_\tau^T B(s, T)B_2(s, T)\,ds - \int_\tau^T\theta(s)B_2(s, T)\,ds, \qquad B_2(\tau, T) = e^{-\kappa(T-\tau)}.$$
Therefore, the instantaneous forward rate $f(\tau, T)$ predicted by the Hull and White model is obtained by replacing the previous equations in Eq. (11.45). The result is then equated to the observed forward rate $f_\$(t, \tau)$ so as to obtain:
$$f_\$(t, \tau) = -\frac{\sigma^2}{2\kappa^2}\left[1 - e^{-\kappa(\tau-t)}\right]^2 + \int_t^\tau\theta(s)e^{-\kappa(\tau-s)}\,ds + e^{-\kappa(\tau-t)}r(t).$$
333
11.13. Appendix 6: Expectation theory and embedding in selected models c
°by A. Mele
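where
$$\alpha(\tau, T) = \sigma(\tau, T)\int_\tau^T\sigma(\tau, \ell)\,d\ell + \sigma(\tau, T)\lambda(\tau) = \sigma^2(T - \tau) + \sigma\lambda.$$
Hence,
$$\int_t^\tau\alpha(s, \tau)\,ds = \frac{1}{2}\sigma^2(\tau - t)^2 + \sigma\lambda(\tau - t).$$
Finally,
$$r(\tau) = f(t, \tau) + \frac{1}{2}\sigma^2(\tau - t)^2 + \sigma\lambda(\tau - t) + \sigma\left(W(\tau) - W(t)\right),$$
and since $E(W(\tau)|F(t)) = W(t)$,
$$E[r(\tau)|F(t)] = f(t, \tau) + \frac{1}{2}\sigma^2(\tau - t)^2 + \sigma\lambda(\tau - t).$$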
Even with $\lambda < 0$, this model is not able to always generate $E[r(\tau)|F(t)] < f(t, \tau)$. As shown in the following exercise, this is due to the nonstationary nature of the volatility function. Indeed, suppose, next, that instead of Eq. (11A.6), we have that
where
$$\alpha(s, \tau) = \sigma^2 e^{-\gamma(\tau-s)}\int_s^\tau e^{-\gamma(\ell-s)}\,d\ell + \sigma\lambda e^{-\gamma(\tau-s)} = \frac{\sigma^2}{\gamma}\left[e^{-\gamma(\tau-s)} - e^{-2\gamma(\tau-s)}\right] + \sigma\lambda e^{-\gamma(\tau-s)}.$$
Finally,
$$E[r(\tau)|F(t)] = f(t, \tau) + \int_t^\tau\alpha(s, \tau)\,ds = f(t, \tau) + \frac{\sigma}{\gamma}\left(1 - e^{-\gamma(\tau-t)}\right)\left[\frac{\sigma}{2\gamma}\left(1 - e^{-\gamma(\tau-t)}\right) + \lambda\right].$$
Therefore, it is sufficient to have a risk-premium such that $-\lambda > \frac{\sigma}{2\gamma}$, to generate the prediction that:
In other words, $\lambda < 0$ is a necessary condition, but not a sufficient one. Notice that when $\lambda = 0$, it always holds that $E(r(\tau)|F(t)) > f(t, \tau)$.
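The sign conditions can be explored numerically. The sketch below codes the two expressions for E[r(τ)|F(t)] − f(t, τ) derived in this appendix, with Δ = τ − t; the function names and parameter values are illustrative.

```python
import math

def premium_constant_vol(sigma, lam, delta):
    """E[r | F(t)] - f(t, tau) with constant volatility: 0.5 sigma^2 D^2 + sigma lam D."""
    return 0.5 * sigma ** 2 * delta ** 2 + sigma * lam * delta

def premium_decaying_vol(sigma, gamma, lam, delta):
    """E[r | F(t)] - f(t, tau) with exponentially decaying volatility."""
    x = 1.0 - math.exp(-gamma * delta)
    return sigma / gamma * x * (sigma / (2.0 * gamma) * x + lam)

sigma, gamma = 0.02, 0.5            # here sigma / (2 gamma) = 0.02
for lam in (0.0, -0.01, -0.03):
    print(lam,
          [round(premium_constant_vol(sigma, lam, d), 5) for d in (1.0, 5.0, 10.0)],
          [round(premium_decaying_vol(sigma, gamma, lam, d), 5) for d in (1.0, 5.0, 10.0)])
```

With constant volatility the difference eventually turns positive for any λ, whereas with decaying volatility it stays negative at every horizon as soon as −λ > σ/(2γ), here λ = −0.03.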
B. Embedding
We now embed the Ho and Lee model in Section 11.5.2 in the HJM format. In the Ho and Lee model,
$$\sigma(t, T) = B_2(t, T) \cdot \sigma = \sigma,$$
$$\alpha(t, T) - \sigma(t, T)\lambda(t) = -A_{12}(t, T) + B_{12}(t, T)\,r + B_2(t, T)\theta(t) = \sigma^2(T - t).$$
Next, we embed the Vasicek model in Section 11.4 in the HJM format. The Vasicek model is:
$$\alpha(t, T) - \sigma(t, T)\lambda(t) = -A_{12}(t, T) + B_{12}(t, T)\,r + (\theta - \kappa r)B_2(t, T) = \frac{\sigma^2}{\kappa}\left[1 - e^{-\kappa(T-t)}\right]e^{-\kappa(T-t)}.$$
Naturally, this model can never be embedded within a HJM model because it is not of the perfectly fitting type. In practice, condition (11.60) can never hold in the simple Vasicek model. However, the model is embeddable once θ is turned into an infinite dimensional parameter à la Hull and White (see Section 11.3).
335
11.14. Appendix 7: Additional results on string models c
°by A. Mele
336
11.15. Appendix 8: Change of numeraire techniques c
°by A. Mele
$$\frac{dA}{A} = \mu_A\,d\tau + \sigma_A\,dW,$$
and consider a similar process B with coefficients $\mu_B$ and $\sigma_B$. We have:
$$\frac{d(A/B)}{A/B} = \left(\mu_A - \mu_B + \sigma_B^2 - \sigma_A\sigma_B\right)d\tau + \left(\sigma_A - \sigma_B\right)dW. \qquad (11A.7)$$
Next, let us apply this change-of-numeraire result to the process $y(\tau, S) \equiv \frac{P(\tau, S)}{P(\tau, T)}$ under $Q_F^S$ and under $Q_F^T$. The goal is to obtain the solution as of time T of $y(\tau, S)$ viz
$$y(T, S) \equiv \frac{P(T, S)}{P(T, T)} = P(T, S) \quad \text{under } Q_F^S \text{ and under } Q_F^T.$$
$$\frac{dP(\tau, x)}{P(\tau, x)} = r\,d\tau - \sigma B(\tau, x)\,d\tilde{W}(\tau), \qquad x \geq T.$$
$$\frac{dy(\tau, S)}{y(\tau, S)} = \sigma^2\left[B(\tau, T)^2 - B(\tau, T)B(\tau, S)\right]d\tau - \sigma\left[B(\tau, S) - B(\tau, T)\right]d\tilde{W}(\tau). \qquad (11A.8)$$
All we need to do now is to change measure with the tools of Appendix 3. We have that:
$$dW^{Q_F^x}(\tau) = d\tilde{W}(\tau) + \sigma B(\tau, x)\,d\tau$$
is a Brownian motion under the x-forward martingale probability. Replace then $W^{Q_F^x}$ into Eq. (11A.8), then integrate, and obtain:
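$$\frac{y(T, S)}{y(t, S)} = \frac{P(T, S)\,P(t, T)}{P(t, S)} = e^{-\frac{1}{2}\sigma^2\int_t^T\left[B(\tau, S) - B(\tau, T)\right]^2 d\tau - \sigma\int_t^T\left[B(\tau, S) - B(\tau, T)\right]dW^{Q_F^T}(\tau)},$$
$$\frac{y(T, S)}{y(t, S)} = \frac{P(T, S)\,P(t, T)}{P(t, S)} = e^{\frac{1}{2}\sigma^2\int_t^T\left[B(\tau, S) - B(\tau, T)\right]^2 d\tau - \sigma\int_t^T\left[B(\tau, S) - B(\tau, T)\right]dW^{Q_F^S}(\tau)},$$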
B. Black (1976)
Proving Eq. (11.93) is equivalent to evaluating the following expectation:
$$E\left[x(T) - K\right]^+,$$
where
$$x(T) = x(t)\,e^{-\frac{1}{2}\int_t^T\gamma(\tau)^2 d\tau + \int_t^T\gamma(\tau)\,d\tilde{W}(\tau)}. \qquad (11A.9)$$
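$$\frac{dQ^x}{dQ} = \frac{x(T)}{x(t)} = e^{-\frac{1}{2}\int_t^T\gamma(\tau)^2 d\tau + \int_t^T\gamma(\tau)\,d\tilde{W}(\tau)},$$
$$dW^x(\tau) = d\tilde{W}(\tau) - \gamma\,d\tau$$
Applying this to $E_{Q_F^{T_i}}\left[F_{i-1}(T_{i-1}) - K\right]^+$ gives the formulae of the text.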
References
Aït-Sahalia, Y. (1996): “Testing Continuous-Time Models of the Spot Interest Rate.” Review of Financial Studies 9, 385-426.
Balduzzi, P., S. R. Das, S. Foresi and R. K. Sundaram (1996): “A Simple Approach to Three
Factor Affine Term Structure Models.” Journal of Fixed Income 6, 43-53.
Black, F. and M. Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal
of Political Economy 81, 637-659.
Brace, A., D. Gatarek and M. Musiela (1997): “The Market Model of Interest Rate Dynamics.”
Mathematical Finance 7, 127-155.
Brémaud, P. (1981): Point Processes and Queues: Martingale Dynamics. Berlin: Springer Verlag.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985): “A Theory of the Term Structure of Interest
Rates.” Econometrica 53, 385-407.
Dai, Q. and K. J. Singleton (2000): “Specification Analysis of Affine Term Structure Models.”
Journal of Finance 55, 1943-1978.
Duffie, D. and R. Kan (1996): “A Yield-Factor Model of Interest Rates.” Mathematical Finance
6, 379-406.
Fong, H. G. and O. A. Vasicek (1991): “Fixed Income Volatility Management.” The Journal
of Portfolio Management (Summer), 41-46.
Geman, H. (1989): “The Importance of the Forward Neutral Probability in a Stochastic Approach
to Interest Rates.” Unpublished working paper, ESSEC.
Geman, H., N. El Karoui and J. C. Rochet (1995): “Changes of Numeraire, Changes of Probability
Measures and Pricing of Options.” Journal of Applied Probability 32, 443-458.
Goldstein, R. S. (2000): “The Term Structure of Interest Rates as a Random Field.” Review
of Financial Studies 13, 365-384.
Heath, D., R. Jarrow and A. Morton (1992): “Bond Pricing and the Term-Structure of Interest
Rates: a New Methodology for Contingent Claim Valuation.” Econometrica 60, 77-105.
Ho, T. S. Y. and S.-B. Lee (1986): “Term Structure Movements and the Pricing of Interest
Rate Contingent Claims.” Journal of Finance 41, 1011-1029.
Hull, J. C. (2003): Options, Futures, and Other Derivatives. Prentice Hall. 5th edition
(International Edition).
Hull, J. C. and A. White (1990): “Pricing Interest Rate Derivative Securities.” Review of
Financial Studies 3, 573-592.
Jagannathan, R. (1984): “Call Options and the Risk of Underlying Securities.” Journal of
Financial Economics 13, 425-434.
Jamshidian, F. (1989): “An Exact Bond Option Pricing Formula.” Journal of Finance 44,
205-209.
Jamshidian, F. (1997): “Libor and Swap Market Models and Measures.” Finance and Stochastics
1, 293-330.
Karlin, S. and H. M. Taylor (1981): A Second Course in Stochastic Processes. San Diego:
Academic Press.
Kennedy, D. P. (1994): “The Term Structure of Interest Rates as a Gaussian Random Field.”
Mathematical Finance 4, 247-258.
Knez, P. J., R. Litterman and J. Scheinkman (1994): “Explorations into Factors Explaining
Money Market Returns.” Journal of Finance 49, 1861-1882.
Langetieg, T. (1980): “A Multivariate Model of the Term Structure of Interest Rates.” Journal
of Finance 35, 71-97.
Litterman, R. and J. Scheinkman (1991): “Common Factors Affecting Bond Returns.” Journal
of Fixed Income 1, 54-61.
Litterman, R., J. Scheinkman, and L. Weiss (1991): “Volatility and the Yield Curve.” Journal
of Fixed Income 1, 49-53.
Longstaff, F. A. and E.S. Schwartz (1992): “Interest Rate Volatility and the Term Structure:
A Two-Factor General Equilibrium Model.” Journal of Finance 47, 1259-1282.
Mele, A. (2003): “Fundamental Properties of Bond Prices in Models of the Short-Term Rate.”
Review of Financial Studies 16, 679-716.
Mele, A. and F. Fornari (2000): Stochastic Volatility in Financial Markets: Crossing the Bridge
to Continuous Time. Boston: Kluwer Academic Publishers.
Merton, R.C. (1973): “Theory of Rational Option Pricing.” Bell Journal of Economics and
Management Science 4, 141-183.
Miltersen, K., K. Sandmann and D. Sondermann (1997): “Closed Form Solutions for Term
Structure Derivatives with Lognormal Interest Rate.” Journal of Finance 52, 409-430.
Santa-Clara, P. and D. Sornette (2001): “The Dynamics of the Forward Interest Rate Curve
with Stochastic String Shocks.” Review of Financial Studies 14, 149-185.
Stanton, R. (1997): “A Nonparametric Model of Term Structure Dynamics and the Market
Price of Interest Rate Risk.” Journal of Finance 52, 1973-2002.
12 Risky debt and credit derivatives
12.1 Introduction
12.2 Conceptual approaches to valuation of defaultable securities
12.2.1 Firm’s value, or structural, approaches
These approaches rely on the structure of the firm: shares and bonds are viewed as derivatives
on the firm’s assets. At the time of debt expiration, debtholders receive the minimum between
the debt nominal value and the value of the assets the firm can liquidate to honour the debt
obligation. Debtholders are senior claimants. Equity holders are residual claimants to the firm’s
assets, i.e. junior claimants.
We can use these basic insights to illustrate the first model of the risk-structure of interest
rates, the Merton-KMV approach. Equity is like a European call option written on the firm’s
assets, with expiration equal to the debt expiration, and strike equal to the nominal value of
debt. The current value of debt equals the value of the assets minus the value of equity, i.e. the
value of a risk-free discount bond minus the value of a put option on the firm with strike price
equal to the nominal value of debt, as shown below: see Eq. (12.3).
Merton (1974) uses the Black and Scholes (1973) formula to derive the price of debt. Main
assumption: Assets can be traded, and their value $A_t$ satisfies,¹
$$\frac{dA_t}{A_t} = r\,dt + \sigma\,d\tilde W_t, \tag{12.1}$$
where $\tilde W_t$ is a Brownian motion under the risk-neutral probability, $\sigma$ is the instantaneous
standard deviation, and $r$ is the short-term rate on riskless bonds.
Let $N$ be the nominal value of debt, $T$ the time of expiration of debt, and $D_t$ the debt value as
of time $t \le T$. As argued earlier, shareholders are long a European call option on the assets,
while bondholders, as senior claimants, receive the smaller of the asset value and the nominal
debt. Mathematically,
$$D_T = \begin{cases} A_T, & \text{if the firm defaults, i.e. } A_T < N \\ N, & \text{if the firm is solvent, i.e. } A_T \ge N \end{cases}$$
We can decompose the asset value at time $T$ into the sum of the value of equity and the value
of debt at time $T$, so that
$$D_T = \min\{A_T, N\} = A_T - \underbrace{\max\{A_T - N, 0\}}_{\equiv\ \text{Equity at } T}. \tag{12.2}$$
FIGURE 12.1. Dashed line: the value of equity at the debt maturity, $T$, $\max\{A_T - N, 0\}$, plotted as a
function of the value of assets, $A_T$. Solid line: the value of debt at maturity, $\min\{A_T, N\}$, as a function
of $A_T$. Nominal value of debt is fixed to $N = 1$. [Plot omitted; horizontal axis: $A_T$.]
A word on convexity, and risk-taking behavior. Convexity: Managers have incentives to invest
in risky assets, as the terminal payoff to them is increasing in the assets volatility, σ. Concavity:
The value of debt, instead, is decreasing in the assets volatility.
¹ Eq. (12.1) could be generalized to one in which $dA_t = (rA_t - \delta_t)\,dt + \sigma A_t\,d\tilde W_t$, where $\delta_t$ is the instantaneous cash flow to the
firm. This would make the firm value equal to $A_0 = E\left[\int_0^\infty e^{-rt}\delta_t\,dt\right]$. For example, one could take $\delta_t$ to be a geometric Brownian
motion with parameters $g$ and $\sigma$, in which case $A_t = (r - g)^{-1}\delta_t$ at all times. We ignore this complication here.
12.2.1.1 Merton
The current value of the bonds equals the current value of the assets, $A_0$, minus the current value
of equity. The current value of equity can be obtained through the Black & Scholes formula, as
equity is a European call option on the firm, struck at $N$. By standard risk-neutral evaluation,
then, the current value of debt, $D_0$, is,
$$D_0 = A_0\,\Phi(-d_1) + N e^{-rT}\,\Phi(d_2), \tag{12.4}$$
$$d_1 = \frac{\log(A_0/N) + \left(r + \frac12\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T},$$
where $\Phi(\cdot)$ denotes the distribution function of a standard normal variable.²
FIGURE 12.2. Solid line: the no-arbitrage bound, $\min\{A_0, N\}$, depicted as a function of $A_0$, when
the nominal value of debt is fixed to $N = 1$. Dashed line: the bond value predicted by Merton’s
model when $T = 1$, $r = 3\%$ and $\sigma = 20\%$, annualized. Dotted line: same as the dashed line, but with
a larger asset volatility, $\sigma = 40\%$. [Plot omitted; horizontal axis: $A_0$.]
Bond prices are decreasing in the asset volatility as bad outcomes are exaggerated on the
downside, due to the concavity properties depicted in Figure 12.1.
The risk-structure of interest rates is obtained with the standard formula for continuously
compounded interest rates as,
$$R = -\frac1T\log\left(\frac{D_0}{N}\right) = r + \text{Spread},$$
where the term-spread or, simply, the spread, is
$$\text{Spread} = -\frac1T\log\left(\frac{A_0}{N}e^{rT}\,\Phi(-d_1) + \Phi(d_2)\right). \tag{12.5}$$
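Eqs. (12.4) and (12.5) are straightforward to evaluate numerically. The following is a minimal sketch, not the author's code: the function names are hypothetical, and the parameter values are merely in the range used in the figures of this section.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_d1_d2(A0, N, r, sigma, T):
    """d1 and d2 as defined below Eq. (12.4)."""
    d1 = (math.log(A0 / N) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return d1, d1 - sigma * math.sqrt(T)

def merton_debt(A0, N, r, sigma, T):
    """Current value of debt, Eq. (12.4)."""
    d1, d2 = merton_d1_d2(A0, N, r, sigma, T)
    return A0 * norm_cdf(-d1) + N * math.exp(-r * T) * norm_cdf(d2)

def merton_spread(A0, N, r, sigma, T):
    """Spread over the riskless rate, R - r, with R = -(1/T) log(D0 / N)."""
    return -math.log(merton_debt(A0, N, r, sigma, T) / N) / T - r

# Illustrative inputs: A0 = 1.1, N = 1, r = 3%, T = 1, two volatility levels.
for sigma in (0.20, 0.40):
    D0 = merton_debt(1.1, 1.0, 0.03, sigma, 1.0)
    s = merton_spread(1.1, 1.0, 0.03, sigma, 1.0)
    print(f"sigma={sigma:.2f}  D0={D0:.4f}  spread={1e4 * s:.1f} bp")
```

Raising the volatility lowers the debt value and widens the spread, in line with the concavity argument made for Figure 12.1.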
² For the computation details, note that $D_0 = e^{-rT}E[D_T \mid A_0]$ and, then, by Eq. (12.2),
$$D_0 = e^{-rT}E(A_T \mid A_0) - e^{-rT}E[\max\{A_T - N, 0\} \mid A_0] = A_0 - \left[A_0\,\Phi(d_1) - N e^{-rT}\,\Phi(d_2)\right],$$
where the last equality follows by the Black & Scholes formula. Eq. (12.4) follows after rearranging terms in the previous equation, using $1 - \Phi(d_1) = \Phi(-d_1)$.
FIGURE 12.3. The term structure of spreads, $s_0$, obtained with initial asset values $A_0 = 1.1$ (solid
line), $A_0 = 1.2$ (dashed line), and $A_0 = 1.3$ (dotted line). The short-term rate is $r = 3\%$, and asset
volatility is $\sigma = 0.20$. Nominal debt $N = 1$. [Plot omitted; vertical axis: spread; horizontal axis: time to maturity.]
We can introduce a useful summary statistic for credit quality: distance-to-default (under
Q). We can use the previous model to estimate the likelihood of default for a given firm. First,
we develop Eq. (12.2),
$$D_T = A_T\cdot I_{\{A_T < N\}} + N\cdot I_{\{A_T \ge N\}},$$
where $I_{\{E\}}$ is the indicator function, i.e. $I_{\{E\}} = 1$ if the event $E$ is true and $I_{\{E\}} = 0$ if the event
$E$ is false. Second, we have,
$$\begin{aligned}
D_0 &= e^{-rT}E(D_T) \\
    &= e^{-rT}\left[E\left(A_T\cdot I_{\{A_T<N\}}\right) + N\cdot E\left(I_{\{A_T\ge N\}}\right)\right] \\
    &= e^{-rT}\left[E(A_T \mid \text{Default})\,Q(\text{Default}) + N\cdot Q(\text{Survival})\right],
\end{aligned} \tag{12.6}$$
where $E(A_T \mid \text{Default})$ is the expected asset value given the event of default, $Q(\text{Default})$ is the
probability of default, and $Q(\text{Survival}) = 1 - Q(\text{Default})$ is the probability the firm does not
default.
Comparing Eq. (12.6) with Eq. (12.4) reveals that for Merton’s model,
$$Q(\text{Survival}) = \Phi(d_2).$$
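The comparison between Eq. (12.6) and Eq. (12.4) can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the expression used for $E(A_T \mid \text{Default})$, namely $A_0 e^{rT}\Phi(-d_1)/\Phi(-d_2)$, is the standard lognormal partial expectation rather than a formula taken from the text.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_d2(A0, N, r, sigma, T):
    d1 = (math.log(A0 / N) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return d1, d1 - sigma * math.sqrt(T)

def survival_prob(A0, N, r, sigma, T):
    """Risk-neutral survival probability, Q(Survival) = Phi(d2)."""
    return norm_cdf(d1_d2(A0, N, r, sigma, T)[1])

def expected_assets_given_default(A0, N, r, sigma, T):
    """E(A_T | A_T < N) under Q for lognormal A_T (standard result, assumed here)."""
    d1, d2 = d1_d2(A0, N, r, sigma, T)
    return A0 * math.exp(r * T) * norm_cdf(-d1) / norm_cdf(-d2)

def debt_from_decomposition(A0, N, r, sigma, T):
    """Right-hand side of Eq. (12.6)."""
    q_surv = survival_prob(A0, N, r, sigma, T)
    e_def = expected_assets_given_default(A0, N, r, sigma, T)
    return math.exp(-r * T) * (e_def * (1.0 - q_surv) + N * q_surv)

# Survival probabilities for A0 = 1.1, N = 1, r = 3%, at a few maturities.
for T in (0.5, 1.0, 2.0):
    print(T, round(survival_prob(1.1, 1.0, 0.03, 0.20, T), 4))
```

With these inputs, the decomposition reproduces the debt value of Eq. (12.4) up to floating-point error, and the expected asset value in default lies below the nominal debt, as it must.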
FIGURE 12.4. Probability of survival for a given firm predicted by Merton’s model, $\Phi(d_2)$, depicted
as a function of the asset volatility, $\sigma$. Assets value is fixed at $A_0 = 1.1$, and plotted are survival
probabilities for bonds maturing at $T = 0.5$ years (solid line), $T = 1$ year (dashed line) and $T = 2$
years (dotted line). The short-term rate is $r = 3\%$. Nominal debt $N = 1$. [Plot omitted; vertical axis: Pr(surv); horizontal axis: sigma.]
The probability of survival decreases with (i) debt maturity and (ii) the asset volatility.
Property (i) is not a general property, though. With lower values of $A_0$, the relation between
maturity and probability of survival can be increasing or decreasing, according to the values of
$\sigma$, as shown in Figure 12.5.
FIGURE 12.5. Probability of survival for a given firm predicted by Merton’s model, $\Phi(d_2)$, depicted
as a function of the asset volatility, $\sigma$. Assets value is fixed at $A_0 = 1.01$, and plotted are survival
probabilities for bonds maturing at $T = 0.5$ years (solid line), $T = 1$ year (dashed line) and $T = 2$
years (dotted line). The short-term rate is $r = 3\%$. Nominal debt $N = 1$. [Plot omitted; vertical axis: Pr(surv).]
Loss-given-default is defined as the fraction of the bond value the bondholders expect to lose
in the event of default, at maturity:
The timing of default can be triggered by some exogenously specified events. For example,
default occurs if the value of the assets hits some exogenous lower bound even before the
expiration of debt. These models are known as “first passage” models, because they rely on
mathematical techniques that solve for the probability distribution of the first time the asset
value hits some exogenous “barrier,” as in Black and Cox (1976).
The timing of default can be endogenous. Managers choose the defaulting barrier (i.e. the
asset value that triggers bankruptcy) so as to maximize the firm’s value. Naturally, strategic
defaulting works if the assumptions underlying the Modigliani-Miller theorem do not hold. The
mechanism is as follows: on the one hand, debt is a tax-shielding device. On the other hand,
issuing too much debt increases the likelihood of default, which triggers bankruptcy costs. The
first effect raises the value of the firm, while the second decreases it. Equity
holders choose the value of the asset that triggers bankruptcy to maximize the value of equity.
Leland (1994): Long-term debt. Leland and Toft (1996): Extension to finite maturity debt.
Anderson and Sundaresan (1996): Debt re-negotiation.
The Leland model works as follows. First, debt is infinitely lived, in that it pays off an
instantaneous coupon equal to $C\,dt$. In the absence of default risk, debt would simply equal
$C/r$. Second, tax benefits are proportional to the coupon, $\tau C\,dt$. Third, there are bankruptcy
By Itô’s lemma, this is an ordinary differential equation, subject to the following boundary
conditions. At bankruptcy, $D(V_B) = (1-\alpha)V_B$. For large $V$, debt is substantially riskless, i.e.
$\lim_{V\to\infty} D(V) = C/r$.
The solution to this is,
$$D(V) = \frac{C}{r}\left(1 - p_B(V)\right) + p_B(V)\left[(1-\alpha)V_B\right],$$
where
$$p_B(V) \equiv \left(\frac{V_B}{V}\right)^{2r/\sigma^2}.$$
Note, we may interpret $p_B(V)$ as the present value of \$1 contingent on future bankruptcy.
Accordingly, the total benefits arising from tax shielding are,
$$TB(V) = \left(1 - p_B(V)\right)\tau\,\frac{C}{r},$$
and the present value of bankruptcy costs is,
$$BC(V) = p_B(V)\,\alpha V_B.$$
We have,
$$\text{Value of the firm} = \text{Equity} + \text{Debt} = \text{Value of Assets } (V) + TB(V) - BC(V).$$
Summing up,
$$E(V) \equiv \text{Equity} = V - \left(1 - p_B(V)\right)(1-\tau)\frac{C}{r} - p_B(V)\,V_B.$$
Equity equals (i) the value of the assets, $V$; minus (ii) the present value of debt contingent
on no-bankruptcy, $(1 - p_B(V))(1-\tau)\frac{C}{r}$; minus (iii) the present value of debt contingent on
bankruptcy net of bankruptcy costs, $p_B(V)V_B$. The second term decreases with the default
boundary, $V_B$ or, equivalently, $p_B(V)$. The third term, instead, increases with $V_B$. So the time
equityholders wait before declaring bankruptcy, which is inversely related to $V_B$, affects the last
two terms in opposite ways. Equityholders choose $V_B$ to maximize the value of equity. Their
solution is a default boundary, $V_B$, such that the value of equity does not change for small
changes in the value of the assets $V$ around $V_B$, i.e. $V_B$ solves $E'(V)\big|_{V=V_B} = 0$, a smooth pasting
condition. The result is,
$$V_B = (1-\tau)\,\frac{C}{r + \frac12\sigma^2}.$$
How is it that tax shielding does not seem to affect the existence of a solution to this problem?
That is, the default boundary, VB , still exists, even with τ = 0. This issue is easily resolved.
If τ = 0, there are no reasons to issue debt in the first place, as no tax benefits would flow
to the firm, thereby increasing its value! In fact, when τ > 0, there is a value of leverage that
maximizes the value of the firm, according to simulations reported in Leland (1994).
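The closed-form expressions above can be verified numerically. The following is a minimal sketch with hypothetical function names and illustrative parameter values that are not taken from Leland (1994); it evaluates debt, equity and the default boundary, and checks the smooth pasting condition by a finite difference.

```python
def p_B(V, V_B, r, sigma):
    """Present value of $1 contingent on future bankruptcy, (V_B / V)^(2r / sigma^2)."""
    return (V_B / V) ** (2.0 * r / sigma ** 2)

def debt(V, V_B, C, r, sigma, alpha):
    """D(V) = (C/r)(1 - p_B) + p_B (1 - alpha) V_B."""
    p = p_B(V, V_B, r, sigma)
    return (C / r) * (1.0 - p) + p * (1.0 - alpha) * V_B

def equity(V, V_B, C, r, sigma, tau):
    """E(V) = V - (1 - p_B)(1 - tau) C/r - p_B V_B."""
    p = p_B(V, V_B, r, sigma)
    return V - (1.0 - p) * (1.0 - tau) * C / r - p * V_B

def default_boundary(C, r, sigma, tau):
    """Smooth-pasting boundary, V_B = (1 - tau) C / (r + sigma^2 / 2)."""
    return (1.0 - tau) * C / (r + 0.5 * sigma ** 2)

# Illustrative (assumed) inputs.
C, r, sigma, tau, alpha, V = 5.0, 0.06, 0.20, 0.35, 0.5, 100.0
V_B = default_boundary(C, r, sigma, tau)

# One-sided finite difference for E'(V) at V = V_B; should be close to zero.
h = 1e-4
slope = (equity(V_B + h, V_B, C, r, sigma, tau) - equity(V_B, V_B, C, r, sigma, tau)) / h
print("V_B =", V_B, " E'(V_B) ~", slope)
print("equity =", equity(V, V_B, C, r, sigma, tau), " debt =", debt(V, V_B, C, r, sigma, alpha))
```

For fixed $V$, the same boundary also maximizes $E(V)$ over $V_B$, which is why a perturbed boundary yields a lower equity value in this example.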
Pros. First, they allow us to think about more complicated structures or instruments easily (e.g.,
convertibles). Second, they lead to simple yet consistent relations between different securities
issued by the same name. Structural approaches were very useful for theoretical research in the
1990s.
Cons. The firm’s asset value and asset volatility are not observed. Must rely on calibration/estimation
methods. Bond prices generated by the model ≠ market prices. That is, traders
cannot use these models. Traders would require theoretical prices that exactly match the market
prices. How do we proceed for sovereign issuers?
Most important. Structural models predict unrealistically low short-term spreads: see, e.g.,
Figure 12.3. The intuition is that diffusion processes are smooth; hence, the probability of
default tends to zero as maturity approaches, because default cannot just jump in an unexpected
way. This is not exactly what we observe. Jumps seem to be a more realistic device for modeling
spreads.
We model the arrival of defaults through the Poisson processes introduced in Chapter 4, as
follows. Suppose we “count” the number of times some event happens. Denote with $N_t$ the
corresponding “counting process”.
FIGURE 12.6. [Plot omitted: the counting process $N_t$ against time, with jump times $t_0, t_1, t_2, t_3$ and “Default” marked on the plot.]
Default time is simply defined as $t_0$, i.e. the first time $N_t$ “jumps,” as in Figure 12.6. So
assume we chop a given interval $[0, T]$ into $n$ pieces, and consider each resulting interval $\Delta t = T/n$.
Assume the jump probability over each of these small intervals of time $\Delta t$ is proportional to $\Delta t$,
with proportionality factor equal to $\lambda$, i.e. $p = \lambda\,\Delta t = \lambda T/n$.
Assume the number of jumps over the $n$ intervals follows a binomial distribution:
$$\Pr\{k \text{ jumps over } [0,T]\} = \binom{n}{k}p^k(1-p)^{n-k}, \quad \text{where } p = \lambda\frac{T}{n}.$$
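As the number of intervals $n$ grows, these binomial probabilities approach the Poisson probabilities $e^{-\lambda T}(\lambda T)^k/k!$. That limit is a standard result, and the text's own derivation is not reproduced in this excerpt. The sketch below illustrates the convergence numerically; the function names and the values $\lambda = 0.04$, $T = 5$ are assumptions chosen for illustration.

```python
import math

def binomial_prob(k, n, lam, T):
    """Pr{k jumps over [0, T]} with n sub-intervals and p = lam * T / n."""
    p = lam * T / n
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_prob(k, lam, T):
    """Poisson limit: exp(-lam T) (lam T)^k / k!."""
    return math.exp(-lam * T) * (lam * T) ** k / math.factorial(k)

lam, T = 0.04, 5.0
for n in (10, 100, 10_000):
    # k = 0 is the event "no default by time T", i.e. survival.
    print(n, binomial_prob(0, n, lam, T), poisson_prob(0, lam, T))
```

The $k = 0$ case gives the survival probability, which tends to $e^{-\lambda T}$.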
We can use the previous basic computations to come up with a few fundamental properties
for the distribution of default. We have,
where Rec is the expected recovery rate. Using the probabilities predicted by the Poisson model,
we obtain:
$$B_0 = \text{Rec}\cdot\left(1 - e^{-\lambda T}\right) + N\cdot e^{-\lambda T}. \tag{12.9}$$
The Appendix supplies an alternative derivation of Eq. (12.9).
The implications for spreads, for small maturities $T$, are easily seen, after some innocuous
approximations,
$$\text{Spread} = -\frac1T\log\left(\frac{B_0}{N}\right) \approx -\frac1T\left(\frac{B_0}{N} - 1\right) = \frac1T\left(1 - e^{-\lambda T}\right)\cdot\text{Loss-given-default}.$$
Note, for $T$ small, the spread is not zero, unlike in the previous structural models. Rather, it
is given by the expected default loss per period, defined as the instantaneous probability of
default times loss-given-default,
$$\lim_{T\to 0}\text{Spread} = \lambda\cdot\text{Loss-given-default}.$$
Figure 12.7 depicts the behavior of the spread predicted by the model at all maturities, given
by,
$$\text{Spread} = -\frac1T\log\left(\frac{B_0}{N}\right) = -\frac1T\log\left(\frac{\text{Rec}}{N}\cdot\left(1 - e^{-\lambda T}\right) + e^{-\lambda T}\right).$$
FIGURE 12.7. The term structure of spreads (in basis points) implied by an intensity model, with
recovery rate equal to 40% and intensity equal to $\lambda = 0.04$, implying an expected time-to-default equal
to $\lambda^{-1} = 25$ years. [Plot omitted; vertical axis: spread; horizontal axis: time to maturity.]
The spread is a decreasing function of time-to-maturity. Eventually, as time to maturity gets large, the
bond becomes certain to default, thus behaving as a bond without default risk: the bond is
certain to deliver the recovery rate. Indeed, in Appendix 1, we show that if the recovery rate is
not constant, but shrinks exponentially to zero as $\text{Rec}_T \equiv R\cdot e^{-\kappa T}$, for two constants $R$ and $\kappa$,
then, asymptotically, the spread is
$$\lim_{T\to\infty} s(T) = \begin{cases} \lambda, & \text{if } \kappa \ge \lambda \\ \kappa, & \text{if } \kappa \le \lambda \end{cases} \tag{12.10}$$
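The spread formula and the limit in Eq. (12.10) can be explored with a few lines of code. This is a minimal sketch with hypothetical function names; the inputs (recovery of 40% on a nominal of 1, $\lambda = 0.04$) are those quoted in the caption of Figure 12.7, while the values of $\kappa$ are assumptions chosen for illustration.

```python
import math

def bond_value(rec, N, lam, T):
    """Eq. (12.9): B0 = Rec (1 - exp(-lam T)) + N exp(-lam T)."""
    return rec * (1.0 - math.exp(-lam * T)) + N * math.exp(-lam * T)

def spread(rec, N, lam, T):
    """Spread = -(1/T) log(B0 / N)."""
    return -math.log(bond_value(rec, N, lam, T) / N) / T

def spread_decaying_recovery(R, kappa, N, lam, T):
    """Same spread, with recovery shrinking as Rec_T = R exp(-kappa T)."""
    return spread(R * math.exp(-kappa * T), N, lam, T)

# Constant recovery: the spread starts near lam * LGD and declines with maturity.
for T in (0.01, 1.0, 5.0):
    print(T, round(1e4 * spread(0.4, 1.0, 0.04, T), 2), "bp")

# Decaying recovery: long-maturity spread for kappa below and above lam.
for kappa in (0.02, 0.10):
    print(kappa, spread_decaying_recovery(0.4, kappa, 1.0, 0.04, 2000.0))
```

At very short maturities the spread is close to $\lambda \times (1 - 0.4)$, while with decaying recovery the long-maturity spread settles near the smaller of $\kappa$ and $\lambda$.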
A few additional issues. λ is the risk-neutral instantaneous probability of default, not the
physical probability of default, λ∗ say. The ratio λ/λ∗ is generally larger than one. Its inverse,
λ∗ /λ, is an indicator of the risk-appetite in the credit market. Similarly, loss-given-default is
an expectation under the risk-neutral probability, and should contain useful indications about
market participants’ risk appetite.
12.2.3 Ratings
In practice, corporate debt is rated by rating agencies, such as Moody’s and Standard & Poor’s.
Depending on the rating, corporate debt may be either investment grade or non-investment
grade (“junk”). Moody’s ratings go from Aaa to C. Standard & Poor’s ratings go from AAA to D.
One can compute the probability of “migrations” based on past experience −→ Transition
12.2.3.1 Foundations
A natural approach, then, is to assess credit risk by making reference to probabilities of default
built up on transition probabilities like those in Table 12.1.
Such an approach, also known as a migration approach, is somewhat less drastic than that
based on rare events, and hopefully more realistic. However, it is also technically more complex
than the intensity approach of the previous section. We provide the most foundational issues
related to this approach, leaving some details in the Appendix.
At time $t$, there exist several rating classes, $N$ say, denoted as $\text{Rat}_t$,
$$\text{Rat}_t \in \{1, 2, \cdots, N\}.$$
We can build a Markov chain from here, by assuming that $P(T-t)_{ij}$ only depends on $T - t$.
Finally, we must have that,
$$P(T-t)_{ij} \ge 0 \quad\text{and}\quad \sum_{j=1}^{N} P(T-t)_{ij} = 1.$$
For example, the probability of transition from rating $\text{Rat}_t = i$ to rating $\text{Rat}_{t+1} = j$ in one
year is $P(1)_{ij}$. Table 12.1 contains one possible example of $P(1)_{ij}$. The probability of transition
from rating $\text{Rat}_t = i$ to rating $\text{Rat}_{t+2} = j$ in two years is $P(2)_{ij}$, and is obtained as follows,
$$P(2)_{ij} = \sum_{k=1}^{N} \underbrace{P(1)_{ik}}_{\Pr(\text{transition from } i \text{ to } k \text{ in one year})} \cdot \underbrace{P(1)_{kj}}_{\Pr(\text{transition from } k \text{ to } j \text{ in one further year})}$$
More generally, we have $P(T) = P(1)^T$, where $P(T)$ is the matrix with elements $\{P(T)_{ij}\}$.
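The relation $P(T) = P(1)^T$ amounts to repeated matrix multiplication. The sketch below illustrates it in plain Python. The three-class matrix (two rating classes plus an absorbing default state) is entirely hypothetical and is not Table 12.1; the function names are likewise assumptions.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(P1, years):
    """P(T) = P(1)^T for an integer number of years T >= 1."""
    P = P1
    for _ in range(years - 1):
        P = mat_mul(P, P1)
    return P

# Hypothetical one-year matrix: classes 0 and 1 are ratings, class 2 is default.
P1 = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.00, 0.00, 1.00],  # default is absorbing
]

P2 = transition_matrix(P1, 2)
# Two-year default probability starting from class 0:
# 0.90 * 0.02 + 0.08 * 0.10 + 0.02 * 1.00
print(P2[0][2])
```

Each row of $P(T)$ still sums to one, and the cumulative default probability in the last column is non-decreasing in $T$ because default is absorbing in this example.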
The previous probabilities, {P (T )ij }, are meant to be taken under the physical world, not
the risk-neutral world. They can be used for risk-management purposes, but certainly not for
pricing. Indeed, historical default rates are too low to explain the price of defaultable securities.
A natural explanation relies on the presence of risk-premia. To use migration data for pricing,
it is vital to implement a number of steps.
First, clean up the data (smoothing). For example, it might well be that, in the raw data,
downgrades from class $i$ to class $i + 2$ are more frequent than downgrades from class $i$ to class $i + 1$.
Moreover, remove zero entries: although some rating events did not happen in the past, they might well
occur in the future. Second, add positive risk-premia to the previous smoothed data so as to
obtain realistic asset prices.
As regards pricing, according to the migration model, there are $N$ classes of assets. Each
single asset may migrate from one class to another. Because evaluation is a dynamic business,
we cannot evaluate defaultable securities within a given asset class without simultaneously
evaluating the defaultable securities in the remaining classes. For example, there could be a
chance that a given asset will “mutate” into a different one in the next year. Given this, the
price of this asset, today, must reflect the price of the asset in the other classes where it can
possibly migrate. Hence, we must simultaneously solve for all the asset prices in all the rating
classes. This approach, developed by Jarrow, Lando and Turnbull (1997), is quite complex and
is given a succinct account in the Appendix.
Consider the simplest case, which arises when the expected recovery rate is zero. In this case,
by Eq. (12.6),
$$\frac{D_{0,i}}{N} = e^{-rT}\left(1 - Q_i(T-t)\right),$$
where $Q_i(T-t)$ is the risk-neutral probability the firm defaults, by time $T$, given it belongs
to rating $i$ at time $t$.
The risk neutral probabilities, $Q_i(T-t)$, must be found using migration frequencies such as
those in Table 12.1, to which we must add appropriate risk-premia.
12.3 Derivatives on corporate assets
Therefore, we see that the price of a callable bond with maturity date $S$ equals the price of
a non-callable bond with the same maturity date $S$, minus the value to call the bond, which
equals the price of a hypothetical option on the non-callable bond, struck at $K$.
We can apply these insights to price a callable bond in a concrete example. Consider, for
example, the short-term rate in Vasicek’s model discussed in Chapter 11. Then, if the
short-term rate is $r(t)$ at time $t$, the value as of time $t$ of the non-defaultable zero coupon bond
maturing at time $S$, callable at time $T < S$, at a strike price equal to $K$, is,
$$P^{\text{callable}}(r(t), t, T, S) = P(r(t), t, S) - C^b(r(t), t, T, S), \tag{12.11}$$
where $P(r(t), t, S)$ is the value of the non-callable zero maturing at time $S$, and $C^b(r(t), t, T, S)$
is the value of a call option on the non-callable $S$-zero, maturing at time $T$ and having a strike
price equal to $K$.
Eq. (12.11) shows that the presence of the option to call the bond raises the cost of capital
for the issuer.
In the context of Vasicek’s model, the solution to $C^b(r(t), t, T, S)$ in Eq. (12.11) is given
by Jamshidian’s (1989) formula given in Chapter 11, which we now use below. Figure 12.8
below depicts the behavior of the price of the callable bond in Eq. (12.11), $P^{\text{callable}}(r, 0, T, S)$,
as a function of the short-term rate, $r$, when the exercise price is $K = 0.65$, and $S = 10$, $T = 0.5$,
$\kappa = 0.2$, $\sigma = 0.03$, $\bar\theta = 0.06 \cdot \kappa - \lambda$, where $\lambda$, the unit risk-premium, equals $-1.7146 \times 10^{-2}$.³
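³ To evaluate Eq. (12.11), we make use of the closed-form solution for the bond price, given by $P(r, t, T) = e^{A(T-t) - B(T-t)\cdot r}$,
where the functions $A$ and $B$ are given by
$$A(T-t) = -\left(T - t - \frac{1 - e^{-\kappa(T-t)}}{\kappa}\right)\bar r - \frac{\sigma^2}{4\kappa^3}\left(1 - e^{-\kappa(T-t)}\right)^2, \qquad \bar r = \frac{1}{\kappa}\bar\theta - \frac12\left(\frac{\sigma}{\kappa}\right)^2,$$
and
$$B(T-t) = \frac{1}{\kappa}\left(1 - e^{-\kappa(T-t)}\right).$$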
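Eq. (12.11) can be evaluated with the closed-form zero price above and a formula for the call on the zero. The sketch below is an illustration under stated assumptions. It uses the standard textbook form of Jamshidian's formula for a call on a zero in the Vasicek model, which may differ in notation from the version given in Chapter 11. The value of `theta_bar` is an illustrative assumption, not the calibration quoted in the text, and all function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def B_fn(tau, kappa):
    return (1.0 - math.exp(-kappa * tau)) / kappa

def A_fn(tau, kappa, theta_bar, sigma):
    r_bar = theta_bar / kappa - 0.5 * (sigma / kappa) ** 2
    return (-(tau - B_fn(tau, kappa)) * r_bar
            - sigma ** 2 / (4.0 * kappa ** 3) * (1.0 - math.exp(-kappa * tau)) ** 2)

def zero_price(r, tau, kappa, theta_bar, sigma):
    """Vasicek zero with time to maturity tau: exp(A(tau) - B(tau) r)."""
    return math.exp(A_fn(tau, kappa, theta_bar, sigma) - B_fn(tau, kappa) * r)

def call_on_zero(r, T, S, K, kappa, theta_bar, sigma):
    """Time-0 value of a call expiring at T, struck at K, on the zero maturing at S."""
    P_T = zero_price(r, T, kappa, theta_bar, sigma)
    P_S = zero_price(r, S, kappa, theta_bar, sigma)
    # Volatility of the log forward bond price over [0, T].
    sigma_p = (sigma * math.sqrt((1.0 - math.exp(-2.0 * kappa * T)) / (2.0 * kappa))
               * B_fn(S - T, kappa))
    h = math.log(P_S / (K * P_T)) / sigma_p + 0.5 * sigma_p
    return P_S * norm_cdf(h) - K * P_T * norm_cdf(h - sigma_p)

def callable_price(r, T, S, K, kappa, theta_bar, sigma):
    """Eq. (12.11): non-callable zero minus the embedded call."""
    return zero_price(r, S, kappa, theta_bar, sigma) - call_on_zero(r, T, S, K, kappa, theta_bar, sigma)

# K, S, T, kappa, sigma as in the text; theta_bar is an illustrative assumption.
K, S, T, kappa, sigma, theta_bar = 0.65, 10.0, 0.5, 0.2, 0.03, 0.0125
for r in (0.0, 0.02, 0.05, 0.10):
    print(r, zero_price(r, S, kappa, theta_bar, sigma),
          callable_price(r, T, S, K, kappa, theta_bar, sigma))
```

The callable price never exceeds either the non-callable price or the discounted strike, $K \cdot P(r, 0, T)$, which is the flattening at low rates described as negative convexity in Figure 12.8.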
FIGURE 12.8. “Negative convexity”. Solid line: the price of a callable bond. Dashed line: the price
of a non-callable bond. The price of a callable bond exhibits negative convexity with respect to the
short-term rate. [Plot omitted; horizontal axis: short-term rate.]
As Figure 12.8 illustrates, the convexity of the non-callable bond price is destroyed by the
convexity of the price of the option embedded in the callable bond. Intuitively, as the short-term
rate gets small, callable and non-callable bond prices increase. However, the price of callable
bonds increases less because as the short-term rate decreases, bond prices increase and then, the
probability the issuer will exercise the option, at maturity, increases. This makes the risk-neutral
distribution of the callable bond price markedly shifted towards the value of the strike price,
K = 0.65, which entails a progressively lower decay rate for the bond price as the short-term
rate gets small.
12.3.2 Convertibles
We only consider corporate convertible bonds. Convertible bonds offer bondholders the option
to convert their bonds into shares of the firm. The option to convert can be exercised at any
time up to maturity. How do they work? By definition, the face value of the convertible is,
$$\text{Face value} = CR \cdot CP, \tag{12.12}$$
where $CR$ is the conversion ratio, i.e. the number of shares this face value converts into, and
$CP$ is the conversion price, i.e. the stock price implicitly defined by Eq. (12.12).
Typically, the bond is like any other fixed income instrument, with coupon payments, callable
features, credit risk, etc. Callable features are almost invariably embedded into this type of
contract. The parity, or conversion value, is the value of the bond if the bondholders decide
to convert. It is defined as,
$$CV = CR \cdot S,$$
where $S$ is the price of the common share. The convertible bond price is affected not only by
interest rates, credit risk, timing risk, etc., but also by the
movements of the underlying stock price. This is quite natural as there is a positive probability
that the bond will “become” a share in the future. To emphasize this, we also say that convertible
bonds are hybrid instruments. The embedded option offers the bondholders the possibility to
obtain equity returns (not just bond returns) in good times, while offering protection against
the downside. As mentioned earlier, convertible bonds are almost always callable. The holder
can always convert the bond, once it has been called. The rationale behind callability is to
induce the bondholder to convert the bond earlier.
The pricing problem of convertible bonds has been intensively studied, theoretically. Inger-
soll (1977) provides the first theoretical article which lays down the foundations to rational
evaluation of convertible callable bonds. Let us define the dilution factor, denoted as γ, as the
fraction of common equity that would be held by the convertible bondowners if the entire issue
was converted. If there are $n_{\text{out}}$ shares outstanding, and the convertible bond can be exchanged
for $n$ shares, then, in aggregate,
$$\gamma = \frac{n}{n + n_{\text{out}}}.$$
Let $V$ be the market value of the firm and $B^{\text{conv}}(V, \tau; N)$ the aggregate value of the convertible
bond with maturity $\tau$ and balloon payment $N$. To simplify the presentation, we do not consider
callability issues. However, we shall provide some intuition about this issue later. Let us assume
that the stocks and the convertible bonds are the only two claims in the capital structure of
the firm. Since, after conversion, only the stocks will remain, then, the post-conversion value of
the convertible bonds is simply the conversion value of the convertible, i.e. $\gamma V$. Moreover, we
have, for any $\tau \ge 0$,
$$\gamma V \le B^{\text{conv}}(V, \tau; N) \le V. \tag{12.13}$$
The first inequality in (12.13) is simple to understand. Indeed, suppose that $B^{\text{conv}}(V, \tau; N) < \gamma V$.
Then, we can purchase the convertibles, convert them into shares and, finally, sell the shares
for $\gamma V$. The second inequality follows by limited liability for equities, and the Modigliani-Miller
theorem.
At maturity, we have that,
$$B^{\text{conv}}(V, 0; N) = \min\{V, \max\{N, \gamma V\}\}. \tag{12.14}$$
Indeed, $\bar B \equiv \max\{N, \gamma V\}$ is the value of the convertible, in case of no-default. Then, $\min\{V, \bar B\}$
is what the firm will pay to the bondholders: $V$ in case of default, and $\bar B$ in case of no-default. It
is possible to show that it is never optimal to exercise the option to convert before maturity.
Therefore, to price the convertible bond, we only need to be concerned with the risk-neutral
evaluation of the terminal payoff in Eq. (12.14).
We can re-express the terminal payoff in Eq. (12.14) in a manner that allows a better un-
derstanding of the issues underlying the exercise of the convertibles. In particular, we have
that,
B conv (V, 0; N) = min {V, max {N, γV }} = max {γV, min {V, N}} . (12.15)
Indeed, let B̂ ≡ min{V, N}. B̂ is what the firm is ready to pay, to the bondholders, if the
bondholders do not exercise the option to convert. Then, max{γV, B̂} is obviously the payoff
profile for the bondholders.
The terminal payoff in Eq. (12.15) illustrates very clearly that convertible bonds embed an
option to convert - on top of the plain vanilla non-convertible bond. Intuitively, at maturity, a
non-convertible bond is worth min {V, N}, and the option to convert is either worthless (in case
of non conversion) or γV − N (in case of conversion), i.e. it is max {γV − N, 0}. This intuition
Eq. (12.16) shows that the current value of the convertible bond is the sum of the value of
a “straight” bond plus the value of $\gamma$ options on the firm with strike price equal to $N/\gamma$.
Accordingly, let $B(V, \tau; N)$ and $W(V, \tau; N/\gamma)$ be the prices of the straight bond and the option
on the firm. We have,
$$B^{\text{conv}}(V, \tau; N) = B(V, \tau; N) + \gamma\,W(V, \tau; N/\gamma). \tag{12.17}$$
We may use Merton’s (1974) model to find the price of the straight bond, $B(V, \tau; N)$. By
the results in Section 12.2, it is:
$$B(V, \tau; N) = V\,\Phi(-d_1) + N e^{-r\tau}\,\Phi(d_2), \qquad d_1 = \frac{\log(V/N) + \left(r + \frac12\sigma^2\right)\tau}{\sigma\sqrt{\tau}}, \quad d_2 = d_1 - \sigma\sqrt{\tau}, \tag{12.18}$$
where $\sigma$ is the instantaneous volatility of the asset value, $r$ is the (constant) instantaneous
short-term rate, and $\Phi$ is the cumulative distribution of a standard normal. Similarly, we may
use the Black-Scholes model to compute the function $W$.
Eq. (12.18) reveals the intuitive property that as $V$ gets large, then, $B(V, \tau; N) \approx N e^{-r\tau}$.
(The probability of default gets extremely tiny as the value of the assets is large.) Moreover, the
Black-Scholes model suggests that $W(V, \tau; N/\gamma) \approx V - e^{-r\tau}N/\gamma$ as $V$ gets large. Therefore, by
Eq. (12.17), we have that, for large $V$, $B^{\text{conv}}(V, \tau; N) \approx N e^{-r\tau} + \gamma\left(V - e^{-r\tau}N/\gamma\right) = \gamma V$. Eq.
(12.18) also shows that for small values of $V$, $B^{\text{conv}}(V, \tau; N) \approx 0$.
To sum-up, the value of the convertible bond is less than the value of the firm, V , and larger
than the conversion value, γV . Moreover, it approaches γV , as the value of the firm gets large.
Figure 12.9 below illustrates the shape of the convertible bond price, as a function of the value
of the firm.
FIGURE 12.9. The value of a convertible bond. [Plot omitted; horizontal axis: $V$; labeled curves and levels: $\gamma V$, $N e^{-r\tau}$, straight bond value.]
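The decomposition of the convertible into a straight bond plus $\gamma$ call options on the firm struck at $N/\gamma$ lends itself to a short numerical illustration. The sketch below assumes, as the text suggests, Merton's formula for the straight bond and the Black-Scholes formula for the option; the function names and parameter values are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _d1_d2(V, K, r, sigma, tau):
    d1 = (math.log(V / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def straight_bond(V, N, r, sigma, tau):
    """Merton value of the straight bond, Eq. (12.18)."""
    d1, d2 = _d1_d2(V, N, r, sigma, tau)
    return V * norm_cdf(-d1) + N * math.exp(-r * tau) * norm_cdf(d2)

def call_on_firm(V, K, r, sigma, tau):
    """Black-Scholes value of a call on the firm value, struck at K."""
    d1, d2 = _d1_d2(V, K, r, sigma, tau)
    return V * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def convertible(V, N, gamma, r, sigma, tau):
    """Straight bond plus gamma calls on the firm struck at N / gamma."""
    return straight_bond(V, N, r, sigma, tau) + gamma * call_on_firm(V, N / gamma, r, sigma, tau)

# Illustrative (assumed) inputs: N = 1, gamma = 0.2, r = 3%, sigma = 20%, tau = 1.
N, gamma, r, sigma, tau = 1.0, 0.2, 0.03, 0.20, 1.0
for V in (0.5, 1.0, 3.0, 5.0, 10.0, 100.0):
    print(V, gamma * V, convertible(V, N, gamma, r, sigma, tau))
```

For every firm value in this example the convertible price lies between $\gamma V$ and $V$, as in (12.13), and it converges to $\gamma V$ as $V$ grows.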
The value of a callable convertible bond is between the value of the straight bond and the
value of the convertible bond.
12.4 Risk shifting derivatives and structured products
election, where the underlyings are the Italian and US Government bonds expiring in ten years, say.
A possible payoff to the CSO holder can be proportional to (ITA/US − K), where ITA/US is
the ten-year Italian-US spread in three months, and K is the strike spread.
CDS differ from TRS insofar as they provide protection against a credit event. TRS, instead,
provide protection against a loss in asset value, which is often triggered by market risk. The
premium on a CDS contract at time $t$, assumed to be paid quarterly, is obtained by equating
the expected discounted value of the premiums paid over the life of the contract (i.e. at dates
$t < t_1 < t_2 < \cdots < t_M$, where $M = 4 \cdot N$, and $N$ is the number of years the CDS extends
to) to the expected discounted value of the protection payments:
$$\mathrm{Premium}_t = \sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{CDS}_t(N)\, \Pr\{\text{Survival at } t_i\},$$
$$\mathrm{Protection}_t = \sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i)\, \Pr\{\text{Default} \in (t_{i-1}, t_i)\},$$
where r is the (constant) risk free rate, CDSt (N) is the premium paid every quarter, prevailing
at time t, and LGD (ti ) is the Loss-Given-Default at time ti , which for simplicity is assumed to
be constant, i.e. known at time t.
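Equating $\mathrm{Premium}_t$ and $\mathrm{Protection}_t$, and solving for $\mathrm{CDS}_t(N)$, leaves:
$$\mathrm{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i)\, \Pr\{\text{Default} \in (t_{i-1}, t_i)\}}{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \Pr\{\text{Survival at } t_i\}}.$$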
This is a general formula we can use, once we have a model determining the probability of
default under the risk-neutral probability.
Using a reduced-form approach, we can find quite easily the quarterly premium (or spread)
$\mathrm{CDS}_t(N)$. We have, denoting again by $\lambda$ the instantaneous probability of default, that
$\Pr\{\text{Survival at } t_i\} = e^{-\lambda(t_i - t)}$, and that $\Pr\{\text{Default at any } z \in (t_{i-1}, t_i)\} = e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)}$.
Intuitively, if the name survives at $t_i$ (event $E_i$), it must necessarily have survived at $t_{i-1}$ (event
$E_{i-1}$), but the converse is not true: $E_i \subset E_{i-1}$, and the complement of $E_i$ in $E_{i-1}$ is nothing
but the event of default between $t_{i-1}$ and $t_i$.⁴
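Substituting the previous probabilities into the expressions for $\mathrm{Premium}_t$ and $\mathrm{Protection}_t$,
$$\mathrm{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i)\, \Pr\{\text{Default} \in (t_{i-1}, t_i)\}}{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \Pr\{\text{Survival at } t_i\}} = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)} \cdot \mathrm{LGD}(t_i) \left(e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)}\right)}{\sum_{i=1}^{4N} e^{-(r+\lambda)(t_i - t)}}.$$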
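⁴ Mathematically, we have that $\Pr\{\text{Default at any } z \in (t_{i-1}, t_i)\} = \int_{t_{i-1}}^{t_i} \Pr\{\text{Default at } z\}\, dz$, where $\Pr\{\text{Default at } z\} = \lambda e^{-\lambda(z - t)}$ is the density of the default time.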
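For example, if $\mathrm{LGD}(t_i)$ is constant and equal to LGD for each $t_i$, then,
$$\mathrm{CDS}_t(N) = \mathrm{LGD} \cdot \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)} \left(e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)}\right)}{\sum_{i=1}^{4N} e^{-(r+\lambda)(t_i - t)}} \approx \lambda \cdot \mathrm{LGD} \cdot \Delta t, \tag{12.19}$$
where $\Delta t \equiv t_i - t_{i-1}$, and the approximation is obtained by making $e^{-\lambda(t_{i-1} - t)} - e^{-\lambda(t_i - t)} \approx \lambda e^{-\lambda(t_i - t)} \Delta t$. Naturally,
$\lambda$ is the risk-neutral instantaneous probability of default for the security.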
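The reduced-form premium can be evaluated directly. The following is a minimal sketch, assuming a constant intensity, a constant LGD, quarterly payment dates and an evaluation date t = 0; the function name and the numerical inputs are hypothetical.

```python
from math import exp


def cds_premium(lam, lgd, r, years, dt=0.25):
    """Quarterly CDS premium with constant default intensity lam, seen from t = 0."""
    dates = [dt * i for i in range(1, int(round(years / dt)) + 1)]
    # Expected discounted protection: LGD times Pr{Default in (t_{i-1}, t_i)}.
    protection = sum(
        exp(-r * t) * lgd * (exp(-lam * (t - dt)) - exp(-lam * t)) for t in dates
    )
    # Expected discounted value of one unit of premium paid while the name survives.
    annuity = sum(exp(-(r + lam) * t) for t in dates)
    return protection / annuity


# Hypothetical inputs: lambda = 2%, LGD = 60%, r = 3%, five-year contract.
lam, lgd, r, years, dt = 0.02, 0.60, 0.03, 5, 0.25
exact = cds_premium(lam, lgd, r, years, dt)
approx = lam * lgd * dt  # first-order approximation of the premium
print(f"premium per quarter: {exact:.6f}, lambda * LGD * dt: {approx:.6f}")
```

With a constant intensity each term of the numerator is a fixed multiple of the corresponding term of the denominator, so the premium does not depend on the maturity or on r: it equals LGD · (e^{λ∆t} − 1), which is λ · LGD · ∆t to first order.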
Note that Eq. (12.19) shows that the CDS premium is approximately the same as the instantaneous
spread of a defaultable bond, as explained in Section 12.2. This property is to be
expected, so to speak, as the purchase of a defaultable bond, together with protection on it, is nothing
but a synthetic default-free bond. Therefore, there must be a no-arbitrage relation between
CDS spreads and defaultable bond spreads. In general, however, Eq. (12.19) does not hold, as the
assumptions made to derive it (λ is constant, LGD is constant, r is constant, etc.) are quite
unrealistic. For instance, we often observe CDS spread curves that increase with maturity.
This allows us to take interesting views. For example, buying a 2Y CDS and selling a 3Y CDS
is a view that default will not occur between the second and the third year from now.
A CDS index is a basket of credit entities in which the protection buyer pays the same premium,
called the fixed rate, on all the names in the index. Credit events are typically restricted
to bankruptcy or delinquencies. After a credit event, the entity is removed from the index and
the contract continues, although with a reduced notional amount, until expiration. While
CDS on single names are over-the-counter, CDS indexes are completely standardized and can
be more liquid, as historical data on bid-ask spreads show. In fact, it can be cheaper to hedge
a portfolio of CDS or bonds with a CDS index than it would be to buy many CDS to achieve
a similar effect. There exist two main indices: (i) the CDX index, which contains North American
and Emerging Market companies; and (ii) the iTraxx index, which contains companies from the
rest of the world.
The following picture, taken from Fender and Hördahl (2007), illustrates the behavior of the
credit market risk appetite before the 2007 credit market turmoil.
FIGURE 12.10. Antonio Mele does not claim any copyright on this picture, which is taken from Fender
and Hördahl (2007). The picture has been put here for illustrative purposes only, and permission to
the authors shall be duly asked before the book will be published.
How did the authors estimate the price of risk? Consider the expected losses under the
actuarial, or physical, probability for a given security. The counterpart to Eq. (12.19), under the
physical probability, is:
$$\text{Expected Losses}^P \equiv \lambda^P \cdot \mathrm{LGD} \cdot \Delta t,$$
where $\lambda^P$ is the physical instantaneous probability of default for a given security. Assume that
LGD is constant, to simplify. If investors require compensation for bearing default risk, then
the actuarial losses should be less than the CDS spread, i.e. $\text{Expected Losses}^P < \mathrm{CDS}$, or,
$\lambda > \lambda^P$.
The risk premium is defined as the difference between the CDS premium and the actuarial losses,
$\text{Expected Losses}^P$,
$$\text{Risk-Premium} = \left(\lambda - \lambda^P\right) \cdot \mathrm{LGD} \cdot \Delta t.$$
The price of risk is defined as the ratio of the CDS spread over $\text{Expected Losses}^P$,
$$\text{Price-of-Risk} = \frac{\lambda}{\lambda^P}.$$
Early references to estimation methods are Duffie et al. (2005) and Amato (2005). Typically,
Expected Losses$^P$ are proxied by Moody's KMV's Expected Default Frequencies (EDFs™),
obtained through fully specified structural models for credit risk. The next pictures are taken
from Amato (2005).
FIGURE 12.11. Antonio Mele does not claim any copyright on this picture, which is taken from Amato
(2005). The picture has been put here for illustrative purposes only, and permission to the author shall
be duly asked before the book will be published.
FIGURE 12.12. Antonio Mele does not claim any copyright on this picture, which is taken from Amato
(2005). The picture has been put here for illustrative purposes only, and permission to the author shall
be duly asked before the book will be published.
The following picture illustrates the behavior of CDS indexes during approximately 20 years
before the 2007-2009 credit market turmoil.
FIGURE 12.13. Valuation of Financial Instruments Based on Implied Probability of Default. Antonio
Mele does not claim any copyright on this picture, which is taken from IMF (2008). The picture has
been put here for illustrative purposes only, and permission to the authors shall be duly asked before
the book will be published.
12.4.4.4 Continuous-time
We may relax the assumption that the instantaneous intensity of default, λ, is constant. This intensity
is defined under the risk-neutral probability and can change either because the intensity
of default under the physical probability changes or because risk-appetite changes, or both.
We aim to examine the asset pricing implications of time-varying intensities, by first exploring
how probabilities of survival change in a simple setting, where we do not single out the reasons
leading to variations in λ.
First, we assume the instantaneous probability of default can only change at discrete times,
giving rise to random intensities λt , meaning that λt is the intensity of default in the time interval
[t − 1, t]. Let Ft be the information set as of time t. We assume that λt is Ft -measurable. What
is, then, the probability of survival of any given name in this case? We have, by Bayes’s theorem,
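$$\Pr\{\text{Surv at } t \mid \text{Surv at } t-1\} = \frac{\Pr\{\text{Surv at } t\}}{\Pr\{\text{Surv at } t-1\}}. \tag{12.20}$$
By a repeated use of Eq. (12.20),
$$\Pr\{\text{Surv at } t\} = \Pr\{\text{Surv at } t \mid \text{Surv at } t-1\}\, \Pr\{\text{Surv at } t-1\} = \cdots = \prod_{n=1}^{t} \Pr\{\text{Surv at } n \mid \text{Surv at } n-1\}. \tag{12.21}$$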
So we are left with finding $\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\}$. Consider the following arguments.
If $\lambda_n$ were not random, and fixed at some $\bar{\lambda}_n$, then $\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\} = e^{-\bar{\lambda}_n}$.
When $\lambda_n$ is random, $e^{-\lambda_n}$ is the probability of survival, conditional upon some particular
value the intensity could possibly take. Heuristically, then, $\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\} = \sum_{s \in S} e^{-\lambda_n(s)} \Pr\{s\}$,
where $\lambda_n(s)$ is, so to speak, the value $\lambda_n$ would take in state $s$, $\Pr\{s\}$
is the likelihood that state $s$ occurs and, finally, $S$ is the set of all possible states. Therefore,
$\Pr\{\text{Surv at } n \mid \text{Surv at } n-1\} = E\left[e^{-\lambda_n} \mid \mathcal{F}_{n-1}\right]$, where $E$ denotes the expectation taken under
the risk-neutral probability. Inserting this result into Eq. (12.21), and using the Law of Iterated
Expectations, leaves:
$$\Pr\{\text{Surv at } t\} = E\left[e^{-\sum_{n=1}^{t} \lambda_n}\right].$$
Under regularity conditions, we can easily extend the previous result to a continuous time
setting. For example, we may assume that the default risk-neutral intensity, $\lambda(t)$, is the solution
to:
$$d\lambda(t) = \phi\left(\bar{\lambda} - \lambda(t)\right) dt + \sigma\sqrt{\lambda(t)}\, dW(t), \quad \lambda(0) = \lambda, \tag{12.22}$$
where $W$ is a standard Brownian motion under the risk-neutral probability, and $\phi$, $\bar{\lambda}$ and $\sigma$
are three positive constants. Under the parameter restrictions reviewed in Chapter 11, $\lambda(t)$ is
always positive, and
$$P_{\mathrm{surv}}(\lambda, t) \equiv \Pr\{\text{Surv at } t\} = E\left[e^{-\int_0^t \lambda(s)\, ds}\right]. \tag{12.23}$$
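Note that the model for the survival probability in Eqs. (12.22)-(12.23) has the same mathematical
structure as that leading to the price of a bond in the Cox, Ingersoll and Ross (1985) model,
as we reviewed in Chapter 11. Therefore, a closed-form solution is available for $P_{\mathrm{surv}}(\lambda, t)$. It is
given by:
$$P_{\mathrm{surv}}(\lambda, N) = \Phi(N)\, e^{-B(N)\lambda},$$
$$\Phi(N) = \left(\frac{2\gamma e^{\frac{1}{2}(\phi+\gamma)N}}{(\phi+\gamma)\left(e^{\gamma N} - 1\right) + 2\gamma}\right)^{\frac{2\phi\bar{\lambda}}{\sigma^2}}, \quad B(N) = \frac{2\left(e^{\gamma N} - 1\right)}{(\phi+\gamma)\left(e^{\gamma N} - 1\right) + 2\gamma}, \quad \gamma = \sqrt{\phi^2 + 2\sigma^2}. \tag{12.24}$$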
More generally, we can build up a whole family of models with a closed-form solution, the
affine class reviewed in Chapter 11, by just assuming that:
λ (t) = λ0 + λ1 · y (t) , (12.25)
where λ0 is a constant, λ1 is a vector of constants, and y is a multivariate jump-diffusion process,
with drift and diffusion terms as in Section 11.3.6 of Chapter 11. This model is interesting, as
we can judiciously choose the components of y (t) which we suppose may affect the default
intensity. For example, some of them could be unobservable, and others could be observable,
relating, for example, to the business cycle or even to the structure of the firm.
So given any solution for the survival probability predicted by any of these affine models
when $y(0) = y$, $P_{\mathrm{surv}}(y, t)$ say, we can easily compute
$$\Pr\{\text{Default} \in (t_{i-1}, t_i)\} = P_{\mathrm{surv}}(y, t_{i-1}) - P_{\mathrm{surv}}(y, t_i). \tag{12.26}$$
We can then look at the bond spreads and the CDS spreads implied by this modeling choice.
In Appendix 3, we show the price of a defaultable pure discount bond expiring in $N$ years is:
$$P(y, N) = e^{-rN} P_{\mathrm{surv}}(y, N) + \int_0^N e^{-rt}\, \Pr\{\text{Default} \in dt\}\, \mathrm{Rec}(t)\, dt, \tag{12.27}$$
where Rec (t) denotes the recovery rate in case of default, supposed to be known. This evaluation
result is, naturally, consistent with a similar derivation provided in Section 11.3.7 of Chapter
11, although in this chapter we are emphasizing more “survival arguments.”
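As for the CDS spreads,
$$\mathrm{CDS}_t(N) = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)}\, \mathrm{LGD}(t_i)\, \Pr\{\text{Default} \in (t_{i-1}, t_i)\}}{\sum_{i=1}^{4N} e^{-r(t_i - t)}\, \Pr\{\text{Survival at } t_i\}} = \frac{\sum_{i=1}^{4N} e^{-r(t_i - t)}\, \mathrm{LGD}(t_i) \left[P_{\mathrm{surv}}(y, t_{i-1}) - P_{\mathrm{surv}}(y, t_i)\right]}{\sum_{i=1}^{4N} e^{-r(t_i - t)}\, P_{\mathrm{surv}}(y, t_i)},$$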
where N is, again, the number of years the CDS extends to.
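Assume the short-term rate, $r$, is zero, and that loss-given-default is constant and equal to
LGD. Then, as shown in Appendix 3,
$$P(\lambda, N) = 1 - \mathrm{LGD} \cdot \left(1 - P_{\mathrm{surv}}(\lambda, N)\right), \qquad \mathrm{CDS}_0(N) = \mathrm{LGD} \cdot \frac{1 - P_{\mathrm{surv}}(\lambda, t_{4N})}{\sum_{i=1}^{4N} P_{\mathrm{surv}}(\lambda, t_i)}, \tag{12.28}$$
where $t_{4N}$, the last quarterly date, corresponds to $N$ years.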
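The closed-form survival probability in Eq. (12.24) and the spreads in Eq. (12.28) are easy to evaluate. The following is a minimal sketch, assuming r = 0, a constant LGD and quarterly dates; the function names and the values of φ, σ and LGD are hypothetical (only λ̄ = 0.04 and λ = 0.02 are mentioned in the text).

```python
from math import exp, log, sqrt


def p_surv(lam, horizon, phi, lam_bar, sigma):
    """Survival probability of Eq. (12.24) for the square-root intensity (12.22)."""
    g = sqrt(phi**2 + 2.0 * sigma**2)
    denom = (phi + g) * (exp(g * horizon) - 1.0) + 2.0 * g
    big_phi = (2.0 * g * exp(0.5 * (phi + g) * horizon) / denom) ** (
        2.0 * phi * lam_bar / sigma**2
    )
    b = 2.0 * (exp(g * horizon) - 1.0) / denom
    return big_phi * exp(-b * lam)


def bond_spread(lam, n_years, lgd, **cir):
    """Spread implied by the bond price in Eq. (12.28); with r = 0 it equals the yield."""
    price = 1.0 - lgd * (1.0 - p_surv(lam, n_years, **cir))
    return -log(price) / n_years


def cds_spread(lam, n_years, lgd, **cir):
    """Quarterly CDS premium of Eq. (12.28), with r = 0 and n_years an integer."""
    dates = [0.25 * i for i in range(1, 4 * n_years + 1)]
    annuity = sum(p_surv(lam, t, **cir) for t in dates)
    return lgd * (1.0 - p_surv(lam, n_years, **cir)) / annuity


# Hypothetical parameters: only lam_bar = 0.04 and lam = 0.02 appear in the text.
cir = dict(phi=0.5, lam_bar=0.04, sigma=0.10)
for n in (1, 5, 10):
    print(
        n,
        f"bond spread={1e4 * bond_spread(0.02, n, 0.6, **cir):7.1f} bp",
        f"annualized CDS={1e4 * 4 * cds_spread(0.02, n, 0.6, **cir):7.1f} bp",
    )
```

Starting from a low intensity, λ = 0.02 < λ̄, mean reversion pulls the expected intensity up over time, so the CDS premium is expected to rise with maturity.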
[Figure 12.14: two panels plotting spreads against maturity, in years (0 to 10).]
FIGURE 12.14. Spreads on bonds and CDS predicted by the affine model (12.22). The left panel
depicts the spreads when the current default intensity equals the long-run mean, λ = λ̄ = 0.04. The
right panel depicts the spreads in good times, i.e., when the current intensity of default takes a low
value, λ = 0.02.
The mechanism is that good times are followed by bad times, and so when λ = 0.02, we
expect default rates to rise in the future. As a consequence, spreads are increasing in maturity.
In a pricing context, the relevant probabilities of survival are obviously conditional upon the
time of evaluation, time 0 say. For example, the probability of default in Eq. (12.26) is only
conditional on the information we have at time zero. More generally, the probability of defaulting
in the interval of time $(t_{i-1}, t_i)$, conditional upon survival at time $t < t_{i-1}$, is:
For $t = t_{i-1}$, and $(t_{i-1}, t_i)$ small, the previous expression is known as the hazard rate, and coincides
with $\lambda(t)\, dt$, when $\lambda(t)$ is deterministic. If $\lambda(t)$ is not deterministic, simple computations
lead to:
$$\Pr\{\text{Default} \in (t, t + dt) \mid \text{Survival at } t\} = E_{Q_\lambda}\left[\lambda(t)\right] dt, \tag{12.30}$$
where $Q_\lambda$ is a new probability, with Radon-Nikodym derivative given by:
$$\left.\frac{dQ_\lambda}{dQ}\right|_{\mathcal{F}_0} = \frac{e^{-\int_0^t \lambda(s)\, ds}}{P_{\mathrm{surv}}(\lambda, t)}. \tag{12.31}$$
Accordingly, under $Q_\lambda$, the state variables in Eq. (12.25) follow a diffusion process, with a drift
tilted by this change of measure. For example, in the simple setting of Eq. (12.22),
we have that, for a fixed $t$,
$$d\lambda(s) = \left(B_0 - B_1^t(s)\, \lambda(s)\right) ds + \sigma\sqrt{\lambda(s)}\, dW_\lambda(s), \quad s \in (0, t], \quad \lambda(0) = \lambda,$$
$$B_0 = \phi\bar{\lambda}, \qquad B_1^t(s) = \phi + B(t - s)\, \sigma^2, \tag{12.32}$$
with $B(\cdot)$ as in Eq. (12.24). Appendix 4 provides a proof of these results which, to the best of our knowledge, are developed
here for the first time.
$$b = e^{-0.06} \cdot \big(\underbrace{0.10}_{\equiv\, \text{Def. Prob}} \cdot\, 40 + \underbrace{0.90}_{\equiv\, \text{Surv. Prob}} \cdot\, 100\big) = 88.526.$$
The yield is, naturally, $-\ln\frac{b}{100} = 12.19\%$.
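The bond price and yield above can be reproduced in a few lines. This is a minimal sketch; the variable names are ours, and the inputs (face value 100, recovery value 40, default probability 10%, and a discount factor of e^{−0.06}, read here as a 6% safe rate over one period) are inferred from the displayed formula.

```python
from math import exp, log

# Inputs inferred from the displayed formula; the names are hypothetical.
safe_rate = 0.06      # discounting exponent, read as a 6% rate over one period
default_prob = 0.10   # risk-neutral probability of default
recovery = 40.0       # payoff in case of default
face = 100.0          # payoff in case of survival

price = exp(-safe_rate) * (default_prob * recovery + (1.0 - default_prob) * face)
bond_yield = -log(price / face)
print(f"price = {price:.3f}, yield = {100 * bond_yield:.2f}%")
```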
A CDO can restructure the payments promised by the three bonds in a way that transforms
the riskiness and attractiveness of the initial assets. Consider the following example:
Assets                 Liabilities
Face Value = 300       Senior tranche (N1) = 140
                       Mezzanine tranche (N2) = 90
                       Junior tranche (N3) = 70
(i) The senior tranche receives the minimum between $N_1$ and $\tilde{\pi}$. For example, if only one
bond defaults, $\tilde{\pi} = 240$, and the senior tranche receives 140. If, however, three bonds
default, then $\tilde{\pi} = 120$, which is less than the nominal value, and the senior tranche
receives 120. So a quite severe loss is needed to erode the senior tranche claims.
(ii) The mezzanine tranche receives the minimum between $N_2$ and the “left-over” from the
senior tranche. Finally, at the expiration, the junior tranche receives the minimum between
$N_3$ and the “left-over” from the senior and mezzanine tranches.
Synthetically,
$$\pi_i = \min\left\{\max\left\{\tilde{\pi} - \sum_{k=1}^{i-1} \pi_k,\ 0\right\},\ N_i\right\}.$$
All we need, now, is to model the risk-neutral probability of default for each firm. Initially, we
assume the default events are independent across firms, so that the number of defaults is binomially distributed,
$$\Pr(\text{No. of Defaults} = k) = \binom{3}{k} p^k (1 - p)^{3-k}, \quad p = 10\%, \quad k \in \{0, 1, 2, 3\}.$$
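The waterfall and the binomial probabilities combine into tranche prices as follows. This is a minimal sketch, assuming three bonds with face value 100 and recovery value 40, tranche nominals of 140, 90 and 70, independent defaults with p = 10%, and a discount factor of e^{−0.06}; the function names are hypothetical.

```python
from math import comb, exp, log

FACE, REC, P, R = 100.0, 40.0, 0.10, 0.06
NOMINALS = {"senior": 140.0, "mezzanine": 90.0, "junior": 70.0}  # N1, N2, N3


def pool_payoff(k, n=3):
    """Total payoff of the asset pool when k out of n bonds default."""
    return (n - k) * FACE + k * REC


def waterfall(pool):
    """pi_i = min(max(pool - payments to more senior tranches, 0), N_i)."""
    paid, payoffs = 0.0, {}
    for name, nominal in NOMINALS.items():  # insertion order: senior first
        payoffs[name] = min(max(pool - paid, 0.0), nominal)
        paid += payoffs[name]
    return payoffs


def binomial_prob(k, n=3, p=P):
    """Probability of exactly k defaults when defaults are independent."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


prices = {name: 0.0 for name in NOMINALS}
for k in range(4):
    payoffs = waterfall(pool_payoff(k))
    for name in NOMINALS:
        prices[name] += exp(-R) * binomial_prob(k) * payoffs[name]

yields = {name: -log(prices[name] / NOMINALS[name]) for name in NOMINALS}
for name in NOMINALS:
    print(f"{name:10s} price={prices[name]:8.3f} yield={100 * yields[name]:6.2f}%")
```

By construction, the three tranche prices add up to the price of the asset pool, i.e. three times the price of a single bond.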
k
368
12.4. Risk shifting derivatives and structured products c
°by A. Mele
The price of each tranche is computed as the tranche payoff, averaged across states, discounted
at the safe interest rate. For example, the price of the mezzanine tranche is,
Note, now, that the mezzanine and subordinated tranches yield the same, as they each pay off
either their nominal value or zero in exactly the same states of nature.
The previous cases (with independent and perfectly correlated defaults) are extreme. It is
by far more relevant to see what happens when defaults are only imperfectly correlated. When
defaults are imperfectly correlated, there are no simple tables to use to come up with tranche
pricing. Instead, one might make use of simulations, described succinctly in the Appendix.
Figure 12.15 below, obtained through Monte Carlo simulations (described in the Appendix),
illustrates how the yield on each tranche changes as a result of a change in the default correlation
underlying the assets in the CDO.
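A simulation of this kind can be sketched as follows. We assume, as an illustration only, a one-factor Gaussian structure in which firm i defaults when √ρ·F + √(1−ρ)·z_i falls below the threshold ξ = Φ⁻¹(p), with F and z_i independent standard normals; this specification, the function name and the number of simulations are our assumptions, not necessarily the design used to produce the figures.

```python
import random
from math import exp, log, sqrt
from statistics import NormalDist

FACE, REC = 100.0, 40.0
NOMINALS = (140.0, 90.0, 70.0)  # senior, mezzanine, junior


def simulate_tranches(rho, p=0.10, r=0.06, n_sims=50_000, seed=0):
    """Monte Carlo tranche prices and yields under an assumed one-factor model."""
    rng = random.Random(seed)
    xi = NormalDist().inv_cdf(p)  # default threshold, so that Pr{default} = p
    load_f, load_z = sqrt(rho), sqrt(1.0 - rho)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_sims):
        f = rng.gauss(0.0, 1.0)  # common factor
        n_def = sum(
            load_f * f + load_z * rng.gauss(0.0, 1.0) < xi for _name in range(3)
        )
        left = (3 - n_def) * FACE + n_def * REC  # pool payoff, always positive
        for i, nominal in enumerate(NOMINALS):
            pay = min(left, nominal)  # waterfall: senior tranche is paid first
            totals[i] += pay
            left -= pay
    prices = [exp(-r) * total / n_sims for total in totals]
    yields = [-log(price / nominal) for price, nominal in zip(prices, NOMINALS)]
    return prices, yields


for rho in (0.0, 0.3, 0.6, 0.9):
    _, ys = simulate_tranches(rho)
    print(rho, [f"{100 * y:.2f}%" for y in ys])
```

Because each name defaults with probability p whatever the value of ρ, the sum of the three tranche prices should stay close to the price of the asset pool; only its split across tranches changes with the default correlation.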
[Figure 12.15: yield (vertical axis) plotted against default correlation (horizontal axis, from 0 to 1).]
FIGURE 12.15. Yields on the three CDO tranches, computed in correspondence of the default correlation
among the assets in the structure
“Arbitrage” CDOs
Figure 12.15 illustrates how arbitrage CDOs work. The CDO has three assets yielding the
same, 12.19% (the dashed, black line in the picture). However, by restructuring the asset
base through a CDO, we can create claims (Senior and Mezzanine tranches) that yield less
than 12.19%, as they are considerably less risky than the asset base. Such an excess return,
(12.19% − Yieldtranche ), with Yieldtranche ∈ {Senior, Mezzanine}, is “made available” to the sub-
ordinated tranche/equity holders, once we account for management fees and expenses. Note,
the previous redistribution of risk always works when the default correlation is relatively low.
As the default correlation in the asset base increases, the situation may change dramatically,
as we now illustrate.
Correlation assumptions
In Figure 12.15, the subordinate yield decreases with default correlation. This happens because
we are assuming that the probability of default is fixed at p = 10% for each default correlation
ρ (say). As ρ increases, the probability of clustering events increases, which makes the Senior
and Mezzanine tranches relatively less valuable and, correspondingly, the Subordinate tranches
more valuable. A more appropriate model is one in which p increases as ρ increases, to capture
the fact that in bad times, both default correlation and probability of defaults increase as these
two things are intimately related (by, e.g., some common business cycle factor). Figure 12.16
below provides some comparative statics, with p = 20% instead of p = 10%. The yields are
obviously larger for each tranche.
[Figure 12.16: yield (vertical axis) plotted against default correlation (horizontal axis, from 0 to 1).]
FIGURE 12.16. Yields on the three CDO tranches, computed in correspondence of the default correlation
among the assets in the structure, with probability of default for each name equal to 20%.
Relax, now, the assumption that the probability of default, p, and the default correlation, ρ, are
independent. For simplicity, assume that ρ = 3.8 · log (p + 1). The situation, then, changes
dramatically. Modeling matters a lot.
[Figure 12.17: yield (vertical axis) plotted against default correlation (horizontal axis, from 0.4 to 1). Legend: Senior; Mezzanine; Subordinate; Yield on each securitized asset.]
FIGURE 12.17. Yields on the three CDO tranches, computed in correspondence of the default correlation
among the assets in the structure, with probability of default and default correlation related by
ρ = 3.8 · log (p + 1).
In this contract, the owner of the 1st to default bears the risk of the first default that occurs in
the asset pool.
Finally, the owner of the 3rd to default bears the risk of the third default that occurs in the
asset pool:
Let us assume that default correlation is zero for simplicity. We have previously computed
the previous probabilities as:
From here, we can compute the yields as follows: $\mathrm{Yield}_{1st\text{-to-def}} = -\log(78.863/100) = 23.75\%$,
$\mathrm{Yield}_{2nd\text{-to-def}} = -\log(92.594/100) = 7.69\%$, and $\mathrm{Yield}_{3rd\text{-to-def}} = -\log(94.120/100) = 6.06\%$.
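The three prices entering these yields can be reproduced under the following assumptions, which we infer from the numbers rather than from an explicit statement in the text: the holder of the n-th to default receives the face value, 100, unless at least n defaults occur in the pool, in which case the payoff is the recovery value, 40; defaults are independent with p = 10%; and the discount factor is e^{−0.06}. The function names are hypothetical.

```python
from math import comb, exp, log

P, R, FACE, REC = 0.10, 0.06, 100.0, 40.0


def prob_at_least(n, names=3, p=P):
    """Probability that at least n of the names default (independent defaults)."""
    return sum(
        comb(names, k) * p**k * (1.0 - p) ** (names - k) for k in range(n, names + 1)
    )


def nth_to_default_price(n):
    """Assumed payoff: REC if at least n defaults occur, FACE otherwise."""
    q = prob_at_least(n)
    return exp(-R) * (q * REC + (1.0 - q) * FACE)


for n in (1, 2, 3):
    price = nth_to_default_price(n)
    print(f"n={n}: price={price:.3f}, yield={100 * -log(price / FACE):.2f}%")
```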
Definition I: VaR measures the worst expected loss over a given horizon under normal market
conditions at a given confidence level.
Definition II: We are 100 · (1 − p)% certain that a given portfolio will not suffer a loss larger
than \$W over the next N weeks, i.e. Pr (∆V < −W ) = p, where ∆V is the change in the value of the portfolio. That is, \$VaRp = \$W .
12.5. The risk-management practice c
°by A. Mele
Definition III: We are 100 · (1 − p)% certain that a given portfolio will not experience a relative
loss larger than $\mathrm{VaR}_p / V_0$ over the next N weeks.
So in practice, we shall have to find the relative loss, $\ell_p$, for a given confidence $p$, as follows:
$$p = \Pr\left(\frac{\Delta V}{V_0} < -\ell_p\right), \quad \text{where } \ell_p = \frac{\mathrm{VaR}_p}{V_0}.$$
[Figure: plot over the range −3 to 3 (vertical axis from 0 to 0.4), marking the 1% tail and VaR/V0.]
We are 99% certain that our portfolio will not suffer a loss larger than 2.32 times its
current value over the next 2 weeks. Equivalently, we are 99% certain that our portfolio will not experience
a relative loss larger than 2.32 over the next 2 weeks.
As a second example, note that the previous assumption about the portfolio return was
extreme. Assume, instead, that the portfolio return over the next 2 weeks, $\Delta V / V_0$, is normally distributed
with mean zero and variance $\sigma^2 = \frac{2}{52}\sigma^2_{\mathrm{year}}$, where $\sigma^2_{\mathrm{year}}$ is the annualized variance. We assume
that $\sigma^2_{\mathrm{year}} = 0.15^2$. We have to re-scale the previous formulas, as follows. First, we introduce a
variable $\tilde{\epsilon} \sim N(0, 1)$, i.e. $\tilde{\epsilon}$ is normally distributed with mean zero and unit variance. So we can
write,
$$\frac{\Delta V}{V_0} \overset{d}{=} \tilde{\epsilon} \cdot \sigma \sim N\left(0, \sigma^2\right),$$
and, hence,
$$0.01 = \Pr\left(\tilde{\epsilon} < -2.32\right) = \Pr\left(\Delta V < -2.32 \cdot V_0 \cdot \sigma\right),$$
whence $\mathrm{VaR}_p = 2.32 \cdot V_0 \cdot \sigma$. We know the annualized variance, $\sigma^2_{\mathrm{year}} = 0.15^2$, from which we
can derive the two-week variance, $\sigma^2 = \frac{2}{52}\sigma^2_{\mathrm{year}} \approx 0.03^2$, and, hence, $\mathrm{VaR}_p / V_0 = 2.32 \cdot \sigma = 2.32 \cdot 0.03 \approx 7\%$.
That is, we are 99% certain that our portfolio will not suffer a loss larger
than 7% of its current value over the next 2 weeks.
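More generally, we may assume the portfolio return over the next 2 weeks, $\Delta V / V_0$, is normally
distributed with mean $\mu$ and variance $\sigma^2$. In this case,
$$\frac{\Delta V}{V_0} \overset{d}{=} \mu + \tilde{\epsilon} \cdot \sigma \sim N\left(\mu, \sigma^2\right),$$
and, hence,
$$0.01 = \Pr\left(\tilde{\epsilon} < -2.32\right) = \Pr\left(\Delta V < -V_0 \cdot (2.32 \cdot \sigma - \mu)\right),$$
whence $\mathrm{VaR}_p = V_0 \cdot (2.32 \cdot \sigma - \mu)$. In practice, $\mu$ is very small if the horizon is as short as two
weeks.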
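The normal VaR formula can be sketched as follows. The function name and its arguments are hypothetical; the quantile is computed exactly (about 2.33 for p = 1%) rather than rounded to 2.32 as in the text, so the figures differ slightly in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def normal_var(v0, sigma_year, weeks, p=0.01, mu=0.0):
    """VaR_p = V0 * (z_p * sigma - mu), with sigma rescaled to the horizon."""
    sigma = sigma_year * sqrt(weeks / 52.0)  # horizon standard deviation
    z = -NormalDist().inv_cdf(p)             # about 2.33 for p = 1%
    return v0 * (z * sigma - mu)


# The example in the text: 15% annualized volatility, two weeks, zero mean.
relative_var = normal_var(v0=1.0, sigma_year=0.15, weeks=2)
print(f"VaR / V0 = {100 * relative_var:.2f}%")  # close to the 7% in the text
```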
The assumption that data are generated by a normal distribution does not describe asset
returns well. Chapters 10 and 11 explain that we need ARCH effects, stochastic volatility and
multifactor models. More generally, data can exhibit changes in regimes, nonlinearities and fat
tails. Fat tails are particularly important to understand, since the tails are precisely what VaR is
about. It is also quite challenging to understand what the data generating
process is, especially when we consider portfolios of assets. Asset returns and volatilities
are typically correlated, with correlation rising in bad times: correlation is stochastic.
We may make distributional assumptions but, then, these assumptions have to be carefully
assessed through, for example, backtesting (to be explained below). We may proceed with
nonparametric methods, and this is indeed a promising avenue, although it comes with its own caveats.
How do nonparametric methods work? These methods rely on an old idea, which is to
estimate the data distribution through histograms. These histograms, then, can be readily used
to compute VaR. This approach is nonparametric in nature, as it does not rely on any model.
A more refined method replaces “rough” histograms with “smoothed” histograms, as follows.
Suppose we have access to a time series of data $x_n$, which are drawn from a certain probability
law, with density $f(x)$. We may define the following estimate of the density $f(x)$,
$$\hat{f}_N(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{\lambda} K\left(\frac{x - x_n}{\lambda}\right),$$
where $N$ is the sample size, and $K$ is some symmetric function integrating to one. We may
think of $\hat{f}_N(x)$ as a smoothed histogram, with bin width (or bandwidth) equal to $\lambda$. It is possible to show
that as $N$ goes to infinity and $\lambda$ goes to zero at a certain rate, $\hat{f}_N(x)$ converges “in probability”
to $f(x)$, for all $x$. But we are not done, since there are no obvious rules to choose $\lambda$ and $K$.
The choice of $\lambda$ is notoriously difficult. Unfortunately, the “bias,” $\hat{f}_N(x) - f(x)$, tends to be
large exactly on the tails of $f(x)$, which represent the region we are interested in. In general,
we can use Monte Carlo simulations out of a smoothed density like this to compute VaR.
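The smoothed histogram, and a VaR computed by simulating out of it, can be sketched as follows. We assume a Gaussian kernel for K and a hypothetical bandwidth; the sample of returns is itself simulated, purely for illustration.

```python
import random
from math import exp, pi, sqrt


def gaussian_kernel(u):
    """A symmetric kernel K integrating to one (standard normal density)."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)


def kernel_density(x, sample, bandwidth):
    """Smoothed histogram: (1/N) * sum_n (1/lambda) * K((x - x_n) / lambda)."""
    return sum(gaussian_kernel((x - xn) / bandwidth) for xn in sample) / (
        len(sample) * bandwidth
    )


def empirical_var(returns, p=0.01):
    """VaR / V0 read off the empirical p-quantile of the returns."""
    ordered = sorted(returns)
    return -ordered[int(p * len(ordered))]


def smoothed_var(sample, bandwidth, p=0.01, n_draws=20_000, seed=1):
    """Monte Carlo VaR / V0 based on draws from the Gaussian-kernel estimate."""
    rng = random.Random(seed)
    # Drawing from the kernel estimate: pick a data point, add kernel noise.
    draws = [
        rng.choice(sample) + bandwidth * rng.gauss(0.0, 1.0) for _ in range(n_draws)
    ]
    return empirical_var(draws, p)


rng = random.Random(0)
returns = [rng.gauss(0.0, 0.03) for _ in range(5000)]  # hypothetical two-week returns
bandwidth = 0.005                                      # hypothetical choice of lambda
print(f"estimated density at 0: {kernel_density(0.0, returns, bandwidth):.2f}")
print(f"VaR/V0: {empirical_var(returns):.4f} (histogram), "
      f"{smoothed_var(returns, bandwidth):.4f} (smoothed)")
```

The bandwidth is the delicate choice here: a larger λ smooths the tails more, and it also fattens them, which raises the resulting VaR.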
Nonlinearities
Finally, portfolios of assets can behave in a nonlinear fashion, especially when the portfolio
contains derivatives. In general, the value of a portfolio including $M$ assets is,
$$P = \sum_{i=1}^{M} \alpha_i S_i,$$
where $\alpha_i$ is the number of units of the $i$-th asset in the portfolio, and $S_i$ is the price of the $i$-th asset
in the portfolio. Holding $\alpha_i$ constant, the portfolio return is simply a weighted
average of all the asset returns,
$$\Delta P_T \equiv P_T - P_t = \sum_{i=1}^{M} \alpha_i \Delta S_i \iff \frac{\Delta P}{P} = \sum_{i=1}^{M} \frac{\alpha_i S_i}{P} \left(\frac{\Delta S_i}{S_i}\right),$$
where the variations relate to any time interval. Often, the prices $S_i$ are rational functions of
the state variables, or are interlinked through arbitrage restrictions. For fixed income securities, we may use factors to determine
the associated risk. When the horizon of the VaR is large, it is
unlikely that $\alpha_i$ is constant. Typically, we shall need to resort to numerical methods, based, for
example, on Monte Carlo simulations. So all in all, we need to have a careful understanding of
the derivatives in the book, and proceed with backtesting and stress testing.
VaR as an appropriate measure of risk
There are technical difficulties related to the very definition of VaR, which lacks a solid
statistical-theoretic foundation. VaR tells us that 1% of the time, losses will exceed the VaR figure,
but it does not tell us the size of the loss. So we need to compute the expected shortfall. Any
risk measure should enjoy a number of sensible properties. Artzner et al. (1999) have noted a
number of such properties, and showed that VaR does not enjoy the so-called subadditivity property,
according to which the sum of the risk measures for any two portfolios should be no smaller than the
risk measure for the sum of the two portfolios. Expected shortfall, in contrast, does satisfy
the subadditivity property.
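A small numerical illustration may help. The example below is hypothetical and not taken from the text: two independent loans, each losing 100 with probability 4% and nothing otherwise, evaluated at a 5% tail probability. The function names are ours.

```python
def value_at_risk(dist, tail):
    """Smallest loss level l such that Pr(Loss > l) <= tail; dist maps loss -> prob."""
    for level in sorted(dist):
        prob_above = sum(p for loss, p in dist.items() if loss > level)
        if prob_above <= tail + 1e-12:
            return level
    raise ValueError("empty distribution")


def expected_shortfall(dist, tail):
    """Average loss over the worst 'tail' probability mass."""
    remaining, total = tail, 0.0
    for loss in sorted(dist, reverse=True):  # start from the largest losses
        take = min(dist[loss], remaining)
        total += take * loss
        remaining -= take
        if remaining <= 1e-15:
            break
    return total / tail


# Hypothetical example: one loan, and a portfolio of two independent such loans.
single = {0.0: 0.96, 100.0: 0.04}
combined = {0.0: 0.96**2, 100.0: 2 * 0.96 * 0.04, 200.0: 0.04**2}

tail = 0.05
print("VaR:", value_at_risk(single, tail), "each,", value_at_risk(combined, tail), "combined")
print("ES :", expected_shortfall(single, tail), "each,", expected_shortfall(combined, tail), "combined")
```

Each loan has a VaR of zero, since it defaults less than 5% of the time, whereas the combined portfolio has a VaR of 100, which exceeds the sum of the two individual figures. The expected shortfall of the combined portfolio, instead, stays below the sum of the individual expected shortfalls.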
12.5.2 Backtesting
How well would the VaR estimate have performed in the past? How often did the loss in a given
sample exceed the reference-period 99% VaR? If the exceptions occur more than 1% of the
time, there is evidence that the models leading to VaR estimates are “misspecified,” a nice
word for saying “bad” models.
The mechanics of backtesting is as follows. Suppose the models leading to the VaR are
“good”. By construction, the probability the VaR number is exceeded in any reference period
is p, where p is the coverage rate for the VaR. Next, we go to our sample, which we assume
comprises N days, and let M be the number of days the VaR is exceeded. We wish to test
whether the number of exceptions we observe in the sample “conforms” to the expected number
of exceptions based on the VaR. For example, it might be that the number of exceptions we
have observed, M, is larger than the expected number of exceptions, p · N. We want to assess
whether this circumstance arose due to sample variability, rather than model misspecification. A
simple one-tail test is described below.
Let us compute the probability that in N days, the VaR is exceeded for M or more days.
Assuming exceptions are binomially distributed, this probability is,
$$\Pi_p = \sum_{k=M}^{N} \frac{N!}{k!\,(N-k)!}\, p^k (1 - p)^{N-k}.$$
Then, we can say the following. If Πp ≤ 5% (say), we reject the hypothesis that the probability
of exceptions is p at the 5% level: the models we are using are misspecified. If Πp > 5% (say),
we cannot reject the hypothesis that the probability of exceptions is p at the 5% level: we
cannot say the models we are using are misspecified. This test is reviewed in more detail by Hull
(2007, p. 208). Other tests are reviewed by Christoffersen (2003, p. 184).
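The one-tail test can be sketched as follows. The function name and the sample figures (250 days, a 99% VaR, five or seven exceptions) are hypothetical.

```python
from math import comb


def exceedance_tail_prob(m, n, p):
    """Pi_p: probability of m or more exceptions in n days, with exception prob. p."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m, n + 1))


# Hypothetical sample: N = 250 days and a 99% VaR, so p = 1%.
n_days, coverage = 250, 0.01
for exceptions in (5, 7):
    pi_p = exceedance_tail_prob(exceptions, n_days, coverage)
    verdict = "reject" if pi_p <= 0.05 else "cannot reject"
    print(f"M = {exceptions}: Pi_p = {pi_p:.4f} -> {verdict} at the 5% level")
```

With these figures, five exceptions are compatible with sample variability, whereas seven exceptions lead us to reject the hypothesis that the probability of exceptions is 1%.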
(i) The first scenario is that in which each variable grows at the same rate as it did at
time 1,
$$v_{T+1} = v_T \cdot \frac{v_1}{v_0}.$$
(ii) The second scenario is that in which each variable grows at the same rate as it did at
time 2,
$$v_{T+1} = v_T \cdot \frac{v_2}{v_1}.$$
(iii) · · ·
(iv) The T -th scenario is that in which each variable grows at the same rate as it did at
time T ,
$$v_{T+1} = v_T \cdot \frac{v_T}{v_{T-1}}.$$
(v) The T scenarios are generated for all the market variables, which gives us an artificial
multivariate sample of T observations. We can use this sample for many purposes, including
the computation of VaR.
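The construction of the scenarios can be sketched as follows, for a single market variable; with several variables, the same function would be applied to each of them, date by date. The function name and the short price history are hypothetical.

```python
def historical_scenarios(history):
    """Given v_0, ..., v_T, return the T scenarios v_T * v_i / v_{i-1}, i = 1..T."""
    last = history[-1]
    return [last * history[i] / history[i - 1] for i in range(1, len(history))]


# Hypothetical history of a market variable, v_0, ..., v_4.
history = [100.0, 102.0, 101.0, 105.0, 103.0]
scenarios = historical_scenarios(history)
losses = sorted(history[-1] - s for s in scenarios)  # positive values are losses
print([round(s, 3) for s in scenarios])
print(f"worst scenario loss: {losses[-1]:.3f}")
```

With a realistic sample of several hundred dates, the VaR would be read off the appropriate quantile of the scenario losses.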
$$P(T) = \Phi(\xi_{\mathrm{PD}}) \equiv \mathrm{PD},$$
so that $\xi_{\mathrm{PD}} = \Phi^{-1}(\mathrm{PD})$, where $\Phi^{-1}$ denotes the inverse of $\Phi$. Conditionally upon the realization of the macroeconomic
factor $F$, the probability of default for each firm is,
$$\Pr(\text{Default} \mid F) = \Phi\left(\frac{\Phi^{-1}(\mathrm{PD}) - \sqrt{\rho}\, F}{\sqrt{1 - \rho}}\right).$$
This is quite a good approximation to the default rate for a portfolio of a large number of assets
falling within the same class of risk.
We see that this conditional probability is decreasing in $F$: the larger the level of the common
macroeconomic factor, the smaller the probability each firm defaults. Hence, we can fix a value
of $F$ such that $\Pr(\text{Default} \mid F)$ equals the default rate we are interested in. Note that the probability that $F$ is larger
than $-\Phi^{-1}(x)$ is just $x$.⁵ Then, with probability $x$, the default rate will not exceed
$$\mathrm{VaR}_{\text{Credit Risk}}(x) = \Phi\left(\frac{\Phi^{-1}(\mathrm{PD}) + \sqrt{\rho}\, \Phi^{-1}(x)}{\sqrt{1 - \rho}}\right).$$
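The two formulas above can be evaluated with the standard library. This is a minimal sketch; the function names and the inputs (PD = 1%, ρ = 0.2) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_pd(pd, rho, f):
    """Pr(Default | F = f) in the one-factor model."""
    return STD_NORMAL.cdf((STD_NORMAL.inv_cdf(pd) - sqrt(rho) * f) / sqrt(1.0 - rho))


def credit_var(pd, rho, x):
    """Default rate that is not exceeded with probability x."""
    z = STD_NORMAL.inv_cdf(x)
    return STD_NORMAL.cdf((STD_NORMAL.inv_cdf(pd) + sqrt(rho) * z) / sqrt(1.0 - rho))


# Hypothetical inputs: PD = 1%, rho = 0.2, and the 99.9% level.
pd_, rho_, x_ = 0.01, 0.20, 0.999
worst_case = credit_var(pd_, rho_, x_)
print(f"worst-case default rate: {100 * worst_case:.2f}%")
print(f"in excess of PD: {100 * (worst_case - pd_):.2f}%")
```

The function `credit_var` is simply `conditional_pd` evaluated at F = −Φ⁻¹(x), i.e. at a bad realization of the common factor.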
The reason why Basel II requires the term $\mathrm{VaR}_{\text{Credit Risk}}(0.999) - \mathrm{PD}$, rather than just $\mathrm{VaR}_{\text{Credit Risk}}$,
is that what is really needed here is the capital in excess of the 99.9% worst case loss over the
expected idiosyncratic loss, PD. Well-functioning capital markets should already discount the
idiosyncratic losses.
Finally, Basel II requires banks to compute ρ through a formula in which ρ is inversely related
to PD. The formula is based on empirical research (see Lopez, 2004): for a firm which becomes
less creditworthy, the PD increases and its probability of default becomes less affected by market
conditions. Basel II requires banks to compute a maturity adjustment factor that takes into
account that the longer the maturity the more likely it is a given name might eventually migrate
towards a more risky asset class.
$$\frac{\partial B(t)}{\partial t} + \lambda\left(\mathrm{Rec} - B(t)\right) = rB = 0, \quad \text{with } B(T) = N,$$
12.7. Appendix 2: Details on transition probability matrixes and pricing c
°by A. Mele
We are defining the constants $\lambda_{ij}$ as if they were the counterparts of the intensity of the Poisson process
in Eq. (12.8). Accordingly, these constants are simply interpreted as the instantaneous probabilities
of migration from rating $i$ to rating $j$ over the time interval $\Delta t$. Naturally, for each $i$, we have that
$\sum_{j=1}^{N} P(\Delta t)_{ij} = 1$ and, using this in Eq. (12A.1), we obtain,
$$\lambda_{ii} = -\sum_{j=1,\, j \neq i}^{N} \lambda_{ij}. \tag{12A.2}$$
The matrix $\Lambda$ containing the elements $\lambda_{ij}$ defined in Eqs. (12A.1) and (12A.2) is called the generating
matrix.
Next, let us rewrite Eq. (12A.1) in matrix form,
$$P(\Delta t) = I + \Lambda \Delta t.$$
Suppose we have a time interval $[0, T]$, which we chop into $n$ pieces, so as to have $\Delta t = \frac{T}{n}$. We have,
$$P(T) = P(\Delta t)^n = \left(I + \Lambda \frac{T}{n}\right)^n.$$
For large $n$,
$$P(T) = \exp(\Lambda T),$$
the matrix exponential, defined as $\exp(\Lambda T) \equiv \sum_{n=0}^{\infty} \frac{(T\Lambda)^n}{n!}$.
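The convergence of (I + ΛT/n)^n to the matrix exponential can be checked numerically. This is a minimal sketch in plain Python; the three-state generating matrix (two ratings and an absorbing default state) and the function names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scaled(m, c):
    return [[c * x for x in row] for row in m]


def added(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transition_product(gen, horizon, steps):
    """P(T) approximated by (I + Lambda * T / n)^n, with n = steps."""
    one_step = added(identity(len(gen)), scaled(gen, horizon / steps))
    result = identity(len(gen))
    for _ in range(steps):
        result = mat_mul(result, one_step)
    return result


def transition_series(gen, horizon, terms=40):
    """P(T) = exp(Lambda * T), computed through the truncated power series."""
    result, term = identity(len(gen)), identity(len(gen))
    for k in range(1, terms):
        term = scaled(mat_mul(term, scaled(gen, horizon)), 1.0 / k)  # (T*Lambda)^k / k!
        result = added(result, term)
    return result


# Hypothetical generating matrix: ratings A and B, plus an absorbing default state.
generator = [
    [-0.11, 0.10, 0.01],
    [0.05, -0.15, 0.10],
    [0.00, 0.00, 0.00],
]
p_series = transition_series(generator, horizon=5.0)
p_product = transition_product(generator, horizon=5.0, steps=2000)
for row in p_series:
    print([round(x, 4) for x in row])
```

Each row of the resulting matrix sums to one, and the last row shows that default is absorbing.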
To evaluate derivatives “written on states,” we proceed as follows. Suppose $F_i$ is the price of the derivative
in state $i \in \{1, \cdots, N\}$. Suppose the Markov chain is the only source of uncertainty relevant for
the evaluation of this derivative. Then,
$$dF_i = \frac{\partial F_i}{\partial t}\, dt + \left[F_{\tilde{R}} - F_i\right],$$
where $\tilde{R} \in \{1, \cdots, N\}$, with the usual conditional probabilities. In words, the instantaneous change in
the derivative value, $dF_i$, is the sum of two components: one, $\frac{\partial F_i}{\partial t}\, dt$, related to the mere passage of
time, and the other, $[F_{\tilde{R}} - F_i]$, related to the discrete change arising from a change in the rating.
Suppose that $r = 0$. Then,
$$rF_i = 0 = \frac{E(dF_i)}{dt} = \frac{\partial F_i}{\partial t} + \sum_{j=1}^{N} \lambda_{ij}\left[F_j - F_i\right] = \frac{\partial F_i}{\partial t} + \sum_{j \neq i} \lambda_{ij}\left[F_j - F_i\right],$$
$$F_i(T - t) = x\, Q_i(T - t) + 1 - Q_i(T - t),$$
The term indicated inside the integral of the second term is indeed the density of the default time at t, because
\[ P_{\text{default by time } t}(\lambda) = 1 - E\left[ e^{-\int_0^t \lambda(s) ds} \right], \]
so that differentiating with respect to t yields, under the appropriate regularity conditions, that Pr{Default ∈ (t, t + dt)} is just the term indicated in Eq. (12A.3). So Eq. (12.27) follows. Naturally,
\[ \Pr\{\text{Default} \in (t, t + dt)\} = -\frac{\partial}{\partial t} P_{\text{surv}}(\lambda, t). \]
Replacing this into Eq. (12A.3),
\[ \begin{aligned} P(y, N) &= e^{-rN} E\left[ e^{-\int_0^N \lambda(t) dt} \right] + Rec \int_0^N e^{-rt} \left[ -\frac{\partial}{\partial t} P_{\text{surv}}(\lambda, t) \right] dt \\ &= 1 - LGD \left( 1 - e^{-rN} P_{\text{surv}}(\lambda, N) \right) - (1 - LGD) \int_0^N r e^{-rt} P_{\text{surv}}(\lambda, t) dt, \end{aligned} \]
where the second equality follows by integration by parts and the assumption of constant recovery rates. Setting r = 0 produces Eq. (12.28).
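The integration by parts can be verified numerically in a special case. The sketch below assumes a constant default intensity, so that the survival probability is exp(−λt), and takes Rec = 1 − LGD; these simplifications, the parameter values, and the function names are illustrative assumptions rather than part of the text. Both sides of the equality are evaluated with a midpoint rule.

```python
import math

def price_via_default_density(r, lam, rec, maturity, steps=20000):
    """First expression: survival payoff plus recovery paid at the default time."""
    dt = maturity / steps
    integral = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                       # midpoint of the k-th subinterval
        density = lam * math.exp(-lam * t)       # -d/dt of exp(-lam * t)
        integral += math.exp(-r * t) * density * dt
    survival = math.exp(-lam * maturity)
    return math.exp(-r * maturity) * survival + rec * integral

def price_via_survival(r, lam, rec, maturity, steps=20000):
    """Second expression, obtained after integrating by parts."""
    lgd = 1.0 - rec
    dt = maturity / steps
    integral = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        integral += r * math.exp(-r * t) * math.exp(-lam * t) * dt
    survival = math.exp(-lam * maturity)
    return 1.0 - lgd * (1.0 - math.exp(-r * maturity) * survival) - (1.0 - lgd) * integral

# Hypothetical inputs: 3% rate, 2% intensity, 40% recovery, 5 year maturity.
LEFT = price_via_default_density(0.03, 0.02, 0.40, 5.0)
RIGHT = price_via_survival(0.03, 0.02, 0.40, 5.0)
print(LEFT, RIGHT)
```

With r = 0 the integral in the second expression vanishes, and the price reduces to one minus LGD times the default probability.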
12.9. Appendix 4: Conditional probabilities of survival
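It is easy to show that the drift of Psurv is λ (τ) dτ, such that by Itô’s lemma,
\[ \frac{d\eta_T(\tau)}{\eta_T(\tau)} = -\left[ -\text{Vol}\left( P_{\text{surv}}(\lambda(\tau), \tau, T) \right) \right] dW(\tau), \]
where,
\[ -\text{Vol}\left( P_{\text{surv}}(\lambda(\tau), \tau, T) \right) \equiv -\frac{\frac{\partial}{\partial \lambda} P_{\text{surv}}(\lambda(\tau), \tau, T)}{P_{\text{surv}}(\lambda(\tau), \tau, T)} \, \sigma \sqrt{\lambda(\tau)} = B(T - \tau) \, \sigma \sqrt{\lambda(\tau)}, \]
where the second equality follows by the closed-form expression of Psurv in Eq. (12.24). Therefore, Wλ (τ) is a Brownian motion under Qλ, where
\[ dW_\lambda(\tau) = dW(\tau) + B(T - \tau) \, \sigma \sqrt{\lambda(\tau)} \, d\tau, \]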
12.10. Appendix 5: Details on CDO pricing with imperfect correlation
where F is a common factor among the three names, zi is an idiosyncratic term, and F ∼ N (0, 1), zi ∼ N (0, 1). Finally, ρ ≥ 0 is meant to capture the default correlation among the names, as follows. Assume that the risk-neutral probability that each firm defaults by T is given by,
\[ Q_i(T) = \Phi(\xi_{0.10}) \equiv 10\%, \]
where Φ is the cumulative distribution of a standard normal variable. That is, by time T, each firm defaults any time that,
\[ x_i < \xi_{0.10} \equiv \Phi^{-1}(10\%). \]
Therefore, ρ is the default correlation among the assets in the CDO.
We can now simulate Eq. (12A.4), build up payoffs for each simulation, and price the tranches by
just averaging over the simulations, as explained below. Naturally, the same simulation technique can
be used to price tranches on CDOs with an arbitrary number of assets. Precisely, simulate Eq. (12A.4),
and obtain values x̃i,s , s = 1, · · ·, S, where S is the number of simulations and i = 1, 2, 3. At simulation
no. s, we have
x̃1,s , x̃2,s , x̃3,s , s ∈ {1, · · ·, S} .
We use the previously simulated values as follows:
• For each simulation s, count the number of defaults across the three names, defined as the
number of times that x̃i,s < ξ 0.10 , for i = 1, 2, 3. Denote the number of defaults as of simulation
s with Def s .
• For each simulation s, compute the total realized payoff of the asset pool, defined as,
• For each simulation s, compute recursively the payoffs to each tranche, π i,s,
\[ \pi_{i,s} = \min\left\{ \max\left\{ \tilde{\pi}_s - \sum_{k=1}^{i-1} \pi_{k,s}, \, 0 \right\}, \, N_i \right\}, \]
where Ni is the nominal value of each tranche (N1 = 140, N2 = 90, N3 = 70).
• Estimate the price of each tranche by averaging the discounted payoffs over the simulations,
\[ \text{Price Senior} = e^{-r} \frac{1}{S} \sum_{s=1}^{S} \pi_{1,s}, \quad \text{Price Mezzanine} = e^{-r} \frac{1}{S} \sum_{s=1}^{S} \pi_{2,s}, \quad \text{Price Junior} = e^{-r} \frac{1}{S} \sum_{s=1}^{S} \pi_{3,s}. \]
Note, the previous computations have to be performed under the risk-neutral probability Q. Using the probability P in the previous algorithm can only lead to something useful for risk-management and VaR calculations at best.
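The algorithm can be sketched in code as follows. Several ingredients are assumptions made only for illustration, because they are not stated in this excerpt: Eq. (12A.4) is taken to be the standard one-factor form x_i = sqrt(ρ) F + sqrt(1 − ρ) z_i; each of the three names is assumed to have a nominal of 100 (so that the pool matches N1 + N2 + N3 = 300); the pool payoff is assumed to be the nominal of surviving names plus a hypothetical 40% recovery on defaulted names; and r = 5%. The function name `price_tranches` is likewise hypothetical.

```python
import math
import random
from statistics import NormalDist

def price_tranches(rho, n_sims=20000, r=0.05, recovery=0.40, seed=0):
    """Monte Carlo prices of (senior, mezzanine, junior) under the stated assumptions."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(0.10)     # xi_0.10 = Phi^{-1}(10%)
    nominals = [140.0, 90.0, 70.0]             # tranche nominals given in the text
    name_nominal = 100.0                       # assumption: three names of 100 each
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_sims):
        common = rng.gauss(0.0, 1.0)           # common factor F
        defaults = 0
        for _name in range(3):
            idio = rng.gauss(0.0, 1.0)         # idiosyncratic term z_i
            x = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * idio
            if x < threshold:
                defaults += 1
        # Assumed pool payoff: full nominal if alive, recovery if defaulted.
        pool = name_nominal * ((3 - defaults) + recovery * defaults)
        paid = 0.0
        for i, nominal in enumerate(nominals):
            payoff = min(max(pool - paid, 0.0), nominal)   # waterfall, senior first
            totals[i] += payoff
            paid += payoff
    return [math.exp(-r) * total / n_sims for total in totals]

PRICES = price_tranches(rho=0.3)
print("senior, mezzanine, junior:", [round(p, 2) for p in PRICES])
```

Rerunning the sketch with different values of ρ shows how correlation shifts value between the senior and the junior tranche while leaving the expected pool payoff unchanged.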
References
Amato, J. D. (2005): “Risk Aversion and Risk Premia in the CDS Market.” BIS Quarterly
Review, September, 55-68.
Anderson, R. W. and S. Sundaresan (1996): “Design and Valuation of Debt Contracts.” Review
of Financial Studies 9, 37-68.
Artzner, P., F. Delbaen, J.-M. Eber, and D. Heath (1999): “Coherent Measures of Risk.”
Mathematical Finance 9, 203-228.
Berndt, A., R. Douglas, D. Duffie, M. Ferguson and D. Schranz (2005): “Measuring Default
Risk-Premia from Default Swap Rates and EDFs.” BIS Working Papers no. 173.
Black, F. and J. Cox (1976): “Valuing Corporate Securities: Some Effects of Bond Indenture
Provisions.” Journal of Finance 31, 351-367.
Black, F. and M. Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal
of Political Economy 81, 637-659.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985): “A Theory of the Term Structure of Interest
Rates.” Econometrica 53, 385-407.
Fender, I. and P. Hördahl (2007): “Overview: Credit Retrenchment Triggers Liquidity Squeeze.”
BIS Quarterly Review (September), 1-16.
Hull, J. C. (2007): Risk Management and Financial Institutions. Pearson Education International.
International Monetary Fund, (2008): Global Financial Stability Report. April 2008.
Jamshidian, F. (1989): “An Exact Bond Option Pricing Formula.” Journal of Finance 44,
205-209.
Jarrow, R. A., D. Lando and S. M. Turnbull (1997): “A Markov Model for the Term-Structure
of Credit Risk Spreads.” Review of Financial Studies 10, 481-523.
Leland, H. E. (1994): “Corporate Debt Value, Bond Covenants and Optimal Capital Structure.” Journal of Finance 49, 1213-1252.
Lopez, J. (2004): “The Empirical Relationship Between Average Asset Correlation, Firm Probability of Default and Asset Size.” Journal of Financial Intermediation 13, 265-283.
Mele, A. (2003): “Fundamental Properties of Bond Prices in Models of the Short-Term Rate.”
Review of Financial Studies 16, 679-716.
Merton, R. C. (1974): “On the Pricing of Corporate Debt: The Risk-Structure of Interest
Rates.” Journal of Finance 29, 449-470.
Vasicek, O. (1987): “Probability of Loss on Loan Portfolio.” Working paper KMV, published
in: Risk (December 2002) under the title “Loan Portfolio Value.”
13
Financial engineering and fixed income securities
13.1 Introduction
This chapter lays down foundational issues relating to financial engineering for fixed income
securities. Fixed income securities can be particularly complex, as outlined in the previous
two chapters. Many instruments in the fixed income markets differ substantially from those in
the remaining portions of the capital markets. For example, a simple instrument such as a pure
discount bond is very difficult to price. Intuitively, the price of a pure discount bond reflects
the time value of money. It is related to the intertemporal preferences and beliefs of the
market participants, which are unobservable. The situation is different in the case of traditional
“relative pricing”, in which we price a number of assets given the price of some other assets,
while ensuring that there are no arbitrage opportunities “left on the table”. In this case, we
can evaluate derivatives without reference to any preferences or beliefs. The Black & Scholes
formula, for example, is a preference free formula.
available to price fixed income products. The general principles underlying the APT are still
the same, though.
The previous procedure can be generalized to the case in which “some maturity is missing”.
The resulting algorithm is known as the bootstrap, which is described next.
13.2.2 Bootstrapping
Bootstrapping proceeds as follows. Let Bi be the price of a bond paying off coupons at the sequence of dates t1, t2, · · · , ti and a principal of $1 at ti. Let Pi be the price of the zero maturing at ti. Then,
(i) The equation B1 = (C11 + 1) P1 implies that we can extract the zero P1 as follows,
\[ P_1 = \frac{B_1}{1 + C_{11}}. \]
(ii) Given the equation (C22 + 1) P2 + C21 P1 = B2, and the previously computed P1, we proceed to extract the zero P2 as follows,
\[ P_2 = \frac{B_2 - C_{21} P_1}{C_{22} + 1}. \]
(iii) In general, we extract the zero Pn as follows,
\[ P_n = \frac{B_n - \sum_{i=1}^{n-1} C_{ni} P_i}{C_{nn} + 1}. \]
(iv) The previous steps work if we have an ordered number of bonds and all of the maturity dates. Indeed, the previous procedure boils down to the computation of the solution of Eq. (13.2). When some of the maturity dates are not available, we replace the required coupon rate Cni at time ti with a linear interpolation Ĉni between the coupon Cn,i−1 at time ti−1 and Cn,i+1 at time ti+1, as follows,
\[ \hat{C}_{ni} = \frac{t_{i+1} - t_i}{t_{i+1} - t_{i-1}} C_{n,i-1} + \frac{t_i - t_{i-1}}{t_{i+1} - t_{i-1}} C_{n,i+1}. \]
The effects of the interpolation should be “visible” near the missing maturities.
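Steps (i) to (iv) can be sketched in a few lines of code. The bond prices and coupons below are made-up inputs chosen for illustration, and the function names are hypothetical; `coupons[n][i]` stands for the coupon that bond n pays at date i, with zero-based indices.

```python
def bootstrap_zeros(bond_prices, coupons):
    """Recover zero prices recursively; bond n matures at date n with principal 1."""
    zeros = []
    for n, price in enumerate(bond_prices):
        # Value of the coupons paid before maturity, using zeros already extracted.
        coupon_value = sum(coupons[n][i] * zeros[i] for i in range(n))
        zeros.append((price - coupon_value) / (1.0 + coupons[n][n]))
    return zeros

def interpolate_coupon(c_prev, c_next, t_prev, t, t_next):
    """Linear interpolation of a missing coupon, as in step (iv)."""
    w_prev = (t_next - t) / (t_next - t_prev)
    w_next = (t - t_prev) / (t_next - t_prev)
    return w_prev * c_prev + w_next * c_next

# Hypothetical market: three bonds maturing at dates 1, 2, 3 with coupons 3%, 4%, 5%.
COUPONS = [[0.03], [0.04, 0.04], [0.05, 0.05, 0.05]]
BOND_PRICES = [0.9991, 1.0060, 1.0190]

ZEROS = bootstrap_zeros(BOND_PRICES, COUPONS)
print([round(p, 4) for p in ZEROS])
```

The inputs were constructed from zero prices of 0.97, 0.93 and 0.88, which the recursion recovers.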
where the ai are the parameters. Cubic splines are polynomials up to the third order, and are very popular. The parameters ai can be estimated by minimizing the sum of the squared errors, \( \sum_{i=1}^{N} \epsilon_i^2 \). A well-known pitfall of polynomials is that a high k might imply that while the polynomial approximation works reasonably well near the observed maturities, it may exhibit an erratic behavior in between. To avoid this problem, we can use local polynomials, which are low-order polynomials (typically splines) fitted to non-overlapping subintervals.
Naturally, we may also want to parametrize the spot rates, R (T ), as polynomials. Alterna-
tively, Nelson and Siegel (1987) propose the following parametrization,
\[ R(T) = \beta_1 + \beta_2 \left( \frac{1 - e^{-\lambda T}}{\lambda T} \right) + \beta_3 \left( \frac{1 - e^{-\lambda T}}{\lambda T} - e^{-\lambda T} \right), \]
where β i and λ are the parameters. These coefficients may be given an interpretation, in terms
of the factors driving the yield curve, reviewed in Chapter 11. The coefficient β 1 governs the
level of the yield curve. The coefficient β 2 relates to the slope, as an increase in this coefficient
increases short yields more than long yields. The coefficient β 3 shapes the curvature, as an
increase in this coefficient has little effect on very short and very long yields, but increases the
middle of the yield curve. Moreover, the coefficient λ controls the exponential decay of the yield
curve: small values of λ translate to slow decay and can better fit the curve at long maturities;
large values of λ, instead, lead to a fast decay, which helps fit the short-end of the yield curve.
Finally, λ determines where the loading on β 3 achieves its maximum. Diebold and Li (2006)
have used this setting to estimate β i for each date, and then used these estimated time series
of β i to forecast future values of β i through vector autoregressions and, then, the future yield
curve.
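The roles of the coefficients can be seen by evaluating the parametrization directly. The following is a minimal sketch; the parameter values are hypothetical and serve only to display the limiting behaviour at very short and very long maturities.

```python
import math

def nelson_siegel(maturity, beta1, beta2, beta3, lam):
    """Spot rate R(T) under the Nelson and Siegel parametrization."""
    x = lam * maturity
    loading = (1.0 - math.exp(-x)) / x          # loading on beta2
    return beta1 + beta2 * loading + beta3 * (loading - math.exp(-x))

# Hypothetical parameters: 5% level, upward slope, mild hump.
PARAMS = dict(beta1=0.05, beta2=-0.02, beta3=0.01, lam=0.5)

SHORT_END = nelson_siegel(1e-6, **PARAMS)       # approaches beta1 + beta2
LONG_END = nelson_siegel(1e4, **PARAMS)         # approaches beta1
for maturity in (0.25, 1, 2, 5, 10, 30):
    print(maturity, round(nelson_siegel(maturity, **PARAMS), 5))
```

At the short end only the level and slope coefficients matter, while at the long end only the level survives, which is consistent with the interpretation given above.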
\[ B(y; T) = \sum_{i=1}^{n} \frac{C_{t_i}}{(1 + y)^{t_i}} + \frac{1}{(1 + y)^{T}}. \]
This function aims to “mimic” how the market price B (T ) would behave if the YTM ŷ changed
to some value y. Naturally,
B (ŷ; T ) = B (T ) .
Motivated by the previous remarks, we can define a measure of risk of the bond based on
the sensitivity of the bond price with respect to changes in y. Economically, we are trying
to answer the following question: What happens to the bond price once we perturb the one
rate ŷ that discounts all the payoffs? Mathematically, this sensitivity is the first partial of the
“bond-pricing” formula B (y; T ) with respect to y,
\[ B_y(y; T) = -\frac{1}{1 + y} \left[ \sum_{i=1}^{n} \frac{t_i \cdot C_{t_i}}{(1 + y)^{t_i}} + \frac{T \cdot 1}{(1 + y)^{T}} \right], \]
where the subscript denotes a partial derivative, i.e. \( B_y(y; T) = \frac{\partial}{\partial y} B(y; T) \). Graphically, this sensitivity measure By (y; T ) is the tangent to the price-yield relation, as shown in Figure 13.1 below.
13.3. Duration, convexity and asset liability management
[Figure 13.1: bond price (vertical axis) plotted against the YTM (horizontal axis).]
FIGURE 13.1. The bond price-yield relation (solid line), its first-order approximation (duration) and its second-order approximation (convexity).
13.3.1 Duration
We define the “Macaulay duration” as,
\[ D_{\text{Mac}} \equiv (1 + y) \frac{-B_y(y; T)}{B(y; T)} = \sum_{i=1}^{n} \omega_{t_i} \cdot t_i + \hat{\omega}_T \cdot T, \]
where
\[ \omega_{t_i} = \frac{C_{t_i} / (1 + y)^{t_i}}{B(y; T)}; \qquad \hat{\omega}_T = \frac{1 / (1 + y)^{T}}{B(y; T)}. \]
In words, the Macaulay duration is a weighted average of the payment dates. The weights ω ti are the discounted coupons at the various payment dates, Cti / (1 + y)^ti, relative to the current market value of these coupons, i.e. the bond price B (y; T ) when the YTM is y. That is, the weights are the proportions of the bond’s present value that are attributable to the payoff at each date. The weights satisfy \( \sum_{i=1}^{n} \omega_{t_i} + \hat{\omega}_T = 1 \). Therefore, DMac ≤ T. The Macaulay duration is a measure of how far in the future the bond pays off. For zeros, DMac = T.
For small y, DMac (y) is simply the semi-elasticity of the bond price with respect to the YTM.
This semi-elasticity is also referred to as “modified duration”,
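\[ D \equiv \frac{-B_y}{B} = \frac{D_{\text{Mac}}}{1 + y}. \]
A simple computation reveals that the modified duration, D, satisfies:
\[ \frac{\partial D}{\partial y} = \frac{-B_{yy}}{B} + \left( \frac{B_y}{B} \right)^{2}. \]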
Therefore, the modified duration is decreasing in the YTM when the bond price is sufficiently
convex in the YTM, which is surely the case for long-term maturity dates.
Interestingly, the modified duration is increasing in the YTM when the bond price is concave
in the YTM, a property that arises for callable bonds and mortgage-backed securities (MBS,
henceforth). Intuitively, the incentives to proceed to early repayments “kick in” as the YTM
decreases, which makes the duration of the MBS decrease.
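The two duration measures can be computed directly from their definitions. The sketch below uses a hypothetical 5 year bond with annual coupons of 4% and a YTM of 4%; the function names are illustrative. It also checks the modified duration against a finite-difference estimate of −By / B.

```python
def bond_price(y, coupons, times, maturity):
    """B(y; T): discounted coupons plus a principal of 1 at maturity."""
    return sum(c / (1 + y) ** t for c, t in zip(coupons, times)) + 1 / (1 + y) ** maturity

def macaulay_duration(y, coupons, times, maturity):
    """Weighted average of payment dates, weights being shares of present value."""
    weighted = sum(t * c / (1 + y) ** t for c, t in zip(coupons, times))
    weighted += maturity / (1 + y) ** maturity
    return weighted / bond_price(y, coupons, times, maturity)

def modified_duration(y, coupons, times, maturity):
    return macaulay_duration(y, coupons, times, maturity) / (1 + y)

# Hypothetical bond: 4% annual coupons for 5 years, priced at a 4% YTM.
TIMES = [1, 2, 3, 4, 5]
COUPONS = [0.04] * 5
Y, T = 0.04, 5

D_MAC = macaulay_duration(Y, COUPONS, TIMES, T)
D_MOD = modified_duration(Y, COUPONS, TIMES, T)

# Central finite difference for -B_y / B as a cross-check.
H = 1e-6
D_FD = -(bond_price(Y + H, COUPONS, TIMES, T) - bond_price(Y - H, COUPONS, TIMES, T)) \
       / (2 * H) / bond_price(Y, COUPONS, TIMES, T)

print(round(D_MAC, 4), round(D_MOD, 4), round(D_FD, 4))
```

For a zero, obtained by passing empty coupon lists, the Macaulay duration equals the maturity.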
The Macaulay duration for continuously compounded rates is even simpler to compute. First,
define the continuously compounded YTM as the single number x̂ such that
\[ B(\hat{x}; T) = \sum_{i=1}^{n} c_{t_i} e^{-\hat{x} \cdot t_i} + e^{-\hat{x} \cdot T}, \]
where B(x̂; T ) is the market price of a bond paying off the principal of one at maturity and the stream of payoffs cti. Next, consider the function x ↦ B (x; T ). Compute the semi-elasticity of the bond price B (x; T ) with respect to the continuously compounded YTM x,
\[ \frac{-B_x(x; T)}{B(x; T)} = \frac{\sum_{i=1}^{n} c_{t_i} t_i e^{-x \cdot t_i} + T \cdot e^{-x \cdot T}}{B(x; T)} = \sum_{i=1}^{n} w_{t_i} \cdot t_i + \hat{w}_T \cdot T, \]
where \( B_x(x; T) = \frac{\partial B(x; T)}{\partial x} \), \( w_{t_i} = \frac{c_{t_i} e^{-x \cdot t_i}}{B(x; T)} \) and \( \hat{w}_T = \frac{e^{-x \cdot T}}{B(x; T)} \). Note, the weights are such that \( \sum_{i=1}^{n} w_{t_i} + \hat{w}_T = 1 \). Therefore, the “Macaulay duration” for continuously compounded rates is equal to the semi-elasticity of the bond price with respect to the continuously compounded YTM x.¹ This result may simplify some calculations.
13.3.2 Convexity
Convexity measures how the sensitivity, By , changes with y. Mathematically, convexity is related
to the second partial of the bond price with respect to y, Byy . If the second partial, Byy , is
positive, then, the interest rate sensitivity declines as y increases (see Figure 13.1). This is because \( \frac{\partial}{\partial y}(-B_y) = -B_{yy} < 0 \). Formally, convexity is defined as,
\[ C \equiv \frac{B_{yy}}{B}. \]
We may, then, consider the following expansion of the bond price:
\[ \frac{\Delta B}{B} \approx -D \cdot \Delta y + \frac{1}{2} C \cdot (\Delta y)^{2}. \]
That is, for very “convex securities”, duration may not be a safe measure of return, as also shown in Figure 13.1.
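The quality of the first-order and second-order approximations can be compared numerically. The sketch below assumes a hypothetical 30 year bond with 4% annual coupons, a YTM of 4%, and a yield increase of one percentage point; derivatives are computed analytically from the bond-pricing formula, and all names are illustrative.

```python
def price(y, cash_flows):
    """Present value of (time, amount) cash flows at a flat annual yield y."""
    return sum(cf / (1 + y) ** t for t, cf in cash_flows)

def modified_duration(y, cash_flows):
    b_y = -sum(t * cf / (1 + y) ** (t + 1) for t, cf in cash_flows)
    return -b_y / price(y, cash_flows)

def convexity(y, cash_flows):
    b_yy = sum(t * (t + 1) * cf / (1 + y) ** (t + 2) for t, cf in cash_flows)
    return b_yy / price(y, cash_flows)

# Hypothetical bond: 4% annual coupons, principal of 1 repaid in year 30.
CASH_FLOWS = [(t, 0.04) for t in range(1, 30)] + [(30, 1.04)]
Y0, DY = 0.04, 0.01

ACTUAL = price(Y0 + DY, CASH_FLOWS) / price(Y0, CASH_FLOWS) - 1
FIRST_ORDER = -modified_duration(Y0, CASH_FLOWS) * DY
SECOND_ORDER = FIRST_ORDER + 0.5 * convexity(Y0, CASH_FLOWS) * DY ** 2

print(round(ACTUAL, 4), round(FIRST_ORDER, 4), round(SECOND_ORDER, 4))
```

The duration-only estimate overstates the price decline, and adding the convexity term brings the estimate closer to the actual change.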
We can use duration to assess how exposed a bond portfolio is to movements in the interest rates. We can then “immunize” a portfolio of bonds to changes in the interest rates. Duration is relevant for asset-liability management. For example, pension funds have known streams of liabilities that must be matched by the assets they hold. In words, the duration of the assets must equal the duration of the liabilities. In the UK, pension funds must mark-to-market the liabilities. Therefore, one objective of these funds is to “immunize” their liabilities against movements in the interest rates.
¹ Mathematically, we could have obtained this result in a straightforward manner, as follows. Define the bond price function as B (y (x)), where by definition, y (x) = e^x − 1. Hence, Bx (y (x)) = By (y (x)) y′ (x) = By (y (x)) e^x = By (y (x)) (1 + y). It follows that
\[ D_{\text{Mac}} = \frac{-B_y (1 + y)}{B} = \frac{-B_x}{B}. \]
Alternatively, consider the following basic example. A bank borrows $100 at 2% for a year and lends this money at 4% for 5 years, where the higher rate compensates for many things such as risk, the bank’s market power, etc. Assuming that the bank’s borrower does not default, in the first year, the bank generates profits equal to $(4% − 2%) · 100 = 2, according to its books. However, the right computation to make should not relate to past market (interest rate) conditions, but to the current ones. Suppose for example that in one year, the interest rate for borrowing rises from 2% to 5%. This is of course a bit unrealistic, but it gives the idea of where the action is. In this case, the market value of the assets is:
\[ \frac{100 \cdot 1.04^{5}}{1.05^{4}} = 100.09. \]
The market value of the liabilities is, of course, 100 · 1.02 = 102. The bank’s problem is, of course, a duration mismatch.
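The arithmetic of this example can be reproduced as follows. The sketch assumes, as the formula above does, that the loan pays 100 · 1.04^5 at the end of year 5 and is discounted at 5% over the four remaining years; variable names are illustrative.

```python
# Book view: interest earned minus interest paid over the first year.
BOOK_PROFIT = (0.04 - 0.02) * 100

# Market view after one year, once the borrowing rate has risen to 5%.
ASSETS = 100 * 1.04 ** 5 / 1.05 ** 4      # loan payoff discounted over 4 years
LIABILITIES = 100 * 1.02                  # amount owed on the one year borrowing
NET_WORTH = ASSETS - LIABILITIES

print(round(BOOK_PROFIT, 2), round(ASSETS, 2), round(LIABILITIES, 2), round(NET_WORTH, 2))
```

The books show a profit, whereas the mark-to-market position is negative.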
Let us consider a more substantive example, based on asset-liability management for pension funds. We consider the following extreme example. In 30 years from now, a pension fund is due to deliver $100,000 to some future retiree. Suppose the current market situation is such that the yield curve is flat at 4%, such that the market value of this liability is $100,000 · (1.04)^−30 = $30,832. Accordingly, the would-be retiree invests $30,832 in the pension fund. So we have the following situation:
Assets                  Liabilities
Cash      $30,832       Pensions    $30,832
Suppose, now, that the pension fund does not invest this cash. This is of course inefficient, but it is precisely the point of this simple exercise to see why the strategy is inefficient.
Consider two extreme cases, occurring under two scenarios underlying developments in the fixed income market. In one week,
(i) Scenario ↑: the yield curve shifts up in parallel to 5%. Accordingly, the value of the liability for the pension fund is: $100,000 · (1.05)^−30 = $23,138.
Assets                  Liabilities
Cash      $30,832       Pensions    $23,138
                        Profit      $7,694
(ii) Scenario ↓: the yield curve shifts down in parallel to 3%. Accordingly, the value of the liability for the pension fund is: $100,000 · (1.03)^−30 = $41,199.
Assets                  Liabilities
Cash      $30,832       Pensions    $41,199
                        Loss        −$10,367
Therefore, a drop in the yield curve results in a loss for the pension fund: when interest rates
go down, the pension fund faces a challenging situation as it has to honour its obligations in
30 years, but the financial market “yields less” than one week ago.
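The figures in the two scenarios follow from discounting the single payment of $100,000 at the new flat yield. A minimal sketch, with illustrative names:

```python
def liability_value(payment, yield_rate, years):
    """Present value of a single payment due in `years` at a flat annual yield."""
    return payment * (1 + yield_rate) ** (-years)

CASH = liability_value(100_000, 0.04, 30)     # amount received from the retiree

SURPLUS = {}
for label, new_yield in (("up", 0.05), ("down", 0.03)):
    liability = liability_value(100_000, new_yield, 30)
    SURPLUS[label] = CASH - liability         # cash is not invested, so it stays put
    print(label, round(liability), round(SURPLUS[label]))
```

Because the cash is not invested, its value does not respond to the yield shift, and the whole change in the liability shows up as a profit or a loss.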
Naturally, the pension fund would face an opposite situation were interest rates to go up. In some countries, we do not like pension funds to experience volatility. The previous volatility arises simply because the pension fund receives $30,832, and then just puts this money “under the pillow.” The most efficient way to kill volatility is, of course, to invest $30,832 in a 30 year bond as soon as we receive this money, at the market conditions of 4%. This is perfect hedging! But, we do not necessarily have access to such a bond. How do we proceed, then? We now develop examples that illustrate how to deal systematically with issues relating to asset-liability management.
13.3.3.2 Hedging
Let us consider a portfolio of two bonds with different durations. Its value is given by,
V = B1 (ŷ1 ) θ1 + B2 (ŷ2 ) θ2 ,
where B1 (ŷ1 ) and B2 (ŷ2 ) are the market value of the bonds, ŷ1 and ŷ2 are the YTM on the
bonds and, finally, θ1 and θ2 are the quantities of bonds in the portfolio. Let us consider a small
change in the two YTM ŷ1 and ŷ2 . We have,
The question is: How should we choose θ1 and θ2 so as to make the value of the portfolio remain
constant after a change in ŷ1 and ŷ2 ?
Let us assume a parallel shift in the term structure of interest rates. In this case, dŷ1 = dŷ2 .
The portfolio is said to be immunized if its value V does not change as ŷ1 and ŷ2 change, i.e.
dV = 0, which is true when,
\[ \theta_1 = -\frac{D(\hat{y}_2) B_2(\hat{y}_2)}{D(\hat{y}_1) B_1(\hat{y}_1)} \theta_2. \qquad (13.3) \]
A useful interpretation of this portfolio is that we may be holding a bond with some duration,
say we hold θ2 units of the second bond. Given these holdings, we may wish to sell another
bond, possibly with a lower duration, to hedge against movements in the price of the bond we
hold.
Alternatively, we can think of the second asset as a liability the value of which fluctuates after
a change in the interest rates. Then, we may wish to purchase some asset to hedge against the
liability. Mathematically, θ2 < 0 and θ1 > 0. Moreover, Eq. (13.3) reveals that the number of
assets to hold to hedge against the liability is high if the ratio of the two durations of the assets,
D (ŷ2 )/ D (ŷ1 ), is large. In this case, the hedging position is obviously inefficient. Asset-liability
management, and “immunization”, is costly when we hedge high-duration liabilities with low
duration assets. We illustrate these cases in a few examples, developed in the Appendix.
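Eq. (13.3) can be illustrated with a small numerical sketch. It assumes that both instruments are zeros priced on a flat 4% yield curve, with the asset maturing in 5 years and the liability in 30 years, and that we are short one unit of the liability (θ2 = −1). These inputs and the function names are hypothetical. For a zero, the modified duration is T / (1 + y).

```python
def zero_price(y, maturity):
    return (1.0 + y) ** (-maturity)

def zero_modified_duration(y, maturity):
    return maturity / (1.0 + y)

def hedge_quantity(theta2, y, maturity1, maturity2):
    """theta1 from Eq. (13.3) when both yields equal y."""
    num = zero_modified_duration(y, maturity2) * zero_price(y, maturity2)
    den = zero_modified_duration(y, maturity1) * zero_price(y, maturity1)
    return -num / den * theta2

def portfolio_value(theta1, theta2, y, maturity1, maturity2):
    return theta1 * zero_price(y, maturity1) + theta2 * zero_price(y, maturity2)

Y, T1, T2, THETA2 = 0.04, 5, 30, -1.0
THETA1 = hedge_quantity(THETA2, Y, T1, T2)

SHIFT = 0.0001   # parallel shift of one basis point
HEDGED_CHANGE = (portfolio_value(THETA1, THETA2, Y + SHIFT, T1, T2)
                 - portfolio_value(THETA1, THETA2, Y, T1, T2))
UNHEDGED_CHANGE = (portfolio_value(0.0, THETA2, Y + SHIFT, T1, T2)
                   - portfolio_value(0.0, THETA2, Y, T1, T2))

print(round(THETA1, 4), HEDGED_CHANGE, UNHEDGED_CHANGE)
```

More than two units of the 5 year zero are needed per unit of the 30 year liability, which illustrates why hedging high-duration liabilities with low-duration assets is costly. The residual change in the hedged portfolio is a second-order (convexity) effect.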
What happens when bond prices have “negative convexity”? In later sections, we shall see that callable bonds and mortgage-backed securities (MBS, henceforth) can be concave in the YTM! As we explained earlier, early repayments are likely to occur as the YTM decreases, which entails two inextricable consequences: (i) the price of the MBS “increases less” than a conventional bond price after a decline in the YTM, especially when the YTM is low; (ii) the duration of the MBS decreases as the YTM decreases.
MBS may be responsible for financial turmoil. The mechanism is well-known. Institutions
that hold MBS typically short conventional bonds for hedging purposes. But the MBS duration
increases as interest rates increase. Therefore, an interest rate increase can lead these institutions
to short additional conventional bonds, which worsens liquidity and leads to a further increase
in the interest rates, thereby feeding a vicious circle. Perli and Sack (2003) estimate that in
2002 and 2003, this mechanism may have amplified the volatility of the long-term US rates by
a factor between 15% and 30%.
(i) Create a probabilistic representation of how the price develops over time using a tree-
information structure.
(ii) For example, at the time of evaluation, we observe the state. In the next period, there
can be two mutually exclusive states of the world: (a) the state “up,” occurring with
probability p; and (b) the state “down,” occurring with probability 1 − p.
(iii) After two periods, there can be three mutually exclusive states of the world, as in the
following diagram. We label the tree in this diagram a “recombining” tree, to emphasize
that the “up & down” and the “down & up” nodes are the same.
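[Diagram: two-period recombining binomial tree. From “Today”, the state moves to “up” with probability p or to “down” with probability 1 − p in the first period. In the second period, the three nodes are (“up”, “up”), (“up”, “down”) = (“down”, “up”), and (“down”, “down”), each reached with an up-move probability p and a down-move probability 1 − p.]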
The previous diagram can be used to price options written on stocks. The stock price unfolds
through the branches of the tree. Then, we figure out the no-arbitrage movements of the option
price along the tree. Suppose we wish to price an option written on a zero, a 3 Year zero say.
Can we apply the same methodology to price the option? The answer is no, and the reason is that we cannot exogenously “track” the movements of the prices of the zero, as in the case of the stock price. Instead, after one year, the 3 Year zero becomes a 2 Year zero, i.e. a quite different asset.
13.4. Foundational issues on interest rate modeling
The trick, here, is to model the movements of the yield curve. There are two approaches. In
the first approach, we model the dynamics of the short-term rate, defined as the interest rate
on a loan with maturity equal to the time intervals in the tree. The resulting model, which in
Chapter 11 we called model of the short-term rate, has implications in terms of the movements of
the entire term-structure. This approach gives rise to evaluation formulae in which the current
prices of the zeros predicted by the model are not necessarily equal to the market prices. We
develop this approach in this section. In a second approach, which in Chapter 11 we called
no-arbitrage, or calibration, approach, we model the dynamics of the entire term-structure.
This approach gives rise to option evaluation formulae in which the current prices of the zeros
predicted by the model are equal to the market prices. We develop this approach in the last
sections of this chapter.
Suppose, also, that two zeros with distinct maturities are available for trading. A money mar-
ket accounting technology is also available (MMA, in the sequel). Investing $1 in the MMA
generates $1·(1 + r) in the second period. We aim to derive an evaluation formula for the zero
based on the previous probabilistic model for the short-term rate dynamics. The general idea
is to build up a portfolio that contains one zero and the MMA. We shall make the value of this
portfolio in the second period replicate the value of the zero we wish to price. By no-arbitrage,
then, the value of the portfolio in the first period must equal the value of the zero we wish
to price, and we shall be done. The appendix develops the arguments, and shows that in the
absence of arbitrage, there is a constant λ, such that the following relation holds true:
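\[ E_p\left[ P(\tilde{r}, T) \right] - (1 + r) P(r, T) = \underbrace{\frac{\Delta P(\tilde{r}, T)}{\Delta \tilde{r}} \cdot \text{Vol}(\tilde{r} - r)}_{=\ \text{volatility of the price}} \cdot \underbrace{\lambda}_{=\ \text{unit risk premium}} \qquad (13.4) \]
where Ep [P (r̃, T )] denotes the expectation of the bond price under the probability p.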
As we explained in previous chapters, Eq. (13.4) is an APT relation. It says that the excess
return on the zero equals the volatility of its price multiplied by the unit price of risk. We call
the term,
\[ \frac{\Delta P(\tilde{r}, T)}{\Delta \tilde{r}} \cdot \text{Vol}(\tilde{r} - r) \]
“price volatility” because it measures the amplitude of the price variation due to changes in the short-term rate in the future, \( \frac{\Delta P(\tilde{r}, T)}{\Delta \tilde{r}} \), i.e. the “price-sensitivity”, where this price sensitivity is normalized by the volatility of the short-term rate, Vol(r̃ − r).
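[Diagram: recombining tree for the short-term rate r and the price P of a 3 Year zero.]
Date     Short-term rate r (highest node first)    Price P of the 3 Year zero
t = 0    4%
t = 1    5%; 3%
t = 2    6%; 4%; 2%                                 1/1.06 = 0.9433; 1/1.04 = 0.9615; 1/1.02 = 0.9804
t = 3                                               P = 1 in every state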
Eq. (13A.7), or equivalently, (13.4), can now be cast in a more “operational” format. After rearranging terms, we obtain:
\[ P(r, T) = \frac{(p - \lambda) P(r^{+}, T) + \left[ 1 - (p - \lambda) \right] P(r^{-}, T)}{1 + r} = \frac{E_q\left[ P(\tilde{r}, T) \right]}{1 + r} \qquad (13.5) \]
where q ≡ p − λ is the risk-neutral probability.
A few considerations. We “expect” that λ < 0 because bond prices are decreasing in the short-term rate here. Then, q ≡ p − λ > p.² Hence, the risk-neutral probability of an upward movement of the short-term rate, q, is higher than the true probability, p. An investor who is long a bond is concerned about an increase of the short-term rate in the future and, hence, “corrects” the true probability p by assigning a higher risk-adjusted probability to the “upward” state.
² To be able to interpret q as a probability, we must have that (i) q ≡ p − λ > 0 ⇔ −λ > −p and (ii) q ≡ p − λ < 1 ⇔ −λ < 1 − p.
Also shown in the previous diagram is the price of a hypothetical 3 Year zero, P , at time
t = 3 and at time t = 2. At time t = 3, the expiration date, P = 1 in all states of nature. At
time t = 2, the price P is P (r, T ) = Eq [P (r̃, T )]/ (1 + r) = 1/ (1 + r), for r = 6%, 4% and 2%.
The issue, now, is how to compute the price of the zero in correspondence of the remaining
nodes. We should use the formula, P (r, T ) = Eq [P (r̃, T )]/ (1 + r) to populate the tree, but
we do not know p, λ, and q. Suppose we “estimate” p and λ. In this case, we compute q as
q = p − λ, as in Eq. (13.5). (For example, p = 20% and λ = −30%, so that q = 50%.) Suppose
that we come up with q = 1/2. Then, the following diagram gives the price of the zero at all the
nodes as of time t = 1, and at the evaluation time t = 0.
[Diagram: prices of the 3 Year zero on the tree, with q = 1/2 at every node.]
Node                          r     Price of the 3 Year zero
t = 0                         4%    P = (1/2 · 0.9070 + 1/2 · 0.9427) / 1.04 = 0.8893
t = 1, up                     5%    P = (1/2 · 0.9433 + 1/2 · 0.9615) / 1.05 = 0.9070
t = 1, down                   3%    P = (1/2 · 0.9615 + 1/2 · 0.9804) / 1.03 = 0.9427
t = 2, highest to lowest r          P = 0.9433; 0.9615; 0.9804
t = 3                               P = 1 in every state
So the price of the 3 Year zero equals 0.8893. Next, consider a European call option written
on the 3 Year zero, with expiration date equal to 2 and strike price K = 0.95. The following
diagram gives the value of the option predicted by the model at each node of the tree.
[Diagram: values of the call option on the tree, with q = 1/2 at every node and K = 0.9500.]
Node                  r     Option value
t = 0                 4%    C = (1/2 · 0.0055 + 1/2 · 0.0203) / 1.04 = 0.0124
t = 1, up             5%    C = (1/2 · 0 + 1/2 · 0.0115) / 1.05 = 0.0055
t = 1, down           3%    C = (1/2 · 0.0115 + 1/2 · 0.0304) / 1.03 = 0.0203
t = 2, P = 0.9433           C = max{P − K, 0} = 0
t = 2, P = 0.9615           C = max{P − K, 0} = 0.0115
t = 2, P = 0.9804           C = max{P − K, 0} = 0.0304
The model predicts that the current price of the call option is 0.0124.
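The backward recursion behind the two previous diagrams can be sketched as follows, assuming q = 1/2 at every node as in the text. The layout of `RATES` (one list per date, highest rate first) and the function name are illustrative choices; node values are computed from the unrounded prices 1/(1 + r), so intermediate figures may differ from the diagrams in the fourth decimal.

```python
# Short-term rate tree: RATES[date][node], node 0 being the highest rate.
RATES = [[0.04], [0.05, 0.03], [0.06, 0.04, 0.02]]

def backward_induction(values, q, rates):
    """Roll node values back to date 0 using Eq. (13.5) at each node.

    `values` holds one entry per node at some date; from node j, an up-move
    leads to node j and a down-move to node j + 1 at the following date.
    """
    while len(values) > 1:
        date = len(values) - 2
        values = [(q * values[j] + (1 - q) * values[j + 1]) / (1 + rates[date][j])
                  for j in range(date + 1)]
    return values[0]

Q = 0.5

# 3 Year zero: pays 1 in each of the four states at t = 3.
ZERO_PRICE = backward_induction([1.0, 1.0, 1.0, 1.0], Q, RATES)

# European call expiring at t = 2 with strike 0.95, written on the 3 Year zero.
STRIKE = 0.95
prices_at_expiry = [1.0 / (1.0 + r) for r in RATES[2]]
payoffs = [max(p - STRIKE, 0.0) for p in prices_at_expiry]
CALL_PRICE = backward_induction(payoffs, Q, RATES)

print(round(ZERO_PRICE, 4), round(CALL_PRICE, 4))
```

Changing `Q` in this sketch shows how sensitive both prices are to the assumed risk-neutral probability, which motivates the calibration discussed next.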
13.4.2.1 Calibration
The model we are dealing with predicts that the price of the 3 Year zero is equal to 0.8893.
However, there is no guarantee that this model-implied price equals the market price of the 3
Year zero. Suppose, instead, that the market price of the 3 Year zero, P$ say, equals 0.8700.
What should we do to make the model-implied price of the 3 Year zero equal to the market
price? The question is important: how can we trust an option pricing model that is not even
able to pin down the initial market value of the underlying zero?
To make the model-implied price of the 3 Year zero equal to the market price, P$ = 0.8700, we cannot take the risk-neutral probability q as given, i.e. independent of the observed price P$ = 0.8700, as we did before. Rather, we should calibrate the probability q, as follows,
\[ P_{\$} = 0.8700 = \frac{1}{1.04} \left[ q \cdot P_1(5\%) + (1 - q) \cdot P_1(3\%) \right] \qquad (13.6) \]
where P1 (5%) and P1 (3%) are the prices of the zero at time t = 1, in the events that the short-term rate is up to 5% or down to 3%.
The previous equation follows, again, by Eq. (13.5). But here, the unknown is not the price,
which is instead given by the market price. Rather, we are looking for, or calibrating, the
probability q that makes the RHS of Eq. (13.6) equal to its LHS. Naturally, we need to compute
the prices of the zeros P1 (5%) and P1 (3%). These prices can be found by another application
of Eq. (13.5), as follows,
By replacing the previous expressions for P1 (5%) and P1 (3%) into Eq. (13.6), we obtain,
\[ P_{\$} = 0.8700 = \frac{1}{1.04} \left( q \cdot \frac{q \cdot 0.9433 + (1 - q) \cdot 0.9615}{1.05} + (1 - q) \cdot \frac{q \cdot 0.9615 + (1 - q) \cdot 0.9804}{1.03} \right). \]
This is a nonlinear equation in q, that we can easily solve with a computer to obtain, q = 0.8779.
Hence,
The next diagram depicts the implied binomial tree, i.e. the tree that results after matching the model-implied price of the 3 Year zero to the market price, P$ = 0.8700.
[Diagram: implied binomial tree for the 3 Year zero, with q = 0.8779 at every node.]
Node                          r     Price of the 3 Year zero
t = 0                         4%    P$ = 0.8700 = [q P1(5%) + (1 − q) P1(3%)] / 1.04
t = 1, up                     5%    P1(5%) = [q 0.9433 + (1 − q) 0.9615] / 1.05 = 0.9005
t = 1, down                   3%    P1(3%) = [q 0.9615 + (1 − q) 0.9804] / 1.03 = 0.9357
t = 2, highest to lowest r          P = 0.9433; 0.9615; 0.9804
t = 3                               P = 1 in every state
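The calibration of q in Eq. (13.6) can be sketched with a simple bisection. The sketch uses the date-2 prices as rounded in the text (0.9433, 0.9615, 0.9804), so as to reproduce its figures; the bisection routine and the names below are illustrative choices, not the author's stated procedure.

```python
# Prices of the 3 Year zero at t = 2 for r = 6%, 4%, 2%, as rounded in the text.
P2 = [0.9433, 0.9615, 0.9804]

def prices_at_date_one(q):
    """P1(5%) and P1(3%) from Eq. (13.5)."""
    p1_up = (q * P2[0] + (1 - q) * P2[1]) / 1.05
    p1_down = (q * P2[1] + (1 - q) * P2[2]) / 1.03
    return p1_up, p1_down

def model_price(q):
    """Model-implied price of the 3 Year zero at t = 0, the RHS of Eq. (13.6)."""
    p1_up, p1_down = prices_at_date_one(q)
    return (q * p1_up + (1 - q) * p1_down) / 1.04

def calibrate(market_price, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection; relies on model_price being decreasing in q on [0, 1]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model_price(mid) > market_price:
            lo = mid      # model price too high: put more weight on high rates
        else:
            hi = mid
    return 0.5 * (lo + hi)

Q_IMPLIED = calibrate(0.8700)
P1_UP, P1_DOWN = prices_at_date_one(Q_IMPLIED)
print(round(Q_IMPLIED, 4), round(P1_UP, 4), round(P1_DOWN, 4))
```

A market price outside the range spanned by q = 0 and q = 1 could not be matched on this tree, in which case the bisection would simply converge to one of the two endpoints.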
Note how different P1 (5%) and P1 (3%) are from those we found earlier by imposing that q = 1/2. In the implied tree, they are smaller than those obtained with q = 1/2, state by state. This is because in the implied tree, q = 0.8779. The implied tree puts more weight on those states of nature in which the short-term rate is high or, equivalently, bond prices are low. We expect the price of the option on the implied binomial tree to be different from (lower than) the one we found earlier.
So let’s do the computations by utilizing the implied binomial tree:
[Diagram: values of the call option on the implied binomial tree, with q = 0.8779 at every node and K = 0.9500.]
Node                  r     Option value
t = 0                 4%    C = (q 0.0013 + (1 − q) 0.0134) / 1.04 = 0.0026
t = 1, up             5%    C = [q · 0 + (1 − q) 0.0115] / 1.05 = 0.0013
t = 1, down           3%    C = [q 0.0115 + (1 − q) 0.0304] / 1.03 = 0.0134
t = 2, P = 0.9433           C = max{P − K, 0} = 0
t = 2, P = 0.9615           C = max{P − K, 0} = 0.0115
t = 2, P = 0.9804           C = max{P − K, 0} = 0.0304
The computations in the previous diagram reveal that the option price predicted by the
implied binomial tree is 0.0026, which is roughly one fifth of the option price
we found earlier, 0.0124! The interpretation for this result is, again, related to the implied
risk-neutral probability, which is much larger than q = 1/2. The implied tree puts a relatively large
weight on the events in which the short-term rate is high or bond prices are low, which makes
the option price relatively small.
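The comparison between the two option prices can be reproduced by backward induction. The sketch below is illustrative only: the function name `option_price` is an assumption, and it carries no intermediate rounding, so the implied-tree price comes out at about 0.0027 rather than the 0.0026 reported in the diagram, which works with rounded node values.

```python
def option_price(q, strike=0.9500):
    """Price at t = 0 of the call expiring at t = 2, with up-probability q at every node."""
    bond_t2 = [0.9433, 0.9615, 0.9804]               # zero prices at t = 2, high-rate state first
    payoff = [max(p - strike, 0.0) for p in bond_t2]
    c_up = (q * payoff[0] + (1 - q) * payoff[1]) / 1.05    # node with r = 5%
    c_down = (q * payoff[1] + (1 - q) * payoff[2]) / 1.03  # node with r = 3%
    return (q * c_up + (1 - q) * c_down) / 1.04            # node with r = 4%


price_half = option_price(0.5)         # tree with q = 1/2
price_implied = option_price(0.8779)   # implied tree
print(f"q = 1/2: {price_half:.4f}, implied q: {price_implied:.4f}")
```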
We are not done. Let us go back to the zero pricing problem, and suppose that we observe the
price of a 2 Year zero, and that this price equals 0.9200, a quite reasonable figure. Is there any
chance that the inputs to the pricing problem related to the 3 Year zero are such that we can
“fit” the 2 Year zero as well? The answer is, of course, no. There is no reason why the
inputs utilized to fit the price of the 3 Year zero should also fit the price of the 2 Year
zero. The 2 Year zero is quite a different asset! Indeed, in the next diagram, we use the inputs
to the pricing problem related to the 3 Year zero, and Eq. (13.5), and find that the price of the
2 Year zero implied by the price of the 3 Year zero is equal to 0.9178. Unless the market price
happens, by chance, to equal 0.9178, we cannot simultaneously fit the prices of the 3 Year and
the 2 Year zeros.
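[Diagram: the 2 Year zero on the tree calibrated to the 3 Year zero, with q = 0.8779.
t = 0: P = [q · 0.9523 + (1 − q) · 0.9709] / 1.04 = 0.9178.
t = 1: in the up state (r = 5%), P1,1(5%) = 1 / 1.05 = 0.9523; in the down state (r = 3%), P1,1(3%) = 1 / 1.03 = 0.9709.
t = 2: P = 1 in all states.]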
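The mismatch can be checked numerically. The following sketch, with illustrative variable names, prices the 2 Year zero with the probability implied by the 3 Year zero, and then backs out the probability that would be needed to fit the 2 Year zero alone. Small differences in the last digit relative to the diagram come from rounding of the node prices.

```python
q_3y = 0.8779                 # probability implied by the 3 Year zero
p_up = 1 / 1.05               # 2 Year zero at t = 1 when r = 5%
p_down = 1 / 1.03             # 2 Year zero at t = 1 when r = 3%

# Price of the 2 Year zero implied by the 3 Year zero calibration.
two_year_model = (q_3y * p_up + (1 - q_3y) * p_down) / 1.04

# Probability that would fit a market price of 0.9200 for the 2 Year zero alone.
market_2y = 0.9200
q_2y = (p_down - market_2y * 1.04) / (p_down - p_up)

print(f"model price = {two_year_model:.4f}, q fitting the 2 Year zero = {q_2y:.4f}")
```

The two probabilities differ, which is the numerical counterpart of the statement that a single constant q cannot fit both zeros.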
To simultaneously fit the price of the 3 Year and the 2 Year zeros, we should implement at
least one of the two strategies: (i) To make the probabilities q time-varying; (ii) To calibrate the
entire structure of the short-term movements in Figure 13.1 and fit the initial term-structure
of market prices. We implement the first of these two strategies in the next subsection. We
develop the second strategy in Section 13.4.
We now build up the implied binomial tree in the general case. Suppose the time interval
is six months, so that the short-term rate is for six months. The current short-term rate is
3.99%, annualized. It can change to either 4.50% or 4.00%, with equal (physical) probability.
Suppose that two zeros are available for trading: a 6M zero and a 1Y zero, where the current
price of the 1Y zero is 0.95974. What is the risk-neutral probability implied by this tree? This
probability must be such that the prices of all the zeros are matched exactly.
The tree we face is depicted below.
[Tree: at t = 0, the six-month rate is r = 3.99%/2. At t = 0.5, it is either r = 4.50%/2 or r = 4.00%/2, each with physical probability p = 1/2.]
FIGURE 13.3.
[Tree: the 1Y zero on the tree of Figure 13.3. At t = 0 (r = 3.99%/2), P$(0, 1) = 0.95974. At t = 0.5, P(0.5, 1) = 1 / (1 + 0.045/2) = 0.9779 in the up state (r = 4.50%/2), and P(0.5, 1) = 1 / (1 + 0.040/2) = 0.9804 in the down state (r = 4.00%/2). At t = 1, the zero pays £1 in all states.]
Note that the current market price, P$ (0, 1) = 0.95974, is less than the expected price to prevail
at t = 0.5, discounted at the current interest rate,
$$
\frac{1}{1+r} E_p\left[P(0.5, 1)\right] = \frac{1}{1+\frac{0.0399}{2}} \left( \frac{1}{2} \cdot 0.9779 + \frac{1}{2} \cdot 0.9804 \right) = 0.9599.
$$
Hence, p = 1/2 cannot be the risk-neutral probability. To find the risk-neutral probability,
we proceed as follows. In the absence of arbitrage opportunities,
$$
P_{\$}(0, 1) = 0.95974 = \frac{1}{1+r}\left[ q P_{up}(0.5, 1) + (1-q) P_{down}(0.5, 1) \right] = \frac{1}{1+\frac{0.0399}{2}} \left[ q \cdot 0.9779 + (1-q) \cdot 0.9804 \right],
$$
with obvious notation. This is one equation in one unknown, q. The solution is q = 0.605.
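Because the no-arbitrage condition is linear in q, the solution is available in closed form. A minimal sketch of the computation, using the rounded node prices reported above (the variable names are illustrative):

```python
r0 = 0.0399                    # annualized six-month rate at t = 0
p_up, p_down = 0.9779, 0.9804  # 1Y zero at t = 0.5 in the up and down states
market = 0.95974               # observed price of the 1Y zero

discount = 1 / (1 + r0 / 2)

# Discounted expected price under the physical probability p = 1/2.
expected_under_p = discount * (0.5 * p_up + 0.5 * p_down)

# Solve market = discount * [q * p_up + (1 - q) * p_down] for q.
q = (p_down - market / discount) / (p_down - p_up)

print(f"E_p / (1 + r) = {expected_under_p:.5f}, q = {q:.3f}")
```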
We may now proceed with pricing derivatives. Consider a call option on the 1Y zero, with
expiration date in six months and exercise price equal to 0.9785. Its payoff is as depicted below:
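[Tree: payoff of the call. At t = 0 (r = 3.99%/2, q = 0.605), the option price C is to be determined. At t = 0.5, C = max{P(0.5, 1) − K, 0} = 0 in the up state, where P(0.5, 1) = 0.9779, and C = max{P(0.5, 1) − K, 0} = 0.0019 in the down state, where P(0.5, 1) = 0.9804. At t = 1, the zero pays £1 in all states.]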
What happens when the short-term rate does not evolve as in the diagram of Figure 13.3
but, instead, as in Figure 13.4?
[Tree: at t = 0, r = 3.99%/2. At t = 0.5, either r = 4.4154%/2 (up state) or r = 4.00%/2 (down state).]
FIGURE 13.4.
The previous tree is one in which the short-term rate in the upper state of the world equals
r = 4.4154%, not 4.50% as in Figure 13.3. It implies that:
$$
P_{up}(0.5, 1) = \frac{1}{1+\frac{r}{2}} = \frac{1}{1+\frac{0.044154}{2}} = 0.9784.
$$
The solution is q = 0.756, which is higher than the solution we found earlier using the tree in
Figure 13.3 (i.e., q = 0.605). The option price is now
$$
C = \frac{1}{1+\frac{0.0399}{2}} \left[ q \cdot 0 + (1-q) \cdot 0.0019 \right] = 0.9804 \left[ 0.244 \cdot 0.0019 \right] = 4.5451 \times 10^{-4},
$$
which is, of course, smaller than that computed in Eq. (13.7). In the tree of Figure 13.4, the
up-state of the world is, so to speak, less severe than the up-state of the world in the tree of
Figure 13.3. To be able to match the initial price P$ (0, 1) = 0.95974, the model in Figure 13.4
must put more weight on the up-state of the world, i.e. a larger implied risk-neutral probability.
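The two trees can be compared side by side. The sketch below is an illustration under the assumption that both trees are calibrated to the same 1Y zero price and use the rounded node prices from the text; the helper name `implied_q_and_call` is hypothetical.

```python
def implied_q_and_call(p_up, p_down, market=0.95974, r0=0.0399, strike=0.9785):
    """Calibrate q to the 1Y zero, then price the six-month call on the same tree."""
    growth = 1 + r0 / 2
    q = (p_down - market * growth) / (p_down - p_up)
    call = (q * max(p_up - strike, 0.0) + (1 - q) * max(p_down - strike, 0.0)) / growth
    return q, call


q_a, call_a = implied_q_and_call(0.9779, 0.9804)  # tree of Figure 13.3
q_b, call_b = implied_q_and_call(0.9784, 0.9804)  # tree of Figure 13.4

print(f"Figure 13.3: q = {q_a:.3f}, C = {call_a:.3e}")
print(f"Figure 13.4: q = {q_b:.3f}, C = {call_b:.3e}")
```

The tree with the milder up-state needs a larger q to match the same bond price, and therefore assigns less weight to the state in which the call pays off.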
In a segmented market, two investment banks might have different views about the evolution
of the short-term rate (the view in Figure 13.3 and the view in Figure 13.4), although they must
agree on the initial bond price, P$ (0, 1) = 0.95974. The segmentation could arise, for example,
because the clientèle of the first bank and that of the second bank are unlikely to meet and,
the prices charged by the banks are not publicly known. In the absence of market imperfections
(and arbitrage), however, the investment banks should agree on the option price too.
Next, let us add another period to the diagram in Figure 13.3, assuming that the short-term rate
is as in the following diagram:
[Tree: at t = 0, r = 3.99%/2, with q0 = 0.605 over the first period. At t = 0.5, r = 4.50%/2 (up state) or r = 4.00%/2 (down state), with q1 = ? over the second period. At t = 1, r = 4.90%/2, r = 4.30%/2 or r = 3.90%/2.]
FIGURE 13.5.
where q0 is the risk-neutral probability for the first period, and q1 is the risk-neutral probability
for the second period.
We already know that q0 = 0.605. The probability q1 is the risk-neutral probability for the
time-period (0.5, 1), and can be different from q0 . Suppose, also, that an additional zero is
available for trading, a 1.5Y zero. The current price of the 1.5Y zero is P$ (0, 1.5) = 0.9382.
To derive the risk-neutral probability q1 , we proceed as follows. First, we consider the tree
below.
[Tree: the 1.5Y zero on the tree of Figure 13.5.
t = 0 (r = 3.99%/2, q0 = 0.605): P$(0, 1.5) = 0.9382.
t = 0.5: PU(0.5, 1.5) = ? in the up state (r = 4.50%/2) and PD(0.5, 1.5) = ? in the down state (r = 4.00%/2), with q1 = ? in both states.
t = 1: P(1, 1.5) = 1 / (1 + 0.049/2) = 0.9761 where r = 4.90%/2; P(1, 1.5) = 1 / (1 + 0.043/2) = 0.9789 where r = 4.30%/2; P(1, 1.5) = 1 / (1 + 0.039/2) = 0.9808 where r = 3.90%/2.
t = 1.5: the zero pays £1 in all states.]
We need to compute the prices PU (0.5, 1.5) and PD (0.5, 1.5). Once we compute these prices,
we shall use the no-arbitrage property of the zero, and the previously computed q0 = 0.605, to
recover q1 . By the usual no-arbitrage property of the zero, we have that:
$$
P_U(0.5, 1.5) = \frac{1}{1+\frac{0.045}{2}} \left[ q_1 \cdot 0.9761 + (1-q_1) \cdot 0.9789 \right] \tag{13.8}
$$
$$
P_D(0.5, 1.5) = \frac{1}{1+\frac{0.040}{2}} \left[ q_1 \cdot 0.9789 + (1-q_1) \cdot 0.9808 \right] \tag{13.9}
$$
The problem is that q1 is not known. Therefore, Eqs. (13.8)-(13.9) do not allow us to pin down the
prices PU (0.5, 1.5) and PD (0.5, 1.5). But here is where calibration comes in! We know the current
price of the 1.5Y zero, which is P$ (0, 1.5) = 0.9382. In the absence of arbitrage,
$$
P_{\$}(0, 1.5) = 0.9382 = \frac{1}{1+\frac{0.0399}{2}} \left[ q_0 \cdot P_U(0.5, 1.5) + (1-q_0) \cdot P_D(0.5, 1.5) \right],
$$
where PU (0.5, 1.5) and PD (0.5, 1.5) are as in Eqs. (13.8)-(13.9), and where q0 = 0.605. So we have,
$$
0.9382 = \frac{1}{1+\frac{0.0399}{2}} \left[ 0.605 \cdot P_U(0.5, 1.5) + 0.395 \cdot P_D(0.5, 1.5) \right], \tag{13.10}
$$
where PU (0.5, 1.5) and PD (0.5, 1.5) are as in Eqs. (13.8)-(13.9). Hence, replacing Eqs. (13.8)-
(13.9) into Eq. (13.10) leaves one equation with exactly one unknown, q1 . Solving yields
q1 = 0.8412, which implies that,
[Tree: the calibrated tree for the 1.5Y zero, with q0 = 0.605 and q1 = 0.8418.
t = 0 (r = 3.99%/2): P$(0, 1.5) = 0.9382.
t = 0.5: PU(0.5, 1.5) = 0.9549 in the up state (r = 4.50%/2); PD(0.5, 1.5) = 0.9600 in the down state (r = 4.00%/2).
t = 1: P(1, 1.5) = 0.9761 where r = 4.90%/2; P(1, 1.5) = 0.9789 where r = 4.30%/2; P(1, 1.5) = 0.9808 where r = 3.90%/2.
t = 1.5: the zero pays £1 in all states.]
We are now ready to compute the no-arbitrage price of a call option on the 1.5Y zero, with
expiration date in 1Y and exercise price equal to 0.9800. The price of the option at time t = 0.5
is C = 0 in the up state and C = 0.00012 in the down state, as illustrated below.
[Tree: the call on the 1.5Y zero, with q1 = 0.8418.
t = 1: C = max{P(1, 1.5) − K, 0} = 0 where P(1, 1.5) = 0.9761; C = max{P(1, 1.5) − K, 0} = 0 where P(1, 1.5) = 0.9789; C = max{P(1, 1.5) − K, 0} = 0.0008 where P(1, 1.5) = 0.9808.
t = 0.5: C = 0 in the up state; C = [q1 · 0 + (1 − q1) · 0.0008] / (1 + 0.04/2) = 0.00012 in the down state.]
We can now calculate the no-arbitrage price of the 1Y call option on the 1.5Y zero, struck at
K = 0.9800. It is,
$$
C = \frac{1}{1+\frac{0.0399}{2}} \left[ 0 \cdot q_0 + 0.00012 \cdot (1-q_0) \right] = 0.9804 \left[ 0.00012 \cdot (1 - 0.605) \right] = 4.647 \times 10^{-5}.
$$
We can use Figure 13.? to price derivatives, such as, say, a call option on the 1.5Y zero, with
expiration date in six months, and exercise price equal to 0.9580. We have the following tree.
[Tree: the six-month call on the 1.5Y zero, with K = 0.9580 and q0 = 0.605. At t = 0 (r = 3.99%/2), C = ?. At t = 0.5, C = max{PU(0.5, 1.5) − K, 0} = 0 in the up state, where PU(0.5, 1.5) = 0.9549, and C = max{PD(0.5, 1.5) − K, 0} = 0.0020 in the down state, where PD(0.5, 1.5) = 0.9600.]
The option price is,
$$
C = \frac{1}{1+\frac{0.0399}{2}} \left[ q_0 \cdot 0 + (1-q_0) \cdot 0.0020 \right] = 0.9804 \left[ 0.395 \cdot 0.0020 \right] = 7.745 \times 10^{-4}.
$$
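Both option prices on the 1.5Y zero follow from the same backward recursion. The sketch below is illustrative: it takes q0 = 0.605 and the value q1 = 0.8418 shown in the calibrated tree as given (the text also reports 0.8412), and it does not round intermediate values, so the one-year call comes out slightly above the 4.647 × 10^-5 obtained with the rounded value C = 0.00012.

```python
q0, q1 = 0.605, 0.8418          # taken as given from the calibrated tree
growth_0 = 1 + 0.0399 / 2       # accrual over (0, 0.5)
growth_up = 1 + 0.045 / 2       # accrual over (0.5, 1) in the up state
growth_down = 1 + 0.040 / 2     # accrual over (0.5, 1) in the down state

# One-year call on the 1.5Y zero, strike 0.9800.
strike_1y = 0.9800
payoff = [max(p - strike_1y, 0.0) for p in (0.9761, 0.9789, 0.9808)]
c_up = (q1 * payoff[0] + (1 - q1) * payoff[1]) / growth_up
c_down = (q1 * payoff[1] + (1 - q1) * payoff[2]) / growth_down
call_1y = (q0 * c_up + (1 - q0) * c_down) / growth_0

# Six-month call on the 1.5Y zero, strike 0.9580.
strike_6m = 0.9580
call_6m = (q0 * max(0.9549 - strike_6m, 0.0)
           + (1 - q0) * max(0.9600 - strike_6m, 0.0)) / growth_0

print(f"1Y call = {call_1y:.3e}, 6M call = {call_6m:.3e}")
```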
13.4.2.4 Summing up
So let’s sum up what we’ve done. Given is the “evolution” of the short-term rate in Figure 13.5,
which we use to recover the two risk-neutral probabilities q0 (for the time span (0, 0.5)) and q1
(for the time span (0.5, 1)), starting from the knowledge of the market prices of two zeros, the
1Y zero and the 1.5Y zero. Precisely, given P$ (0, 1), the price of the 1Y zero, we recover q0 , as
illustrated below:
[Tree: at t = 0, the market price P$(0, 1) and the probability q0. At t = 0.5, the prices PU(0.5, 1) and PD(0.5, 1). At t = 1, the zero pays £1 in all states.]
This is possible as PU (0.5, 1) and PD (0.5, 1) do not “depend” on q0 and so they are obtained
in a straightforward manner. Given q0 , then, we compute q1 , using P$ (0, 1.5), the price of the
1.5Y zero, as illustrated below:
13.5. The Ho and Lee model c
°by A. Mele
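[Tree: at t = 0, the market price P$(0, 1.5) and the calibrated probability q̂0. At t = 0.5, the prices PU(0.5, 1.5) and PD(0.5, 1.5), and the probability q1. At t = 1, the prices PUU(1, 1.5), PUD(1, 1.5) and PDD(1, 1.5). At t = 1.5, the zero pays £1 in all states.]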
Again, recovering the risk-neutral probability q1 is feasible because PUU (1, 1.5), PUD (1, 1.5)
and PDD (1, 1.5) do not “depend” on q1 , and are easily obtained. So, given PUU (1, 1.5), PUD (1, 1.5)
and PDD (1, 1.5), we can express PU (0.5, 1.5) and PD (0.5, 1.5)
as (linear) functions of q1 . Finally, we impose the no-arbitrage property on P$ (0, 1.5), which
makes the observed price, P$ (0, 1.5), a (linear) function of PU (0.5, 1.5) and PD (0.5, 1.5) and,
hence, of q1 , thereby “recovering” q1 .
We can continue, and consider an additional time period, as in the tree in Figure 13.6. We
can recover q2 , once we are given the market price of a 2Y zero, P$ (0, 2), as follows:
• The prices of the 2Y zero at time t = 1.5 (the filled nodes in Figure 13.6) (say P (1.5, 2))
are easily computed, given an assumption about the numerical values of the short-term
rate in those nodes.
• Then, given the prices P (1.5, 2) at time t = 1.5, and the previously calibrated probabilities
q̂0 and q̂1 , we can express the current market price P$ (0, 2) as a (linear) function of q2 .
Then, we solve for q2 .
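The sequential procedure can be sketched in a few lines. Everything below is an illustration under stated assumptions: the function names are hypothetical, the rate tree is the one of Figure 13.5 with nodes ordered from the lowest to the highest rate, and the "market" prices are generated from made-up probabilities (0.60 and 0.85) so that the calibration can be checked by a round trip. It relies on the fact that, given the earlier probabilities, the price of each zero is linear in the last unknown probability.

```python
def zero_price(rates, qs, dt=0.5):
    """Time-0 price of the zero maturing one period after the last level of `rates`.

    rates[t][j]: annualized rate at step t after j up-moves of the rate.
    qs[t]: risk-neutral probability of an up-move over step t.
    """
    values = [1.0 / (1.0 + r * dt) for r in rates[-1]]
    for t in range(len(rates) - 2, -1, -1):
        q = qs[t]
        values = [(q * values[j + 1] + (1 - q) * values[j]) / (1.0 + rates[t][j] * dt)
                  for j in range(len(rates[t]))]
    return values[0]


def calibrate(rates, market_prices, dt=0.5):
    """Recover q0, q1, ... one at a time; market_prices[k] matures at step k + 2."""
    qs = []
    for n, target in enumerate(market_prices, start=1):
        sub_tree = rates[: n + 1]
        at_zero = zero_price(sub_tree, qs + [0.0], dt)   # price if the unknown q were 0
        at_one = zero_price(sub_tree, qs + [1.0], dt)    # price if the unknown q were 1
        qs.append((target - at_zero) / (at_one - at_zero))  # linear interpolation
    return qs


rates = [[0.0399], [0.040, 0.045], [0.039, 0.043, 0.049]]
true_qs = [0.60, 0.85]   # hypothetical probabilities, used only to build test prices
prices = [zero_price(rates[:2], true_qs[:1]), zero_price(rates[:3], true_qs)]
recovered = calibrate(rates, prices)
print(recovered)
```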
The calibration can continue. We extend the tree one period more. Then, we use the price
of one additional zero to “recover” time varying risk-neutral probabilities. As we know, an
alternative procedure, developed at the beginning of this section, consists in (i) fixing the
risk-neutral probabilities q to some value at all times (e.g., q = 1/2), and (ii) figuring out the “implied”
values for the short-term rate in each node of the tree. The next sections develop a systematic
approach for implementing this procedure.
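[Tree: extension to the dates t = 0, 0.5, 1, 1.5 and 2. The market price P$(0, 2) is observed at t = 0; q̂0 and q̂1 are the previously calibrated probabilities; q2 is the probability to be recovered; the 2Y zero pays £1 in all states at t = 2.]
FIGURE 13.6.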
Rather, they take the yield curve as given, and then model the movements of the entire yield
curve in order to price interest rate derivatives.
The main idea underlying the Ho and Lee model is to model the movements of the yield
curve along a binomial tree, much in the spirit of the Cox, Ross and Rubinstein (1979) tree
representation of the Black and Scholes (1973) model. The main issues can be summarized as
follows. In Black and Scholes (1973) and Cox, Ross and Rubinstein (1979), the asset underlying
the option contract is a traded asset. So the underlying asset price satisfies the martingale
condition. Interest rate derivatives, instead, generally depend on non-traded assets. The mere
presence of boundary conditions induces bond return volatility to be time-varying.
The two functions u (·) and d (·), also called “perturbation functions”, are introduced to capture
the fact that in the case of uncertainty, the price of the zero can either go up or down with respect
to the risk-free rate of return. If there were no uncertainty, we would have u (T − t) = d (T − t) = 1,
for all t ≤ T . In general, we have that d (T − t) ≤ 1 ≤ u (T − t), as we shall now demonstrate.
One period before the expiration date, i.e. at t = T − 1, the price is certain to jump to one.
This simple consideration leads to the following boundary condition for the two functions u (·)
and d (·),
$$
u(1) = d(1) = 1. \tag{13.13}
$$
In terms of the two functions u (·) and d (·), the martingale restriction in Eq. (13.11) is,
1 = qu (T − t) + (1 − q) d (T − t) , t ≤ T. (13.14)
This relation is quite familiar as it matches the standard risk-neutral relation for stock prices
in which the short-term rate is tied down to the up and down movements of the stock price.
However, in this context the up and down movements of the zero price depend on the maturity
of the price itself, which makes the evaluation problem more difficult.
$$
\underbrace{\frac{P_{j+1}(t+1, T)}{P_j(t, T)} = \frac{1}{P_j(t, t+1)}\, u(T-t)}_{\text{up at } t}
\quad \text{and} \quad
\underbrace{\frac{P_j(t+1, T)}{P_j(t, T)} = \frac{1}{P_j(t, t+1)}\, d(T-t)}_{\text{down at } t}.
$$
We can use these two relations to figure out the two paths leading to the bond price at time
t + 2 in the event of j + 1 jumps, i.e. Pj+1 (t + 2, T ). We have that along the first path,
$$
\underbrace{\frac{P_{j+1}(t+1, T)}{P_j(t, T)} = \frac{1}{P_j(t, t+1)}\, u(T-t)}_{\text{up at } t},
\qquad
\underbrace{\frac{P_{j+1}(t+2, T)}{P_{j+1}(t+1, T)} = \frac{1}{P_{j+1}(t+1, t+2)}\, d(T-t-1)}_{\text{down at } t+1},
$$
To sum up:
$$
P_{j+1}(t+2, T) = \frac{1}{P_{j+1}(t+1, t+2)}\, d(T-t-1) \cdot \overbrace{\frac{1}{P_j(t, t+1)}\, u(T-t)\, P_j(t, T)}^{\equiv P_{j+1}(t+1, T)} \quad \text{(up \& down)}
$$
$$
P_{j+1}(t+2, T) = \frac{1}{P_j(t+1, t+2)}\, u(T-t-1) \cdot \underbrace{\frac{1}{P_j(t, t+1)}\, d(T-t)\, P_j(t, T)}_{\equiv P_j(t+1, T)} \quad \text{(down \& up)}
$$
where we assume that δ is constant. Clearly, 0 ≤ δ ≤ 1. Substituting back into Eq. (13.15),
$$
\frac{u(T-t)}{d(T-t)} = \frac{u(T-t-1)}{d(T-t-1)}\, \delta^{-1}.
$$
Iterating this relation, and using the boundary condition in Eq. (13.13), we obtain,
$$
\frac{u(T-t)}{d(T-t)} = \delta^{-(T-t-1)}. \tag{13.16}
$$
Eq. (13.16) gives us the condition under which the tree is recombining. To rule out arbitrage
opportunities, Eq. (13.14) must also hold true. Therefore, we have to solve the following system
of two equations (Eq. (13.16) and Eq. (13.14)) with two unknowns (u (·) and d (·)),
$$
\begin{cases}
u(T-t) = \delta^{-(T-t-1)}\, d(T-t) \\
q\, u(T-t) + (1-q)\, d(T-t) = 1
\end{cases}
$$
The solution is,
$$
u(T-t) = \frac{1}{q + (1-q)\, \delta^{T-t-1}}, \qquad d(T-t) = \frac{\delta^{T-t-1}}{q + (1-q)\, \delta^{T-t-1}}. \tag{13.17}
$$
So we have solved the problem. We know how to “populate” the tree. Suppose we know how
to assign values to q and δ. Given q and δ, and an initial bond price P (t, T ), we can use Eqs.
(13.12) to populate the tree, using the solution for u (T − t) and d (T − t) given in Eqs. (13.17).
In this way, we can figure out the exact bond prices to insert in each node of the tree. Once
we have computed the bond prices in each node, we can price interest rate derivatives, i.e.
assets whose payoffs depend on the particular values taken by the bond price on a given set
of nodes. Below, we provide the closed-form solution for the bond price in this model.
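The solution in Eqs. (13.17) is easy to check numerically. The following sketch evaluates the two perturbation functions for a few maturities and verifies the martingale restriction, the recombining condition and the boundary condition. The values q = 0.5 and δ = 0.99 are hypothetical, chosen only for illustration, and the function name `perturbations` is not from the text.

```python
def perturbations(n, q, delta):
    """u(n) and d(n) as in Eqs. (13.17), where n = T - t is the time to maturity."""
    denom = q + (1 - q) * delta ** (n - 1)
    return 1.0 / denom, delta ** (n - 1) / denom


q, delta = 0.5, 0.99   # hypothetical parameter values
table = []
for n in range(1, 6):
    u, d = perturbations(n, q, delta)
    table.append((n, u, d))
    print(f"n = {n}: u = {u:.6f}, d = {d:.6f}, q*u + (1-q)*d = {q * u + (1 - q) * d:.6f}")
```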
What is the interpretation of δ? We have defined δ to be, δ^(−1) ≡ Pj+1 (t + 1, t + 2) / Pj (t + 1, t + 2). By taking
natural logs,
$$
\log \delta^{-1} = \log \left( \frac{P_{j+1}(t+1, t+2)}{P_j(t+1, t+2)} \right) = -\left[ r_{j+1}(t+1) - r_j(t+1) \right]. \tag{13.18}
$$
But we know that conditionally upon time t and (price) jumps equal to j ≤ t, the short-term
rate is binomially distributed, and can take on two values: (i) rj+1 (t + 1) with probability q
and (ii) rj (t + 1) with probability 1 − q. Then, the conditional variance of the short-term rate is,
$$
var_t\left[\tilde{r}(t+1)\right] = q(1-q) \left[ r_{j+1}(t+1) - r_j(t+1) \right]^2,
$$
where vart [r̃ (t + 1)] is the conditional variance at time t of the short-term rate one period ahead.
Then, we may use Eq. (13.18), and the previous equation, to obtain,
$$
\sqrt{var_t\left[\tilde{r}(t+1)\right]} = \sqrt{q(1-q)} \cdot \log \delta^{-1}.
$$
where F̂t (0) ≡ log P (0, t) − log P (0, t + 1). We also have,
Hence, the parameter δ can be chosen so that the volatility of the short-term rate predicted by
the model matches exactly the volatility of the short-term rate that we see in the data.
Concretely, we can take δ̂ = exp(−Std (∆r) / √(q (1 − q))), where Std(∆r) is the standard deviation
of the short-term rate in the data.
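As a quick illustration of this calibration rule, the sketch below computes δ̂ from a volatility estimate and checks that the model-implied volatility reproduces the input. The figure Std(∆r) = 0.01 is hypothetical and stands in for an estimate from data; the function name is illustrative.

```python
import math


def calibrate_delta(std_dr, q=0.5):
    """delta_hat = exp(-Std(dr) / sqrt(q (1 - q)))."""
    return math.exp(-std_dr / math.sqrt(q * (1 - q)))


std_dr = 0.01                       # hypothetical volatility of the short-term rate
delta_hat = calibrate_delta(std_dr)

# Volatility implied by the model: sqrt(q (1 - q)) * log(1 / delta).
model_std = math.sqrt(0.5 * 0.5) * math.log(1 / delta_hat)
print(f"delta_hat = {delta_hat:.6f}, model volatility = {model_std:.6f}")
```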
Note, then, the interesting feature of the model. The Ho and Lee model doesn’t take any
a priori stance on the dynamics of the short-term rate. Rather, it imposes the martingale
restriction on bond prices (an economic restriction), and the simplifying assumption that the
tree is recombining (a technical condition).
These two simple conditions are sufficient to tell what to expect from the dynamics of the
short-term rate. While deliberately simple, the Ho and Lee model is one of the most powerful
models in the history of financial economics. The modern approach to interest rate modeling
simply aims to make the Ho and Lee methodology more accurate for practical purposes.
[Diagram: the node (s, τ + 1), where the security pays £1, is reached from the nodes (s, τ) and (s − 1, τ); the security pays £0 in the other nodes at time τ + 1.]
FIGURE 13.7. An Arrow-Debreu security for state s at time τ + 1 is a security that pays $1 at time
τ + 1 in state s, and zero otherwise.
is necessarily,
$$
P_{\$}(0, T) = \sum_{s=0}^{T} p_s(T).
$$
More generally, consider a derivative that pays off Ds (τ ) in node (s, τ ), meaning a dividend
equal to D1 (τ ) in state s = 1, equal to D2 (τ ) in state s = 2, · · · , and equal to Dτ (τ ) in state
s = τ . The price of this asset, denoted as C$ (0, T ), is given by,
$$
C_{\$}(0, T) = \sum_{\tau=1}^{T} \sum_{s=0}^{\tau} p_s(\tau)\, D_s(\tau). \tag{13.20}
$$
Our objective, now, is to “recover” the price of the Arrow-Debreu securities ps (τ ) for all s
and τ , where τ ∈ {1, · · · , T }, from the observation of the initial term-structure of interest rates.
Consider the Arrow-Debreu security that promises to pay $1 in node (s, τ + 1) (see Figure
13.7). Let its value at time τ in state j (j ≤ τ ) be denoted as πj,τ [s, τ + 1]. What is this value at
time τ in all states? The key observation, here, is that in this binomial tree, the node (s, τ + 1)
(the filled circle) can only be reached through the nodes (s, τ ) and (s − 1, τ )
occurring at time τ (the two empty circles in Figure 13.7). For this reason, at time τ , the
value π j,τ [s, τ + 1] is zero in all the nodes (j, τ ) that are distinct from the empty circles (s, τ )
and (s − 1, τ ) in Figure 13.7. This is because starting from any node different from these empty
circles, it is impossible to reach the node (s, τ + 1) (the filled circle) on which the Arrow-Debreu
security pays off.
So, we are left with finding the values πj,τ [s, τ + 1] in the nodes corresponding to the empty
circles (s, τ ) and (s − 1, τ ), i.e. π s,τ [s, τ + 1] and π s−1,τ [s, τ + 1]. Let rs (τ ) be the continuously
compounded short-term rate in node (s, τ ). Consider the upper node (s, τ ). We have,
The input is, of course, a number of zeros equal to the largest maturity date the tree extends
to. We describe how the algorithm works by developing two concrete examples.
We start with the Ho and Lee model. We assume continuous compounding, for analytical reasons
clarified below. We assume that the continuously compounded short-term rate is the solution to,
$$
r_s(\tau) = r_0(\tau) + \log \delta^{-1} \cdot s, \tag{13.23}
$$
where rs (τ ) is the short-term rate at time τ , in the event of s upward movements of the
short-term rate, and δ is a volatility parameter, i.e. such that
$$
\log \delta^{-1} = \frac{Std(\Delta r)}{\sqrt{q(1-q)}},
$$
where Std (∆r) is the standard deviation of the short-term rate in the data.³ At time zero, the
price of a zero maturing at time τ + 1 is:
$$
P_{\$}(0, \tau+1) = \sum_{s=0}^{\tau} p_s(\tau)\, e^{-r_s(\tau)} = e^{-r_0(\tau)} \sum_{s=0}^{\tau} \delta^{s}\, p_s(\tau),
$$
3 Hence, the short-term rate movements that we shall derive do depend on the value of the risk-neutral probability q that we
choose.
13.6. Beyond Ho and Lee: Calibration c
°by A. Mele
where the second equality follows by the assumption that the short-term rate is the solution to Eq.
(13.23).
By rearranging terms in the previous equation, we obtain a closed-form expression for the
future short-term rate at time τ , in the event of zero upward movements,
$$
r_0(\tau) = \log \left( \frac{\sum_{s=0}^{\tau} \delta^{s}\, p_s(\tau)}{P_{\$}(0, \tau+1)} \right). \tag{13.24}
$$
We use Eq. (13.24) and the forward equation (13.22) to populate the interest rate tree, under
the assumption that q = 1/2. Precisely, the algorithm proceeds as follows:
(i) Given the boundary condition for the Arrow-Debreu price, p0 (0) = 1, compute the initial
value of the short-term rate, r0 (0), using Eq. (13.24), as r0 (0) = log(1/ P$ (0, 1)).
(ii) Suppose we know the future value of the short-term rate at time τ − 1, in the event of
no upward movements, i.e. r0 (τ − 1). Then, given the value of r0 (τ − 1), and the price
of the Arrow-Debreu securities ps (τ − 1) for s ≤ τ − 1, compute ps (τ ) for s ≤ τ , through
the forward equation (13.22),
$$
p_s(\tau) = p_s(\tau-1)\, \delta^{s} e^{-r_0(\tau-1)} (1-q) + p_{s-1}(\tau-1)\, \delta^{s-1} e^{-r_0(\tau-1)}\, q, \qquad q = \frac{1}{2},
$$
where the last equation follows by plugging Eq. (13.23) into Eq. (13.22).
(iii) Given the Arrow-Debreu prices ps (τ ) for s ≤ τ , use Eq. (13.24) to compute the future
value of the short-term rate at time τ , in the event of no upward movements, i.e. r0 (τ ).
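The three steps above can be sketched as a short forward-induction loop. This is a minimal illustration, not a definitive implementation: the function name `ho_lee_tree`, the zero prices (0.96, 0.92, 0.88) and the value δ = 0.99 are all hypothetical, and one tree period is taken as the unit of time, as in Eq. (13.23).

```python
import math


def ho_lee_tree(zero_prices, delta, q=0.5):
    """Forward induction; zero_prices[k] is the market price P$(0, k + 1).

    Returns the rates r_0(tau) and the Arrow-Debreu prices p_s(tau) for each date tau.
    """
    p = [1.0]                                  # boundary condition p_0(0) = 1
    r0_list, state_prices = [], [p]
    for tau, price in enumerate(zero_prices):
        # Eq. (13.24): closed-form rate in the node with no upward movements.
        r0 = math.log(sum(delta ** s * p[s] for s in range(tau + 1)) / price)
        r0_list.append(r0)
        # Forward equation: Arrow-Debreu prices at the next date.
        nxt = []
        for s in range(tau + 2):
            stay = p[s] * delta ** s * (1 - q) if s <= tau else 0.0
            up = p[s - 1] * delta ** (s - 1) * q if s >= 1 else 0.0
            nxt.append(math.exp(-r0) * (stay + up))
        p = nxt
        state_prices.append(p)
    return r0_list, state_prices


market = [0.96, 0.92, 0.88]                    # hypothetical initial term structure
r0_list, state_prices = ho_lee_tree(market, delta=0.99)
print([round(r, 5) for r in r0_list])
```

A useful sanity check is that the Arrow-Debreu prices at each date sum to the market price of the zero maturing at that date, which is the pricing relation the calibration is built on.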
As a second example, consider the Black, Derman and Toy (BDT) model. In this model, the
short-term rate is the solution to,
$$
r_s(\tau) = \delta^{s}\, r_0(\tau), \tag{13.25}
$$
where δ is, once again, a volatility parameter.⁴ For computational convenience, the BDT model
assumes that the short-term rate in Eq. (13.25) is discretely compounded. Accordingly, we
rewrite the forward equation (13.22) in terms of discretely compounded rates,
$$
p_s(\tau+1) = p_s(\tau)\, \frac{1}{1 + r_s(\tau)}\, (1-q) + p_{s-1}(\tau)\, \frac{1}{1 + r_{s-1}(\tau)}\, q. \tag{13.26}
$$
(i) Compute the initial value of the short-term rate, r0 (0), as the solution to,
$$
P_{\$}(0, 1) = \frac{1}{1 + r_0(0)}.
$$
4 In its most general form, the BDT model assumes that r_s(τ) = δ_τ^s r_0(τ), where δ_τ is a volatility parameter that varies
deterministically over time. This more general formulation leads to more flexibility, which is useful to fit the entire volatility
structure of the yield curve (also known as the term structure of volatility).
13.7. Copying with credit risk c
°by A. Mele
(ii) Suppose we know the future value of the short-term rate at time τ − 1, in the event of
no upward movements, i.e. r0 (τ − 1). Then, given the value of r0 (τ − 1), and the price
of the Arrow-Debreu securities ps (τ − 1) for s ≤ τ − 1, compute ps (τ ) for s ≤ τ , through
the forward equation (13.26),
$$
p_s(\tau) = p_s(\tau-1)\, \frac{1}{1 + \delta^{s} r_0(\tau-1)}\, (1-q) + p_{s-1}(\tau-1)\, \frac{1}{1 + \delta^{s-1} r_0(\tau-1)}\, q, \qquad q = \frac{1}{2},
$$
where the last equation follows by plugging Eq. (13.25) into Eq. (13.26).
(iii) Given the boundary condition p0 (0) = 1, and the Arrow-Debreu prices, ps (τ ) for s ≤ τ ,
use the pricing equation for the zero,
$$
P_{\$}(0, \tau+1) = \sum_{s=0}^{\tau} p_s(\tau)\, \frac{1}{1 + \delta^{s} r_0(\tau)},
$$
to solve, numerically, for the future value of the short-term rate at time τ , in the event
of no upward movements, i.e. r0 (τ ). Note, we did not need this additional step for the
solution of the Ho and Lee model, as the short-term rate r0 (τ ) is known in closed form
in the Ho and Lee model (see Eq. (13.24)).
(iv) If τ = T , stop. Otherwise, go to (ii).
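Steps (i) to (iv) can be sketched as follows. This is an illustration under stated assumptions: the function name `bdt_tree`, the zero prices (0.96, 0.92, 0.88) and δ = 1.1 are hypothetical, the numerical step uses bisection on the bracket [0, 1], and the bracket is valid only when the zero prices decline with maturity (positive forward rates).

```python
def bdt_tree(zero_prices, delta, q=0.5, tol=1e-12):
    """Forward induction for r_s(tau) = delta**s * r_0(tau), discretely compounded.

    zero_prices[k] is the market price P$(0, k + 1). Returns the rates r_0(tau)
    and the Arrow-Debreu prices at the final date.
    """
    p = [1.0]                                   # boundary condition p_0(0) = 1
    r0_list = []
    for tau, price in enumerate(zero_prices):
        def model_price(r0):
            # Pricing equation for the zero maturing at tau + 1.
            return sum(p[s] / (1.0 + delta ** s * r0) for s in range(tau + 1))

        lo, hi = 0.0, 1.0                       # assumed bracket for r_0(tau)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if model_price(mid) > price:
                lo = mid                        # model price too high: raise the rate
            else:
                hi = mid
        r0 = 0.5 * (lo + hi)
        r0_list.append(r0)

        # Forward equation (13.26) with r_s(tau) = delta**s * r_0(tau).
        nxt = []
        for s in range(tau + 2):
            stay = p[s] * (1 - q) / (1.0 + delta ** s * r0) if s <= tau else 0.0
            up = p[s - 1] * q / (1.0 + delta ** (s - 1) * r0) if s >= 1 else 0.0
            nxt.append(stay + up)
        p = nxt
    return r0_list, p


bdt_market = [0.96, 0.92, 0.88]                 # hypothetical initial term structure
bdt_rates, bdt_final_prices = bdt_tree(bdt_market, delta=1.1)
print([round(r, 5) for r in bdt_rates])
```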
(v) Fifth, we correct for credit risk. The price we found in the fourth step is typically different
than the market price. One issue is that the market price reflects the credit risk of the
firm, and should be typically less than the price obtained in the fourth step. The trick,
here, is to search for an additional spread to add to the short-term rate process obtained
in the first step, such that the theoretical bond price equals the market price of the bond.
This is done numerically, and of course alters the results obtained in steps 3 and 4.
At this point, we may price options written on callable bonds. Ho and Lee (2004) (Chapter
8, Section 8.3 p. 274-278) develop a number of useful exercises on the pricing of options on
callable bonds, through tree methods.
(i) First, we set the life of the tree equal to the life of the callable convertible bond.
(ii) Second, we assess the evolution of the stock price along the tree, under the risk-neutral
probability. (This is done according to the usual Cox, Ross and Rubinstein (1979) ap-
proach.)
(iii) Third, in each node, we compute the value of the bond as max{CV, min{B, K}}, where
CV is the conversion value, K is the call value, and B is the value of the bond which is
“rolled-back” from the values of the bond in the next nodes (by using, as usual, recur-
sive, backward solution, i.e. the risk-neutral expectation of the future payoffs). That is,
assuming the bondholder does not convert, the value is B ∗ = min {B, K}, where B is the
“rolled-back” value of the bond. Then, the value is max{CV, B ∗ }.
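The node rule in step (iii) is a one-line computation. The sketch below isolates it; the function name `node_value` and the numbers in the examples are hypothetical, and the rolled-back value B is taken as an input rather than computed from a tree.

```python
def node_value(rolled_back, call_value, conversion_value):
    """Value of the callable convertible at one node: max{CV, min{B, K}}."""
    if_not_converted = min(rolled_back, call_value)   # B* = min{B, K}
    return max(conversion_value, if_not_converted)    # holder converts if CV is larger


called = node_value(rolled_back=105.0, call_value=102.0, conversion_value=98.0)
converted = node_value(rolled_back=95.0, call_value=102.0, conversion_value=98.0)
held = node_value(rolled_back=100.0, call_value=102.0, conversion_value=98.0)
print(called, converted, held)
```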
Note, this procedure allows us to fill in the nodes, once we know the appropriate interest rate. If
the firm were not subject to default risk, we would simply use the riskless interest rate. However,
the firm is obviously subject to default risk. In practice, we proceed as follows. In each node, the
value of the bond is decomposed in two parts. One part, related to the “pure debt component”,
which is discounted at the defaultable interest rate; and one part related to the “pure equity
component”, which is discounted at the default-free interest rate. Exercise 25.7 in Hull (2003)
(p. 653-654) illustrates a specific example.
13.8. Appendix 1: Duration hedging c
°by A. Mele
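$$
B_1(\hat{y}_1) = \frac{1}{1 + \hat{y}_1} = \frac{1}{1 + 0.05} = 0.95238
$$
$$
B_2(\hat{y}_2) = \frac{1}{(1 + \hat{y}_2)^5} = \frac{1}{(1 + 0.05)^5} = 0.78353
$$
$$
D(\hat{y}_1) = \frac{D_{Mac}(\hat{y}_1)}{1 + \hat{y}_1} = \frac{1}{1 + 0.05} = 0.95238
$$
$$
D(\hat{y}_2) = \frac{D_{Mac}(\hat{y}_2)}{1 + \hat{y}_2} = \frac{5}{1 + 0.05} = 4.7619
$$
and:
$$
\theta_1 = -\frac{D(\hat{y}_2)\, B_2(\hat{y}_2)}{D(\hat{y}_1)\, B_1(\hat{y}_1)}\, \theta_2 = -\frac{4.7619 \cdot 0.78353}{0.95238 \cdot 0.95238} \cdot 1 = -4.1135.
$$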
That is, to hedge the 5Y zero, we need to short-sell approximately four 1Y zeros. The balance of
this hedging position is,
The left hand side of this equation is the price of the 5Y bond. The right hand side is the value of
the “replicating” portfolio, which consists of (i) approximately 4 units of the 1Y bond, and (ii) the
balance of the hedging position.
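The hedge ratio and the balance of the hedging position can be reproduced directly from the definitions above. A minimal sketch, with illustrative variable names:

```python
y = 0.05                       # flat yield to maturity
b1 = 1 / (1 + y)               # 1Y zero price
b2 = 1 / (1 + y) ** 5          # 5Y zero price
d1 = 1 / (1 + y)               # modified duration of the 1Y zero
d2 = 5 / (1 + y)               # modified duration of the 5Y zero

theta2 = 1.0                                  # one unit of the 5Y zero
theta1 = -(d2 * b2) / (d1 * b1) * theta2      # units of the 1Y zero (negative: short)
balance = theta1 * b1 + theta2 * b2           # value of the hedged position

print(f"theta1 = {theta1:.4f}, balance = {balance:.4f}")
```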
When y ≠ 5%, the previous relation can only approximately hold,
Figure 13A.1 below plots the left hand side and the right hand side of this relation.
[Figure: bond values (vertical axis) plotted against the YTM (horizontal axis, from 0.00 to 0.10).]
FIGURE 13A.1: Dotted line (top): The price of the 5Y zero, B2 (y) = 1/(1+y)^5 , where y is
the YTM. Solid line (bottom): The value of the “replicating” portfolio consisting of (i)
4.1135 units of the 1Y zero, and (ii) the balance of the hedging position, which is equal
to −$3.1341, i.e. 4.1135 · B1 (y) − 3.1341, where B1 (y) = 1/(1+y) is the 1Y zero price.
What is going on? We are hedging the 5Y zero by selling approximately four 1Y zeros. In a neigh-
borhood of y = 5%, the value of the “synthetic” bond we sold, 4.1135 · B1 (y) − 3.1341, behaves as
B2 (y). However, the 5Y zero displays more convexity than the “synthetic” bond. This larger convexity
implies that:
• If the interest rates go down, the price of the 5Y zero bond we hold increases more than the
value of the “synthetic” bond we sold. As a result, we make profits.
• If the interest rates go up, the price of the 5Y zero bond we hold decreases less than the value
of the “synthetic” bond we sold. As a result, we make profits.
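The two bullet points can be verified numerically under the parallel-shift assumption. The sketch below, with illustrative function names, computes the gap between the 5Y zero and the "synthetic" bond at a few yields; the gap is zero at y = 5% and positive on either side.

```python
def one_year_zero(y):
    return 1 / (1 + y)


def five_year_zero(y):
    return 1 / (1 + y) ** 5


y0 = 0.05
# Units of the 1Y zero sold, and cash balance, both fixed at the initial yield.
units = 5 * five_year_zero(y0) / one_year_zero(y0)
cash = five_year_zero(y0) - units * one_year_zero(y0)


def gap(y):
    """5Y zero minus the value of the synthetic bond, after a parallel shift to y."""
    return five_year_zero(y) - (units * one_year_zero(y) + cash)


for y in (0.03, 0.05, 0.07):
    print(f"y = {y:.2f}: gap = {gap(y):.6f}")
```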
In all cases, we make profits.5 However, this is not an arbitrage opportunity! The previous reasoning
hinges on the assumption of a parallel shift in the term-structure of interest rates, that is dŷ1 = dŷ2 ,
where ŷ1 = spot rate for 1 year, and ŷ2 = spot rate for 5 years. While parallel shifts in the term-
structure seem empirically relevant, they are not the only shifts that are likely to occur, as we explained
in Chapter 11.
5 Mathematically, we buy 1 unit of the 5Y zero at B2 and sell θ1 units of the 1Y zero at B1 , thereby cashing in θ1 B1 − B2 = 3.1341.
Then, in one month (say), consider what would happen if we had to reverse the position in Eq. (13A.1), i.e. sell the 5Y zero and
buy back the 1Y zeros we sold. We consider three scenarios: (i) The yield curve will be the same as today. In this case, reversing the
position in Eq. (13A.1) implies that we shall simply have to pay 3.1341 (assuming the change in value of the two bonds due to the
mere passage of time is small enough). (ii) The yield curve will experience a positive parallel shift. In this case, the prices of the two
zeros will be B1 − ∆B1 and B2 − ∆B2 , where ∆B1 and ∆B2 are both positive. Therefore, we shall obtain, −3.1341 + θ1 ∆B1 − ∆B2 ,
where θ1 ∆B1 − ∆B2 is positive because by convexity, the value of the portfolio decreases more than the price of the 5Y zero, thus
yielding a profit. (iii) The yield curve will experience a negative parallel shift. In this case, the prices of the two zeros will be
B1 + ∆B1 and B2 + ∆B2 , where ∆B1 and ∆B2 are both negative. Therefore, we shall obtain, −3.1341 + ∆B2 − θ1 ∆B1 , where
∆B2 − θ1 ∆B1 is positive because by convexity, the value of the portfolio increases less than the price of the 5Y zero, thus yielding
a profit.
and (ii) the duration of the portfolio equals the duration of the 5Y zero, viz,
By the same computations made in the previous example, we have that B3 (ŷ3 ) = 0.61391 and
D (ŷ3 ) = 9.5238. By using the figures in the previous example, we compute θ1 and θ3 in Eqs. (13A.4)
to be
$$
\theta_1 = \frac{9.5238 - 4.7619}{9.5238 - 0.95238} \cdot \frac{0.78353}{0.95238} = 0.45706, \qquad
\theta_3 = \frac{4.7619 - 0.95238}{9.5238 - 0.95238} \cdot \frac{0.78353}{0.61391} = 0.56724.
$$
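The barbell weights can be checked against the two conditions they are meant to satisfy, namely that the barbell has the same value and the same dollar duration as the bullet. The sketch below uses the weight formulas as they appear in the numerical expressions above; the function names are illustrative.

```python
def zero(y, n):
    """Price of an n-year zero at yield y."""
    return (1 + y) ** -n


def mod_dur(y, n):
    """Modified duration of an n-year zero at yield y."""
    return n / (1 + y)


def barbell_weights(y1, y2, y3):
    """Units of the 1Y and 10Y zeros matching value and duration of one 5Y zero."""
    b1, b2, b3 = zero(y1, 1), zero(y2, 5), zero(y3, 10)
    d1, d2, d3 = mod_dur(y1, 1), mod_dur(y2, 5), mod_dur(y3, 10)
    theta1 = (d3 - d2) / (d3 - d1) * b2 / b1
    theta3 = (d2 - d1) / (d3 - d1) * b2 / b3
    return theta1, theta3


theta1, theta3 = barbell_weights(0.05, 0.05, 0.05)
value = theta1 * zero(0.05, 1) + theta3 * zero(0.05, 10)
dollar_dur = (theta1 * zero(0.05, 1) * mod_dur(0.05, 1)
              + theta3 * zero(0.05, 10) * mod_dur(0.05, 10))
print(f"theta1 = {theta1:.5f}, theta3 = {theta3:.5f}")
```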
Figure 13A.2 depicts the behavior of the bullet price and the market value of the barbell as we
change the YTM. Several comments are in order. First, note that the barbell portfolio is now more
convex than the bullet! Now, large movements in the YTM lead to profits, provided we maintain the
assumption of parallel shifts in the term-structure of interest rates. Second, the “barbell-bet” is
“self-financing”. By construction, the value of the bullet we sell equals the value of the barbell portfolio.
However, the barbell is clearly not an arbitrage opportunity. The scenario underlying Figure 13A.2
relies on the assumption of a parallel shift in the term structure of interest rates. As we explained
in Chapter 11, it is not realistic to simultaneously assume large and parallel movements in the term-
structure of interest rates. Historically, large interest rate shifts (that is, typically, shifts occurring over
large horizons of time) are accompanied by the occurrence of a variety of shape modifications.
[Figure: bond values (vertical axis) plotted against the YTM (horizontal axis, from 0.00 to 0.10).]
FIGURE 13A.2: “Barbell trading.” Dotted line (bottom): The price of the 5Y zero,
B2 (y) = 1/(1+y)^5 , where y is the YTM. Solid line (top): The value of the “barbell” portfolio
consisting of (i) 0.45706 units of the 1Y zero and (ii) 0.56724 units of the 10Y zero,
i.e. B1 (y1 ) · 0.45706 + B3 (y3 ) · 0.56724, where B1 (y) = 1/(1+y) is the 1Y zero price and
B3 (y) = 1/(1+y)^10 is the 10Y zero price.
Table 13A.1 considers the case of non-parallel shifts in the term-structure. We assume that the
initial term-structure is not flat. Then, we consider two cases: (i) A “twist” in the term-structure, i.e.
long-term rates lower than short-term rates; (ii) a “steepening” of the term-structure.
TABLE 13A.1

                            YTM           Bullet price          Mod. dur.
Initial term-structure
  1Y                        ŷ1 = 4%       B1(ŷ1) = 0.961        D(ŷ1) = 0.961
  5Y                        ŷ2 = 5%       B2(ŷ2) = 0.783        D(ŷ2) = 4.762
  10Y                       ŷ3 = 6%       B3(ŷ3) = 0.558        D(ŷ3) = 9.434
  Barbell value = θ1 B1(ŷ1) + θ3 B3(ŷ3) = 0.783
“Twist”
  1Y                        ŷ1 = 6%       B1(ŷ1) = 0.943        D(ŷ1) = 0.943
  5Y                        ŷ2 = 5%       B2(ŷ2) = 0.783        D(ŷ2) = 4.762
  10Y                       ŷ3 = 4%       B3(ŷ3) = 0.675        D(ŷ3) = 9.615
  Barbell value = θ1 B1(ŷ1) + θ3 B3(ŷ3) = 0.847
“Steepening”
  1Y                        ŷ1 = 4%       B1(ŷ1) = 0.961        D(ŷ1) = 0.961
  5Y                        ŷ2 = 5%       B2(ŷ2) = 0.783        D(ŷ2) = 4.762
  10Y                       ŷ3 = 7%       B3(ŷ3) = 0.508        D(ŷ3) = 9.346
  Barbell value = θ1 B1(ŷ1) + θ3 B3(ŷ3) = 0.751
We use the portfolio in Eq. (13A.4), and find that at the initial term-structure
(ŷ1 = 4%, ŷ2 = 5%, ŷ3 = 6%), θ1 = 0.449 and θ3 = 0.629. We keep this portfolio fixed, and compute
the barbell value, θ1 B1 (ŷ1 ) + θ3 B3 (ŷ3 ), in the two scenarios, “twist” and “steepening”. The bullet
price is B2 (ŷ2 ) = 0.783 in all cases, so profits and losses arise from changes in the barbell value only.
The barbell bet is in fact a bet on long-term bonds: by convexity, the price B3 varies more than the
price of shorter maturity zeros. The bet therefore leads to a profit in the “twist” scenario, where the
long-term yield falls. However, note that it leads to losses in the “steepening” scenario, where the
long-term yield rises.
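The scenario analysis can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's computation: yields are annually compounded, modified duration is n/(1 + y), and the weights take the form θ1 = (D3 − D2)/(D3 − D1) · B2/B1 and θ3 = (D2 − D1)/(D3 − D1) · B2/B3, as suggested by the numerical example for Eq. (13A.4). Function names are hypothetical. Because the inputs are not rounded here, the barbell values may differ from the tabulated ones in the third decimal place.

```python
def zero_price(y, n):
    """Price of an n-year zero with unit face value at annually compounded yield y."""
    return (1.0 + y) ** (-n)


def barbell_weights(y1, y2, y3):
    """Weights on the 1Y and 10Y zeros matching value and duration of the 5Y bullet."""
    b1, b2, b3 = zero_price(y1, 1), zero_price(y2, 5), zero_price(y3, 10)
    d1, d2, d3 = 1 / (1 + y1), 5 / (1 + y2), 10 / (1 + y3)  # modified durations
    theta1 = (d3 - d2) / (d3 - d1) * b2 / b1
    theta3 = (d2 - d1) / (d3 - d1) * b2 / b3
    return theta1, theta3


def barbell_value(theta1, theta3, y1, y3):
    """Value of the fixed barbell portfolio at the new short and long yields."""
    return theta1 * zero_price(y1, 1) + theta3 * zero_price(y3, 10)


# Weights are set at the initial (non-flat) term-structure and then kept fixed.
theta1, theta3 = barbell_weights(0.04, 0.05, 0.06)
bullet = zero_price(0.05, 5)  # the 5Y yield is 5% in every scenario

scenarios = {
    "initial": (0.04, 0.06),
    "twist": (0.06, 0.04),
    "steepening": (0.04, 0.07),
}
values = {name: barbell_value(theta1, theta3, y1, y3) for name, (y1, y3) in scenarios.items()}
for name, value in values.items():
    print(f"{name:<11} barbell={value:.4f}  profit vs bullet={value - bullet:+.4f}")
```

The sign of the profit in each scenario is the point of the exercise: positive under the twist, negative under the steepening.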
A caveat. The previous computations should be interpreted with some care, as the value of the
zeros changes over time. Notably, the value of the zeros changes over the horizon of the scenarios
we are designing, even without any changes in the yield curve. However, this effect is usually minor
when the horizon is sufficiently short and, generally, can be factored into the analysis.
To sum up, duration hedging is a useful tool, although it has some limitations. It relies on a
first-order approximation to the price of a bond, whereas the price of a conventional bond is typically
strictly convex in the YTM. Therefore, for large changes in the YTM, we should update the
duration-based hedging ratios. Re-adjustments are in order anyway, since fixed income securities
have durations that decrease over time.
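To illustrate the first-order nature of the duration approximation, the sketch below compares the exact price of a 10Y zero with its duration-based approximation for yield changes of increasing size. The numbers (a 5% initial yield, annual compounding) and the function names are assumptions chosen for illustration only.

```python
def zero_price(y, n):
    """Price of an n-year zero with unit face value at annually compounded yield y."""
    return (1.0 + y) ** (-n)


def duration_approx(y0, dy, n):
    """Duration-based (first-order) approximation to the zero price at yield y0 + dy."""
    b0 = zero_price(y0, n)
    mod_dur = n / (1.0 + y0)  # modified duration of the zero at y0
    return b0 * (1.0 - mod_dur * dy)


y0, n = 0.05, 10
errors = {}
for dy in (0.001, 0.01, 0.03):
    exact = zero_price(y0 + dy, n)
    approx = duration_approx(y0, dy, n)
    errors[dy] = exact - approx  # positive by convexity of the price in the yield
    print(f"dy={dy:.3f}  exact={exact:.5f}  approx={approx:.5f}  error={errors[dy]:.5f}")
```

The approximation error is small for small yield changes and grows with the size of the change, which is why the hedging ratios need updating after large movements in the YTM.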
13.9. Appendix 2: Proof of Eq. (13.4) c
°by A. Mele
We also know that in the second period, the value of the second zero is,
$$
P(\tilde{r}, T_2) =
\begin{cases}
P(r^+, T_2) & \text{with probability } p \\
P(r^-, T_2) & \text{with probability } 1 - p
\end{cases}
$$
Next, we select ∆ and M to make the value of the portfolio equal the value of the second zero, in each
state of nature, viz
V (r̃) = P (r̃, T2 ) , in each state.
Mathematically, this is tantamount to solving the following system of two equations with two unknowns
(∆ and M ),
$$
\begin{cases}
V(r^+) = \Delta \cdot P(r^+, T_1) + M \cdot (1 + r) = P(r^+, T_2) \\
V(r^-) = \Delta \cdot P(r^-, T_1) + M \cdot (1 + r) = P(r^-, T_2)
\end{cases}
\tag{13A.5}
$$
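The solution is,
$$
\hat{\Delta} = \frac{P(r^+, T_2) - P(r^-, T_2)}{P(r^+, T_1) - P(r^-, T_1)},
\qquad
\hat{M} = \frac{P(r^-, T_2) P(r^+, T_1) - P(r^+, T_2) P(r^-, T_1)}{\left[P(r^+, T_1) - P(r^-, T_1)\right](1 + r)}.
$$
Since the portfolio replicates the second zero in each state of nature, its current value must equal
the current price of the second zero, by absence of arbitrage,
$$
V_0(r)\big|_{\Delta = \hat{\Delta},\, M = \hat{M}} = \hat{\Delta} \cdot P(r, T_1) + \hat{M} = P(r, T_2),
$$
or,
$$
(1 + r)\hat{M} = (1 + r) P(r, T_2) - (1 + r)\hat{\Delta} \cdot P(r, T_1). \tag{13A.6}
$$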
Next, let us figure out the prediction of the model in terms of the expected return it generates for
the price of the bond maturing at T1 , when (∆, M ) = (∆̂, M̂ ). To do this, multiply the first equation in
(13A.5) by p, and multiply the second equation in (13A.5) by 1 − p. Add the results for ∆ = ∆̂, M = M̂
to obtain,
$$
\hat{\Delta} \cdot \left[p P(r^+, T_1) + (1 - p) P(r^-, T_1)\right] + \hat{M} \cdot (1 + r) = p P(r^+, T_2) + (1 - p) P(r^-, T_2).
$$
Substituting Eq. (13A.6) for (1 + r) M̂ and the solution for ∆̂ into the previous equation, and
rearranging terms, yields,
$$
\frac{\left[p P(r^+, T_1) + (1 - p) P(r^-, T_1)\right] - (1 + r) P(r, T_1)}{P(r^+, T_1) - P(r^-, T_1)}
=
\frac{\left[p P(r^+, T_2) + (1 - p) P(r^-, T_2)\right] - (1 + r) P(r, T_2)}{P(r^+, T_2) - P(r^-, T_2)}.
$$
The previous equation is easy to interpret. The numerators are the expected excess returns from
holding the assets. They equal Ep [P (r̃, Ti )] − (1 + r) P (r, Ti ), where Ep [P (r̃, Ti )] is what the investors
expect to receive, the next period, by investing £P (r, Ti ) today, in the bond; and (1 + r) P (r, Ti )
is what the investors expect to receive, the next period, by investing £P (r, Ti ) today, in the MMA.
The denominators constitute a measure of volatility related to holding the assets. Then, the previous
equation tells us that the Sharpe ratios, or the unit risk premiums, on the two zeros agree.
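A small numerical check may help fix ideas. The sketch below builds the replicating portfolio from the solution to (13A.5), prices the second zero as the cost of that portfolio, and verifies that the two unit risk premiums coincide. All prices, the short-term rate and the probability p are hypothetical numbers chosen for illustration, and the function names are not from the text.

```python
def replicate(p1_up, p1_dn, p2_up, p2_dn, r):
    """Solve the two-state system (13A.5) for the units of the first zero and of the MMA."""
    delta = (p2_up - p2_dn) / (p1_up - p1_dn)
    m = (p2_dn * p1_up - p2_up * p1_dn) / ((p1_up - p1_dn) * (1.0 + r))
    return delta, m


def unit_risk_premium(p, price_up, price_dn, price_now, r):
    """Expected excess return divided by the spread of next-period prices."""
    expected = p * price_up + (1.0 - p) * price_dn
    return (expected - (1.0 + r) * price_now) / (price_up - price_dn)


# Hypothetical inputs: current short-term rate, next-period prices and probability.
r, p = 0.05, 0.5
p1_up, p1_dn = 1 / 1.06, 1 / 1.04  # first zero when the rate moves to r+ or r-
p2_up, p2_dn = 0.885, 0.922        # second zero in the same two states
p1_now = 0.906                     # current price of the first zero

delta, m = replicate(p1_up, p1_dn, p2_up, p2_dn, r)
v_up = delta * p1_up + m * (1.0 + r)
v_dn = delta * p1_dn + m * (1.0 + r)
p2_now = delta * p1_now + m        # no-arbitrage price of the second zero

lam1 = unit_risk_premium(p, p1_up, p1_dn, p1_now, r)
lam2 = unit_risk_premium(p, p2_up, p2_dn, p2_now, r)
print(f"delta={delta:.4f}  M={m:.4f}  P(r,T2)={p2_now:.4f}")
print(f"unit risk premiums: {lam1:.6f} and {lam2:.6f}")
```

The equality of the two ratios does not depend on the particular value of p: it follows from pricing the second zero by replication.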
Let the Sharpe ratio on any zero be equal to some function λ of the short-term rate r only (and
possibly of calendar time). This function, λ, clearly does not depend on the maturity of the zeros.
Then, we have,
$$
\begin{aligned}
p P(r^+, T_1) + (1 - p) P(r^-, T_1) - (1 + r) P(r, T_1)
&= \left[P(r^+, T_1) - P(r^-, T_1)\right] \lambda \\
&= \frac{P(r^+, T_1) - P(r^-, T_1)}{r^+ - r^-} \cdot \left[(r^+ - r^-)\lambda\right].
\end{aligned}
\tag{13A.7}
$$
We can interpret (r+ − r− ) as a measure of interest rate volatility, and define Vol(r̃ − r) ≡ (r+ − r− ).
Eq. (13.4) follows by rewriting Eq. (13A.7) for a generic maturity date T > 2.
13.10. Appendix 3: Proof of Eq. (13.19) c
°by A. Mele
$$
P(t, T) = \frac{P(0, T)}{P(0, t)} \prod_{S=t}^{T-1} \frac{1 + F_S(0)}{1 + F_S(t)}. \tag{13A.8}
$$
It is quite natural, at this juncture, to search for the model’s predictions about the evolution of future
forward rates. Not only is this task theoretically important, it is also relevant as a matter of the
practical implementation of the model. Indeed, if the model’s predictions about the evolution of future
forward rates yield a closed-form solution, the bond price at the future date t, P (t, T ), can be
expressed in closed form, which facilitates the implementation of the model.
Let us introduce some further notation. Let $F_S^j(t)$ be the forward rate as of time t after the
occurrence of j upward movements of the bond price, and let the continuously compounded forward rate
$\hat{F}_S^j(t)$ be defined as,
$$
\hat{F}_S^j(t) \equiv \log\left(1 + F_S^j(t)\right), \qquad j \le t.
$$
In terms of this notation, Eq. (13A.8) reads,
$$
P_j(t, T) = \frac{P(0, T)}{P(0, t)} \prod_{S=t}^{T-1} \frac{1 + F_S(0)}{1 + F_S^j(t)}
= \frac{P(0, T)}{P(0, t)} \, e^{-\sum_{S=t}^{T-1} \left(\hat{F}_S^j(t) - \hat{F}_S(0)\right)}. \tag{13A.9}
$$
We have the following important result, which we shall prove later on:
$$
\hat{F}_S^j(t) = \hat{F}_S(0) + \log\frac{u(S + 1 - t)}{u(S + 1)} - (t - j)\log\delta, \qquad j \le t. \tag{13A.10}
$$
By substituting Eq. (13A.10) into Eq. (13A.9), and using the solution for the perturbation function u (·)
in Eqs. (13.17), we get Eq. (13.19).
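The substitution can be spelled out one step further. The following is a sketch that uses only Eqs. (13A.9) and (13A.10) as stated above; the final step, which inserts the explicit form of u (·) from Eqs. (13.17), is not reproduced here.

```latex
\begin{aligned}
-\sum_{S=t}^{T-1}\left(\hat{F}_S^j(t) - \hat{F}_S(0)\right)
&= -\sum_{S=t}^{T-1}\left[\log\frac{u(S+1-t)}{u(S+1)} - (t-j)\log\delta\right] \\
&= \sum_{S=t}^{T-1}\log\frac{u(S+1)}{u(S+1-t)} + (T-t)(t-j)\log\delta ,
\end{aligned}
\qquad\text{so that}\qquad
P_j(t,T) = \frac{P(0,T)}{P(0,t)}\,
\delta^{(T-t)(t-j)}\prod_{S=t}^{T-1}\frac{u(S+1)}{u(S+1-t)} .
```

The term in δ collects the (T − t) identical contributions −(t − j) log δ, one for each forward rate between t and T − 1.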
So we are left with proving Eq. (13A.10). The proof proceeds by induction. Eq. (13A.10) holds true
for t = 0. Next, suppose that it holds at time t. We wish to show that in this case, Eq. (13A.10) would
also hold at time t + 1. At time t + 1, we have two cases.
Case 1 : A positive price jump occurs between time t and time t + 1. In this case,
$$
\begin{aligned}
\hat{F}_S^{j+1}(t+1)
&= \log\frac{P_{j+1}(t+1, S)}{P_{j+1}(t+1, S+1)} \\
&= \log\left[\frac{P_j(t, S)}{P_j(t, t+1)}\, u(S - t)\right] - \log\left[\frac{P_j(t, S+1)}{P_j(t, t+1)}\, u(S + 1 - t)\right] \\
&= \log\frac{u(S - t)}{u(S + 1 - t)} + \hat{F}_S^j(t) \\
&= \log\frac{u(S + 1 - (t + 1))}{u(S + 1)} + \hat{F}_S(0) - \left[(t + 1) - (j + 1)\right]\log\delta,
\end{aligned}
$$
where the first equality and the third follow by the definition of the forward rates $\hat{F}_S^{j+1}(t+1)$
and $\hat{F}_S^j(t)$, the second equality holds by the definition of the jump in Eq. (13.12), and the fourth
equality follows by using Eq. (13A.10). Hence, Eq. (13A.10) holds at time t + 1 in the occurrence of
a positive price jump between time t and time t + 1.
Case 2 : A negative price jump occurs between time t and time t + 1. In this case,
$$
\begin{aligned}
\hat{F}_S^{j}(t+1)
&= \log\frac{P_{j}(t+1, S)}{P_{j}(t+1, S+1)} \\
&= \log\left[\frac{P_j(t, S)}{P_j(t, t+1)}\, d(S - t)\right] - \log\left[\frac{P_j(t, S+1)}{P_j(t, t+1)}\, d(S + 1 - t)\right] \\
&= \log\frac{d(S - t)}{d(S + 1 - t)} + \hat{F}_S^j(t) \\
&= \log\left[\frac{d(S - t)\,\delta^{-(S - t) + 1}}{d(S + 1 - t)\,\delta^{-(S + 1 - t) + 1}}\,\delta^{-1}\right] + \hat{F}_S(0) + \log\frac{u(S + 1 - t)}{u(S + 1)} - (t - j)\log\delta \\
&= \log\frac{u(S - t)}{u(S + 1)} + \hat{F}_S(0) - \left[(t + 1) - j\right]\log\delta,
\end{aligned}
$$
where the first four equalities follow by the same arguments produced in Case 1 (the fourth also
multiplies and divides by powers of δ), and the last equality holds by the relation
$u(T) = d(T)\,\delta^{-(T-1)}$ in Eq. (13.16) and by rearranging terms. Hence, Eq. (13A.10) holds at time
t + 1 in the occurrence of a negative price jump between time t and time t + 1.
These two cases reveal that if Eq. (13A.10) holds at time t for any j ≤ t, it also holds at time t + 1,
in each state of nature. By induction, Eq. (13A.10) is therefore true.
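The induction argument can also be checked numerically along a sample path of price jumps. The sketch below is an illustration under stated assumptions: the functional form of u is a Ho and Lee-type form normalized so that u(1) = 1, chosen here as a stand-in for Eq. (13.17), which is not reproduced in this appendix; the parameter values, the initial discount curve and the path are hypothetical. The check itself relies only on the jump rule used in the proof and on the relation u(T) = d(T) δ^{−(T−1)}.

```python
import math

PI, DELTA = 0.5, 0.98  # hypothetical parameter values


def u(T):
    """Assumed perturbation function for an upward price jump, with u(1) = 1."""
    return 1.0 / (PI + (1.0 - PI) * DELTA ** (T - 1))


def d(T):
    """Downward counterpart, from the relation u(T) = d(T) * delta**(-(T - 1))."""
    return u(T) * DELTA ** (T - 1)


def step(curve, t, up):
    """Move the curve {S: P(t, S)} one period ahead: P(t+1, S) = P(t, S) / P(t, t+1) * h(S - t)."""
    h = u if up else d
    return {S: curve[S] / curve[t + 1] * h(S - t) for S in curve if S > t}


def fwd(curve, S):
    """Continuously compounded forward rate over [S, S+1]: log of P(., S) / P(., S+1)."""
    return math.log(curve[S] / curve[S + 1])


N = 12
curve0 = {S: 1.05 ** (-S) for S in range(N + 1)}  # hypothetical initial discount curve
path = [True, False, False, True, True]           # True = upward price jump

curve, j = curve0, 0
for t, up in enumerate(path):
    curve = step(curve, t, up)
    j += int(up)
t = len(path)

# Largest gap between simulated forward rates and the right-hand side of Eq. (13A.10).
max_gap = max(
    abs(fwd(curve, S)
        - (fwd(curve0, S) + math.log(u(S + 1 - t) / u(S + 1)) - (t - j) * math.log(DELTA)))
    for S in range(t, N)
)
print(f"t={t}, j={j}, largest gap = {max_gap:.2e}")
```

Reordering the entries of `path` while keeping the number of upward jumps fixed should leave the forward rates unchanged, since Eq. (13A.10) depends on the path only through j.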
References
Black, F. and M. Scholes (1973): “The Pricing of Options and Corporate Liabilities.” Journal
of Political Economy 81, 637-659.
Black, F., E. Derman and W. Toy (1990): “A One Factor Model of Interest Rates and its
Application to Treasury Bond Options.” Financial Analysts Journal (January-February),
33-39.
Cox, J. C., S. A. Ross and M. Rubinstein (1979): “Option Pricing: A Simplified Approach.”
Journal of Financial Economics 7, 229-263.
Diebold, F. X. and C. Li (2006): “Forecasting the Term Structure of Government Bond Yields.”
Journal of Econometrics 130, 337-364.
Ho, T. S. Y. and S.-B. Lee (1986): “Term Structure Movements and the Pricing of Interest
Rate Contingent Claims.” Journal of Finance 41, 1011-1029.
Ho, T. S. Y. and S.-B. Lee (2004): The Oxford Guide to Financial Modeling. Oxford University
Press.
Hull, J. C. (2003): Options, Futures, and Other Derivatives. Prentice Hall. 5th edition (Inter-
national Edition).
Hull, J. C. and A. White (1990): “Pricing Interest Rate Derivative Securities.” Review of
Financial Studies 3, 573-592.
McCulloch, J. (1971): “Measuring the Term Structure of Interest Rates.” Journal of Business
44, 19-31.
McCulloch, J. (1975): “The Tax-Adjusted Yield Curve.” Journal of Finance 30, 811-830.
Nelson, C.R. and A.F. Siegel (1987): “Parsimonious Modeling of Yield Curves.” Journal of
Business 60, 473-489.
428