derivatives
Bermudanstyle interest rate derivatives are an important class of
options. Many banking and insurance products, such as mortgages,
cancellable bonds, and life insurance products, contain Bermudan
interest rate options associated with early redemption or cancella
tion of the contract. The abundance of these options makes evident
that their proper valuation and risk measurement are important to
banks and insurance companies. Risk measurement allows for off
setting market risk by hedging with underlying liquidly traded assets
and options. Pricing models must be arbitragefree, and consistent
with (calibrated to) prices of actively traded underlying options.
Model dynamics need be consistent with the observed dynamics of
the term structure of interest rates, e.g., correlation between inte
rest rates. Moreover, valuation algorithms need be efficient: Finan
cial decisions based on derivatives pricing calculations often need to
be made in seconds, rather than hours or days. In recent years, a
successful class of models has appeared in the literature known as
market models. This thesis extends the theory of market models, in
the following ways: (i) it introduces a new, efficient, and more
accurate approximate pricing technique, (ii) it presents two new and
fast algorithms for correlationcalibration, (iii) it develops new models
that enable efficient calibration for a whole new range of deriva
tives, such as fixedmaturity Bermudan swaptions, and (iv) it presents
novel empirical comparisons of the performance of existing calibra
tion techniques and models, in terms of reduction of risk.
ERIM
The Erasmus Research Institute of Management (ERIM) is the Research
School (Onderzoekschool) in the field of management of the
Erasmus University Rotterdam. The founding participants of ERIM
are RSM Erasmus University and the Erasmus School of Economics.
ERIM was founded in 1999 and is officially accredited by the Royal
Netherlands Academy of Arts and Sciences (KNAW). The research
undertaken by ERIM is focussed on the management of the firm in its
environment, its intra and interfirm relations, and its business
processes in their interdependent connections.
The objective of ERIM is to carry out first rate research in manage
ment, and to offer an advanced graduate program in Research in
Management. Within ERIM, over two hundred senior researchers and
Ph.D. candidates are active in the different research programs. From
a variety of academic backgrounds and expertises, the ERIM commu
nity is united in striving for excellence and working at the forefront
of creating new business knowledge.
www.erim.eur.nl ISBN 9058920992
RAOUL PI ETERSZ
Pricing models for
Bermudanstyle
interest rate derivatives
D
e
s
i
g
n
:
B
&
T
O
n
t
w
e
r
p
e
n
a
d
v
i
e
s
w
w
w
.
b

e
n

t
.
n
l
P
r
i
n
t
:
H
a
v
e
k
a
w
w
w
.
h
a
v
e
k
a
.
n
l
71
R
A
O
U
L
P
I
E
T
E
R
S
Z
P
r
i
c
i
n
g
m
o
d
e
l
s
f
o
r
B
e
r
m
u
d
a
n

s
t
y
l
e
i
n
t
e
r
e
s
t
r
a
t
e
d
e
r
i
v
a
t
i
v
e
s
Erim  05 omslag Pietersz 9/23/05 1:41 PM Pagina 1
1
Pricing Models for BermudanStyle
Interest Rate Derivatives
2
3
Pricing Models for BermudanStyle
Interest Rate Derivatives
Waarderingsmodellen voor Bermudastijl rente derivaten
Proefschrift
ter verkrijging van de graad van doctor aan de
Erasmus Universiteit Rotterdam
op gezag van de
rector magniﬁcus
Prof.dr. S.W.J. Lamberts
en volgens besluit van het College voor Promoties.
De openbare verdediging zal plaatsvinden op
donderdag 8 december 2005 om 16.00 uur
door
Raoul Pietersz
geboren te Rotterdam
4
Promotiecommissie
Promotoren:
Prof.dr. A.A.J. Pelsser
Prof.dr. A.C.F. Vorst
Overige leden:
Prof.dr. P.J.F. Groenen
Prof.dr. F.C.J.M. de Jong
Dr.ir. M.P.E. Martens
Erasmus Research Institute of Management (ERIM)
Erasmus University Rotterdam
Internet: http://www.erim.eur.nl
ERIM Ph.D. Series Research in Management 71
ISBN 9058920992
c 2005, Raoul Pietersz
All rights reserved. No part of this publication may be reproduced or transmitted in
any form or by any means electronic or mechanical, including photocopying, recording,
or by any information storage and retrieval system, without permission in writing from
the author.
5
Voor Beata, Karsten en Dani¨el
6
7
Acknowledgements
First and foremost, I would like to thank my promotor Antoon Pelsser. His guidance
throughout the Ph.D. period has been excellent. Chapters 2, 5 and 6 were written in
cooperation with Antoon. The research beneﬁtted greatly from his invaluable suggestions,
and he truly is an inspirator. Thank you Antoon.
Second, I thank my promotor and former employer Ton Vorst. He is the one who
suggested me to start the Ph.D. track with Antoon Pelsser, and who parttime employed
me at ABN AMRO Bank, Quantitative Risk Analytics (at the time “Market Risk –
Modelling and Product Analysis”), for the period September 2001 till June 2004. For all
this, I am very grateful.
Third, many thanks go to the other members of the small committee; Patrick Groenen,
Frank de Jong, and, Martin Martens. Also, I express my gratitude to the other members
of the committee; Lane Hughston, Farshid Jamshidian, and, Thierry Post.
Fourth, special thanks go to Marcel van Regenmortel, for teaching me many technical
and exciting aspects of interest rate derivatives pricing, and for parttime employing me
at Product Development Group, Quantitative Analytics, ABN AMRO Bank, from July
2004 onwards. Chapters 5 and 7 were written in cooperation with Marcel.
Fifth, I am grateful to Patrick Groenen, for coauthoring Chapter 3.
Sixth, I thank Igor Grubiˇsi´c for coauthoring Chapter 4.
I am much obliged to my colleagues at Erasmus University Rotterdam: Jaap Spronk
for support through the Erasmus Center for Financial Research (ECFR); Martin Martens
for his help while I was coteaching one of his example classes and for suggesting me
as a lecturer at the Rotterdam School of Management (RSM); Winfried Hallerbach for
help with preparation of RSM lectures; Wilfred Mijnhardt for guiding me through the
publication process; Tineke Kurtz, Tineke van der Vhee, Elli Hoek van Dijke, and Ella
Boniuk, for eﬃciently aiding me in administrative matters; and, Mari¨elle Sonnenberg, for
being a pleasant roommate.
During the Ph.D. period I have been able to present my work at leading international
conferences. I am very grateful for the ﬁnancial support that made this possible, received
from Erasmus Research Institute of Management (ERIM), from the Econometric Institute
(EI), and from ECFR.
8
viii ACKNOWLEDGEMENTS
A special thank you to past and present managers at ABN AMRO Bank: Ton Vorst,
Dick Boswinkel, Nam Kyoo Boots, Marcel van Regenmortel, Bernt van Linder and Geert
Ceuppens. Thank you to past and present colleagues at Product Development Group:
Nicolas Carr´e in Amsterdam, and Thilo Roßberg and Russell Barker in London. Thanks to
members of CAL: Nancy Appels, Reinier Bosman, Danny Wester, Frank Putman, Andr´e
Roukema, Jelper Striet, and Willem van der Zwart. Thank you to past and present
colleagues at ‘Product Analysis’: Steﬀen Lukas, Benjamin Schiessl´e, Martijn van der
Voort, Lukas Phaf, Alice Gee, Rutger Pijls, Drona Kandhai, and Alex Zilber. A thank
you to colleagues that have become friends: Dion Hautvast, Bram Warmenhoven, and
Glyn Baker.
I am also grateful to a number of anonymous referees and to Riccardo Rebonato and
Mark Joshi who provided valuable comments and suggestions to earlier versions of the
papers that form the basis for this thesis. Many thanks to Frank de Jong and Joanne
Kennedy for providing much appreciated feedback to my research proposal.
Finally, I would like to thank my parents for always being there for me. I thank my
son Karsten for bringing so much joy to my life. I thank Beata for her love, support, and
kindness.
Raoul Pietersz
February 7th 2005, Amsterdam
9
Contents
Acknowledgements vii
Notation xix
Outline xxiii
1 Introduction 1
1.1 Arbitragefree pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Use of models in practice . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Interest rate markets and options . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Linear products: Deposits, bonds, and swaps . . . . . . . . . . . . . 7
1.2.2 Interest rate options: Caps, ﬂoors, and swaptions . . . . . . . . . . 8
1.3 Interest rate derivatives pricing models . . . . . . . . . . . . . . . . . . . . 11
1.3.1 Short rate models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Market models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 Markovfunctional models . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 American option pricing with Monte Carlo simulation . . . . . . . . . . . . 17
2 Riskmanaging Bermudan swaptions in a LIBOR model 19
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Recalibration approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Explanation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Swap vega and the swap market model . . . . . . . . . . . . . . . . . . . . 27
2.5 Alternative method for calculating swap vega . . . . . . . . . . . . . . . . 29
2.6 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 Comparison with the swap market model . . . . . . . . . . . . . . . . . . . 30
2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.A Appendix: Negative vega twostock Bermudan options . . . . . . . . . . . 34
10
x CONTENTS
3 Rank reduction of correlation matrices by majorization 39
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.1 Modiﬁed PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.2 Majorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.3 Geometric programming . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.4 Alternating projections without normal correction . . . . . . . . . . 45
3.2.5 Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.6 Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.7 Alternating projections with normal correction (d = n) . . . . . . . 47
3.3 Majorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 The algorithm and convergence analysis . . . . . . . . . . . . . . . . . . . 50
3.4.1 Global convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4.2 Local rate of convergence . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.1 Numerical comparison with other methods . . . . . . . . . . . . . . 54
3.5.2 Nonconstant weights . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.3 The order eﬀect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5.4 Majorization equipped with the power method . . . . . . . . . . . . 62
3.5.5 Using an estimate for the largest eigenvalue . . . . . . . . . . . . . 62
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.A Appendix: Proof of Equation (3.11) . . . . . . . . . . . . . . . . . . . . . . 64
4 Rank reduction of correlation matrices by geometric programming 67
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.1 Weighted norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Solution methodology with geometric optimisation . . . . . . . . . . . . . . 71
4.2.1 Basic idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.2 Topological structure . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.3 A dense part of M
n,d
equipped with a diﬀerentiable structure . . . . 74
4.2.4 The Cholesky manifold . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2.5 Choice of representation . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3 Optimisation over the Cholesky manifold . . . . . . . . . . . . . . . . . . . 76
4.3.1 Riemannian structure . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3.2 Normal and tangent spaces . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.3 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.4 Parallel transport along a geodesic . . . . . . . . . . . . . . . . . . 80
4.3.5 The gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.6 Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
11
CONTENTS xi
4.3.7 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4 Discussion of convergence properties . . . . . . . . . . . . . . . . . . . . . 81
4.4.1 Global convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4.2 Local rate of convergence . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 A special case: Distance minimization . . . . . . . . . . . . . . . . . . . . . 85
4.5.1 The case of d = n . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.5.2 The case of d = 2, n = 3 . . . . . . . . . . . . . . . . . . . . . . . . 85
4.5.3 Formula for the diﬀerential of ϕ . . . . . . . . . . . . . . . . . . . . 85
4.5.4 Connection normal with Lagrange multipliers . . . . . . . . . . . . 86
4.5.5 Initial feasible point . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6.1 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.6.2 Numerical comparison . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.A Appendix: Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.A.1 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.A.2 Proof of Proposition 2 . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.A.3 Proof of Proposition 3 . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.A.4 Proof of Theorem 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.A.5 Proof of Theorem 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.A.6 Proof of Lemma 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.A.7 Proof of Theorem 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5 Fast driftapproximated pricing in the BGM model 97
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Notation for BGM model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.3 Single time step method for pricing on a grid . . . . . . . . . . . . . . . . . 100
5.3.1 Justiﬁcation of the above assumptions . . . . . . . . . . . . . . . . 100
5.3.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.3.3 Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3.4 Single time step method . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3.5 Valuation of interest rate derivatives with the single time step method103
5.4 Discretizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4.1 Euler discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4.2 Predictorcorrector discretization . . . . . . . . . . . . . . . . . . . 104
5.4.3 Milstein discretization . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4.4 Brownian bridge discretization . . . . . . . . . . . . . . . . . . . . . 105
5.5 The Brownian bridge scheme for single time steps . . . . . . . . . . . . . . 107
5.5.1 Theoretical result . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
12
xii CONTENTS
5.5.2 LIBORinarrears case . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.6 The Brownian bridge scheme for multitime steps . . . . . . . . . . . . . . 110
5.6.1 Weak convergence of the Brownian bridge scheme . . . . . . . . . . 110
5.6.2 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.7 Example: onefactor driftapproximated BGM . . . . . . . . . . . . . . . . 114
5.7.1 A simple numerical example . . . . . . . . . . . . . . . . . . . . . . 115
5.8 Example: Bermudan swaption . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.8.1 Twofactor model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.9 Test of accuracy of drift approximation . . . . . . . . . . . . . . . . . . . . 124
5.9.1 Driftapproximation accuracy test based on noarbitrage . . . . . . 125
5.9.2 Numerical results for single time step test . . . . . . . . . . . . . . 125
5.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.A Appendix: Mean of geometric Brownian bridge . . . . . . . . . . . . . . . 127
5.B Appendix: Approximation of substituting the mean . . . . . . . . . . . . . 128
5.C Appendix: MATLAB code for Brownian bridge scheme . . . . . . . . . . . 129
6 A comparison of single factor Markovfunctional and multi factor market
models 133
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.2.1 The LIBOR and swap market models . . . . . . . . . . . . . . . . . 139
6.2.2 The Markovfunctional model . . . . . . . . . . . . . . . . . . . . . 141
6.2.3 Estimating Greeks for callable products in market models . . . . . . 143
6.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.4 Accuracy of the terminal correlation formula . . . . . . . . . . . . . . . . . 146
6.5 Empirical comparison results . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.5.1 Delta hedging versus delta and vega hedging . . . . . . . . . . . . . 150
6.5.2 ‘Large’ perturbation sizes versus constant exercise decision method
with ‘small’ perturbation sizes . . . . . . . . . . . . . . . . . . . . . 151
6.5.3 Deltavega hedge results . . . . . . . . . . . . . . . . . . . . . . . . 152
6.6 The impact of smile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7 Generic market models 161
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.2.1 Absence of arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.3 Necessary and suﬃcient conditions for noarbitrage . . . . . . . . . . . . . 168
7.3.1 Main result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
13
CONTENTS xiii
7.4 Generic expressions for noarbitrage drift terms . . . . . . . . . . . . . . . 175
7.4.1 Terminal measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.4.2 Spot measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.4.3 An example: The LIBOR market model . . . . . . . . . . . . . . . 179
7.5 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.6 Generic calibration to correlation . . . . . . . . . . . . . . . . . . . . . . . 185
7.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.A Appendix: Rationale for Approximation 1 . . . . . . . . . . . . . . . . . . 186
7.B Appendix: Proof of Lemma 3 . . . . . . . . . . . . . . . . . . . . . . . . . 187
8 Conclusions 189
Nederlandse samenvatting (Summary in Dutch) 193
Bibliography 195
Author index 207
14
15
List of Figures
1 Outline of the thesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
1.1 Payoﬀs of caplets and ﬂoorlets versus realized LIBOR. . . . . . . . . . . . 9
2.1 Recalibration swap vega results for 10,000 simulation paths. . . . . . . . . 24
2.2 Empirical standard errors of vega for 10,000 simulation paths. . . . . . . . 25
2.3 Recalibration THFRV vega results for 1 million simulation paths. . . . . . 25
2.4 Observed change in swap rate instantaneous variance. . . . . . . . . . . . . 26
2.5 Natural increment of Black implied swaption volatility. . . . . . . . . . . . 28
2.6 Swap vega results for 10,000 simulation paths. . . . . . . . . . . . . . . . . 31
2.7 Comparison of LMM and SMM for swap vega per bucket, 5% strike. . . . . 31
2.8 Comparison of LMM and SMM for total swap vega against strike. . . . . . 32
3.1 The idea of majorization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Performance proﬁle for n = 10, d = 2, t = 0.05s. . . . . . . . . . . . . . . . 56
3.3 Performance proﬁle for n = 20, d = 4, t = 0.1s. . . . . . . . . . . . . . . . . 57
3.4 Performance proﬁle for n = 80, d = 20, t = 2s. . . . . . . . . . . . . . . . . 58
3.5 Convergence run of the power method versus lambda=max(eig(B)). . . . . 63
3.6 The equality P
y
(∞) (y
(k)
−y
(∞)
) = δ
(k)
_
1 −(δ
(k)
)
2
/4. . . . . . . . . . . . 65
4.1 Shell representing the set of 3 3 correlation matrices of rank 2 or less. . . 69
4.2 Convergence runs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 Performance proﬁle for n = 30, d = 3, t = 2s, equal weights. . . . . . . . . 89
4.4 Performance proﬁle for n = 50, d = 4, t = 1s, equal weights. . . . . . . . . 90
4.5 Performance proﬁle for n = 60, d = 5, t = 3s, equal weights. . . . . . . . . 91
4.6 Performance proﬁle for n = 15, d = 3, t = 1s, nonequal weights. . . . . . . 92
5.1 LIBORinarrears test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Monte Carlo convergence. . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.3 Exercise boundaries for the eightyear deal. . . . . . . . . . . . . . . . . . . 121
5.4 Risk sensitivities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Timing inconsistency in the single time step framework for BGM. . . . . . 124
16
xvi LIST OF FIGURES
5.6 Setup for inconsistency test. . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.1 Fitted aparameter versus βparameter. . . . . . . . . . . . . . . . . . . . . 141
6.2 Bermudan swaption values per trade date. . . . . . . . . . . . . . . . . . . 149
6.3 Comparison of delta versus delta and vega hedging. . . . . . . . . . . . . . 150
6.4 ‘Large’ versus ‘small’ perturbation sizes and constant exercise method. . . 151
6.5 Deltavega hedge results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.1 Swaptions from swaption matrix to which various models are calibrated. . 169
7.2 An overview of the forward swap agreements for various market models. . . 170
7.3 Test results of exact versus approximate drift terms in CMS(q) models. . . 183
17
List of Tables
1.1 Some short rate models and their speciﬁcation of short rate dynamics. . . . 13
2.1 Market European swaption volatilities. . . . . . . . . . . . . . . . . . . . . 23
2.2 Swap vega per bucket test results for varying strikes. . . . . . . . . . . . . 33
2.3 Deal description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4 Results for negative vega per bucket for twostock Bermudan option. . . . 36
3.1 Excerpt of Table 3 in De Jong et al. (2004). . . . . . . . . . . . . . . . . . 55
3.2 Comparative results of the parametrization and majorization algorithms. . 59
3.3 Results for the ratchet cap and trigger swap. . . . . . . . . . . . . . . . . . 60
3.4 The order eﬀect. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1 A simple numerical example. . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2 Speciﬁcation of the Bermudan swaption comparison deal. . . . . . . . . . . 118
5.3 Results of the Bermudan swaption comparison deal. . . . . . . . . . . . . . 119
5.4 Computational times for the Bermudan swaption comparison deal. . . . . . 120
5.5 Simulation rerun using precomputed exercise boundaries. . . . . . . . . . 122
5.6 Twofactor model comparison. . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.7 Quality of drift approximations: volatility/meanreversion 15%/10%. . . . 126
5.8 Quality of drift approximations: various scenarios. . . . . . . . . . . . . . . 126
6.1 Statistical description of the swaption volatility data. . . . . . . . . . . . . 145
6.2 Discount factors for the USD data of 21 February 2003. . . . . . . . . . . . 146
6.3 Smile swaption volatility USD data of 21 February 2003. . . . . . . . . . . 146
6.4 Error analysis of the terminal correlation. . . . . . . . . . . . . . . . . . . . 147
6.5 The Bermudan swaption deal used in the comparison. . . . . . . . . . . . . 148
6.6 Fitted displaced diﬀusion parameters. . . . . . . . . . . . . . . . . . . . . . 155
6.7 Fitted swaption volatility and ﬁt errors with the displaced diﬀusion model. 155
6.8 Benchmark results for the displaced diﬀusion Markovfunctional model. . . 157
6.9 The Bermudan swaption deal used in the test of impact of smile. . . . . . . 158
6.10 Prices of Bermudan swaptions in smile versus nonsmile models. . . . . . . 159
18
xviii LIST OF TABLES
7.1 Example of a hybrid coupon swap payment structure for the ﬂoating side. . 163
7.2 Deal description for test of exact versus approximate drift in CMS models. 184
19
Notation
s A lower case italic denotes a scalar.
v A lower case bold denotes a vector.
M An upper case bold denotes a matrix.
α Day count fraction.
β Correlation parameter, see (2.4).
γ Weights; correlation parameters.
Γ Lagrange multipliers.
δ Delta: risk sensitivity with respect to
underlying rate or asset price.
∆ Tangent vector for geometric program
ming.
ε Perturbation size; convergence criterium.
η Pay (+1)/receive (−1) ﬁxed index; caplet
(+1)/ﬂoorlet (−1) index.
θ Drift in short rate models.
λ Largest eigenvalue of B.
Λ Diagonal eigenvalues matrix.
µ Drift.
ν Vega: risk sensitivity with respect to
volatility.
π Realized numeraire relative payoﬀ.
ρ Target correlation, matrix form: P.
(p,s)
Performance ratio for algorithm s on
test problem p.
σ Volatility.
τ
i
Discretization time point. Time dis
cretization: τ
1
< < τ
m
.
φ Cumulative normal distribution func
tion; performance proﬁle.
ϕ Objective function for rank reduction
of correlation matrices.
χ Auxiliary majorization function.
Ψ Error perentry in correlation matrix:
Ψ = YY
T
−P.
ω Outcome of probability space, ω ∈ Ω.
Ω Probability space.
S
n
Sphere in R
n+1
.
E Expectation.
P Realworld probability measure.
Q Arbitragefree pricing measure.
R Set of real numbers.
S
n
Set of real n n symmetric matrices.
T Filtration.
^ Normal distribution.
20
xx NOTATION
O Order.
o Swap market model.
a Model parameter.
b Discount bond price; model parame
ter.
b
i
Price of a discount bond maturing at
time t
i
.
B Helper matrix for majorization.
c Model parameter.
C Correlation matrix.
d Number of stochastic factors; the usual
d
1
, d
2
in Blacktype formulas.
e End index of a forward rate.
f Forward rate.
i(t) Spot LIBOR index at time t, see (2.3).
I Identity matrix.
k Strike rate.
m Number of discretization time points.
n Number of forward rates; numeraire
value.
o Option price.
p PVBP, present value of a basis point,
see (2.1).
P Target correlation matrix, P = (ρ
ij
)
ij
.
Q Orthogonal matrix, QQ
T
= I, with I
the identity matrix.
r Short rate; instantaneous continuously
compounded interest rate.
s Swap rate; start index of a forward
rate; asset price.
t Time.
t
i
Tenor time. Tenor structure: 0 = t
0
<
< t
n
.
v Value of a security, asset or derivative.
w Brownian motion; weight coeﬃcient.
Y Decomposition matrix: YY
T
= P,
with P a correlation matrix.
z Normally distributed random variable.
i
(Subscript i) Associated with the pe
riod [t
i
, t
i+1
] (e.g., f
i
, σ
i
, α
i
); associ
ated with t
i
(e.g, b
i
).
s:e
(Subscript s : e) Associated with the
period [t
s
, t
e
] (e.g., f
s:e
, σ
s:e
).
(i+1)
(Superscript (i + 1)) Associated with
the forward measure for which b
i+1
is
the numeraire.
d Inﬁnitesimal diﬀerential.
bp Basis point, 0.01%.
P&L Proﬁt and loss.
xNCy x noncall y option, exercisable on an
underlying with a maturity of x years
from today but callable only after y
years.
Scalar product of vectors.
21
NOTATION xxi
¸, ) Scalar product of vectors; quadratic
crossvariation.
¸) Quadratic variation.
Var Variance.
Cov Covariance.
. Vector length: Square root of sum of
squares of vector entries.
 
F
Frobenius norm, Y
2
F
:= tr(YY
T
)
for matrices Y.
¸ Much smaller than.
T
(Superscript T) Matrix transpose.
22
23
Outline
The purpose of this thesis is to further knowledge of eﬃcient valuation and risk man
agement of interest rate derivatives (mainly of Bermudanstyle but other types are also
included) by extending the theory on market models. Here, we provide an outline of the
thesis. Readers that are nonexperts in the ﬁeld of interest rate derivatives pricing could
skip the outline at ﬁrst reading and return here after reading the introductory Chapter 1.
A schematic outline of the thesis is given in Figure 1.
Chapter 2 investigates various popular calibration choices for the LIBOR market model
and their eﬀect on the quality of risk sensitivities of Bermudan swaptions. The results
show that care should be taken when selecting a calibration method: Certain choices,
e.g., socalled timehomogeneous volatility, may lead to noneﬃcient estimates of risk
sensitivities. Poor and unstable estimates of risk, in turn, lead to ﬂuctuations in a hedge
portfolio that are spurious and have no economic meaning, and the risk associated with
the derivative is not adequately reduced. The results however also show that socalled
constant volatility leads to eﬃcient and stable estimates of risk sensitivities. The combined
results are important and valuable to ﬁnancial institutions that need to select a calibration
method for market models, with the aim to risk manage Bermudan swaptions and other
interest rate derivatives.
Chapter 2 has been published in the Journal of Derivatives, see Pietersz & Pelsser
(2004a). An extended abstract of the chapter has been published in Risk Magazine, see
Pietersz & Pelsser (2004b). The Risk article has been republished as part of a Risk book,
see Pietersz & Pelsser (2005b).
Chapters 3 and 4 solve the same problem in two completely diﬀerent ways. The
problem is the socalled rank reduction of correlation matrices, and occurs as a key part
of calibrating multifactor market models to correlation. Mathematically formulated, rank
reductions of correlation matrices are nonconvex optimization problems, which are known
to be diﬃcult to solve: The problem is to minimize, over lowrank correlation matrices
attainable by the model, an objective value, which is the error with the original given
correlation matrix. We present two elegant solution algorithms. The beneﬁt over existing
algorithms is the enhanced eﬃciency: In terms of computational speed, the algorithms of
Chapters 3 and 4 outperform existing algorithms, in the numerical tests we considered.
24
xxiv OUTLINE
New calculation method
Calculation method A
New calibration method
Calibration method B
Calibration method A
Model A
Performance in practice:
Reduction of risk,
efficiency, etc.
Model B
Perfor
mance in
practice
Ch. 7:
Develop
ment of a
new
model
New model
Ch. 6: Comparison: hedge performance of models
Ch. 5
Ch.s 3&4
Ch. 2:
Comparison:
performance of
calibration
methods
Figure 1: Outline of the thesis. Here, models A and B denote market models and Markov
functional models, respectively. The new model denotes CMS and generic market models.
25
OUTLINE xxv
Chapter 3 presents a solution for rank reduction of correlation matrices, based on
majorization, which is a general technique from optimization. We perform the task of
showing that majorization can be applied to rank reduction of correlation matrices. The
resulting algorithm is globally convergent (i.e., from any starting point) to a local mini
mum. The algorithm is shown to be straightforward to implement, which makes its use
accessible to nonexperts. The majorization algorithm is extremely eﬃcient, because of
its low cost per iterate.
Chapter 3 has been published in Quantitative Finance, see Pietersz & Groenen (2004b).
An extended abstract of the chapter has been published in Risk Magazine, see Pietersz
& Groenen (2004a).
Chapter 4 develops a solution for rank reduction of correlation matrices based on geo
metric programming, which is optimization over curved space (manifolds). The manifold
equivalents of Newton and conjugate gradient optimization algorithms are presented for
the problem of rank reduction of correlation matrices. By carefully selecting the man
ifold, we are able to bring the gradient and Hessian to natural forms, enabling an eﬃ
cient implementation. The geometric curved algorithms enjoy the same superlinear local
convergence properties as their Euclidean ﬂat counterparts: Quadratic convergence for
Newton and msteps quadratic convergence for conjugate gradient, where m denotes the
dimension of the manifold. Additionally, we develop a novel method to immediately check
whether a stationary point is a global minimum, by extending the Lagrange multiplier
results of Zhang & Wu (2003) and Wu (2003). This feature is very rare for nonconvex
optimisation problems, and makes the problem of rank reduction of correlation matrices
all the more interesting. Extensive numerical tests show that geometric programming
compares favourably with other existing algorithms, in terms of computational speed.
Chapter 4 has been submitted. For the working paper version, see Grubiˇsi´c & Pietersz
(2005).
Chapter 5 introduces a new discretization for the LIBOR market model, the Brow
nian bridge discretization. Discretizations are required for implementation of a pricing
algorithm. The beneﬁt of Brownian bridge is its accuracy when single or large time steps
are used. For single time steps, we show that it is leastsquares optimal to use Brow
nian bridge (in a to be deﬁned natural sense). This is also conﬁrmed in the numerical
LIBORinarrears test extended from Hunter, J¨ackel & Joshi (2001). As a multi step dis
cretization, we show that Brownian bridge converges weakly with order one. The multi
step convergence is illustrated by numerical tests. Finally, we show that a single time
step discretization combined with a separability assumption on the volatility, allows for
an even more eﬃcient implementation via pricing on a grid or on a recombining lattice,
instead of Monte Carlo.
26
xxvi OUTLINE
Chapter 5 has been published in the Journal of Computational Finance, see Pietersz,
Pelsser & van Regenmortel (2004). An extended abstract has been published in Wilmott
Magazine, see Pietersz, Pelsser & van Regenmortel (2005).
Chapter 6 presents novel empirical comparisons on the performance of models in terms
of reduction of risk. The proﬁt and loss (P&L) of hedge portfolios of Bermudan swaptions
are recorded for USD swap rates and swaptions data over a one year period. We compare
LIBOR and swap market models, and the Markovfunctional model of Hunt, Kennedy
& Pelsser (2000). The Markovfunctional model is representative of singlefactor models,
such as short rate models. The market models are representative of multifactor interest
rate pricing models. Both market models and Markovfunctional models can be calibrated
to relevant interest rate correlations. Therefore, correlation pricing eﬀects can be captured
in both model types. The three main conclusions of the hedge tests are quite remarkable:
First, delta hedging is compared to delta and vega hedging. Delta hedging is the
oretically justiﬁed by the replication argument of Black & Scholes (1973) and Merton
(1973), of continuous trading in the underlying asset. Vega hedging is the oﬀsetting of
volatility risk by trading in underlying options. Vega hedging is not based on a replica
tion argument, and it is considered a ﬁnancial engineering trick, widely applied by traders
and practitioners. We show that delta and vega hedging signiﬁcantly outperforms delta
hedging, in terms of reduction of variance of P&L.
Second, the algorithm of Longstaﬀ & Schwartz (2001) for estimating the optimal
exercise decision of American options in Monte Carlo is investigated. This algorithm is
required for market models, but not for the Markovfunctional model. We show that the
algorithm contains a discontinuity, which renders convergence of ﬁnite diﬀerence estimates
of risk sensitivities to be slower, see, for example, Glasserman (2004, Section 7.1). The
hedge tests show that the less eﬀective estimation of risk sensitivities adversely aﬀects
reduction of variance of P&L. Moreover, we propose a novel adjustment of the Longstaﬀ &
Schwartz (2001) algorithm, termed constant exercise decision method. With our proposed
modiﬁcation, a far greater reduction of variance of P&L is attained than with the original
algorithm. The reduction is comparable to the reduction in the Markovfunctional model.
Our proposal thus enables market models to function properly as risk management tools
of callable derivatives.
Third, the eﬀect of the number of stochastic factors and correlation speciﬁcation is
investigated. The hedge tests show no signiﬁcant diﬀerences in terms of reduction of
variance of P&L, across: models, number of factors and correlation speciﬁcation.
Finally, the eﬀect of smile on pricing is investigated. Volatility smile is the phenomenon
that diﬀerent Black (1976) implied volatilities are quoted for diﬀerent strikes of otherwise
equal options. The results show that the impact of smile can be much larger than the
impact of correlation. Also, the impact of smile is similar in both market models and
Markovfunctional models.
27
OUTLINE xxvii
Chapter 6 has been submitted. For a working paper version, see Pietersz & Pelsser
(2005a).
In Chapter 7, new CMS and generic market models are developed, which allow for
ease of volatility calibration for a whole new range of derivatives, such as ﬁxedmaturity
Bermudan swaptions and Bermudan CMS swaptions. CMS and generic market mod
els allow for a choice of forward rates other than the classical LIBOR and swap rates
in the LIBOR and swap market models, respectively. We present a theoretical result
with necessary and suﬃcient conditions for an arbitrary structure of forward rates to be
arbitragefree at all possible states of the model. CMS and generic drift terms for the
forward rates are derived, by use of matrix notation, for both terminal and spot measures.
A fast algorithm is presented that approximately, but accurately, calculates forward rates
over time steps for CMS market models.
Chapter 7 has been submitted. For a working paper version, see Pietersz & van
Regenmortel (2005).
28
29
Chapter 1
Introduction
Bermudanstyle interest rate derivatives are an important class of options. Many banking
and insurance products, such as mortgages, cancellable bonds, and life insurance products,
contain Bermudan interest rate options associated with early redemption or cancellation
of the contract. The abundance of these options makes evident that their proper valuation
and risk measurement are important to banks and insurance companies. Risk measure
ment allows for oﬀsetting market risk by hedging with underlying liquidly traded assets
and options.
The purpose of this thesis is to further knowledge of eﬃcient valuation and risk man
agement of Bermudanstyle interest rate derivatives. In this chapter, we provide a historic
background and comprehensive framework for the chapters that are to follow.
The outline of this chapter is as follows. First, we introduce the use of models for
arbitragefree pricing. Second, we brieﬂy describe interest rate markets and options.
Third, we provide an overview of interest rate derivatives pricing models relevant to the
thesis. Fourth, American option pricing with Monte Carlo simulation is discussed.
1.1 Arbitragefree pricing
In this thesis, pricing models produce relative valuations. The relative valuation of an
asset (most often a derivative) is in terms of other asset prices. Pricing models are thus
viewed as an ‘extrapolation tool’ that aim to extrapolate derivative prices from underlying
assets.
Key to relative valuation models is the exclusion of arbitrage. An arbitrage is an
opportunity to make a riskless proﬁt with positive probability, with no costs at time
of execution. If an arbitrage opportunity occurs, then many investors buy the arbitrage
opportunity, driving up the arbitrage price. Eventually, this causes the arbitrage to
30
2 CHAPTER 1. INTRODUCTION
disappear. Arbitrage opportunities are therefore not likely to occur in an eﬃcient and
competitive economy.
When we construct a relative valuation model, then we usually do so by specifying
dynamics of certain base asset prices in a frictionless arbitragefree market. We then
consider, for example, derivatives whose values derive from these base assets. A self
ﬁnancing portfolio is a portfolio without injection or withdrawal of funds. A derivative
that we added to the model is said to be attainable if its payoﬀ can be exactly replicated
by a dynamically managed selfﬁnancing portfolio of the base assets. We call a model
complete if all the derivatives that we added to the model are attainable. In a complete
model, any added derivative is eﬀectively redundant (in this theoretical world), since it
is merely a particular dynamic portfolio of the base assets. The added derivatives are
said to be spanned by the underlying assets. In a complete model, any derivative price
is already known when the current underlying prices and their dynamics are known: The
current derivative price is simply equal to the current value of a replicating portfolio.
Next to absence of arbitrage, we also assume absence of transaction costs: there is
no diﬀerence in price when buying or selling an asset. This assumption obviously does
not reﬂect reality. In the presence of transaction costs, arbitrageurs cannot exploit all
theoretical arbitrage opportunities, since transaction costs make some of these no longer
proﬁtable. However, for large market participants, transaction costs are suﬃciently low
relative to transaction sizes. In eﬀect, the assumption of zero transaction costs is quite
accurate for the market as a whole. Moreover, the assumption of absence of transaction
costs leads to a theory that is still suﬃciently accurate, but much more tractable.
Model dynamics of returns on asset prices, in the realworld measure, consist of two
parts: An average part (drift) and a random part (diﬀusion). The diﬀusion part can be
modelled as either continuous (e.g., Brownian motion) or discontinuous (e.g., jumps as in
Merton (1976)). The use of continuous diﬀusion models is widespread, foremost because
continuous models provide a more than suﬃciently accurate description of reality, and
also because of analytical tractability. This thesis therefore considers only continuous
diﬀusion models. If it is then assumed that asset returns over disjoint time periods are
independent, then it follows from the L´evyKhinchin theorem that the diﬀusion term is a
(time and statedependent) volatility coeﬃcient times a Brownian motion.
We summarize these concepts in terms of stochastic diﬀerential equations (SDEs). We
consider a model with a ﬁltered probability space (Ω, P, T) with ﬁltration T = (T(t))
t≥0
,
on which is deﬁned a Tadapted Brownian motion w. Here, P denotes the realworld
measure. The asset price is denoted by s, its Tadapted drift by µ and its Tadapted
volatility by σ. A realization in Ω is denoted by ω. We have:
ds(t)
s(t)
= µ(t, ω)dt + σ(t, ω)dw(t).
31
1.1. ARBITRAGEFREE PRICING 3
Also, we assume the existence of a money market account with price b and return r(t, ω):
db(t)
b(t)
= r(t, ω)dt.
The market price of risk (or Sharpe ratio, see Sharpe (1964)) is deﬁned to be the
excess average return over the risk free rate divided by the volatility of the asset. In other
words, it is (drift[risk free rate])/volatility and (µ − r)/σ, see, e.g., Baxter & Rennie
(1996, page 119). A suﬃcient condition for noarbitrage is equality of market prices of
risk for all assets, e.g., Hull (2000, Equation (19.6)). The actual levels of market prices of
risk then turn out not to matter for valuation.
We introduce some key concepts of arbitragefree pricing. A numeraire is an asset
with a strictly positive value at all times. Asset prices may be denominated in terms
of amounts of the numeraire. A martingale is a process with zero drift. Suppose we
can construct a new measure (the socalled risk neutral measure) such that all numeraire
expressed asset prices become martingales. It can be shown (e.g., Hunt & Kennedy (2000,
Theorem 7.32)) that the assumption of existence of such a measure automatically implies
that the model is arbitragefree. Moreover, if there exists a single unique risk neutral
measure, then it can be shown (e.g., Hunt & Kennedy (2000, Theorem 7.41)) that the
model is complete, i.e., every derivative security is attainable by a replicating portfolio in
the underlying assets.
Given the assumption of equality of all market prices of risk, we may apply Girsanov’s
theorem (e.g., Øksendal (1998, Theorem 8.6.4)) to construct the riskneutral measure.
Under the risk neutral measure, market prices of risk then turn out to disappear. Therefore
these do not aﬀect arbitragefree pricing.
The martingale property of numerairerelative asset prices implies that their future
expectations take today’s value. If s and n denote prices of an asset and a numeraire,
respectively, then
s(0)
n(0)
= E
_
s(t)
n(t)
_
, for t ≥ 0.
where the expectation is with respect to the riskneutral measure. The price v of a
derivative (also an asset), necessarily sharing the same market price of risk, then satisﬁes
v(0) = n(0)E
_
v(t)
n(t)
_
, for t ≥ 0, (1.1)
which is the fundamental arbitragefree pricing formula, see, e.g., Bj¨ork (2004, Theorem
10.18). We note that the market price of risk does not occur in this formula. If we calculate
(1.1) for a call option on a stock that is modelled as geometric Brownian motion, then we
obtain the famous formula of Black & Scholes (1973).
32
4 CHAPTER 1. INTRODUCTION
In practice, we directly model under a riskneutral measure, by specifying the part
that stems from the realworld measure, the diﬀusion part. Noarbitrage requirements
then fully ﬁx the drift term. In fact, for European call and put options, traders quote
the diﬀusion part: socalled implied volatility. It is the volatility to be used in the Black
Scholes formula to obtain the European option price. A European option is an option
that is exercisable only at a single point in time (usually during a single trading day).
The reason that the derivative price is fully ﬁxed by (1.1) is replication by a self
ﬁnancing portfolio. Suppose amounts δ
s
of s and δ
b
of b are held, then the value v is given
by:
v = sδ
s
+ bδ
b
, (1.2)
The change in value v of a selfﬁnancing portfolio thus satisﬁes (e.g, Joshi (2003a, Equa
tions (5.58) and (5.59))):
dv = δ
s
ds + δ
b
db. (1.3)
In other words, value changes due to trading in the asset s and riskfree asset b cancel
exactly:
sdδ
s
+ bdδ
b
= 0.
A replicating portfolio is a dynamically managed selfﬁnancing portfolio of underlying
assets s and b, which has a value equal to the payoﬀ of derivative v, in all possible future
states of the economy.
The replicating portfolio holds a dynamic amount of δ
s
of the underlying asset s. The
amount δ
s
can be calculated from (1.2), (1.3), and from Itˆo’s formula: If s is a stochastic
process, and v : R →R is a function, then for the process v(s) we have:
dv =
∂v
∂s
ds +
1
2
∂
2
v
∂s
2
d¸s), (1.4)
see, for example, Karatzas & Shreve (1991, Theorem 3.3.3). Here, angle brackets ¸)
denote quadratic variation, see, e.g., Øksendal (1998, Exercise 2.17). By rewriting (1.4)
in the form of (1.3), we can show that
δ
s
=
∂v
∂s
. (1.5)
The quantity δ
s
is called the delta, and it is an example of a risk sensitivity: the risk
sensitivity with respect to the underlying asset price.
1.1.1 Use of models in practice
A dynamic hedge of the derivative v consists of taking an opposite position in the repli
cating portfolio. In practice, the hedge is not rebalanced on a continuous basis, rather at
33
1.1. ARBITRAGEFREE PRICING 5
discrete points in time. Rebalancing usually takes place when delta risk exceeds a cer
tain threshold level. The adaption from continuoustime to discretetime hedging works
extremely well in practice, and forms the basis for the success of arbitragefree pricing
models.
While we use riskneutral pricing models as relative valuation and hedging tools, it is
interesting to note that these models also make assertions on realworld price dynamics,
through the connection with the realworld measure. Riskneutral models can thus also
be viewed as economic models, attempting to model economic reality. Though modelling
of realworld price dynamics is a vital aspect of riskneutral models, it is not the most
important aspect. More important to riskneutral models are:
1. To produce sensible prices of derivatives, and to reproduce prices of underlying
assets and options exactly.
2. To adequately reduce variance of proﬁt and loss (P&L), when a hedge is set up.
3. To eﬃciently produce prices, i.e., within a limited amount of computational time.
The diﬀerence between the use of a model as a hedging tool or economic model has
important implications. For example, consider modelling the term structure of interest
rates. Extensive empirical research has shown that the term structure is driven by more
than one stochastic factors, see the review article of Dai & Singleton (2003). For an
economic model, we should thus use at least two factors, and a singlefactor model is
simply not acceptable. However, for a model used as hedging tool, it is perfectly sensible
to consider a singlefactor model, as long as it satisﬁes the above three properties.
The necessity that pricing models need reproduce prices of underlying assets and
options has two further implications:
First, option price data (implied volatility) determines the diﬀusion part (or, equiv
alently, volatility) of the model, rather than timeseries estimates from historic data on
asset returns. The reason is straightforward: Certain features of a derivative may become
redundant during the life of the derivative, which may render the derivative equivalent to
(or almost equivalent to) a market traded option. In that case, the model should produce
a derivative price equal to the market traded option price, otherwise the ﬁnancial institu
tion holding the derivative can incur arbitrage, which should be avoided. The only way
to avoid arbitrage is to make the model consistent with prices of underlying options, i.e.,
to use implied volatility for the diﬀusion term in the model. The process of making the
model consistent with market prices is called calibration.
Second, a pricing model is recalibrated to the most recent implied volatility data,
whenever the derivative needs be valued. The important reason is again that of no
arbitrage, and is the same as above for the use of implied volatility in models. Implied
34
6 CHAPTER 1. INTRODUCTION
volatility quotes change over time. The practice of recalibration to unpredictably chang
ing volatility is not consistent with most pricing models (excluding stochastic volatility
models), since most models assume volatility to be known over the model time horizon.
When implied volatility changes, then a derivative value may change too, due to the
practice of recalibration. As a result, derivative traders face volatility risk (vega). The
risk may be oﬀset by vega hedging. If σ denotes the volatility, and o the price of an
underlying option, then the following portfolio has zero volatility risk (for small changes
in σ):
one derivative (v) and −
∂v/∂σ
∂o/∂σ
options (o). (1.6)
A delta hedge with the underlying asset as in (1.5) can then be applied to the veganeutral
portfolio in (1.6).
Vega hedging is outofmodel hedging, since we hedge parameters that are input to
the model. Delta hedging with the underlying asset in (1.5) is inmodel hedging, since
the underlying asset price is a state variable of the model. Nonetheless, vega hedging
is not inconsistent with arbitragefree pricing models: We are only holding a diﬀerent
portfolio of derivatives and options that needs to be delta hedged. From an arbitragefree
pricing perspective though, there is simply no need to add the additional options (though
such addition is allowed): the original derivative is already perfectly deltareplicable in
the theoretical model world. In practice however, vega hedging enables a signiﬁcant
additional reduction of variance of P&L, and it thus contributes to wealth preservation.
Another practice for arbitragefree pricing models is that of customizing a model to a
certain product. We construct the model in such way that all parts of economic reality,
relevant to the product, are incorporated into the model. The beneﬁt is that the product
is priced correctly, while not having to fully model all parts of the market, thereby often
attaining a more eﬃcient implementation.
1.2 Interest rate markets and options
In interest rate markets, participants trade primarily in interest rate agreements. An
interest rate agreement is an agreement to borrow or lend money, over an agreed period
of time, against agreed periodical payments (interest rate payments) that are in some
form denoted as a percentage (interest rate) of the underlying borrowed or lent amount
(notional amount).
We refer to the length of an interest rate agreement as its tenor. Diﬀerent tenors may
attract diﬀerent interest rates, which gives rise to the socalled term structure of interest
rates.
35
1.2. INTEREST RATE MARKETS AND OPTIONS 7
The above description of an interest rate agreement includes money market deposits,
bonds, forward interest rate agreements, and swaps. Money market deposits, bonds, and
swaps are of particular relevance to this thesis, therefore we explain their workings in
some detail.
1.2.1 Linear products: Deposits, bonds, and swaps
Money market deposits usually have a maturity of one year or less. The two parties
agree on an interest rate and one party deposits the notional amount. At the end of
the agreement (at maturity), the other party returns the notional and makes the agreed
interest rate payments.
A bond is an agreement between two parties, the borrower and the lender, on a
designated notional amount. At initiation, the borrower receives a prenegotiated amount
for the bond (not necessarily equal to the notional amount). During the life of the bond,
the borrower makes coupon payments on the notional amount, usually on the basis of a
ﬁxed contractually agreed rate, but the coupon payments could also be based on a ﬂoating
interest rate. By a ﬂoating interest rate, we mean the prevailing market interest rate for
the tenor spacing between the ﬂoating interest rate payments. We discuss the method for
determining this ﬂoating interest rate below. If there are no coupon payments during the
life of the bond, then we call such a bond a zero coupon bond. At maturity of a bond, the
borrower returns the notional amount to the lender.
Interest rate swaps typically have a maturity of two years or more. Interest rate swaps
involve only exchanges of interest rate payments, but normally do not involve exchanges
of notional. The two parties agree on an interest rate. Periodically, one party pays this
agreed interest rate (the ﬁxed rate), while the other party pays a ﬂoating interest rate. We
remark that the frequency of ﬁxed and ﬂoating payments may diﬀer. Typical frequencies
are annually, semiannually, quarterly, and monthly.
The ﬁxed rate at which market participants can enter into a swap agreement at other
wise zero cost is called the swap rate. Swap rates can be seen as long term borrowing and
lending rates. In fact, the swap rate for a swap with a particular tenor is more or less the
interest rate for that particular tenor. The reason is that a swap can be used to create a
synthetic borrowing or lending agreement at a single interest rate over the tenor period
of the swap: Suppose we borrow money through deposits with ﬂoating interest rates, and
we enter into a swap in which we pay ﬁxed, on a notional equal to the amount borrowed
from the deposit. At the end of each deposit, we borrow the same amount again in the
deposit market, in order to pay back the notional from the previous deposit agreement.
Rolling over money market deposits in such way, the resulting deposit interest payments
cancel against the ﬂoating interest we receive from the swap; the ﬁxed swap payments
remain. Eﬀectively, we then pay a ﬁxed interest rate on our loan over the life of the swap.
36
8 CHAPTER 1. INTRODUCTION
The two parties in a swap determine the ﬂoating interest rate usually via a reference
interest rate. Reference rates are used to calculate payments not only of swaps, but also
of other securities, such as interest rate derivatives. A reference interest rate is a rate
that is set by a ﬁnancial authority or calculation agent. Examples are:
LIBOR: London interbank oﬀered rate, published by the British Bankers’ Association
(BBA), each trading day at noon (12.00am) London time.
EURIBOR: Euro interbank oﬀered rate, published by the European Banking Federation
(FBE) and by the Financial Markets Association (ACI), each trading day at around
11.00am central European time.
These reference rates are published for several tenors and currencies. Upon publication
of the reference rate, practitioners say that the rate then ﬁxes.
Financial authorities determine reference rates normally along the following lines: A
number of panel banks are consulted. Each panel bank provides rates at which it conceives
it possible to borrow money in the interbank market, for various tenors and currencies.
For each tenor and currency, some percentile of the top and bottom of the quotes are
discarded. The remaining quotes are averaged to form the reference rate for that tenor
and currency. It is interesting to note there is now an interest rate derivatives pricing
model that bears the name of a reference rate: the LIBOR market model, see Section 1.3.
An interesting note is that the ﬁrst major swap took place only in 1981, between IBM
and the World Bank, see Valdez (1997, pages 269–270).
1.2.2 Interest rate options: Caps, ﬂoors, and swaptions
The plainvanilla European interest rate options most relevant to this thesis are (i) caps
and ﬂoors, and, (ii) swaptions. A cap consists of a sequence of consecutive caplets, and,
likewise, a ﬂoor consists of a sequence of consecutive ﬂoorlets. Caplets and ﬂoorlets are
call and put options, respectively, on LIBOR rates. Swaptions are options on swap rates.
A caplet (respectively, a ﬂoorlet) gives its holder at expiry the right, but not the
obligation, to enter into a borrowing deposit (lending deposit) at a prearranged strike
rate. If an option holder claims the option right, then we say that he or she exercises
the option. If LIBOR ﬁxes below (above) the respective strike rate, then it is cheaper to
borrow (more rewarding to lend) in the market; whereby it is sensible not to exercise the
caplet (ﬂoorlet) and it ends worthless. If LIBOR ﬁxes above (below) the respective strike
rate, then it is sensible to exercise the caplet (ﬂoorlet), since we then receive the positive
diﬀerence LIBOR minus strike (strike minus LIBOR) at the deposit payment date. The
option gains at the deposit payment date as dependent on realized LIBOR are displayed
in Figure 1.1.
37
1.2. INTEREST RATE MARKETS AND OPTIONS 9
Strike
rate
LIBOR→
Strike
rate
LIBOR→
Caplet Floorlet
Figure 1.1: Payoﬀs of caplets and ﬂoorlets versus realized LIBOR.
38
10 CHAPTER 1. INTRODUCTION
A European swaption gives its holder at expiry the right, but not the obligation, to
enter into a swap with a ﬁxed rate equal to a prearranged strike rate. Market participants
invariably indicate the direction of swap cash ﬂows from the point of view of whether ﬁxed
payments are payed or received. Thus, if we hold a swaption that gives us the right to
enter into a swap for which we pay or receive ﬁxed, then such a swaption is a payer or
receiver swaption, respectively. A payer (respectively, a receiver) swaption corresponds
to a call option (put option). The payoﬀ structures for payer and receiver swaptions are
similar to those of caplets and ﬂoorlets in Figure 1.1: instead of ‘LIBOR’ read ‘swap rate’,
and instead of ‘caplet’ or ‘ﬂoorlet’ read ‘payer swaption’ or ‘receiver swaption’.
Cash settled contracts diﬀer from normal option contracts, in that they pay the relevant
diﬀerence between realized rate and strike, if this is positive. For a cash settled swaption
at expiry, both parties need to agree on ‘the’ swap rate manifest in the market. Usually,
again a reference rate is used. An example of a reference swap rate is ISDAFIX, published
for various tenors and currencies by the International Swaps and Derivatives Association
(ISDA). Financial authorities calculate swap reference rates more or less in the same way
as deposit reference rates are set; via consultation of a group of panel banks.
From Figure 1.1, we ﬁnd that an option always provides a nonnegative cash ﬂow, with a
positive probability to provide a positive cash ﬂow, therefore we require a positive premium
for the option. To calculate cap and ﬂoor or swaption premiums, market participants
initially used a Blacktype formula that is based on assuming a lognormal distribution for
the LIBOR or swap rate at expiry. This approach however lacked a theoretical justiﬁcation
for long, but the approach later turned out to be valid. Moreover, an assumption of
jointly lognormal LIBOR and swap rates is inconsistent. Many researchers therefore
considered the use of the Black formula for caps and ﬂoors or swaptions to be unsound
for a considerable period: While the Black swaptions approach had already been justiﬁed
in 1990, articles establishing its validity kept appearing at least until 1997, see Rebonato
(2004a, Section 4(d)).
We present the Black approach for caps and ﬂoors; the approach is similar for swap
tions. Prior to that, we introduce some terminology: We set out by specifying a tenor
structure, 0 = t
0
< t
1
< < t
n+1
. Let α
i
denote the day count fraction for the period
[t
i
, t
i+1
]. A discount bond is a hypothetical security that pays one unit of currency at its
maturity, and has no other cash ﬂows. The timet price of a discount bond with maturity
t
i
is denoted by b
i
(t).
To present the Black approach for caps and ﬂoors, we consider two tenor points 0 <
t
1
< t
2
. LIBOR ﬁxes at time t
1
and interest is paid at time t
2
. We deﬁne the forward
LIBOR rate f
1
(t) by
f
1
(t) =
b
1
(t) −b
2
(t)
α
1
b
2
(t)
. (1.7)
39
1.3. INTEREST RATE DERIVATIVES PRICING MODELS 11
We consider the forward measure, which is the measure associated with the discount bond
b
2
maturing at the payment date of the LIBOR deposit. By the assumption of absence
of arbitrage, it follows that f
1
is a martingale under its forward measure. To see this,
note that f
1
(t) in (1.7) is the value of a portfolio ((b
1
(t) − b
2
(t))/α
1
) expressed in terms
of amounts of the numeraire b
2
(t).
Continuing, we assume that the forward LIBOR rate is a lognormal martingale under
its forward measure:
df
1
(t)
f
1
(t)
= σ
1
dw(t), equivalently, f
1
(t) = f
1
(0) exp
_
−
1
2
σ
2
1
t + σ
1
w(t)
_
, (1.8)
where σ
1
is a scalar constant. The payoﬀ v(t
2
) at time t
2
of a caplet with strike rate k is
then given by α
1
max(f
1
(t
1
) −k, 0). From (1.1), we ﬁnd for the caplet value v(0):
v(0) = n(0)E
_
v(t
2
)
n(t
2
)
_
= b
2
(0)E
_
α
1
max(f
1
(t
1
) −k, 0)
b
2
(t
2
)
_
(∗)
= α
1
b
2
(0)E
_
max(f
1
(t
1
) −k, 0)
¸
. (1.9)
Equality (∗) holds since b
2
(t
2
) = 1. If we calculate (1.9) in full, we obtain the formula of
Black (1976):
v(0) = α
1
b
2
(0)η
_
f
1
(0)φ
_
ηd
1
_
−kφ
_
ηd
2
_
_
,
d
1,2
=
ln
_
f
1
(0)
k
_
±
1
2
σ
2
1
t
1
σ
1
√
t
1
.
Here, η denotes +1 for a caplet, and −1 for a ﬂoorlet; φ() denotes the cumulative normal
distribution function.
1.3 Interest rate derivatives pricing models
Up to here we have examined options on a single interest rate, such as caps, ﬂoors and
European swaptions. There are however many interesting interest rate derivatives that
depend not only on a single interest rate, but on multiple interest rates. Examples in
clude Bermudanstyle interest rate derivatives. Bermudan means that the derivative is
exercisable (equivalently: callable) at multiple discrete time points, usually separated by,
e.g., annual or semiannual periods.
Exercise of a Bermudan derivative is a tradeoﬀ between taking the option gains now
or holding onto the option at possibly more favourable option gains later. Inherently,
values of Bermudan interest rate derivatives therefore depend on multiple interest rates.
There are of course also many nonBermudan interest rate derivatives that dependent on
40
12 CHAPTER 1. INTRODUCTION
multiple interest rates. To value such multirate dependent products, we need a model
that features dynamics for the whole term structure of interest rates. Preferably, such
dynamic term structure models need be consistent with the Black formula for caps, ﬂoors
and swaptions.
1.3.1 Short rate models
Historically, the ﬁrst dynamic term structure models are short rate models. The short
rate r is a hypothetical rate: it is the instantaneous rate of interest for the ﬂoating money
market account (equivalently: bank account) with value n:
dn
n
= rdt, equivalently, n(t) = n(0) exp
__
t
0
r(s)ds
_
.
In a short rate model, we select the bank account as numeraire. Discount bond prices
satisfy, by the fundamental arbitragefree pricing formula (1.1):
b
i
(t, r) = E
_
b
i
_
t
i
, r(t
i
)
_
. ¸¸ .
=1
n(t)
n(t
i
)
¸
¸
¸
¸
r(t) = r
_
= E
_
exp
_
−
_
t
i
t
r(s)ds
_¸
¸
¸
¸
r(t) = r
_
, for t < t
i
.
(1.10)
From (1.10), we can calculate discount bond prices, once the arbitragefree dynamics of
the short rate are known. For most mainstream short rate models, we can ﬁnd explicit
and analytical formulas for (1.10) for discount bond prices given the associated short rate.
Short rate models are characterized by their speciﬁcation of dynamics for the short rate.
Examples of short rate models are given in Table 1.1 (this table is not complete).
As can be seen from Table 1.1, there are many short rate models. Next to short rate
models, there are many more other interest rate derivatives pricing models. The reason for
this abundance of interest rate models stems from diﬀerent speciﬁcations of the interest
rate market, as explained below.
To specify discount bond price dynamics, we need only specify the volatility term σ
(b)
i
,
the drift term then follows from noarbitrage restrictions, as explained in Section 1.1.
Therefore, we omit the drift, and focus on the diﬀusion term:
db
i
b
i
= + σ
(b)
i
(t, ω)dw(t). (1.11)
The bond price volatility may thus be statedependent.
A set of discount bond prices may be alternatively given by a set of interest rates. For
example, discount bond prices may be given (implicitly or explicitly) in terms of a set of
forward LIBOR rates f = (f
1
, . . . , f
n
):
1 + α
i
f
i
=
b
i
b
i+1
, (1.12)
41
1.3. INTEREST RATE DERIVATIVES PRICING MODELS 13
Table 1.1: Some short rate models and their speciﬁcation of short rate dynamics. Here,
the scalars a, b, and c denote model parameters.
Model Speciﬁcation
Merton (1973) dr = bdt + σdw
Vasicek (1977) dr = (b −ar)dt + σdw
Dothan (1978) dr = ardt + σrdw
Brennan & Schwartz (1979) dr = (b −ar)dt + σrdw
Cox, Ingersoll & Ross (1985) dr = (b −ar)dt + σ
√
rdw
Ho & Lee (1986) dr = θ(t)dt + σdw
Hull & White (1990) dr = (θ(t) −ar)dt + σdw
dr = (θ(t) −ar)dt + σ
√
rdw
Black, Derman & Toy (1990) dr = θ(t)rdt + σrdw
Black & Karasinski (1991) dr = (ar −br log r)dt + σrdw
Pearson & Sun (1994) dr = (b −ar)dt + σ
√
r −c dw
or in terms of the short rate r, see (1.10). Dynamics for a set of forward LIBOR rates or
for the short rate give rise to dynamics for discount bond prices, and vice versa. In fact, it
is the requirement of deterministic and known volatility for an interest rate speciﬁcation
that determines a model. Thus, if we require volatility to be only time dependent σ(t),
and not state dependent, for one of the speciﬁcations, then we obtain stochastic volatility
for the other speciﬁcations:
db
i
b
i
= + σ
(b)
i
(t) dw(t) ⇒
_
_
_
dr
r
= + σ
(r)
(t, ω) dw(t)
df
i
f
i
= + σ
(f)
i
(t, ω) dw(t)
(1.13)
dr
r
= + σ
(r)
(t) dw(t) ⇒
_
_
_
df
i
f
i
= + σ
(f)
i
(t, ω) dw(t)
db
i
b
i
= + σ
(b)
i
(t, ω) dw(t)
(1.14)
df
i
f
i
= + σ
(f)
i
(t) dw(t) ⇒
_
_
_
db
i
b
i
= + σ
(b)
i
(t, ω) dw(t)
dr
r
= + σ
(r)
(t, ω) dw(t)
(1.15)
Speciﬁcation (1.13) corresponds to the extension of Black & Scholes (1973) from stocks
to bonds, (1.14) corresponds to short rate models, and (1.15) to the LIBOR market model.
We restrict our exposition to these model classes. More speciﬁcations are available: in
fact, Rebonato (2004a, Section 3) lists ﬁve speciﬁcations.
42
14 CHAPTER 1. INTRODUCTION
Bond options can be viewed as caplets and ﬂoorlets, see Hull (2000, Equation 20.10).
The straight extension of the model of Black & Scholes (1973) from stocks to bonds, i.e.,
deterministic and known instantaneous bond volatility, suﬀers however from the problem
that discount bond prices do not necessarily converge to one at maturity. There are also
other related problems, see Rebonato (2004a, Section 4(b)). Therefore a direct application
of Black & Scholes (1973) to bond prices yields an interest rate derivatives pricing model
with many undesirable features.
The initial success of short rate models is mainly due to their analytical tractability
and numerical eﬃciency. There are, however, also some drawbacks to short rate models:
They are, in a sense, diﬃcult to calibrate, as model parameters need be implied from mar
ket option prices via nonstraightforward numerical procedures. The resulting numerical
calibration procedures can be instable and computationally costly. The reason is that
short rate models are formulated in terms of an artiﬁcial short rate that is not directly
observable in the market. Moreover, deterministic volatility for an abstract short rate in
(1.14) does not correspond to market practice of quoting implied volatility for LIBOR
and swap rates, see Section 1.2. Consequently, model parameters need to be tweaked to
ensure the model ﬁts to the relevant market rates and volatilities.
An example of the indirect calibration of short rate models is when they are calibrated
to swaption volatility: we then have to resort to the formula of Jamshidian (1989): A
swaption is viewed as an option on a coupon paying bond. Jamshidian (1989) decomposes
the option on the coupon paying bond into several options on discount bonds.
Another disadvantageous feature of short rate models is that they produce an arbitrary
volatility smile. Volatility smile is the phenomenon that diﬀerent implied volatility is
quoted for options that have diﬀerent strikes but that are otherwise identical. The classical
model of Black & Scholes (1973) exhibits a socalled ﬂat volatility smile, in the sense that
the implied volatility is independent of strike. We thus expect from any interest rate
model that aims to be the equivalent of Black & Scholes (1973) for interest rates to also
produce a ﬂat volatility smile. Such is, as stated before, not the case for short rate models.
Smile volatility is more realistic than ﬂat volatility, since we can observe a pronounced
volatility smile in interest rate markets. However, the produced smile ought to correspond
also qualitatively to the observed smile, and such is not the case for short rate models:
the latter exhibit rather arbitrary smiles, and smile shapes are not controllable by model
users, at least not without further modiﬁcation.
Typically, short rate models have only a single stochastic driver, although some two
factor short rate models exist too, see, for example, Longstaﬀ & Schwartz (1992) and
Ritchken & Sankarasubramanian (1995). An advantage of single factor models is that
recombining lattices can be used to eﬃciently price mildly pathdependent derivatives,
including Americantype options. A disadvantage is the resulting diﬃculty to model
instantaneous decorrelation.
43
1.3. INTEREST RATE DERIVATIVES PRICING MODELS 15
For a more extensive discussion on advantages and disadvantages of short rate models
(e.g., positivity of interest rates and nonexplosiveness of short rate models), the reader
is referred to Cairns (2004, Section 4.1).
1.3.2 Market models
In recent years, a successful class of models has appeared in the literature known as market
models (LIBOR and swap market models, also referred to as BGM models, see Brace,
G¸ atarek & Musiela (1997), Miltersen, Sandmann & Sondermann (1997) and Jamshidian
(1997)) or string models (see SantaClara & Sornette (2001) and Longstaﬀ, SantaClara &
Schwartz (2001)). Kerkhof & Pelsser (2002) show that the two formulations are equivalent.
Market models correspond to speciﬁcation (1.15). For an arbitragefree construction of
the LIBOR market model starting from discount bond dynamics (1.11), see Pietersz (2001,
Section 3.2). It is speciﬁcation (1.15) with which traders quote prices of underlying options
(caps and swaptions). Market models therefore allow for straightforward calibration to
prices of underlying options: the model parameter is simply equal to the market quoted
volatility.
Market models are based on the forward measure technique for caplets, presented in
Section 1.2. Moreover, market models are the equivalent of Black & Scholes (1973) for
interest rates, because the LIBOR and swap market model produce ﬂat volatility for caps
and swaptions, respectively. Positivity of interest rates is guaranteed for the deterministic
volatility LIBOR market model, since forward LIBOR rates are lognormal martingales.
In LIBOR market models, forward swap rates are generally not lognormally dis
tributed. This seems to imply that the LIBOR market model produces nonﬂat swaption
volatility, and is thus not a canonical model for swaptions. Such deviation from the log
normal paradigm however turns out to be extremely small in the LIBOR market model,
see Chapter 2.5. Fortunately for the LIBOR market model, there also exist extremely
accurate approximate formulas for swaption implied volatility. Consequently, the LIBOR
market model can be calibrated to swaption volatility. This joint capswaption calibration
potential of the LIBOR market model has very much contributed to its success. An inter
esting use of the approximate swaption volatility calibration via semideﬁnite programming
is given in Brace & Womersley (2000) and D’Aspremont (2003).
The ease of calibration of market models allows for modelling of other market aspects,
such as time homogeneity of cap and swaption volatility. Time homogeneity of volatil
ity means that the model preserves the cap or swaption volatility curves as modeltime
progresses. In Chapter 2, we investigate the eﬀect of including such time homogeneous
calibration procedures on the quality of risk sensitivities produced by the model.
Market models correspond to (1.15), and are thus based on a set of forward rates.
These forward rates are, in fact, the state variables of the model. The number of state
44
16 CHAPTER 1. INTRODUCTION
variables can grow large for particular trades: For a thirty years model with semiannual
forward rates, the model has sixty forward rates. This means that market models have
large dimensionality. We address reduction of dimensionality in Chapters 3 and 4.
Due to the dimensionality of market models, Monte Carlo simulation has to be used to
value derivatives, which is noneﬃcient compared to recombining lattices used in short rate
models. Technological computer hardware developments have however recently enabled
the use of Monte Carlo as a suﬃciently eﬃcient pricing and risk management tool. In
Chapter 5, we study an eﬃcient and accurate approximation of the LIBOR market model
that enables pricing on a recombining lattice.
Though some derivatives can be valued without Monte Carlo when a short rate or
Markovfunctional model is used, the trend shows a growing complexity in derivatives.
Certain derivatives have become so complex and strongly pathdependent, that these can
only be valued with Monte Carlo anyway, whether a low factor (short rate or Markov
functional) or multi factor model is used.
Volatility smile can be incorporated in the LIBOR market model, see, for example,
Andersen & Andreasen (2000).
The traditional LIBOR market model is based on a set of forward LIBOR rates.
Jamshidian (1997) extends the market model technology to a set of forward swap rates
that coend at the same ﬁnal date. Hunt & Kennedy (2000, Section 18.4) and Galluccio &
Hunter (2004) extend with a set of forward swap rates that costart at the same initial date,
which is useful for, e.g., European options on interest rate spreads. An interest rate spread
is a diﬀerence between separate interest rates. In Chapter 7, we extend market models
to include forward constant maturity swap (CMS) rates and other generic speciﬁcations
of rates. The derivation methodology is fully generic and includes the previous market
model speciﬁcations. The CMS market model is key to the pricing of, for example, ﬁxed
maturity Bermudan swaptions and Bermudan CMS swaptions.
1.3.3 Markovfunctional models
A model class that combines some of the features of short rate models and market models
is the Markovfunctional model of Hunt et al. (2000), see also Hunt & Kennedy (2000,
Section 19) and Pelsser (2000, Section 9). Markovfunctional models can be calibrated
to interest rate option volatility much like market models, and they do not suﬀer from
the drawback of short rate models of producing an arbitrary smile: Volatility smile is
very much controllable in Markovfunctional models, and a ﬂat volatility smile can be
achieved. Moreover, Markovfunctional models do not suﬀer from the computational
burden of market models, and pricing can be eﬃciently performed on, e.g., a grid.
The numerical eﬃciency of pricing on a grid comes however with a price: a single
stochastic Markov driver implies that the model has diﬃculty in attaining instantaneous
45
1.4. AMERICAN OPTION PRICING WITH MONTE CARLO SIMULATION 17
decorrelation between interest rates. A question that has not yet been addressed in the
academic literature so far, is whether a lack of decorrelation causes a signiﬁcant impact
on the pricing and hedge performance of a model. In Chapter 6, we address this research
question by empirical comparison of market models and Markovfunctional models, in
terms of pricing and hedge performance.
1.4 American option pricing with Monte Carlo sim
ulation
American options feature a period during which the option can be exercised. Bermudan
options can be exercised at several discrete points in time. Bermudan options are thus
somewhat in between European and American options.
The valuation of American options typically involves a backward induction routine,
while Monte Carlo simulation is of forward induction type. An eﬃcient algorithm for
pricing American options in Monte Carlo was therefore not known in the literature for
long. The problem is that we need to know, at a simulation node, the value of holding
onto the option. This conditional expectation value is not known, and to calculate it
would, in principle, require simulation within simulation, which is most ineﬃcient. Only
recently has eﬃcient American option valuation with Monte Carlo been enabled by novel
regressionbased methods, see, for example, Longstaﬀ & Schwartz (2001). The key is
to make use of crosssectional information present in the simulation, by regressing the
holdon value onto functions of explanatory variables present in the simulation. The
regressionbased approximate holdon value may then be used to formulate an exercise
decision.
The regression based techniques have been generalized to stochastic mesh methods
by Broadie & Glasserman (2004) and Avramidis & Matzinger (2004). Other American
option pricing techniques include the dual approach of Rogers (2002), see also Jamshidian
(2003), and the highdimensional grid approach of Berridge & Schumacher (2003).
In Chapter 6, we show that the regressionbased algorithm of Longstaﬀ & Schwartz
(2001) leads to ineﬃciently estimated risk sensitivities. We propose a modiﬁcation of
the algorithm, deemed the constant exercise method, that enhances the quality of risk
sensitivity estimates.
46
47
Chapter 2
Riskmanaging Bermudan swaptions
in a LIBOR model
1
This chapter presents a new approach to calculating swap vega per bucket in a LIBOR
model. It shows that for some forms of volatility an approach based on recalibration may
make estimated swap vega very uncertain, as the instantaneous volatility structure may
be distorted by recalibration. This does not happen in the case of constant swap rate
volatility.
An alternative approach not based on recalibration comes out of comparison with
the swap market model. It accurately estimates vegas for any volatility function in few
simulation paths. The key to the method is that the perturbation in LIBOR volatility
is distributed in a clear, stable, and wellunderstood fashion, while in the recalibration
method the change in volatility is hidden and potentially unstable.
2.1 Introduction
The LIBOR interest rate model discussed in Section 1.3.2 is popular among both aca
demics and practitioners alike. We will call this the BGM model.
One reason the LIBOR BGM model is popular is that it can riskmanage interest rate
derivatives that depend on both the cap and swaption markets, which would make it a
central interest rate model. It features lognormal LIBOR and almost lognormal swap rates,
1
This chapter has been published in diﬀerent form as Pietersz, R. & Pelsser, A. A. J. (2004a), ‘Risk
managing Bermudan swaptions in a LIBOR model’, Journal of Derivatives 11(3), 51–62. An extended
abstract of this chapter appeared as Pietersz, R. & Pelsser, A. A. J. (2004b), ‘Swap vega in BGM:
pitfalls and alternatives’, Risk Magazine pp. 91–93. March issue. This Risk article was republished
as part of a Risk book, as Pietersz, R. & Pelsser, A. A. J. (2005b), Swap vega in BGM: pitfalls and
alternatives, in N. Dunbar, ed., ‘Derivatives Trading and Option Pricing’, Risk Books, London, UK,
pp. 277–285.
48
20 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
and thus also the marketstandard Black formula for caps and swaptions. Approximate
swaption volatility formulas such as in Hull & White (2000) have been shown to be of
high quality (see Brace, Dun & Barton (1998)).
There remain a number of issues to be resolved to use BGM as a central interest rate
model. One issue is the calculation of swap vega. A common and usually very successful
method for calculating a Greek in a model equipped with a calibration algorithm is to
perturb market input, recalibrate, and then revalue the option. The diﬀerence in value
divided by the perturbation size is then an estimate for the Greek.
If this technique is applied to the calculation of swap vega in the LIBOR BGM model,
however, it may (depending on the volatility function) yield estimates with high uncer
tainty. In other words, the standard error of the vega is relatively high. The uncertainty
disappears, of course, if we increase the number of simulation paths, but the number
required for clarity can far exceed 10,000, which is probably the maximum in a practical
environment.
For a constantvolatility calibration, however, the vega is estimated with low uncer
tainty. The number of simulation paths needed for clarity of vega thus depends on the
chosen calibration. The reason is that for certain calibrations, under a perturbation,
the additional volatility is distributed unevenly and one might even say unstably over
time. For a constantvolatility calibration, of course, this additional volatility is natu
rally distributed evenly over time. It follows that there is higher correlation between the
discounted payoﬀs along the original path and perturbed volatility. As the vega is the
expectation of the diﬀerence between these payoﬀs (divided by the perturbation size), the
standard error will be lower.
We develop a method that is not based on recalibration to compute swap vega per
bucket in the LIBOR BGM model. It may be used to calculate swap vega in the presence
of any volatility function, with predictability at 10,000 or fewer simulation paths. The
strength of the method is that it accurately estimates swap vegas for any volatility function
and in few simulation paths.
The key to the method is that the perturbation in the LIBOR volatility is distributed
in a clear, stable, and wellunderstood fashion, while in the recalibration method the
change in volatility is hidden and potentially unstable. The method is based on keeping
swap rate correlation ﬁxed but increasing the instantaneous volatility of a single swap
rate evenly over time, while all other swap rate volatilities remain unaltered.
It is important to verify that a calculation method reproduces the correct numbers
when the answer is known. We benchmark our swap vega calculation method using
Bermudan swaptions for two reasons. First, a Bermudan swaption is a complicated enough
(swapbased) product (in a LIBORbased model) that depends nontrivially on the swap
rate volatility dynamics; for example, its value depends also on swap rate correlation.
Second, a Bermudan swaption is not as complicated as some other more exotic interest
49
2.2. RECALIBRATION APPROACH 21
rate derivatives, and some intuition exists about its vega behavior. We show for Bermudan
swaptions that our method yields almost the same swap vega as found in a swap market
model.
Glasserman & Zhao (1999) provide eﬃcient algorithms for calculating risk sensitivities,
given a perturbation of LIBOR volatility. Our problem diﬀers from theirs in that we
derive a method to calculate the perturbation of LIBOR volatility to obtain the correct
swap rate volatility perturbation for swaption vega. The Glasserman and Zhao approach
may then be applied to eﬃciently compute the swaption vega, with the LIBOR volatility
perturbation we ﬁnd using our method.
2.2 Recalibration approach
We ﬁrst consider examples of the recalibration approach to computing swap vega. Three
calibration methods are considered. We show that, for two of the three methods, the
resulting vega is hard to estimate and many simulation paths are needed for clarity.
Each forward rate is modeled as a geometric Brownian motion under its forward
measure:
df
i
(t)
f
i
(t)
= σ
i
(t) dw
(i+1)
(t), for 0 ≤ t ≤ t
i
,
The positive integer d is referred to as the number of factors of the model. The
function σ
i
: [0, t
i
] → R
d
is the volatility vector function of the ith forward rate. The
kth component of this vector corresponds to the kth Wiener factor of the Brownian
motion. w
(i+1)
is a ddimensional Brownian motion under the forward measure Q
(i+1)
.
A discount bond pays one unit of currency at maturity. From (1.12), the forward rates
are related to discount bond prices as follows:
f
i
(t) =
1
α
i
_
b
i
(t)
b
i+1
(t)
−1
_
.
The swap rate corresponding to a swap starting at t
i
and ending at t
j
is denoted by
s
i:j
. The swap rate is related to discount bond prices as follows:
s
i:j
(t) =
b
i
(t) −b
j
(t)
p
i:j
(t)
,
where p denotes the present value of a basis point:
p
i:j
(t) =
j−1
k=i
α
k
b
k+1
(t). (2.1)
It is understood that p
i:j
≡ 0 whenever j ≤ i.
50
22 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
We consider the swap rates s
1:n+1
, . . . , s
n:n+1
corresponding to the swaps underlying a
coterminal Bermudan swaption.
2
Swap rate s
i:n+1
is a martingale under its forward swap
measure Q
(i:n+1)
. We may thus implicitly deﬁne its volatility vector σ
i:n+1
by:
ds
i:n+1
(t)
s
i:n+1
(t)
= σ
i:n+1
(t) dw
(i:n+1)
(t), for 0 ≤ t ≤ t
i
. (2.2)
In general, σ
i:n+1
will be stochastic because swap rates are not lognormally distributed
in the BGM model, although they are very close to lognormal as shown, for example, by
Brace et al. (1998). Because of near lognormality, the Black formula approximately holds
for European swaptions. There are closedform formulas for the swaptions Black implied
volatility; see, for example, Hull & White (2000).
We model LIBOR instantaneous volatility as constant in between tenor dates (piecewi
seconstant). A volatility structure ¦σ
i
()¦
n
i=1
is piece wiseconstant if:
σ
i
(t) = (const), t ∈ [t
i−1
, t
i
).
The volatility will sometimes be modeled as timehomogeneous. To deﬁne this, ﬁrst
deﬁne a ﬁxing to be one of the time points t
1
, . . . , t
n
. Deﬁne i : [0, t
n
] →¦1, . . . , n¦:
i(t) = 1 + #
_
ﬁxings in [0, t]
_
. (2.3)
A volatility structure is said to be timehomogeneous if it depends only on the index to
maturity i −i(t).
Three volatility calibration methods are considered:
1. (THFRV)—Timehomogeneous forward rate volatility. This approach is based on
ideas of Rebonato (2001). Because of the timehomogeneity restriction, there are as
many parameters as market swaption volatilities. A NewtonRhapson sort of solver
may be used to ﬁnd the exact calibration solution (if there is one).
2. (THSRV)—Timehomogeneous swap rate volatility. The algorithm for calibrating
with such a volatility function is a twostage bootstrap. The ﬁrst and the second
stage are described in Equation (6.20) and Section 7.4 of Brigo & Mercurio (2001).
3. (CONST)—Constant forward rate volatility. The corresponding calibration algo
rithm is similar to the second stage of the twostage bootstrap. We note that
constant forward rate volatility implies constant swap rate volatility.
2
A coterminal Bermudan swaption is an option to enter into an underlying swap at several exercise
opportunities. The holder of a Bermudan swaption has the right at each exercise opportunity to either
enter into a swap or hold the option; all the underlying swaps that may possibly be entered into have the
same ending date.
51
2.2. RECALIBRATION APPROACH 23
Table 2.1: Market European swaption volatilities.
Expiry (Y) 1 2 3 . . . 28 29 30
Tenor (Y) 30 29 28 . . . 3 2 1
Swaption
Volatility 15.0% 15.2% 15.4% . . . 20.4% 20.6% 20.8%
All calibration methods have in common that the forward rate correlation structure is
calibrated to a historical correlation matrix using principal components analysis (PCA);
see Hull & White (2000). Correlation is assumed to evolve timehomogeneously over time.
We consider a 31NC1 coterminal Bermudan payers swaption deal struck at 5% with
annual compounding. The notation xNCy denotes an “x noncall y” Bermudan option,
which is exercisable into a swap with a maturity of x years from today but is callable only
after y years. The option is callable annually.
The BGM tenor structure is 0 < 1 < 2 < < 31. All forward rates are taken
to equal 5%. The time zero forward rate instantaneous correlation is assumed following
Rebonato (1998, p. 63) as:
ρ
ij
(0) = e
−β[t
i
−t
j
[
, (2.4)
where β is chosen to equal 5%. The market European swaption volatilities were taken as
displayed in Table 2.1.
To determine the exercise boundary, we use the Longstaﬀ & Schwartz (2001) least
squares Monte Carlo method. Only a single explanatory variable is considered, namely,
the swap net present value (NPV). Two regression functions are employed, a constant
and a linear term.
For each bucket a perturbation ∆σ(≈ 10
−8
) is applied to the swaption volatility in the
calibration input data.
3
The model is recalibrated, and we check to see that the calibration
error for all swaption volatilities is a factor 10
6
lower than the volatility perturbation. The
Bermudan swaption is repriced through Monte Carlo simulation using the exact same
random numbers.
Denote the original price by v and the perturbed price by v
i:n+1
. Then the recalibration
method of estimating swap vega ν
i:n+1
for bucket i is given by:
ν
i:n+1
=
v
i:n+1
−v
∆σ
. (2.5)
3
It was veriﬁed that the resulting vega is stable for a wide range of volatility perturbation. For very
extreme perturbation, the vega is unstable. At high levels of perturbation, vegagamma terms aﬀect the
vega. At too low levels of volatility perturbation, ﬂoating point number roundoﬀ errors aﬀect the vega.
52
24 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
1 5 10 15 20 25 30
−15
−10
−5
0
5
10
15
20
25
30
35
Bucket (Y)
S
w
a
p
v
e
g
a
s
c
a
l
e
d
t
o
1
0
0
b
p
s
h
i
f
t
(
b
p
)
THFRV
THSRV
CONST
Figure 2.1: Recalibration swap vega results for 10,000 simulation paths.
Usually the swap vega is denoted in terms of a shift in the swaption volatility. For
example, consider a 100 basis point (bp) shift in the swaption volatility. The swap vega
scaled to a 100 bp shift ν
100bp
i:n+1
is then deﬁned by
ν
100bp
i:n+1
= (0.01) ν
i:n+1
.
Swap vega results for a Monte Carlo simulation of 10,000 scenarios are displayed in
Figure 2.1. The standard errors (SEs) are displayed separately in Figure 2.2. The levels of
SE for THFRV and CONST are 6.00 and 0.25, respectively. The number of paths needed
for THFRV to obtain the same SE as CONST is thus (6/0.25)
2
10, 000 = 5.8M. For
THSRV, we ﬁnd 1.4M paths are needed.
Figure 2.3 displays the THFRV vega for 1 million simulation paths.
2.3 Explanation
The key to explanation of the vega results under recalibration is the change in swap rate
instantaneous variance after recalibration. For the THFRV and THSRV recalibration
approaches, the instantaneous variance increment (in the limit) is completely diﬀerent
from a constant volatility increment. This holds for all buckets.
53
2.3. EXPLANATION 25
1 5 10 15 20 25 30
0
1
2
3
4
5
6
7
Bucket (Y)
S
t
a
n
d
a
r
d
e
r
r
o
r
o
f
s
w
a
p
v
e
g
a
(
b
p
)
THFRV
THSRV
CONST
6
3
0.25
5.8M paths
1.4M paths
Figure 2.2: Empirical standard errors of vega for 10,000 simulation paths.
1 5 10 15 20 25 30
−10
−5
0
5
10
15
20
25
30
35
Bucket (Y)
S
w
a
p
v
e
g
a
s
c
a
l
e
d
t
o
1
0
0
b
p
s
h
i
f
t
(
b
p
)
THFRV 1M paths
CONST 10k paths
Figure 2.3: Recalibration THFRV vega results for 1 million simulation paths.
54
26 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
1 5 10 15 20 25 30
−100%
0%
100%
200%
Time period (Y)
P
e
r
c
.
o
f
i
n
c
r
e
m
e
n
t
i
n
s
w
a
p
r
a
t
e
i
n
s
t
a
n
t
a
n
e
o
u
s
v
a
r
i
a
n
c
e
Constant volatility re−calibration approach
THFRV re−calibration approach
Figure 2.4: Observed change in swap rate instantaneous variance for THFRV and CONST
recalibration approach.
For illustration, we consider the volatility perturbation shown in Figure 2.4, which
is associated with the calculation of swap vega corresponding to bucket 30. The price
diﬀerential has to be computed in the limit of the 30 1 swaption implied volatility
perturbation ∆σ tending to zero. This implies a swap rate instantaneous variance incre
ment of 30∆σ
2
. This total variance increment has to be distributed over all time periods.
We note that for both data sets the sum of the variance increments equals 100%. For
THFRV, the distribution of the variance increment is concentrated in the begin and end
time periods, and is even negative in the second time period. This is at variance with the
natural and intuitive even distribution in the CONST recalibration.
From (2.5), it follows that the simulation variance of the vega is given by
Var
_
ν
100bp
i:n+1
¸
= c
2
Var
_
π
i:n+1
−π
¸
= c
2
_
Var
_
π
i:n+1
¸
−2Cov
_
π
i:n+1
, π
¸
+ Var
_
π
¸
_
, (2.6)
where π and π
i:n+1
are the payoﬀs along the path of the original and the perturbed model,
respectively. Here c := 0.01/∆σ
i:n+1
.
The vega standard error is thus minimized if there is high covariance between the
discounted payoﬀs in the original and the perturbed model. This does not occur for a
55
2.4. SWAP VEGA AND THE SWAP MARKET MODEL 27
perturbation such as dictated by THFRV, because the stochasticity in the simulation is
basically moved around to other time periods (in our case from period 2 to period 1).
Because the rate increments over diﬀerent time periods are independent, this leads to a
reduced covariance, leading in turn to a higher standard error of the vega.
There is higher covariance between the payoﬀs under the perturbations of variance
implied by the CONST calibration, because then each independent time period maintains
approximately the same level of variance; no stochasticity is moved to other random
sources. From (2.6), it then follows that the standard error is lower.
2.4 Swap vega and the swap market model
An alternative method for calculating swap vega has the advantage that the estimates of
vega have a low standard error for any volatility function. The ﬁrst step is to study the
deﬁnition of swap vega in the swap market model, which we will extend to the LIBOR
BGM model. This will give us an alternative method to calculate swap vega per bucket.
How much our dynamically managed hedging portfolio should hold in European swap
tions is essentially determined by the swap vega per bucket. The latter is the derivative
of the exotic price with respect to the Black swaption implied volatility.
We consider a swap market model o. In the model, swap rates are lognormally dis
tributed under their forward swap measure. This means that all swap rate volatility
functions σ
i:n+1
() of (2.2) are deterministic. The Black implied swaption volatility σ
k:n+1
is given by
σ
k:n+1
=
¸
1
t
k
_
t
k
0
[σ
k:n+1
(s)[
2
ds.
As may be seen in this equation, there are an uncountable number of perturbations of
the swap rate instantaneous volatility that produce the same perturbation as the Black
implied swaption volatility. There is, however, a natural onedimensional parameterized
perturbation of the swap rate instantaneous volatility, namely, a simple proportional
increment. This is illustrated in Figure 2.5.
We deﬁne swap vega in the swap market model as follows. Denote the price of an
interest rate derivative in a swap market model o by v. We consider a perturbation of
the swap rate instantaneous volatility given by
σ
ε
k:n+1
() = (1 + ε)σ
k:n+1
(), (2.7)
where the shift applies only to k : n+1. Denote the corresponding swap market model by
o
k:n+1
(ε). We note that the implied swaption volatility in o
k:n+1
(ε) is given by σ
ε
k:n+1
=
(1 +ε)σ
k:n+1
. Denote the price of the derivative in o
k:n+1
(ε) by v
k:n+1
(ε). Then the swap
56
28 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
0 T
Time
S
w
a
p
r
a
t
e
i
n
s
t
a
n
t
a
n
e
o
u
s
v
o
l
a
t
i
l
i
t
y
ε
ε
ε
σ(⋅)
(1+ε)σ(⋅)
Figure 2.5: Natural increment of Black implied swaption volatility.
vega per bucket ν
k:n+1
is deﬁned as
ν
k:n+1
= lim
ε→0
v
k:n+1
(ε) −v
εσ
k:n+1
. (2.8)
Equation (2.8) is the derivative of the exotic price with respect to the Black implied
swaption volatility. In conventional notation we may write
ν
k:n+1
=
∂v
∂σ
k:n+1
= lim
∆σ
k:n+1
→0
v(σ
k:n+1
+ ∆σ
k:n+1
) −v(σ
k:n+1
)
∆σ
k:n+1
(2.9)
In (2.8) εσ
k:n+1
is equal to the swaption volatility perturbation ∆σ
k:n+1
, and v
k:n+1
(ε)
and v denote the prices of the derivative in models where the kth swaption volatility
equals σ
k:n+1
+ ∆σ
k:n+1
and σ
k:n+1
, respectively.
The swap rate volatility perturbation in (2.7) deﬁnes a relative shift. It is also possible
to apply an absolute shift in the form of
σ
ε
k:n+1
() =
_
1 +
ε
σ
k:n+1
()
_
σ
k:n+1
(), (2.10)
where the shift applies only to k : n +1. This ensures that the absolute level of the swap
rate instantaneous volatility is increased by an amount ε. We note that the relative and
57
2.5. ALTERNATIVE METHOD FOR CALCULATING SWAP VEGA 29
absolute perturbation are equivalent when the instantaneous volatility is constant over
time.
The method for calculating swap vega per bucket is largely the same for both relative
and absolute perturbation (but we will point out any diﬀerences). The ﬁrst diﬀerence
is in the change in swaption implied volatility ∆σ
k:n+1
of (2.9); namely, straightforward
calculations reveal that the perturbed volatility satisﬁes
σ
ε
k:n+1
= σ
k:n+1
+ ε
1
t
k
_
t
k
0
¯ σ
k:n+1
(s)ds
σ
k:n+1
+O
_
ε
2
_
.
2.5 Alternative method for calculating swap vega
An alternative method for calculating swap vega in the BGM framework may be applied
to any volatility function to yield accurate vega with a small number of simulation paths.
The method is based on a perturbation in the forward rate volatility to match a constant
swap rate volatility increment. Rebonato (2002) also derives this method in terms of
covariance matrices, but our derivation is explicitly in terms of volatility vectors.
Swap rates are not lognormally distributed in the LIBOR BGM model. This means
that swap rate instantaneous volatility is stochastic. The stochasticity is almost invisible
as shown empirically, for example, by Brace et al. (1998). D’Aspremont (2002) shows
that the swap rate is uniformly close to a lognormal martingale.
Hull & White (2000) show that the swap rate volatility vector is a weighted average
of forward LIBOR volatility vectors:
σ
i:n+1
(t) =
n
j=i
w
i:n+1
j
(t)σ
j
(t), w
i:n+1
j
(t) =
α
j
γ
i:n+1
j
(t)f
j
(t)
1 + α
j
f
j
(t)
, (2.11)
γ
i:n+1
j
(t) =
b
i
(t)
b
i
(t) −b
n+1
(t)
−
p
i:j
(t)
p
i:n+1
(t)
,
where the weights w
i:n+1
are in general statedependent.
Hull and White derive an approximating formula for European swaption prices that
is based on evaluating the weights in (2.11) at time zero. This is a good approximation
by virtue of the near lognormality of swap rates in the LIBOR BGM model. We denote
the resulting swap rate instantaneous volatility by σ
HW
i:n+1
as follows:
σ
HW
i:n+1
(t) =
n
j=i
w
i:n+1
j
(0)σ
j
(t). (2.12)
When we write w
i:n+1
j
:= w
i:n+1
j
(0) and adopt the convention that
σ
i
(t) = σ
i:n+1
(t) = 0 when t > t
i
,
58
30 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
a useful form of (2.12) is:
σ
HW
1:n+1
(t) = w
1:n+1
1
σ
1
(t) + . . . + w
1:n+1
n
σ
n
(t)
.
.
.
.
.
.
.
.
.
σ
HW
n:n+1
(t) = w
n:n+1
n
σ
n
(t)
(2.13)
If W is the upper triangular nonsingular weight matrix (with upper triangular inverse
W
−1
), these volatility vectors can be jointly related through the matrix equation:
_
σ
:n+1
¸
= W
_
σ
¸
.
The swap rate volatility under relative perturbation (2.7) of the kth volatility is
_
σ
:n+1
¸
→
_
σ
:n+1
¸
+ ε
_
0 . . . 0 σ
k:n+1
0 . . . 0
¸
¯
.
We note that the swap rate correlation is left unaltered. The corresponding perturba
tion in the BGM volatility vectors is given by
_
σ
¸
→
_
σ
¸
+ εW
−1
_
0 . . . 0 σ
k:n+1
0 . . . 0
¸
¯
. (2.14)
We note that only the volatility vectors σ
k
(t), . . . , σ
n
(t) are aﬀected (due to the upper
triangular nature of W
−1
), which are the vectors that underlie σ
k:n+1
(t) in the Hull and
White approximation. With the new LIBOR volatility vectors, prices can be recomputed
in the BGM model and the vegas calculated.
2.6 Numerical results
We demonstrate the algorithm in a simulation with 10,000 paths. The results are displayed
in Figure 2.6. We note that the approach yields slightly negative vegas for buckets 1730.
In Appendix 2.A, we show that negative values are not a spurious result. That is, for
the analytically tractable setup of a twostock Bermudan option, negativity of vega occurs
with correlation ≈1, and volatilities for short expiration dates are higher than volatilities
at longer expiration dates—this of course is in a typical interest rate setting.
The vegas were also calculated for the absolute perturbation method in results not
displayed. The diﬀerences in the vegas for the two methods are minimal; for any vega
with absolute value above 1 bp, the diﬀerence is less than 4%, and for any vega with
absolute value below 1 bp, the diﬀerence is always less than a third of a basis point.
2.7 Comparison with the swap market model
The swap market model (SMM) is the canonical model for computing swap vega per
bucket. We compare the LIBOR BGM model and a swap market model with the very
59
2.7. COMPARISON WITH THE SWAP MARKET MODEL 31
1 5 10 15 20 25 30
−2
0
2
4
6
8
10
12
14
Bucket (Y)
S
w
a
p
v
e
g
a
s
c
a
l
e
d
t
o
1
0
0
b
p
s
h
i
f
t
(
b
p
)
THFRV
THSRV
CONST
Figure 2.6: Swap vega results for 10,000 simulation paths. Error bars denote 95% conﬁ
dence bound based on the standard error.
1 2 3 4 5 6 7 8 9 10
0
1
2
3
4
5
6
7
8
European Swaption Bucket (Y)
S
w
a
p
V
e
g
a
(
b
p
)
BGM Libor Model
Swap Market Model
Figure 2.7: Comparison of LMM and SMM for swap vega per bucket, 5% strike.
60
32 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
0% 3% 6% 9% 12% 15%
0
5
10
15
20
25
Strike / Fixed Coupon
T
o
t
a
l
S
w
a
p
V
e
g
a
(
b
p
)
BGM Libor Model
Swap Market Model
Figure 2.8: Comparison of LMM and SMM for total swap vega against strike.
same swap rate quadratic crossvariation structure. Approximate equivalence between
the two models has been established by Joshi & Theis (2002, Equation (3.8)).
We perform the test for an 11NC1 payﬁxed Bermudan option on a swap with annual
ﬁxed and ﬂoating payments. A singlefactor LIBOR BGM model is used with constant
volatility calibrated to the euro cap volatility curve of October 10, 2001. The zero rates
were taken to be ﬂat at 5%. In the Monte Carlo simulation of the SMM we apply the
discretization suggested in Lemma 5 of Glasserman & Zhao (2000).
Results appear in Table 2.2, and are displayed partially in Figures 2.7 and 2.8. In this
particular case, the BGM LIBOR model reproduces the swap vegas of the swap market
model very accurately.
2.8 Conclusions
We have presented a new approach to calculating swap vega per bucket in the LIBOR
BGM model. We show that for some forms of the volatility an approach based on re
calibration may lead to great uncertainty in estimated swap vega, as the instantaneous
volatility structure may be distorted by recalibration. This does not happen in the case
of constant swap rate volatility.
61
2.8. CONCLUSIONS 33
Table 2.2: Swap vega per bucket test results for varying strikes—10,000 simulation paths.
BGM LIBOR MODEL
Fixed
Rate 2% 3% 3.5% 4% 4.5% 5% 6% 7% 8% 9% 10% 12% 15%
Value 2171 1476 1138 829 585 410 210 112 64 36 21 8 2
(4) (5) (5) (5) (5) (4) (3) (2) (2) (1) (1) (1) (0)
1Y 2.0 2.0 2.6 10.9 11.1 7.0 1.2 0.1 0.0 0.0 0.0 0.0 0.0
2Y 1.5 1.6 1.0 2.6 5.7 6.8 4.0 1.0 0.0 0.0 0.0 0.0 0.0
3Y 0.0 0.0 0.3 0.1 2.5 4.5 4.1 2.1 1.0 0.3 0.0 0.0 0.0
4Y 0.0 0.0 0.1 0.1 1.1 2.7 4.4 3.6 2.0 1.1 0.5 0.2 0.1
5Y 0.0 0.0 0.1 0.2 0.4 1.5 3.7 3.6 2.7 1.5 1.0 0.3 0.1
6Y 0.0 0.0 0.1 0.2 0.1 0.8 2.1 2.5 2.0 1.7 1.2 0.3 0.2
7Y 0.0 0.0 0.1 0.2 0.0 0.3 1.3 1.8 1.8 1.6 1.1 0.5 0.0
8Y 0.0 0.0 0.0 0.1 0.1 0.1 0.7 1.3 1.5 1.3 1.3 0.9 0.3
9Y 0.0 0.0 0.0 0.1 0.1 0.0 0.3 0.7 0.8 0.8 0.8 0.6 0.3
10Y 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.3 0.3 0.4 0.4 0.3 0.2
Total
Vega 0.5 0.4 2.9 12.8 20.8 23.8 21.9 16.9 12.3 8.8 6.2 3.1 1.0
SWAP MARKET MODEL
Fixed
Rate 2% 3% 3.5% 4% 4.5% 5% 6% 7% 8% 9% 10% 12% 15%
Value 2172 1480 1146 841 592 411 204 109 61 34 19 7 1
(6) (6) (6) (5) (5) (4) (4) (3) (2) (1) (1) (1) (0)
1Y 1.9 0.7 4.4 11.3 11.5 6.2 0.4 0.0 0.0 0.0 0.0 0.0 0.0
2Y 1.6 1.6 1.1 2.2 5.2 7.5 3.6 0.5 0.0 0.0 0.0 0.0 0.0
3Y 0.0 0.1 0.4 0.0 2.0 4.6 4.7 2.2 0.6 0.2 0.0 0.0 0.0
4Y 0.0 0.1 0.2 0.1 0.9 2.7 4.8 3.7 1.7 0.8 0.3 0.1 0.0
5Y 0.0 0.0 0.2 0.2 0.4 1.6 3.7 3.0 2.3 1.2 0.5 0.1 0.0
6Y 0.0 0.0 0.1 0.2 0.1 0.8 2.6 3.3 3.1 2.3 1.2 0.2 0.0
7Y 0.0 0.0 0.1 0.2 0.1 0.3 1.3 2.0 1.9 1.3 1.4 0.8 0.1
8Y 0.0 0.0 0.0 0.1 0.1 0.1 0.8 1.3 1.5 1.5 1.2 0.6 0.2
9Y 0.0 0.0 0.0 0.1 0.1 0.0 0.4 0.9 1.0 1.0 0.9 0.7 0.3
10Y 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.3 0.4 0.5 0.5 0.4 0.3
Total
Vega 0.3 0.6 4.5 12.6 19.9 23.8 22.3 17.2 12.5 8.8 6.0 2.9 0.9
62
34 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
We derive an alternative approach that is not based on recalibration, using the swap
market model. The method accurately estimates swaption vegas for any volatility function
and at a small number of simulation paths.
The key to the method is that the perturbation in the LIBOR volatility is distributed
in a clear, stable, and wellunderstood fashion, but in the recalibration method the change
in volatility is hidden and potentially unstable. We also show for a Bermudan swaption
deal that our method yields almost the same swap vega as a swap market model.
2.A Appendix: Negative vega for twostock Bermu
dan options
We examine a twostock Bermudan option to show that its vega per bucket is negative in
certain situations. The holder of a twostock Bermudan option has the right to call the
ﬁrst stock s
1
at strike k
1
at time t
1
; if the holder decides to hold the option, the right
remains to call the second stock s
2
at strike k
2
at time t
2
; if this right is not exercised,
then the option becomes worthless. Here t
1
< t
2
.
The Bermudan option is valued under standard BlackScholes conditions. Under the
riskneutral measure, the stock prices satisfy the stochastic diﬀerential equations:
ds
i
s
i
= rdt + σ
i
dw
i
, i = 1, 2, dw
1
dw
2
= ρdt,
where σ
i
is the volatility of the ith stock, and w
i
, i = 1, 2, are Brownian motions under
the riskneutral measure, with correlation ρ. It follows that the time t
1
stock prices are
distributed as follows:
s
i
(t
1
) = f
_
s
i
(0), 0; t
1
_
exp
_
σ
i
√
t
1
z
i
−
1
2
σ
2
i
t
1
_
, i = 1, 2, (2.15)
where the pair (z
1
, z
2
) is standard bivariate normally distributed with correlation ρ and
where
f(s, t; u) := s exp
_
r(u −t)
_
, (2.16)
is the time t forward price for delivery at time u of a stock with current price s.
At time t
1
, the holder of the Bermudan option will choose whichever of two alternatives
has a higher value: either calling the ﬁrst stock, or holding the option on the second stock;
the value of the latter is given by the BlackScholes formula. on conditioning and involves
a onedimensional numerical integration over the Black formula.
Therefore the (cashsettled) payoﬀ v(s
1
(t
1
), s
2
(t
1
), t
1
) of the Bermudan at time t
1
is
given by:
max
_
_
s
1
(t
1
) −k
1
_
+
, BS
2
_
s
2
(t
1
), t
1
_
_
, (2.17)
63
2.A. APPENDIX: NEGATIVE VEGA TWOSTOCK BERMUDAN OPTIONS 35
Table 2.3: Deal description.
Spot price for stock 1 s
1
(0) 150
Spot price for stock 2 s
2
(0) 140
Strike price for stock 1 k
1
100
Strike price for stock 2 k
2
100
Exercise time for stock 1 t
1
1Y
Exercise time for stock 2 t
2
2Y
Volatilities σ
i
Variable
Correlation ρ 0.9
Riskfree rate r 5%
where BS is the BlackScholes formula:
BS
i
(s, t) = e
−r(t
i
−t)
_
f(s, t; t
i
)φ
_
d
(i)
1
_
−k
i
φ
_
d
(i)
2
_
_
,
d
(i)
1,2
(s, t) =
ln
_
f(s, t; t
i
)/k
i
_
±
1
2
σ
2
i
t
σ
i
√
t
,
where φ() is the cumulative normal distribution function.
The time zero value v(s
1
, s
2
, 0) of the Bermudan option may thus be computed by a
bivariate normal integration of the discounted version of the payoﬀ in (2.17):
v(s
1
, s
2
, 0) = e
−rt
1
E
_
v
_
t
1
, s
1
(t
1
), s
2
(t
1
)
_
_
.
The vega per bucket ν
i
is deﬁned as
ν
i
:=
∂v(s
1
, s
2
, 0)
∂σ
i
, i = 1, 2.
The vega may be numerically approximated by ﬁnite diﬀerences:
ν
i
=
v(s
1
, s
2
, 0; σ
i
+ ∆σ
i
) −v(s
1
, s
2
, 0; σ
i
)
∆σ
i
+O
_
∆σ
2
i
_
, i = 1, 2,
for a small volatility perturbation ∆σ
i
¸1.
We note that the vega per bucket may possibly be negative for both the ﬁrst and the
second bucket. As an example of vega negativity, we compute the vega per bucket for the
deal described in Table 2.3. Results are displayed in Table 2.4. The volatility is perturbed
by a small amount.
64
36 CHAPTER 2. BERMUDAN SWAPTIONS IN A LIBOR MODEL
Table 2.4: Results for negative vega per bucket for twostock Bermudan option.
σ
1
σ
2
price ν
100bp
1
ν
100bp
2
Scenario 1 10% 30% 64.53 0.45 0.56
Scenario 2 30% 10% 65.11 0.56 0.44
The resulting vega is insensitive to either the perturbation size or the density of the
2D integration grid. In several instances a vega per bucket is negative, in both the ﬁrst
and the second bucket.
To ensure that the negative vega is not due to an implementation error, we develop
an alternative valuation of the twostock Bermudan option (available upon request). It
is based on conditioning and involves a onedimensional numerical integration over the
Black formula. The alternative method yields the exact same results.
We note in Table 2.4 that the negative vegas occur in the case of high correlation and
for the bucket with the lowest volatility. In the case of high correlation and one stock
with signiﬁcantly higher volatility than the other, we contend that the only added value
of the additional option on the lowvolatility stock lies in oﬀering protection against a
down move of both stocks (recall that the stocks are highly correlated). There are two
scenarios:
• Up move. Both stocks move up. Because the highvolatility stock moves up much
more than the lowvolatility stock, the highvolatility call will be exercised.
• Down move. Both stocks move down. Because the highvolatility stock moves down
much more than the lowvolatility stock, the highvolatility call becomes out of the
money, and the lowvolatility call will be exercised.
If now the volatility of the lowvolatility stock is increased by a small amount, then in
these scenarios the exercise strategy remains unchanged. Also, in the case of an up move,
the payoﬀ remains unaltered. In the case of a down move, however, the lowvolatility stock
(volatility slightly increased) moves down more than in the unperturbed case. Therefore,
the payoﬀ of the protection call is reduced. In total, the Bermudan option is thus worth
less.
We give an alternative explanation of the source of vega negativity for Bermudan
swaptions. We consider a European maximum option on the two highly correlated stocks
s
1
and s
2
, struck at k > 0, with payoﬀ:
max(s
1
−k, s
2
−k, 0).
65
2.A. APPENDIX: NEGATIVE VEGA TWOSTOCK BERMUDAN OPTIONS 37
We deem the risk behaviour of this option to be similar to the risk behaviour of a Bermu
dan swaption, since the choice of calling either s
1
or s
2
corresponds to the choice of
exercising at the ﬁrst or second exercise opportunity. We have:
max(s
1
−k, s
2
−k, 0) = max(s
1
−k, 0) + 1
¡s
1
>k¦
max(s
2
−s
1
, 0).
The maximum option is thus the sum of an ordinary European call option and a (con
ditional) European spread option. If the volatility of the ﬁrst stock increases then the
volatility of the spread s
2
−s
1
decreases (for highly correlated stocks), by which the value
of the European spread option decreases. This causes a negative component in the to
tal composition of the vega. However the ordinary call option value increases when the
volatility of the ﬁrst stock increases, which thus constitutes a positive component of the
vega.
The same argument can be applied to show that an increase in volatility of the second
stock causes a negative component in the vega. The spread option argument carefully
shows a negative component in the vega of a Bermudan swaption, however it does not
explain that this negative component can sometimes outweigh the other positive compo
nents of the vega. This outweighing of the negative component is explained in the up and
down moves argument above.
66
67
Chapter 3
Rank reduction of correlation
matrices by majorization
1
A novel algorithm is developed for the problem of ﬁnding a lowrank correlation ma
trix nearest to a given correlation matrix. The algorithm is based on majorization and,
therefore, it is globally convergent. The algorithm is computationally eﬃcient, is straight
forward to implement, and can handle arbitrary weights on the entries of the correlation
matrix. A simulation study suggests that majorization compares favourably with compet
ing approaches in terms of the quality of the solution within a ﬁxed computational time.
The problem of rank reduction of correlation matrices occurs when pricing a derivative
dependent on a large number of assets, where the asset prices are modelled as correlated
lognormal processes. Such an application mainly concerns interest rates.
3.1 Introduction
In this chapter, we study the problem of ﬁnding a lowrank correlation matrix nearest
to a given (correlation) matrix. First we explain how this problem occurs in an interest
rate derivatives pricing setting. We will focus on interest rate derivatives that depend on
several rates such as the 1 year LIBOR deposit rate, the 2 year swap rate, etc. An example
of such a derivative is a Bermudan swaption. A Bermudan swaption gives its holder the
right to enter into a ﬁxed maturity interest rate swap at certain exercise dates. At an
exercise opportunity, the holder has to choose between exercising then or holding onto the
option with the chance of entering into the swap later at more favourable interest rates.
1
This chapter has been published in diﬀerent form as Pietersz, R. & Groenen, P. J. F. (2004b), ‘Rank
reduction of correlation matrices by majorization’, Quantitative Finance 4(6), 649–662. An extended
abstract of this chapter appeared as Pietersz, R. & Groenen, P. J. F. (2004a), ‘A major LIBOR ﬁt’, Risk
Magazine p. 102. December issue.
68
40 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
Evidently, the value depends not only on the current available swap rate but, amongst
others, also on the forward swap rates corresponding to future exercise dates. In contrast,
an example of a derivative that is dependent on a single interest rate is a caplet, which
can be viewed as a call option on LIBOR. In this case, the value of the caplet depends
only on a single forward LIBOR rate.
Here, we will focus on derivatives depending on several rates. Our discussion can
however also be applied to the situation of a derivative depending on several assets. To
do so a model is set up that speciﬁes the behaviour of the asset prices. Each of the
asset prices is modelled as a lognormal martingale under its respective forward measure.
Additionally, the asset prices are correlated. Suppose we model n correlated lognormal
price processes,
ds
i
s
i
= . . . dt + σ
i
d ˜ w
i
, ¸d ˜ w
i
, d ˜ w
j
) = ρ
ij
, (3.1)
under a single measure. Here s
i
denotes the price of the i
th
asset, σ
i
its volatility and ˜ w
i
denotes the associated driving Brownian motion. Brownian motions i and j are correlated
with coeﬃcient ρ
ij
, the correlation coeﬃcient between the returns on assets i and j. The
matrix P = (ρ
ij
)
ij
should be positive semideﬁnite and should have a unit diagonal. In
other words, P should be a true correlation matrix. The term . . . dt denotes the drift
term that stems from the change of measure under the nonarbitrage condition.
The models that ﬁt into the framework of (3.1) and which are most relevant to our
discussion are the LIBOR and swap market models for valuation of interest rate deriva
tives, as introduced in Section 1.3.2. These models were developed by Brace et al. (1997),
Jamshidian (1997) and Miltersen et al. (1997). In this case, an asset price corresponds to
a forward LIBOR or swap rate. For example, if we model a 30 year Bermudan swaption
with annual call and payment dates, then our model would consist of 30 annual forward
LIBOR rates or 30 coterminal forward swap rates. In the latter case, we consider 30 for
ward starting annualpaying swaps, starting at each of the 30 exercise opportunities and
all ending after 30 years. Model (3.1) could however be applied to a derivative depending
on a number of, for example, stocks, too.
Given the model (3.1), the price of any derivative depending on the assets can be
calculated by nonarbitrage arguments. Because the number of assets is assumed to be
high and the derivative is assumed complex in this exposition, the derivative value can be
calculated only by Monte Carlo simulation. To implement scheme (3.1) by Monte Carlo
we need a decomposition P = YY
T
, with Y an n n matrix. In other words, if we
denote the i
th
row vector of Y by y
i
, then the decomposition reads ¸y
i
, y
j
) = ρ
ij
, where
¸., .) denotes the scalar product. We then implement the scheme
ds
i
s
i
= . . . dt + σ
i
_
y
i1
dw
1
+ + y
in
dw
n
_
, ¸y
i
, y
j
) = ρ
ij
, (3.2)
69
3.1. INTRODUCTION 41
where the w
i
are now independent Brownian motions. Scheme (3.2) indeed corresponds
to scheme (3.1) since both volatility and correlation are implemented correctly. The
instantaneous variance is ¸ds
i
/s
i
) = σ
2
i
dt since y
i
 = ρ
ii
= 1 and volatility is the square
root of instantaneous variance divided by dt. Moreover, for the instantaneous covariance
we have ¸ds
i
/s
i
, ds
j
/s
j
) = σ
i
σ
j
¸y
i
, y
j
)dt = σ
i
σ
j
ρ
ij
dt.
For large interest rate correlation matrices, usually almost all variance (say 99%) can
be attributed to only 3–6 stochastic Brownian factors. Therefore, (3.2) contains a large
number of almost redundant Brownian motions that cost expensive computational time
to simulate. Instead of taking into account all Brownian motions, we would wish to do
the simulation with a smaller number of factors, d say, with d < n and d typically between
2 and 6. The scheme then becomes
ds
i
s
i
= . . . dt + σ
i
_
y
i1
dw
1
+ + y
id
dw
d
_
, ¸y
i
, y
j
) = ρ
ij
. (3.3)
The nd matrix Y is a decomposition of P. This approach immediately implies that the
rank of P be less than or equal to d. For ﬁnancial correlation matrices, this rank restric
tion is generally not satisﬁed. It follows that an approximation be required. We could
proceed in two possible ways. The ﬁrst way involves approximating the covariance matrix
(σ
i
σ
j
ρ
ij
)
ij
. The second involves approximating the correlation matrix while maintaining
an exact ﬁt to the volatilities. In a derivatives pricing setting, usually the volatilities
are wellknown. These can be calculated via a Blacktype formula from the European
option prices quoted in the market, or mostly these volatilities are directly quoted in the
market. The correlation is usually less known and can be obtained in two ways. First, it
can be estimated from historical time series. Second, it can be implied from correlation
sensitive markettraded options such as spread options. A spread option is an option on
the diﬀerence between two rates or asset prices. Such correlation sensitive products are
not traded as liquidly as the European plainvanilla options. Consequently, in both cases
of historic or marketimplied correlation, we are more conﬁdent of the volatilities. For
that reason, in a derivative pricing setting, we approximate the correlation matrix rather
than the covariance matrix.
The above considerations lead to solving the following problem:
Find Y ∈ R
nd
,
to minimize ϕ(Y) :=
1
c
i<j
w
ij
_
ρ
ij
−¸y
i
, y
j
)
_
2
,
subject to y
i

2
= 1, i = 1, . . . , n.
(3.4)
Here w
ij
are nonnegative weights and c := 4
i<j
w
ij
. The objective value ϕ is scaled
by the constant c in order to make it independent of the problem dimension n. Because
each term ρ
ij
−¸y
i
, y
j
) is always between 0 and 2, it follows for the choice of c that ϕ is
always between 0 and 1.
70
42 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
An interesting alternative is to approximate the covariance matrix while keeping vari
ance ﬁxed
2
. If
˜
P = (˜ ρ
ij
)
ij
denotes the instantaneous covariance matrix, and σ denotes
the vector of volatilities, then
˜
P = Diag(σ) P Diag(σ),
where Diag(σ) denotes a diagonal matrix with the diagonal ﬁlled with the vector σ.
Approximating the covariance matrix while ensuring a perfect ﬁt to variance amounts to
solving the following problem (we leave out weights w and the scalar factor c, from (3.4),
for clarity of presentation):
Find
˜
Y ∈ R
nd
,
to minimize ˜ ϕ(
˜
Y) :=
i<j
_
˜ ρ
ij
−¸˜ y
i
, ˜ y
j
)
_
2
,
subject to ˜ y
i

2
2
= ˜ ρ
ii
, i = 1, . . . , n.
(3.5)
Here, ˜ y
i
relates to y
i
in (3.3) via
˜ y
i
= σ
i
y
i
.
Approximating covariance with ﬁxed variance as in (3.5) yields, in general, diﬀerent results
than approximating correlation as in (3.4), since the added variance may change the
importance of certain factors. We carefully state “in general” and “may change”, since
if all variances are equal to one, then problems (3.4) (with constant weights) and (3.5)
are obviously identical. We note that the majorization algorithm in this chapter, and
the geometric programming algorithm in Chapter 4, can be applied to the covariance
approximating problem in (3.5) by setting the weights in (3.4) as w
ij
= σ
2
i
σ
2
j
. The
covariance problem (3.5) is thus a special case of the correlation problem (3.4), therefore
we focus on (3.4) in the remainder of the thesis.
The weights w
ij
in (3.4) have been added for three reasons:
• For squared diﬀerences, a large diﬀerence constitutes a far greater part of the total
error in (3.4) than a small diﬀerence. The weights for small diﬀerences can then be
appropriately increased to adjust for this.
• Financial reasons may sometimes compel us to assign higher weights to particular
correlation pairs. For example, we could be more conﬁdent about the correlation
between the 1 and 2 year swap rates than about the correlation between the 8 and
27 year swap rates.
• The objective function with weights has been considered before in the literature.
See for example Rebonato (1999c, Section 10). Rebonato (2002, Section 9) provides
an excellent discussion of the pros and cons of using weights.
2
Many thanks to Ton Vorst for pointing out this alternative.
71
3.2. LITERATURE REVIEW 43
The simplest case of ϕ is ϕ(Y) := c
−1
P − YY
T

2
F
, where  
F
denotes the Frobenius
norm, Y
2
F
:= tr(YY
T
) for matrices Y. This objective function (which we shall also call
‘Frobenius norm’) ﬁts in the framework of (3.4); it corresponds to the case of all weights
equal. The objective function in (3.4) will be referred to as ‘general weights’.
In the literature, there exist six other algorithms for minimizing ϕ deﬁned in (3.4).
These methods are outlined in the next section and are shown to have several disadvan
tages, namely none of the methods is simultaneously
(i) eﬃcient,
(ii) straightforward to implement,
(iii) able to handle general weights and
(iv) guaranteed to converge to a local minimum.
In this chapter, we develop a novel method to minimize ϕ that simultaneously has the
four mentioned properties. The method is based on iterative majorization that has the
important property of guaranteed convergence to a stationary point. The algorithm is
straightforward to implement. We show that the method can eﬃciently handle general
weights. We investigate empirically the eﬃciency of majorization in comparison to other
methods in the literature. The benchmark tests that we will consider are based on the
performance given a ﬁxed small amount of computational time. This is exactly the situ
ation in practice: decisions based on derivative pricing calculations have to be made in a
limited amount of time.
The remainder of this chapter is organized as follows. First, we provide an overview
of the methods available in the literature. Second, the idea of majorization is introduced
and the majorizing functions are derived. Third, an algorithm based on majorization is
given along with reference to associated MATLAB code. Global convergence and the local
rate of convergence are investigated. Fourth, we present empirical results. The chapter
ends with some conclusions.
3.2 Literature review
We describe seven existing algorithms available in the literature for minimizing ϕ. Because
a review of the majorization method is interesting from the point of view of Chapter
4 (Rank reduction by geometric programming), we include here the discussion of the
majorization method itself. For each of the seven algorithms, it is indicated whether
it can handle general weights. If not, then the most general objective function it can
handle stems from the weighted Frobenius norm  
F,Ω
with Ω a symmetric positive
72
44 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
deﬁnite matrix, where Y
2
F,Ω
:= tr(YΩY
T
Ω). The objective function ϕ(Y) := c
−1
R−
YY
T

2
F,Ω
will be referred to as ‘weighted Frobenius norm’ too.
3.2.1 Modiﬁed PCA
First, we mention the ‘modiﬁed principal component analysis (PCA)’ method. For ease
of exposition, we restrict to the case of the Frobenius norm, however the method can be
applied to the weighted Frobenius norm as well though not for general weights. Modiﬁed
PCA is based on an eigenvalue decomposition P = QΛQ
T
, with Q orthogonal and Λ
the diagonal matrix with eigenvalues. If the eigenvalues are ordered descendingly then a
lowrank decomposition with associated approximated matrix close to the original matrix
is found by
¦Y
PCA
¦
i
=
z
z
2
, (3.6)
z :=
_
Q
d
Λ
1/2
d
_
i
, i = 1, . . . , n. (3.7)
Here ¦X¦
i
denotes the i
th
row of a matrix X, Q
d
the ﬁrst d columns of Q, and Λ
d
the
principal submatrix of Λ of degree d. Ordinary PCA stops with (3.7) and it is the scaling
in (3.6) that is the ‘modiﬁed’ part, ensuring that the resulting correlation matrices have
unit diagonal. Modiﬁed PCA is popular among ﬁnancial practitioners and implemented
in numerous ﬁnancial institutions. The modiﬁcation of PCA in this way is believed to
be due to Flury (1988). For a description in ﬁnance related articles, see, for example,
Sidenius (2000) and Hull & White (2000). Modiﬁed PCA is easy to implement, because
almost all that is required is an eigenvalue decomposition. The calculation is almost
instant, and the approximation is reasonably accurate. A strong drawback of modiﬁed
PCA is its nonoptimality: generally one may ﬁnd decompositions Y (even locally) for
which the associated correlation matrix YY
T
is closer to the original matrix P than the
PCAapproximated correlation matrix Y
PCA
Y
T
PCA
. The modiﬁed PCA approximation
becomes worse when the magnitude of the left out eigenvalues increases.
Throughout this chapter we choose the starting point of any method considered (be
yond modiﬁed PCA) to be the modiﬁed PCA solution.
3.2.2 Majorization
Second, we mention the majorization approach of Pietersz & Groenen (2004a, b), see
also this Chapter 3. Majorization can handle an entryweighted objective function and is
guaranteed to converge to a stationary point. The rate of convergence is sublinear.
73
3.2. LITERATURE REVIEW 45
3.2.3 Geometric programming
The third algorithm that we discuss, is the geometric programming approach of Grubiˇsi´c
& Pietersz (2005), see also Chapter 4. Here, the constraint set is equipped with a diﬀer
entiable structure. Subsequently geometric programming is applied, which can be seen
as NewtonRhapson or conjugate gradient over curved space. By formulating these algo
rithms entirely in terms of diﬀerential geometric means, a simple expression is obtained
for the gradient. The latter allows for an eﬃcient implementation. Another advantage
of geometric programming is that it can handle general weights. However, a drawback of
the geometric programming approach is that it takes many lines of nonstraightforward
code to implement, which may hinder its use for nonexperts.
3.2.4 Alternating projections without normal correction
Fourth, we consider the alternating projections algorithm of Grubiˇsi´c (2002) and Morini
& Webber (2004). The discussion below of alternating projections applies only to the
problem of rank reduction of correlation matrices. The method is based on alternating
projections onto the set of n n matrices with unit diagonal and onto the set of n n
matrices of rank d or less. Both these projections can be eﬃciently calculated. For
projection onto the intersection of two convex sets, Dykstra (1983) and Han (1988) have
shown that convergence to a minimum can be obtained with alternating projections onto
the individual convex sets if a normal vector correction is applied. Their results do not
automatically hold for an alternating projections algorithm with normal correction for
Problem (4.1)
3
, since for d < n the set of n n matrices of rank d or less is nonconvex.
The alternating projections algorithm could in principle be extended to the case with rank
restrictions, since we can eﬃciently calculate the projection onto the set of rankd matrices.
Convergence of the algorithm is however no longer guaranteed by the general results of
Dykstra (1983) and Han (1988) because the constraint set ¦rank(C) ≤ d¦ is no longer
convex for d < n. Some preliminary experimentation showed indeed that the extension
to the nonconvex case did not work generally. Also, Morini & Webber (2004) report
that alternating projections with normal correction may fail in solving Problem (4.1).
Higham (2002, Section 5, ‘Concluding remarks’) mentions that he has been investigating
alternative algorithms, such as to include rank constraints. The alternating projections
algorithm without normal correction stated in Grubiˇsi´c (2002) and Morini & Webber
(2004) however always converges to a feasible point, but not necessarily to a stationary
point. In fact, in general, the alternating projections method without normal correction
does not converge to a stationary point. The algorithm thus does not minimize the
3
The algorithm with normal correction for rank reduction has also been studied in Weigel (2004).
74
46 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
objective function in (4.1), it only selects a feasible point satisfying the constraints of
(4.1).
3.2.5 Lagrange multipliers
As the ﬁfth algorithm, we mention the Lagrange multiplier technique developed by Zhang
& Wu (2003) and Wu (2003). This method lacks guaranteed convergence: Zhang & Wu
(2003, Proposition 4.1) and Wu (2003, Theorem 3.4) prove the following result. The
Lagrange multiplier algorithm produces a sequence of multipliers for which accumulation
points exist. If, for the original matrix plus the Lagrange multipliers of an accumulation
point, the d
th
and (d + 1)
th
eigenvalues have diﬀerent absolute values, then the resulting
rankd approximation is a global minimizer of problem (3.4). However, the condition
that the d
th
and (d + 1)
th
eigenvalues are diﬀerent has not been guaranteed. In numeri
cal experiments, this equaleigenvalues phenomenon occurs. Therefore, convergence of the
Lagrange multiplier method to a global minimum or even to a stationary point is not guar
anteed. It is beyond the scope of this chapter to indicate how often this ‘nonconvergence’
occurs. If the algorithm has not yet converged, then the produced lowrank correlation
matrix will not satisfy the diagonal constraint. The appropriate adaptation is to rescale
the associated conﬁguration similarly to the modiﬁed PCA approach (3.6). For certain
numerical settings, the resulting algorithm has been shown to perform not better and even
worse than the geometric programming approach (Grubiˇsi´c & Pietersz 2005). Another
drawback of the Lagrange multiplier algorithm is that only the weighted Frobenius norm
can be handled and not general weights.
3.2.6 Parametrization
Sixth, we mention the ‘parametrization method’ of Rebonato (1999a, 1999b, 1999c (Sec
tion 10), 2002 (Section 9), 2004b (Sections 20.1–20.4)), Brigo (2002), Brigo & Mercurio
(2001, Section 6.9) and Rapisarda, Brigo & Mercurio (2002). The set of correlation
matrices of rank d or less ¦YY
T
: Y ∈ R
nd
, Diag(YY
T
)= I¦ is parameterized by
trigonometric functions through spherical coordinates y
i
= y
i
(θ
i
) with θ
i
∈ R
d−1
. As
a result, the objective value ϕ(Y) becomes a function ϕ(Y(Θ)) of the angle parameters
Θ that live in R
n(d−1)
. Subsequently, ordinary nonlinear optimisation algorithms may
be applied to minimize the objective value ϕ(Y(Θ)) over the angle parameters Θ. In
essence, this approach is the same as geometric optimisation, except for the key diﬀerence
of optimising over Θ versus over Y. The major beneﬁt of geometric optimisation over the
parametrization method is as follows. We consider, for ease of exposition, the case of equal
weights. The diﬀerential ϕ
Y
, in terms of Y, is given simply as 2ΨY, with Ψ = YY
T
−P,
see (4.24) below. We note that ϕ
Y
= 2ΨY can thus be eﬃciently calculated. The dif
75
3.3. MAJORIZATION 47
ferential ϕ
Θ
, in terms of Θ however, is 2ΨY multiplied by the diﬀerential of Y with
respect to Θ, by the chain rule of diﬀerentiation. The latter diﬀerential is less eﬃcient to
calculate since it involves numerous sums of trigonometric functions. Grubiˇsi´c & Pietersz
(2005, Section 6) have shown empirically for a particular numerical setting with many
randomly generated correlation matrices that the parametrization method is numerically
less eﬃcient than either the geometric programming approach or the Lagrange multiplier
approach. The parametrization approach can handle general weights.
3.2.7 Alternating projections with normal correction (d = n)
The seventh important contribution is due to Higham (2002). The algorithm of Higham
(2002) is the alternating projection algorithm with normal correction applied to the case
d = n, i.e., to the problem of ﬁnding the nearest (possibly fullrank) correlation matrix.
This method can only be used when there are no rank restrictions (d := n) and only with
the weighted Frobenius norm. To understand the methodology, note that minimization
Problem (3.4) with equal weights and d := n can be written as min¦P − C
2
F
; C _
0, Diag(C) = I¦. The two constraint sets ¦C _ 0¦ and ¦Diag(C) = I¦ are both convex.
The convexity was cleverly exploited by Higham (2002), in which it was shown that the
alternating projections algorithm of Dykstra (1983) and Han (1988) could be applied. The
same technique has been applied in a diﬀerent context in Chu, Funderlic & Plemmons
(2003), Glunt, Hayden, Hong & Wells (1990), Hayden & Wells (1988) and Suﬀridge &
Hayden (1993). Since the case d < n is the primary interest of this chapter, the method
of Higham (2002) will not be considered in the remainder.
3.3 Majorization
In this section, we brieﬂy describe the idea of majorization and apply majorization to
the objective function ϕ of Problem (3.4). The idea of majorization has been described,
amongst others, in De Leeuw & Heiser (1977), Kiers & Groenen (1996) and Kiers (2002).
We follow here the lines of Borg & Groenen (1997, Section 8.4). The key to majorization
is to ﬁnd a simpler function that has the same function value at a supporting point x
and anywhere else is larger than or equal to the objective function to be minimized. Such
a function is called a majorization function. By minimizing the majorization function –
which is an easier task since this function is ‘simpler’ – we obtain the next point of the
algorithm. This procedure guarantees that the function value never increases along points
generated by the algorithm. Moreover, if the objective and majorization functions are once
continuously diﬀerentiable (which turns out to hold in our case), then the properties above
imply that the gradients should match at the supporting point x. As a consequence,
from any point where the gradient of the objective function is nonnegligible, iterative
76
48 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
majorization will be able to ﬁnd a next point with a strictly smaller objective function
value. This generic fact for majorization algorithms has been pointed out in Heiser (1995).
We formalize the procedure somewhat more. Let ϕ() denote the function to be min
imized. Let for each x in the domain of ϕ be given a majorization function χ(, x) such
that
(i) ϕ(x) = χ(x, x),
(ii) ϕ(y) ≤ χ(y, x) for all y, and
(iii) the function χ(, x) is ‘simple’, that is, it is straightforward to calculate the minimum
of χ(, x).
A majorization algorithm is then given by
(i) Start at y
(0)
. Set k := 0.
(ii) Set y
(k+1)
equal to the minimum argument of the function χ
_
, y
(k)
_
.
(iii) If ϕ
_
y
(k)
_
−ϕ
_
y
(k+1)
_
< ε then stop with y := y
(k+1)
.
(iv) Set k := k + 1 and repeat from (ii).
Figure 3.1 illustrates the majorization algorithm.
Below we derive the majorizing function for ϕ() in (3.4). The ﬁrst step is to majorize
ϕ(X) as a function of the i
th
row only and then to repeat this for each row. To formalize
the notion of ‘ϕ(X) as a function of the i
th
row only’ we introduce the notation ϕ
i
(y; Y)
to denote the function
ϕ
i
(, Y) : y →ϕ
_
ˆ
Y
i
(y)
_
,
for (column)vectors y ∈ R
d
with
ˆ
Y
i
(y) denoting the matrix Y with the i
th
row replaced
by y
T
. We interpret Y as [y
1
y
n
]
T
. We ﬁnd
ϕ(Y) =
1
c
j
1
<j
2
w
j
1
j
2
_
ρ
j
1
j
2
−¸y
j
1
, y
j
2
)
_
2
=
1
c
j
1
<j
2
w
j
1
j
2
_
ρ
2
j
1
j
2
+ (y
T
j
1
y
j
2
)
2
−2ρ
j
1
j
2
y
T
j
1
y
j
2
_
= (const in y
i
) +
1
c
_
y
T
i
_
j:j,=i
w
ij
y
j
y
T
j
_
y
i
. ¸¸ .
(I)
−2y
T
i
_
j:j,=i
w
ij
ρ
ij
y
j
_
. ¸¸ .
(II)
_
. (3.8)
77
3.3. MAJORIZATION 49
y
0
y
1
y
2
ĳ(.)
Ȥ(.,y
0
)
Ȥ(.,y
1
)
ĳ(y
0
)=Ȥ(y
0
,y
0
)
Ȥ(y
1
,y
0
)
ĳ(y
1
)=Ȥ(y
1
,y
1
)
Ȥ(y
2
,y
1
)
ĳ(y
2
)
Figure 3.1: The idea of majorization. (Figure adopted from Borg & Groenen (1997,
Figure 8.4).) The algorithm sets out at y
0
. The majorization function χ(, y
0
) is ﬁtted by
matching the value and ﬁrst derivative of ϕ() at y
0
. Subsequently the function χ(, y
0
) is
minimized to ﬁnd the next point y
1
. This procedure is repeated to ﬁnd the point y
2
etc.
78
50 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
Part (I) is quadratic in y
i
whereas part (II) is linear in y
i
; the remaining term is constant
in y
i
. We only have to majorize part (I), as follows. Deﬁne
B
i
(Y) :=
j:j,=i
w
ij
y
j
y
T
j
. (3.9)
For notational convenience, we shall denote B
i
(Y) by B, the running y
i
by y, and the
current y
i
, that is, the current i
th
row vector of Y, is denoted by x. Let λ denote the
largest eigenvalue of B. Then, the matrix B − λI is negative semideﬁnite, so that the
following inequality holds:
(y −x)
T
(B−λI)(y −x) ≤ 0, ∀y,
which gives after some manipulations
y
T
By ≤ 2λ −2y
T
(λx −Bx) −x
T
Bx, ∀y, (3.10)
using the fact that y
T
y = x
T
x = 1.
Combining (3.8) and (3.10) we obtain the majorizing function of ϕ
i
(y; Y), that is,
ϕ
i
(y; Y) ≤ −
2
c
y
T
_
λx −Bx +
j:i,=j
w
ij
ρ
ij
y
j
_
+ (const in y) = χ
i
(y; Y), ∀y.
The advantage of χ
i
(; Y) over ϕ
i
(, Y) is that it is linear in y and that the minimization
problem
min
_
χ
i
(y; Y) ; y
2
= 1
_
(3.11)
is readily solved by
y
∗
:= z/z
2
, z := λx −Bx +
j:j,=i
w
ij
ρ
ij
y
j
.
If z = 0 then this implies that the gradient is zero, from which it would follow that the
current point x is already a stationary point.
3.4 The algorithm and convergence analysis
Majorization algorithms are known to converge to a point with negligible gradient. This
property holds also for the current situation, as will be shown hereafter. As the conver
gence criterion is deﬁned in terms of the gradient Gradϕ, an expression for Gradϕ is
needed. We restrict to the case of all w
ij
equal. As shown in Grubiˇsi´c & Pietersz (2005)
(see also Section 4.5.3), the gradient is then given by
Gradϕ = 4c
−1
ΨY, Ψ := YY
T
−P. (3.12)
79
3.4. THE ALGORITHM AND CONVERGENCE ANALYSIS 51
Algorithm 1 The majorization algorithm for ﬁnding a lowrank correlation matrix locally
nearest to a given matrix. Here P denotes the input matrix, W denotes the weight matrix,
n denotes its dimension, d denotes the desired rank, ε
Gradϕ
is the convergence criterion
for the norm of the gradient and ε
ϕ
is the convergence criterion on the improvement in
the function value.
Input: P, W, n, d, ε
Gradϕ
, ε
ϕ
.
1: Find starting point Y by means of the modiﬁed PCA method (3.6)–(3.7).
2: for k = 0, 1, 2, ... do
3: stop if the norm of the gradient of ϕ at X
(k)
:= X is less than ε
Gradϕ
and the
improvement in the function value ϕ
k−1
/ϕ
k
−1 is less than ε
ϕ
.
4: for i = 1, 2, ..., n do
5: Set B :=
j,=i
w
ij
y
j
y
T
j
.
6: Calculate λ to be the largest eigenvalue of the d d matrix B.
7: Set z := λy
i
−By
i
+
j,=i
w
ij
ρ
ij
y
j
.
8: If z ,= 0, then set the i
th
row y
i
of Y equal to z/z
2
.
9: end for
10: end for
Output: the nn matrix YY
T
is the rankd approximation of Psatisfying the convergence
constraints.
An expression for the gradient for the objective function with general weights can be
found by straightforward diﬀerentiation. The majorization algorithm has been displayed
in Algorithm 1.
The rowwise approach of Algorithm 1 makes it dependent of the order of looping
through the rows. This order eﬀect will be addressed in Section 3.5.3. In Sections 3.5.4
and 3.5.5 we study diﬀerent ways of implementing the calculation of the largest eigenvalue
of B in line 6 of Algorithm 1. In particular, we study the use of the power method.
In the remainder of this section the convergence of Algorithm 1 is studied. First, we
establish global convergence of the algorithm. Second, we investigate the local rate of
convergence.
3.4.1 Global convergence
Zangwill (1969) developed generic suﬃcient conditions that guarantee convergence of
an iterative algorithm. The result is repeated here in a form adapted to the case of
majorization. Let M be a compact set. Assume the speciﬁcation of a subset S ⊂ M
called the solution set. A point Y ∈ S is deemed a solution. An (autonomous) iterative
algorithm is a map A : M → M ∪ ¦stop¦ such that A
−1
(¦stop¦) = S. The proof of the
following theorem is adapted from the proof of Theorem 1 in Zangwill (1969).
80
52 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
Theorem 1 (Global convergence) Consider ﬁnding a local minimum of the objective
function ϕ(Y) by use of Algorithm 1. Suppose given a ﬁxed tolerance level ε on the
gradient of ϕ. A point Y is called a solution if Gradϕ(Y) < ε. Then from any start
ing point Y
(0)
, the algorithm either stops at a solution or produces an inﬁnite sequence of
points none of which are solutions, for which the limit of any convergent subsequence is a
solution point.
PROOF: Without loss of generality we may assume that the procedure generates an
inﬁnite sequence of points ¦Y
(k)
¦ none of which are solutions. It remains to be proven
that the limit of any convergent subsequence must be a solution.
First, note that the algorithm A() is continuous in Y. Second, note that if Y
(k)
is
not a solution then
ϕ
_
Y
(k+1)
_
= ϕ
_
A
_
Y
(k)
_
_
< ϕ
_
Y
(k)
_
.
Namely if Y
(k)
is not a solution then its gradient is nonnegligible. Since the objective and
all majorization functions are diﬀerentiable, we necessarily have that the gradients agree
at Y
(k)
. Therefore, when minimizing the majorization functions χ
i
(, Y) there will be at
least one i for which we ﬁnd a strictly smaller objective value. Thus Y
(k+1)
:= A(Y
(k)
)
has a strictly smaller objective function value than Y
(k)
. Third, note that the sequence
¦ϕ(Y
(k)
)¦
∞
k=0
has a limit since it is monotonically decreasing and bounded from below by
0.
Let ¦Y
(k
j
)
¦
∞
j=1
be any subsequence that converges to Y
∗
, say. It must be shown
that Y
∗
is a solution. Assume the contrary. By continuity of the iterative procedure,
A(Y
(k
j
)
) →A(Y
∗
). By the continuity of ϕ(), we then have
ϕ
_
A
_
Y
(k
j
)
_
_
¸
_
ϕ
_
A
_
Y
∗
_
_
< ϕ(Y
∗
),
which is in contradiction with ϕ(A(Y
(k
j
)
)) →ϕ(Y
∗
). 2
The algorithm thus converges to a point with vanishing ﬁrst derivative. We expect such
a point to be a local minimum, but, in principle, it may also be a stationary point.
In practice, however, we almost always obtain a local minimum, except for very rare
degenerate cases. Moreover, global convergence to a point with zero ﬁrst derivative is
the best one may expect from generic optimization algorithms. For example, the globally
convergent version of the NewtonRhapson algorithm may converge to a stationary point,
too: Applied to the function ϕ(x, y) = x
2
− y
2
, it will converge to the stationary point
(0, 0) starting from any point on the line ¦y = 0¦.
3.4.2 Local rate of convergence
The local rate of convergence determines the speed at which an algorithm converges to a
solution point in a neighbourhood thereof. Let ¦Y
(k)
¦ be a sequence of points produced
81
3.5. NUMERICAL RESULTS 53
by an algorithm converging to a solution point Y
(∞)
. Suppose, for k large enough,
_
_
Y
(k+1)
−Y
(∞)
_
_
≤ α
_
_
Y
(k)
−Y
(∞)
_
_
ζ
. (3.13)
If ζ = 1 and α < 1 or if ζ = 2 the local convergence is called linear or quadratic,
respectively. If the convergence estimate is worse than linear, the convergence is deemed
sublinear. For linear convergence, α is called the linear rate of convergence.
When considering several algorithms and indeﬁnite iteration, eventually the algorithm
with best rate of convergence will provide the best result. Among the algorithms available
in the literature, both the geometric programming and parametrization approach can have
a quadratic rate of convergence given that a NewtonRhapson type algorithm is applied.
As the proposition below will show, Algorithm 1 has a sublinear local rate of convergence,
that is, worse than a linear rate of convergence. Thus the majorization algorithm makes
no contribution to existing literature for the case of indeﬁnite iteration. However, we did
not introduce the majorization algorithm for the purpose of indeﬁnite iteration, but rather
for calculating a reasonable answer in limited time, as is the case in practical applications
of ﬁnancial institutions. Given a ﬁxed amount of time, the performance of an algorithm
is a tradeoﬀ between rate of convergence and computational cost per iterate. Such
performance can almost invariably only be measured by empirical investigation, and the
results of the next section on numerical experiments indeed show that majorization is the
best performing algorithm in a number of ﬁnancial settings. The strength of majorization
lies in the low costs of calculating the next iterate.
The next proposition establishes the local sublinear rate of convergence.
Proposition 1 (Local rate of convergence) Algorithm 1 has locally a sublinear rate of
convergence. More speciﬁcally, let ¦Y
(k)
¦ denote the sequence of points generated by
Algorithm 1 converging to the point Y
(∞)
. Deﬁne δ
(k,i)
= y
(k)
i
−y
(∞)
i
. Then
δ
(k+1,i)
= δ
(k,i)
+O
_ _
δ
(k,i)
_
2
_
. (3.14)
PROOF: The proof of Equation (3.14) may be found in Appendix 3.A. Equation (3.14)
can be written as δ
(k+1,i)
= α(δ
(k,i)
)δ
(k,i)
with α(δ
(k,i)
) → 1 as k → ∞. It follows that
the convergencetype deﬁning Equation (3.13) holds, for Algorithm 1, with ζ = 1, but for
α = 1 and not for any α < 1. We may conclude that the local convergence is worse than
linear, thus sublinear. 2
3.5 Numerical results
In this section, we study and assess the performance of the majorization algorithm in
practice. First, we numerically compare majorization with other methods in the litera
ture. Second, we present an example with nonconstant weights. Third, we explain and
82
54 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
investigate the order eﬀect. Fourth and ﬁfth, we consider and study alternative versions
of the majorization algorithm.
Algorithm 1 has been implemented in a MATLAB package called major. It can
be downloaded from www.few.eur.nl/few/people/pietersz. The package consists of
the following ﬁles: clamp.m, dF.m, F.m, grad.m, guess.m, major.m, P tangent.m and
svdplus.m. The package can be run by calling [Yn,fn]=major(P,d,ftol,gradtol).
Here P denotes the input correlation matrix, d the desired rank, Yn the ﬁnal conﬁguration
matrix, fn denotes the ﬁnal objective function value, ftol the convergence tolerance
on the improvement of ϕ, and gradtol the convergence tolerance on the norm of the
gradient. The aforementioned webpage also contains a package majorw that implements
nonconstant weights for the objective function ϕ.
3.5.1 Numerical comparison with other methods
The numerical performance of the majorization algorithm was compared to the perfor
mance of the Lagrange multiplier method, geometric programming
4
and the parametriza
tion method. Additionally, we considered the function fmincon available in the MATLAB
optimization toolbox. MATLAB refers to this function as a ‘mediumscale constrained
nonlinear program’.
We have chosen to benchmark the algorithms by their practical importance, that is the
performance under a ﬁxed small amount of computational time. In ﬁnancial applications,
rank reduction algorithms are usually run for a very short time, typically 0.05 to 2 seconds,
depending on the size of the correlation matrix. We investigate which method produces,
in this limited amount of time, the best ﬁt to the original matrix.
The ﬁve algorithms were tested on random ‘interest rate’ correlation matrices that
are generated as follows. A parametric form for correlation matrices is posed in De Jong,
Driessen & Pelsser (2004, Equation (8)). We repeat here the parametric form for com
pleteness, that is,
ρ
ij
= exp
_
−γ
1
[t
i
−t
j
[ −
γ
2
[t
i
−t
j
[
max(t
i
, t
j
)
γ
3
−γ
4
¸
¸
√
t
i
−
_
t
j
¸
¸
_
, (3.15)
with γ
1
, γ
2
, γ
4
> 0 and with t
i
denoting the expiry time of rate i. (Our particular choice
is t
i
= i, i = 1, 2, . . . ) This model was then subsequently estimated with USD historical
interest rate data. In Table 3 of De Jong et al. (2004), the estimated γ parameters are
listed, along with their standard errors. An excerpt of this table is displayed in Table 3.1.
The random ﬁnancial matrix that we used is obtained by randomizing the γparameters
4
For geometric programming we used the MATLAB package LRCM MIN downloadable from
www.few.eur.nl/few/people/pietersz. The Riemannian Newtonalgorithm was applied.
83
3.5. NUMERICAL RESULTS 55
Table 3.1: Excerpt of Table 3 in De Jong et al. (2004).
γ
1
γ
2
γ
3
γ
4
estimate 0.000 0.480 1.511 0.186
standard error  0.099 0.289 0.127
in (3.15). We assumed the γparameters distributed normally with mean and standard
errors given by Table 3.1, with γ
1
, γ
2
, γ
4
capped at zero.
Hundred matrices were randomly generated, with n, d, and the computational time t
varied as (n = 10, d = 2, t = 0.05s), (n = 20, d = 4, t = 0.1s) and (n = 80, d = 20, t = 2s).
Subsequently the ﬁve algorithms were applied each with t seconds of computational time
and the computational time constraint was the only stopping criterion. The results are
presented in the form of performance proﬁles, as described in Dolan & Mor´e (2002). The
reader is referred there for the merits of using performance proﬁles. These proﬁles are an
elegant way of presenting performance data across several algorithms, allowing for insight
into the results. We brieﬂy describe the workings here. We have 100 test correlation
matrices p = 1, . . . , 100 and 5 algorithms s = 1, . . . , 5. The outcome of algorithm s on
problem p is denoted by Y
(p,s)
. The performance measure of algorithm s is deﬁned to be
ϕ(Y
(p,s)
). The performance ratio
(p,s)
is
(p,s)
=
ϕ(Y
(p,s)
)
min
s
_
ϕ(Y
(p,s)
)
_.
The cumulative distribution function φ
(s)
of the (‘random’) performance ratio p → ρ
(p,s)
is then called the performance proﬁle,
φ
(s)
(τ) =
1
100
#
_
(p,s)
≤ τ; p = 1, . . . , 100
_
.
A rule of thumb is, that the higher the proﬁle of an algorithm, the better its performance.
The quantity φ
(s)
(τ) for τ > 1 is the empirical probability that the achieved performance
measure of an algorithm s is less than τ times the performance measure of the algorithm
with the smallest (i.e., best) performance measure. The proﬁles are displayed in Figures
3.2, 3.3 and 3.4. From the performance proﬁles we may deduce that majorization is the
best overall performing algorithm in the numerical cases studied.
The tests were also run with a strict convergence criterion on the norm of the gradient.
Because the Lagrange multiplier algorithm has not been guaranteed to converge to a local
minimum, we deem an algorithm not to have converged after 30 seconds of CPU time.
The majorization algorithm still performs very well, but geometric programming and the
84
56 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
0
10
20
30
40
50
60
70
80
90
100
1 1.02 1.04 1.06 1.08 1.1
Performance ratio
P
e
r
c
e
n
t
a
g
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Lagrange
Newton
major
param
fmincon
fmincon
Lagrange
param
major
Newton
(
ĭ
)
(Ĳ)
Figure 3.2: Performance proﬁle for n = 10, d = 2, t = 0.05s.
Lagrange multiplier method perform slightly better when running up to convergence.
This can be expected from the sublinear rate of convergence of majorization versus
the quadratic rate of convergence of the geometric programming approach. The results
have not been displayed since these are not relevant in a ﬁnance setting. In ﬁnancial
practice, no additional computational time will be invested to obtain convergence up to
machine precision. Having found that majorization is the most eﬃcient algorithm in a
ﬁnance setting for the numerical cases considered, with the tests of running to convergence
we do warn the reader for using Algorithm 1 in applications outside of ﬁnance where
convergence to machine precision is required. For such nonﬁnance applications, we would
suggest a mixed approach: use majorization in an initial stage and ﬁnish with geometric
programming. It is the low cost per iterate that makes majorization so attractive in a
ﬁnance setting.
To assess the quality of the solutions found in Figures 3.2–3.4, we checked whether
the matrices produced by the algorithms were converging to a global minimum. Here, we
have the special case (only for equal weights) that we can check for a global minimum,
85
3.5. NUMERICAL RESULTS 57
0
10
20
30
40
50
60
70
80
90
100
1 1.2 1.4 1.6 1.8 2 2.2
Performance ratio
P
e
r
c
e
n
t
a
g
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Lagrange
Newton
major
param
fmincon
fmincon
Lagrange
param
major
Newton
(
ĭ
)
(Ĳ)
Figure 3.3: Performance proﬁle for n = 20, d = 4, t = 0.1s.
although in other minimization problems it may be diﬃcult to assess whether a minimum
is global or not. For clarity, we point out that the majorization algorithm does not
have guaranteed convergence to the global minimum, nor do any of the other algorithms
described in Section 3.2. We only have guaranteed convergence to a point with vanishing
ﬁrst derivative, and in such a point we can verify whether that point is a global minimum.
If a produced solution satisﬁed a strict convergence criterion on the norm of the gradient,
then it was checked whether such stationary point is a global minimum by inspecting the
Lagrange multipliers, see Zhang & Wu (2003), Wu (2003) and Grubiˇsi´c & Pietersz (2005,
Lemma 6.1). The reader is referred to Lemma 1 in Section 4.5.4 for details.
The percentage of matrices that were deemed global minima was between 95% and
100% for both geometric programming and majorization, respectively, for the cases n =
20, d = 4 and n = 10, d = 2. The Lagrange multiplier and parametrization methods
did not produce any stationary points within 20 seconds of computational time. The
percentage of global minima is high since the eigenvalues of ﬁnancial correlation matrices
are rapidly decreasing. In eﬀect, there are large diﬀerences between the ﬁrst 4 or 5
86
58 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
0
10
20
30
40
50
60
70
80
90
100
1 1.2 1.4 1.6 1.8 2 2.2
Performance ratio
P
e
r
c
e
n
t
a
g
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Lagrange
Newton
major
param
fmincon
fmincon
Lagrange
param
major
Newton
(
ĭ
)
(Ĳ)
Figure 3.4: Performance proﬁle for n = 80, d = 20, t = 2s.
consecutive eigenvalues. For the case n = 80, d = 20 it was more diﬃcult to check
the global minimum criterion since subsequent eigenvalues are smaller and closer to each
other. In contrast, if we apply the methods for all cases to random correlation matrices
of Davies & Higham (2000), for which the eigenvalues are all very similar, we ﬁnd that a
much lower percentage of produced stationary points were global minima.
3.5.2 Nonconstant weights
We considered the example with nonconstant weights described in Rebonato (2002, Sec
tion 9.3), in which a functional form for the correlation matrix is speciﬁed, that is,
ρ
ij
= LongCorr + (1 −LongCorr) exp
_
−β[t
i
−t
j
[
_
, i, j = 1, . . . , n.
The parameters are set to n = 10, LongCorr = 0.6, β = 0.1, t
i
= i. Subsequently
Rebonato presents the rank 2, 3, and 4 matrices found by the parametrization method for
the case of equal weights. The majorization algorithm was also applied and its convergence
criterion was set to machine precision for the norm of the gradient. Comparative results
87
3.5. NUMERICAL RESULTS 59
Table 3.2: Comparative results of the parametrization and majorization algorithms for
the example described in Rebonato (2002, Section 9.3.1).
d Gradϕ
F
ϕ ϕ I II CPU
major. major. Rebonato major.
2 210
−17
5.13110
−04
5.13710
−04
4110
−04
0.0210
−04
0.4s
3 210
−17
1.2630710
−04
1.2631110
−04
1510
−04
0.0110
−04
1.0s
4 210
−17
4.8510
−05
4.8610
−05
7010
−04
0.0110
−04
2.1s
for the parametrization and majorization algorithms are displayed in Table 3.2. Columns
I and II denote P
Approx
Reb
−P
Approx
major

F
and P
Approx
major, rounded
−P
Approx
major

F
, respectively. Here
‘Approx’ stands for the rankreduced matrix produced by the algorithm and ‘rounded’
stands for rounding the matrix after 6 digits, as is the precision displayed in Rebonato
(2002). Columns I and II show that the matrices displayed in Rebonato (2002) are not yet
fully converged up to machine precision, since the roundoﬀ error from displaying only 6
digits is much smaller than the error in obtaining full convergence to the stationary point.
Rebonato proceeds by minimizing ϕ for rank 3 with two diﬀerent weights matrices.
These weights matrices are chosen by ﬁnancial arguments speciﬁc to a ratchet cap and a
trigger swap, which are interest rate derivatives. The weights matrix W
(R)
for the ratchet
cap is a tridiagonal matrix
w
(R)
ij
= 1 if j = i −1, i, i + 1, w
(R)
ij
= 0, otherwise
and the weights matrix W
(T)
for the trigger swap has ones on the ﬁrst two rows and
columns
w
(T)
ij
= 1 if i = 1, 2 or j = 1, 2, w
(T)
ij
= 0, otherwise.
Rebonato subsequently presents solution matrices found by the parametrization method.
These solutions exhibit a highly accurate yet nonperfect ﬁt to the relevant portions of the
correlation matrices. In contrast, majorization ﬁnds exact ﬁts. The results are displayed
in Table 3.3.
3.5.3 The order eﬀect
The majorization algorithm is based on sequentially looping over the rows of the matrix
Y. In Algorithm 1, the row index runs from 1 to n. There is however no distinct reason to
start with row 1, then 2, etc. It would be equally reasonable to consider any permutation
88
60 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
Table 3.3: Results for the ratchet cap and trigger swap. Here ‘tar.’ denotes the target
value, ‘maj.’ and ‘Reb.’ denote the resulting value obtained by the majorization algorithm
and Rebonato (2002, Section 9.3), respectively.
Ratchet cap
First principal subdiagonal
CPU time major: 2.8s; obtained ϕ < 2 10
−30
tar. .961935 .961935 .961935 .961935 .961935 .961935 .961935 .961935 .961935
maj. .961935 .961935 .961935 .961935 .961935 .961935 .961935 .961935 .961935
Reb. .961928 .961880 .961977 .962015 .962044 .962098 .961961 .961867 .962074
Trigger swap
First two rows (or equivalently ﬁrst two columns)
CPU time major: 2.4s; obtained ϕ < 2 10
−30
Row 1 (without the unit entry (1,1))
tar. .961935 .927492 .896327 .868128 .842612 .819525 .798634 .779732 .762628
maj. .961935 .927492 .896327 .868128 .842612 .819525 .798634 .779732 .762628
Reb. .961944 .927513 .896355 .868097 .842637 .819532 .798549 .779730 .762638
Row 2 (without the unit entry (2,2))
tar. .961935 .961935 .927492 .896327 .868128 .842612 .819525 .798634 .779732
maj. .961935 .961935 .927492 .896327 .868128 .842612 .819525 .798634 .779732
Reb. .961944 .962004 .927565 .896285 .868147 .842650 .819534 .798669 .779705
89
3.5. NUMERICAL RESULTS 61
p of the numbers ¦1, . . . , n¦ and then let the row index run as p(1), p(2), . . . , p(n). A
priori, there is nothing to guarantee or prevent that the resulting solution point produced
with permutation p would diﬀer from or be equal to the solution point produced by the
default loop 1, . . . , n. This dependency of the order is termed ‘the order eﬀect’. The order
eﬀect is a bad feature of Algorithm 1 in general. We show empirically that the solutions
produced by the algorithm can diﬀer when using a diﬀerent permutation. However, we
show that this is unlikely to happen for ﬁnancial correlation matrices. The order eﬀect
can have two consequences. First, the produced solution correlation matrix can diﬀer –
this generally implies a diﬀerent objective function value as well. Second, even when the
produced solution correlation matrix is equal, the conﬁguration Y can diﬀer – in this
case we have equal objective function values. To see this, consider a n d conﬁguration
matrix Y and assume given any orthogonal d d matrix Q, that is, QQ
T
= I. Then
the conﬁguration matrices Y and YQ are associated with the same correlation matrices
5
:
YQQ
T
Y = YY
T
.
We investigated the order eﬀect for Algorithm 1 numerically, as follows. We generated
either a random matrix by (3.15), see Section 3.5.1, or a random correlation matrix in
MATLAB by
rand(’state’,0);randn(’state’,0);n=30;R=gallery(’randcorr’,n);
The random correlation matrix generator gallery(’randcorr’,n) has been described
in Davies & Higham (2000). Subsequently we generated 100 random permutations with
p=randperm(n);. For each of the permutations, Algorithm 1 was applied with d = 2 and
a high accuracy was demanded: ε
Gradϕ
= ε
ϕ
= 10
−16
. The results for the two diﬀerent
correlation matrices are as follows.
(Random interest rate correlation matrix as in (3.15).) Only one type of produced
solution correlation matrix could be distinguished, which turned out to be a global min
imum by inspection of the Lagrange multipliers. We also investigated the orthogonal
transformation eﬀect. For R
2
, an orthogonal transformation can be characterized by the
rotation of the two basis vectors and then by 1 or +1 denoting whether the second basis
vector is reﬂected in the origin or not. All produced matrices Y were diﬀerently rotated,
but no reﬂection occurred. The maximum rotation was equal to 0.8 degrees and the
standard deviation of the rotation was 0.2 degrees.
(Davies & Higham (2000) random correlation matrix.) Essentially four types of pro
duced solution correlation matrices could be distinguished, which we shall name I, II, III,
and IV. The associated objective function values and the frequency at which the types
occurred are displayed in Table 3.4. We inspected the Lagrange multipliers to ﬁnd that
5
The indeterminacy of the result produced by the algorithm can easily be resolved by either considering
only YY
T
or by rotation of Y into its principal axes. For the latter, let Y
T
Y = QΛQ
T
be an eigenvalue
decomposition. Then the principal axes representation is given by YQ.
90
62 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
Table 3.4: The order eﬀect. Here n = 30, d = 2 and 100 random permutations were
applied. Four types of produced correlation matrices could be distinguished. The table
displays the associated ϕ and frequency.
type I II III IV
ϕ 0.110423 0.110465 0.110630 0.110730
frequency 2% 88% 7% 3%
none of the four types was a global minimum. For type II, the most frequently produced
lowrank correlation matrix, we also investigated the orthogonal transformation eﬀect.
Out of the 88 produced matrices Y that could be identiﬁed with type II, all were diﬀer
ently rotated, but no reﬂection occurred. The maximum rotation was equal to 38 degrees
and the standard deviation of the rotation was 7 degrees.
From the results above, we conclude that the order eﬀect is not much of an issue for
the case of interest rate correlation matrices, at least not for the numerical setting that
we investigated.
3.5.4 Majorization equipped with the power method
Line 6 in Algorithm 1 uses the largest eigenvalue of a matrix, which can be implemented
in several diﬀerent ways. For example, our implementation in the MATLAB function
major implements lambda=max(eig(B)), which uses available MATLAB builtin func
tions. This choice of implementation unnecessarily calculates all eigenvalues whereas only
the largest is required. Instead, the algorithm can be accelerated by calculating only the
largest eigenvalue, for example with the power method, see Golub & van Loan (1996). We
numerically tested the use of the power method versus lambda=max(eig(B)), as follows.
In Figure 3.5, we display the natural logarithm of the relative residual versus the com
putational time for the random Davies & Higham (2000) matrix R included in the major
package, for both the power method and lambda=max(eig(B)). As can be seen from the
ﬁgure, the power method causes a signiﬁcant gain of computational eﬃciency. The power
method is available as majorpower at www.few.eur.nl/few/people/pietersz.
3.5.5 Using an estimate for the largest eigenvalue
In Algorithm 1, the largest eigenvalue of B is calculated by an eigenvalue decomposition
or by the power method. Such methods may be relatively expensive to apply. Instead
of a full calculation, we could consider ﬁnding an easytocalculate upper bound on the
91
3.5. NUMERICAL RESULTS 63
10
9
8
7
6
5
4
3
2
1
0
0 5 10 15 20
Computational time (s)
L
N
(
r
e
l
a
t
i
v
e
r
e
s
i
d
u
a
l
)
major power
major
Figure 3.5: Convergence run for the use of the power method versus lambda=max(eig(B)).
The relative residual is Gradϕ(Y
(i)
)
F
/Gradϕ(Y
(0)
)
F
. Here n = 80 and d = 3.
largest eigenvalue of B. Such upper bound is readily determined as n −1 due to the unit
length restrictions on the n − 1 vectors y
i
. Replacing λ and its calculation by n − 1 in
Algorithm 1 will result in a reduction of computational time by not having to calculate the
eigenvalue decomposition. A disadvantage is however that the resulting ﬁtted majorizing
function might be much steeper causing its minimum to be much closer to the point of
outset. In other words, the steps taken by the majorization algorithm will be smaller.
Whether to use n −1 instead of λ is thus a tradeoﬀ between computational time for the
decomposition and the stepsize.
We tested replacing λ by n − 1 for 100 correlation matrices of dimension 80 80.
These matrices were randomly generated with the procedure of Davies & Higham (2000).
We allowed both versions of the algorithm a computational time of less than 1 second.
We investigated d = 3, d = 6, d = 40 and d = 70. For all 400 cases, without a single
exception, the version of the algorithm with the full calculation of λ produced a matrix
that had a lower value ϕ than the version with n−1. This result suggests that a complete
calculation of the largest eigenvalue is most eﬃcient. However, these results could be
particular to our numerical setting. The ‘n − 1’ version of the algorithm remains an
interesting alternative and could potentially be beneﬁcial in certain experimental setups.
92
64 CHAPTER 3. RANK REDUCTION BY MAJORIZATION
3.6 Conclusions
We have developed a novel algorithm for ﬁnding a lowrank correlation matrix locally
nearest to a given matrix. The algorithm is based on iterative majorization and this
chapter is the ﬁrst to apply majorization to the area of derivatives pricing. We showed
theoretically that the algorithm converges to a stationary point from any starting point.
As an addition to the previously available methods in the literature, majorization was
in our simulation setup more eﬃcient than either geometric programming, the Lagrange
multiplier technique or the parametrization method. Furthermore, majorization is easier
to implement than any method other than modiﬁed PCA. The majorization method
eﬃciently and straightforwardly allows for arbitrary weights.
3.A Appendix: Proof of Equation (3.13)
Deﬁne the Algorithm 1 mapping y
(k+1)
i
= m
i
(y
(k)
i
, Y
(k)
). For ease of exposition we sup
press the dependency on the row index i and current state Y
(k)
, so y
(k+1)
= m(y
(k)
),
with
m(y) =
z
z
, z = (λI −B)y +a,
where B depends on Y according to (3.9), λ is the largest eigenvalue of B and a =
j:i,=j
w
ij
ρ
ij
y
j
. We have locally around y
(∞)
, by ﬁrst order Taylor approximation
y
(k+1)
= y
(∞)
+Dm(y
(∞)
)
_
y
(k)
−y
(∞)
_
+O
_
_
_
_y
(k)
−y
(∞)
_
_
_
2
_
.
By straightforward calculation, the Jacobian matrix equals
Dm(y
(∞)
) =
_
I −
_
y
(∞)
__
y
(∞)
_
T
_
1
z
(∞)

(λI −B).
The matrix I − (y
(∞)
)(y
(∞)
)
T
is denoted by P
y
(∞) . Then, up to ﬁrst order in δ
(k)
=
y
(k)
−y
(∞)
,
y
(k+1)
−y
(∞)
≈ P
y
(∞)
1
z
(∞)

(λI −B)
_
y
(k)
−y
(∞)
_
= P
y
(∞)
1
z
(∞)

_
(λI −B)y
(k)
+a −
_
(λI −B)y
(∞)
+a
_
_
= P
y
(∞)
1
z
(∞)

_
z
(k)
−z
(∞)
_
=
z
(k)

z
(∞)

P
y
(∞)
_
y
(k)
−y
(∞)
_
, (3.16)
93
3.A. APPENDIX: PROOF OF EQUATION (3.11) 65
Figure 3.6: The equality P
y
(∞) (y
(k)
−y
(∞)
) = δ
(k)
_
1 −(δ
(k)
)
2
/4.
where in the last equality we have used P
y
(∞) y
(∞)
= 0. We note that, up to ﬁrst order in
δ
(k)
, z
(k)
/z
(∞)
 ≈ 1. The term P
y
(∞) (y
(k)
− y
(∞)
) can be calculated by elementary
geometry, see Figure 3.6. The projection operator P
y
(∞) sets any component in the
direction of y
(∞)
to zero and leaves any orthogonal component unaltered. The resulting
length P
y
(∞) (y
(k)
−y
(∞)
) has been illustrated in Figure 3.6. If we denote this length by
µ, then µ = sin(θ), where θ is the angle as denoted in the ﬁgure. Also sin(θ/2) = δ
(k)
/2
from which we obtain θ = 2 arcsin(δ
(k)
/2). It follows that
µ = sin
_
2 arcsin
_
δ
(k)
/2
_ _
= 2 sin
_
arcsin
_
δ
(k)
/2
_ _
cos
_
arcsin
_
δ
(k)
/2
_ _
= 2
_
δ
(k)
2
_
¸
1 −
_
δ
(k)
2
_
2
= δ
(k)
_
1 −
_
δ
(k)
_
2
/4 = δ
(k)
+O
_ _
δ
(k)
_
2
_
. (3.17)
The result δ
(k+1)
= δ
(k)
+O((δ
(k)
)
2
) follows by combining (3.16) and (3.17). 2
94
95
Chapter 4
Rank reduction of correlation
matrices by geometric programming
Geometric optimisation algorithms are developed that eﬃciently ﬁnd the nearest lowrank
correlation matrix. We show, in numerical tests, that our methods compare favourably
to the existing methods in the literature. The connection with the Lagrange multiplier
method is established, along with an identiﬁcation of whether a local minimum is a global
minimum. An additional beneﬁt of the geometric approach is that any weighted norm
can be applied. The problem of ﬁnding the nearest lowrank correlation matrix occurs as
part of the calibration of multifactor interest rate market models to correlation.
4.1 Introduction
The problem of ﬁnding the nearest lowrank correlation matrix occurs in areas such as
ﬁnance, chemistry, physics and image processing. The mathematical formulation of this
problem is stated in (3.4), in terms of the conﬁguration matrix Y. Here, we state the
problem in terms of correlation matrices. Let S
n
denote the set of real symmetric n n
matrices and let P be a symmetric nn matrix with unit diagonal. For C ∈ S
n
we denote
by C _ 0 that C is positive semideﬁnite. Let the desired rank d ∈ ¦1, . . . , n¦ be given.
The problem is then given by
Find C ∈ S
n
to minimize
1
2
P−C
2
subject to rank(C) ≤ d; c
ii
= 1, i = 1, . . . , n; C _ 0.
(4.1)
Here   denotes a seminorm on S
n
. The most important instance is
1
2
P−C
2
=
1
2
i<j
w
ij
(ρ
ij
−c
ij
)
2
, (4.2)
96
68 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
where W is a weights matrix consisting of nonnegative elements. In words: Find the
lowrank correlation matrix C nearest to the given nn matrix P. The choice of the
seminorm will reﬂect what is meant by nearness of the two matrices. The seminorm in
(4.2) is well known in the literature, and it is called the Hadamard seminorm, see Horn &
Johnson (1990). We note that the constraint set is nonconvex for d < n, which makes it
not straightforward to solve Problem (4.1) with standard convex optimization methods.
For concreteness, consider the following example. Suppose P is
_
_
1.0000 −0.1980 −0.3827
−0.1980 1.0000 −0.2416
−0.3827 −0.2416 1.0000
_
_
,
and W is the full matrix, w
ij
= 1. With the algorithm developed in this chapter, we solve
(4.1) with P as above and d = 2. The algorithm takes as initial input a matrix C
(0)
of
rank 2 or less, for example,
C
(0)
=
_
_
1.0000 0.9782 0.8982
0.9782 1.0000 0.9699
0.8982 0.9699 1.0000
_
_
,
and then produces a sequence of points on the constraint set that converges to the point
C
∗
=
_
_
1.0000 −0.4068 −0.6277
−0.4068 1.0000 −0.4559
−0.6277 −0.4559 1.0000
_
_
that solves (4.1). The constraint set and the points generated by the algorithm are
represented in Figure 4.1. The details of this representation are given in Section 4.5.2.
The blue point in the center and the green point represent, respectively, the target matrix
P and the solution point C
∗
. As the ﬁgure suggests, the algorithm has fast convergence
and the constraint set is a curved space.
This novel technique we propose, is based on geometric optimisation that can locally
minimize the objective function in (4.1) and which incorporates the Hadamard seminorm.
In fact, our method can be applied to any suﬃciently smooth objective function. Not all
other methods available in the literature that aim to solve (4.1) can handle an arbitrary
objective function, see the literature review in Section 3.2. We formulate the problem in
terms of Riemannian geometry. This approach allows us to use numerical methods on
manifolds that are numerically stable and eﬃcient, in particular the RiemannianNewton
method is applied. We show, for the numerical tests we performed, that the numerical
eﬃciency of geometric optimisation compares favourably to the other algorithms available
in the literature. The only drawback of the practical use of geometric optimisation is that
97
4.1. INTRODUCTION 69
Figure 4.1: The shell represents the set of 33 correlation matrices of rank 2 or less. The
details of this representation are given in Section 4.5.2.
the implementation is rather involved. To overcome this drawback, we have made available
a MATLAB implementation ‘LRCM min’ (lowrank correlation matrices minimization)
at www.few.eur.nl/few/people/pietersz.
We develop a technique to instantly check whether an obtained local minimum is a
global minimum, by adaptation of Lagrange multiplier results of Zhang & Wu (2003).
The novelty consists of an expression for the Lagrange multipliers given the matrix C,
whereas until now only the reverse direction (an expression for the matrix C given the
Lagrange multipliers) was known. The fact that one may instantly identify whether a
local minimum is a global minimum is very rare for nonconvex optimisation problems,
and that makes Problem (4.1), which is nonconvex for d < n, all the more interesting.
Problem (4.1) is important in ﬁnance, as it occurs as part of the calibration of the
multifactor LIBOR market model of Brace et al. (1997), Miltersen et al. (1997), Jamshid
ian (1997) and Musiela & Rutkowski (1997). This model is an interest rate derivatives
pricing model, as explained in Section 1.3.2, and it is used in some ﬁnancial institutions
for valuation and risk management of their interest rate derivatives portfolio. The num
ber of stochastic factors needed for the model to ﬁt to the given correlation matrix is
equal to the rank of the correlation matrix. This rank can be as high as the number of
forward LIBORs in the model, i.e., as high as the dimension of the matrix. The number
98
70 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
of LIBORs in the model can grow large in practical applications, for example, a model
with over 80 LIBORs is not uncommon. This implies that the number of factors needed
to ﬁt the model to the given correlation matrix can be high, too. There is much empirical
evidence that the term structure of interest rates is driven by multiple factors (three, four,
or even more), see the review article of Dai & Singleton (2003). Though the number of
factors driving the term structure may be four or more, the empirical work shows that it
is certainly not as high as, say, 80. This is one reason for using a model with a low number
of factors. Another reason is the enhanced eﬃciency when estimating the modelprice of
an interest rate derivative through Monte Carlo simulation. First, a lower factor model
simply requires drawing less random numbers than a higher factor model. Second, the
complexity of calculating LIBOR rates over a single time step in a simulation implemen
tation is of order n d, with n the number of LIBORs and d the number of factors, see
Joshi (2003b).
The importance of Problem (4.1) in ﬁnance has been recognized by many researchers.
In fact, the literature review of Section 3.2 refers to twenty articles or books addressing
the problem.
Due to its generality our method ﬁnds locally optimal points for a variety of other ob
jective functions subject to the same constraints. One of the most famous problems comes
from physics and is called Thomson’s problem. The Thomson problem is concerned with
minimizing the potential energy of n charged particles on the sphere in R
3
(d = 3). Geo
metric optimisation techniques have previously been applied to the Thomson problem by
Depczynski & St¨ockler (1998), but these authors have only considered conjugate gradient
techniques on a ‘bigger’ manifold, in which the freedom of rotation has not been fac
tored out. In comparison, we stress here that our approach considers a lower dimensional
manifold, which allows for Newton’s algorithm (the latter not developed in Depczynski &
St¨ockler (1998)). An implementation of geometric optimisation applied to the Thomson
problem has also been included in the ‘LRCM min’ package.
Finally, for a literature review of interest rate models, the reader is referred to Rebon
ato (2004a).
The chapter is organized as follows. In Section 4.2, the constraints of the problem
are formulated in terms of diﬀerential geometry. We parameterize the set of correlation
matrices of rank at most d with a manifold named the Cholesky manifold. This is a
canonical space for the optimisation of the arbitrary smooth function subject to the
same constraints. In Section 4.3, the Riemannian structure of the Cholesky manifold
is introduced. Formulas are given for parallel transport, geodesics, gradient and Hessian.
These are needed for the minimization algorithms, which are made explicit. In Section
4.4, we discuss the convergence of the algorithms. In Section 4.5, the application of the
algorithms to the problem of ﬁnding the nearest lowrank correlation matrix is worked out
99
4.2. SOLUTION METHODOLOGY WITH GEOMETRIC OPTIMISATION 71
in detail. In Section 4.6, we numerically investigate the algorithms in terms of eﬃciency.
Finally, in Section 4.7, we conclude the chapter.
For a literature review on rank reduction methods, the reader is referred to Section
3.2.
4.1.1 Weighted norms
We mention two reasons for assigning nonconstant or nonhomogeneous weights in the
objective function of (4.2). First, in our setting P has the interpretation of measured
correlation. It can thus be the case that we are more conﬁdent of speciﬁc entries of the
matrix P. Second, the weighted norm of (4.2) has important applications in ﬁnance, see,
for example, Higham (2002), Rebonato (1999c) and Pietersz & Groenen (2004b).
The seminorm in the objective function ϕ can be (i) a Hadamard seminorm with
arbitrary weights per element of the matrix, as deﬁned in (4.2), or (ii) a weighted Frobenius
norm  
F,Ω
with Ω a positive deﬁnite matrix. Here X
2
F,Ω
= tr(XΩX
T
Ω). The
weighted Frobenius norm is, from a practical point of view, by far less transparent than
the Hadamard or weightsperentry seminorm (4.2). The geometric optimisation theory
developed in this chapter, and most of the algorithms mentioned in Section 3.2, can be
eﬃciently applied to both cases. The Lagrange multipliers and alternating projections
methods however can only be eﬃciently extended to the case of the weighted Frobenius
norm. The reason is that both these methods need to calculate a projection onto the
space of matrices of rank d or less. Such a projection, for the weighted Frobenius norm,
can be eﬃciently found by an eigenvalue decomposition. For the Hadamard seminorm,
such an eﬃcient solution is not available, to our knowledge, and as also mentioned in
Higham (2002, page 336).
4.2 Solution methodology with geometric optimisa
tion
We note that Problem (4.1) is a special case of the following more general problem:
Find C ∈ S
n
to minimize ˜ ϕ(C)
subject to rank(C) ≤ d; c
ii
= 1, i = 1, . . . , n; C _ 0.
(4.3)
In this chapter methods will be developed to solve Problem (4.3) for the case when ˜ ϕ is
twice continuously diﬀerentiable. In the remainder of the chapter, we assume d > 1, since
for d = 1 the constraint set consists of a ﬁnite number (2
n−1
) of points.
100
72 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
4.2.1 Basic idea
The idea for solving Problem (4.3) is to parameterize the constraint set by a manifold,
and subsequently utilize the recently developed algorithms for optimisation over mani
folds, such as Newton’s algorithm and conjugate gradient algorithms. Such geometric
optimisation has been developed by Smith (1993).
In Section 4.2.2, the constraint set is equipped with a topology, and we make an
identiﬁcation with a certain quotient space. In Section 4.2.3, it will be shown that the
constraint set as such is not a manifold; however a dense subset is shown to be a manifold,
namely the set of matrices of rank exactly d. Subsequently, in Section 4.2.4, we will deﬁne a
larger manifold (named Cholesky manifold), of the same dimension as the rankd manifold,
that maps surjectively to the constraint set. We may apply geometric optimisation on
the Cholesky manifold. The connection between minima on the Cholesky manifold and
on the constraint set will be established.
4.2.2 Topological structure
In this section, the set of nn correlation matrices of rank d or less is equipped with
the subspace topology from S
n
. We subsequently establish a homeomorphism (i.e., a
topological isomorphism) between the latter topological space and the quotient space of n
products of the d−1 sphere over the group of orthogonal transformations of R
d
. Intuitively
the correspondence is as follows: We can associate with an nn correlation matrix of rank
d a conﬁguration of n points of unit length in R
d
such that the inner product of points i
and j is entry (i, j) of the correlation matrix. Any orthogonal rotation of the conﬁguration
does not alter the associated correlation matrix. This idea is developed more rigorously
below.
Deﬁnition 1 The set of symmetric nn correlation matrices of rank at most d is deﬁned
by
C
n,d
=
_
C ∈ S
n
; Diag(C) = I, rank(C) ≤ d, C _ 0
_
.
Here I denotes the identity matrix and Diag denotes the map R
nn
→R
nn
, Diag(C)
ij
=
δ
ij
c
ij
, where δ
ij
denotes the Kronecker delta.
The set C
n,d
is a subset of S
n
. The latter space is equipped with the Frobenius norm
 
F
, which in turn deﬁnes a topology. We equip C
n,d
with the subspace topology.
In the following, the product of n unit spheres S
d−1
is denoted by T
n,d
. Elements of
T
n,d
are denoted as a matrix Y ∈ R
nd
, with each row vector y
i
of unit length. Denote
by O
d
the group of orthogonal transformations of dspace. Elements of O
d
are denoted
by a dd orthogonal matrix Q.
101
4.2. SOLUTION METHODOLOGY WITH GEOMETRIC OPTIMISATION 73
Deﬁnition 2 We deﬁne the following right O
d
action
1
on T
n,d
:
T
n,d
O
d
→T
n,d
, (Y, Q) →YQ. (4.4)
An equivalence class ¦YQ : Q ∈ O
d
¦ associated with Y ∈ T
n,d
is denoted by [Y] and it
is called the orbit of Y. The quotient space T
n,d
/O
d
is denoted by M
n,d
. The canonical
projection T
n,d
→T
n,d
/O
d
= M
n,d
is denoted by π. Deﬁne the map
2
Ψ as
M
n,d
Ψ
−→C
n,d
, Ψ
_
[Y]
_
= YY
T
.
We consider a map Φ in the inverse direction of Ψ,
C
n,d
Φ
−→M
n,d
,
deﬁned as follows: For C ∈ C
n,d
take Y ∈ T
n,d
such that YY
T
= C. Such Y can always
be found as will be shown in Theorem 2 below. Then set Φ(C) = [Y]. It will be shown
in Theorem 2 that this map is well deﬁned. Finally, deﬁne the map S : T
n,d
→ C
n,d
,
S(Y) = YY
T
.
The following theorem relates the spaces C
n,d
and M
n,d
; the proof has been deferred
to Appendix 4.A.1.
Theorem 2 Consider the following diagram
T
n,d
C
n,d
M
n,d
= T
n,d
/O
d
E
S
c
Π
Φ
©
Ψ
(4.5)
with the objects and maps as in Deﬁnitions 1 and 2. We have the following:
(i) The maps Ψ and Φ are well deﬁned.
(ii) The diagram is commutative, i.e., Ψ◦ Π = S and Φ◦ S = Π.
(iii) The map Ψ is a homeomorphism with inverse Φ.
1
It is trivially veriﬁed that the map thus deﬁned is indeed an O
d
smooth action: YI = Y and
Y(Q
1
Q
2
)
−1
= (YQ
−1
2
)Q
1
−1
. Standard matrix multiplication is smooth.
2
Although rather obvious, it will be shown in Theorem 2 that this map is well deﬁned.
102
74 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
4.2.3 A dense part of M
n,d
equipped with a diﬀerentiable struc
ture
For an exposition on diﬀerentiable manifolds, the reader is referred to do Carmo (1992). It
turns out that M
n,d
is not a manifold, but a socalled stratiﬁed space, see, e.g., Duistermaat
& Kolk (2000). However there is a subspace of M
n,d
that is a manifold, which is the
manifold of equivalence classes of matrices of exactly rank d. The proof of the following
proposition has been deferred to Appendix 4.A.2.
Proposition 2 Let T
∗
n,d
⊂ T
n,d
be the subspace deﬁned by
T
∗
n,d
=
_
Y ∈ T
n,d
: rank(Y) = d
_
.
Then we have the following:
1. T
∗
n,d
is a submanifold of T
n,d
.
2. Denote by M
∗
n,d
the quotient space T
∗
n,d
/O
d
. Then M
∗
n,d
is a manifold of dimension
n(d −1) −d(d −1)/2.
As shown in Proposition 2, a subset M
∗
n,d
of M
n,d
is a manifold. In the following, we will
study charts given by sections of the manifold M
∗
n,d
that will ultimately lead to the ﬁnal
manifold over which will be optimized.
A section on M
∗
n,d
is a map Σ :  → T
∗
n,d
, with  open in M
∗
n,d
, such that Π◦ Σ =
id
M
∗
n,d
. Such a map singles out a unique matrix in each equivalence class. In our case we
can explicitly give such a map Σ. Let [Y] in M
∗
n,d
, and let I denote a subset of ¦1, . . . , n¦
with exactly d elements, such that dim(span(¦y
i
: i ∈ I¦)) = d, for Y ∈ [Y]. We
note that I is well deﬁned since any two Y
(1)
, Y
(2)
∈ [Y] are coupled by an orthogonal
transformation, see the proof of Theorem 2, and orthogonal transformations preserve
independence. The collection of all such I is denoted by 1
Y
. It is readily veriﬁed that 1
Y
is not empty. Let ≺ denote the lexicographical ordering, then (1
Y
, ≺) is a wellordered
set. Thus we can choose the smallest element, denoted by J(Y) = (j
1
, . . . , j
d
). Deﬁne
˜
Y ∈ R
dd
by taking the rows of Y from J
Y
, thus ˜ y
i
= y
j
i
. Deﬁne
˜
C =
˜
Y
˜
Y
T
. Since
˜
C is
positive deﬁnite, Cholesky decomposition can be applied to
˜
C, see for example Golub &
van Loan (1996, Theorem 4.2.5), to obtain a unique lowertriangular matrix
¯
Y such that
¯
Y
¯
Y
T
=
˜
C and ¯ y
ii
> 0. By Theorem 2, there exists a unique orthogonal matrix Q ∈ O
d
such that
¯
Y =
˜
YQ. Deﬁne Y
∗
= YQ. We note that Y
∗
is lowertriangular, since for
i / ∈ J
Y
, let p be the largest integer such that i > j
p
, then Y
∗
i
is dependent on y
∗
1
, . . . , y
∗
j
p
,
as J
Y
is the smallest element from 1
Y
, which implies a lowertriangular form for Y
∗
.
Then deﬁne 
Y
= ¦[Z] : J(Y) ∈ 1
Z
¦ ⊂ M
∗
n,d
. It is obvious that 
Y
and Π
−1
(
Y
) are
open in the corresponding topologies. Then
Σ
Y
: 
Y
→Π
−1
(
Y
), [Z] →Z
∗
, (4.6)
103
4.2. SOLUTION METHODOLOGY WITH GEOMETRIC OPTIMISATION 75
is a section of 
Y
at Y. The following proposition shows that the sections are the charts
of the manifold M
∗
n,d
. The proof has been deferred to Appendix 4.A.3.
Proposition 3 The diﬀerentiable structure on M
∗
n,d
is the one which makes Σ
Y
: 
Y
→
Σ(
Y
) into a diﬀeomorphism.
4.2.4 The Cholesky manifold
In this section, we will show that, for the purpose of optimisation, it is suﬃcient to perform
the optimisation on a compact manifold that contains one of the sections. For simplicity
we choose the section Σ
Y
where J(Y) = ¦1, . . . , d¦. The image Σ
Y
(
Y
) is a smooth
submanifold of T
n,d
with the following representation in R
nd
_
_
y
T
1
. . . y
T
n
_
T
: y
1
= (1, 0, . . . , 0); y
i
∈ S
i−1
+
, i = 2, . . . , d;
y
i
∈ S
d−1
, i = d + 1, . . . , n
_
,
with S
i−1
+
embedded in R
d
by the ﬁrst i coordinates such that coordinate i is bigger than
0 and with the remaining coordinates set to zero. Also, S
d−1
is similarly embedded in R
d
.
We can consider the map S : T
n,d
→ C
n,d
restricted to Σ
Y
(
Y
), which is diﬀerentiable
since Σ
Y
(
Y
) is a submanifold of T
n,d
. The map S[
Σ
Y
(
Y
)
is a homeomorphism, in virtue
of Theorem 2.
For the purpose of optimisation, we need a compact manifold which is surjective with
C
n,d
. Deﬁne the following submanifold of T
n,d
of dimension n(d −1) −d(d −1)/2,
Chol
n,d
=
_
Y ∈ R
nd
: y
1
= (1, 0, . . . , 0);
y
i
∈ S
i−1
, i = 2, . . . , d; y
i
∈ S
d−1
, i = d + 1, . . . , n
_
,
which we call the Cholesky manifold. The Cholesky parametrization has been considered
before by Rapisarda et al. (2002), but these authors do not consider nonEuclidean geo
metric optimisation. The map S[
Chol
n,d
is surjective, in virtue of the following theorem,
the proof of which has been relegated to Appendix 4.A.4.
Theorem 3 If C ∈ C
n,d
, then there exists a Y ∈ Chol
n,d
such that YY
T
= C.
A function ˜ ϕ on C
n,d
can be considered on Chol
n,d
, too, via the composition
Chol
n,d
S
→C
n,d
˜ ϕ
→R, Y →YY
T
→ ˜ ϕ(YY
T
).
From here on, we will write ϕ(Y) := ˜ ϕ(YY
T
) viewed as a function on Chol
n,d
.
For a global minimum ϕ(Y) on Chol
n,d
, we have that YY
T
attains a global minimum
of ˜ ϕ on C
n,d
, since the map S : Chol
n,d
→ C
n,d
is surjective. For a local minimum, we
have the following theorem. The proof has been deferred to Appendix 4.A.5.
104
76 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
Theorem 4 The point Y attains a local minimum of ϕ on Chol
n,d
if and only if YY
T
attains a local minimum of ˜ ϕ on C
n,d
.
These considerations on global and local minima on Chol
n,d
show that, to optimize ˜ ϕ
over C
n,d
, we might as well optimize ϕ over the manifold Chol
n,d
. For the optimisation of
˜ ϕ over C
n,d
, there is no straightforward way to use numerical methods such as Newton and
conjugate gradient, since they require a notion of diﬀerentiability, but for optimisation of
ϕ on Chol
n,d
, we can use such numerical methods.
4.2.5 Choice of representation
In principle, we could elect another manifold
˜
M and a surjective open map
˜
M → C
n,d
.
We insist however on explicit knowledge of the geodesics and parallel transport, for this
is essential to obtaining an eﬃcient algorithm. We found that if we choose the Cholesky
manifold then convenient expressions for geodesics, etc., are obtained. Moreover, the
Cholesky manifold has the minimal dimension, i.e., dim(Chol
n,d
) = dim(M
∗
n,d
).
In the next section, the geometric optimisation tools are developed for the Cholesky
manifold.
4.3 Optimisation over the Cholesky manifold
For the development of minimization algorithms on a manifold, certain objects of the
manifold need to calculated explicitly, such as geodesics, parallel transport, etc. In this
section, these objects are introduced and made explicit for Chol
n,d
.
From a theoretical point of view, it does not matter which coordinates we choose to
derive the geometrical properties of a manifold. For the numerical computations however
this choice is essential because the simplicity of formulas for the geodesics and parallel
transport depends on the chosen coordinates. We found that simple expressions are
obtained when Chol
n,d
is viewed as a submanifold of T
n,d
, which, in turn, is viewed as a
subset of the ambient space R
nd
. This representation reveals that, to calculate geodesics
and parallel transport on Chol
n,d
, it is suﬃcient to calculate these on a single sphere.
The tangent space of the manifold Chol
n,d
at a point Y ∈ Chol
n,d
is denoted by
T
Y
Chol
n,d
. A tangent vector at a point Y is an element of T
Y
Chol
n,d
and is denoted by
∆.
4.3.1 Riemannian structure
We start with a review of basic concepts of Riemannian geometry. Our exposition follows
do Carmo (1992). Let M be an mdimensional diﬀerentiable manifold. A Riemannian
105
4.3. OPTIMISATION OVER THE CHOLESKY MANIFOLD 77
structure on M is a smooth map Y → ¸, )
Y
, which for every Y ∈ M assigns an inner
product ¸, )
Y
on T
Y
M, the tangent space at point Y. A Riemannian manifold is a
diﬀerentiable manifold with a Riemannian structure.
Let ϕ be a smooth function on a Riemannian manifold M. Denote the diﬀerential
of ϕ at a point Y by ϕ
Y
. Then ϕ
Y
is a linear functional on T
Y
M. In particular, let
Υ(t), t ∈ (−ε, ε), be a smooth curve on M such that Υ(0) = Y and
˙
Γ(0) expressed in
a coordinate chart (, x
1
, . . . , x
m
) is equal to ∆, then ϕ
Y
(∆) can be expressed in this
coordinate chart by
ϕ
Y
(∆) =
m
i=1
∂
∂x
i
(ϕ ◦ x
−1
i
)(Υ)
¸
¸
¸
t=0
(4.7)
The linear space of linear functionals on T
Y
M (the dual space) is denoted by (T
Y
M)
∗
.
A vector ﬁeld is a map on M that selects a tangent ∆ ∈ T
Y
M at each point Y ∈ M.
The Riemannian structure induces an isomorphism between T
Y
M and (T
Y
M)
∗
, which
guarantees the existence of a unique vector ﬁeld on M, denoted by Gradϕ, such that
ϕ
Y
(∆) = ¸Gradϕ, X)
Y
for all X ∈ T
Y
M. (4.8)
This vector ﬁeld is called the gradient of ϕ. Also, for Newton and conjugate gradient
methods, we have to use second order derivatives. In particular, we need to be able to
diﬀerentiate vector ﬁelds. To do this on a general manifold, we need to equip the manifold
with additional structure, namely the connection. A connection on a manifold M is a rule
∇
which assigns to each two vector ﬁelds Y
1
, Y
2
on M a vector ﬁeld ∇
Y
1
Y
2
on M,
satisfying the following two conditions:
∇
ϕY
1
+χY
2
Y
3
= ϕ∇
Y
1
Y
3
+ χ∇
Y
2
Y
3
, ∇
Y
1
(ϕY
2
) = ϕ∇
Y
1
(Y
2
) + (Y
1
ϕ)Y
2
, (4.9)
for ϕ, χ smooth functions on M and Y
1
, Y
2
, Y
3
vector ﬁelds on M.
Let Υ(t) be a smooth curve on M with tangent vector Y
1
(t) =
˙
Υ(t). A given family
Y
2
(t) of tangent vectors at the points Υ(t) is said to be parallel transported along Υ if
∇
Y
1
Y
2
= 0 on Υ(t), (4.10)
where Y
1
, Y
2
are vector ﬁelds that coincide with Y
1
(t) and Y
2
(t), respectively, on Υ(t).
If the tangent vector Y
1
(t) itself is parallel transported along Υ(t) then the curve Υ(t)
is called a geodesic. In particular, if (, x
1
, . . . , x
m
) is a coordinate chart on M and
¦X
1
, . . . , X
m
¦ the corresponding vector ﬁelds then the aﬃne connection ∇ on  can be
expressed by
∇
X
i
X
j
=
m
k=1
γ
k
i,j
X
k
. (4.11)
106
78 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
The functions γ
k
i,j
are smooth functions, called the Christoﬀel symbols for the connection.
In components, the geodesic equation becomes
¨ x
k
+
m
i,j=1
γ
k
i,j
˙ x
i
˙ x
j
= 0, (4.12)
where x
k
are the coordinates of Υ(t). On a Riemannian manifold there is a unique torsion
free connection compatible with the metric, called the LeviCivita connection. This means
that Christoﬀel symbols can be expressed as functions of a metric on M. We note that
(4.12) implies as well that, once we have determined the equation for the geodesic, we can
simply read oﬀ Christoﬀel symbols. With respect to an induced metric the geodesic is
the curve of shortest length between two points on a manifold. For a manifold embedded
in Euclidean space an equivalent characterization of a geodesic is that the acceleration
vector at each point along a geodesic is normal to the manifold so long as the curve is
traced with uniform speed.
We start by deﬁning Riemannian structures for T
n,d
and for the Cholesky manifold
Chol
n,d
. We use the LeviCivita connection, associated to the metric deﬁned as follows
on the tangent spaces. Both tangent spaces are identiﬁed with suitable subspaces of the
ambient space R
nd
, and subsequently the inner product for two tangents ∆
1
, ∆
2
is
deﬁned as
¸∆
1
, ∆
2
) = tr∆
1
∆
T
2
, (4.13)
which is the Frobenius inner product for n d matrices. We note that, in our special
case, the inner product ¸, )
Y
is independent of the point Y; therefore we suppress the
dependency on Y.
4.3.2 Normal and tangent spaces
An equation determining tangents to T
n,d
at a point Y can be obtained by diﬀerentiating
Diag(YY
T
) = I yielding Diag(Y∆
T
+∆Y
T
) = 0, i.e., Diag(∆Y
T
) = 0. The dimension
of the tangent space is n(d − 1). The normal space at the point Y is deﬁned to be the
orthogonal complement of the tangent space at the point Y, i.e., it consists of the matrices
N, for which tr∆N
T
= 0 for all ∆in the tangent space. It follows that the normal space is
n dimensional. It is straightforward to verify that if N = DY, where D is nn diagonal,
then N is in the normal space. Since the dimension of the space of such matrices is n, we
see that the normal space N
Y
T
n,d
at Y ∈ T
n,d
is given by
N
Y
T
n,d
=
_
DY ; D ∈ R
nn
diagonal
_
.
The projections Π
N
Y
T
n,d
and Π
T
Y
T
n,d
onto the normal and tangent spaces of T
n,d
are given
by
Π
N
Y
T
n,d
(∆) = Diag(∆Y
T
)Y and Π
T
Y
T
n,d
(∆) = ∆−Diag(∆Y
T
)Y,
107
4.3. OPTIMISATION OVER THE CHOLESKY MANIFOLD 79
respectively. The projection Π
T
Y
Chol
n,d
onto the tangent space of Chol
n,d
is given by
Π
T
Y
Chol
n,d
(∆) = Z
_
Π
T
Y
T
n,d
(∆)
_
,
with Z(∆) deﬁned by
z
ij
(∆) =
_
0 for j > i or i = j = 1,
δ
ij
otherwise.
4.3.3 Geodesics
It is convenient to work with the coordinates of the ambient space R
nd
. In this coordinate
system, geodesics on T
n,d
with respect to the LeviCivita connection obey the second order
diﬀerential equation
¨
Y +Γ
Y
(
˙
Y,
˙
Y) = 0, with Γ
Y
(∆
1
, ∆
2
) := Diag(∆
1
∆
T
2
)Y. (4.14)
To see this, we begin with the condition that Y(t) remains on T
n,d
,
Diag(YY
T
) = I. (4.15)
Diﬀerentiating this equation twice, we obtain,
Diag(
¨
YY
T
+ 2
˙
Y
˙
Y
T
+Y
¨
Y
T
) = 0. (4.16)
In order for Y() to be a geodesic,
¨
Y(t) must be in the normal space at Y(t), i.e.,
¨
Y(t) = D(t)Y(t) (4.17)
for some diagonal matrix D(t). To obtain an expression for D, substitute (4.17) into
(4.16), which yields (4.14).
The function Γ
Y
is the matrix notation of the Christoﬀel symbols, γ
k
ij
, with respect
to E
1
, . . . , E
nd
, the standard basis vectors of R
nd
. More precisely, ∇
E
i
E
j
=
nd
k=1
γ
k
ij
E
k
with γ
k
ij
deﬁned by
¸Γ
Y
(X
1
, X
2
), E
k
) =
nd
i,j=1
γ
k
ij
(X
1
)
i
(X
2
)
j
.
The geodesic at Y(0) ∈ T
n,d
in the direction ∆∈ T
Y(0)
T
n,d
is given by,
Y
i
(t) = cos( ∆
i
t
_
Y
i
(0) +
1
∆
i

sin
_
∆
i
t
_
∆
i
. (4.18)
for i = 1, . . . , n, per component on the sphere. By diﬀerentiating, we obtain an expression
for the evolution of the tangent along the geodesic:
˙
Y
i
(t) = −∆
i
 sin
_
∆
i
t
_
Y
i
(0) + cos
_
∆
i
t
_
∆
i
. (4.19)
Since Chol
n,d
is a Riemannian submanifold of T
n,d
it has the same geodesics.
108
80 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
4.3.4 Parallel transport along a geodesic
We consider this problem per component on the sphere. If ∆
(2)
∈ T
Y
(1) T
n,d
is parallel
transported along a geodesic starting from Y
(1)
in the direction of ∆
(1)
∈ T
Y
(1) T
n,d
, then
decompose ∆
(2)
in terms of ∆
(1)
,
∆
(2)
i
(t) = ¸∆
(1)
i
(0), ∆
(2)
i
(0))∆
(1)
i
(t) +R
i
, R
i
⊥∆
(1)
i
(0).
Then ∆
(1)
i
(t) changes according to (4.19) and R
i
remains unchanged. Parallel transport
from Y
(1)
to Y
(2)
deﬁnes an isometry T(Y
(1)
, Y
(2)
) : T
Y
(1) T
n,d
→ T
Y
(2) T
n,d
. When it is
clear in between which two points is transported, then parallel transport is denoted simply
by T. Since Chol
n,d
is a Riemannian submanifold of T
n,d
it has the same equations for
parallel transport.
4.3.5 The gradient
Since Chol
n,d
is a submanifold of R
nd
we can use coordinates of R
nd
to express the
diﬀerential ϕ
Y
of ϕ at the point Y, namely (ϕ
Y
)
ij
=
∂F
∂Y
ij
. The gradient Gradϕ of a
function ϕ on Chol
n,d
can be determined by (4.8). It follows that,
Gradϕ = Π
T
Y
Chol
n,d
(ϕ
Y
) = Z
_
ϕ
Y
−Diag(ϕ
Y
Y
T
)Y
_
. (4.20)
4.3.6 Hessian
The Hessian Hessϕ of a function ϕ is a second covariant derivative of ϕ. More precisely,
let ∆
1
, ∆
2
be two vector ﬁelds, then
Hessϕ(∆
1
, ∆
2
) = ¸∇
∆
1
Gradϕ, ∆
2
)
In local coordinates of R
nd
Hessϕ(∆
1
, ∆
2
) = ϕ
YY
(∆
1
, ∆
2
) −¸ϕ
Y
, Γ
Y
(∆
1
, ∆
2
)), (4.21)
where
ϕ
YY
(∆
1
, ∆
2
) =
d
dt
d
ds
¸
¸
¸
¸
t=s=0
ϕ(Y(t, s)), with
d
dt
¸
¸
¸
¸
t=0
Y = ∆
1
,
d
ds
¸
¸
¸
¸
s=0
Y = ∆
2
.
Newton’s method requires inverting the Hessian at minus the gradient, therefore we need
to ﬁnd the tangent ∆ to Chol
n,d
such that
Hessϕ(∆, X) = ¸−Gradϕ, X), for all tangents X to Chol
n,d
. (4.22)
109
4.4. DISCUSSION OF CONVERGENCE PROPERTIES 81
To solve (4.22), it is convenient to calculate the unique tangent vector H = H(∆) satis
fying
Hessϕ(∆, X) = ¸H, X), for all tangents X to Chol
n,d
,
since then the Newton Equation (4.22) becomes H(∆) = −Gradϕ. From (4.14) and
(4.21), we obtain
H(∆) = Π
T
Y
Chol
n,d
(ϕ
YY
(∆)) −Diag(ϕ
Y
Y
T
)∆, (4.23)
where the notation ϕ
YY
(∆) means the tangent vector satisfying
ϕ
YY
(∆) =
d
dt
¸
¸
¸
¸
t=0
ϕ
Y
(Y(t)),
˙
Y(0) = ∆.
4.3.7 Algorithms
We are now in a position to state the conjugate gradient algorithm, given as Algorithm
2, and the Newton algorithm, given as Algorithm 3, for optimisation over the Cholesky
manifold. These algorithms are instances of the geometric programs presented in Smith
(1993), for the particular case of the Cholesky manifold.
4.4 Discussion of convergence properties
In this section, we discuss convergence properties of the geometric programs: global con
vergence and the local rate of convergence.
4.4.1 Global convergence
First, we discuss global convergence for the RiemannianNewton algorithm. It is well
known that the Newton algorithm, as displayed in Algorithm 3, is not globally convergent
to a local minimum. Moreover, the steps in Algorithm 3 may even not be well deﬁned,
because the Hessian mapping could be singular. The standard way to resolve these issues,
is to introduce jointly a steepest descent algorithm. So Algorithm 3 is adjusted in the
following way. When the new search direction ∆
(k)
has been calculated, then we also
consider the steepest descent search direction ∆
(k)
Steep
= −Gradϕ(Y
(k)
). Subsequently, a
line minimization of the objective value is performed in both directions, ∆
(k)
and ∆
(k)
Steep
.
We then take as the next point of the algorithm whichever search direction ﬁnds the point
with lowest objective value. Such a steepest descent method with line minimization is
well known to have guaranteed convergence to a local minimum.
Second, we discuss global convergence for conjugate gradient algorithms. For the
Riemannian case, we have not seen any global convergence results for conjugate gradient
110
82 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
Algorithm 2 Conjugate gradient for minimizing ϕ(Y) on Chol
n,d
Input: Y
(0)
, ϕ().
Require: Y
(0)
such that Y
(0)
(Y
(0)
)
T
= I.
Compute G
(0)
= Gradϕ(Y
(0)
) = Z(ϕ
Y
− Diag(ϕ
Y
(Y
(0)
)
T
)Y
(0)
) and set J
(0)
=
−G
(0)
.
for k = 0, 1, 2, ... do
Minimize ϕ
_
Y
(k)
(t)
_
over t where Y
(k)
(t) is a geodesic on Chol
n,d
starting from
Y
(k)
in the direction of J
(k)
.
Set t
k
= t
min
and Y
(k+1)
= Y
(k)
(t
k
).
Compute G
(k+1)
= Gradϕ(Y
(k+1)
) = Z
_
ϕ
Y
−Diag(ϕ
Y
(Y
(k+1)
)
T
)Y
(k+1)
_
.
Parallel transport tangent vectors J
(k)
and G
(k)
to the point Y
(k+1)
.
Compute the new search direction
J
(k+1)
= −G
(k+1)
+γ
k
TJ
(k)
where
_
_
_
γ
k
=
¸G
(k+1)
−TG
(k)
,G
(k+1)
)
¸G
(k)
,G
(k)
)
, PolakRibi`ere,
γ
k
=
[[G
(k+1)
[[
2
[[G
(k)
[[
2
, FletcherReeves.
Reset J
(k+1)
= −G
(k+1)
if k + 1 ≡ 0 mod n(d −1) −
1
2
d(d −1).
end for
Algorithm 3 Newton’s method for minimizing ϕ(Y) on Chol
n,d
.
Input: Y
(0)
, ϕ().
Require: Y
(0)
such that Diag(Y
(0)
(Y
(0)
)
T
) = I.
for k = 0, 1, 2, ... do
Compute G
(k)
= Gradϕ(Y
(k)
) = Z
_
ϕ
Y
−Diag(ϕ
Y
Y
T
)Y
_
.
Compute ∆
(k)
= −H
−1
G
(k)
, i.e. ∆
(k)
∈ T
Y
Chol
n,d
and
Z
_
ϕ
YY
(∆
(k)
) −Diag
_
ϕ
YY
(∆
(k)
)(Y
(k)
)
T
_
Y
(k)
_
−Diag(ϕ
Y
(Y
(k)
)
T
)∆
(k)
= −G
(k)
.
Move from Y
(k)
in direction ∆
(k)
to Y
(k)
(1) along the geodesic.
Set Y
(k+1)
= Y
(k)
(1).
end for
111
4.4. DISCUSSION OF CONVERGENCE PROPERTIES 83
algorithms in the literature. Therefore we focus on the results obtained for the ﬂat
Euclidean case. Zoutendijk (1970) and AlBaali (1985) establish global convergence of
the Fletcher & Reeves (1964) conjugate gradient method with line minimization. Gilbert
& Nocedal (1992) establish alternative line search minimizations that guarantee global
convergence of the Polak & Ribi`ere (1969) conjugate gradient method.
4.4.2 Local rate of convergence
Local rates of convergence for geometric optimisation algorithms are established in Smith
(1993), Edelman, Arias & Smith (1999) and Dedieu, Priouret & Malajovich (2003).
In Theorem 3.3 of Smith (1993), the following result is established for the Riemannian
Newton method. If
ˆ
Y is a nondegenerate stationary point, then there exists an open set
 containing
ˆ
Y, such that starting from any Y
(0)
in , the sequence of points produced
by Algorithm 3 converges quadratically to
ˆ
Y.
In Theorem 4.3 of Smith (1993), the following result is stated for the Riemannian
Fletcher & Reeves (1964) and Polak & Ribi`ere (1969) conjugate gradient methods. Sup
pose
ˆ
Y is a nondegenerate stationary point such that the Hessian at
ˆ
Y is positive deﬁnite.
Suppose ¦Y
(j)
¦
∞
j=0
is a sequence of points, generated by Algorithm 2, converging to
ˆ
Y.
Then, for suﬃciently large j, the sequence ¦Y
(j)
¦
∞
j=0
has dim(Chol
n,d
)steps quadratic
convergence to
ˆ
Y.
As a numerical illustration, convergence runs are displayed in Figure 4.2, for reducing
a 10 10 correlation matrix to rank 3. The following algorithms are compared:
1. Steepest descent, for which the search direction J
(k+1)
in Algorithm 2 is equal to
−G
(k+1)
, i.e., to minus the gradient. The steepest descent method has a linear local
rate of convergence, see Smith (1993, Theorem 2.3).
2. PRCG, PolakRibi`ere conjugate gradient.
3. FRCG, FletcherReeves conjugate gradient.
4. Newton.
5. Lev.Mar., the Levenberg (1944) & Marquardt (1963) method, which is a Newton
type method.
The code that is used for this test is the package ‘LRCM min’, to be discussed in Section
4.6. This package also contains the correlation matrix used for the convergence run test.
Figure 4.2 clearly illustrates the convergence properties of the various geometric programs.
The eﬃciency of the algorithms is studied in Section 4.6 below.
112
84 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
0 20 40 60 80 100 120 140 160 180 200
−35
−30
−25
−20
−15
−10
−5
0
Iterate (i)
L
o
g
r
e
l
a
t
i
v
e
r
e
s
i
d
u
a
l
Steepest descent
PRCG
FRCG
Newton
Lev.− Mar
Figure 4.2: Convergence runs: log relative residual ln(Gradϕ(Y
(i)
)/Gradϕ(Y
(0)
))
versus the iterate i.
113
4.5. A SPECIAL CASE: DISTANCE MINIMIZATION 85
4.5 A special case: Distance minimization
In this section, the primary concern of this chapter to minimize the objective function of
(4.2) is studied. The outline of this section is as follows. First, some particular choices
for n and d are examined. Second, the diﬀerential and Hessian of ϕ are calculated.
Third, the connection with Lagrange multipliers is stated; in particular, this will lead to
an identiﬁcation method of whether a local minimum is a global minimum. Fourth, we
discuss the PCA with rescaling method for obtaining an initial feasible point.
4.5.1 The case of d = n
The case that P is a symmetric matrix and the closest positive semideﬁnite matrix C is
to be found allows a successive projection solution, which was shown by Higham (2002).
4.5.2 The case of d = 2, n = 3
A 3 3 symmetric matrix with ones on the diagonals is denoted by
_
_
1 x y
x 1 z
y z 1
_
_
.
Its determinant is given by
det = −
_
x
2
+ y
2
+ z
2
_
+ 2xyz + 1.
By straightforward calculations it can be shown that det = 0 implies that all eigenvalues
are nonnegative. The set of 3 by 3 correlation matrices of rank 2 may thus be represented
by the set ¦det = 0¦. To get an intuitive understanding of the complexity of the problem,
the feasible region has been displayed in Figure 4.1.
4.5.3 Formula for the diﬀerential of ϕ
We consider the speciﬁc case of the weighted Hadamard seminorm of (4.2). This semi
norm can be represented by a Frobenius norm by introducing the Hadamard product ◦.
The Hadamard product denotes entrybyentry multiplication. Formally, for two matrices
Aand B of equal dimensions, the Hadamard product A◦B is deﬁned by (A◦B)
ij
= a
ij
b
ij
.
The objective function (4.2) can then be written as
ϕ(Y) =
1
2
i<j
w
ij
(ρ
ij
−y
i
y
T
j
)
2
=
1
2
W
◦1/2
◦ Ψ
2
ϕ
=
1
2
¸W
◦1/2
◦ Ψ, W
◦1/2
◦ Ψ),
114
86 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
with Ψ := YY
T
−P and with (W
◦1/2
)
ij
=
√
w
ij
. Then
d
dt
ϕ(Y(t)) = ¸W
◦1/2
◦
˙
Ψ, W
◦1/2
◦ Ψ) = ¸
˙
Ψ, W◦ Ψ)
= ¸∆Y
T
+Y∆
T
, W◦ Ψ)
= ¸∆Y
T
, W◦ Ψ) +¸Y∆
T
, W◦ Ψ)
= ¸∆, 2(W◦ Ψ)Y) = ¸∆, ϕ
Y
), ∀∆.
Thus from (4.7) we have
ϕ
Y
= 2(W◦ Ψ)Y. (4.24)
Similarly, we may compute the second derivative
ϕ
YY
(∆) =
d
dt
¸
¸
¸
t=0
ϕ
Y
(Y(t)) = 2
_
(W◦ Ψ)∆+
_
W◦ (∆Y
T
+Y∆
T
)
_
Y
_
,
with Y() any curve starting from Y in the direction of ∆.
4.5.4 Connection normal with Lagrange multipliers
The following lemma provides the basis for the connection of the normal vector at Y versus
the Lagrange multipliers of the algorithm of Zhang & Wu (2003) and Wu (2003). The
result is novel since previously only an expression was known for the matrix Y given the
Lagrange multipliers. The result below establishes the reverse direction. This Lagrange
result will allow us to identify whether a local minimum is also a global minimum. That
we are able to eﬃciently determine whether a local minimum is a global minimum, is a
very rare phenomenon in nonconvex optimisation, and makes the rank reduction problem
(nonconvex for d < n) all the more interesting.
We note that the Lagrange theory is based on an eﬃcient expression of the lowrank
projection by an eigenvalue decomposition. Therefore the theory below can be extended
eﬃciently only for the Hadamard norm with equal weights and for the weighted Frobenius
norm, see also the discussion in Section 4.1.1. The proof of the following lemma has been
deferred to Appendix 4.A.6.
Lemma 1 Let Y ∈ T
n,d
be such that Gradϕ(Y) = 0. Here, Gradϕ is the gradient of ϕ
on T
n,d
, Gradϕ(Y) = Π
T
Y
T
n,d
(ϕ
Y
) = ϕ
Y
−Diag(ϕ
Y
Y
T
)Y, with ϕ
Y
in (4.24). Deﬁne
λ :=
1
2
diag
_
ϕ
Y
Y
T
_
and deﬁne P(λ) := P+Diag(λ). Then there exist a joint eigenvalue decomposition
P(λ) = QDQ
T
, YY
T
= QD
∗
Q
T
where D
∗
can be obtained by selecting at most d nonnegative entries from D (here if an
entry is selected it retains the corresponding position in the matrix).
115
4.6. NUMERICAL RESULTS 87
The characterization of the global minimum for Problem (4.1) was ﬁrst achieved in
Zhang & Wu (2003) and Wu (2003), which we repeat here: Denote by ¦C¦
d
a matrix
obtained by eigenvalue decomposition of C together with leaving in only the d largest
eigenvalues (in absolute value). Denote for λ ∈ R
n
: P(λ) = P+ Diag(λ). The proof of
the following theorem has been repeated for clarity in Appendix 4.A.7.
Theorem 5 (Characterization of the global minimum of Problem (4.1), see Zhang & Wu
(2003) and Wu (2003)) Let P be a symmetric matrix. Let λ
∗
be such that there exists
¦P+Diag(λ
∗
)¦
d
∈ C
n,d
with
Diag
_
¦P+Diag(λ
∗
)¦
d
_
= Diag(P). (4.25)
Then ¦P+Diag(λ
∗
)¦
d
is a global minimizer of Problem (4.1).
This brings us in a position to identify whether a local minimum is a global minimum:
Theorem 6 Let Y ∈ T
n,d
be such that Gradϕ(Y) = 0 on T
n,d
. Let λ and P(λ) be
deﬁned as in Lemma 1. If YY
T
has the d largest eigenvalues from P(λ) (in absolute
value) then YY
T
is a global minimizer to the Problem (4.1).
PROOF: Apply Lemma 1 and Theorem 5. 2
4.5.5 Initial feasible point
To obtain an initial feasible point Y ∈ T
n,d
we use the modiﬁed PCA method described
in Section 3.2.1. To obtain an initial feasible point in Chol
n,d
, we perform a Cholesky
decomposition as in the proof of Theorem 3.
We note that the condition in Section 3.2.1 of decreasing norm of the eigenvalues is
thus key to ensure that the initial point is close to the global minimum, see the result of
Theorem 6.
4.6 Numerical results
There are many diﬀerent algorithms available in the literature, as detailed in Section 3.2.
Some of these have an eﬃcient implementation, i.e., the cost of a single iteration is low.
Some algorithms have fast convergence, for example, the Newton method has quadratic
convergence. Algorithms with fast convergence usually require less iterations to attain
a predeﬁned convergence criterion. Thus, the realworld performance of an algorithm
is a tradeoﬀ between costperiterate and number of iterations required. A priori, it is
not clear which algorithm will perform best. Therefore, in this section, the numerical
performance of geometric optimisation is compared to other methods available in the
literature.
116
88 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
4.6.1 Acknowledgement
Our implementation of geometric optimisation over lowrank correlation matrices ‘LRCM
min’
3
is an adoption of the ‘SG min’ template of Edelman & Lippert (2000) (written in
MATLAB) for optimisation over the Stiefel and Grassmann manifolds. This template
contains four distinct wellknown nonlinear optimisation algorithms adapted for geomet
ric optimisation over Riemannian manifolds: Newton algorithm; dogleg step or Levenberg
(1944) and Marquardt (1963) algorithm; Polak & Ribi`ere (1969) conjugate gradient; and
Fletcher & Reeves (1964) conjugate gradient.
4.6.2 Numerical comparison
The performances of the following seven algorithms, all of these described in Sections 4.3
and 3.2, except for item 7 (fmincon), are compared:
1. Geometric optimisation, Newton (Newton).
2. Geometric optimisation, FletcherReeves conjugate gradient (FRCG).
3. Majorization, e.g., Pietersz & Groenen (2004b) (Chapter 3) (Major.).
4. Parametrization, e.g., Rebonato (1999b) (Param.).
5. Alternating projections without normal vector correction, e.g., Grubiˇsi´c (2002)
(Alt. Proj.).
6. Lagrange multipliers, e.g., Zhang & Wu (2003) (Lagrange).
7. fmincon, a MATLAB builtin mediumscale constrained nonlinear program (fmin
con).
We note that the ﬁrst two algorithms in this list have been developed in this chapter. The
algorithms are tested on a large number (one hundred) of randomly generated correlation
matrices. The beneﬁt of testing on many correlation matrices is, that the overall and
generic performance of the algorithms may be assessed. The random ﬁnancial correlation
matrices are generated as described in Section 3.5.1.
As the benchmark criterion for the performance of an algorithm, we take its obtained
accuracy of ﬁt given a ﬁxed amount of computational time. Such a criterion corresponds
to ﬁnancial practice, since decisions based on derivative valuation calculations often need
to be made within seconds. To display the comparison results, we use the stateoftheart
and convenient performance proﬁles; see Dolan & Mor´e (2002). The reader is referred
3
LRCM min can be downloaded from www.few.eur.nl/few/people/pietersz.
117
4.6. NUMERICAL RESULTS 89
1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08
0
10
20
30
40
50
60
70
80
90
100
Performance ratio ξ
P
r
e
f
o
r
m
a
n
c
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Ω
Param.
Alt. Proj.
Lagrange
fmincon
Major.
Newton FRCG
Newton
FRCG
Major.
Param.
Alt. Proj.
Lagrange
fmincon
Figure 4.3: Performance proﬁle with n = 30, d = 3, 2 seconds of computational time,
Hadamard norm with equal weights. A rule of thumb is, that the higher the graph of an
algorithm, the better its performance.
there for details, but the idea is described brieﬂy in Section 3.5.1. A rule of thumb is,
that the higher the proﬁle of an algorithm, the better its performance. The performance
proﬁles are displayed in Figures 4.3–4.6, for various choices of n, d, and computational
times. Each performance proﬁle represents a benchmark on 100 diﬀerent test interest rate
correlation matrices. For Figures 4.3–4.5, an objective function with equal weights is used.
For Figure 4.6, we use a Hadamard seminorm with nonconstant weights. These weights
are chosen so as to reﬂect the importance of the correlation entries for a speciﬁc trigger
swap, as outlined in, e.g., Rebonato (2004b, Section 20.4.3). For this speciﬁc trigger swap,
the ﬁrst three rows and columns are important. Therefore the weights matrix W takes
the form
w
ij
=
_
1 if i ≤ 3 or j ≤ 3,
0 otherwise.
From Figures 4.3–4.6 it becomes clear that geometric optimisation compares favoura
bly to the other methods available in the literature, with respect to obtaining the best ﬁt
to the original correlation matrix within a limited computational time.
118
90 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
1 1.1 1.2 1.3 1.4 1.5
0
10
20
30
40
50
60
70
80
90
100
Performance ratio ξ
P
r
e
f
o
r
m
a
n
c
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Ω
Param.
Alt. Proj.
Lagrange
fmincon
Major.
Newton
FRCG
Newton
FRCG
Major.
Param.
Alt. Proj.
Lagrange
fmincon
Figure 4.4: Performance proﬁle with n = 50, d = 4, 1 second of computational time,
Hadamard norm with equal weights.
4.7 Conclusions
We applied geometric optimisation tools for ﬁnding the nearest lowrank correlation ma
trix. The diﬀerential geometric machinery provided us with an algorithm more eﬃcient
than any existing algorithm in the literature, at least for the numerical cases consid
ered. The geometric approach also allows for insight and more intuition into the problem.
We established a technique that allows one to straightforwardly identify whether a local
minimum is a global minimum.
4.A Appendix: Proofs
4.A.1 Proof of Theorem 2
PROOF of (i). The maps Ψ and Φ are well deﬁned: To show that Ψ is well deﬁned, we
need to show that if Y
2
∈ [Y
1
], then Y
2
Y
T
2
= Y
1
Y
T
1
. From the assumption, we have
119
4.A. APPENDIX: PROOFS 91
1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5
0
10
20
30
40
50
60
70
80
90
100
Performance ratio ξ
P
r
e
f
o
r
m
a
n
c
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Ω
Param.
Alt. Proj.
Lagrange
fmincon
Major.
Newton
FRCG
Newton
FRCG
Major.
Param.
Alt. Proj.
Lagrange
fmincon
Figure 4.5: Performance proﬁle with n = 60, d = 5, 3 seconds of computational time,
Hadamard norm with equal weights.
that ∃Q ∈ O
d
: Y
2
= Y
1
Q. If follows that
Y
2
Y
T
2
= (Y
1
Q)(Y
1
Q)
T
= Y
1
QQ
T
Y
T
1
= Y
1
Y
T
1
,
which was to be shown.
To show that Φ is well deﬁned, we need to show:
(A) If C ∈ C
n,d
then there exists Y ∈ T
n,d
such that C = YY
T
.
(B) If X, Y ∈ T
n,d
, with XX
T
= YY
T
=: C then there exists Q ∈ O
d
such that
X = YQ.
Ad (A): Let
C = QDQ
T
, Q ∈ O
n
, D = Diag(D),
be an eigenvalue decomposition with d
ii
= 0 for i = d + 1, . . . , n. We note that such a
decomposition of the speciﬁed form is possible because of the restriction C ∈ C
n,d
. Then
note that
Q
√
D =
_
(Q
√
D)(:, 1 : d) 0
_
.
120
92 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
1 1.002 1.004 1.006 1.008 1.01 1.012 1.014 1.016 1.018
0
10
20
30
40
50
60
70
80
90
100
Performance ratio ξ
P
r
e
f
o
r
m
a
n
c
e
o
f
a
t
t
a
i
n
e
d
p
e
r
f
o
r
m
a
n
c
e
r
a
t
i
o
Ω
Param.
fmincon
Major.
Newton
FRCG
Newton
FRCG
Major.
Param.
fmincon
Figure 4.6: Performance proﬁle with n = 15, d = 3, 1 second of computational time,
trigger swap Hadamard seminorm.
Thus if we set Y = (Q
√
D)(:, 1 : d) then YY
T
= C and Y ∈ T
n,d
, which was to be
shown.
Ad (B): Let rank(X) = rank(Y) = rank(C) = k ≤ d. Without loss of generality, we
may assume that the ﬁrst k rows of X and Y are independent. We extend the set of
k row vectors ¦x
1
, . . . , x
k
¦ to a set of d row vectors ¦x
1
, . . . , x
k
, ˜ x
k+1
, . . . , ˜ x
d
¦, such that
the latter forms a basis of R
d
. Similarly, we obtain a basis ¦y
1
, . . . , y
k
, ˜ y
k+1
, . . . , ˜ y
d
¦ of
R
d
. It follows that there exists an orthogonal rotation Q, QQ
T
= I, such that Qx
i
= y
i
(i = 1, . . . , k), Q˜ x
i
= ˜ y
i
(i = k + 1, . . . , d). We note that then also Qx
i
= y
i
for
i = k + 1, . . . , n, by linearity of Q and since the last n − k row vectors are linearly
dependent on the ﬁrst k row vectors by assumption. It follows that XQ = Y, which was
to be shown. 2
PROOF of (ii). Diagram (4.5) is commutative: To show Ψ◦ Π = S: Let Y ∈ T
n,d
,
then Π(Y) = [Y] and Ψ([Y]) = YY
T
and also S(Y) = YY
T
. To show that Φ◦ S = Π:
Let Y ∈ T
n,d
, then S(Y) = YY
T
and Φ(YY
T
) = [Y] and also Π(Y) = [Y]. 2
PROOF of (iii). The map Ψ is a homeomorphism with inverse Φ: It is straightfor
ward to verify that Φ ◦ Ψ and Ψ ◦ Φ are both the identity maps. The map Ψ is thus
121
4.A. APPENDIX: PROOFS 93
bijective with inverse Φ. To show that Ψ is continuous, note that for quotient spaces
we have: The map Ψ is continuous if and only if Ψ ◦ Π is continuous (see for exam
ple Abraham, Marsden & Ratiu (1988), Proposition 1.4.8). In our case, Ψ ◦ Π = S
with S(Y) = YY
T
is continuous. The proof now follows from a wellknown lemma
from topology: A continuous bijection from a compact space into a Hausdorﬀ space is a
homeomorphism (see for example Munkres (1975), Theorem 5.6). 2
4.A.2 Proof of Proposition 2
1. It is suﬃcient to show that ¦Y ∈ R
nd
: rank(Y) = d¦ is open in R
nd
, since
T
∗
n,d
is open in T
n,d
if and only if ¦Y ∈ R
nd
: rank(Y) = d¦ is open in R
nd
.
Since the rank of a symmetric matrix is a locally constant function, it follows that
¦Y ∈ R
nd
: rank(Y) = d¦ = S
−1
(rank
−1
(d)) is an open subset of R
nd
, with
S(Y) = YY
T
as in Deﬁnition 2. 2
2. This part is a corollary of Theorem 1.11.4 of Duistermaat & Kolk (2000). This
theorem essentially states that for a smooth action of a Lie group on a manifold
the quotient is a manifold if the action is proper and free. First, we show that the
action of O
d
on T
∗
n,d
is proper
4
. Let
Ξ : T
∗
n,d
O
d
→T
∗
n,d
T
∗
n,d
, (Y, Q) →(YQ, Y)
and K a compact subset of T
∗
n,d
T
∗
n,d
. We have to show that Ξ
−1
(K) is compact.
By continuity of Ξ, Ξ
−1
(K) is closed in T
∗
n,d
O
d
. Because T
∗
n,d
O
d
is bounded it
follows that Ξ
−1
(K) is compact.
Second, we show that the O
d
action on T
∗
n,d
is free. Let Y ∈ T
∗
n,d
and Q ∈ O
d
such
that YQ = Y. Since rank(Y) = d, it follows from the proof of Theorem 2 (i) that
there exists precisely one Q ∈ O
d
such that YQ = Y. Thus, this Q must be the
identity matrix.
The dimension of M
∗
n,d
= dim(T
∗
n,d
) −dim(O
d
) = n(d −1) −
1
2
d(d −1). 2
4.A.3 Proof of Proposition 3
This part is a corollary of Theorem 1.11.4 of Duistermaat & Kolk (2000). This theorem
states that there is only one diﬀerentiable structure on the orbit space which satisﬁes the
following: Suppose that, for every [Y] ∈ M
∗
n,d
, we have an open neighbourhood  ⊆ M
∗
n,d
and a bijective map:
U : Π
−1
() → O
d
, Y →(Π(Y), V(Y)),
4
For a deﬁnition, see Duistermaat & Kolk (2000, page 53).
122
94 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
such that, for every Y ∈ Π
−1
(), Q ∈ O
d
, U(YQ) = (Π(Y), V(Y)Q). The diﬀerentiable
structure on M
∗
n,d
is the one which makes U into a diﬀeomorphism. The topology of M
∗
n,d
obtained in this manner is equal to the quotient topology.
Let Y ∈ T
∗
n,d
and Σ
Y
be a section over 
Y
deﬁned in (4.6). We deﬁne U
Y
:
Π
−1
(
Y
) → 
Y
O
d
as follows. For Z ∈ Π
−1
([Z]), [Z] ∈ 
Y
, there is a unique element
Q
Z
∈ O
d
such that Z = Σ
Y
([Z])Q
Z
. Then we deﬁne U
Y
by U
Y
(Z) = ([Z], Q
Z
). By
deﬁnition, we have that U
−1
Y
([Z], Q) = Σ
Y
([Z])Q. Since U
−1
Y
: Π
−1
(
Y
) → 
Y
O
d
is a
bijective map, we have that U
Y
is bijective, too. It can be easily veriﬁed that U
Y
satisﬁes
the condition U
Y
(YQ) = ([Y], Q
Y
Q) of Theorem 1.11.4 of Duistermaat & Kolk (2000)
stated above. It follows that U
Y
is a diﬀeomorphism if and only if Σ
Y
:  → Σ
Y
()
is a diﬀeomorphism. Thus, the diﬀerentiable structure on M
∗
n,d
is the one which makes
Σ
Y
: 
Y
→Σ(
Y
) into a diﬀeomorphism. 2
4.A.4 Proof of Theorem 3
Let C ∈ C
n,d
and suppose that rank(C) = k ≤ d. Then there is a Y ∈ T
n,k
such that
YY
T
= C, by Theorem 2. Apply to Y the procedure
5
outlined in Section 4.2.3, to obtain
a lowertriangular matrix Y
∗
∈ T
n,k
, such that Y
∗
(Y
∗
)
T
= C. A lowertriangular matrix
¯
Y ∈ Chol
n,d
that satisﬁes
¯
Y
¯
Y
T
= C can now easily be obtained by setting
¯
Y =
_
Y
∗
0
.¸¸.
n(d−k)
_
,
which was to be shown. 2
4.A.5 Proof of Theorem 4
First, we prove the ‘only if’ part. We note that it is suﬃcient to show that the map
S : Chol
n,d
→ C
n,d
is open. For then if Y attains a local minimum of ϕ on the open
neighbourhood  ⊂ Chol
n,d
, then S(Y) = YY
T
attains a local minimum of ˜ ϕ on the open
neighbourhood S() of YY
T
, since for any C
t
= Y
t
Y
tT
∈ S(), ˜ ϕ(C
t
) = ˜ ϕ(Y
t
Y
tT
) =
ϕ(Y
t
) ≥ ϕ(Y) = ˜ ϕ(YY
T
).
To show that S : Chol
n,d
→ C
n,d
is open, note that it is suﬃcient to show that
Π : Chol
n,d
→M
n,d
is open, since Ψ : M
n,d
→C
n,d
is a homeomorphism (see Proposition
2, item 3) and S = Ψ◦ Π.
Suppose, then, that  is open in Chol
n,d
. We have to show that Π
−1
(Π()) is open
in T
n,d
, by deﬁnition of the quotient topology of M
n,d
. We have
Π
−1
_
Π()
_
=
_
YQ : Y ∈ , Q ∈ O
d
_
.
5
The procedure in Section 4.2.3 is stated in terms of d, but k should be read there in this case.
123
4.A. APPENDIX: PROOFS 95
It is suﬃcient to show that the complement (Π
−1
(Π()))
c
is closed. Let ¦Y
(i)
¦ be a
sequence in (Π
−1
(Π()))
c
converging to Y, i.e. lim
i→∞
Y
(i)
− Y = 0. We can write
Y
(i)
= Z
(i)
Q
(i)
with Z
(i)
∈ 
c
and Q
(i)
∈ O
d
. Then,
lim
i→∞
Y
(i)
−Y = lim
i→∞
Z
(i)
Q
(i)
−Y = lim
i→∞
Z
(i)
−Y(Q
(i)
)
T
 = 0. (4.26)
Since 
c
O
d
is compact, there exists a convergent subsequence ¦(Z
(i
j
)
, Q
(i
j
)
)¦, with
Z
(i
j
)
→ Z
∗
∈ 
c
and Q
(i
j
)
→ Q
∗
, say. From (4.26) it follows that Z
∗
= Y(Q
∗
)
T
∈ 
c
,
which implies Y ∈ (Π
−1
(Π()))
c
.
The reverse direction is obvious since the map S : Chol
n,d
→C
n,d
is continuous. 2
4.A.6 Proof of Lemma 1
It is recalled from matrix analysis that C
1
and C
2
admit a joint eigenvalue decomposition
if and only if their Lie bracket [C
1
, C
2
] = C
1
C
2
− C
2
C
1
equals zero. Deﬁne
¯
P(λ) :=
−Ψ+Diag(λ). We note that 2Diag(λ)Y is the projection Π
N
Y
T
n,d
(ϕ
Y
) of ϕ
Y
onto the
normal space at Y. We note also that
YY
T
+
¯
P(λ) = P(λ). (4.27)
We calculate
¯
P(λ)Y =
_
−Ψ+Diag(λ)
_
Y = −
1
2
ϕ
Y
+
1
2
Π
N
Y
T
n,d
_
ϕ
Y
_
= 0. (4.28)
The last equality follows from the assumption that Gradϕ(Y) = 0, i.e. the diﬀerential
ϕ
Y
is normal at Y. (Here, Gradϕ(Y) denotes the gradient on T
n,d
.) It follows from
(4.28) and from the symmetry of
¯
P(λ) that
(i) YY
T
¯
P(λ) = 0 and also,
(ii) [YY
T
,
¯
P(λ)] = 0.
From (ii), YY
T
and
¯
P(λ) admit a joint eigenvalue decomposition, but then also jointly
with P(λ) because of (4.27). Suppose
¯
P(λ) = Q
¯
DQ
T
. From (i) we then have that d
∗
ii
and
¯
d
ii
cannot both be nonzero. The result now follows since YY
T
is positive semideﬁnite
and has rank less than or equal to d. 2
4.A.7 Proof of Theorem 5
Deﬁne the Lagrangian
(C, λ) := −P−C
2
ϕ
−2λ
T
diag(P−C), and
124
96 CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
v(λ) := min
_
(C, λ) : rank(C) = d
_
. (4.29)
We note that the minimization problem in (4.29) is attained by any ¦P(λ)¦
d
(see e.g.,
Equation (30) of Wu (2003)). For any C ∈ C
n,d
,
P−C
2
F
(a)
= −(C, λ
∗
)
(b)
≥ −v(λ
∗
)
(c)
= P−¦P(λ)¦
d

2
F
.
(This is the equation at the end of the proof of Theorem 4.4 of Zhang & Wu (2003).)
Here (in)equality
(a) is obtained from the property that C ∈ C
n,d
,
(b) is by deﬁnition of v, and
(c) is by assumption of (4.25). 2
125
Chapter 5
Fast driftapproximated pricing in
the BGM model
1
It is demonstrated that the forward rates process discretized by a single time step
together with a separability assumption on the volatility function allows for representation
by a lowdimensional Markov process. This in turn leads to eﬃcient pricing by, for
example, ﬁnite diﬀerences. We then develop a discretization based on the Brownian bridge
that is especially designed to have high accuracy for single time stepping. The scheme
is proven to converge weakly with order one. We compare the single time step method
for pricing on a grid with multistep Monte Carlo simulation for a Bermudan swaption,
reporting a computational speed increase by a factor 10, yet maintaining suﬃciently
accurate pricing.
5.1 Introduction
The BGM framework, developed by Brace et al. (1997), Miltersen et al. (1997) and
Jamshidian (1997), is now one of the most popular models for pricing interest rate deriva
tives. In the BGM framework almost all prices are computed using Monte Carlo simula
tion. An advantage of Monte Carlo is its applicability to almost any product. However, it
has the drawback of being computationally rather slow. In an attempt to limit the com
putational time, Hunter et al. (2001), J¨ackel (2002, Section 12.5) and Kurbanmuradov,
Sabelfeld & Schoenmakers (2002) introduced predictorcorrector drift approximations,
which reduce the Monte Carlo stage to single timestep simulation.
1
This chapter has been published in diﬀerent form as Pietersz, R., Pelsser, A. A. J. & van Regenmortel,
M. (2004), ‘Fast driftapproximated pricing in the BGM model’, Journal of Computational Finance
8(1), 93–124. An extended abstract of this chapter appeared as Pietersz, R., Pelsser, A. A. J. & van
Regenmortel, M. (2005), ‘Bridging Brownian LIBOR’, Wilmott Magazine 18, 98–103.
126
98 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
This chapter presents a signiﬁcant addition to the single time step pricing method. We
show that much more eﬃcient numerical methods (either numerical integration or ﬁnite
diﬀerences) may be used at the cost of a minor additional assumption, separability. The
latter is a nonrestrictive requirement on the form of the volatility function. The single
time step together with separability renders the state of the BGM model completely
determined by a lowdimensional Markov process. This enables eﬃcient implementation.
We give an example of the fast single time step pricing framework for Bermudan
swaptions. A comparison is made with prices obtained by leastsquares multitime step
Monte Carlo simulation in the BGM model. This includes the use of the Longstaﬀ &
Schwartz (2001) method.
The computational speed increase achieved with the use of ﬁnite diﬀerences for BGM
single time step pricing is the main result. This chapter also contains two other results:
• The ﬁrst result is a new time discretization using a Brownian bridge, as introduced
in Section 5.3, which is proven to have leastsquares error in a certain sense (to be
deﬁned) for single time step discretizations. In Section 5.5 it is shown numerically
that the Brownian bridge scheme outperforms (in the case of single time steps)
various other discretizations for the LIBORinarrears density test. In the ﬁrst part
of Section 5.6, we prove theoretically that the Brownian bridge scheme converges
weakly with order one when used for multitime step Monte Carlo. In the second
part of Section 5.6, we compare the Brownian bridge scheme numerically with other
discretizations for multitime steps.
• The second result is a method for measuring the accuracy of single time stepping.
This is the timing inconsistency test as outlined in Section 5.9.
A further application of the Brownian bridge drift approximation is its use in the likelihood
ratio method. This method, introduced by Broadie & Glasserman (1996), eﬃciently
estimates risk sensitivities for Monte Carlo pricing. The particular application of the
likelihood ratio method to the LIBOR market model has been developed by Glasserman
& Zhao (1999), who proposed the use of drift approximations.
The outline of this chapter is as follows. After setting out some basic notation and the
most important formulas for the BGM model, the single time step pricing framework is
developed, various discretization schemes are discussed and the Brownian bridge scheme
is introduced. The Brownian bridge scheme is then investigated theoretically and numer
ically for both single and multitime steps, respectively. Next, the proposed framework
is worked out for the onefactor case. This is followed by an example of the pricing of
Bermudan swaptions, both for a one and a twofactor model. A test is then developed
to assess the quality of single time steps. Finally, conclusions are drawn.
127
5.2. NOTATION FOR BGM MODEL 99
5.2 Notation for BGM model
In this section our notation of the BGM model is introduced.
We consider a BGM model, /
2
Such a model features n forward rates, f
i
, i = 1, . . . , n,
where forward i accrues from time t
i
to time t
i+1
, 0 < t
1
< < t
n+1
. Denote by α
i
the accrual factor over the period [t
i
, t
i+1
]. Denote by b
i
(t) the timet price of a discount
bond that expires at time t
i
. Bond prices and forward rates are linked by the relation
1 + α
i
f
i
(t) =
b
i
(t)
b
i+1
(t)
.
Each forward rate is driven by a ddimensional Brownian motion (where d is the number
of stochastic factors in the BGM model), w, as follows:
df
i
(t)
f
i
(t)
= ˜ µ
i
(t)dt +σ
i
(t) dw(t). (5.1)
Here σ
i
is the ddimensional volatility vector, and ˜ µ
i
is the drift term, whose form will
in general depend on the choice of probability measure. Throughout this chapter, we use
the numeraire probability measure associated with the bond maturing at time t
n+1
, the
so called terminal measure. There is a speciﬁc reason why we use the terminal measure,
and this is explained in Remark 2 of Section 5.3. For the terminal measure, the drift term
will have the following form for i < n:
˜ µ
i
(t, f
i+1
, . . . , f
n
) = −
n
k=i+1
α
k
f
k
σ
k
(t) σ
i
(t)
1 + α
k
f
k
. (5.2)
For i = n the drift term is zero. This simply expresses the wellknown fact that a forward
rate is a martingale under its associated forward measure.
For the remainder of this chapter it will be useful to have stochastic diﬀerential equa
tion (SDE) (5.1) in logarithmic form:
d log f
i
(t) = µ
i
(t)dt +σ
i
(t) dw
(n+1)
(t), (5.3)
µ
i
(t) = ˜ µ
i
(t) −
1
2
σ
i
(t)
2
.
Last, we introduce the notion of all available forward rates at a given point in time.
Deﬁne i(t) to be the smallest integer i such that t ≤ t
i
. Deﬁne f to consist of all forward
rates that have not yet expired at time t, i.e.,
f (t) =
_
f
i(t)
(t), . . . , f
n
(t)
_
. (5.4)
2
The construction of such a model may be found in, e.g., Musiela & Rutkowski (1997), Pelsser (2000)
or Brigo & Mercurio (2001).
128
100 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
5.3 Single time step method for pricing on a grid
The two key elements in the development of a method to price interest rate derivatives
in the BGM model by lowdimensional ﬁnite diﬀerences are:
• the forward rates process should be discretized by a single time step scheme; and
• the volatility structure should be separable, which permits the dynamics of the single
time step forward rates process to be represented by a lowdimensional Markov
process.
5.3.1 Justiﬁcation of the above assumptions
Because the forward rates are approximated by a singlestep scheme, the model will in
general no longer be arbitragefree. This timing inconsistency is addressed in Section 5.9,
where it is shown that its impact is negligible for most cases. The singlestep approxi
mation is accurate enough for the pricing of derivatives, as shown numerically in Section
5.8. At the end of this section we introduce a novel discretization scheme based on the
Brownian bridge that is especially designed for single time stepping. Its superiority (for
single time steps only) over other discretizations is established in Section 5.5.
We proceed by ﬁrst introducing notation for the single stepapproximated forward
rates process. This is followed by a statement of the separability assumption, after which
we establish the lowdimensional Markov representation result. Single time step dis
cretizations are then discussed, and we end by considering methods for pricing American
style options with Monte Carlo methods.
5.3.2 Notation
We assume as given a time discretization τ
0
< < τ
m
, m ≥ 1. A single time step
discretization is a discretization with m = 1. Deﬁne z
i
(u, v) =
_
v
u
σ
i
(t) dw
(n+1)
(t).
Given a scheme for the log rates
log f
i
(τ
j+1
) = log f
i
(τ
j
) + µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
+ z
i
(τ
j
, τ
j+1
) (5.5)
then denote by
f
A
i
(t) = f
i
(0) exp
_
µ
i
_
0, t, f (0), z(0, t)
_
+ z
i
(0, t)
_
its single time stepapproximated equivalent. Here µ is the “drift approximation” and it
is determined by the scheme applied, which may be the Euler, the predictorcorrector or
the Brownian bridge scheme. These schemes will be elaborated on in Section 5.4. The A
in f
A
stands for “approximated”. The vector z is deﬁned by analogy with f in (5.4).
129
5.3. SINGLE TIME STEP METHOD FOR PRICING ON A GRID 101
5.3.3 Separability
Deﬁnition 3 (Separability) A collection of volatility functions σ
i
: [0, t
i
] → R
d
, i =
1, . . . , n, is called “separable” if there exists a vectorvalued function σ : [0, t
n
] →R
d
and
vectors v
i
∈ R
d
, i = 1, . . . , n, such that
σ
i
(t) = v
i
σ(t) (5.6)
(no vector product; entrybyentry multiplication) for 0 ≤ t ≤ t
i
, i = 1, . . . , n.
Separability appears regularly in the context of requiring a process to be Markov. We men
tion three examples. First, we mention Ritchken & Sankarasubramanian (1995, Proposi
tion 2.1). Working in the HJM model (Heath, Jarrow & Morton 1992), they show that
separability is a necessary and suﬃcient condition on the volatility structure such that
the dynamics of the term structure may be represented by a twodimensional Markov
process. Second, we mention the Wiener chaos expansion framework of Hughston &
Rafailidis (2005). In this framework any interest rate model is completely characterized
by its socalled Wiener chaos expansion. The nth chaos expansion is represented by a
function φ
n
: R
n
+
→R that satisﬁes certain integrability conditions. If all φ
n
are separable,
the resulting interest rate model turns out to be Markov. Third, we mention the ﬁnite
dimensional Markov realizations for stochastic volatility forward rate models (see Bj¨ork,
Land´en & Svensson (2004)). Here a necessary condition for a stochastic volatility model
to have a ﬁnitedimensional Markov realization is that the drift term and each component
of the volatility term in the Stratonovich representation of the short rate SDE should be a
sum of functions that are separable in time to expiry and the stochastic volatility driver.
We give an example of a separable volatility function in the case of a onefactor model
(d = 1).
Example 1 (meanreversion) Following De Jong et al. (2004), the instantaneous volatil
ity may be speciﬁed as
σ
i
(t) = γ
i
e
−κ(t
i
−t)
. (5.7)
The constant κ is usually referred to as the meanreversion parameter.
5.3.4 Single time step method
The following proposition shows that a single time step plus separability yields low
dimensional representability.
Proposition 4 Suppose that / is a dfactor BGM model, for which the instantaneous
volatility structure is separable. Then the single time step discretized forward rates process
may be represented by a ddimensional Markov process.
130
102 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
PROOF: Deﬁne the Markov process x : [0, t
n
] →R
d
by
x(t) =
_
t
0
σ(s)dw
(n+1)
(s),
(entrybyentry multiplication) where σ is as in Deﬁnition 3. Then the single time step
process f
A
: [0, t
n
] →(0, ∞)
n−i(t)+1
at time t satisﬁes
f
A
i
(t) = f
i
(0) exp
_
µ
i
_
0, t, f (0), vx(t)
_
+v
i
x(t)
_
. (5.8)
Here µ
i
is deﬁned implicitly in (5.5) and v is a matrix of which row i is v
i
. The claim
follows, bar a clarifying remark:
The second term in the exponent of (5.8) is exactly equal to the stochastic part oc
curring in the BGM SDE (5.1), in virtue of the separability of the volatility structure:
_
t
0
σ
i
(s) dw(s) =
_
t
0
_
v
i
σ(s)
_
dw
(n+1)
(s)
= v
i
x(t),
where the notation of Deﬁnition 3 has been used. 2
Remark 1 The vector of single timestepped rates may be considered (if separability
holds) to be a timedependent function of the Markov process x, i.e.,
f
A
(t) = f
_
t, x(t)
_
,
for some function f . Hunt et al. (2000, Theorem 1) showed that this is impossible to
achieve for the true BGM forward rates themselves in the case when x is onedimensional
and under some technical restrictions.
Another essential building block for the fast single time step pricing framework is use of
the terminal measure. This is explained in the following remark.
Remark 2 (Choice of numeraire) For the workings of the fast single time step pricing
algorithm it is essential that the terminal measure be used. This is explained as follows. As
proven in Proposition 4, the timet single timestepped forward rates are fully determined
by x(t). This result holds for any choice of measure or numeraire. However, for the
terminal numeraire, the value of the numeraire at time t is fully determined by the forward
rate values at time t, but this does not hold in the case of, for example, the spot numeraire,
in that the latter is generally determined by bond values observed at earlier times. The
spot numeraire b
0
rolls its holdings over by the spot LIBOR account. Its timet
i
value is
b
0
(t
i
) =
1
i
j=1
b
j
(t
j−1
)
, t
0
:= 0.
131
5.4. DISCRETIZATIONS 103
Put in another way, the value of the spot numeraire is pathdependent, whereas that of the
terminal numeraire is not. For pricing on a grid it is essential that the numeraire value
is known given the value of x(t). Therefore the fast single time step framework requires
the use of the terminal numeraire.
5.3.5 Valuation of interest rate derivatives with the single time
step method
Interest rate derivatives with mild pathdependency may be valued by numerical integra
tion, by a lattice/tree or by ﬁnite diﬀerences, provided that the single timestepped rates
are used and the separability assumption holds. The derivatives that may be valued in
clude, but are not restricted to: caps, ﬂoors, European and Bermudan swaptions, trigger
swaps and discrete barrier caps.
5.4 Discretizations
We discuss four timediscrete approximation schemes of the log BGM SDE (5.3):
• Euler;
• predictorcorrector;
• Milstein secondorder scheme; and
• Brownian bridge.
The notation (Equation (5.5)) for a discretization of SDE (5.3) is recalled here:
log f
i
(τ
j+1
) = log f
i
(τ
j
) + µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
+ z
i
(τ
j
, τ
j+1
)
We implicitly deﬁne ˜ µ by
µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
= ˜ µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
−
1
2
_
τ
j+1
τ
j
σ
i
(s)
2
ds,
so as to remove the term common to the Euler, predictorcorrector and Brownian bridge
discretizations.
5.4.1 Euler discretization
The Euler discretization (see, for example, Kloeden & Platen (1999, Equation (9.3.1))
sets
˜ µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
= −
_
n
k=i+1
α
k
f
k
(τ
j
)σ
k
(τ
j
) σ
i
(τ
j
)
1 + α
k
f
k
(τ
j
)
_
(τ
j+1
−τ
j
).
132
104 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
5.4.2 Predictorcorrector discretization
The predictorcorrector discretization was introduced to the setting of LIBOR market
models by Hunter et al. (2001). The key idea is to use predicted information to more ac
curately estimate the contribution of the drift to the increment of the log rate. For the ter
minal measure, an iterative procedure may be applied that loops from the terminal forward
rate, n, to the spot LIBOR rate, i(t). Initially, we set ˜ µ
n
(τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)) = 0.
Then, for i = n −1, . . . , i(t),
˜ µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
= −
_
1
2
n
k=i+1
α
k
f
k
(τ
j
)σ
k
(τ
j
) σ
i
(τ
j
)
1 + α
k
f
k
(τ
j
)
+
1
2
n
k=i+1
α
k
f
k
(τ
j+1
)σ
k
(τ
j+1
) σ
i
(τ
j+1
)
1 + α
k
f
k
(τ
j+1
)
_
(τ
j+1
−τ
j
),
with f
k
(τ
j+1
) dependent on f
m
(τ
j
) and z
m
(τ
j
, τ
j+1
), m = k + 1, . . . , n.
5.4.3 Milstein discretization
The secondorder Milstein scheme (see, for example, Kloeden & Platen (1999, Equation
(14.2.1)) was introduced to the setting of LIBOR market models in the series of papers by
Glasserman & Merener (2003a, b and 2004). Moreover, these papers extended the conver
gence results to the case of jumpdiﬀusion with thinning, which is key to the development
of the jumpdiﬀusion LIBOR market model. Also, these papers considered discretizations
in various diﬀerent sets of state variables, such as forward rates, logforward rates, rela
tive discount bond prices and logrelative discount bond prices. In Glasserman & Merener
(2003b, 2004) it is shown numerically that the timediscretization bias of the logEuler
scheme is less than the bias of other discretizations, for example, in terms of the bonds.
The results of Glasserman and Merener thus justify the logtype discretization (5.5) used
in the present work.
The Milstein scheme can indeed be used to obtain a single time step discretization of
the forward rates process  and hence it may be applied to the single time step pricing
framework  but it is not particularly suited to single large time steps, as shown in the
numerical comparisons for single time step accuracy in Section 5.5. Therefore we omit
here the exact form of the scheme.
133
5.4. DISCRETIZATIONS 105
5.4.4 Brownian bridge discretization
Here we develop a novel discretization for the drift term. The idea is to calculate the
expectation of the drift integral given the (timechanged) Wiener increment.
˜ µ
i
_
τ
j
, τ
j+1
, f (τ
j
), z(τ
j
, τ
j+1
)
_
=
−E
(n+1)
_
_
τ
j+1
τ
j
n
k=i+1
α
k
f
k
(s)σ
k
(s) σ
i
(s)
1 + α
k
f
k
(s)
ds
¸
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
.
(5.9)
The Brownian bridge scheme uses information present in the Wiener increment z(τ
j
, τ
j+1
)
to approximately determine the most likely value for the stochastic drift term. The
Brownian bridge scheme thus takes into account the speciﬁc form of drift terms for LIBOR
market model forward rates, whereby it can outperform general discretization schemes.
The Brownian bridge discretization is superior when a single time step is applied. This
is shown theoretically and numerically in Section 5.5. Viewed as a numerical scheme for
multistep discretizations, it converges weakly with order one, as will be shown in the
ﬁrst part of Section 5.6. In the multistep Monte Carlo numerical experiments of the
second part of Section 5.6, we show that the bias is signiﬁcantly less than for the Euler
discretization.
In the remainder of this section, we show how expression (5.9) can be calculated in
practice. In Section 5.5.1, we establish that the Brownian bridge scheme has leastsquares
error (in a yet to be deﬁned sense).
In practice, expression (5.9) can be approximated with high accuracy. The calculation
proceeds in four steps (it is indicated when a step contains an approximation):
Step 1 To calculate expression (5.9), the ﬁrst step is to note that the order of the
expectation and integral may be interchanged.
−E
(n+1)
_
_
τ
j+1
τ
j
n
k=i+1
α
k
f
k
(s)σ
k
(s) σ
i
(s)
1 + α
k
f
k
(s)
ds
¸
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
=
−
_
τ
j+1
τ
j
E
(n+1)
_
n
k=i+1
α
k
f
k
(s)σ
k
(s) σ
i
(s)
1 + α
k
f
k
(s)
¸
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
ds.
(5.10)
This is a straightforward application of Fubini’s theorem (see, for example, Williams
(1991, Section 8.2)).
Step 2 (approximation) For the purposes of calculating the conditional expected
value of expressions of the form f/(1 + αf), the forward rates are approximated
with a singlestep Euler discretization. We note that once this assumption has been
made, the drift no longer aﬀects the calculation. This stems from a property of
134
106 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
the Brownian bridge: a Wiener process with deterministic drift conditioned to pass
through a given point at some future time is always a Brownian bridge, indepen
dently of its drift prior to conditioning. Thus the estimation of the drift integral
(5.9) is the same whether it is assumed that the forward rates are driftless or whether
these follow a single time step Euler approximation.
−
_
τ
j+1
τ
j
E
(n+1)
_
n
k=i+1
α
k
f
k
σ
k
σ
i
1 + α
k
f
k
¸
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
ds ≈
−
_
τ
j+1
τ
j
E
(n+1)
_
n
k=i+1
α
k
f
BB
k
σ
k
σ
i
1 + α
k
f
BB
k
¸
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
ds,
(5.11)
where BB indicates the use of the Brownian bridge, and where we have suppressed
the dependence of time s. Formula (5.11) is thus obtained by approximating (5.10)
by using single step Euler dynamics for the forward rates instead of the true LI
BOR market model dynamics. Single step Euler dynamics imply Brownian bridge
dynamics, since we condition on timeτ
j+1
values.
We note that the assumption of singestep Euler discretization for the calculation
of expression (5.9) renders this calculation an approximation. In principle, the
approximation could aﬀect the quality of the discretization. We show numerically
that this is not the case in the LIBORinarrears case considered in Section 5.5.
Step 3 The conditional mean and conditional variance of the log forward rates are
calculated. See Appendix 5.A for details.
Step 4 (approximation) The drift expression (5.9) may be approximated by a single
numerical integration over time; the expectation term is approximated by inserting
the conditional mean of the forward rates process:
3
−
_
τ
j+1
τ
j
E
(n+1)
_
n
k=i+1
α
k
f
BB
k
σ
k
σ
i
1 + α
k
f
BB
k
¸
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
ds ≈
−
_
τ
j+1
τ
j
n
k=i+1
α
k
E
(n+1)
[f
BB
k
[T(τ
j
), z(τ
j
, τ
j+1
)]σ
k
σ
i
1 + α
k
E
(n+1)
[f
BB
k
[T(τ
j
), z(τ
j
, τ
j+1
)]
ds.
Remark 3 If a twopoint trapezoidal rule (i.e., the average of the begin and end points)
is used to evaluate the time integral in expression (5.9), the Brownian bridge reduces to
3
Alternatively, the expectation term could be evaluated by numerical integration as well, but this
is computationally expensive. The full numerical integration (“BB alternative”) has been compared
numerically in Section 5.5 with the meaninsertion approximation (“BB”); the loss in accuracy is negligible
on an absolute level. A theoretical error analysis of the meaninsertion approximation is given in Appendix
5.B.
135
5.5. THE BROWNIAN BRIDGE SCHEME FOR SINGLE TIME STEPS 107
the predictorcorrector scheme. In this sense, the predictorcorrector scheme is a special
case of the Brownian bridge scheme.
For illustration, MATLAB code is given in Appendix 5.C, implementing the Brownian
bridge scheme. The code implements a single timestep in a singlefactor model with
constant volatility. These simpliﬁcations are for clarity of exposition only and are, of
course, not a restriction imposed by the Brownian bridge scheme.
We end this section with a discussion of the method used in this chapter for pric
ing Americanstyle options with Monte Carlo. The method used is the regressionbased
method of Longstaﬀ & Schwartz (2001), which is a method of stochastic mesh type (see
Broadie & Glasserman (2004)). Convergence of the method to the correct price follows
generically from the asymptotic convergence property of stochastic mesh methods, as
shown by Avramidis & Matzinger (2004).
5.5 The Brownian bridge scheme for single time steps
In this section, we establish theoretically and numerically that the Brownian bridge scheme
has superior accuracy for single time steps.
5.5.1 Theoretical result
Consider a stochastic diﬀerential equation of the form
dx(t) = µ
_
t, x(t)
_
dt +Σ(t)dw(t). (5.12)
We note that the BGM log SDE (5.3) is of the above form. We consider a certain class
of discretizations:
Deﬁnition 4 Let the function ¯ µ(, , ) denote a single time step discretization of SDE
(5.12) with the following form:
y(τ
j+1
) = y(τ
j
) + ¯ µ
_
τ
j
, y(τ
j
), z(τ
j
, τ
j+1
)
_
+z(τ
j
, τ
j+1
). (5.13)
Here z(τ
j
, τ
j+1
) =
_
τ
j+1
τ
j
σ(s)dw(s). Any such discretization is said to use information
about the Gaussian increment to estimate the drift term.
We note that Euler, predictorcorrector and Brownian bridge are such schemes. The next
theorem states that, for the BGM setting, the Brownian bridge scheme (5.9) has least
squares error for a single time step over all discretizations that use information about the
Gaussian increment for the drift term.
136
108 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Lemma 2 Let ¦y¦ be a single time step discretization of SDE (5.12) that uses informa
tion about the Gaussian increment for the drift term. Consider the discretization expected
squared error
s
2
_
¦y¦
_
:= E
_
_
_
y(τ
j+1
) −x
¡τ
j
,y(τ
j
)¦
(τ
j+1
)
_
_
2
¸
¸
¸T(τ
j
)
_
.
Here x
¡t,y¦
denotes the solution of SDE (5.12) starting from (t, y). Then the discretization
¦y
∗
¦ that yields least squared error, s
2
, over all possible discretizations that use informa
tion about the Gaussian increment to estimate the drift term is deﬁned by
¯ µ
∗
_
τ
j
, y(τ
j
), z(τ
j
, τ
j+1
)
_
= E
_ _
τ
j+1
τ
j
µ
_
s, x
¡τ
j
,y(τ
j
)¦
(s)
_
ds
¸
¸
¸
¸
T(τ
j
), z(τ
j
, τ
j+1
)
_
. (5.14)
PROOF: Deﬁne
i :=
_
τ
j+1
τ
j
µ
_
s, x
¡τ
j
,y(τ
j
)¦
(s)
_
ds.
For ease of exposition we write z = z(τ
j
, τ
j+1
) and ¯ µ = ¯ µ(τ
j
, y(τ
j
), z), but we keep in
mind that ¯ µ is ¦T(τ
j
), z¦measurable. Also write E
t
[] := E[[T(t)]. Then let ¦y
t
¦ with
drift term ¯ µ
t
be a discretization of the form of Deﬁnition 4. First, we condition on z:
E
τ
j
_
¯ µ
t
−i
2
¸
¸
z
¸
≥ E
τ
j
_
E
τ
j
[i[z] −i
2
¸
¸
z
¸
= E
τ
j
_
¯ µ
∗
−i
2
¸
¸
z
¸
.
The inequality holds since expectation equals projection, and the latter has, by deﬁnition,
least squared error over all possible ¦T(τ
j
), z¦measurable drift terms. Continuing, we
ﬁnd
s
2
_
¦y
t
¦
_
= E
τ
j
_
¯ µ
t
−i
2
¸
= E
τ
j
_
E
τ
j
_
¯ µ
t
−i
2
¸
¸
z
¸
_
≥ E
τ
j
_
E
τ
j
_
¯ µ
∗
−i
2
¸
¸
z
¸
_
= s
2
_
¦y
∗
¦
_
,
i.e., y
∗
has less squared error than y
t
. As y
t
was an arbitrary discretization of the form
of Deﬁnition 4, the result follows. 2
5.5.2 LIBORinarrears case
We estimate numerically the accuracy in the LIBORinarrears test of the various schemes
of Section 5.4. We extend here the LIBORinarrears test of Hunter et al. (2001) by
including the Milstein and Brownian bridge schemes. The test is designed to measure the
accuracy of a single time step discretization. The idea of the test is brieﬂy described here;
for details the reader is referred to Hunter et al. (2001).
We consider the distribution of a forward rate under the measure associated with
the numeraire of a discount bond maturing at the ﬁxing time of the forward. We note
that the forward rate is not a martingale under such a measure as the natural payment
time of the forward is not the same as its ﬁxing time. An analytical formula for the
associated density, however, is known. We can thus compare the density obtained from
137
5.5. THE BROWNIAN BRIDGE SCHEME FOR SINGLE TIME STEPS 109
0
5
10
15
20
0.01% 0.10% 1.00% 10.00% 100.00%
f
d
e
n
s
i
t
y
analytical
Euler
predictcorr
Milstein
BB
BB
alternative
Series8
19
20
21
0.30% 0.40% 0.50% 0.60%
f
0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.01% 0.10% 1.00% 10.00% 100.00%
f
e
r
r
o
r
i
n
d
e
n
s
i
t
y
analytical minus
Euler
analytical minus
predictor corrector
analytical minus
Milstein
analytical minus
Brownian bridge
analytical minus
Brownian bridge
alternative
analytical minus
Figure 5.1: Plots of the estimated densities and absolute errors in densities of various
single time step discretizations. The deal setup is the same as in Hunter et al. (2001);
the threemonth forward rate ﬁxing 30 years from today is set initially to 8% and its
volatility to 24%. The legend key “BB” denotes Brownian bridge and “BB alternative”
denotes full numerical integration of the expectation term. We note that three densities
have been added to the above ﬁgures compared with Figure 1 of Hunter et al. (2001):
Milstein and the two Brownian bridge schemes. On both ﬁgures, however, the diﬀerences
between the analytical and Brownian bridge densities are indiscernible to the eye. The
most notable addition is the Milstein density. Outside of the error graph, the Milstein
scheme reaches a maximum absolute error that is around twice the maximum absolute
error for the Euler scheme. The maximum absolute error in the density for the Brownian
bridge and its alternative are 10
−3
and 6 10
−4
, respectively. In this particular test
the Brownian bridge scheme thus achieves a reduction by a factor 100 in the maximum
absolute error over the predictorcorrector scheme, the latter being the second best scheme.
a single time step discretization with the analytical formula for the density. The results
of this test are displayed in Figure 5.1. It is shown (for the particular setup) that the
Brownian bridge scheme reduces the maximum error in the density by a factor 100 over
the predictorcorrector scheme.
138
110 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
5.6 The Brownian bridge scheme for multitime step
Monte Carlo
This section consists of two parts. First, we show theoretically that the Brownian bridge
scheme converges weakly with order one. Second, we estimate numerically the convergence
behavior of the various schemes of Section 5.4.
5.6.1 Weak convergence of the Brownian bridge scheme
In a ﬁnancial context, the interest lies in calculating the prices of derivatives, which are
in certain cases expectations of payoﬀ functions. Therefore we are interested mainly in
weak convergence of Monte Carlo simulations. The deﬁnition is recalled here and may be
found in, for example, Kloeden & Platen (1999, Section 9.7).
Deﬁnition 5 (Weak convergence) A scheme ¦y
ε
(τ
j
)¦ with maximum step size ε is said
to convergence weakly with order β to x if, for each function g with 2(β +1) polynomially
bounded derivatives, there exists a constant c such that, for suﬃciently small ε,
¸
¸
¸E
_
g
_
x(t)
_¸
−E
_
g
_
y
ε
(t)
_¸
¸
¸
¸ ≤ c ε
β
. (5.15)
A criterion that is easier to verify than the above deﬁnition is the concept of weak consis
tency, and under quite natural conditions it follows that weak consistency implies weak
convergence. The deﬁnition of weak consistency is recalled here, and may be found for
example on page 327 of Kloeden & Platen (1999). Here we develop the remainder of the
theory in terms of approximating an autonomous SDE, say,
dx(t) = µ
_
x(t)
_
dt +Σ
_
x(t)
_
dw(t), x(0) deterministic. (5.16)
However, the theory holds in more general cases too.
Deﬁnition 6 (Weak consistency) A scheme ¦y
ε
(τ
j
)¦ with maximum step size ε is weakly
consistent if there exists a function c = c(ε) with
lim
ε↓0
c(ε) = 0 (5.17)
such that
E
_
_
_
_
_
E
_
y
ε
(τ
j+1
) −y
ε
(τ
j
)
∆τ
j
¸
¸
¸
¸
T(τ
j
)
_
−µ
_
y
ε
(τ
j
)
_
_
_
_
_
2
_
≤ c(ε) (5.18)
and
E
_
_
_
_
_
E
_
1
∆τ
j
_
y
ε
(τ
j+1
) −y
ε
(τ
j
)
__
y
ε
(τ
j+1
) −y
ε
(τ
j
)
_
¯
¸
¸
¸T(τ
j
)
_
−Σ
_
y
ε
(τ
j
)
_
Σ
¯
_
y
ε
(τ
j
)
_
_
_
_
_
2
_
≤ c(ε).
(5.19)
139
5.6. THE BROWNIAN BRIDGE SCHEME FOR MULTITIME STEPS 111
Here ¦T(t)¦ is the ﬁltration generated by the Brownian motion driving SDE (5.16).
Kloeden and Platen prove the following theorem (see Theorem 9.7.4 of Kloeden & Platen
(1999)) linking weak consistency to weak convergence.
Theorem 7 (Linking weak consistency to weak convergence) Suppose that µ and Σ in
(5.16) are four times continuously diﬀerentiable with polynomial growth and uniformly
bounded derivatives. Let ¦y
ε
(τ
j
)¦ be a weakly consistent scheme with equitemporal steps
∆τ
j
= ε and initial value y
ε
(0) = x(0) which satisﬁes the moment bounds
E
_
max
j
[y
ε
(τ
j
)[
2q
_
≤ k
_
1 +[x(0)[
2q
_
, q = 1, 2, . . . and
E
_
1
ε
¸
¸
y
ε
(τ
j+1
) −y
ε
(τ
j
)
¸
¸
6
_
≤ c(ε), (5.20)
where c(ε) is as in Deﬁnition 6. Then y
ε
converges weakly to x.
In the proposition below we show that the Brownian bridge scheme with the proposed
calculation method is weakly consistent. The above theorem then allows us to deduce
that the Brownian bridge scheme converges weakly.
Proposition 5 (Brownian bridge scheme is weakly consistent) Assume that the volatility
functions σ
i
() are piecewise analytical on the model horizon [0, t
n
]. Then the Brownian
bridge scheme deﬁned by (5.9) and by the fourstep calculation method described in Section
5.4 is weakly consistent with the forward rates process deﬁned in (5.3).
PROOF: Without loss of generality, we may assume that the volatility functions are
analytical. Otherwise, due to the piecewise property of the volatility functions, we can
break up the problem into subproblems for which each has analytical volatility functions.
We note as well that all derivatives of the volatility functions are bounded because the
interval [0, t
n
] is compact.
We need only verify the consistency Equation (5.18) for the drift term. To achieve
this, deﬁne for i and for all τ ∈ [0, t
n
] and for all f the function g
¡i,τ,f ¦
: [0, t
n
−τ] →R,
g
¡i,τ,f ¦
(t) = −
n
k=i+1
α
k
f
k
1 + α
k
f
k
_
t
0
σ
k
(τ + s) σ
i
(τ + s)ds.
Due to the assumption that the volatility functions are analytical, it follows that the
function g
¡i,τ,f ¦
is analytical in t. Taylor’s formula states that there exists an error term
e
¡i,τ,f ¦
() depending on i, τ and f such that
g
¡i,τ,f ¦
(t) = g
¡i,τ,f ¦
(0) + t
∂g
¡i,τ,f ¦
∂t
(0) + e
¡i,τ,f ¦
(t) (5.21)
140
112 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
with
lim
t↓0
¸
¸
e
¡i,τ,f ¦
(t)
¸
¸
t
2
< ∞. (5.22)
Due to the analyticity, boundedness and limiting behaviour of the function h(x) = x/(1+
x), namely h ↑ 1 (h ↓ 0) as x →∞(x →−∞, respectively), we have that all its derivatives
are bounded. Viewed as a function [0, t
n
] [0, t
n
] R
n
→R,
(t, τ, f ) →g
¡i,τ,f ¦
(t),
we can thus ﬁnd a bound on the second derivative ∂
2
g
¡i,τ,f ¦
/∂t
2
, independent of (τ, f ).
Theorem 7.7 of Apostol (1967) then states that the error term in (5.21) may be chosen
independently of τ and f . Hence we ﬁnd that
g
¡i,τ,f ¦
(t) = t
_
−
n
k=i+1
σ
k
(τ) σ
i
(τ)
α
k
f
k
1 + α
k
f
k
_
+ e(t),
with e satisfying the secondorder Equation (5.22). Here we have used
g
¡i,τ,f ¦
(0) = 0 and
∂g
¡i,τ,f ¦
∂t
¸
¸
¸
¸
t=0
=
_
−
n
k=i+1
σ
k
(τ) σ
i
(τ)
α
k
f
k
1 + α
k
f
k
_
.
If Y
ε
denotes the Brownian bridge scheme, then
E
_
y
ε
i
(τ
j+1
) −y
ε
i
(τ
j
)
¸
¸
T(τ
j
)
¸
= g
¡i,τ
j
,Y
ε
(τ
j
)¦
(ε)
= ε
_
−
n
k=i+1
α
k
y
ε
k
(τ
j
)σ
k
(τ
j
) σ
i
(τ
j
)
1 + α
k
y
ε
k
(τ
j
)
_
+ e(ε).
We note that the term within braces is exactly drift term i evaluated at (τ
j
, Y
ε
(τ
j
)). It
follows that consistency Equation (5.18) holds with c(ε) equal to (e(ε)/ε)
2
. The function
c() is then quadratic in ε. 2
Corollary 1 (Brownian bridge scheme converges weakly with order one) Under the as
sumptions of Proposition 5, the Brownian bridge scheme deﬁned by (5.9) and by the four
step calculation method described in Section 5.4 converges weakly to the forward rates
process deﬁned in (5.3). It has order of convergence one.
PROOF: We only need verify the claim with regards to the order of convergence. In the
proof of Theorem 7 in Kloeden & Platen (1999), it is shown that the error term in the
weak convergence criterion (5.15) is less than
_
c(ε), with c() satisfying the requirements
(5.17), (5.18), (5.19) and (5.20). All these requirements can be met for the Brownian
bridge scheme with a quadratic function c. Taking the square root then yields ﬁrstorder
weak convergence for the Brownian bridge scheme. 2
141
5.6. THE BROWNIAN BRIDGE SCHEME FOR MULTITIME STEPS 113
5.6.2 Numerical results
We now turn to the second part of Section 5.6, in which the various discretization schemes
are compared numerically. A ﬂoating leg and a cap were valued with 10 million simulation
paths. This large number of paths was used because the time discretization bias for the
log rates is small compared to the standard error often observed with 10,000 paths. For
example, the Euler onestepperaccrual discretization relative bias for the ﬂoating leg and
the cap was estimated at 0.02% and 0.003%, whereas twice the standard error at 10,000
paths is 0.07% and 0.01%, respectively.
To obtain a biasestimate with minimal standard error, we jointly simulate the values
of individual payments in the ﬂoating leg and cap under their respective forward mea
sures. Such procedure ﬁlters out the discretization bias from the random noise in the
simulation. We note that, under the forward measure, there is no drift term and therefore
the associated payoﬀ is an unbiased estimator of the value of the contract. If we denote
by π
terminal
and π
fwd
the numerairedeﬂated contract payoﬀ in the terminal and forward
measure, respectively, then an unbiased estimator of the bias is π
terminal
−π
fwd
. Alterna
tively, we can benchmark against the analytical value π
analytical
of the ﬂoating leg or cap,
which yields the unbiased estimator of the bias π
terminal
−π
analytical
. The variances of the
two estimators are
Var
_
π
terminal
−π
fwd
¸
= Var
_
π
terminal
¸
+ Var
_
π
fwd
¸
−2Cov
_
π
terminal
, π
fwd
¸
(5.23)
Var
_
π
terminal
−π
analytical
¸
= Var
_
π
terminal
¸
(5.24)
If we assume Var
_
π
terminal
¸
≈ Var
_
π
fwd
¸
, then (5.23) becomes
Var
_
π
terminal
−π
fwd
¸
≈ 2
_
1 −ρ
_
π
terminal
, π
fwd
¸
_
Var
_
π
terminal
¸
(5.25)
Therefore, if the correlation term ρ[π
terminal
, π
fwd
] is larger than
1
2
, we have variance reduc
tion. In our numerical LIBOR tests we found ρ ≈ 0.999, which means that the variance
is reduced by a factor of 500. The benchmark against the forward measure payoﬀ is thus
also a useful tool when validating an implementation of a LIBOR market model, since a
bias that stems from an implementation error is more easily ﬁltered out from the random
noise of the MC simulation.
The results are presented in Figure 5.2. They show that the predictorcorrector, Mil
stein and Brownian bridge schemes have a time discretization bias that is hardly distin
guishable from the standard error of the estimate. The Euler scheme, however, has a clear
time discretization bias for larger time steps. We classify the schemes from best suited to
worst suited (for the particular numerical cases under consideration) using the criterion
of the minimal computational time required to achieve a bias that is indistinguishable
142
114 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Cap
1E07
0E+00
1E07
2E07
3E07
4E07
5E07
6E07
0 0.05 0.1 0.15 0.2 0.25 0.3
Time step (years)
E
s
t
i
m
a
t
e
d
b
i
a
s
Euler
Predictorcorrector
Milstein
BB
Floating leg
1E05
0E+00
1E05
2E05
3E05
4E05
5E05
6E05
7E05
0 0.2 0.4 0.6 0.8 1 1.2
Time step (years)
E
s
t
i
m
a
t
e
d
b
i
a
s
Euler
Predictorcorrector
Milstein
BB
Figure 5.2: Plots of the estimated biases for a ﬂoating leg and a cap for the Euler,
predictorcorrector, Milstein and Brownian bridge schemes. A singlefactor model was
applied. The ﬂoating leg is a sixyear deal, with the ﬁxings at 1, . . . , 5 years, and pay
ments of annual LIBOR at 2, . . . , 6 years. The cap is a 1.5year deal, with the ﬁxings
at 0.25, 0.5, . . . , 1.25 years, and payments of quarterly LIBOR above 5% (if at all) at
0.5, 0.75, . . . , 1.5 years. The market conditions are the same for both deals: all initial
forward rates are 6%, and all volatility is constant at 20%. The net present values of the
ﬂoating leg and cap are 0.24 and 0.013, respectively, on a notional of one unit of currency.
The error bars denote a 95% conﬁdence bound based on twice the sample standard error.
from the standard error at 10,000,000 paths. As Milstein is slightly faster than predictor
corrector, which in turn is faster than the Brownian bridge, we obtain: ﬁrst, Milstein;
second, predictorcorrector; third, Brownian bridge; and fourth, Euler. We stress here
that this classiﬁcation might be particular to the numerical cases that we considered. We
also stress that the strength of the Brownian bridge lies in single time steps rather than
in multitime steps.
5.7 An example: onefactor BGM framework with
drift approximations
This section illustrates the framework for fast single time step pricing in BGM by setting
it up in the special case of a onefactor model with a volatility structure as in Example
143
5.7. EXAMPLE: ONEFACTOR DRIFTAPPROXIMATED BGM 115
1. This structure may be written as follows:
σ
i
(t) = ˜ γ
i
e
κt
,
for certain constants ˜ γ
i
. The corresponding Markov factor, x, is then deﬁned as and
characterized by
x(t) =
_
t
0
e
κs
dw(s), x(t) ∼ ^
_
0, v(t)
_
, where
v(t) =
_
t
0
e
2κs
ds =
_
e
2κt
−1
2κ
, κ ,= 0,
t, κ = 0.
Prices may now be computed by either numerical integration or ﬁnite diﬀerences. In
the case of numerical integration, if π(t, x) denotes the numerairedeﬂated value of the
contingent claim, we have
π
_
0, x(0)
_
=
_
∞
−∞
π(t, x)p
_
x; 0, v(t)
_
dx
where t denotes the expiry of the contingent claim and p(; µ, v) denotes the Gaussian
density with mean µ and standard deviation
√
v. In case of ﬁnite diﬀerences, Feynman
Kac yields the following PDE for the price relative to the terminal bond:
∂π
∂t
+
1
2
e
2κt
∂
2
π
∂x
2
= 0, (5.26)
with use of appropriate boundary conditions. For example, for a Bermudan payer swaption
we have π(, −∞) ≡ 0, zero convexity ∂
2
π/∂x
2
≡ 0 at x = ∞, and exercise boundary
conditions at the exercise times.
5.7.1 A simple numerical example
We will evolve ﬁve annual (α
i
= 1) forward rates over a oneyear period. Forward rate
i accrues from year i until year i + 1, i = 1, . . . , 5. Take f
i
(0) = 7%, ˜ γ
i
= 25% and
κ = 15%; then v(1) ≈ 1.166196. Suppose that, after one year, the process x jumps to 1;
thus x(1) = 1. All computations are displayed in Table 5.1. Column (II) is determined
by (5.2). To evaluate the eﬀect of the Brownian bridge scheme over the Euler scheme,
the “driftfrozen” forward rates (where the drift is evaluated at time zero) are displayed
in column (V), using the equation (V) = (I) exp ((II) + (III) + (IV)). Then, we start
with computing the Brownian bridge scheme forward rate 5 and work back to forward
rate 1. Forward rate 5 is easily computed as no drift terms are involved. To compute
the drift term integral at time 1 for forward rate 4, we compute the drift term integral of
144
116 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Table 5.1: A simple numerical example.
(I) (II) (III) (IV) (V) (VI) (VII)
i f
i
(0) µ
i
(0) −
1
2
˜ γ
2
i
v(1) ˜ γ
i
x(1) Drift Equation Brownian
frozen (5.9)
i−1
bridge
f
i
(1) −(5.9)
i
f
i
(1)
5 7.00% 0.00000 −0.03644 0.25 8.67% −0.00569 8.67%
4 7.00% −0.00409 −0.03644 0.25 8.63% −0.00567 8.62%
3 7.00% −0.00818 −0.03644 0.25 8.60% −0.00564 8.57%
2 7.00% −0.01227 −0.03644 0.25 8.56% −0.00562 8.53%
1 7.00% −0.01636 −0.03644 0.25 8.53% 8.47%
145
5.8. EXAMPLE: BERMUDAN SWAPTION 117
(5.9) for forward rate 5. The result is displayed in column (VI). This we may then use
to compute the Brownian bridge scheme forward rate 4 (see column (VII)), where we use
the equation (VII)
i
= (I)exp(¦
n
j=i+1
(VI)
j
¦ + (III) + (IV)). Continuing, we compute the
drift for forward rate 3 using only the Brownian bridge forward rates 4 and 5. And so on
until all forward rates have been computed.
5.8 Example: Bermudan swaption
As an example of the single time step pricing framework, an analysis is made for Bermudan
swaptions in comparison with a BGM model combined with the leastsquares Monte Carlo
method introduced by Longstaﬀ & Schwartz (2001). The onefactor setup introduced in
the previous section was used with zero meanreversion.
Callable Bermudan and European payer swaptions were priced in a onefactor BGM
model for various tenors and noncall periods. The zero rates were taken to be ﬂat at 5%,
and the volatility of the forwards was set ﬂat at 15%. The Bermudans were priced on a
grid, the Europeans through numerical integration. The PDE was solved using an explicit
ﬁnitediﬀerence scheme. The explanatory variable in the leastsquares Monte Carlo was
taken to be the net present value (NPV) of the underlying swap. This was regressed on to
a constant and a linear term. These two basis functions yield suﬃciently accurate results
because the value of a Bermudan swaption increases almost linearly with the value of the
underlying swap.
Problems may possibly occur for Americanstyle derivatives in the single time step
framework. Since the framework is not arbitragefree, spurious early or delayed exercise
may take place to collect the arbitrage opportunity. The eﬀects of these phenomena have
been analyzed by comparing the exercise boundaries
4
and risk sensitivities of Longstaﬀ
Schwartz and single time step BGM. In both models the exercise rule turned out to be of
the following form: exercise whenever the NPV of the underlying swap, s, is larger than
a certain value s
∗
, which is then deﬁned to be the exercise boundary.
For a full description of the deal see Table 5.2. Results have been summarized in Table
5.3. Computational times may be found in Table 5.4. Exercise boundaries for the 8 year
deal are displayed in Figure 5.3, including conﬁdence bounds on the LongstaﬀSchwartz
4
In the LongstaﬀSchwartz case, the future discounted cashﬂows are regressed against the NPV of
the underlying swap with a constant and linear term – say, with coeﬃcients a and b. So the option is
exercised whenever s > a + bs ⇔ s > a/(1 − b) =: s
∗
, where it is assumed that b < 1, which turns out
to hold in practice. Hence the exercise boundary s
∗
may be computed from the regression coeﬃcients by
the above formula.
146
118 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Table 5.2: Speciﬁcation of the Bermudan swaption comparison deal.
Callable Bermudan swaption
Market data
Zero rates Flat at 5%
Volatility Flat at 15%
Product speciﬁcation
Tenor Variable (28 years)
Noncall period Variable
Call dates Semiannual
Pay/receive Pay ﬁxed
Fixed leg properties
Frequency Semiannual
Date roll None
Day count Half year = 0.5
Fixed rate 5.06978% (ATM)
Floating leg properties
Frequency Semiannual
Date roll None
Day count Half year = 0.5
Margin 0%
Numerics
Simulation paths 10,000
Finitediﬀerence scheme Explicit
LongstaﬀSchwartz
Explanatory variable Swap NPV
Basis function type Monomials
No. of basis functions 2 (constant and linear)
147
5.8. EXAMPLE: BERMUDAN SWAPTION 119
Table 5.3: Results of the Bermudan swaption comparison deal. The notation xNCy in the
ﬁrst column denotes an xyear underlying swap with a noncall period of y years. In case
of a European swaption, it means that the swaption is exercisable after y years exactly.
All prices and standard errors are in basis points.
Bermudan European
Drift Longstaﬀ Standard Drift Monte Standard
approx. Schwartz error Approx. Carlo error
BGM BGM BGM
2NC1 29.40 28.85 0.42 27.36 26.88 0.43
3NC1 64.33 62.78 0.83 53.78 52.92 0.83
4NC1 101.66 101.51 1.29 78.04 78.77 1.24
4NC3 44.09 43.59 0.70 42.93 42.55 0.71
5NC1 141.22 137.95 1.68 100.85 99.31 1.55
5NC3 89.25 86.75 1.34 83.08 80.83 1.36
6NC1 182.16 179.48 2.22 122.27 123.36 1.92
6NC3 134.88 136.43 2.01 120.60 123.06 2.03
6NC5 50.93 50.79 0.86 50.07 50.09 0.87
7NC1 224.40 221.38 2.61 142.93 140.66 2.19
7NC3 181.20 177.11 2.53 156.15 153.71 2.53
7NC5 101.84 100.59 1.64 97.28 96.57 1.65
8NC1 266.63 266.35 3.15 159.38 161.00 2.50
8NC3 226.55 226.94 3.14 185.20 190.98 3.08
8NC5 151.23 151.13 2.38 137.73 140.95 2.41
8NC7 54.20 53.70 0.96 52.38 53.12 0.96
148
120 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Table 5.4: Computational times for the Bermudan swaption comparison deal for a com
puter with a 700 MHz processor. The notation xNCy in the ﬁrst column denotes an x
year underlying swap with a noncall period of y years. In the single time step framework
Bermudans are priced on a grid and Europeans are priced through numerical integration.
All computational times are in seconds.
Bermudan European
Drift Longstaﬀ Drift Monte
approximated Schwartz approximated Carlo
BGM BGM BGM
2NC1 0.4 3.0 0.0 1.9
3NC1 0.4 6.6 0.1 3.7
4NC1 0.7 11.1 0.2 6.1
4NC3 0.2 4.5 0.1 3.4
5NC1 1.4 17.3 0.6 9.1
5NC3 0.3 9.0 0.1 6.2
6NC1 2.4 24.5 0.6 12.8
6NC3 0.7 14.6 0.2 9.8
6NC5 0.2 5.8 0.0 4.8
7NC1 4.0 33.1 0.8 16.8
7NC3 1.4 21.2 0.4 13.5
7NC5 0.3 11.4 0.2 8.6
8NC1 5.6 45.9 1.2 23.9
8NC3 2.2 30.2 0.6 18.8
8NC5 0.6 18.4 0.2 13.5
8NC7 0.1 7.4 0.0 7.8
boundaries.
5
We looked at exercise boundaries for other deals as well and these revealed
a similar picture. Risk sensitivities for the various deals are displayed in Figure 5.4.
5
The empirical covariance matrix of the regressionestimated coeﬃcients a and b may be used to obtain
the empirical variance of s
∗
. Denote random errors in a and b by a and b, respectively. If it is assumed
that these errors are relatively small, a Taylor expansion yields (ignoring secondorder terms)
s
∗
≈
a
1 −b
_
1 +
a
a
+
b
1 −b
_
.
We thus obtain the empirical variance of s
∗
(as well as its standard error). Assuming that s
∗
is normally
distributed, a 95% conﬁdence interval is given by plus and minus twice the standard error.
149
5.8. EXAMPLE: BERMUDAN SWAPTION 121
1 2 3 4 5 6 7
0
100
200
300
400
500
600
700
Exercise point (Y)
S
w
a
p
N
P
V
e
x
e
r
c
i
s
e
l
e
v
e
l
(
b
p
)
Drift approximated exercise boundary
Longstaff Schwartz exercise boundary
Figure 5.3: Exercise boundaries for the eightyear deal.
0
50
100
150
200
250
300
2
N
C
1
3
N
C
1
4
N
C
1
4
N
C
3
5
N
C
1
5
N
C
3
6
N
C
1
6
N
C
3
6
N
C
5
7
N
C
1
7
N
C
3
7
N
C
5
8
N
C
1
8
N
C
3
8
N
C
5
8
N
C
7
Bermudan swaption
D
e
l
t
a
(
P
a
r
a
l
l
e
l
s
h
i
f
t


s
c
a
l
e
d
t
o
s
h
i
f
t
o
f
1
0
b
p
)
Drift Approx BGM
Longstaff Schwartz
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2
N
C
1
3
N
C
1
4
N
C
1
4
N
C
3
5
N
C
1
5
N
C
3
6
N
C
1
6
N
C
3
6
N
C
5
7
N
C
1
7
N
C
3
7
N
C
5
8
N
C
1
8
N
C
3
8
N
C
5
8
N
C
7
Bermudan swaption
V
e
g
a
(
P
a
r
a
l
l
e
l
s
h
i
f
t


s
c
a
l
e
d
t
o
s
h
i
f
t
o
f
1
0
0
b
p
)
Drift Approx BGM
Longstaff Schwartz
Figure 5.4: Risk sensitivities: deltas and vegas with respect to a parallel shift in the zero
rates and caplet volatilities, respectively. The error bars for the LongstaﬀSchwartz prices
represent a 95% conﬁdence bound based on twice the empirical standard error.
150
122 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Table 5.5: BGM pricing simulation rerun for 500,000 paths using precomputed exercise
boundaries. The standard errors for both prices were virtually the same in all cases,
therefore only a single standard error is reported. All prices and standard errors are in
basis points.
BGM simulation price
LS precomputed DA precomputed Standard error
exercise boundaries exercise boundaries
2NC1 28.63 28.62 0.06
3NC1 62.80 62.77 0.12
4NC1 99.51 99.58 0.18
5NC1 138.38 138.55 0.24
6NC1 178.08 179.41 0.30
7NC1 221.51 222.49 0.36
8NC1 263.05 265.27 0.42
The results show that the single time step BGM pricing framework indeed prices the
Bermudan swaptions close to LongstaﬀSchwartz, including correct estimates of risk sen
sitivities for shortermaturity deals. In all cases the price diﬀerence is within twice the
standard error of the simulation. Moreover, the computational time involved is less by
a factor 10. We note that the exercise boundary is calculated slightly diﬀerently by the
LongstaﬀSchwartz and driftapproximated (DA) approach. Also, risk sensitivities for
longermaturity deals (seven to eight years) can be outside of the twostandarderror con
ﬁdence bound. The Brownian bridge drift approximation thus becomes worse for longer
maturity deals, as also explained in Section 5.9. To determine which approach computed
the best exercise boundaries, the BGM pricing simulation was rerun for 500,000 paths
using the precomputed exercise boundaries. The results, given in Table 5.5, show that
the driftapproximated exercise boundaries are not worse than their LongstaﬀSchwartz
counterparts and are even slightly better.
6
Hence there is no problem with the spurious
early exercise opportunities arising from the absence of noarbitrage in the fast single time
step framework. The nonarbitragefree issue is investigated further in the next section.
This section ends with the results for a twofactor model.
6
This does not necessarily mean that the DA framework outperforms LongstaﬀSchwartz because we
only regress on the NPV of the underlying swap. LongstaﬀSchwartz may possibly yield better exercise
boundaries when it is regressed on to more explanatory variables.
151
5.8. EXAMPLE: BERMUDAN SWAPTION 123
Table 5.6: Twofactor model comparison. 50,000 paths were used for the Longstaﬀ
Schwartz simulation. “Swap NPV only” and “All forward rates” indicate that Longstaﬀ
Schwartz regressed on only the NPV of the swap and on all forward swap rates, respec
tively. All prices and standard errors are in basis points.
Fast LongstaﬀSchwartz
drift Swap NPV All forward rates Standard
approximation only (benchmark) error
2NC1 25.45 23.27 24.64 0.2
3NC1 59.22 55.79 58.08 0.3
4NC1 94.67 89.54 93.00 0.5
5NC1 132.35 124.79 129.42 0.7
6NC1 171.41 162.89 169.76 0.9
7NC1 212.15 202.97 210.89 1.1
8NC1 252.49 242.59 251.88 1.3
9NC1 292.62 283.89 294.68 1.5
5.8.1 Twofactor model
We consider a twofactor model with the same setup as above with the exception of the
volatility structure, which we now take to be
df
i
(t)
f
i
(t)
= v
i,1
dw
(i+1)
1
(t) + v
i,2
dw
(i+1)
2
(t).
Here [v
i
[ = 15%. For a model with forward expiry structure t
1
< < t
n
, we take the
v
i
∈ R
2
to be
v
i
= (15%)
_
a
i
,
_
1 −a
2
i
_
, a
i
=
t
i
−t
1
t
n
−t
1
.
This instantaneous volatility structure is purely hypothetical. It has the property that
correlation steadily drops between more separated forward rates. To solve the two
dimensional PDE version of (5.26) we used the hopscotch method (see paragraph 48.5
of Wilmott (1998)). Results for the twofactor model are displayed in Table 5.6. In a
twofactor model (with decorrelation) the exercise decision no longer depends only on
the NPV of the underlying swap but also on all forward swap rates. We therefore take
the results with regression on all forward swap rates to be the benchmark. Indeed, the
driftapproximated prices agree more with the benchmark than with prices obtained when
LongstaﬀSchwartz regresses on the NPV of a single swap. The computational time for
152
124 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
1 0 2
Ĺ
x
x(1)
x(2)
time o
Figure 5.5: Timing inconsistency in the single time step framework for BGM.
the fast driftapproximated pricing twodimensional grid was, on average, only a quarter
of the computational time for Monte Carlo.
5.9 Test of accuracy of drift approximation
Besides the approximation of the drift, the framework (Proposition 4) contains a timing
inconsistency. The inconsistency is best described by an example (see Figure 5.5). Sup
pose that the underlying Markov process x jumps to x(2), say, in two years. We consider
computing the value of the forwards at year 2. We could jump immediately to year 2
and calculate the forwards there. Alternatively, we could consider ﬁrst calculating the
forwards at time 1 (under the assumption that x jumps to some value x(1)) and from
this point calculate the forwards at time 2 (assuming that x then jumps to the very same
x(2)). In general, the so computed forwards at time 2 will be diﬀerent.
In a way, any lowdimensional approximation of BGM will exhibit this timing incon
sistency. Consider the following. Given the value of x(t), we cannot determine all timet
forward rates. We do, however, know the value of f
n
(t) because f
n
has zero drift under
the terminal measure n + 1. The value of any other forward rate f
i
(t) does not depend
solely on the value of x(t) but is dependent on the whole path that x traversed on the
interval [0, t]. The framework for fast single time step pricing simply calculates the most
likely value of f
i
(t) given the value of x(t). If we start from a diﬀerent initial model state
(for example, if we start from the state determined by x(1)), then almost surely our guess
for the most likely value of f
i
(t) will be diﬀerent. In this way, it is not really fair to
consider this timing inconsistency, but we will nonetheless investigate it. In the following,
a test will be proposed to evaluate the size of the inconsistency error.
153
5.9. TEST OF ACCURACY OF DRIFT APPROXIMATION 125
20 10 0
time (Y)o
30
f
20
s t t
n+1
Figure 5.6: Setup for inconsistency test.
5.9.1 Test of accuracy of drift approximations based on no
arbitrage
The accuracy test is described by an example. We consider some time t at which forwards
i, . . . , n have not yet expired. The framework for fast driftapproximated pricing yields
timet forward rates as a function of x(t). Under the assumptions that the model state
is determined by the Markov process x, and that the framework is arbitragefree, the
fundamental arbitragefree pricing formula will yield values of forward rates at time s < t
as a function of x(s) given by the following formula:
7
f
AF
i
(s, x) =
1
α
i
_
b
AF
i
(s)/b
AF
n+1
(s)
b
AF
i+1
(s)/b
AF
n+1
(s)
−1
_
=
1
α
i
_
_
_
E
(n+1)
_
b
DA
i
(t)
b
DA
n+1
(t)
¸
¸
¸x(s) = x
_
E
(n+1)
_
b
DA
i+1
(t)
b
DA
n+1
(t)
¸
¸
¸x(s) = x
_ −1
_
_
_
(5.27)
where each of the abovestated t random variables should be evaluated at (t, x(t)). The
second equality follows from b
AF
i
/b
AF
n+1
being a martingale by the assumption of no ar
bitrage. The “arbitragefree” forward rates f
AF
i
(s, x) obtained in this way may then be
compared with forward rates f
DA
i
(s, x) obtained by single time stepping.
5.9.2 Numerical results for single time step test
The inconsistency test was performed under the following setup. Ten annual forward
rates were considered where forward rate i accrued from year i to i +1, for i = 20, . . . , 29.
Under the notation of the previous section, s was taken to be 10 years, t was taken to be
20 years and t
n+1
was taken to be 30 years. See also Figure 5.6. f
i
(0) was taken to be 5%,
and meanreversion, κ, was varied at 0%, 5% and 10%. The ˜ γ
i
were chosen such that the
volatility of the corresponding caplet was equal to some general volatility level v, which
was varied at 10%, 15% and 20%. Let sd denote the standard deviation of x(10). x(10)
7
Here the notations “AF” and “DA” indicate “arbitragefree” and “driftapproximated”, respectively.
154
126 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
Table 5.7: Quality of drift approximations: comparison of f
AF
20
(10) and f
DA
20
(10) under
diﬀerent x(10) moves for the volatility/meanreversion scenario 15%/10%. sd denotes the
standard deviation of x(10). All variables are evaluated at time t=10.
Brownian Bridge
x(10) f
AF
20
f
DA
20
f
DA
20
−f
AF
20
(%) (%) (bp)
−sd 3.75 3.81 5.11
−sd/2 4.23 4.27 4.03
0 4.77 4.79 2.37
+sd/2 5.38 5.38 0.05
+sd 6.07 6.03 3.47
Predictorcorrector
x(10) f
AF
20
f
DA
20
f
DA
20
−f
AF
20
(%) (%) (bp)
−sd 3.74 3.81 7.17
−sd/2 4.19 4.27 7.94
0 4.70 4.79 8.81
+sd/2 5.28 5.38 9.79
+sd 5.92 6.03 10.91
Table 5.8: Quality of drift approximations: Maximum of [f
AF
20
(10) −f
DA
20
(10)[ over x(10)
moves 0, ±sd/2, ±sd for diﬀerent volatility/meanreversion scenarios. sd denotes the stan
dard deviation of x(10). Diﬀerences are denoted in basis points.
Brownian Bridge
Mean Volatility level (v)
reversion 10% 15% 20%
0% 2.97 9.34 28.73
5% 2.56 8.21 19.46
10% 1.46 5.11 12.56
Predictorcorrector
Mean Volatility level (v)
reversion 10% 15% 20%
0% 2.86 8.60 37.45
5% 2.32 12.29 53.85
10% 1.69 10.91 44.59
moves were considered for 0, ±sd/2, and ±sd. Results for the volatility/meanreversion
scenario 15%/10% are given in Table 5.7. The comparison is only reported for f
20
because
this forward rate contains the most drift terms, and therefore its corresponding error is
the largest among i = 20, . . . , 29. We note that the error for f
29
is always zero as it is
fully determined by x. In Table 5.8 the maximum error (over the ﬁve considered x(10)
moves) between f
AF
20
(10) and f
DA
20
(10) is reported.
The test was performed for both the Brownian bridge and predictorcorrector schemes.
The results show that the former outperforms the latter in the timing inconsistency test.
The inconsistency test results show that, for less volatile market scenarios, the single
time step framework performs very accurately, with errors only up to a few basis points.
155
5.10. CONCLUSIONS 127
For more volatile market scenarios the approximation deteriorates. But for realistic yield
curve and forward volatility scenarios there are no problems with respect to pricing (see
Section 5.8). The worsening of the approximation for more volatile scenarios is what
may be expected from the nature of the drift approximations: as the model dimensions
increase, the single time step approximation will break up. By “model dimensions” we
mean the volatility level, the tenor of the deal, the diﬀerence between the forward index
i and n, or time zero forward rates, etc. Care should be taken in applying the single time
step framework for BGM that the market scenario does not violate the realm where the
single time step approximation is reasonably valid.
5.10 Conclusions
We have introduced a fast approximate pricing framework as an addition to the predictor
corrector drift approximation developed by Hunter et al. (2001). These authors used the
drift approximation only to speed up their Monte Carlo by reducing it to single time step
simulation. We have shown that, at a slight cost, much faster computational methods
may be used, such as numerical integration or ﬁnite diﬀerences. The additional cost is a
nonrestrictive assumption, namely, separability of the volatility function. The proposed
drift approximation framework was applied to the pricing of Bermudan swaptions, for
which it yielded very accurate prices with much lower computation times.
5.A Appendix: Mean of geometric Brownian bridge
In this appendix, the timet mean of the process f
k
deﬁned in (5.9) is determined. Equiv
alently, we may determine the timet mean of the process y, given by
dy(t)
y(t)
= σ(t) dw(t), y(0) = y
0
, y(t
∗
) = y
∗
.
(Compare with (5.9).) The solution of y (unconditional of timet
∗
) is given by
y(t) = y
0
e
x(t)−
1
2
v(t)
,
where
x(t) :=
_
t
0
σ(s) dw(s), v(t) :=
_
t
0
σ(s)
2
ds.
We note that
_
ω ∈ Ω; y(t
∗
) = y
∗
_
=
_
ω ∈ Ω; x(t
∗
) = log(y
∗
/y
0
) +
1
2
v(t
∗
) =: x
∗
_
.
156
128 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
According to the martingale time change theorem (for example Theorem 4.6 of Karatzas
& Shreve (1991)), we have that x(τ()) is a Brownian motion, where the time change τ is
deﬁned by
τ(t) = inf¦s ≥ 0; v(t) > s¦.
Working in the timechanged time coordinates, x()[x(τ
∗
) = x
∗
is a standard Brownian
bridge, and so, according to Section 5.6.B of Karatzas & Shreve (1991),
_
x(τ)
¸
¸
x(τ
∗
) = x
∗
_
∼ ^
_
τ
τ
∗
x
∗
, τ −
τ
2
τ
∗
_
.
Back in the original time coordinates, this translates to
_
x(t)
¸
¸
x(t
∗
) = x
∗
_
∼ ^
_
v(t)
v(t
∗
)
x
∗
, v(t) −
_
v(t)
_
2
v(t
∗
)
_
.
With this, we may evaluate the mean of (y(t)[y(t
∗
) = y
∗
) to be
E
_
y(t)
¸
¸
y(t
∗
) = y
∗
¸
= y
0
_
y
∗
y
0
_
v(t)
v(t
∗
)
exp
_
1
2
v(t)
v(t
∗
)
_
v(t
∗
) −v(t)
_
_
, (5.28)
where the following simple rule has been used: E[e
z
] = e
β+τ
2
/2
whenever z is normally
distributed, z ∼ ^(β, τ
2
).
5.B Appendix: Approximation of substituting the
mean in the expectation of expression (5.9)
In Section 5.3 a fourstep method for the calculation of expression (5.9) is described. An
approximating fourth step is proposed that evaluates the expectation of the BGM drift
inserting the mean. In this appendix an error bound for this approximation is derived,
and it is shown that the approximation is of order two in volatility in the neighbourhood
of zero.
The expectation term can always be rewritten as
g(µ, σ) = E
_
exp¦µ + σz¦
1 + exp¦µ + σz¦
_
,
where z is distributed standard normally. It is straightforward to verify that the above
function g : R
2
→ R is inﬁnitely diﬀerentiable at every point of the whole real plane.
We note that approximating the above expectation at the mean signiﬁes that the above
function is approximated as
g(µ, σ) ≈ g(µ, 0) =
exp¦µ¦
1 + exp¦µ¦
.
157
5.C. APPENDIX: MATLAB CODE FOR BROWNIAN BRIDGE SCHEME 129
Fix µ and calculate the derivative of g with respect to σ. The interchange of diﬀerentiation
and expectation is a subtle argument that may, for example, be found in Williams (1991,
paragraph A.16.1). We carefully veriﬁed that in the above case all the requirements for
interchange are satisﬁed. We then ﬁnd
∂g
∂σ
(µ, σ) = E
_
z
exp¦µ + σz¦
_
1 + exp¦µ + σz¦
_
2
_
.
Due to the odd nature of the above integrand at the point σ = 0, we ﬁnd that
∂g
∂σ
(µ, 0) = 0.
Taylor’s formula then states that there exists c ≥ 0 (possibly depending on µ) such that
¸
¸
¸
¸
g(µ, σ) −
exp¦µ¦
1 + exp¦µ¦
¸
¸
¸
¸
≤ cσ
2
.
Because a bound on the second derivative of σ →g(µ, σ) may be found independently of
µ on some interval [0, ¯ σ], it follows from Theorem 7.7 of Apostol (1967) that the constant
c may then be chosen independently of µ for all σ ∈ [0, ¯ σ].
5.C Appendix: MATLAB code for Brownian bridge
scheme
MATLAB code illustrating the Brownian bridge scheme and the foursteps calculation
method in Section 5.4, is displayed below.
function result = fBB(n,f0,a,vol,t,z)
% Calculates forward LIBOR rates in onefactor model with Brownian bridge
% drift approximation & single time step, given the normal increment z.
% n, no. of forward LIBORs, a positive integer
% f0, array with n elements, time zero forward LIBORs
% a, array with n elements, day count fractions
% vol, array with n elements, vol[i] = volatility of forward LIBOR i
% t, time (scalar)
% z, Gaussian increment ~N(0,1), scalar
% f is used to store result
f=zeros(n,1); % creates zero array with n entries
158
130 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
% First do ultimate forward LIBOR => martingale!
f(n)=f0(n)*exp(0.5*vol(n)^2*t+vol(n)*sqrt(t)*z);
% Loop from penultimate LIBOR down to first LIBOR.
run_drift=0.0; % used for efficient calculation of drift
for i=n1:1:1
zt=log(f(i+1)/f0(i+1))+0.5*vol(i+1)^2*t; % Needed for driftBB.
% quad is a standard integration routine in MATLAB.
% quad(@f,a,b,tol,trace,p1,p2,...) integrates the function
% f(s,p1,p2,...) over s from a to b with convergence criteria tol and
% trace.
% For definitions of tol and trace we refer to MATLAB documentation.
% Of course, one can use any integration routine instead of quad.
% Adjusting the convergence criterion of the numerical integrator
% allows for a tradeoff between accuracy and computational speed.
% For example, the predictorcorrector scheme is a special case of
% the Brownian bridge scheme if the crudest integrator (twopoint
% trapezoid) is used.
run_drift=run_drift ...
quad(@driftBB,0.0,t,1.0e6,0,f0(i+1),a(i+1),vol(i+1),t,zt);
% Equation (5.3) in exp form
f(i)=f0(i)*exp((run_drift*vol(i)0.5*vol(i)^2*t)+vol(i)*sqrt(t)*z);
end
result = f; % return result f
function result = driftBB(s,f0,a,vol,t,zt)
% Calculates drift term evaluated at the mean of the Brownian bridge.
% This function will be integrated over time.
% s, scalar, current (intermediate) time
% f0, scalar, time zero forward LIBOR
% a, scalar, day count fraction
% vol, scalar, volatility of forward LIBOR
% t, scalar, time (at which forward LIBOR has already been predicted)
% zt, scalar, help variable associated with LIBOR predicted at time t
% Mean of Brownian bridge, Equation (5.26) in logform:
m=s./t.*zt0.5.*vol.^2.*s.*s./t+log(f0)+log(a);
159
5.C. APPENDIX: MATLAB CODE FOR BROWNIAN BRIDGE SCHEME 131
% Essential form of BGM drift in terms of log rates:exp(.)/(1+exp(.)):
result=vol*exp(m)./(1.0+exp(m));
160
132 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM
161
Chapter 6
A comparison of single factor
Markovfunctional and multi factor
market models
We compare single factor Markovfunctional and multi factor market models for hedging
performance of Bermudan swaptions. We show that hedging performance of both models
is comparable, thereby supporting the claim that Bermudan swaptions can be adequately
riskmanaged with single factor models. Moreover, we show that the impact of smile can
be much larger than the impact of correlation. We propose a new method for calculating
risk sensitivities of callable products in market models, which is a modiﬁcation of the
leastsquares Monte Carlo method. The hedge results show that this new method enables
proper functioning of market models as riskmanagement tools.
6.1 Introduction
Bermudan swaptions form a popular class of interest rate derivatives. The underlying
is a plainvanilla interest rate swap, in which periodic ﬁxed payments are exchanged for
ﬂoating LIBOR payments. Institutional debt issuers use interest rate swaps to revert from
ﬂoating to ﬁxed interest rate payments, and vice versa. Often the issuers want to reserve
the right to cancel the swap. A cancellable swap can be valued by the following parity
relation. A cancellable interest rate swap is equal to a plainvanilla interest rate swap
plus a callable interest rate swap with reversed cash ﬂows. Thus a cancellable swap can
be valued when the callable swap can be valued. Such callable swap options are referred
to as Bermudan swaptions. Bermudan means that the exercise opportunities are at a
discrete set of time points. A European swaption is an option to enter into a swap at only
a single exercise date.
162
134 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
In this chapter, we will study the pricing and hedging performance of two popular
models for Bermudan swaptions. Many models have been proposed in the literature for
valuation and risk management of Bermudan swaptions. We distinguish three categories:
shortrate models, Markovfunctional models and market models.
Shortrate models model the dynamics of the term structure of interest rates by spec
ifying the dynamics of a single rate (the short rate) from which the whole term structure
at any point in time can be calculated. Examples of shortrate models include the models
of Vasicek (1977), Cox et al. (1985), Dothan (1978), Black et al. (1990), Ho & Lee (1986)
and Hull & White (1990).
The Markovfunctional model of Hunt et al. (2000) assumes that the discount factors
are a function of some underlying Markov process. The model is then fully determined by
noarbitrage arguments and by requiring a ﬁt to the initial yield curve and interest rate
option volatility.
Market models were introduced by Brace et al. (1997), Miltersen et al. (1997) and
Jamshidian (1997). The name ‘market model’ refers to the modelling of market observable
variables such as LIBOR rates and swap rates. The explicit modelling of market rates
allows for natural formulas for interest rate option volatility, that are consistent with the
market practice of using the formula of Black (1976) for caps (options on LIBOR) and
swaptions (options on swap rates).
Shortrate and Markovfunctional models are usually
1
implemented as models with
a single stochastic process driving the term structure of interest rates. A disadvantage
is then that the instantaneous correlation between interest rates can only be 1. Market
models however eﬃciently allow for any number of stochastic variables to be used, so that
any instantaneous correlation structure can be captured. There is substantial evidence
that the term structure of interest rates is driven by multiple factors (three, four, or even
more), see the review article of Dai & Singleton (2003). A more realistic description of
reality may thus be expected from multi factor models, which points to possibly better
hedge performance. The question addressed in this chapter is whether the increase in
hedge performance due to use of a multi factor model is signiﬁcant. To those that a
priori dismiss the use of single factor models due to their economic irrelevance by failure
in capturing the multi factor dynamics of the term structure of interest rates, we say:
Models that are best for managing an interest rate derivatives book are not necessarily
models that are most realistic, rather they are models that most reduce variance of proﬁt
and loss (P&L), thereby preserving wealth in the most stable manner. We mention four
articles that compare single and multi factor models.
First, in favour of multi factor models, Longstaﬀ et al. (2001) claim that shortrate
models, because of supposedly misspeciﬁed dynamics, lead to suboptimal exercise strate
1
Two factor short rate models exist too, see for example Ritchken & Sankarasubramanian (1995).
163
6.1. INTRODUCTION 135
gies. This claim is supported by empirical evidence performed with the shortrate models
of Black et al. (1990) and Black & Karasinski (1991). The authors then conclude that
the costs to Wall Street ﬁrms of following single factor exercise strategies could be several
billion dollars. The argument of Andersen & Andreasen (2001), and also ours, against
the claim of Longstaﬀ et al. (2001), is that their choice of calibration does not correspond
to market practice and leads to models that are poorly ﬁtted to market.
Second, in favour of single factor models, Andersen & Andreasen (2001) claim that
the exercise strategy obtained from a properly calibrated single factor model only leads
to insigniﬁcant losses when applied in a two factor model.
Third, Driessen, Klaassen & Melenberg (2003) are the ﬁrst to investigate hedge perfor
mance. These authors investigate two types of delta hedge instruments, (i) a number of
delta hedge securities, i.e. discount bonds, equal to the number of factors, and (ii) a large
set of discount bonds, one for each security spanning the yield curve. They show that
if the number of hedge instruments is equal to the number of factors, then multi factor
models outperform single factor models. If, however, the large set of hedging instruments
is used, which is the case in practice, then single factor models perform as well as multi
factor models in terms of delta hedging of European swaptions.
Fourth, Fan, Gupta & Ritchken (2003) show, for the case of the number of hedge
instruments equal to the number of factors, that higher factor models perform better
than lower factor models in terms of delta hedging of European swaptions and European
swaption straddles
2
. The results of Fan et al. (2003) are thus consistent with the ﬁndings
of Driessen et al. (2003).
Relative to Driessen et al. (2003) and Fan et al. (2003), we make the contribution
of also considering vega hedging and Bermudanstyle swaptions rather than only delta
hedging and only Europeanstyle swaptions. A European product depends solely on the
marginal distributions of the swap rates, whereas a Bermudan product depends on the
joint distribution, too. Moreover, we ﬁt the models exactly to a subset of European
swaptions particular to a Bermudan swaption rather than attempting to ﬁt to the whole
swaption volatility surface, as Driessen et al. (2003) and Fan et al. (2003). The two
practices of (i) ﬁtting to an appropriate set of swaptions, and (ii) vega hedging, are
probably more close in spirit to ﬁnancial practice. In fact, we show that the variance of
P&L is signiﬁcantly reduced when a vega hedge has been set up additional to a delta
hedge.
There is one drawback of using high factor models however, which is lesser tractability
than low (one or two) factor models. For valuation in high factor models, we must resort
to Monte Carlo (MC) simulation. Valuation by MC is not a problem, but the estimation
2
A European swaption straddle consists of a position of long a payer swaption and long an otherwise
identical receiver swaption.
164
136 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
of sensitivities (Greeks) can be less eﬃcient. This is not due to the choice of calibration,
as can sometimes be the case as shown by Pietersz & Pelsser (2004a) (see Chapter 2),
since in this chapter the safe option of timeconstant volatility (but dependent on the
forward rates) is used. The less eﬃcient estimation of sensitivities occurs if the payoﬀ
along the path can change discontinuously as dependent on initial parameters, see, for
example, Glasserman (2004, Section 7.1). We show that such discontinuity appears in
the Longstaﬀ & Schwartz (2001) algorithm for valuation of Bermudanstyle options. We
consider two methods to improve the eﬃciency of sensitivity estimates. The comparison
of hedge performance of single and multi factor models thus entails a tradeoﬀ between
more realistic modelling and tractability.
For the Markovfunctional model, the failure of not capturing a realistic instantaneous
correlation structure can be remedied, in some sense, for Bermudan swaptions and perhaps
for other derivatives, too, as follows. In theory the price of a coterminal Bermudan
swaption is dependent of and fully determined by the joint distribution of the forward
coterminal swap rates at each of the exercise dates. In eﬀect there are thus n(n + 1)/2
stochastic variables that determine the price. In this chapter, we use the observation
that the price of a Bermudan swaption is, up to ﬁrst order approximation, determined by
the joint distribution of only the underlying spot coterminal swap rates at the exercise
dates, see, e.g., Piterbarg (2004, page 67). There are only n such spot coterminal swap
rates. The marginal distributions of these swap rates are governed by the associated
European swaption volatility quoted in the market, whereby, in a lognormal model, we
only need to specify correlation. We will call their correlation the terminal correlation.
A novel approximating formula is derived for the terminal correlation in the Markov
functional model. The accuracy of the new formula is tested numerically. The novel
formula allows the Markovfunctional model to be calibrated to terminal correlation. We
then equip a full factor swap market model with a parameterized instantaneous correlation
matrix, calculate the resulting terminal correlation and ﬁt the Markovfunctional model
to this terminal correlation. Thus, although the Markovfunctional model fails to capture
instantaneous correlation, it can be tweaked such that it is ﬁtted to product speciﬁc
terminal correlation. Since such correct correlation speciﬁcation more or less determines
the price of the Bermudan swaption, it then no longer matters for pricing Bermudan
swaptions whether the single factor Markovfunctional model is a realistic or unrealistic
model of other parts of reality in the interest rate market, outside of the volatilities and
correlations of the relevant swap rates. Essentially, we have projected all relevant parts
of reality correctly onto the single factor Markovfunctional model. With the thus ﬁtted
Markovfunctional model, and also with swap and LIBOR market models, we subsequently
compare hedge performance of Bermudan swaptions with real market data over a 1 year
period.
165
6.1. INTRODUCTION 137
The research in this chapter is not aimed at comparing the model generated Bermudan
swaption prices to reallife market quoted prices. Rather, the hypothetical viewpoint is
taken that swaps and European swaptions are liquidly traded in the market, and Bermu
dan swaptions are less liquidly traded. The model is then used as an extrapolation tool
to determine a Bermudan swaption price consistent with swap and European swaption
prices, and such that the risk sensitivities provide a hedge of the former in terms of the
latter securities. In any case, the study in this chapter is relevant for nonstandard Bermu
dan swaptions, for which the underlying has more exotic coupon payments. Examples
of such exotic coupon payments are capped ﬂoater (min(f, k) for some cap rate k and
leverage ), inverse ﬂoater (max(k −f, 0)) and range accrual (f, with the fraction for
which LIBOR within the accrual period is within a certain range). These nonstandard
Bermudan swaptions are called callable LIBOR exotics. The results of this chapter may
apply to many types of callable LIBOR exotics, but further research will have to provide
a deﬁnitive answer. Nonetheless, the results of this chapter are interesting for the study
of callable LIBOR exotics, since these have evolved from standard Bermudan swaptions.
For both the swap market model and the Markovfunctional model we initially use the
basic wellknown nonsmile versions. Smile is the phenomenon that for European options
diﬀerent Blackimplied volatility is quoted for diﬀerent strikes of the option. As mentioned
in Hunt et al. (2000, last paragraph of Section 3.2), the Markovfunctional model can be
ﬁtted to smile. We provide details, also for the swap market model, and show that the
resulting smileﬁtting procedure is numerically eﬃcient and straightforward to implement.
The smile Markovfunctional model and smile swap market model are subsequently ﬁtted
to USD swaption smile data. We then compare empirically the impact of smile versus the
impact of correlation.
The LIBOR Markovfunctional model has been compared with the LIBOR market
model before by Bennett & Kennedy (2004). These authors show that the one factor
LIBOR Markovfunctional model with mean reversion and the one factor separable LI
BOR market model are largely similar in terms of dynamics and pricing. They also show
this for an approximated version of the LIBOR market model by drift approximations, as
introduced by Pietersz et al. (2004) (see also Chapter 5) and Hunter et al. (2001). Rela
tive to Bennett & Kennedy (2004) this chapter makes the contribution of also comparing
multi factor models with the Markovfunctional model. Moreover, we show how multi
factor models can a priori be compared to the Markovfunctional model which is not a
straightforward extension from the onedimensional case.
The remainder of the chapter is organized as follows. First, we outline the com
parison methodology for the two models. The LIBOR and swap market models and
Markovfunctional model are discussed, as well as the two Greeks calculation methods
for market models. Second, the data is described. Third, we numerically test the accu
racy of an approximating formula for the terminal correlation in the Markovfunctional
166
138 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
model. Fourth, empirical comparison results are presented. Fifth, the impact of smile is
investigated. Sixth, we conclude.
6.2 Methodology
In this section, we ﬁrst introduce some notation. Second, we set up the framework that
enables a comparison between multi factor and single factor models.
The type of Bermudan swaption that is considered here is the coterminal version, as
opposed to, for example, the ﬁxed maturity version. A coterminal Bermudan swaption
is an option to enter into an underlying swap at several exercise opportunities, where
each swap ends at the same contractually determined end date. The maturity of the
swap entered into thus becomes smaller as the option is exercised later. In contrast, for
a ﬁxed maturity Bermudan swaption, each swap that can be entered into has the same
contractually speciﬁed maturity and the respective end dates then diﬀer. We consider a
Bermudan swaption on an underlying swap with n payments and a ﬁxed rate k. Associated
with this swap is a tenor structure 0 < t
1
< < t
n+1
. The underlying swap makes a
payment π
i
at time t
i+1
depending on the LIBOR rate f(t
i
) ﬁxed at time t
i
for i = 1, . . . , n.
Denote the notional amount by q and the day count fraction for accrual period [t
i
, t
i+1
]
by α
i
. Introduce the variable η ∈ ¦−1, 1¦ by η = 1 for a pay ﬁxed swap and η = −1
for a receive ﬁxed swap. The payment π
i
is then ηα
i
(f(t
i
) − k)q. The holder of the
Bermudan swaption has the right to enter into the swap at the dates t
1
, . . . , t
n
. If the
holder exercises the option at time t
i
, then he or she will receive the payments π
i
, . . . , π
n
.
Alternatively, in the market the holder could have entered into an otherwise equal swap
but with ﬁxed rate equal to the swap rate s
i:n+1
(t
i
). Here s
i:j
denotes the forward swap
rate for a swap that start at t
i
and ends at t
j+1
. The holder will thus only exercise the
Bermudan at time t
i
if η(s
i:n+1
(t
i
) −k) > 0. But even when the immediate exercise value
is positive, the holder can nonetheless decide to hold on to the option in view of a more
favourable forward swap rate s
j:n+1
(t
i
), j > i. It follows that the price of a Bermudan
swaption is dependent of and fully determined by the joint distribution of the variables
¦s
j:n+1
(t
i
) ; j = i, . . . , n, i = 1, . . . , n¦. The forward swap rates ¦s
1:n+1
, . . . , s
n:n+1
¦ are
called coterminal since they all coend at the same termination date.
We contend that the main driver for the price of Bermudan swaptions is the joint
distribution of the realizations of the coterminal swap rates ¦s
i:n+1
(t
i
) ; i = 1, . . . , n¦.
Ostrovsky (2002) calls this the diagonal process. The economic argument is that prima
facta, the holder of the option has to choose between receiving the payoﬀs of entering
into the swaps starting at t
1
, t
2
, . . . , t
n
and the associated payoﬀs are determined fully by
s
1:n+1
(t
1
), s
2:n+1
(t
2
), . . . , s
n:n+1
(t
n
).
167
6.2. METHODOLOGY 139
As is common in ﬁnancial practice, we calibrate models to only those sections of the
market that are relevant to the product, rather than attempting to ﬁt the models to all
available market data. We assume that any valuation model for the Bermudan swaption is
calibrated to the socalled diagonal of European swaptions that start at t
i
and end at t
n+1
,
i = 1, . . . , n. This means that the variance of the variables ¦s
1:n+1
(t
1
), . . . , s
n:n+1
(t
n
)¦ is
already fully determined. Thus the diagonal process is fully determined (given a nor
mal or lognormal distribution) if we specify the correlation matrix for the variables
¦s
i:n+1
(t
i
) ; i = 1, . . . , n¦. This correlation matrix will be called the terminal correla
tion. In the next three sections, we discuss the LIBOR and swap market models and the
Markovfunctional model, respectively. We show how the terminal correlation can approx
imately be calculated in the swap market model and the Markovfunctional model. For
the Markovfunctional model we show how the model can be calibrated to the terminal
correlation.
The idea of terminal correlation is not new to ﬁnance. For example, Rebonato (2002,
Section 7.1.2) shows that it is the terminal and not the instantaneous correlation that
directly aﬀects the price of swaptions. The terminal correlation itself is determined both
by the instantaneous correlation and the term structure of instantaneous volatility. In
Rebonato (1999c, Section 11.4) it is shown that the terminal correlation is inﬂuenced
just as much, and even more, by the instantaneous volatility than by the instantaneous
correlation.
6.2.1 The LIBOR and swap market models
Within the swap market model, n forward swap rates are modelled as lognormal processes
under their respective forward measure, with forward swap rate s
i:n+1
satisfying,
ds
i:n+1
(t)
s
i:n+1
(t)
= σ
i:n+1
(t) dw
(i:n+1)
(t),
¸
dw
(i:n+1)
(t), dw
(j:n+1)
(t)
_
= ρ
i:n+1,j:n+1
(t)dt.
Here σ
i:n+1
() denotes the instantaneous volatility function and w
(i:n+1)
denotes a Brow
nian motion under the i
th
forward swap measure. The latter measure is associated with a
portfolio of discount bonds, weighted by the respective day count fractions, with maturity
times corresponding to the payment times of the swap. The value of such a portfolio of
discount bonds is named the present value of a basis point (PVBP).
Within the LIBOR market model, n forward LIBORs are modelled as lognormal
processes under their respective forward measure, with forward LIBOR f
i
satisfying,
df
i
(t)
f
i
(t)
= σ
i
(t) dw
(i+1)
(t),
¸
dw
(i+1)
(t), dw
(j+1)
(t)
_
= ρ
ij
(t)dt.
Here σ
i
() denotes the instantaneous volatility function and w
(i+1)
denotes a Brownian
motion under the i
th
forward measure. The latter measure is associated with a discount
168
140 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
bond that matures at t
i+1
, the payment time of the i
th
LIBOR deposit. The LIBOR
market model is calibrated approximately to swaption volatility, via an approximation
of swaption volatility in terms of LIBOR volatility, see, e.g., Hull & White (2000). By
assumption of constant volatility and constant correlation (see below), the resulting cal
ibration algorithm reduces to a simple bootstrap algorithm for determining the LIBOR
volatility levels.
Within both market models, we set the instantaneous volatility and correlation con
stant over time, i.e., σ
i:n+1
(t) = σ
i:n+1
and ρ
i:n+1,j:n+1
(t) = ρ
i:n+1,j:n+1
for the swap model,
and σ
i
(t) = σ
i
and ρ
ij
(t) = ρ
ij
for the LIBOR model. These choices, relative to the time
homogeneous case, will not, or only favourably, impact the results, as explained by the
following two arguments. First, a constant instantaneous volatility assumption leads to
eﬃciently estimated risk sensitivities, whereas certain speciﬁc timehomogeneous speciﬁ
cations may not, as shown by Pietersz & Pelsser (2004a), see also Chapter 2. Second,
our choice of parametrization of the correlation matrix is both a constant and time
homogeneous parametrization.
The rank of the correlation matrix P = (ρ
ij
)
n
i,j=1
determines the number of Brownian
motions (number of factors) driving the model. When an arbitrary correlation matrix
has been speciﬁed, generally such matrix has full rank n, but then if a number of factors
d < n be required, we are led to solve a rank reduction problem
3
. To test the two
extreme cases, we consider only either rank 1 or fullrank correlation matrices, allowing
respectively correlation constant at 1 or a full ﬁt to any correlation matrix.
We parameterize the instantaneous correlation matrix by, for i < j,
ρ
ij
(a) =
¸
(e
2at
i
−1)/t
i
(e
2at
j
−1)/t
j
for a > 0, and ρ
ij
(a) ≡ 1, for a = 0. (6.1)
This parametrization of instantaneous correlation allows for a simple calibration of the
Markovfunctional model to the terminal correlation of the swap market model. In fact,
parametrization (6.1) has been chosen such that the resulting terminal correlation of the
swap market model exactly matches the terminal correlation of a Markovfunctional model
with mean reversion parameter a. The correlation structure (6.1) is nonetheless a good
choice, since we will show that, for a suitable choice of a, (6.1) corresponds to a form that
is often quoted in the literature, see, for example, Rebonato (1998, Equation (4.5), page
83),
ρ
ij
(β) = exp
_
−β[t
i
−t
j
[
_
, for some β ≥ 0. (6.2)
We numerically ﬁtted the form of (6.1) to (6.2), for 10 10 correlation matrices, where
n = 10 corresponds to the setting in the forthcoming hedge tests. In other words, ﬁx β,
3
For solving such rank reduction problems the reader is referred to Pietersz & Groenen (2004a, b)
(see Chapter 3), Grubiˇsi´c & Pietersz (2005) (see Chapter 4), Wu (2003), Rebonato (2002, Section 9) or
Brigo (2002).
169
6.2. METHODOLOGY 141
0%
5%
10%
15%
20%
25%
0% 5% 10% 15%
beta
0.00
0.01
0.02
0.03
a (left axis)
avg. abs. error per entry of corr.
matrix (right axis)
Figure 6.1: Fitted aparameter of parametrization (6.1) (left axis) and ﬁt error (right
axis) versus the βparameter of the Rebonato (1998) parametrization (6.2). The ﬁt error
is the average absolute error over the entries.
and then ﬁnd a that solves
min
a≥0
n
i=1
n
j=1
¸
¸
ρ
ij
(a) −ρ
ij
(β)
¸
¸
.
The relationship between the ﬁtted a as dependent on β is displayed in Figure 6.1. As
can be seen from the ﬁgure, the ﬁt is of good quality, obtaining an average absolute error
over the entries in the correlation matrix that is less than 0.02 for typical values of β and
a.
6.2.2 The Markovfunctional model
We consider the swap variant of the Markovfunctional model, see Hunt et al. (2000,
Section 3.4) for details on this variant. Within the (swap) Markovfunctional model, any
model variable is a function of an underlying Markov process x. For example, for a forward
swap rate we have s
i:n+1
(t
j
) = s
i:n+1
(t
j
, x(t
j
)). We assume that the driving Markov process
of the model is a deterministically timechanged Brownian motion, satisfying
dx(t) = τ(t)dw(t).
170
142 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
Here τ() denotes a deterministic function (that can be chosen piecewise constant) and
w denotes a Brownian motion.
We now present an approximate formula for the terminal correlation. An argu
ment explaining the formula is given, and in a later section we investigate the accu
racy of the approximating formula. By a Taylor expansion, we have ln s
i:n+1
(t
i
, x) ≈
s
(0)
i:n+1
(t
i
) + s
(1)
i:n+1
(t
i
)x. Since correlation is unaltered by a linear transformation, the ter
minal correlation of the swap rates is thus approximately equal to the terminal correlation
of the underlying Markov process,
ρ
_
ln s
i:n+1
(t
i
), ln s
j:n+1
(t
j
)
_
≈ ρ
_
x(t
i
), x(t
j
)
_
. (6.3)
By straightforward calculation, for i < j,
ρ
_
x(t
i
), x(t
j
)
_
=
Cov(x(t
i
), x(t
j
))
_
Var(x(t
i
))Var(x(t
j
))
=
¸
¸
¸
_
_
t
i
0
τ
2
(t)dt
_
t
j
0
τ
2
(t)dt
. (6.4)
In fact, any functional of the Markov process can be linearized by a Taylor expansion
and, according to the argument above, would exhibit the same approximate terminal
correlation (6.4). The above theoretical argument is therefore not very strong. The
approximation however turns out to be accurate, as will be shown numerically in Section
6.4.
In principle, the Markovfunctional model can thus be approximately ﬁtted to the
terminal correlation by minimization of the ﬁtting error given a marketimplied or his
torically estimated terminal correlation matrix. The parameters for this minimization
problem are for example the n parameters governing the piecewise constant function
τ(). For ease of exposition we will however restrict our attention to the case of mean
reversion, i.e. τ(t) = exp(at), with a denoting the mean reversion parameter, see Section
4 of Hunt et al. (2000). In this case we have, for i < j,
ρ
_
x(t
i
), x(t
j
)
_
=
_
e
2at
i
−1
e
2at
j
−1
. (6.5)
To verify that the Markovfunctional model is properly calibrated to terminal corre
lation, in the swap market model this correlation is approximately calculated to be, from
(6.1), for i < j,
_
t
i
0
σ
i:n+1
(t)σ
j:n+1
(t)ρ
ij
(t)dt
_
_
t
i
0
σ
2
i:n+1
(t)dt
_
t
j
0
σ
2
j:n+1
(t)dt
=
σ
i:n+1
σ
j:n+1
ρ
ij
t
i
_
σ
2
i:n+1
t
i
σ
2
j:n+1
t
j
= ρ
ij
¸
t
i
t
j
=
_
e
2at
i
−1
e
2at
j
−1
. (6.6)
The speciﬁcation (6.1) of the instantaneous correlation of the swap market model was con
structed such that the (approximate) terminal correlation (6.5) of the Markovfunctional
171
6.2. METHODOLOGY 143
model with mean reversion parameter a is equal to the (approximate) terminal correlation
(6.6) in the swap market model with parameter a. We note that this correspondence does
not necessarily hold for the LIBOR market model, though we nonetheless employ it in
the comparison tests.
6.2.3 Estimating Greeks for callable products in market models
The algorithm of Longstaﬀ & Schwartz (2001) (LS) renders the numeraire relative payoﬀ
along a simulated path discontinuously dependent on initial input. The discontinuity in
the LS algorithm stems from the estimated optimal exercise index chosen from a discrete
set of possible exercise opportunities. Such a discrete choice is inherently discontinuously
dependent on initial input. Any discontinuity in a simulation may cause ﬁnite diﬀerence
estimates of sensitivities to be less eﬃcient, see Glasserman (2004, Section 7.1). We
describe two methods that enhance the eﬃciency of ﬁnite diﬀerence estimates, the second
of which is novel. These are:
(i) Finite diﬀerences with optimal perturbation size.
(ii) Constant exercise decision heuristic.
The two methods are discussed below in more detail. We denote by v the base value of
the derivative, i.e., the value of the derivative in the unperturbed model.
Method (i), the ﬁnite diﬀerences method is best described as the bumpandrevalue
approach. Initial market data is perturbed by amount ε, the model is recalibrated and
subsequently priced at v(ε). The ﬁnite diﬀerence estimate of the Greek is then (v(ε) −
v)/ε. The mean square error (MSE) of the ﬁnite diﬀerence estimator is dependent on the
chosen perturbation size ε. If the numeraire relative payoﬀ along the path is continuously
dependent on initial input, then least MSE is obtained when ε is selected as small as
possible (though larger than machine precision), see Glasserman (2004). If the payoﬀ
is discontinuous however, then there is a tradeoﬀ between increasing and decreasing
ε, leading to an optimal (‘large’ and positive) choice of ε that attains least MSE, see
Glasserman (2004). After some preliminary testing, we found perturbation sizes of roughly
1 basis point (bp, 0.01%) for delta and 5 bp for vega.
Method (ii) that we propose, is named the constant exercise decision method. Here,
for the base valuation we record per path when the exercise decision takes place. In the
perturbed model, we no longer perform LS leastsquares Monte Carlo, but rather use the
very same exercise strategy as in the base valuation case. The constant exercise boundary
method is a heuristic, since its estimate is stable but biased. The bias stems from not
taking into account the change in value of the derivative as a result of a change in the
(approximate) exercise decision. The bias is likely to be small, because the exercise deci
sion is close to optimal by construction. Therefore, the change in value due to a change
172
144 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
in exercise decision is likely to be small. Though the method is biased, we nevertheless
consider it in our tests. In ﬁnance, the importance is not bias, rather it is reduction of
variance of P&L. Moreover, the method is straightforward to implement, and more eﬃ
cient, since in revaluations linear regressions for the LS algorithm are no longer required.
We note that the constant exercise method renders a revaluation continuously dependent
on initial market data, provided the underlying swap payoﬀ is continuous, which is the
case for the Bermudan swaption studied in this chapter. From the discussion on pertur
bation sizes for method (i), it then follows that a leastMSE ﬁnite diﬀerence estimate of
sensitivities is obtained by employing perturbation sizes that are as small as possible. We
use 10
−5
bp for both delta and vega.
We end this section by a brief discussion of other methods for calculation of Greeks
available in the literature. These methods could not straightforwardly be extended to
the situation of our investigations. Discussed are the pathwise method (Glasserman &
Zhao 1999), the likelihood ratio method (Glasserman & Zhao 1999), the Malliavin calculus
approach (Fourni´e, Lasry, Lebuchoux, Lions & Touzi 1999) and the utility minimization
approach (Avellaneda & Gamba 2001). The pathwise method cannot handle discontin
uous payoﬀs. The likelihood ratio and Malliavin calculus method both require that the
matrix of instantaneous volatility be invertible. For the market model setting, we have an
n d matrix with n the number of forward rates and d the number of stochastic factors.
Usually d < n and most often d ¸n, which rules out inverting the instantaneous volatility
matrix. Glasserman & Zhao (1999, Section 4.2) have resolved the noninvertibility issue
only for a particular case, that does not apply to our case: When the payoﬀ is dependent
only on the rates at their ﬁxing times, ¦s
1:n+1
(t
1
), s
2:n+1
(t
2
), . . . , s
n:n+1
(t
n
)¦. Finally, the
utility minimization approach simply calculates a diﬀerent sort of risk sensitivity and is
thus altogether biased.
6.3 Data
We describe the data used in the empirical comparison and smileimpact tests. All market
data was kindly provided by ABN AMRO Bank.
First, we describe the data used in the comparison test. For the comparison test, we
use an arbitrarily chosen timespan, 16 June 2003–2004, of USD data of midquotes for
deposit rates, swap rates and atthemoney (ATM) swaption volatility. We use the 1 and
12 months deposit rates and the 2Y, 3Y, 4Y, 5Y, 7Y, 10Y and 15Y swap rates. The
discount factors are bootstrapped from market data. Any discount factors required at
dates not available from the bootstrap are calculated by means of linear interpolation on
zero rates. A statistical description of the swaption volatility data is displayed in Table
6.1. For each available tenor and expiry (Exp.), the associated column with four entries
173
6.3. DATA 145
Table 6.1: Statistical description of the swaption volatility data.
Tenor (Years)
Exp. 1 2 3 4 5 7 10 15 30
1M 46.3 51.7 45.9 40.6 37.8 32.1 27.5 22.9 19.0
(6.2) (6.4) (6.2) (5.0) (4.7) (3.8) (3.5) (3.0) (2.7)
[34.3, [34.0, [30.3, [27.8, [26.2, [22.8, [19.6, [16.6, [13.6,
65.8] 68.3] 62.1] 53.1] 48.7] 42.1] 37.4] 33.0] 27.6]
2M 45.8 50.6 44.9 40.0 37.3 31.9 27.4 22.9 19.0
(4.8) (5.5) (5.3) (4.3) (4.1) (3.2) (2.9) (2.5) (2.2)
[36.0, [34.0, [30.3, [27.8, [26.2, [23.0, [19.9, [16.9, [13.8,
61.0] 63.5] 57.0] 50.1] 47.2] 39.4] 35.0] 30.9] 25.7]
3M 45.4 49.4 44.0 39.5 36.9 31.7 27.3 22.9 18.9
(3.9) (4.9) (4.5) (3.9) (3.6) (2.8) (2.5) (2.1) (1.8)
[36.0, [34.0, [30.0, [27.6, [26.0, [23.0, [19.8, [16.9, [13.7,
57.7] 60.3] 54.1] 49.3] 46.3] 37.8] 32.6] 28.8] 23.8]
6M 48.0 46.5 41.2 37.2 34.9 30.4 26.6 22.3 18.6
(4.3) (4.6) (4.1) (3.6) (3.4) (2.6) (2.1) (1.6) (1.3)
[34.9, [33.1, [29.3, [27.1, [25.4, [22.7, [20.0, [17.1, [14.1,
56.9] 56.8] 52.7] 47.6] 44.5] 37.0] 31.3] 25.5] 20.9]
1Y 46.0 41.0 36.5 33.5 31.7 28.3 25.1 21.4 17.9
(5.0) (4.7) (3.9) (3.4) (3.2) (2.5) (2.0) (1.5) (1.2)
[32.1, [29.4, [27.0, [25.2, [23.7, [21.6, [19.5, [16.7, [14.2,
55.5] 55.5] 48.2] 43.3] 40.6] 34.8] 30.0] 24.4] 20.3]
2Y 36.7 33.0 30.3 28.5 27.1 25.0 22.6 19.6 16.8
(4.3) (3.7) (3.1) (2.8) (2.6) (2.2) (1.8) (1.5) (1.2)
[26.9, [24.7, [23.2, [22.1, [21.1, [19.7, [17.9, [15.6, [13.5,
50.4] 44.3] 39.4] 36.5] 34.5] 30.8] 27.1] 22.7] 19.5]
3Y 29.9 27.8 26.3 25.0 24.0 22.4 20.5 18.0 15.5
(3.1) (2.7) (2.4) (2.2) (2.1) (1.8) (1.6) (1.3) (1.1)
[23.2, [21.7, [20.7, [20.0, [19.3, [18.1, [16.6, [14.6, [12.7,
38.6] 34.9] 32.6] 30.9] 29.7] 27.1] 24.2] 20.6] 18.1]
4Y 25.7 24.5 23.4 22.5 21.7 20.4 18.8 16.6 14.3
(2.2) (2.1) (1.9) (1.8) (1.7) (1.5) (1.3) (1.1) (1.0)
[20.8, [19.8, [19.1, [18.4, [17.9, [16.9, [15.6, [13.7, [11.9,
31.0] 29.6] 28.2] 27.2] 26.4] 24.4] 22.0] 19.0] 16.9]
5Y 23.2 22.3 21.4 20.7 19.9 18.8 17.4 15.5 13.4
(1.8) (1.7) (1.6) (1.6) (1.5) (1.3) (1.2) (1.0) (1.0)
[19.1, [18.4, [17.8, [17.2, [16.7, [15.8, [14.7, [12.9, [11.2,
28.0] 26.7] 25.7] 24.8] 24.0] 22.3] 20.2] 17.7] 15.8]
174
146 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
Table 6.2: Discount factors for the USD data of 21 February 2003.
1Y 2Y 3Y 4Y 5Y 6Y
0.98585 0.96223 0.92697 0.88571 0.84286 0.79986
Table 6.3: Swaption volatility, in percentages, against strike and expiry for the USD data
of 21 February 2003. All displayed swaptions coterminate 6 years from today. Here
‘Exp.’ denotes Expiry.
Strike, in oﬀset in basis points from the ATM forward swap rate
Exp. 300 200 100 50 0 50 100 200 300
1Y 58.78 45.41 37.34 35.19 33.15 32.55 31.99 31.32 31.21
2Y 43.65 38.62 32.57 30.82 29.13 28.59 28.10 27.46 27.30
3Y 40.72 35.12 30.01 28.46 26.95 26.12 25.31 25.03 24.75
4Y 38.65 32.41 27.96 26.59 25.23 24.75 24.31 23.72 23.52
5Y 37.17 30.92 26.66 25.36 24.08 23.63 23.20 22.63 22.43
reports, respectively, the mean, the standard deviation (in parentheses), the minimum (in
[, form) and the maximum (in ] form). Any volatility required at expiries and tenors not
available from Table 6.1 are calculated by means of linear surface interpolation.
Second, we describe the data used in the smileimpact test, in which we will consider
a 6 year deal. We use USD data for 21 February 2003. The discount factors are displayed
in Table 6.2. The swaption volatility against strike and expiry is displayed in Table 6.3.
6.4 Accuracy of the terminal correlation formula
The terminal correlation in the Markovfunctional model is estimated via the terminal
covariance. We have, for i < j, for any measure,
E
_
ln s
i:n+1
(t
i
) ln s
j:n+1
(t
j
)
¸
= E
_
ln s
i:n+1
(t
i
)E
_
ln s
j:n+1
(t
j
)
¸
¸
T(t
i
)
¸
_
. (6.7)
The above equality follows from the T(t
i
)measurability of ln s
i:n+1
(t
i
). Expression (6.7)
can be calculated on a lattice. We estimate (6.7) by calculating for each grid point at
time t
i
the conditional expectation E[ln s
j:n+1
(t
j
)
¸
¸
T(t
i
)], subsequently we integrate the
result multiplied by ln s
i:n+1
(t
i
) to obtain the required expectation.
175
6.5. EMPIRICAL COMPARISON RESULTS 147
Table 6.4: Error analysis of the terminal correlation measured in the Markovfunctional
model versus given by the approximate formula (6.3), for a 40 years annualpaying deal,
thus for a 40 40 correlation matrix. Abbreviations used are m.r. for mean reversion,
max. for maximum, abs. for absolute, err. for error, rel. for relative, and avg. for average.
M.r. Max. abs. err. Max. rel. err. Avg. abs. err. Avg. rel. err.
0% 1.6 10
−4
0.0190% 4.5 10
−5
0.0076%
5% 5.0 10
−5
0.0072% 1.2 10
−5
0.0030%
10% 2.1 10
−5
0.0032% 3.0 10
−6
0.0012%
15% 1.0 10
−5
0.0018% 9.8 10
−7
0.0006%
20% 5.7 10
−6
0.0011% 4.0 10
−7
0.0003%
The accuracy of the approximate formula (6.3) is tested for a 40 years deal, with EUR
market data of 8 February 1999, for which the swaption volatility level is on average 14%.
The test is performed at various mean reversion levels, 0%, 5%, 10%, 15%, and 20%. The
terminal correlation matrix within the Markovfunctional model is calculated numerically
on a lattice under the terminal measure and subsequently compared to the correlation
matrix given by the approximate formula (6.3). We note that the comparison contains two
sources of error: First, the approximation (6.3), and, second, the numerical error inherent
in the lattice calculation. In Table 6.4, various descriptive data for the comparison test
are displayed. Reported are, over the entries in the matrix, the maximum absolute and
relative errors, and the average absolute and relative errors. As can be seen from Table
6.4, these errors are quite small, especially considered over a 40 years horizon.
6.5 Empirical comparison results
In this section, we report the results of our empirical comparison. The deal description
is given in Table 6.5. For market models we use the terminal measure, 10,000 simulation
paths (5,000 plus 5,000 antithetic) and 10 stochastic factors (a full factor model), bar
when a = 0%, we use a single factor model. To determine the exercise boundary in
market models, we use the leastsquares Monte Carlo algorithm of Longstaﬀ & Schwartz
(2001), with all forward rates as explanatory variables, i.e., all available LIBOR rates
for the LIBOR market model and all available swap rates for the swap market model.
The reason for using all available rates as explanatory variables is that the multi factor
nature of the market models needs be retained (if at all present; for a = 0% a single
factor model must be used). As basis functions we use a constant and one linear term
176
148 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
Table 6.5: The Bermudan swaption deal used in the comparison.
Trade: Bermudan Swaption
Trade Type: Receive Fixed
Notional: USD 100m
Start Date: 16Jun2004
End Date: 16Jun2014
Fixed Rate: 3.2%
Index Coupon: Per Annum
Index Basis: ACT/365
Roll Type: Modiﬁed Following
Callable: At Fixing Dates
per explanatory variable, ¦1, x
1
, . . . , x
m
¦, where m denotes the number of explanatory
variables. The NPVs, deltas and vegas of the deal are calculated at each trade date
from 16 June 2003 till 15 June 2004, inclusive, for the mean reversion levels 0%, 5% and
10%. A price comparison is displayed in Figure 6.2. As can be seen from the ﬁgure, the
Markovfunctional and market models are similar in terms of NPV, and prices comove
and stay together over time.
The models are, more importantly, compared in terms of hedge performance. With
respect to hedging, we use socalled bucket hedging rather than factor hedging. With
factor hedging, the number of hedge instruments equals the number of factors in the
model. Risk sensitivities are calculated by perturbing only the model intrinsic factors.
With bucket hedging, the number of hedge instruments equals the number of market
traded instruments to which the model has been calibrated to. Risk sensitivities are
calculated by perturbing the value of a market traded asset, and then by revaluation of
the derivative in a model recalibrated to the perturbed market data. The reasons that
we employ bucket hedging rather than factor hedging are twofold. First, Driessen et al.
(2003, Section VII.C) show that bucket hedging outperforms factor hedging for caps and
European swaptions (for delta hedging). Second, bucket hedging corresponds to ﬁnancial
practice.
Two types of hedges are considered:
(i) Delta hedging only.
(ii) Delta and vega hedging.
The delta hedge is set up in terms of discount bonds, one discount bond for each tenor
time associated with the deal. In the case of the deal of Table 6.5, there are 11 such
177
6.5. EMPIRICAL COMPARISON RESULTS 149
0
500,000
1,000,000
1,500,000
2,000,000
2,500,000
3,000,000
3,500,000
J
u
n

0
3
J
u
n

0
3
J
u
l

0
3
J
u
l

0
3
A
u
g

0
3
A
u
g

0
3
S
e
p

0
3
S
e
p

0
3
O
c
t

0
3
O
c
t

0
3
N
o
v

0
3
N
o
v

0
3
D
e
c

0
3
D
e
c

0
3
D
e
c

0
3
J
a
n

0
4
J
a
n

0
4
F
e
b

0
4
F
e
b

0
4
M
a
r

0
4
M
a
r

0
4
A
p
r

0
4
A
p
r

0
4
M
a
y

0
4
M
a
y

0
4
M
a
y

0
4
J
u
n

0
4
Trade day
B
e
r
m
u
d
a
n
s
w
a
p
t
i
o
n
v
a
l
u
e
(
U
S
D
)
MF MR=0%
MF MR=5%
MF MR=10%
SMM MR=0%
SMM MR=5%
SMM MR=10%
LMM MR=0%
LMM MR=5%
LMM MR=10%
Figure 6.2: Bermudan swaption values per trade date, for various models and correlation
speciﬁcations.
discount bonds. To set up a joint delta and vega hedge, we proceed in the following four
steps. First, we calculate the vegas of the 10 underlying European swaptions. Second,
we calculate the amount of each of the European swaptions needed to have zero portfolio
vega for all underlying volatilities. Third, the aggregate delta position, of the Bermudan
and European swaptions, is calculated. Fourth, discount bonds are acquired to obtain
zero delta exposure for all 11 delta buckets.
The risk sensitivities are calculated in two ways, as detailed in Section 6.2.3, (i) ﬁnite
diﬀerences with perturbation sizes 1 bp for delta and 5 bp for vega (referred to as ‘large’
perturbation sizes), and, (ii) constant exercise decision method, with perturbation sizes
10
−5
bp for both delta and vega (referred to as ‘small’ perturbation sizes).
We note here that the computational time of calculating the NPV, the 11 deltas and
the 10 vegas, at any particular trade date, is around 92 seconds for market models
4
with ordinary LS, around 42 seconds for market models with constant exercise decision
method, versus 3 seconds for the Markovfunctional model. This diﬀerence of compu
4
There are fast algorithms for implementation of market models with Monte Carlo, see Joshi (2003b)
for LIBOR models, and Pietersz & van Regenmortel (2005, Section 5) (see also Section 7.5 in this thesis)
for swap models. Needless to say, we used these fast algorithms.
178
150 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
500,000
400,000
300,000
200,000
100,000
0
100,000
200,000
300,000
400,000
MF 0%
(unhedged)
MF 0%
(delta
hedged)
MF 0%
(delta &
vega
hedged)
SMM 5%
(unhedged)
SMM 5%
(delta
hedged)
SMM 5%
(delta &
vega
hedged)
LMM 10%
(unhedged)
LMM 10%
(delta
hedged)
LMM 10%
(delta &
vega
hedged)
1st qrt
min
median
max
3rd qrt
Figure 6.3: Comparison of delta versus delta and vega hedging. Boxwhisker plots for
the change in value (in USD) of the hedged portfolio. The percentages denote the mean
reversion level (MF) or correlation parametrization parameter (LMM and SMM). For
market models, we use the constant exercise decision method, with ‘small’ perturbation
sizes.
tational time is inherent to the (least squares) Monte Carlo implementation of market
models versus the lattice implementation of Markovfunctional models. Of course, such
lattice implementation is allowed only because of the mild pathdependency of Bermudan
swaptions.
The hedge portfolios are set up at each trade day and the change in portfolio value on
the next trade day is recorded. The hedge test results are ordered in three subsections.
6.5.1 Delta hedging versus delta and vega hedging
The performance of delta hedging versus delta and vega hedging is compared. Boxwhisker
plots, for the change in hedge portfolio value, are displayed in Figure 6.3, for various
models and mean reversion or correlation parametrization parameters. Here, MF, LMM,
and SMM denote respectively, Markovfunctional model, LIBOR market model and swap
market model. Boxwhisker plots provide a convenient representation of a distribution,
by displaying ﬁve of its key characteristics: the minimum, median, and maximum values,
and the ﬁrst and third quartiles.
We draw the following conclusions from the boxwhisker plots in Figure 6.3:
1. Delta hedging signiﬁcantly decreases variance of P&L.
179
6.5. EMPIRICAL COMPARISON RESULTS 151
200,000
150,000
100,000
50,000
0
50,000
100,000
150,000
200,000
250,000
MF (delta&vega
hedged)
SMM
(delta&vega
hedged) 'Large'
pert. sizes
SMM
(delta&vega
hedged) Const.
ex., 'small'
pert. sizes
LMM
(delta&vega
hedged) 'Large'
pert. sizes
LMM
(delta&vega
hedged) Const.
ex., 'small'
pert. sizes
1st qrt
min
median
max
3rd qrt
Figure 6.4: ‘Large’ perturbation sizes versus constant exercise decision method with
‘small’ perturbation sizes. Boxwhisker plots for the change in value (in USD) of the
hedged portfolio. Mean reversion or correlation parameter of 0%.
2. Vega hedging additional to delta hedging signiﬁcantly further decreases variance of
P&L.
It is clear that a joint delta and vega hedge by far outperforms a delta hedge. Therefore
we omit, in the remainder of the chapter, further study of delta hedges without a vega
hedge.
6.5.2 ‘Large’ perturbation sizes versus constant exercise deci
sion method with ‘small’ perturbation sizes
The performance of joint deltavega hedging is compared as dependent on the method
used to calculate risk sensitivities. Boxwhisker plots for the change in value of the delta
vega hedged portfolios, with a mean reversion of 0% or a correlation parameter of 0%,
are displayed in Figure 6.4. Here, ‘const. ex.’ and ‘pert.’ denote ‘constant exercise
decision method’ and ‘perturbation’, respectively. The analogous boxwhisker plots for
mean reversion or correlation parameters 5% and 10% are similar. We draw the following
conclusions from the boxwhisker plots in Figure 6.4.
1. The estimation of sensitivities by ﬁnite diﬀerences over MC with ‘large’ perturbation
sizes adversely aﬀects the variance of P&L for hedging in market models.
180
152 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
2. The best performing Greek calculation method, for deltavega hedging, is the con
stant exercise decision method, for which we approximately obtain similar results
as with the Markovfunctional model.
3. The use of the constant exercise decision method enables proper functioning of
market models as risk management tools, for callable products on underlying assets
that are continuously dependent on initial market data.
It is clear that the constant exercise decision method with ‘small’ perturbation sizes by
far outperforms ordinary LS with ‘large’ perturbation sizes. The theoretical explanation
of this outperformance is related to two issues. First, the classical LS algorithm causes
a discontinuity in the numeraire relative payoﬀ along the path, which renders ﬁnite dif
ference estimates of sensitivities to be less eﬃcient. Second, ‘larger’ perturbation sizes
cause more variance in the ﬁnite diﬀerence estimate of a sensitivity, since the correlation
between the payoﬀ in the original and perturbed models becomes smaller. These two ef
fects lead to more Monte Carlo caused randomness in the contents of the hedge portfolio,
which ultimately leads to increased variance of P&L, as can be seen in Figure 6.4.
We omit, in the remainder of the chapter, further study of ordinary LS with ‘large’
perturbation sizes.
6.5.3 Deltavega hedge results
The performance of joint deltavega hedging is compared across models and mean rever
sion or correlation speciﬁcations. For the market models, we use the constant exercise
decision method with ‘small’ perturbation sizes. Boxwhisker plots for the change in value
of the deltavega hedged portfolios are displayed in Figure 6.5. We draw the following
conclusions from the boxwhisker plots in Figure 6.5.
1. The impact of mean reversion or correlation parameter speciﬁcation on hedge per
formance is not very large.
2. The hedge performance for all three models is very similar.
6.6 The impact of smile
In this section, we provide details on how the Markovfunctional and swap market models
can be ﬁtted to smile and investigate the impact of smile relative to the impact of corre
lation cq. mean reversion on the prices of Bermudan swaptions. As a concrete example,
the displaced diﬀusion smile dynamics of Rubinstein (1983) are considered. In a displaced
181
6.6. THE IMPACT OF SMILE 153
40,000
30,000
20,000
10,000
0
10,000
20,000
30,000
MF 0%
(delta &
vega
hedged)
MF 5%
(delta &
vega
hedged)
MF 10%
(delta &
vega
hedged)
SMM 0%
(delta &
vega
hedged)
SMM 5%
(delta &
vega
hedged)
SMM
10%
(delta &
vega
hedged)
LMM 0%
(delta &
vega
hedged)
LMM 5%
(delta &
vega
hedged)
LMM
10%
(delta &
vega
hedged)
1st qrt
min
median
max
3rd qrt
Figure 6.5: Deltavega hedge results. Boxwhisker plots for the change in value (in USD) of
the hedged portfolio. The percentages denote the mean reversion level (MF) or correlation
parametrization parameter (LMM and SMM). For market models, we use the constant
exercise decision method, with ‘small’ perturbation sizes.
diﬀusion setting, the forward swap rate is modelled as
s
i:n+1
(t) = ˜ s
i:n+1
(t) −r
i
,
d˜ s
i:n+1
(t)
˜ s
i:n+1
(t)
= σ
i:n+1
dw
(i:n+1)
(t), (6.8)
with r
i
the displacement parameter and w
(i:n+1)
a Brownian motion under the forward
swap measure associated with s
i:n+1
. The solution to stochastic diﬀerential equation
(SDE) (6.8) is
s
i:n+1
(t) = −r
i
+
_
s
i:n+1
(0) + r
i
_
exp
_
σ
i:n+1
w
(i:n+1)
(t) −
1
2
σ
2
i:n+1
t
_
. (6.9)
The displaced diﬀusion extension is ﬁrst discussed for the Markovfunctional model and
second for the swap market model. The Markovfunctional model is ﬁtted to volatility by
ﬁtting the digital swaptions. The value v
(i)
of the digital swaption on swap rate s
i:n+1
(t
i
)
with strike k is given by the familiar formula in the Black world
v
(i)
= p
i:n+1
(0)φ
_
d
(i)
2
_
, d
(i)
2
=
log
_
k/s
i:n+1
(0)
_
−
1
2
σ
2
i:n+1
t
i
σ
i:n+1
√
t
i
(6.10)
where φ() denotes the cumulative normal distribution function and where p
i:n+1
denotes
the present value of a basis point, p
i:n+1
=
n
k=i
α
i
b
i+1
(t). Here α
i
denotes the day
count fraction for period [t
i
, t
i+1
] and b
i
(t) denotes the timet value of a discount bond
182
154 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
for payment of one unit of currency at time t
i
. In the displaced diﬀusion world, the value
˜ v
(i)
of the digital swaption is given by a displaced forward swap rate and strike
˜ v
(i)
= p
i:n+1
(0)φ
_
˜
d
(i)
2
_
,
˜
d
(i)
2
=
log
_
s
i:n+1
(0)+r
i
k+r
i
_
−
1
2
σ
2
i:n+1
t
i
σ
i:n+1
√
t
i
. (6.11)
The implementation of a nonsmile Markovfunctional model has to be changed only in
two places to incorporate displaced diﬀusion smile dynamics. First, the functional form
of the terminal discount bond b
n+1
at time t
n
is determined, using the equation
b
n+1
(t
n
) =
1
1 + α
n
s
n:n+1
(t
n
)
. (6.12)
In a nonsmile Markovfunctional model, we then have
b
n+1
(t
n
, x(t
n
)) =
1
1 + α
n
s
n:n+1
(0) exp
_
−
1
2
σ
2
n:n+1
t
n
+
σ
n:n+1
e
2at
n
−1
x(t
n
)
_, (6.13)
this is exactly the penultimate equation on page 399 of Hunt et al. (2000). In a displaced
diﬀusion setting, we substitute (6.9) into (6.12) and then (6.13) becomes
˜
b
n+1
(t
n
, x(t
n
)) =
1
1 + α
n
_
−r
i
+
_
s
n:n+1
(0) + r
i
_
exp
_
−
1
2
σ
2
n:n+1
t
n
+
σ
n:n+1
e
2at
n
−1
x(t
n
)
_
_,
Second, the functional forms of the swap rates s
i:n+1
(t
i
, ), i = 1, . . . , n−1 are determined,
by inverting the value of the digital swaption against strike. In a nonsmile Markov
functional model, we invert (6.10) and obtain
s
i:n+1
(t
i
, x(t
i
)) = s
i:n+1
(0) exp
_
−
1
2
σ
2
i:n+1
t
i
−σ
i:n+1
√
t
i
φ
−1
_
j
(i)
(x(t
i
))
p
i:n+1
(0)
__
,
with j
(i)
(x) denoting the value of a digital swaption with strike x in the model, calculated
by induction from i = n − 1, . . . , 1. In a displaced diﬀusion setting, we invert (6.11) to
obtain
s
i:n+1
(t
i
, x(t
i
)) = −r
i
+
_
s
i:n+1
(0) + r
i
_
exp
_
−
1
2
σ
2
i:n+1
t
i
−σ
i:n+1
√
t
i
φ
−1
_
j
(i)
(x(t
i
))
p
i:n+1
(0)
__
.
Next, the displaced diﬀusion swap market model is made reference to. The dynamics
of the forward swap rates under the terminal measure in general smile models can be
found in Jamshidian (1997, Equation (6), page 320).
We ﬁt the displaced diﬀusion model to the market data of Table 6.3 and ﬁnd the
volatility parameters σ
i:n+1
and displacement parameters r
i
as listed in Table 6.6. The
ﬁtted volatility and ﬁt errors are displayed in Table 6.7. As can be seen from the table,
183
6.6. THE IMPACT OF SMILE 155
Table 6.6: Displaced diﬀusion parameters ﬁtted to the USD market data of 21 February
2003 of Table 6.3.
i 1 2 3 4 5
Expiry 1Y 2Y 3Y 4Y 5Y
Tenor 5Y 4Y 3Y 2Y 1Y
σ
i:n+1
28.29% 21.76% 18.28% 16.08% 14.62%
r
i
0.71% 1.55% 2.33% 2.89% 3.39%
Table 6.7: Fitted swaption volatility and ﬁt errors with the displaced diﬀusion model,
in percentages, against strike and expiry for the USD data of 21 February 2003. All
displayed swaptions coterminate 6 years from today. Here ‘Exp.’ denotes Expiry.
Fitted swaption volatility
Strike, in oﬀset in basis points from the ATM forward swap rate
Exp. 300 200 100 50 0 50 100 200 300
1Y 37.82 35.11 33.88 33.48 33.15 32.89 32.66 32.30 32.03
2Y 34.32 31.57 30.07 29.54 29.11 28.74 28.43 27.92 27.51
3Y 32.23 29.58 28.02 27.45 26.98 26.57 26.21 25.63 25.16
4Y 30.34 27.83 26.29 25.72 25.23 24.82 24.46 23.85 23.37
5Y 29.17 26.74 25.21 24.63 24.14 23.72 23.35 22.73 22.23
Absolute ﬁt errors, model volatility minus market volatility
Strike, in oﬀset in basis points from the ATM forward swap rate
Exp. 300 200 100 50 0 50 100 200 300
1Y 20.96 10.29 3.46 1.71 0.00 0.34 0.67 0.98 0.82
2Y 9.33 7.05 2.50 1.28 0.02 0.15 0.33 0.45 0.21
3Y 8.49 5.54 1.99 1.01 0.02 0.44 0.90 0.60 0.41
4Y 8.31 4.58 1.67 0.87 0.00 0.06 0.14 0.13 0.15
5Y 8.00 4.18 1.45 0.73 0.06 0.09 0.14 0.10 0.20
184
156 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
the displaced diﬀusion model ﬁts the market well for ATM and outofthemoney (OTM)
options (ﬁt error less than a percent), but not so well for inthemoney (ITM) options,
for which the model underﬁts the market up to 21%. We note here that the disability
of obtaining a perfect ﬁt to the smile volatility data is due solely to the displaced dif
fusion model, and not to the Markovfunctional or market models. An exact ﬁt to the
swaption smile surface can be obtained, for example, with the relativeentropy minimiza
tion framework of Avellaneda, Holmes, Friedman & Samperi (1997). To benchmark the
implementation of the displaced diﬀusion Markovfunctional and swap market models,
European swaptions are valued in (i) a constant volatility model with the volatility asso
ciated with the expiry and strike of the swaption and (ii) the smile model. The results of
this test for the Markovfunctional model have been displayed in Table 6.8. The bench
mark is of high quality, though there are some slight diﬀerences due to numerical errors
in the grid calculation. The benchmark results for the swap market model are of similar
good quality.
Subsequently, Bermudan swaptions are priced with varying strikes and otherwise spec
iﬁed in Table 6.9. The Bermudan swaptions are priced in the Markovfunctional and SMM
models, and in their displaced diﬀusion counterparts, at various mean reversion or corre
lation parameter levels. In the nonsmile models, there are two possibilities for choosing
the volatilities. First, the volatilities can be used that correspond to the strike of the
Bermudan swaption. Second, the ATM volatilities can be used, regardless of the strike of
the Bermudan swaption. The calculated prices are displayed in Table 6.10. The results in
the table show that the impact of correlation is signiﬁcant, since a 10% change in mean
reversion can cause a change in value equal to a parallel volatility shift of 1%. The impact
of correlation is comparable to that reported by Choy, Dun & Schl¨ ogl (2004, Table 11),
though the latter authors name this impact ‘nonsubstantial’. The impact of smile is, for
the deal considered, much larger than the impact of correlation and mean reversion, since
10% mean reversion is usually a high level when observed in the market. In terms of vega,
the smile impact can be as large as a parallel shift in volatility of 8% to 1%, for perstrike
volatilities, and 1% to 6%, for ATM volatilities. Furthermore, the displaced diﬀusion
smile model underﬁtted the volatility smile observed in the market. Since increasing the
volatility usually leads to a higher value for Bermudan swaptions
5
, the impact of smile
can thus be even higher, when ATM volatilities are used.
5
Pietersz & Pelsser (2004a, Appendix) (see also Appendix 2.A) explain that Bermudan swaptions can
in certain particular circumstances have negative vega.
185
6.6. THE IMPACT OF SMILE 157
T
a
b
l
e
6
.
8
:
B
e
n
c
h
m
a
r
k
r
e
s
u
l
t
s
f
o
r
t
h
e
d
i
s
p
l
a
c
e
d
d
i
ﬀ
u
s
i
o
n
M
a
r
k
o
v

f
u
n
c
t
i
o
n
a
l
m
o
d
e
l
:
E
u
r
o
p
e
a
n
s
w
a
p
t
i
o
n
p
r
i
c
e
s
i
n
a
c
o
n
s
t
a
n
t
v
o
l
a
t
i
l
i
t
y
m
o
d
e
l
v
e
r
s
u
s
a
s
m
i
l
e
m
o
d
e
l
.
T
h
e
n
o
t
i
o
n
a
l
i
s
U
S
D
1
0
0
m
i
l
l
i
o
n
.
A
l
l
d
i
s
p
l
a
y
e
d
s
w
a
p
t
i
o
n
s
c
o

t
e
r
m
i
n
a
t
e
6
y
e
a
r
s
f
r
o
m
t
o
d
a
y
.
H
e
r
e
‘
E
x
p
.
’
d
e
n
o
t
e
s
E
x
p
i
r
y
.
S
t
r
i
k
e
,
i
n
o
ﬀ
s
e
t
i
n
b
a
s
i
s
p
o
i
n
t
s
f
r
o
m
t
h
e
A
T
M
f
o
r
w
a
r
d
s
w
a
p
r
a
t
e

3
0
0

2
0
0

1
0
0

5
0
0
5
0
1
0
0
2
0
0
3
0
0
E
x
p
.
C
o
n
s
t
a
n
t
v
o
l
a
t
i
l
i
t
y
M
a
r
k
o
v

f
u
n
c
t
i
o
n
a
l
m
o
d
e
l
1
Y
3
2
8
5
2
,
5
6
1
6
0
6
,
7
7
7
1
,
3
0
4
,
8
5
1
2
,
3
4
0
,
1
7
4
3
,
6
8
9
,
4
8
5
5
,
2
9
8
,
8
8
9
9
,
0
5
3
,
7
5
7
1
3
,
2
0
3
,
0
9
0
2
Y
2
4
,
5
4
0
2
4
9
,
6
3
9
1
,
0
1
8
,
2
1
3
1
,
6
8
3
,
7
8
5
2
,
5
4
2
,
7
5
9
3
,
5
8
0
,
4
4
2
4
,
7
7
3
,
6
8
5
7
,
5
2
1
,
9
7
5
1
0
,
5
9
5
,
4
8
0
3
Y
8
4
,
0
2
6
3
9
1
,
0
7
2
1
,
0
9
8
,
9
6
5
1
,
6
2
9
,
5
0
8
2
,
2
7
6
,
2
8
5
3
,
0
3
0
,
8
6
2
3
,
8
8
1
,
5
0
4
5
,
8
1
8
,
0
7
0
7
,
9
8
4
,
7
9
9
4
Y
1
0
6
,
9
5
9
3
6
7
,
4
5
7
8
7
8
,
1
5
6
1
,
2
3
7
,
1
2
1
1
,
6
6
3
,
2
7
4
2
,
1
5
1
,
9
6
7
2
,
6
9
7
,
0
7
3
3
,
9
2
8
,
9
9
8
5
,
3
0
5
,
3
9
4
5
Y
8
2
,
7
2
5
2
3
5
,
0
7
7
5
0
4
,
0
2
0
6
8
4
,
9
2
8
8
9
5
,
6
6
2
1
,
1
3
4
,
2
2
5
1
,
3
9
8
,
0
9
3
1
,
9
9
0
,
4
9
7
2
,
6
5
0
,
7
0
6
D
i
s
p
l
a
c
e
d
d
i
ﬀ
u
s
i
o
n
M
a
r
k
o
v

f
u
n
c
t
i
o
n
a
l
m
o
d
e
l
1
Y
3
2
2
5
2
,
2
5
5
6
0
5
,
4
4
6
1
,
3
0
3
,
0
8
3
2
,
3
3
8
,
2
2
0
3
,
6
8
7
,
5
9
9
5
,
2
9
7
,
2
3
7
9
,
0
5
2
,
7
2
7
1
3
,
2
0
2
,
5
5
6
2
Y
2
4
,
2
0
1
2
4
8
,
0
6
0
1
,
0
1
5
,
1
2
4
1
,
6
8
0
,
2
2
1
2
,
5
3
8
,
9
9
3
3
,
5
7
6
,
7
4
3
4
,
7
7
0
,
2
3
8
7
,
5
1
9
,
3
3
0
1
0
,
5
9
3
,
6
7
5
3
Y
8
3
,
1
6
9
3
8
8
,
9
8
3
1
,
0
9
5
,
7
8
1
1
,
6
2
6
,
0
1
8
2
,
2
7
2
,
6
6
8
3
,
0
2
7
,
2
9
0
3
,
8
7
8
,
1
0
1
5
,
8
1
5
,
2
4
9
7
,
9
8
2
,
6
6
7
4
Y
1
0
5
,
9
9
0
3
6
5
,
5
8
8
8
7
5
,
5
8
6
1
,
2
3
4
,
3
6
2
1
,
6
6
0
,
4
3
3
2
,
1
4
9
,
1
5
3
2
,
6
9
4
,
3
6
6
3
,
9
2
6
,
6
8
3
5
,
3
0
3
,
5
5
0
5
Y
8
2
,
1
5
4
2
3
4
,
1
2
5
5
0
2
,
7
9
2
6
8
3
,
6
2
7
8
9
4
,
3
2
9
1
,
1
3
2
,
9
0
3
1
,
3
9
6
,
8
1
5
1
,
9
8
9
,
3
7
8
2
,
6
4
9
,
7
8
7
186
158 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
Table 6.9: The Bermudan swaption deal used in the test of impact of smile.
Trade: Bermudan Swaption
Trade Type: Receive Fixed
Notional: USD 100m
Valuation Date: 21Feb2003
Start Date: 21Feb2004
End Date: 21Feb2009
Index Coupon: Per Annum
Index Basis: ACT/365
Roll Type: Modiﬁed Following
Callable: At Fixing Dates
6.7 Conclusions
We investigated the impact of correlation on the pricing and hedge performance of Bermu
dan swaptions for various models. We showed how the Markovfunctional model can ap
proximately be ﬁtted to terminal correlation, by developing a novel approximate formula
for terminal correlation. The approximate formula was shown to be of high quality in a
numerical test. Empirically, the impact of terminal correlation was shown to be some
what signiﬁcant for pricing of Bermudan swaptions in market models, and the same eﬀect
can be attained in the singlefactor Markovfunctional model by calibration to terminal
correlation. We showed empirically by comparison with multi factor market models that
hedge performance for Bermudan swaptions is, for practical purposes, almost identical,
regardless of the model, number of factors, or correlation speciﬁcation. Our results show
that the need of modelling correlation can already be adequately met by a single factor
model. Whether these results extend beyond the asset class of Bermudan swaptions, is
an interesting question that we leave to answer in future research. With respect to hedge
portfolios, we showed (i) that delta hedging signiﬁcantly reduces variance of P&L in both
Markovfunctional and market models, (ii) that vega hedging additional to delta hedg
ing signiﬁcantly further reduces variance of P&L in both Markovfunctional and market
models, (iii) that estimation of Greeks by ﬁnite diﬀerences over Monte Carlo for callable
products with the regular LS algorithm and ‘large’ perturbation sizes adversely aﬀects
the deltavega hedge performance of market models. We showed that our proposal of the
constant exercise decision method with ‘small’ perturbation sizes enables proper func
tioning of market models as risk management tools, for callable products on underlying
assets that are continuously dependent on initial market data. Moreover, we investigated
187
6.7. CONCLUSIONS 159
Table 6.10: Prices of Bermudan swaptions in smile versus nonsmile models with various
correlation/mean reversion assumptions. Here ‘MF’, ‘MR’, ‘a’ and ‘SE’ denote ‘Markov
functional model’, ‘mean reversion’, ‘the correlation parameter a of (6.1)’ and the ‘stan
dard error’, respectively. Any diﬀerence (‘Diﬀ.’) is with respect to a price at zero mean
reversion or at zero a. The nonsmile models use perstrike volatilities, except where
indicated that ATM volatilities are used.
Strike
2% 3% 4% 5% 6%
MFMR=0% 420,954 1,072,043 2,452,060 5,047,951 8,573,535
Vega 1% MF 34,609 68,118 96,242 93,513 68,477
SMMa=0% 407,667 1,053,210 2,443,332 5,065,794 8,605,508
SMM SEa=0% 7,551 12,964 17,204 16,032 10,625
Vega 1% SMM 33,105 66,154 97,141 91,850 67,194
MFMR=5% 436,518 1,103,269 2,495,689 5,090,486 8,606,769
Diﬀ. in vega 0.4 0.5 0.5 0.5 0.5
SMMa=5% 407,922 1,060,731 2,461,625 5,101,071 8,657,349
SMM SEa=5% 7,386 12,764 17,228 16,244 11,243
Diﬀ. in vega 0.0 0.1 0.2 0.4 0.8
MFMR=10% 452,417 1,135,155 2,540,694 5,135,323 8,642,685
Diﬀ. in vega 0.9 0.9 0.9 0.9 1.0
SMMa=10% 405,819 1,062,986 2,485,969 5,142,268 8,708,570
SMM SEa=10% 7,202 12,488 17,090 16,359 11,873
Diﬀ. in vega 0.1 0.1 0.4 0.8 1.5
Smile MF 148,130 747,270 2,347,664 5,074,574 8,623,356
Diﬀ. in vega 7.9 4.8 1.1 0.3 0.7
Smile SMM 146,223 756,925 2,373,545 5,094,288 8,642,666
Smile SMM SE 4,710 11,138 16,781 14,437 9,801
Diﬀ. in vega 8.3 4.8 0.8 0.5 1.0
ATM volatilities
MFMR=0% 67,210 650,483 2,328,235 5,124,154 8,691,466
Vega 1% MF 14,997 60,797 95,139 93,681 72,106
Smile MF 148,130 747,270 2,347,664 5,074,574 8,623,356
Diﬀ. in vega 5.4 1.6 0.2 0.5 0.9
SMMa=0% 61,944 610,939 2,286,869 5,139,345 8,731,084
SMM SEa=0% 2,626 10,114 16,763 16,122 11,628
Vega 1% SMM 13,915 59,861 94,495 91,859 77,375
Smile SMM 146,223 756,925 2,373,545 5,094,288 8,642,666
Diﬀ. in vega 6.1 2.4 0.9 0.5 1.1
188
160 CHAPTER 6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS
the impact of smile via displaced diﬀusion versions of the Markovfunctional and swap
market models. For a particular deal and USD market data, we showed that the impact
of smile is much larger that the impact of correlation.
189
Chapter 7
Generic market models
Currently, there are two market models for valuation and risk management of interest
rate derivatives, the LIBOR and swap market models of Brace et al. (1997), Jamshid
ian (1997), Musiela & Rutkowski (1997) and Miltersen et al. (1997). In this chapter,
we introduce arbitragefree constant maturity swap (CMS) market models and generic
market models featuring forward rates that span periods other than the classical LIBOR
and swap periods. The generic market model generalizes the LIBOR and swap market
models. We derive necessary and suﬃcient conditions for the structure of the forward
rates to span an arbitragefree economy in terms of relative discount bond prices, at all
times. We develop generic expressions for the drift terms occurring in the stochastic dif
ferential equation driving the forward rates under a single pricing measure. The generic
market model is particularly apt for pricing of Bermudan CMS swaptions, ﬁxedmaturity
Bermudan swaptions, and callable hybrid coupon swaps. We show how the instantaneous
correlation of the generic forward rates can be calculated from the single instantaneous
correlation matrix of forward LIBOR rates. These results are suﬃcient for implementation
of calibration and pricing algorithms for generic market models.
7.1 Introduction
Generic market models are speciﬁcally designed for the pricing of certain types of swaps.
In particular, we will consider constant maturity swaps (CMS) and hybrid coupon swaps.
An interest rate swap is an agreement to exchange, over a speciﬁed period, interest rate
payments, at a speciﬁed frequency, over a speciﬁed underlying notional that is not ex
changed. In a plainvanilla swap, the ﬂoating interest rate is the LIBOR rate. A constant
maturity swap pays not the LIBOR rate but instead a swap rate with speciﬁed tenor, ﬁxed
for all payments in the CMS swap. The payment frequency remains unchanged however.
A hybrid coupon swap is a swap that features a ﬂoating payment schedule, designating the
190
162 CHAPTER 7. GENERIC MARKET MODELS
nature of each of the ﬂoating payments. The nature of the ﬂoating payment can be that it
is determined by either a LIBOR rate with varying maturity or a swap rate with varying
tenor. An example of such a payment schedule has been given in Table 7.1. Additionally,
the function that transforms the LIBOR or swap rate into a cash ﬂow may even be not
entirely linear, for example, capped, ﬂoored or inverse.
The above swaps may have the feature that the swaps can be cancelled. Such versions
are deemed cancellable swaps. To hold a cancellable swap is equal to holding a swap and
an option to enter into the very same swap but with reversed cash ﬂows
1
. The latter
option is called a callable swap. In this chapter we will also be concerned with the pricing
of callable and cancellable CMS and hybrid coupon swaps. There are two types of callable
swaptions: ﬁxedmaturity or coterminal. A coterminal option allows to enter into an
underlying swap at several exercise opportunities, where each swap ends at the same
contractually determined end date. The swap maturity becomes shorter as exercise is
delayed. In contrast, for the ﬁxedmaturity version, each underlying swap has the same
contractually speciﬁed maturity and the respective end dates then diﬀer.
The main outset of the chapter is that a model is deemed to be proper for valuing a
certain callable or cancellable swap, if the volatility of a rate that appears in the contract
payoﬀ has been calibrated correctly to the market volatility. The concept is best illustrated
by example. In the case of the hybrid coupon swap of Table 7.1 at the valuation date 11
June 2004, we would want to calibrate exactly to the volatilities of the 1Y 2Y swaption,
2Y 4Y swaption, 3Y caplet, 4Y 2Y swaption and 5Y caplet. In contrast, for a cap
one would calibrate to the volatilities of the 1Y , 2Y , 3Y , 4Y and 5Y caplets. For a
coterminal Bermudan swaption, to the volatilities of the 1Y 5Y , 2Y 4Y , 3Y 3Y ,
4Y 2Y and 5Y 1Y swaptions. When employing a LIBOR market model to value a
cap, the model would feature the following 1Y forward LIBOR rates: 1Y , 2Y , 3Y , 4Y
and 5Y . If a swap market model would be used to value the Bermudan swaption, it would
feature the 1Y 5Y , 2Y 4Y , 3Y 3Y , 4Y 2Y and 5Y 1Y forward swap rates. For
both LIBOR and swap market models, the canonical interest rates are simply equipped
with the corresponding canonical volatilities, allowing for an eﬃcient and straightforward
calibration. Obviously, to straightforwardly calibrate a market model for the hybrid
coupon swap of Table 7.1, and callable or cancellable versions thereof, the model would
have to feature the forward swap rates 1Y 2Y , 2Y 4Y , 4Y 2Y , and the 1Y forward
LIBOR rates at 3Y and 5Y . Up to now, whether a model containing such rates would
be arbitragefree is not wellknown. To our knowledge, generic methods for deriving the
arbitragefree drift terms for the SDE driving the various forward rates have not been
developed yet. In this chapter, we develop such generic theory.
1
Some readers might not be familiar with ‘callable’ and ‘cancellable’ swaps and might prefer to think
of swaps and options thereon.
191
7.1. INTRODUCTION 163
Table 7.1: Example of a hybrid coupon swap payment structure for the ﬂoating side. Date
roll is modiﬁed following and day count is actual over 365.
Fixing Day count Payment Rate
date fraction date
11Jun04 1.005479 13Jun05 1Y LIBOR
13Jun05 0.997260 12Jun06 2Y swap rate
12Jun06 0.997260 11Jun07 4Y swap rate
11Jun07 1.002740 11Jun08 1Y LIBOR
11Jun08 1.000000 11Jun09 2Y swap rate
11Jun09 1.000000 11Jun10 1Y LIBOR
192
164 CHAPTER 7. GENERIC MARKET MODELS
In terms of practical relevance, the generic market model technology is valuable to
ﬁnancial institutions that aim to trade in CMS Bermudan swaptions or callable hybrid
coupon swaps. As such, their costumers might require any sequence of various maturity
LIBOR or swap rate payments in the tailored exotic derivatives that they demand for
their business. In this chapter, we show that a generic implementation of the resulting
drift terms is feasible in practice, thereby enabling proper pricing and hedging of such
hybrid coupon swaps.
A further motivation for the theory in this chapter is that the idea of generic market
models is not new to the ﬁnance literature, since it has already been suggested by Gal
luccio, Huang, Ly & Scaillet (2004). These authors discuss what they call the cosliding
(commonly referred to as ‘LIBOR’) and coterminal (commonly referred to as ‘swap’)
market models. The class of cosliding market models corresponds to our class of CMS
market models, but ours is deﬁned diﬀerently. Galluccio et al. (2004) show that the only
admissible cosliding model is the LIBOR market model. Interestingly, we show that there
are n arbitragefree CMS market models associated with a tenor structure with n ﬁxings,
and the LIBOR and swap models are two special cases of these CMS models. In addi
tion to the n CMS models, we introduce generic market models, extending the number
of arbitragefree market models to n!. Also, Galluccio et al. (2004) discuss the coinitial
market model, but this model does not ﬁt into our dynamic market model framework.
Moreover, in contrast to Galluccio et al. (2004), we derive generic expressions for the drift
terms of the forward rates, for all n! models (thus for LIBOR, swap, CMS and generic
models).
An alternative way of calibrating a model to the relevant volatility levels, is to take
a LIBOR market model, and derive generic approximate expressions for the volatility
of various forward rates. Such a procedure, for the speciﬁc case of calibration of the
LIBOR model to swaption volatility, has been investigated in J¨ackel & Rebonato (2003),
Joshi & Theis (2002), Hull & White (2000) and Pietersz & Pelsser (2004a) (Chapter 2).
The advantage of the generic market model speciﬁcation is that the relevant volatility
functions can be directly speciﬁed. Moreover, the development of the theory of generic
market models is justiﬁed already by the additional insight into the workings of LIBOR
and swap market models. Also, Pietersz & Pelsser (2005a) (see also Chapter 6) provide
an empirical price and hedge comparison for Bermudan swaptions with either (i) the
LIBOR model calibrated to swaption volatility via the approximate formula, or (ii) the
swap model equipped with its canonical swaption volatility. Prices turn out to be largely
similar, while hedge performance seems to be slightly better for the canonical models, see
Figures 2 and 5, respectively, of Pietersz & Pelsser (2005a) (see Figures 6.2 and 6.5 of
this thesis). These results are thus slightly in favour of CMS and generic market models,
rather than the use of approximate swaption volatility with the LIBOR model.
193
7.2. PRELIMINARIES 165
We mention three areas of market model theory to which the generic market model
approach extends. First, generic models may also be used in multicurrency market
models, see Schl¨ ogl (2002). Second, a numerical implementation of a generic model may
utilize drift approximations, see, for example, Hunter et al. (2001) and Pietersz et al. (2004,
2005) (see also Chapter 5). Third, generic models may be equipped with smile dynamics.
The volatility smile is the phenomenon that for European options diﬀerent Black (1976)
implied volatilities are quoted in the market when the strike of the option is varied. The
derivation of generic market models in this chapter does not make any assumptions on the
instantaneous volatility. As a result, smileincorporating models, such as the displaced
diﬀusion (Rubinstein 1983), and constant elasticity of variance (CEV) (Cox & Ross 1976)
models, can be readily applied to the generic market model framework. An application of
the CEV speciﬁcation to the LIBOR market model can be found in Andersen & Andreasen
(2000).
Finally, for an indepth overview of pricing models for interest rate derivatives, the
reader is referred to Rebonato (2004a).
An outline of the chapter is as follows. First, preliminaries are introduced. Second,
necessary and suﬃcient noarbitrage conditions on the structure and values of the forward
rates are derived. Third, generic arbitragefree drift terms for the forward rates are derived
under a change of measure in a market model setting. Fourth, the numeric eﬃciency of
the generic drift term calculations is discussed. Fifth, the issue of calibrating generic
market models to correlation is addressed. Sixth, we end with conclusions.
7.2 Preliminaries
We consider tenor times or a tenor structure 0 =: t
1
< < t
n+1
and day count fractions
α
i
, over the period [t
i
, t
i+1
], for i = 1, . . . , n. Suppose traded in the market is a set of m
forward LIBOR or swap rate agreements that are associated with that tenor structure
2
.
Initially, m may be diﬀerent from n, but in Theorem 8 we show that it makes sense, from
an economic point of view, to consider only m = n. The set of associated forward swap
agreements is administered by a set of pairs
c =
_
j
=
_
s(j), e(j)
_
; j = 1, . . . , m ; s(j), e(j) integers 1 ≤ s(j) < e(j) ≤ n + 1
_
.
(7.1)
Here s(j) and e(j) denote start and end of the forward swap agreement. The above set
expression for c simply designates that there are m associated forward swap agreements,
that each forward swap agreement starts and ends on one of the tenor times and that a
2
The frequency of the ﬂoating payments is restricted to one payment per ﬁxedpayment period, but
this is only for ease of exposition. In practice, this assumption may be relaxed and the theory follows
through unchanged for any positive whole number of ﬂoating payments per ﬁxedpayment period.
194
166 CHAPTER 7. GENERIC MARKET MODELS
start is strictly before an end. If the start s and end e of two forward swap agreements
(1)
,
(2)
are equal, then
(1)
and
(2)
are considered equal, thereby a priori excluding the
possibility of diﬀerent forward rates for the same forward swap agreement. We note also
that diﬀerent payment frequencies for a given swap period are not allowed. The value of
the forward rate associated with
j
is denoted by f
j
. Forward rate f
j
may, and shall, in the
course of our chapter, depend on time, f
j
= f
j
(t). The associated forward swap agreement
is deﬁned as follows. At times t
s(j)
and t
e(j)
the agreement starts and ends, respectively.
The agreement is partitioned by a number of e(j) − s(j) accrual periods [t
s(j)
, t
s(j)+1
],
. . . , [t
e(j)−1
, t
e(j)
]. The LIBOR rate is recorded at the start of each accrual period. If the
accrual periods are indexed by i = s(j), . . . , e(j) −1, then the LIBORobservation time is
t
i
, the maturity of the LIBOR deposit is t
i+1
−t
i
, and the observed LIBOR rate is denoted
by (t
i
). If forward swap agreement j has been entered into at time t
∗
at rate f
j
(t
∗
), then
the ﬁxed and ﬂoating payments are α
i
f
j
(t
∗
) and α
i
(t
i
), respectively. We assume liquid
trading in the market at times t
∗
= t
1
, . . . , t
n
of those forward swap agreements ∈ c
for which t
s(j)
≥ t
∗
. In other words, there is trading in a forward swap agreement if the
agreement has not yet started or is about to start. We assume the cost of entering into
any forward swap agreement at any tenor time to be zero.
The forward swap agreement structures of the LIBOR and swap market models ﬁt into
the framework of (7.1). For the LIBOR market model (LMM), c
LMM
= ¦(1, 2), (2, 3), . . . ,
(n, n+1)¦. For the swap market model (SMM), c
SMM
= ¦(1, n+1), (2, n+1), . . . , (n, n+
1)¦. We introduce here a third kind of market model, associated with the qperiod CMS
rates. We name it the CMS(q) market model, for q = 1, . . . , n, and it is deﬁned by
c
CMS(q)
= ¦(1, 1+q), (2, 2+q), . . . , (n−q +1, n+1), (n−q +2, n+1), . . . , (n, n+1)¦. We
note that for q = 1 and q = n we retain the LIBOR and swap market models, respectively.
The structure of these market models can be speciﬁed equivalently as follows, too.
There exists an enumeration
j
= (s(j), e(j)), such that, for the LIBOR model, s(j) = j,
e(j) = j +1. For the swap model, s(j) = j, e(j) = n+1. For the CMS(q) model, s(j) = j,
e(j) = j + q (j = 1, . . . , n −q + 1), e(j) = n + 1 (j = n −q + 2, . . . , n). (7.2)
7.2.1 Absence of arbitrage
Associated with the tenor structure we also consider discount bonds. A discount bond is
a hypothetical security that pays one unit of currency at its maturity. The price at time
t of a discount bond maturing at time t
i
is denoted by b
i
(t). We note that there are n+1
discount bonds and that we necessarily have b
i
(t
i
) = 1 for i = 1, . . . , n + 1. The latter
is just saying that the cost of immediately receiving one unit of currency is one unit of
currency. The timet
1
discount bond prices are sometimes simply denoted by b
i
rather
than by b
i
(t
1
).
195
7.2. PRELIMINARIES 167
In terms of price consistency among the discount bonds, forward swap agreements,
and LIBOR deposits, we require some form of absence of arbitrage. We follow Musiela
& Rutkowski (1997), in which two forms of noarbitrage are introduced. First, a weaker
notion of noarbitrage is the usual noarbitrage condition in a pure bond market. Second,
a stronger notion of noarbitrage assumes, in addition, that cash is also available in the
market, which means that money, not stored in a money market account, can be carried
over at zero cost. The stronger form of noarbitrage excludes a number of situations
allowed by the weaker form. For example, discount bond prices greater than 1 (negative
interest rates) are excluded by the strong form, but not by the weak form. More generally,
the discount bond prices are required, by the strong form, but not by the weak form, to
not increase with increasing maturity, as shown by Musiela & Rutkowski (1997, page
267, below Equation (13)). In the next section, it will be shown that the generic market
models guarantee the weak form of noarbitrage. Conditions guaranteeing the stronger
form of noarbitrage are more diﬃcult to derive. Therefore, hereafter we only consider
the weak form of noarbitrage, and any mentioning of ‘noarbitrage’ will refer to the weak
form. We note that the weak form of absence of arbitrage is guaranteed when all discount
bond prices are positive, since a set of positive future cash ﬂows implies a portfolio that
holds nonnegative amounts of discount bonds, of which at least one position is positive.
Since all discount bond prices are positive by assumption, we have that the price of such
a portfolio is positive, thereby excluding arbitrage.
Valuation of nonEuropean interest rate derivatives requires a dynamic model, that
is, a model that generates unique arbitragefree discount bond prices at all future time
points. Examples of such dynamic models are the LIBOR and swap market models. An
example of a nondynamic model is the coinitial market model, as deﬁned by Galluccio
& Hunter (2004). The coinitial model features forward swap rates that span the periods
(1, 2),(1, 3),. . . ,(1, n + 1), that is, all swap rates start at time t
1
but end consecutively at
times t
2
, . . . , t
n+1
. The coinitial speciﬁcation is nondynamic since at time t
2
, all forward
swap agreements have expired. From a practical point of view, nondynamic models
are less useful than dynamic models, since nondynamic models can only be used for
Europeanstyle options. For the dynamic case, arbitrary speciﬁcation of forward rates at
not only t
1
, but at all time points t
1
, . . . , t
n
, is required to lead to unique discount bond
prices.
Given an arbitrary set c of forward rates and their values ¦f
j
(t
i
)¦
i,j
, there are two
mutually exclusive possibilities, that are given in the following deﬁnition.
Deﬁnition 7
• Condition A. At each of the times t
1
, . . . , t
n
, there is a unique system of prices for
the discount bonds, such that the resulting aggregate trade system of discount bonds,
forward swap agreements, and LIBOR deposits, is arbitragefree.
196
168 CHAPTER 7. GENERIC MARKET MODELS
• Condition B. At least at one of the times t
1
, . . . , t
n
, either there exists no system or
there are more than one diﬀerent systems of prices for the discount bonds, such that
the resulting aggregate trade system of discount bonds, forward swap agreements,
and LIBOR deposits, is arbitragefree.
Obviously, we would want condition A to hold in ﬁnancial models, and, in particular, in
generic market models. In this chapter, we will derive necessary and suﬃcient conditions
on c and the values ¦f
j
(t
i
)¦, for condition A to hold. In particular, given a number of
n + 1 tenor times, we will show that there are exactly n! possibilities of choosing c. The
CMS market model (with LIBOR and swap market models as special cases) only accounts
for n of these possibilities. An example for n = 6 with market models of LIBOR, CMS(3),
swap, coinitial, and the hybrid swap of Table 7.1 (viewed from the valuation date 11 June
2003), is given in Figures 7.1 and 7.2.
Remark 4 (Forward LIBOR versus swaption frequencies) In this remark we point out
a silent assumption that is sometimes made when calibrating a market model to parts
of the swaption volatility matrix. For concreteness, we consider the EUR market, for
which market traded swaps have annual ﬁxed payments and semiannual ﬂoating LIBOR
payments. If a market model with semiannual ﬁxed payments is calibrated to a swaption
volatility, then silently it has been assumed that there is no signiﬁcant diﬀerence between
semiannual ﬁxed versus semiannual ﬂoating swaption volatility and annual ﬁxed versus
semiannual ﬂoating swaption volatility.
7.3 Necessary and suﬃcient conditions on the for
ward swap agreements structure for guaranteed
noarbitrage
In this section we derive the necessary and suﬃcient conditions for a set of forward rates
to specify unique arbitragefree discount bond prices. The program to achieve that goal is
as follows. First, we value the forward swap agreements in terms of discount bond prices.
Second, the conditions on the forward swap agreements are translated into conditions on
the discount bond prices.
A forward swap agreement is valued by valuation of its ﬂoating and ﬁxed payments
in turn. The collections of ﬂoating and ﬁxed payments of a forward swap agreement are
called ﬂoating and ﬁxed legs, respectively. The value π
ﬂt
() of the ﬂoating leg of a forward
swap agreement = (s, e) is
3
π
ﬂt
() = b
s
−b
e
.
3
Here we assume equality of the forecast and discount curves and of the payment and index day count
fractions.
197
7.3. NECESSARY AND SUFFICIENT CONDITIONS FOR NOARBITRAGE 169
tenor
1y 2y 3y 4y 5y 6y
1y
2y
3y
4y
5y
6y
e
x
p
i
r
y
tenor
1y 2y 3y 4y 5y 6y
1y
2y
3y
4y
5y
6y
e
x
p
i
r
y
tenor
1y 2y 3y 4y 5y 6y
1y
2y
3y
4y
5y
6y
e
x
p
i
r
y
tenor
1y 2y 3y 4y 5y 6y
1y
2y
3y
4y
5y
6y
e
x
p
i
r
y
tenor
1y 2y 3y 4y 5y 6y
1y
2y
3y
4y
5y
6y
e
x
p
i
r
y
LIBOR
Swap
CMS(3)
Coinitial
Hybrid coupon
Figure 7.1: The swaptions from the swaption matrix to which various market models are
calibrated.
198
170 CHAPTER 7. GENERIC MARKET MODELS
t
1y 2y 3y 4y 5y 6y
1
2
3
4
5
6
f
o
r
w
a
r
d
r
a
t
e
i
n
d
e
x
t
1y 2y 3y 4y 5y 6y
1
2
3
4
5
6
f
o
r
w
a
r
d
r
a
t
e
i
n
d
e
x
t
1y 2y 3y 4y 5y 6y
1
2
3
4
5
6
f
o
r
w
a
r
d
r
a
t
e
i
n
d
e
x
t
1y 2y 3y 4y 5y 6y
1
2
3
4
5
6
f
o
r
w
a
r
d
r
a
t
e
i
n
d
e
x
t
1y 2y 3y 4y 5y 6y
1
2
3
4
5
6
f
o
r
w
a
r
d
r
a
t
e
i
n
d
e
x
LIBOR
Swap
CMS(3)
Coinitial
Hybrid coupon
7y
7y
7y 7y
7y
Legend
fixing date
fixing and payment date
payment date
Figure 7.2: An overview of the forward swap agreements for various market models.
199
7.3. NECESSARY AND SUFFICIENT CONDITIONS FOR NOARBITRAGE 171
This equation can be seen to hold by considering a portfolio in the discount bonds that will
have the exact same cash ﬂows as the ﬂoating leg, to wit, long a discount bond maturing
at time t
s
and short a bond maturing at time t
e
. At time t
s
, we invest the proceeds of the
long position in the discount bond into the LIBOR deposit. At each LIBOR payment, we
reinvest the notional into the LIBOR deposit. At the end of the ﬂoating leg, the notional
cancels against the short position in the discount bond. It is not hard to see that such
procedure provides the exact same cash ﬂows as a ﬂoating leg.
The value π
fxd
(, f) of a ﬁxed leg with forward rate f can be obtained by simply
discounting back the known future cash ﬂows
4
,
π
fxd
(, f) = f
e−1
i=s
α
i
b
i+1
. ¸¸ .
.
The underbraced expression is also called present value of a basis point (PVBP in short),
and is denoted by p
s:e
.
The conditions on the forward rates are governed by the forward swap agreements to
have zero value, that is, π
ﬂt
() − π
fxd
(, f) = 0. In fact, there exists a unique system of
prices for the discount bonds consistent with the forward rates if and only if the system
of m linear equations in the n unknown variables b
2
, . . . , b
n+1
given by
_
b
s(j)
−b
e(j)
−
e(j)−1
i=s(j)
f
j
α
i
b
i+1
= 0
_
m
j=1
, (7.3)
with b
1
= 1, has a unique solution. The latter is already a precisely speciﬁed and tractable
necessary and suﬃcient condition for existence of unique discount bond prices that are
consistent with the forward rates. This condition can be validated by numerically check
ing invertibility of linear equation (7.3). In the sequel, we will develop conditions and
implications that are more straightforward to verify and that a priori guarantee invert
ibility of (7.3), and we will sketch scenarios in which these implications will hold. It will
be shown that invertibility of (7.3) is guaranteed in typical ﬁnance scenarios, and that
invertibility can be violated only under extreme situations, that are fully irrelevant to a
ﬁnance setting.
If m < n then if a solution exists, it is bound to exhibit nonuniqueness. If m > n,
then the system is in general overdetermined. Only for a very particular choice of forward
rates f
j
, the system could then be degenerate, thereby still allowing for a unique solution.
Given arbitrarily speciﬁed forward rates however, the degeneracy will occur, if at all, only
occasionally. Generally speciﬁed forward rates span a nondegenerate set of equations,
4
We assume, for notational simplicity only, that the ﬁxed payment frequency equals the ﬂoating
payment frequency.
200
172 CHAPTER 7. GENERIC MARKET MODELS
thereby implying that, when m > n, in most cases the model does not have unique
discount bond prices. In other words, two diﬀerent subsets of n forward rates determine,
via (7.3), two sets of discount bond prices that are diﬀerent and thus inconsistent with
each other. The model should have the property that there exist unique discount bond
prices regardless of how the forward rates are speciﬁed. The possibility of degeneracy is
excluded by the following assumption on the values that the forward rates can attain.
Assumption 1 A forward rate f can only attain any nonnegative value, that is, we
must have
f ≥ 0. (7.4)
Assumption 1 will be satisﬁed almost always in any interest rate market. Only in very
rare occasions have negative interest rates been observed. An example of negative in
terest rates in Japan at the start of November 1998 is given in Ostrom (1998). These
interest rates reached 3 to 6 basis points (bp) (.03% to .06%). Moreover, the popular
displaced diﬀusion smile model of Rubinstein (1983) can generate negative forward rates
with positive probability, if the displacement parameter is negative. However, violation
of Assumption 1 does not necessarily imply that the system of forward rates admits ar
bitrage of the weak form. In fact, we make plausible that slightly negative interest rates
still allow for unique discount bond prices that are arbitragefree in the weak sense, by
considering a simple numerical example. We consider a single forward rate, two tenor
times ¦t
1
= 0, t
2
¦ market model. The price of the discount bond for maturity at time t
2
is given by 1/(1 + αf). The rate f should thus satisfy f > −1/α, to ensure a positive
and ﬁnite price for the discount bond. For annual payments, for which α ≈ 1, we have
−1/α ≈ −100%. In fact, for more frequent payments than annual, the arbitragedefying
rate is even more negative than −100%. These considerations lead us to conclude that ar
bitrage of the weak form in a forward swap agreement market can occur only in situations
that are considered ﬁnancially extreme. Essential to noarbitrage is thus the structure of
the forward swap agreements.
7.3.1 Main result
The main result can now be formulated. The theorem below states that, for dynamic
market models, (i) if a tenor structure has n ﬁxing times t
1
, . . . , t
n
, then we require n
forward swap agreements, and (ii) for each ﬁxing time t
i
, there is exactly one forward
swap agreement that starts at that ﬁxing time t
i
, i = 1, . . . , n. We note that the coinitial
model does not ﬁt the requirements below, though it is a perfectly sensible arbitragefree
model. The reason that the coinitial model is not incorporated is the requirement that
a model be dynamic, see the discussion in Section 7.2.1.
201
7.3. NECESSARY AND SUFFICIENT CONDITIONS FOR NOARBITRAGE 173
Algorithm 4 Back substitution.
Input: n, U ((n + 1) (n + 1) unit uppertriangular), c ∈ R
n+1
.
Output:
ˆ
b = U
−1
c ∈ R
n+1
.
1: Set
ˆ
b
n+1
⇐c
n+1
.
2: for i = n, . . . , 1 do
3:
ˆ
b
i
⇐c
i
−
n+1
j=i+1
u
ij
ˆ
b
j
.
4: end for
Theorem 8 Let ¦t
1
, . . . , t
n+1
¦ be a set of tenor times. Let c = ¦
j
¦
m
j=1
and f
j
be a set of
forward swap agreements and forward rates, respectively, associated with the tenor times.
Then, at each of the times t
1
, . . . , t
n
, for all forward rates ¦f
j
¦
m
j=1
satisfying Assumption
1, there exists a unique weakform arbitragefree solution to the system of linear equations
(7.3) in the discount bond prices, if and only if m = n and there exists an ordering of the
n forward swap agreements
j
= (s(j), e(j)), j = 1, . . . , m such that s(j) = j.
PROOF: The proof is split into two parts. First, we prove that the described structure
of forward rates leads to arbitragefree invertibility of system (7.3) for all forward rates
satisfying Assumption 1. Second, the reverse implication is proven.
Suppose that the structure c of forward swap agreements is such that m = n and
that there exists an ordering of the n forward swap agreements
j
= (s(j), e(j)), j =
1, . . . , m such that s(j) = j. The existence of unique arbitragefree discount bond prices
is guaranteed if we show there exists unique discount bond prices that are all positive.
To that order, consider system (7.3) in terms of the deﬂated discount bond prices,
ˆ
b
i
≡
b
i
/b
n+1
, and substitute s(j) = j,
_
ˆ
b
j
−
ˆ
b
e(j)
−
e(j)−1
i=j
f
j
α
i
ˆ
b
i+1
= 0
_
n
j=1
, ¦
ˆ
b
n+1
= 1¦. (7.5)
We note that the (n + 1) (n + 1) matrix U = U(f ) associated with this system is
unit uppertriangular, which means that the diagonal contains ones and that the lower
triangular part of the matrix contains zeros. It follows that this matrix is invertible. We
thus have
U(f )
ˆ
b = c,
ˆ
b = U(f )
−1
c, c = (0 0 1)
T
∈ R
n+1
.
An eﬃcient method for calculating the inverse of a unit uppertriangular matrix is back
substitution, see for example Golub & van Loan (1996, Algorithm 3.1.2). Back substitution
will aid in the proof, therefore it has been displayed in Algorithm 4. We show by induction
for i = n + 1, n, . . . , 1 that
ˆ
b
i
≥ 1. For i = n + 1,
ˆ
b
i
=
ˆ
b
n+1
= 1, by line 1 of Algorithm 4,
which states that
ˆ
b
n+1
= c
n+1
= 1. Suppose, then, that
ˆ
b
j
≥ 1 for j = i + 1, . . . , n + 1.
202
174 CHAPTER 7. GENERIC MARKET MODELS
We have, by line 3 of Algorithm 4, that
ˆ
b
i
= c
i
−
n+1
j=i+1
u
ij
ˆ
b
j
= −
n+1
j=i+1
u
ij
ˆ
b
j
. We note
that, for j > i, u
ij
is either −α
j
f
i
, −1 −α
j
f
i
, or 0. It follows that
ˆ
b
i
= f
i
e(i)−1
j=i
α
j
ˆ
b
j+1
. ¸¸ .
≥0
+
ˆ
b
e(i)
.¸¸.
≥1
≥ 1,
which concludes the induction proof. The unique solution for the undeﬂated discount
bond prices at tenor point t
1
is then given by b
i
≡
ˆ
b
i
/
ˆ
b
1
, which is deﬁned and positive
since
ˆ
b = (
ˆ
b
1
, . . . ,
ˆ
b
n+1
) ≥ 1.
We note that the above proof is independent of the number of tenor times. Therefore
the forward swap agreements structure n = m and ¦s(j) = j¦ guarantees existence of
unique arbitragefree discount bond prices for all forward rates satisfying Assumption 1
at all tenor times t
1
, . . . , t
n
, which was to be shown.
The reverse implication is proven by induction on n. For n = 1, the result is immediate.
Now, assume the result is true for i = 1 to n −1. We want to prove it is true for n. The
model viewed from t
2
has n tenor points, so by the induction hypothesis we must have
that: (i) m ≥ n−1, (ii) there are exactly n−1 forward swap agreements that start at t
2
or
later, (iii) for these n −1 forward swap agreements, there is an enumeration j = 2, . . . , n,
such that s(j) = j. There are three possibilities: m = n −1, m > n or m = n. We show
that the cases m = n −1 and m > n lead to nonuniqueness or noninvertibility of (7.3)
for some of the forward rates f that satisfy Assumption 1.
If m = n −1, there are less equations than unknown variables in (7.3), and it follows
that, if there is a solution at all, it will be nonunique.
If m > n, then we may form a submodel with n forward swap agreements such that
s(j) = j for j = 1, . . . , n. We have already proven that such a structure with n forward
rates leads to unique positive discount bond prices. For a left out forward swap agreement,
say = (s, e), the associated forward rate f should then satisfy
f =
b
s
−b
e
e−1
i=s
α
i
b
i+1
. (7.6)
We conclude then that there are forward rates satisfying Assumption 1 for which there
do not exist discount bond prices.
Thus we must have m = n and for remaining forward swap agreement 1 we have
s(1) = 1 from which the result follows. 2
As a corollary, we can count the dynamic market model structures given the number
of tenor times n + 1. For forward rate 1, we can chose from n end times t
2
, . . . , t
n+1
, for
forward rate 2, from n −1 end times t
3
, . . . , t
n+1
, etcetera.
203
7.4. GENERIC EXPRESSIONS FOR NOARBITRAGE DRIFT TERMS 175
Corollary 2 (Counting dynamic market model structures) Consider market models with
n+1 tenor times. Then there are n! ways of selecting forward swap agreements such that,
for all forward rates satisfying Assumption 1, and at all tenor times t
1
, . . . , t
n
, there exist
unique weakform arbitragefree discount bond prices satisfying (7.3).
We note that Theorem 8 rules out the applicability of generic market models to
Bermudancallable spread options, in the sense that we cannot deﬁne two rates, ﬁxing at
the same time, as state variables.
7.4 Generic expressions for noarbitrage drift terms
In this section, generic expressions are derived for the arbitragefree drift terms of generic
market models, that are so characteristic for the LIBOR and swap market models. We
assume given a dynamic market model, therefore the forward swap agreements are of the
form
i
= (i, e(i)). If dependency of the end index is clear we simply write e(i) as e. The
forward rate f
i:e
has start date t
i
and end date t
e
. Forward rate f
i:e
is modelled under
its forward measure, which is associated with the p p
i:e
as numeraire. Forward rate f
i:e
is
modelled as
df
i:e
(t)
f
i:e
(t)
= σ
i:e
(t) dw
(i:e)
(t), (7.7)
with σ
i:e
denoting a ddimensional volatility vector, and with w
(i:e)
a ddimensional Brow
nian motion under the forward measure Q
i:e
associated with p
i:e
as numeraire. The
positive integer d is deemed the number of factors of the model. The volatility vector
σ
i:e
(t) = σ
i:e
(t, ω) can be state dependent to allow for smile modelling.
For pricing of nonstandard interest rate derivatives, it is necessary to jointly imple
ment the above scheme (7.7) for all forward rates simultaneously. Therefore we must work
out the SDE for the forward rates under a single pricing measure. We can work either
with the terminal or spot measure. Each is treated below consecutively.
7.4.1 Terminal measure
In this subsection, we work with the terminal measure Q
n+1
, that is the measure associated
with the terminal discount bond b
n+1
as numeraire.
Without loss of generality, the presentation is given as if all forward rates have not yet
expired. We work with the numerairedeﬂated discount bond prices. The quantity ˆ p
i:e
denotes the deﬂated p, ˆ p
i:e
≡ p
i:e
/b
n+1
. The deﬂated ps can be calculated, in turn, when
the deﬂated discount bond prices
ˆ
b
i
≡ b
i
/b
n+1
are known. The deﬂated discount bond
prices are given by (7.5). Recall that (7.5) can be written in matrix form as U
ˆ
b = c, with
204
176 CHAPTER 7. GENERIC MARKET MODELS
c = (0 0 1)
T
, and U = U(f ) an (n + 1) (n + 1) unit uppertriangular matrix, given
by
u
ij
=
_
¸
¸
¸
_
¸
¸
¸
_
0 if i > j or (i < j and j > e(i)),
1 if i = j,
−α
j−1
f
i:e(i)
if i < j and j < e(i),
−α
j−1
f
i:e(i)
−1 if i < j and j = e(i).
Thus
ˆ
b = U(f )
−1
c. We may write ˆ p as a function of the forward rates, ˆ p = ˆ p(f ). In fact,
ˆ p = A
ˆ
b, A ≡
_
_
_
_
_
0 (α
1
α
e(1)−1
0 0)
0 0 (α
2
α
e(2)−1
0 0)
0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 (α
n
)
_
_
_
_
_
,
for the n (n + 1) matrix A. Thus, ˆ p = AU(f )
−1
c. Subsequently, we deﬁne the Radon
Nikod´ ym density
z
i:e,n+1
(t) ≡
p
i:e
(t)/b
n+1
(t)
p
i:e
(0)/b
n+1
(0)
=
ˆ p
i:e
(t)
ˆ p
i:e
(0)
. (7.8)
We note that z
i:e,n+1
(t) is a martingale under the terminal measure Q
n+1
. This implies
that
dz
i:e,n+1
(t)
z
i:e,n+1
(t)
=
dˆ p
i:e
(t)
ˆ p
i:e
(t)
= θ
i:e,n+1
(t) w
(n+1)
(t), (7.9)
with the ddimensional vector θ given by
θ
i:e,n+1
(t) =
1
ˆ p
i:e
(t)
n
k=i+1
∂ˆ p
i:e
∂f
k:e(k)
(t)f
k:e(k)
(t)σ
k:e(k)
(t). (7.10)
The summation is required only from i + 1 to n since ˆ p
i:e
is dependent on f
k:e(k)
only
for k > i. Finally we apply Girsanov’s theorem to obtain the required expression for
dw
(i:e)
(t) −dw
(n+1)
(t),
dw
(i:e)
(t) −dw
(n+1)
(t) = −θ
i:e,n+1
(t)dt. (7.11)
Thus,
df
i:e
(t)
f
i:e
(t)
= −
1
ˆ p
i:e
(t)
n
k=i+1
∂ˆ p
i:e
∂f
k:e(k)
(t)f
k:e(k)
(t)[σ
k:e(k)
(t)[[σ
i:e
(t)[ρ
k:e(k),i:e
(t)dt
+σ
i:e
(t) dw
(n+1)
(t),
where the scalar ρ
k:e(k),i:e
has been deﬁned as
ρ
k:e(k),i:e
(t) =
σ
k:e(k)
(t) σ
i:e
(t)
[σ
k:e(k)
(t)[[σ
i:e
(t)[
,
205
7.4. GENERIC EXPRESSIONS FOR NOARBITRAGE DRIFT TERMS 177
and has the interpretation of instantaneous correlation.
An expression is given for ∂ˆ p/∂f
k:e(k)
. We note that ∂U/∂f
k:e(k)
is a matrix that is
zero bar a single row, the k
th
row, and that the derivative is independent of f, since all
f terms occur linearly in the matrix U. The k
th
row is ﬁlled, from entry (k, k + 1), with
the row vector (−α
k
−α
e(k)−1
0 0). We have that
∂ˆ p
∂f
k:e(k)
= −AU
−1
∂U
∂f
k:e(k)
U
−1
c = −AU
−1
∂U
∂f
k:e(k)
ˆ
b = AU
−1
c
k
ˆ p
k:e(k)
, (7.12)
where c
k
∈ R
n+1
denotes the standard basis vector with unit k
th
coordinate, and zero
coordinates otherwise. We deﬁne
˜
b
(k)
i
by
˜
b
(k)
i
= (U
−1
c
k
)
i
, i = 1, . . . , n, k = 1, . . . , n. (7.13)
Substituting (7.13) into (7.12) yields
∂ˆ p
i:e
∂f
k:e(k)
= 1
¡k≥i+1¦
ˆ p
k:e(k)
_
min(e(i)−1,k−1)
j=i
α
j
˜
b
(k)
j+1
_
. (7.14)
Deﬁne µ(i, k) ≡ min
_
e(i) − 1, k − 1). Substituting (7.14) into (7.12), suppressing the
dependency of time, and using ˆ p
k:e(k)
f
k:e(k)
=
ˆ
b
k
−
ˆ
b
e(k)
, we obtain the generic market
model SDE under the terminal measure:
df
i:e
f
i:e
= −
1
ˆ p
i:e
n
k=i+1
_
ˆ
b
k
−
ˆ
b
e(k)
_
_
µ(i,k)
j=i
α
j
˜
b
(k)
j+1
_
σ
k:e(k)
σ
i:e
dt +σ
i:e
dw
(n+1)
. (7.15)
7.4.2 Spot measure
In this subsection, we work with the spot measure Q
Spot
, that is the measure associated
with the spot LIBOR numeraire, deﬁned as follows. The account starts out with one unit
of currency. Subsequently, this amount is invested in the spot LIBOR account. After the
ﬁrst accrual period, the proceeds are reinvested in the then spot LIBOR account. This
procedure is repeated. For the spot measure it is convenient to deﬁne the spot index i(t),
deﬁned by i(t) = min¦integer i ; t < t
i
¦.
For the spot measure, we work with discount bond prices, deﬂated by the spot discount
bond b
i(t)
. The quantities ¯ p and
¯
b denote the vectors of b
i(t)
deﬂated PVBPs and discount
bond prices, respectively. We have ¯ p = A
¯
b and
¯
b =
1
ˆ
b
i(t)
ˆ
b =
1
(U
−1
c)
i(t)
U
−1
c.
206
178 CHAPTER 7. GENERIC MARKET MODELS
The RadonNikod´ ym density z
i:e,i(t)
(t) is deﬁned similarly to (7.8). A martingale SDE for
the RadonNikod´ ym density holds,
dz
i:e,i(t)
(t)
z
i:e,i(t)
(t)
=
d¯ p
i:e,i(t)
(t)
¯ p
i:e,i(t)
(t)
= θ
i:e,i(t)
(t) dw
(i(t))
,
similar to (7.9), with ddimensional volatility vector equal to
θ
i:e,i(t)
(t) =
1
¯ p
i:e
(t)
n
k=i(t)
∂¯ p
i:e
∂f
k:e(k)
(t)f
k:e(k)
(t)σ
k:e(k)
(t). (7.16)
If we compare (7.16) to (7.10), we ﬁnd that, for the spot measure, we sum over all available
forward rates from i(t) to n, since ¯ p
i:e
might depend on all those forward rates. Recall
that, for the terminal measure, we need only sum from i + 1 to n.
Similar to (7.11), we have dw
(i:e)
−dw
(i(t))
= −θ
i:e,i(t)
dt. Thus we obtain the equivalent
of (7.12),
df
i:e
(t)
f
i:e
(t)
= −
1
¯ p
i:e
(t)
n
k=i(t)
∂¯ p
i:e
∂f
k:e(k)
(t)f
k:e(k)
(t)[σ
k:e(k)
(t)[[σ
i:e
(t)[ρ
k:e(k),i:e
(t)dt
+σ
i:e
(t) dw
(i(t))
(t). (7.17)
An expression for ∂¯ p/∂f
k:e(k)
is given by
∂¯ p
∂f
k:e(k)
=
1
ˆ
b
i(t)
∂ˆ p
∂f
k:e(k)
+
1
ˆ
b
i(t)
_
U
−1
∂U
∂f
k:e(k)
U
−1
c
_
i(t)
. ¸¸ .
=ˆ p
k:e(k)
˜
b
(k)
i(t)
¯ p. (7.18)
Similar as in (7.12) and (7.14) for the terminal measure, we ﬁnd for the spot measure:
∂¯ p
i:e
∂f
k:e(k)
= 1
¡k≥i+1¦
¯ p
k:e(k)
µ(i,k)
j=i
α
j
˜
b
(k)
j+1
− ¯ p
k:e(k)
¯ p
i:e
˜
b
(k)
i(t)
. (7.19)
Substituting (7.19) into (7.17), suppressing the dependency of time, and using
¯ p
k:e(k)
f
k:e(k)
=
¯
b
k
−
¯
b
e(k)
,
we obtain the generic market model SDE under the spot measure:
df
i:e
f
i:e
= −
1
¯ p
i:e
n
k=i(t)
_
¯
b
k
−
¯
b
e(k)
_
_
1
¡k≥i+1¦
µ(i,k)
j=i
α
j
˜
b
(k)
j+1
− ¯ p
i:e
˜
b
(k)
i(t)
_
σ
k:e(k)
σ
i:e
dt +σ
i:e
dw
(i(t))
. (7.20)
207
7.5. COMPLEXITY 179
7.4.3 An example: The LIBOR market model
For illustration, in this section the LIBOR drift terms are calculated starting from the
generic market model framework. We stress here that the explicit calculation in this
section of the generic expressions of the previous section is not required for implementation
of the generic market model framework, but is merely performed for illustration only.
First, we derive the LIBOR SDE for the terminal measure, by applying (7.15). In the
LIBOR market model, a forward rate f
k:e(k)
is denoted by f
k
. We note that:
(i) ˆ p
i:e(i)
= ˆ p
i:i+1
= α
i
ˆ
b
i+1
,
(ii) µ(i, k) = min(e(i) −1, k −1) = min(i, k −1) = i, for k = i + 1, . . . , n,
(iii)
˜
b
(k)
j
=
ˆ
b
j
ˆ
b
k
1
¡j≤k¦
=
¯
b
j
¯
b
k
1
¡j≤k¦
,
(iv)
ˆ
b
k
−
ˆ
b
k+1
ˆ
b
k
=
¯
b
k
−
¯
b
k+1
¯
b
k
= 1 −
1
1+α
k
f
k
=
α
k
f
k
1+α
k
f
k
,
(v)
µ(i,k)
j=i
α
j
˜
b
(k)
j+1
=
ˆ p
i:e(i)
ˆ
b
k
=
¯ p
i:e(i)
¯
b
k
.
Substituting (i)–(v) into (7.15), we obtain,
df
i
f
i
= −
n
k=i+1
α
k
f
k
1 + α
k
f
k
σ
k
σ
i
dt +σ
i
dw
(n+1)
,
which is the familiar expression for the SDE of the LIBOR market model under the
terminal measure.
Second, we derive the LIBOR SDE for the spot measure. If we substitute (i)–(v) into
(7.20), we see that for k ≥ i + 1,
i
j=i
α
j
˜
b
(k)
j+1
cancels against ¯ p
i:i+1
˜
b
(k)
i(t)
, and for k ≤ i, we
are left with −¯ p
i:i+1
˜
b
(k)
i(t)
, therefore:
df
i
f
i
=
i
k=i(t)
α
k
f
k
1 + α
k
f
k
σ
k
σ
i
dt +σ
i
dw
(i(t))
,
which is the familiar expression for the SDE of the LIBOR market model under the spot
measure.
7.5 Complexity
We study the complexity of the drift calculation over a single time step in a numerical
implementation. For generic market models, we show that the complexity is, at worse,
of order O(n
3
). For speciﬁc market models, such as the LIBOR, swap, and CMS market
208
180 CHAPTER 7. GENERIC MARKET MODELS
Algorithm 5 An O(nd)algorithm for calculating the forward LIBOR rates for a time
step in the LIBOR market model. The number of factors is denoted by d. The log forward
rates, log f (t) = (log f
i(t)
(t), . . . , log f
n
(t)) at time t, and log f (t + ∆t) at time t + ∆t, are
denoted by φ
(1)
and φ
(2)
, respectively. Here Σ = (σ
ij
) governs the volatility, with σ
ij
the
timet volatility of forward rate f
i
with respect to factor j in the model. ∆w should be
sampled from a ^(0,
√
∆tI
d
) distribution.
Input: n; d (1 ≤ d ≤ n); φ
(1)
, α ∈ R
n
; ∆w ∈ R
d
; Σ ∈ R
nd
; ∆t.
Output: φ
(2)
∈ R
n
.
1: Set γ ⇐0 with γ ∈ R
d
.
2: for i = n, . . . , i(t) do
3: φ
(2)
i
⇐φ
(1)
i
.
4: for j = 1, . . . , d do
5: φ
(2)
i
⇐φ
(2)
i
+
_
γ
j
−
1
2
σ
ij
_
σ
ij
∆t + σ
ij
∆w
j
.
6: γ
j
⇐γ
j
−
α
i
exp
_
φ
(1)
i
_
1+α
i
exp
_
φ
(1)
i
_
σ
ij
.
7: end for
8: end for
models, we show that a more eﬃcient implementation is available that renders the order
to O(nd). For CMS market models, this more eﬃcient implementation is approximate.
For generic market models, the results are derived for the terminal measure, but
can equally well be derived for the spot measure. Recall (7.12) that occurs in the drift
calculation,
∂ˆ p
∂f
k:e(k)
= −AU
−1
∂U
∂f
k:e(k)
U
−1
c.
The inverse of U can be calculated in O(n
3
) operations. Subsequently, the 4 consecu
tive matrix multiplications with a vector require O(n
2
) operations, for each forward rate
k, thus in total O(n
3
) operations. Therefore a generic market model has at worse a
complexity of O(n
3
).
The LIBOR market model has a special structure that renders the complexity to
O(nd), which has been shown by Joshi (2003b). In Algorithm 5 such an O(nd) algorithm
has been displayed that calculates the forward LIBOR rates for a time step under the
terminal measure. An algorithm for the spot measure can be deﬁned analogously, by
summing from 1 to n and by incrementing γ
j
(rather than decrementing) before updating
φ
(2)
i
.
We show that a similar approximate algorithm can be deﬁned for CMS(q) market
models, for the terminal measure. The algorithm is shown to be exact for the swap market
209
7.5. COMPLEXITY 181
model (q = n). The following quantity that occurs in the drift term is approximated:
˜ p
(k)
i:µ(i,k)+1
= ˜ p
(k)
i:min(k,i+q)
:=
min(k,i+q)−1
j=i
α
j
˜
b
(k)
j+1
(i < k). (7.21)
The approximation is based on the assumption that α
i
is close to α
i+q
, for i = 1, . . . , n−q.
We note that this assumption is used only to eﬃciently approximate (7.21) for calculation
of drift terms, and this assumption is not used in the calculation of contract payoﬀs.
Moreover, if needs be, the drift terms can be calculated exactly by exact calculation of
(7.21).
Approximation 1 Approximately, by assumption of α
i
≈ α
i+q
(i = 1, . . . , n − q), we
have, for ˜ p
(k)
i:µ(i,k)+1
deﬁned in (7.21),
˜ p
(k)
i:µ(i,k)+1
≈ α
k−1
k−2
m=i
_
1 + α
m
f
m+1:e(m+1)
_
(i < k). (7.22)
Here, an empty product denotes 1. Formula (7.22) is exact for i > k−q−1. In particular,
(7.22) is exact for any i in the swap market model (q = n).
The rationale for Approximation 1, as well as the proof of exactness when i > k−q−1, are
given in Appendix 7.A. We note that accumulating errors in (7.22) are likely to cancel,
since in practice the diﬀerence α
i
− α
i+q
is both negative and positive. From (7.15) and
Approximation 1, we obtain,
df
i:e
f
i:e
≈ −
1
ˆ p
i:e
n
k=i+1
_
ˆ
b
k
−
ˆ
b
e(k)
_
α
k−1
k−2
m=i
_
1 + α
m
f
m+1:e(m+1)
_
σ
k:e(k)
σ
i:e
dt
+σ
i:e
dw
(n+1)
. (7.23)
Deﬁne
v
i
=
n
k=i+1
_
ˆ
b
k
−
ˆ
b
e(k)
_
α
k−1
k−2
m=i
_
1 + α
m
f
m+1:e(m+1)
_
σ
k:e(k)
. (7.24)
The proof of the following lemma has been deferred to Appendix 7.B.
Lemma 3 The quantity v
i
deﬁned in (7.24) satisﬁes the following recursive formulas:
• v
n
= 0,
• v
i
=
_
1 + α
i
f
i+1:e(i+1)
_
v
i+1
+ α
i
_
ˆ
b
i+1
−
ˆ
b
e(i+1)
_
σ
i+1:e(i+1)
.
210
182 CHAPTER 7. GENERIC MARKET MODELS
Algorithm 6 An O(nd)algorithm for approximately calculating the forward swap rates
for a time step in the CMS(q) market model (exact when q = n), under the terminal
measure. The number of factors is denoted by d. The log forward rates, log f (t) =
(log f
i(t):e(i(t))
(t), . . . , log f
n:e(n)
(t)) at time t, and log f (t +∆t) at time t +∆t, are denoted
by φ
(1)
and φ
(2)
, respectively. Here Σ = (σ
ij
) governs the volatility, with σ
ij
the timet
volatility of forward rate f
i:e(i)
with respect to factor j. Here, e() is deﬁned in (7.2). ∆w
should be sampled from a ^(0,
√
∆tI
d
) distribution.
Input: n; d, q (1 ≤ d, q ≤ n); φ
(1)
, α ∈ R
n
; ∆w ∈ R
d
; Σ ∈ R
nd
; ∆t.
Output: φ
(2)
∈ R
n
.
1: β
n+1
⇐1.
n+1
⇐0.
2: for i = n, . . . , i(t) do
3:
i
⇐
i+1
+ α
i
β
i+1
−1
¡i<n & e(i)=e(i+1)−1¦
α
e(i+1)−1
β
e(i+1)
.
4: f
(1)
i
⇐exp(φ
(1)
i
).
5: β
i
⇐
i
f
(1)
i
+ β
e(i)
.
6: If i = n, set v
n
⇐0 ∈ R
d
, else (i < n), set
v
i
⇐
_
1 + α
i
f
(1)
i+1
_
v
i+1
+ α
i
_
β
i+1
−β
e(i+1)
_
σ
i+1
.
7: φ
(2)
i
⇐φ
(1)
i
+
_
−
1
i
v
i
−
1
2
σ
i
_
σ
i
∆t +σ
i
∆w.
8: end for
211
7.5. COMPLEXITY 183
3
2
1
0
1
2
3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
q
6%
4%
2%
0%
2%
4%
6%
Error in bp of option premium (left axis)
Error in % of standard error (right axis)
Figure 7.3: Results of the test of exact versus approximate drift terms in CMS(q) models.
212
184 CHAPTER 7. GENERIC MARKET MODELS
Table 7.2: Deal description for the test of exact versus approximate drift terms in CMS(q)
models.
Currency: USD
Market data: Swap rates and atthemoney swaption volatility
Valuation date: 18 July 2003
Deal: 30 year ﬁxedmaturity Bermudan swaption
Start date: 16 June 2004
Frequency: Annual
Day count: ACT/365
Date roll: Modiﬁed following
Fixed coupon: 3.2%
In Algorithm 6 an O(nd) algorithm, based on Lemma 3, is displayed that approximately
calculates the forward swap rates for a time step under the terminal measure, for the
CMS(q) market model. This algorithm is exact for the swap market model (q = n).
Algorithm 6 also calculates timet values for discount bond prices (denoted by β) and for
PVBPs (p
i:e(i)
is denoted by
i
).
To benchmark the accuracy of Algorithm 6, various ﬁxedmaturity Bermudan swap
tions are priced in their corresponding CMS(q) market models, with both exact SDE
(7.15) and approximate SDE (7.23). The deal speciﬁcation is given in Table 7.2. The
swap tenor is q years, with 31 − q exercise opportunities, at (16 June 2004 + i years),
i = 0, . . . , 30 − q, for q = 1, . . . , 30. The diﬀerence between the minimum (0.996) and
maximum (1.007) attained day count fractions is 0.011. To price ﬁxedmaturity Bermu
dan swaptions in Monte Carlo, we use the algorithm of Longstaﬀ & Schwartz (2001), with
the swap value as explanatory variable x, and basis functions 1, x and x
2
. An 8 factor
model is used (d = 8), with the correlation of the forward CMS(q) rates given by the
parametrization of Rebonato (1998, Equation (4.5), page 83), exp(−β[t
i
− t
j
[), for rates
f
i:e(i)
and f
j:e(j)
, with β = 3%. The diﬀerences between the prices obtained with exact
and approximate drift terms are displayed in Figure 7.3. We note that for q = n, equal
prices are obtained up to all digits. The results show that the error is small, up to only 3
bp of the option premium, and up to only 6% of the simulation standard error. Moreover,
the error ﬂuctuates robustly around 0, since the diﬀerence α
i
−α
i+q
is both negative and
positive, in practice.
A signiﬁcant reduction of computational time can thus be attained by selecting a low
number of factors d. A result of using a low number of factors is that the instantaneous
correlation matrix (ρ
ij
) cannot be exactly ﬁt to the historically estimated or market
213
7.6. GENERIC CALIBRATION TO CORRELATION 185
implied correlation matrix. The procedure for ﬁtting a generic market model to correlation
is exactly the same as for the LIBOR market model. For ﬁtting a lowfactor LIBOR
market model to correlation, the reader is referred to Pietersz & Groenen (2004a, b) (see
Chapter 3), Grubiˇsi´c & Pietersz (2005) (see Chapter 4), Wu (2003) and Rebonato (2002,
Section 9) or Brigo (2002).
7.6 Generic calibration to correlation
When each particular interest rate derivative has its own generic market model that is
used for its valuation and risk management, then the associated input correlation to those
models involves diﬀerent interest rates. There is a relationship between these correlations,
and this relationship allows for netting correlation risk or reserves. Moreover, utilizing the
relationship between the correlations means that correlation is determined consistently
across diﬀerent products. In general all interest rate correlations stem from the correla
tions between diﬀerent segments of the yield curve. In this section we show how forward
LIBOR correlations can be used to determine subsequently the correlations for any of the
generic market models speciﬁc to certain interest rate products.
The advantage of considering all correlations in this way comes from the fact that
one can treat correlation risk (or reserves) in a consistent fashion across all interest rate
products. Netting of correlation reserves will subsequently occur naturally. Furthermore
only instantaneous forward LIBOR correlations have to be determined and administered.
The key to the method is the wellknown fact that, within the LIBOR market model,
the instantaneous volatility vector σ
s:e
(t) of a forward swap rate f
s:e
can be expressed as
weighted averages of instantaneous volatility vectors σ
i
(t) of forward LIBORs,
σ
s:e
(t) =
e−1
i=s
w
s:e
i
(t)σ
i
(t).
An expression for the weights w
s:e
i
may be found, for example, in Hull & White (2000, page
53). The weights w
s:e
i
are state dependent. A highly accurate deterministic approximation
˜ σ
s:e
(t) for the instantaneous volatility can however be obtained by evaluating the weights
at time zero,
˜ σ
s:e
(t) =
e−1
i=s
w
s:e
i
(0)σ
i
(t).
From the preceding considerations it should be clear that the instantaneous forward rate
correlation ρ
s(1):e(1),s(2):e(2)
(t) can be approximately expressed as a function of the instan
214
186 CHAPTER 7. GENERIC MARKET MODELS
taneous forward LIBOR correlations ρ
ij
(t),
ρ
s(1):e(1),s(2):e(2)
(t) = ρ
_
df
s(1):e(1)
(t)
f
s(1):e(1)
(t)
,
df
s(2):e(2)
(t)
f
s(2):e(2)
(t)
_
=
σ
T
s(1):e(1)
(t)σ
s(2):e(2)
(t)
_
σ
T
s(1):e(1)
(t)σ
s(1):e(1)
(t)σ
T
s(2):e(2)
(t)σ
s(2):e(2)
(t)
,
where
σ
T
i:j
(t)σ
k:l
(t) ≈ ˜ σ
T
i:j
(t)˜ σ
k:l
(t)
=
j−1
m
1
=i
l−1
m
2
=k
w
i:j
m
1
(0)w
k:l
m
2
(0)[σ
m
1
(t)[[σ
m
2
(t)[ρ
m
1
m
2
(t).
7.7 Conclusions
In this chapter, a generalization of market models has been studied, whereby arbitrary
forward rates are allowed to span the tenor structure relevant to an interest rate derivative.
The beneﬁt of such generalization is that straightforward volatilitycalibration can be
achieved for the ﬁxings of LIBOR or swap rates relevant to the interest rate derivative.
Generic market models are therefore particularly apt for pricing and risk management of
CMS and hybrid coupon swaps, and callable and cancellable versions thereof, in particular,
Bermudan CMS swaptions and ﬁxedmaturity Bermudan swaptions. We showed that the
LIBOR and swap market models are special cases of the generic market model framework.
The need for a generic speciﬁcation of market models has been illustrated by counting
the admissible market model structures with n + 1 tenor times. We found n! possible
market models. For example, already only for an annualpaying deal of 10 years, there
are 10!=3,628,800 market models, thereby establishing the need for a generic speciﬁcation.
Necessary and suﬃcient conditions were derived for a set of forward swap agreements to
provide a unique solution for discount bond prices, essentially regardless of the scenario
of attained forward rates. The major novelty of this chapter is the derivation of generic
expressions for noarbitrage drift terms in generic market models. We developed a novel
algorithm of order O(nd) for approximate drift calculations in CMS market models under
the terminal measure.
7.A Appendix: Rationale for Approximation 1
We proceed by induction on i = k −1, . . . , i(t).
• For i = k −1: ˜ p
(k)
i:µ(i,k)+1
= ˜ p
(k)
k−1:min(k,k−1+q)
= α
k−1
˜
b
(k)
k
= α
k−1
.
215
7.B. APPENDIX: PROOF OF LEMMA 3 187
• For i = k −2, . . . , k −q, we have min(k, i + q) = k. The quantity
˜
b
(k)
i+1
satisﬁes:
˜
b
(k)
i+1
= f
i+1:e(i+1)
˜ p
(k)
i+1:k
. (7.25)
To see this, note that from line 3 of Algorithm 4, we have:
˜
b
(k)
i+1
= f
i+1:e(i+1)
˜ p
(k)
i+1:i+q+1
+
˜
b
(k)
i+q+1
. (7.26)
From the deﬁnition
˜
b
(k)
j
= (U
−1
c
k
)
j
in (7.13), we deduce that
˜
b
(k)
j
= 0 for j > k,
from which (7.25) follows. We obtain:
˜ p
(k)
i:k
= ˜ p
(k)
i:k
= ˜ p
(k)
i+1:k
+ α
i
˜
b
(k)
i+1
= ˜ p
(k)
i+1:k
_
1 + α
i
f
i+1:e(i+1)
_
(∗)
= α
k−1
k−2
m=i+1
_
1 + α
m
f
m+1:e(m+1)
__
1 + α
i
f
i+1:e(i+1)
_
= α
k−1
k−2
m=i
_
1 + α
m
f
m+1:e(m+1)
_
,
where equality (∗) follows from the induction hypothesis.
• For i = k −q −1, . . . , i(t), we have min(k, i + q) = i + q. From (7.26), we deduce:
˜ p
(k)
i:µ(i,k)+1
= ˜ p
(k)
i:i+q
= α
i
˜
b
(k)
i+1
−α
i+q
˜
b
(k)
i+q+1
+ ˜ p
(k)
i+1:i+q+1
= α
i
_
f
i+1:e(i+1)
˜ p
(k)
i+1:i+q+1
+
˜
b
(k)
i+q+1
_
−α
i+q
˜
b
(k)
i+q+1
+ ˜ p
(k)
i+1:i+q+1
(∗)
≈ ˜ p
(k)
i+1:i+q+1
_
1 + α
i
f
i+1:e(i+1)
_
= α
k−1
k−2
m=i
_
1 + α
m
f
m+1:e(m+1)
_
,
where in approximation (∗), we have used α
i
≈ α
i+q
. 2
7.B Appendix: Proof of Lemma 3
For i < n,
v
i
=
n
k=i+1
_
ˆ
b
k
−
ˆ
b
e(k)
_
α
k−1
k−2
m=i
_
1 + α
m
f
m+1:e(m+1)
_
σ
k:e(k)
=
_
ˆ
b
i+1
−
ˆ
b
e(i+1)
_
α
i
σ
i+1:e(i+1)
+
_
1 + α
i
f
i+1:e(i+1)
_
_
n
k=i+2
_
ˆ
b
k
−
ˆ
b
e(k)
_
α
k−1
k−2
m=i+1
_
1 + α
m
f
m+1:e(m+1)
_
σ
k:e(k)
_
=
_
1 + α
i
f
i+1:e(i+1)
_
v
i+1
+ α
i
_
ˆ
b
i+1
−
ˆ
b
e(i+1)
_
σ
i+1:e(i+1)
,
216
188 CHAPTER 7. GENERIC MARKET MODELS
which was to be shown. 2
217
Chapter 8
Conclusions
In this thesis, innovations on eﬃcient pricing and riskmanagement of Bermudanstyle
interest rate derivatives are presented. The main pricing model for these derivatives is
the LIBOR market model (see Brace et al. (1997), Jamshidian (1997) and Miltersen et
al. (1997)). It allows for eﬃcient calibration to volatility and correlation.
The most outstanding result of the thesis is the development of new market models,
named CMS and generic market models (Chapter 7). We specify precisely when an
arbitrary structure of forward rates is arbitragefree at all possible (future) states of
the model. Via matrix notation, we are able to transform CMS and generic pricing
measures to spot and terminal measures, which enables a Monte Carlo implementation
of the models. Moreover, we present an eﬃcient algorithm to accurately approximate
forward CMS rates over time steps during simulation in CMS market models. CMS and
generic market models allow for eﬃcient volatility calibration (i.e., the model parameter
is the market implied volatility) of a whole new class of derivatives, such as ﬁxedmaturity
Bermudan swaptions and Bermudan CMS swaptions.
A breakthrough on pricing callable products (e.g., Bermudan swaptions) in market
models can be found in Chapter 6. There, the leastsquares Monte Carlo (MC) algorithm
of Longstaﬀ & Schwartz (2001) is studied for estimating the optimal exercise decision for
American options with MC. A discontinuity occurs in the leastsquares MC algorithm,
whereby ﬁnite diﬀerence estimates of risk sensitivities are ineﬃcient. We propose a mod
iﬁcation of the leastsquares MC algorithm, named constant exercise decision method.
Hedge tests with Bermudan swaptions on USD data show that reduction of variance of
proﬁt and loss (P&L) is much greater and acceptable when the constant exercise decision
method is used.
Chapter 6 contains many more results: First, hedge tests show that jointly delta
and vega hedging outperforms delta hedging only. Second, tests show that correlation
pricing impact on Bermudan swaptions is signiﬁcant. It is shown that correlation can
be accurately captured in both singlefactor models (e.g., the Markovfunctional model of
218
190 CHAPTER 8. CONCLUSIONS
Hunt et al. (2000)) and multifactor models (e.g., market models). Third, hedge tests show
that the following do not signiﬁcantly impact reduction of variance of P&L: correlation,
number of factors, or, use of multifactor models versus singlefactor models. Fourth,
tests show that the pricing impact of volatility smile can be much larger than the pricing
impact of correlation.
As stated above, correlation still remains an important aspect of pricing derivatives,
even though correlation impact on hedging is small and smile is more important (for
the empirical ﬁndings in Chapter 6). Fullfactor market models allow for straightforward
calibration to correlation, in the sense that the model parameter is literally the real
world observed correlation value. For lowfactor market models however, the situation is
diﬀerent. Here, any model attainable correlation matrix has a rank equal to the number of
factors in the model, and this rank restriction usually does not hold for the given realworld
correlation matrix. Therefore, we need to ﬁnd the lowrank correlation matrix closest to
the given matrix. This optimization problem is nonconvex, and therefore hard to solve.
The process of ﬁnding the nearest lowrank correlation matrix is called rank reduction
of the correlation matrix. This thesis includes some of the forefront knowledge on this
topic, via Chapters 3 and 4. Two completely diﬀerent solution algorithms are presented,
based on majorization and geometric programming. In extensive numerical tests, these
two algorithms seem to outperform existing algorithms, in terms of computational speed.
Both methods enjoy global convergence properties. Geometric Newton has a quadratic
rate of convergence, and geometric conjugate gradient has msteps quadratic convergence,
where m is the dimension of the manifold. Moreover, we develop a method to instantly
check whether a stationary point (i.e., a point with negligible gradient) is in fact a global
minimum, which is quite an uncommon feature for a nonconvex programming problem.
A discretization can be thought of as a translation of a continuoustime model to a
numerical algorithm aimed to implement the model. For the LIBOR market model, a new
and socalled Brownian bridge discretization is introduced in Chapter 5. The Brownian
bridge is speciﬁcally designed for single or large time steps. For single time steps, Brownian
bridge is leastsquares optimal over all other discretizations (in a certain sense). This
is conﬁrmed in an extended version of the numerical LIBORinarrears test of Hunter
et al. (2001). Viewed as a multi step scheme, theoretical results show that Brownian
bridge converges weakly with order one. Moreover, we show that a mild assumption on
volatility, named separability, in combination with a single time step scheme, yields much
more eﬃcient pricing on a grid or recombining lattice, instead of Monte Carlo simulation.
An important result is given in Chapter 2. Empirical tests highlight the eﬀect of
various popular calibration choices on the quality of risk sensitivity estimates of Bermudan
swaptions priced with the LIBOR market model. Socalled timehomogeneous choices lead
to poor and unstable estimates of risk, whereas the socalled constant volatility choice leads
to stable and eﬃcient estimates. The results are important to practitioners that need to
219
191
choose a calibration method for market models, with the aim to risk manage Bermudan
swaptions and other interest rate derivatives.
220
221
Nederlandse samenvatting
(Summary in Dutch)
Waarderingsmodellen voor Bermudastijl rente derivaten
Bermudastijl rente derivaten vormen een belangrijke klasse van opties. Veel bancaire en
verzekeringsproducten, zoals hypotheken, vervroegd aﬂosbare obligaties, en levensverzek
eringen, bevatten Bermuda rente opties, die een gevolg zijn van de mogelijkheid tot
vervroegde terugbetaling of stopzetting van het contract. Het veel voorkomen van deze
opties maakt duidelijk dat het belangrijk is, voor banken en verzekeraars, om de waarde
en risico van deze producten op de juiste manier in te schatten. Het juist inschatten
van het risico maakt het mogelijk om markt risico af te dekken met onderliggende en
regelmatig verhandelde waardes en opties. Waarderingsmodellen moeten arbitragevrij
zijn, en dienen consistent te zijn met (gekalibreerd te zijn aan) prijzen van actief ver
handelde onderliggende opties. De dynamica van de modellen moet overeen komen
met de geobserveerde dynamica van de rentetermijnstructuur, zoals bijvoorbeeld cor
relatie tussen rentestanden. Bovendien moeten waarderingsalgoritmes eﬃci¨ent zijn: Fi
nanci¨ele beslissingen gebaseerd op derivaten waarderingsberekeningen worden veeleer bin
nen enkele seconden genomen, dan binnen uren of dagen. In recente jaren is een succesvolle
klasse van modellen naar voren gekomen, genaamd markt modellen. Dit proefschrift, on
der begeleiding van Antoon Pelsser en Ton Vorst, breidt de theorie van markt modellen
uit, door: (i ) een nieuwe, eﬃci¨ente en meer nauwkeurige benaderende waarderingstech
niek te introduceren, (ii ) twee nieuwe en snelle algoritmes voor correlatiekalibratie te
presenteren, (iii ) nieuwe modellen te ontwikkelen die een eﬃci¨ente kalibratie toestaan
voor een hele nieuwe klasse van derivaten, zoals vastelooptijd Bermuda rente opties, en
(iv) nieuwe empirische vergelijkingen te presenteren van bestaande kalibratie technieken
en modellen, in termen van reductie van risico.
222
223
Bibliography
Abraham, R., Marsden, J. E. & Ratiu, T. (1988), Manifolds, Tensor Analysis, and Ap
plications, SpringerVerlag, Berlin.
AlBaali, M. (1985), ‘Descent property and global convergence of the FletcherReeves
method with inexact line search’, IMA Journal of Numerical Analysis 5, 121–124.
Andersen, L. & Andreasen, J. (2000), ‘Volatility skews and extensions of the LIBOR
market model’, Applied Mathematical Finance 7(1), 1–32.
Andersen, L. & Andreasen, J. (2001), ‘Factor dependence of bermudan swaptions: fact
or ﬁction?’, Journal of Financial Economics 62(1), 3–37.
Apostol, T. M. (1967), Calculus, Vol. 1, 2 edn, John Wiley & Sons, Chichester.
Avellaneda, M. & Gamba, R. (2001), Conquering the Greeks in Monte Carlo: Eﬃcient
calculation of the market sensitivities and hedgeratios of ﬁnancial assets by direct
numerical simulation, in ‘Proceedings of the First Bachelier Congress’, Paris.
Avellaneda, M., Holmes, R., Friedman, C. & Samperi, D. (1997), ‘Calibration of volatility
surfaces via relativeentropy minimization’, Applied Mathematical Finance 4(1), 37–
64.
Avramidis, A. N. & Matzinger, H. (2004), ‘Convergence of the stochastic mesh estimator
for pricing Bermudan options’, Journal of Computational Finance 7(4), 73–91.
Baxter, M. W. & Rennie, A. J. O. (1996), Financial calculus: An introduction to derivative
pricing, Cambridge University Press, Cambridge.
Bennett, M. N. & Kennedy, J. E. (2004), A comparison of Markovfunctional
and market models: The onedimensional case, www2.warwick.ac.uk/fac/sci/
statistics/staff/research students/mbennett/.
Berridge, S. & Schumacher, J. M. (2003), ‘An irregular grid method for high
dimensional freeboundary problems in ﬁnance’, Future Generation Computer Sys
tems 20(3), 353–362.
224
196 BIBLIOGRAPHY
Bj¨ork, T. (2004), Arbitrage Theory in Continuous Time, 2 edn, Oxford University Press,
Oxford.
Bj¨ork, T., Land´en, C. & Svensson, L. (2004), ‘Finitedimensional Markovian realiza
tions for stochastic volatility forwardrate models’, Proceedings of the Royal Society
460(2041), 53–83. Series A.
Black, F. (1976), ‘The pricing of commodity contracts’, Journal of Financial Economics
3(2), 167–179.
Black, F. & Karasinski, P. (1991), ‘Bond and option pricing when short rates are lognor
mal’, Financial Analysts Journal 47(4), 52–59.
Black, F. & Scholes, M. (1973), ‘The pricing of options corporate liabilities’, Journal of
Political Economy 81(3), 637–654.
Black, F., Derman, E. & Toy, W. (1990), ‘A onefactor model of interest rates and its
applications to treasury and bond options’, Financial Analysts Journal 46(1), 33–39.
Borg, I. & Groenen, P. J. F. (1997), Modern Multidimensional Scaling, SpringerVerlag,
Berlin.
Brace, A. & Womersley, R. S. (2000), Exact ﬁt to the swaption volatility matrix using
semideﬁnite programming, presented at the ICBI Global Derivatives Conference,
Paris.
Brace, A., Dun, T. & Barton, G. (1998), Towards a central interest rate model, presented
at the ICBI Global Derivatives Conference, Paris.
Brace, A., G¸ atarek, D. & Musiela, M. (1997), ‘The market model of interest rate dynam
ics’, Mathematical Finance 7(2), 127–155.
Brennan, M. J. & Schwartz, E. S. (1979), ‘A continuous time approach to the pricing of
bonds’, Journal of Banking and Finance 3(2), 133–155.
Brigo, D. (2002), A note on correlation and rank reduction, www.damianobrigo.it.
Brigo, D. & Mercurio, F. (2001), Interest Rate Models: Theory and Practice, Springer
Verlag, Berlin.
Broadie, M. & Glasserman, P. (1996), ‘Estimating security price derivatives using simu
lation’, Management Science 42(2), 269–285.
Broadie, M. & Glasserman, P. (2004), ‘A stochastic mesh method for pricing high
dimensional American options’, Journal of Computational Finance 7(4), 35–72.
225
BIBLIOGRAPHY 197
Cairns, A. J. G. (2004), Interest Rate Models: An Introduction, Princeton University
Press, New Jersey.
Choy, B., Dun, T. & Schl¨ogl, E. (2004), ‘Correlating market models’, Risk Magazine
pp. 124–129. September.
Chu, M. T., Funderlic, R. E. & Plemmons, R. J. (2003), ‘Structured low rank approxi
mation’, Linear Algebra and its Applications 366, 157–172.
Cox, J. C. & Ross, S. A. (1976), ‘The valuation of options for alternative stochastic
processes’, Journal of Financial Economics 3(2), 145–166.
Cox, J. C., Ingersoll, J. E. & Ross, S. A. (1985), ‘A theory of the term structure of interest
rates’, Econometrica 53(2), 385–408.
Dai, Q. & Singleton, K. (2003), ‘Term structure dynamics in theory and reality’, Review
of Financial Studies 16(3), 631–678.
D’Aspremont, A. (2002), Calibration and RiskManagement Methods for the Libor Mar
ket Model Using Semideﬁnite Programming, PhD thesis, Ecole Polytechnique, Paris.
D’Aspremont, A. (2003), ‘Interest rate model calibration using semideﬁnite programming’,
Applied Mathematical Finance 10(3), 183–213.
Davies, P. I. & Higham, N. J. (2000), ‘Numerically stable generation of correlation ma
trices and their factors’, BIT 40, 640–651.
De Jong, F., Driessen, J. & Pelsser, A. A. J. (2004), ‘On the information in the interest
rate term structure and option prices’, Review of Derivatives Research 7(2), 99–127.
De Leeuw, J. & Heiser, W. J. (1977), Convergence of correctionmatrix algorithms for
multidimensional scaling, in J. C. Lingoes, E. E. Roskam & I. Borg, eds, ‘Geometric
representations of relational data’, Mathesis Press, Ann Arbor, MI, pp. 735–752.
Dedieu, J.P., Priouret, P. & Malajovich, G. (2003), ‘Newton’s method on Riemannian
manifolds: Covariant alphatheory’, IMA Journal of Numerical Analysis 23(3), 395–
419.
Depczynski, U. & St¨ockler, J. (1998), A diﬀerential geometric approach to equidistributed
knots on Riemannian manifolds, in C. K. Chui & L. L. Schumaker, eds, ‘Approxima
tion Theory IX, Theoretical Aspects’, Vol. 1, Vanderbilt University Press, Nashville,
TN, pp. 99–106.
do Carmo, M. P. (1992), Riemannian Geometry, 12 edn, Birkh¨auser, Boston, MA.
226
198 BIBLIOGRAPHY
Dolan, E. D. & Mor´e, J. J. (2002), ‘Benchmarking optimization software with performance
proﬁles’, Mathematical Programming, Series A 91(2), 201–213.
Dothan, L. U. (1978), ‘On the term structure of interest rates’, Journal of Financial
Economics 6(1), 59–69.
Driessen, J., Klaassen, P. & Melenberg, B. (2003), ‘The performance of multifactor term
structure model for pricing and hedging caps and swaptions’, Journal of Financial
abd Quantitative Analysis 38(3), 635–672.
Duistermaat, J. J. & Kolk, J. A. C. (2000), Lie Groups, SpringerVerlag, Berlin.
Dykstra, R. L. (1983), ‘An algorithm for restricted least squares regression’, Journal of
the American Statistical Association 87(384), 837–842.
Edelman, A. & Lippert, R. (2000), Nonlinear eigenvalue problems with orthogonality
constraints (section 8.3), in Z. Bai, J. Demmel, J. Dongarra, A. Ruhe & H. van der
Vorst, eds, ‘Templates for the Solution of Algebraic Eigenvalue Problems: A Practical
Guide’, SIAM, Philidelphia.
Edelman, A., Arias, T. A. & Smith, S. T. (1999), ‘The geometry of algorithms with
orthogonality constraints’, SIAM Journal of Matrix Analysis and its Applications
20(2), 303–353.
Fan, R., Gupta, A. & Ritchken, P. (2003), ‘Hedging in the possible presence of un
spanned stochastic volatility: Evidence from swaption markets’, Journal of Finance
58(5), 2219–2248.
Fletcher, R. & Reeves, C. M. (1964), ‘Function minimization by conjugate gradients’,
Computer Journal 7(2), 149–154.
Flury, B. (1988), Common Principal Components and Related Multivariate Models, J.
Wiley & Sons, New York.
Fourni´e, E., Lasry, J.M., Lebuchoux, J., Lions, P.L. & Touzi, N. (1999), ‘Applications
of Malliavin calculus to Monte Carlo methods in ﬁnance’, Finance and Stochastics
3(4), 391–412.
Galluccio, S. & Hunter, C. J. (2004), ‘The coinitial swap market model’, Economic Notes
33(2), 209–232.
Galluccio, S., Huang, Z., Ly, J.M. & Scaillet, O. (2004), Theory and calibration of swap
market models, Working Paper, June Version.
227
BIBLIOGRAPHY 199
Gilbert, J.C. & Nocedal, J. (1992), ‘Global convergence properties of conjugate gradient
methods for optimization’, SIAM Journal on Optimization 2(1), 21–42.
Glasserman, P. (2004), Monte Carlo Methods in Financial Engineering, SpringerVerlag,
Berlin.
Glasserman, P. & Merener, N. (2003a), ‘Cap and swaption approximations in LIBOR
market models with jumps’, Journal of Computational Finance 7(1), 1–36.
Glasserman, P. & Merener, N. (2003b), ‘Numerical solution of jumpdiﬀusion LIBOR
market models’, Finance and Stochastics 7(1), 1–27.
Glasserman, P. & Merener, N. (2004), ‘Convergence of a discretization scheme for jump
diﬀusion processes with statedependent intensities’, Proceedings of the Royal Society
460(2041), 111–127. Series A.
Glasserman, P. & Zhao, X. (1999), ‘Fast Greeks by simulation in forward LIBOR models’,
Journal of Computational Finance 3(1), 5–39.
Glasserman, P. & Zhao, X. (2000), ‘Arbitragefree discretization of lognormal forward
LIBOR and swap rate models’, Finance and Stochastics 4(1), 35–68.
Glunt, W., Hayden, T. L., Hong, S. & Wells, J. (1990), ‘An alternating projection algo
rithm for computing the nearest Euclidean distance matrix’, SIAM Journal of Matrix
Analysis and its Applications 11(4), 589–600.
Golub, G. H. & van Loan, C. F. (1996), Matrix Computations, 3 edn, John Hopkins
University Press, Baltimore, MD.
Grubiˇsi´c, I. (2002), Interest rate theory: BGM model, Master’s thesis, Mathematical
Institute, Leiden University. www.math.uu.nl/people/grubisic.
Grubiˇsi´c, I. & Pietersz, R. (2005), Eﬃcient rank reduction of correlation matrices,
www.few.eur.nl/few/people/pietersz.
Han, S.P. (1988), ‘A successive projection method’, Mathematical Programming 40, 1–14.
Hayden, T. L. & Wells, J. (1988), ‘Approximation by matrices positive semideﬁnite on a
subspace’, Linear Algebra and its Applications 109, 115–130.
Heath, D., Jarrow, R. & Morton, A. (1992), ‘Bond pricing and the term structure of
interest rates: A new methodology for contingent claims valuation’, Econometrica
60(1), 77–105.
228
200 BIBLIOGRAPHY
Heiser, W. J. (1995), Convergent computation by iterative majorization: Theory and
applications in multidimensional data analysis, in W. J. Krzanowski, ed., ‘Recent
Advances in Descriptive Multivariate Analysis’, Oxford University Press, Oxford,
pp. 157–189.
Higham, N. J. (2002), ‘Computing the nearest correlation matrix–a problem from ﬁnance’,
IMA Journal of Numerical Analysis 22(3), 329–343.
Ho, T. S. Y. & Lee, S.B. (1986), ‘Term structure movements and pricing interest rate
contingent claims’, Journal of Finance 41(5), 1011–1029.
Horn, R. A. & Johnson, C. R. (1990), Matrix Analysis, Cambridge University Press,
Cambridge.
Hughston, L. P. & Rafailidis, A. (2005), ‘A chaotic approach to interest rate modelling’,
Finance and Stochastics 9(1), 43–65.
Hull, J. C. (2000), Options, Futures, and Other Derivatives, 4 edn, PrenticeHall, London.
Hull, J. C. & White, A. (1990), ‘Pricing interestratederivative securities’, Review of
Financial Studies 3(4), 573–592.
Hull, J. C. & White, A. (2000), ‘Forward rate volatilities, swap rate volatilities, and
implementation of the LIBOR market model’, Journal of Fixed Income 10(2), 46–
62.
Hunt, P. & Kennedy, J. E. (2000), Financial Derivatives in Theory and Practice, John
Wiley & Sons, Chichester.
Hunt, P., Kennedy, J. E. & Pelsser, A. A. J. (2000), ‘Markovfunctional interest rate
models’, Finance and Stochastics 4(4), 391–408.
Hunter, C. J., J¨ackel, P. & Joshi, M. S. (2001), ‘Getting the drift’, Risk Magazine. July.
J¨ackel, P. (2002), Monte Carlo Methods in Finance, J. Wiley & Sons, Chichester.
J¨ackel, P. & Rebonato, R. (2003), ‘The link between caplet and swaption volatilities in a
BraceG¸ atarekMusiela/Jamshidian framework: approximate solutions and empirical
evidence’, Journal of Computational Finance 6(4), 41–59.
Jamshidian, F. (1989), ‘An exact bond option formula’, Journal of Finance 44(1), 205–
209.
Jamshidian, F. (1997), ‘LIBOR and swap market models and measures’, Finance and
Stochastics 1(4), 293–330.
229
BIBLIOGRAPHY 201
Jamshidian, F. (2003), Minimax optimality of Bermudan and American claims and their
MonteCarlo upper bound approximation, NIB Capital Bank, working paper.
Joshi, M. S. (2003a), The Concepts and Practice of Mathematical Finance, Cambridge
University Press, Cambridge.
Joshi, M. S. (2003b), ‘Rapid computation of drifts in a reduced factor LIBOR market
model’, Wilmott Magazine 5, 84–85.
Joshi, M. S. & Theis, J. (2002), ‘Bounding Bermudan swaptions in a swaprate market
model’, Quantitative Finance 2(5), 370–377.
Karatzas, I. & Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, 2 edn,
SpringerVerlag, Berlin.
Kerkhof, J. & Pelsser, A. A. J. (2002), ‘Observational equivalence of discrete string models
and market models’, Journal of Derivatives 10(1), 55–61.
Kiers, H. A. L. (2002), ‘Setting up alternating least squares and iterative majorization al
gorithms for solving various matrix optimization problems’, Computational Statistics
and Data Analysis 41, 157–170.
Kiers, H. A. L. & Groenen, P. J. F. (1996), ‘A monontonically convergent algorithm for
orthogonal congruence rotation’, Psychometrika 61, 375–389.
Kloeden, P. E. & Platen, E. (1999), Numerical Solution of Stochastic Diﬀerential Equa
tions, Vol. 23 of Applications of Mathematics, SpringerVerlag, Berlin.
Kurbanmuradov, O., Sabelfeld, K. & Schoenmakers, J. (2002), ‘Lognormal approxima
tions to LIBOR market models’, Journal of Computational Finance 6(1), 69–100.
Levenberg, K. (1944), ‘A method for the solution of certain nonlinear problems in least
squares’, Quarterly of Applied Mathematics 2, 164–168.
Longstaﬀ, F. A. & Schwartz, E. S. (1992), ‘Interest rate volatility and the term structure:
A twofactor general equilibrium model’, Journal of Finance 47(4), 1259–1282.
Longstaﬀ, F. A. & Schwartz, E. S. (2001), ‘Valuing American options by simulation: A
simple leastsquares approach’, Review of Financial Studies 14(1), 113–147.
Longstaﬀ, F. A., SantaClara, P. & Schwartz, E. S. (2001), ‘Throwing away a billion
dollars: the cost of suboptimal exercise strategies in the swaptions market’, Journal
of Financial Economics 62(1), 39–66.
230
202 BIBLIOGRAPHY
Marquardt, D. W. (1963), ‘An algorithm for leastsquares estimation of nonlinear param
eters’, Journal of the Society for Industrial and Applied Mathematics 11(2), 431–441.
Merton, R. C. (1973), ‘Theory of rational option pricing’, Bell Journal of Economics and
Management Science 4(1), 141–183.
Merton, R. C. (1976), ‘Option pricing when underlying stock returns are discontinuous’,
Journal of Financial Economics 3(1–2), 125–144.
Miltersen, K. R., Sandmann, K. & Sondermann, D. (1997), ‘Closed form solutions for term
structure derivatives with lognormal interest rates’, Journal of Finance 52(1), 409–
430.
Morini, M. & Webber, N. (2004), An EZI method to reduce the rank of a correlation
matrix, www.cass.city.ac.facfin/facultypages/nwebber/.
Munkres, J. R. (1975), Topology, PrenticeHall, London.
Musiela, M. & Rutkowski, M. (1997), ‘Continuoustime term structure models: Forward
measure approach’, Finance and Stochastics 1(4), 261–291.
Øksendal, B. K. (1998), Stochastic Diﬀerential Equations, 5 edn, SpringerVerlag, Berlin.
Ostrom, D. (1998), ‘Japanese interest rates enter negative territory’, Japan Economic
Institute Report (43B), 4–6. www.jei.org/archive/.
Ostrovsky, D. (2002), A Markovfunctional model consistent with caplet and swaption
smiles, Yale University Working Paper.
Pearson, N. D. & Sun, T.S. (1994), ‘Exploiting the conditional density in estimating the
term structure: An application to the Cox, Ingersoll, and Ross model’, Journal of
Finance 49(4), 1279–1304.
Pelsser, A. A. J. (2000), Eﬃcient Methods for Valuing Interest Rate Derivatives, Springer
Verlag, Berlin.
Pietersz, R. (2001), The LIBOR market model, Master’s thesis, Mathematical Institute,
Leiden University. www.math.leidenuniv.nl/scripties/pietersz.pdf.
Pietersz, R. & Groenen, P. J. F. (2004a), ‘A major LIBOR ﬁt’, Risk Magazine p. 102.
December issue.
Pietersz, R. & Groenen, P. J. F. (2004b), ‘Rank reduction of correlation matrices by
majorization’, Quantitative Finance 4(6), 649–662.
231
BIBLIOGRAPHY 203
Pietersz, R. & Pelsser, A. A. J. (2004a), ‘Riskmanaging Bermudan swaptions in a LIBOR
model’, Journal of Derivatives 11(3), 51–62.
Pietersz, R. & Pelsser, A. A. J. (2004b), ‘Swap vega in BGM: pitfalls and alternatives’,
Risk Magazine pp. 91–93. March issue.
Pietersz, R. & Pelsser, A. A. J. (2005a), A comparison of single factor markovfunctional
and multi factor market models, SSRN Working Paper.
Pietersz, R. & Pelsser, A. A. J. (2005b), Swap vega in BGM: pitfalls and alternatives, in
N. Dunbar, ed., ‘Derivatives Trading and Option Pricing’, Risk Books, London, UK,
pp. 277–285.
Pietersz, R. & van Regenmortel, M. (2005), Generic market models, SSRN Working Paper.
Pietersz, R., Pelsser, A. A. J. & van Regenmortel, M. (2004), ‘Fast driftapproximated
pricing in the BGM model’, Journal of Computational Finance 8(1), 93–124.
Pietersz, R., Pelsser, A. A. J. & van Regenmortel, M. (2005), ‘Bridging Brownian LIBOR’,
Wilmott Magazine 18, 98–103.
Piterbarg, V. V. (2004), ‘TARNs: Models, valuation, risk sensitivities’, Wilmott Magazine
14, 62–71.
Polak, E. & Ribi`ere, G. (1969), ‘Note sur la convergence de m´ethodes de directions con
jugu´ees’, Revue Fran¸caise d’Informatique et de Recherche Op´erationnelle 16, 35–43.
Rapisarda, F., Brigo, D. & Mercurio, F. (2002), Parametrizing correlations: A geometric
interpretation, Banca IMI Working Paper, www.fabiomercurio.it.
Rebonato, R. (1998), Interest Rate Option Models, 2 edn, J. Wiley & Sons, Chichester.
Rebonato, R. (1999a), ‘Calibrating the BGM model’, pp. 74–79. Risk Magazine.
Rebonato, R. (1999b), ‘On the simultaneous calibration of multifactor lognormal interest
rate models to Black volatilities and to the correlation matrix’, Journal of Compu
tational Finance 2(4), 5–27.
Rebonato, R. (1999c), Volatility and Correlation in the Pricing of Equity, FX and Interest
Rate Options, J. Wiley & Sons, Chichester.
Rebonato, R. (2001), Accurate and optimal calibration to coterminal European swaptions
in a FRAbased BGM framework, Royal Bank of Scotland Working Paper, London.
232
204 BIBLIOGRAPHY
Rebonato, R. (2002), Modern Pricing of InterestRate Derivatives, Princeton University
Press, New Jersey.
Rebonato, R. (2004a), ‘Interestrate termstructure pricing models: a review’, Proceedings
of the Royal Society London 460(2043), 667–728. Series A.
Rebonato, R. (2004b), Volatility and Correlation: The Perfect Hedger and the Fox, 2 edn,
J. Wiley & Sons, Chichester, UK.
Ritchken, P. & Sankarasubramanian, L. (1995), ‘Volatility structures of the forward rates
and the dynamics of the term structure’, Mathematical Finance 5(1), 55–72.
Rogers, L. C. G. (2002), ‘Monte Carlo valuation of American options’, Mathematical
Finance 12(3), 271–286.
Rubinstein, M. (1983), ‘Displaced diﬀusion option pricing’, Journal of Finance 38(1), 213–
217.
SantaClara, P. & Sornette, D. (2001), ‘The dynamics of the forward interest rate curve
with stochastic string shocks’, Review of Financial Studies 14(1), 149–185.
Schl¨ogl, E. (2002), ‘A multicurrency extension of the lognormal interest rate market mod
els’, Finance and Stochastics 6(2), 173–196.
Sharpe, W. F. (1964), ‘Capital asset prices: A theory of market equilibrium under condi
tions of risk’, Journal of Finance 19(3), 425–442.
Sidenius, J. (2000), ‘LIBOR market models in practice’, Journal of Computational Fi
nance 3(3), 5–26.
Smith, S. T. (1993), Geometric Optimization Methods for Adaptive Filtering, PhD thesis,
Harvard University, Cambridge, MA.
Suﬀridge, T. J. & Hayden, T. L. (1993), ‘Approximation by a Hermitian positive
semideﬁnite Toeplitz matrix’, SIAM Journal of Matrix Analysis and its Applications
14(3), 721–734.
Valdez, S. (1997), An Introduction to Global Financial Markets, 2 edn, MacMillan Press,
London.
Vasicek, O. (1977), ‘An equilibrium characterization of the term structure’, Journal of
Financial Economics 5(2), 177–188.
Weigel, P. (2004), ‘Optimal calibration of LIBOR market models to correlations’, Journal
of Derivatives 12(3).
233
BIBLIOGRAPHY 205
Williams, D. (1991), Probability with Martingales, Cambridge University Press, Cam
bridge.
Wilmott, P. (1998), Derivatives: The Theory and Practice of Financial Engineering, John
Wiley & Sons, Chichester.
Wu, L. (2003), ‘Fast atthemoney calibration of the LIBOR market model using Lagrange
multipliers’, Journal of Computational Finance 6(2), 39–77.
Zangwill, W. I. (1969), ‘Convergence conditions for nonlinear programming algorithms’,
Management Science (Theory Series) 16(1), 1–13.
Zhang, Z. & Wu, L. (2003), ‘Optimal lowrank approximation to a correlation matrix’,
Linear Algebra and its Applications 364, 161–187.
Zoutendijk, G. (1970), Nonlinear programming, computational methods, in J. Abadie,
ed., ‘Integer and nonlinear programming’, pp. 37–86.
234
235
Author index
Abraham, R. 92
AlBaali, M. 81
Andersen, L. 16, 134, 135, 164
Andreasen, J. 16, 134, 135, 164
Apostol, T. M. 112, 129
Arias, T. A. 83
Avellaneda, M. 144, 154
Avramidis, A. N. 17, 107
Barton, G. 19, 22, 29
Baxter, M. W. 3
Bennett, M. N. 137
Berridge, S. 17
Bj¨ork, T. 3, 101
Black, F. xxvi, 3, 11–15, 134, 164
Borg, I. 47
Brace, A. 15, 19, 22, 29, 40, 69, 97, 134,
161, 189
Brennan, M. J. 12
Brigo, D. 22, 46, 75, 99, 140, 184
Broadie, M. 17, 98, 107
Cairns, A. J. G. 14
Choy, B. 156
Chu, M. T. 47
Cox, J. C. 12, 134, 164
Dai, Q. 5, 69, 134
D’Aspremont, A. 15, 29
Davies, P. I. 57, 61–63
De Jong, F. 54, 101
De Leeuw, J. 47
Dedieu, J.P. 83
Depczynski, U. 70
Derman, E. 12, 134
do Carmo, M. P. 73, 76
Dolan, E. D. 55, 88
Dothan, L. U. 12, 134
Driessen, J. 54, 101, 135, 148
Duistermaat, J. J. 73, 93, 94
Dun, T. 19, 22, 29, 156
Dykstra, R. L. 45, 47
Edelman, A. 83, 87
Fan, R. 135
Fletcher, R. 81, 83, 87
Flury, B. 44
Fourni´e, E. 144
Friedman, C. 154
Funderlic, R. E. 47
Galluccio, S. 16, 164, 167
Gamba, R. 144
G¸ atarek, D. 15, 40, 69, 97, 134, 161, 189
Gilbert, J.C. 81
Glasserman, P. xxvi, 17, 21, 32, 98, 104,
107, 135, 143, 144
Glunt, W. 47
Golub, G. H. 62, 74, 173
Groenen, P. J. F. xxv, 39, 44, 47, 71, 88,
140, 184
Grubiˇsi´c, I. xxv, 44–46, 50, 56, 88, 140, 184
Gupta, A. 135
Han, S.P. 45, 47
Hayden, T. L. 47
236
208 AUTHOR INDEX
Heath, D. 101
Heiser, W. J. 47
Higham, N. J. 45, 47, 57, 61–63, 71, 85
Ho, T. S. Y. 12, 134
Holmes, R. 154
Hong, S. 47
Horn, R. A. 67
Huang, Z. 164
Hughston, L. P. 101
Hull, J. C. 3, 12, 13, 19, 22, 29, 44, 134,
139, 164, 185
Hunt, P. xxvi, 3, 16, 102, 134, 137, 141,
142, 154, 189
Hunter, C. J. xxv, 16, 97, 103, 108, 127,
137, 164, 167, 190
Ingersoll, J. E. 12, 134
J¨ackel, P. xxv, 97, 103, 108, 127, 137, 164,
190
Jamshidian, F. 14–17, 40, 69, 97, 134, 154,
161, 189
Jarrow, R. 101
Johnson, C. R. 67
Joshi, M. S. xxv, 4, 30, 69, 97, 103, 108,
127, 137, 149, 164, 180, 190
Karasinski, P. 12, 134
Karatzas, I. 4, 127, 128
Kennedy, J. E. xxvi, 3, 16, 102, 134, 137,
141, 142, 154, 189
Kerkhof, J. 15
Kiers, H. A. L. 47
Klaassen, P. 135, 148
Kloeden, P. E. 103, 104, 110–112
Kolk, J. A. C. 73, 93, 94
Kurbanmuradov, O. 97
Land´en, C. 101
Lasry, J.M. 144
Lebuchoux, J. 144
Lee, S.B. 12, 134
Levenberg, K. 83, 87
Lions, P.L. 144
Lippert, R. 87
Longstaﬀ, F. A. xxvi, 14, 15, 17, 23, 98,
107, 117, 134, 135, 143, 147, 181, 189
Ly, J.M. 164
Malajovich, G. 83
Marquardt, D. W. 83, 87
Marsden, J. E. 92
Matzinger, H. 17, 107
Melenberg, B. 135, 148
Mercurio, F. 22, 46, 75, 99
Merener, N. 104
Merton, R. C. xxvi, 2, 12
Miltersen, K. R. 15, 40, 69, 97, 134, 161,
189
Mor´e, J. J. 55, 88
Morini, M. 45
Morton, A. 101
Munkres, J. R. 92
Musiela, M. 15, 40, 69, 97, 99, 134, 161,
166, 189
Nocedal, J. 81
Øksendal, B. K. 3, 4
Ostrom, D. 172
Ostrovsky, D. 138
Pearson, N. D. 12
Pelsser, A. A. J. xxiii, xxv, xxvi, 15, 16,
19, 54, 97, 99, 101, 102, 134, 135, 137,
140–142, 154, 156, 164, 189
Pietersz, R. xxiii, xxv–xxvii, 15, 19, 39, 44,
46, 50, 56, 71, 88, 97, 135, 137, 140, 149,
156, 164, 184
Piterbarg, V. V. 136
237
AUTHOR INDEX 209
Platen, E. 103, 104, 110–112
Plemmons, R. J. 47
Polak, E. 81, 83, 87
Priouret, P. 83
Rafailidis, A. 101
Rapisarda, F. 46, 75
Ratiu, T. 92
Rebonato, R. 10, 13, 22, 23, 29, 42, 46, 58,
70, 71, 88, 139, 140, 164, 165, 181, 184
Reeves, C. M. 81, 83, 87
Rennie, A. J. O. 3
Ribi`ere, G. 81, 83, 87
Ritchken, P. 14, 101, 134, 135
Rogers, L. C. G. 17
Ross, S. A. 12, 134, 164
Rubinstein, M. 152, 164, 172
Rutkowski, M. 69, 99, 161, 166
Sabelfeld, K. 97
Samperi, D. 154
Sandmann, K. 15, 40, 69, 97, 134, 161, 189
Sankarasubramanian, L. 14, 101, 134
SantaClara, P. 15, 134
Scaillet, O. 164
Schl¨ogl, E. 156, 164
Schoenmakers, J. 97
Scholes, M. xxvi, 3, 13–15
Schumacher, J. M. 17
Schwartz, E. S. xxvi, 12, 14, 15, 17, 23, 98,
107, 117, 134, 135, 143, 147, 181, 189
Sharpe, W. F. 3
Shreve, S. E. 4, 127, 128
Sidenius, J. 44
Singleton, K. 5, 69, 134
Smith, S. T. 71, 81, 83
Sondermann, D. 15, 40, 69, 97, 134, 161,
189
Sornette, D. 15
St¨ockler, J. 70
Suﬀridge, T. J. 47
Sun, T.S. 12
Svensson, L. 101
Theis, J. 30, 164
Touzi, N. 144
Toy, W. 12, 134
Valdez, S. 8
van Loan, C. F. 62, 74, 173
van Regenmortel, M. xxv, xxvii, 97, 137,
149, 164
Vasicek, O. 12, 134
Webber, N. 45
Weigel, P. 45
Wells, J. 47
White, A. 12, 19, 22, 29, 44, 134, 139, 164,
185
Williams, D. 105, 128
Wilmott, P. 123
Womersley, R. S. 15
Wu, L. xxv, 46, 56, 69, 86–88, 95, 96, 140,
184
Zangwill, W. I. 51
Zhang, Z. xxv, 46, 56, 69, 86–88, 96
Zhao, X. 21, 32, 98, 144
Zoutendijk, G. 81
238
239
Curriculum Vitae
Raoul Pietersz was born on 12 June 1978 in Rotterdam, The Netherlands. In 2000, he ob
tained a Certiﬁcate of Advanced Studies in Mathematics (Mathematical Tripos Part III),
with distinction, from the University of Cambridge. Over the academic year 19992000,
he was awarded a title of Cambridge European Trust Scholar, and a retrospective title
of Scholar at Peterhouse, Cambridge. In the summer of 2000, he completed internships
at UBS Warburg and Dresdner Kleinwort Wasserstein, in London. In 2001, he obtained
a ﬁrst class M.Sc. degree in Mathematics from Leiden University. His Master’s thesis
entitled “The LIBOR market model”was completed during an internship at ABN AMRO
Bank, in Amsterdam. Over the period 19972001, he was awarded the Shell International
Scholarship for undergraduate studies. His Ph.D. research, under supervision of Antoon
Pelsser and Ton Vorst, focuses on the eﬃcient valuation and risk management of inter
est rate derivatives. He has published articles in The Journal of Computational Finance,
The Journal of Derivatives, Quantitative Finance, Risk Magazine and Wilmott Magazine.
He has presented his research at various international conferences. His teaching experi
ence includes lecturing taught Master courses on derivatives at the Rotterdam School of
Management. Since the start of the Ph.D. period, he has held a parttime position at
ABN AMRO Bank, initially at Quantitative Risk Analytics, Risk Management. Since
July 2004, he is a Senior Derivatives Researcher, developing frontoﬃce pricing models
for interest rate derivatives, at Product Development Group, Quantitative Analytics, as
part of Structured Derivatives.
240
241
Erasmus Research Institute of Management
ERIM Ph.D. Series Resesearch in Management
Appelman, J.H., Governance of Global Interorganizational Tourism Networks: Changing Forms
of Coordination between the Travel Agency and Aviation Sector, Promotors: Prof. dr. F.M.
Go & Prof. dr. B. Nooteboom
EPS2004036MKT, ISBN: 9058920607, http://hdl.handle.net/1765/1199
Assen, M.F. van, Empirical Studies in Discrete Parts Manufacturing Management, Promotors:
Prof. dr. S.L. van de Velde & Prof. dr. W.H.M. Zijm
EPS2005056LIS, ISBN: 9058920852, http://hdl.handle.net/1765/6767
Berens, G., Corporate Branding: The Development of Corporate Associations and their In
ﬂuence on Stakeholder Reactions, Promotor: Prof. dr. C. B. M. van Riel
EPS2004039ORG, ISBN: 9058920658, http://hdl.handle.net/1765/1273
Berghe, D.A.F., Working Across Borders: Multinational Enterprises and the Internationaliza
tion of Employment, Promotors: Prof. dr. R.J.M. van Tulder & Prof. dr. E.J.J. Schenk
EPS2003029ORG, ISBN: 9058920534, http://hdl.handle.net/1765/1041
Bijman, W.J.J., Essays on Agricultural Cooperatives: Governance Structure in Fruit and Veg
etable Chains, Promotor: Prof. dr. G.W.J. Hendrikse
EPS2002015ORG, ISBN: 9058920240, http://hdl.handle.net/1765/867
Boer, N.I., Knowledge Sharing within Organizations: A situated and relational Perspective,
Promotor: Prof. dr. K. Kumar
EPS2005060LIS, ISBN: 9058920860, http://hdl.handle.net/1765/6770
Boer, C.A., Distributed Simulation in Industry, Promotors: Prof. dr. A. de Bruin & Prof.
dr. eng. A. Verbraeck
EPS2005065LIS, ISBN: 9058920933, http://hdl.handle.net/1765/6925
Brito, M.P. de, Managing Reverse Logistics or Reversing Logistics Management?, Promotors:
Prof. dr. eng. R. Dekker & Prof. dr. M. B. M. de Koster
EPS2004035LIS, ISBN: 9058920585, http://hdl.handle.net/1765/1132
Brohm, R., Polycentric Order in Organizations: a dialogue between Michael Polanyi and IT
242
consultants on knowledge, morality, and organization, Promotors: Prof. dr. G.W.J. Hendrikse
& Prof. dr. H.K. Letiche
EPS2004063ORG, ISBN: 905892095X, http://hdl.handle.net/1765/6911
Campbell, R.A.J., Rethinking Risk in International Financial Markets, Promotor: Prof. dr.
C.G. Koedijk
EPS2001005F&A, ISBN: 9058920089, http://hdl.handle.net/1765/306
Chen, Y., Labour Flexibility in China’s Companies: An Empirical Study, Promotors: Prof.
dr. A. Buitendam & Prof. dr. B. Krug
EPS2001006ORG, ISBN: 9058920127, http://hdl.handle.net/1765/307
Daniˇsevsk´a, P., Empirical Studies on Financial Intermediation and Corporate Policies, Pro
motor: Prof. dr. C.G. Koedijk
EPS2004044F&A, ISBN: 9058920704, http://hdl.handle.net/1765/1518
DelporteVermeiren, D.J.E., Improving the Flexibility and Proﬁtability of ICTenabled Busi
ness Networks: An Assessment Method and Tool, Promotors: Prof.mr. dr. P.H.M. Vervest &
Prof. dr. eng. H.W.G.M. van Heck
EPS2003020LIS, ISBN: 9058920402, http://hdl.handle.net/1765/359
Dijksterhuis, M., Organizational Dynamics of Cognition and Action in the Changing Dutch
and US Banking Industries, Promotors: Prof. dr. eng. F.A.J. van den Bosch & Prof. dr. H.W.
Volberda
EPS2003026STR, ISBN: 9058920488, http://hdl.handle.net/1765/1037
Fenema, P.C. van, Coordination and Control of Globally Distributed Software Projects, Promo
tor: Prof. dr. K. Kumar
EPS2002019LIS, ISBN: 9058920305, http://hdl.handle.net/1765/360
Fleischmann, M., Quantitative Models for Reverse Logistics, Promoters: Prof. dr. eng. J.A.E.E.
van Nunen & Prof. dr. eng. R. Dekker
EPS2000002LIS, ISBN: 3540417117, http://hdl.handle.net/1765/1044
Flier, B., Strategic Renewal of European Financial Incumbents: Coevolution of Environmental
Selection, Institutional Eﬀects, and Managerial Intentionality, Promotors: Prof. dr. eng. F.A.J.
van den Bosch & Prof. dr. H.W. Volberda
EPS2003033STR, ISBN: 9058920550, http://hdl.handle.net/1765/1071
Fok, D., Advanced Econometric Marketing Models, Promotor: Prof. dr. P.H.B.F. Franses
EPS2003027MKT, ISBN: 9058920496, http://hdl.handle.net/1765/1035
243
Ganzaroli , A., Creating Trust between Local and Global Systems, Promotors: Prof. dr. K.
Kumar & Prof. dr. R.M. Lee
EPS2002018LIS, ISBN: 9058920313, http://hdl.handle.net/1765/361
Gilsing, V.A., Exploration, Exploitation and Coevolution in Innovation Networks, Promotors:
Prof. dr. B. Nooteboom & Prof. dr. J.P.M. Groenewegen
EPS2003032ORG, ISBN 9058920542, http://hdl.handle.net/1765/1040
Graaf, G. de, Tractable Morality: Customer Discourses of Bankers, Veterinarians and Char
ity Workers, Promotors: Prof. dr. F. Leijnse & Prof. dr. T. van Willigenburg
EPS2003031ORG, ISBN: 9058920518, http://hdl.handle.net/1765/1038
Hartigh, E. den, Increasing Returns and Firm Performance: An Empirical Study, Promotor:
Prof. dr. H.R. Commandeur
EPS2005067STR, ISBN: 9058920984, http://hdl.handle.net/1765
Hermans. J.M., ICT in Information Services, Use and deployment of the Dutch securities trade,
18601970. Promotor: Prof. dr. drs. F.H.A. Janszen
EPS2004046ORG, ISBN 9058920720, http://hdl.handle.net/1765/1793
Heugens, P.M.A.R., Strategic Issues Management: Implications for Corporate Performance,
Promotors: Prof. dr. eng. F.A.J. van den Bosch & Prof. dr. C.B.M. van Riel
EPS2001007STR, ISBN: 9058920097, http://hdl.handle.net/1765/358
Hooghiemstra, R., The Construction of Reality, Promotors: Prof. dr. L.G. van der Tas RA
& Prof. dr. A.Th.H. Pruyn
EPS2003025F&A, ISBN: 905892047X, http://hdl.handle.net/1765/871
Jansen, J.J.P., Ambidextrous Organizations, Promotors: Prof. dr. eng. F.A.J. Van den Bosch
& Prof. dr. H.W. Volberda
EPS2005055STR, ISBN 905892081X
Jong, C. de, Dealing with Derivatives: Studies on the Role, Informational Content and Pricing
of Financial Derivatives, Promotor: Prof. dr. C.G. Koedijk
EPS2003023F&A, ISBN: 9058920437, http://hdl.handle.net/1765/1043
Keizer, A.B., The Changing Logic of Japanese Employment Practices: A FirmLevel Analy
sis of Four Industries Promotors: Prof. dr. J.A. Stam & Prof. dr. J.P.M. Groenewegen
EPS2005057ORG, ISBN: 9058920879, http://hdl.handle.net/1765/6667
Kippers, J., Empirical Studies on Cash Payments, Promotor: Prof. dr. Ph.H.B.F. Franses
244
EPS2004043F&A, ISBN 9058920690, http://hdl.handle.net/1765/1520
Koppius, O.R., Information Architecture and Electronic Market Performance, Promotors: Prof.
dr. P.H.M. Vervest & Prof. dr. eng. H.W.G.M. van Heck
EPS2002013LIS, ISBN: 9058920232, http://hdl.handle.net/1765/921
Kotlarsky, J., Management of Globally Distributed ComponentBased Software Development
Projects, Promotor: Prof. dr. K. Kumar
EPS2005059LIS, ISBN: 9058920887, http://hdl.handle.net/1765/6772
Kuilman, J., The reemergence of foreign banks in Shanghai: An ecological analysis, Promotor:
Prof. dr. B. Krug
EPS2005066ORG, ISBN: 9058920968, http://hdl.handle.net/1765/6926
Langen, P.W. de, The Performance of Seaport Clusters: A Framework to Analyze Cluster
Performance and an Application to the Seaport Clusters of Durban, Rotterdam and the Lower
Mississippi, Promotors: Prof. dr. B. Nooteboom & Prof. drs. H.W.H. Welters
EPS2004034LIS, ISBN: 9058920569, http://hdl.handle.net/1765/1133
Le Anh, T., Intelligent Control of VehicleBased Internal Transport Systems, Promotors: Prof.
dr. M.B.M. de Koster & Prof. dr. eng. R. Dekker
EPS2005051LIS, ISBN 9058920798, http://hdl.handle.net/1765/6554
LeDuc, T., Design and control of eﬃcient order picking processes, Promotor: Prof. dr. M.B.M.
de Koster
EPS2005064LIS, ISBN 9058920941, http://hdl.handle.net/1765/6910
Liang, G., New Competition: Foreign Direct Investment And Industrial Development In China,
Promotor: Prof. dr. R.J.M. van Tulder
EPS2004047ORG, ISBN 9058920739, http://hdl.handle.net/1765/1795
Loef, J., Incongruity between Ads and Consumer Expectations of Advertising, Promotors: Prof.
dr. W.F. van Raaij & Prof. dr. G. Antonides
EPS2002017MKT, ISBN: 9058920283, http://hdl.handle.net/1765/869
Maeseneire, W., de, Essays on Firm Valuation and Value Appropriation, Promotor: Prof. dr.
J.T.J. Smit
EPS2005053F&A, ISBN 9058920828
Mandele, L.M., van der, Leadership and the Inﬂection Point: A Longitudinal Perspective, Pro
motors: Prof. dr. H.W. Volberda, Prof. dr. H.R. Commandeur
245
EPS2004042STR, ISBN 9058920674, http://hdl.handle.net/1765/1302
Meer, J.R. van der, Operational Control of Internal Transport, Promotors: Prof. dr. M.B.M.
de Koster & Prof. dr. eng. R. Dekker
EPS2000001LIS, ISBN: 9058920046, http://hdl.handle.net/1765/859
Miltenburg, P.R., Eﬀects of Modular Sourcing on Manufacturing Flexibility in the Automo
tive Industry: A Study among German OEMs, Promotors: Prof. dr. J. Paauwe & Prof. dr.
H.R. Commandeur
EPS2003030ORG, ISBN 9058920526, http://hdl.handle.net/1765/1039
Moerman, G.A., Empirical Asset Pricing and Banking in the Euro Area, Promotors: Prof.
dr. C.G. Koedijk
EPS2005058F&A, ISBN: 9058920909, http://hdl.handle.net/1765/6666
Mol, M.M., Outsourcing, Supplierrelations and Internationalisation: Global Source Strategy
as a Chinese Puzzle, Promotor: Prof. dr. R.J.M. van Tulder
EPS2001010ORG, ISBN: 9058920143, http://hdl.handle.net/1765/355
Mulder, A., Government Dilemmas in the Private Provision of Public Goods, Promotor: Prof.
dr. R.J.M. van Tulder
EPS2004045ORG, ISBN: 9058920712, http://hdl.handle.net/1765
Muller, A.R., The Rise of Regionalism: Core Company Strategies Under The Second Wave
of Integration, Promotor: Prof. dr. R.J.M. van Tulder
EPS2004038ORG, ISBN 9058920623, http://hdl.handle.net/1765/1272
Oosterhout, J. van, The Quest for Legitimacy: On Authority and Responsibility in Gover
nance, Promotors: Prof. dr. T. van Willigenburg & Prof.mr. H.R. van Gunsteren
EPS2002012ORG, ISBN: 9058920224, http://hdl.handle.net/1765/362
Pak, K., Revenue Management: New Features and Models, Promotor: Prof. dr. eng. R.
Dekker
EPS2005061LIS, ISBN: 9058920925
Peeters, L.W.P., Cyclic Railway Timetable Optimization, Promotors: Prof. dr. L.G. Kroon
& Prof. dr. eng. J.A.E.E. van Nunen
EPS2003022LIS, ISBN: 9058920429, http://hdl.handle.net/1765/429
Popova. V., Knowledge Discovery and Monotonicity, Promotor: Prof. dr. A. de Bruin
EPS2004037LIS, ISBN: 9058920615, http://hdl.handle.net/1765/1201
246
Pouchkarev, I., Performance Evaluation of Constrained Portfolios, Promotors: Prof. dr. J.
Spronk & Dr. W.G.P.M. Hallerbach
EPS2005052F&A, ISBN: 9058920836, http://hdl.handle.net/1765/6731
Puvanasvari Ratnasingam, P., Interorganizational Trust in Business to Business ECommerce,
Promotors: Prof. dr. K. Kumar & Prof. dr. H.G. van Dissel
EPS2001009LIS, ISBN: 9058920178, http://hdl.handle.net/1765/356
Romero Morales, D., Optimization Problems in Supply Chain Management, Promotors: Prof.
dr. eng. J.A.E.E. van Nunen & Dr. H.E. Romeijn
EPS2000003LIS, ISBN: 9090140786, http://hdl.handle.net/1765/865
Roodbergen , K.J., Layout and Routing Methods for Warehouses, Promotors: Prof. dr. M.B.M.
de Koster & Prof. dr. eng. J.A.E.E. van Nunen
EPS2001004LIS, ISBN: 9058920054, http://hdl.handle.net/1765/861
Schweizer, T.S., An Individual Psychology of NoveltySeeking, Creativity and Innovation, Pro
motor: Prof. dr. R.J.M. van Tulder
EPS2004048ORG, ISBN: 9058920771, http://hdl.handle.net/1765/1818
Six, F.E., Trust and Trouble: Building Interpersonal Trust Within Organizations, Promotors:
Prof. dr. B. Nooteboom & Prof. dr. A.M. Sorge
EPS2004040ORG, ISBN: 905892064X, http://hdl.handle.net/1765/1271
Slager, A.M.H., Banking across Borders, Promotors: Prof. dr. D.M.N. van Wensveen & Prof.
dr. R.J.M. van Tulder
EPS2004041ORG, ISBN: 9058920666, http://hdl.handle.net/1765/1301
Spekl´e, R.F., Beyond Generics: A closer look at Hybrid and Hierarchical Governance, Pro
motor: Prof. dr. M.A. van Hoepen RA
EPS2001008F&A, ISBN: 9058920119, http://hdl.handle.net/1765/357
Teunter, L.H., Analysis of Sales Promotion Eﬀects on Household Purchase Behavior, Promotors:
Prof. dr. eng. B. Wierenga & Prof. dr. T. Kloek
EPS2002016ORG, ISBN: 9058920291, http://hdl.handle.net/1765/868
Valck, K. de, Virtual Communities of Consumption: Networks of Consumer Knowledge and
Companionship, Promotors: Prof. dr. eng. G.H. van Bruggen, & Prof. dr. eng. B. Wierenga
EPS2005050MKT, ISBN: 905892078X, http://hdl.handle.net/1765/6663
Verheul, I., Is there a (fe)male approach? Understanding gender diﬀerences in entrepreneur
247
ship, Prof. dr. A.R. Thurik
EPS2005054ORG, ISBN: 9058920801, http://hdl.handle.net/1765/2005
Vis, I.F.A., Planning and Control Concepts for Material Handling Systems, Promotors: Prof.
dr. M.B.M. de Koster & Prof. dr. eng. R. Dekker
EPS2002014LIS, ISBN: 9058920216, http://hdl.handle.net/1765/866
Vliet, P. van, Downside Risk and Empirical Asset Pricing, Promotor: Prof. dr. G.T. Post
EPS2004049F&A ISBN: 9058920755, http://hdl.handle.net/1765/1819
Vromans, M.J.C.M., Reliability of Railway Systems, Promotors: Prof. dr. L.G. Kroon &
Prof. dr. eng. R. Dekker
EPS2005062LIS, ISBN: 9058920895, http://hdl.handle.net/1765/6773
Waal, T. de, Processing of Erroneous and Unsafe Data, Promotor: Prof. dr. eng. R. Dekker
EPS2003024LIS, ISBN: 9058920453, http://hdl.handle.net/1765/870
Wielemaker, M.W., Managing Initiatives: A Synthesis of the Conditioning and Knowledge
Creating View, Promotors: Prof. dr. H.W. Volberda & Prof. dr. C.W.F. BadenFuller
EPS200328STR, ISBN: 905892050X, http://hdl.handle.net/1765/1036
Wijk, R.A.J.L. van, Organizing Knowledge in Internal Networks: A Multilevel Study, Pro
motor: Prof. dr. eng. F.A.J. van den Bosch
EPS2003021STR, ISBN: 9058920399, http://hdl.handle.net/1765/347
Wolters, M.J.J., The Business of Modularity and the Modularity of Business, Promotors: Prof.
mr. dr. P.H.M. Vervest & Prof. dr. eng. H.W.G.M. van Heck
EPS2002011LIS, ISBN: 9058920208, http://hdl.handle.net/1765/920
248
Pricing models for Bermudanstyle interest rate
derivatives
Bermudanstyle interest rate derivatives are an important class of
options. Many banking and insurance products, such as mortgages,
cancellable bonds, and life insurance products, contain Bermudan
interest rate options associated with early redemption or cancella
tion of the contract. The abundance of these options makes evident
that their proper valuation and risk measurement are important to
banks and insurance companies. Risk measurement allows for off
setting market risk by hedging with underlying liquidly traded assets
and options. Pricing models must be arbitragefree, and consistent
with (calibrated to) prices of actively traded underlying options.
Model dynamics need be consistent with the observed dynamics of
the term structure of interest rates, e.g., correlation between inte
rest rates. Moreover, valuation algorithms need be efficient: Finan
cial decisions based on derivatives pricing calculations often need to
be made in seconds, rather than hours or days. In recent years, a
successful class of models has appeared in the literature known as
market models. This thesis extends the theory of market models, in
the following ways: (i) it introduces a new, efficient, and more
accurate approximate pricing technique, (ii) it presents two new and
fast algorithms for correlationcalibration, (iii) it develops new models
that enable efficient calibration for a whole new range of deriva
tives, such as fixedmaturity Bermudan swaptions, and (iv) it presents
novel empirical comparisons of the performance of existing calibra
tion techniques and models, in terms of reduction of risk.
ERIM
The Erasmus Research Institute of Management (ERIM) is the Research
School (Onderzoekschool) in the field of management of the
Erasmus University Rotterdam. The founding participants of ERIM
are RSM Erasmus University and the Erasmus School of Economics.
ERIM was founded in 1999 and is officially accredited by the Royal
Netherlands Academy of Arts and Sciences (KNAW). The research
undertaken by ERIM is focussed on the management of the firm in its
environment, its intra and interfirm relations, and its business
processes in their interdependent connections.
The objective of ERIM is to carry out first rate research in manage
ment, and to offer an advanced graduate program in Research in
Management. Within ERIM, over two hundred senior researchers and
Ph.D. candidates are active in the different research programs. From
a variety of academic backgrounds and expertises, the ERIM commu
nity is united in striving for excellence and working at the forefront
of creating new business knowledge.
www.erim.eur.nl ISBN 9058920992
RAOUL PI ETERSZ
Pricing models for
Bermudanstyle
interest rate derivatives
D
e
s
i
g
n
:
B
&
T
O
n
t
w
e
r
p
e
n
a
d
v
i
e
s
w
w
w
.
b

e
n

t
.
n
l
P
r
i
n
t
:
H
a
v
e
k
a
w
w
w
.
h
a
v
e
k
a
.
n
l
71
R
A
O
U
L
P
I
E
T
E
R
S
Z
P
r
i
c
i
n
g
m
o
d
e
l
s
f
o
r
B
e
r
m
u
d
a
n

s
t
y
l
e
i
n
t
e
r
e
s
t
r
a
t
e
d
e
r
i
v
a
t
i
v
e
s
Erim  05 omslag Pietersz 9/23/05 1:41 PM Pagina 1
Pricing Models for BermudanStyle Interest Rate Derivatives
Pricing Models for BermudanStyle Interest Rate Derivatives
Waarderingsmodellen voor Bermudastijl rente derivaten
Proefschrift
ter verkrijging van de graad van doctor aan de Erasmus Universiteit Rotterdam op gezag van de rector magniﬁcus Prof.dr. S.W.J. Lamberts en volgens besluit van het College voor Promoties.
De openbare verdediging zal plaatsvinden op donderdag 8 december 2005 om 16.00 uur door Raoul Pietersz geboren te Rotterdam
Promotiecommissie Promotoren: Prof.dr. A.A.J. Pelsser Prof.dr. A.C.F. Vorst Overige leden: Prof.dr. P.J.F. Groenen Prof.dr. F.C.J.M. de Jong Dr.ir. M.P.E. Martens
Erasmus Research Institute of Management (ERIM) Erasmus University Rotterdam Internet: http://www.erim.eur.nl ERIM Ph.D. Series Research in Management 71 ISBN 9058920992 c 2005, Raoul Pietersz All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the author.
Voor Beata. Karsten en Dani¨l e .
.
For all this. 5 and 6 were written in cooperation with Antoon. The research beneﬁtted greatly from his invaluable suggestions. I express my gratitude to the other members of the committee. for eﬃciently aiding me in administrative matters. . for coauthoring Chapter 3. Fourth. Sixth. and for parttime employing me at Product Development Group. Elli Hoek van Dijke. I am very grateful. During the Ph. He is the one who suggested me to start the Ph. Wilfred Mijnhardt for guiding me through the publication process. Quantitative Risk Analytics (at the time “Market Risk – Modelling and Product Analysis”). for teaching me many technical and exciting aspects of interest rate derivatives pricing. and. period has been excellent. Quantitative Analytics. Also. Thank you Antoon. I thank my promotor and former employer Ton Vorst. Farshid Jamshidian. and Ella Boniuk.D. Second. and who parttime employed me at ABN AMRO Bank. and. many thanks go to the other members of the small committee.Acknowledgements First and foremost. and from ECFR. ABN AMRO Bank.D.D. Tineke Kurtz. track with Antoon Pelsser. and he truly is an inspirator. Frank de Jong. Martin Martens for his help while I was coteaching one of his example classes and for suggesting me as a lecturer at the Rotterdam School of Management (RSM). I thank Igor Grubiˇi´ for coauthoring Chapter 4. from the Econometric Institute (EI). for the period September 2001 till June 2004. Patrick Groenen. Thierry Post. Mari¨lle Sonnenberg. from July 2004 onwards. special thanks go to Marcel van Regenmortel. Chapters 2. Lane Hughston. Third. I would like to thank my promotor Antoon Pelsser. period I have been able to present my work at leading international conferences. received from Erasmus Research Institute of Management (ERIM). Martin Martens. His guidance throughout the Ph. Tineke van der Vhee. Fifth. I am grateful to Patrick Groenen. Winfried Hallerbach for help with preparation of RSM lectures. I am very grateful for the ﬁnancial support that made this possible. sc I am much obliged to my colleagues at Erasmus University Rotterdam: Jaap Spronk for support through the Erasmus Center for Financial Research (ECFR). and. Chapters 5 and 7 were written in cooperation with Marcel. for e being a pleasant roommate.
Finally. Benjamin Schiessl´.viii ACKNOWLEDGEMENTS A special thank you to past and present managers at ABN AMRO Bank: Ton Vorst. Reinier Bosman. Many thanks to Frank de Jong and Joanne Kennedy for providing much appreciated feedback to my research proposal. Jelper Striet. and Glyn Baker. Drona Kandhai. I thank Beata for her love. Bram Warmenhoven. Bernt van Linder and Geert Ceuppens. Marcel van Regenmortel. support. Dick Boswinkel. and kindness. and Alex Zilber. Lukas Phaf. I am also grateful to a number of anonymous referees and to Riccardo Rebonato and Mark Joshi who provided valuable comments and suggestions to earlier versions of the papers that form the basis for this thesis. Andr´ e Roukema. and Willem van der Zwart. I would like to thank my parents for always being there for me. Martijn van der e Voort. Amsterdam . Thank you to past and present colleagues at ‘Product Analysis’: Steﬀen Lukas. and Thilo Roßberg and Russell Barker in London. A thank you to colleagues that have become friends: Dion Hautvast. Thank you to past and present colleagues at Product Development Group: Nicolas Carr´ in Amsterdam. Nam Kyoo Boots. Raoul Pietersz February 7th 2005. Rutger Pijls. I thank my son Karsten for bringing so much joy to my life. Frank Putman. Danny Wester. Alice Gee. Thanks to e members of CAL: Nancy Appels.
. . . . . . . . 1. . .3 Interest rate derivatives pricing models . and swaptions 1. . . . . . . . . . . . . . . . . . . . . . .2 Interest rate markets and options . . . . . . . . . . . . .3 Explanation . . . . . . . .2 Recalibration approach . . . . . . . . . . 1. . . . . . . . . . . . 1. . . .4 American option pricing with Monte Carlo simulation . . . 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 Alternative method for calculating swap vega . . . . . . . . . . . 2. . . . 2. . . 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. . . . . . . . . . . . . .2. . bonds. . .1 Use of models in practice . . . . . . . . 2. . . . . . . . . . .3 Markovfunctional models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . . and swaps . . . . . .1 Short rate models . . . . . . . . . . . . . .A Appendix: Negative vega twostock Bermudan options . . . . . .3. 1. . . .1 Linear products: Deposits. ﬂoors. . . . . . . . . . . . . . . . . . . . . . .8 Conclusions . . . . .3. 2. . . . . . . .2. . . . . . . . . . . . . . vii xix xxiii 1 1 4 6 7 8 11 12 15 16 17 19 19 21 24 27 29 30 30 32 34 . . . . . . . . . . .6 Numerical results .1 Introduction . 1. . 2 Riskmanaging Bermudan swaptions in a LIBOR model 2. . . . .2 Interest rate options: Caps. . . 2. . . . . . . . . 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Comparison with the swap market model . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Swap vega and the swap market model . . . . . 2. . .1 Arbitragefree pricing . . . . . . 1. .2 Market models . . . .Contents Acknowledgements Notation Outline 1 Introduction 1. . . . . . . . . . .
. .4 Alternating projections without normal correction . .2. . .2. . . . . . . . . . . . . .3. . . . . . . . . . . . 3. 3. .1 Global convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . . . . .3 Geometric programming . . . .4 Majorization equipped with the power method . . . 4 Rank reduction of correlation matrices by geometric programming 4. . . . . . . . . . . . . . . . . . . . . 4. . .A Appendix: Proof of Equation (3. .2 Literature review .2. . . . . 3. . . .2 Nonconstant weights . . . . . . .5. . . . . . . . . . . 4. . .1 Introduction . . . . . . . . . . . . .3 A dense part of Mn. . .5 Lagrange multipliers . . . . .1 Modiﬁed PCA . . . . . . . . .3 Majorization . . . . . . . . . . . . . . . . . .5 Numerical results . . . . . . . . . . . . . . . . .6 Conclusions . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . . . . 4. . . . . . . . 4. . . . . . . . . . . . . . . . . . . . 3. . .5. . . . . 3. . . . . . . . . 4. . .2.4. . . . . . . . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . 3.3. . . . . . . . . . . .2 Topological structure . . . . . . .2. . . 3. 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . .2. . . . . . . . . . . .3 Geodesics . .1 Basic idea . 4. . . . 3. . . . . . . . . . . . . . . . . . . . . . . . .6 Hessian . . . . 4. . . . . .11) . . .x 3 Rank reduction of correlation matrices by majorization 3. . . . .3 The order eﬀect . 4. . . . . .d equipped with a diﬀerentiable structure . . . . . . .5. .2. . . . . .3. .4 The algorithm and convergence analysis . . . . . . . . . . 3. . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . .5. . . . . . . 3. 3. . . . .2 Majorization . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . . . . . . . . . . .4 Parallel transport along a geodesic . . . . . . 4. . . . . . . .2. . .2. . .6 Parametrization . . . 4. . . . . . . . .3 Optimisation over the Cholesky manifold . 4. . . . . . . . . .1 Riemannian structure . CONTENTS 39 39 43 44 44 45 45 46 46 47 47 50 51 52 53 54 58 59 62 62 64 64 67 67 71 71 72 72 74 75 76 76 76 78 79 80 80 80 . . . . . . . . . . . . . . . 3. . . . . . . . . . . . .4 The Cholesky manifold . . . . . . . . . . . . .5 Using an estimate for the largest eigenvalue . . . . . . . . . . . . 4. . . . 3. .5. 4. 3. . . . . .2 Solution methodology with geometric optimisation .7 Alternating projections with normal correction (d = n) 3. . . . . . . . . . . . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Numerical comparison with other methods . . . . . . . . . . .5 Choice of representation . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . .2 Normal and tangent spaces . . . . . . .1 Weighted norms . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Local rate of convergence . . . . . . . . . . . . . . . . . . . . . .5 The gradient . . . . . . . . . . . . . . . . . . . . . . . . . .1 Introduction . . . . 3. . . . . . . . . . . .2. .3. . . . .3.
7 Conclusions . . . . . . 101 5. . . . . . . . .A. . . . . . . . . . . . . . . . . .A. . . . . . . . .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . .4 Brownian bridge discretization .7 Algorithms .2 Local rate of convergence . . . . .A.3 Separability . . .6. . . . . . . . . . . . . . . .4. . .5. . . . . . . .1 Euler discretization . . . . . . . . .3 Milstein discretization . . .4 Single time step method . . . .5 Valuation of interest rate derivatives with the single time step method103 5. . 4. . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . 4. . . . . . . . . . . . . . . . . .2 Notation for BGM model . . . . . . . . . . . 107 . . . . . . . . . .A Appendix: Proofs . .5 The Brownian bridge scheme for single time steps . . . . . . . . . . . . . . . . .6 Numerical results . . . . . . . .A. . . . . . . . . . . . . . . . . . . . . 4. 4. . multipliers . . . 4. . . . . . . . . . .6. . . .6 Proof of Lemma 1 . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . .5 Proof of Theorem 4 . . . . . .5. . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . . . .5. . . . . . . . . . . . . . . . 99 5. . .4 Discretizations . . . . . . . . . . . . . . . . . . . . .5. . . . . . . . . . . . . . . . 100 5. . . . . . . . . . . . . . . . . . . .1 Acknowledgement . . . . . . . . . .4. . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . .3 Single time step method for pricing on a grid . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Discussion of convergence properties . . . . . . xi 81 81 81 83 85 85 85 85 86 87 87 88 88 90 90 90 93 93 94 94 95 95 5 Fast driftapproximated pricing in the BGM model 97 5. . . . . .1 Theoretical result . .A. . . . . . . . . . . . .4. . .1 Justiﬁcation of the above assumptions . . . . 104 5. . . . . 105 5. . 100 5. . 104 5.5 A special case: Distance minimization .3 Formula for the diﬀerential of ϕ . . . .4. . . . . . . . . . . . . 4. . . . . . . 107 5. . . . . 101 5. . . . . . . . . . . . . . . . . . . . . . . . . .5. . . .5 Initial feasible point . . . .2 Notation . . . . . . .2 Predictorcorrector discretization . . .1 The case of d = n . . . . . . . . . . .CONTENTS 4. . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . n = 3 . . . . . . . . . . . . . . . . . . . . . . 97 5. . . . . .3. . . 4. . . . . . . . . . . . . . 4. . . . . . . . . . .5. . . . . . . . . .4 Connection normal with Lagrange 4. . . . . . . . . . .2 Proof of Proposition 2 . .4. . 4. . . .A. . . . . . 4. . . . .3 Proof of Proposition 3 . . . . . . 103 5. 4.2 The case of d = 2. . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Proof of Theorem 3 . . . .1 Introduction . . . . . . . . . . . . 4. . . .7 Proof of Theorem 5 . . . . . . . . 4. . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Global convergence . .A. . . . .2 Numerical comparison . . 103 5. . . . . . . .1 Proof of Theorem 2 . . . .3. . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . 100 5.
. . . 7. . . .5. . . . . . . . . . . . .1 Introduction . . . . . . 143 6. . . . . . . . . . . . . . . . . . . . . . . . . 133 6. . . . . . . . . . . . 5. . . . . .2. 150 6.3 Necessary and suﬃcient conditions for noarbitrage 7. . . .5. . . . . Appendix: Approximation of substituting the mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix: Mean of geometric Brownian bridge . . . . .1 Main result . . . . . .2 Numerical results . . . . . . . . . . .C 6 A comparison of single factor Markovfunctional and multi factor market models 133 6. . . 147 6. . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Driftapproximation accuracy test based on noarbitrage 5. . . . . . . . . . . . . . . . . . . . . . 158 7 Generic market models 7. . . . .8. . . . . . . .2. . . . . . . . . . . . . .A 5. . . . . . . .2 Preliminaries . . . . . . . . . . . . . . . . . . .2 Methodology . . . . . . . .xii 5. . .5. .6 The impact of smile . . . . . . . . . . . . . . . . . .1 Weak convergence of the Brownian bridge scheme .1 The LIBOR and swap market models . . . . .9. . . . . . . . . . . . . . . 5. . . . . . . . .2 ‘Large’ perturbation sizes versus constant exercise decision method with ‘small’ perturbation sizes . . . 7. . . . . . . . . . . . . . . . . . .6. . . . .B 5. 144 6. . . .2 LIBORinarrears case . . . . Example: Bermudan swaption . . . . 138 6. . . . . . . . . . . . . . . CONTENTS . . . .2. . . . . . . . . 5. . . . . . . . . Conclusions . . . . . . .4 Accuracy of the terminal correlation formula . .1 Twofactor model . . . . . . . . . . . . . .1 A simple numerical example . . . . . .9 5. . . . . .3 Estimating Greeks for callable products in market models . . . . . . . . 161 161 165 166 168 172 . . .7 5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . .1 Introduction . . . . . . . . . . . . . . 5. . . . .1 Absence of arbitrage . . . . . . . . . . . . . . . . . . 152 6. 152 6. . . . . . . . . . . . . . . . . . . . . . .10 5. . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 5. . . . . . . . .6. . . . . . . . . . . . .8 5. . . . .2. . 146 6.5 Empirical comparison results . .3 Deltavega hedge results . Example: onefactor driftapproximated BGM . . . . . The Brownian bridge scheme for multitime steps .2 The Markovfunctional model . . . . . . . . . . 5. . . . . .2 Numerical results for single time step test . . . . . . . . . . 139 6. . . . . . . . . .5. . . . . . . . . . .7. . . . . . . . . . . . . . . Appendix: MATLAB code for Brownian bridge scheme . . . . . . . . . . . 141 6. . . . . . . . . . . . . Test of accuracy of drift approximation .3. . . 151 6. . . . . .1 Delta hedging versus delta and vega hedging . . . . . . . 108 110 110 113 114 115 117 123 124 125 125 127 127 128 129 5. . . . . . . . . . . . . . . . . . . . .9. . . . .3 Data . . .
. . . . . . . . . . . . . . . . .4. . . . . . . xiii 175 175 177 179 179 185 186 186 187 189 193 195 207 . . . . . . . . . . . . . . . . .2 Spot measure . . . .CONTENTS 7. . . . . . . . . .5 Complexity . . . . .A Appendix: Rationale for Approximation 1 . . 7. . . . . . . . . . .3 An example: The LIBOR market model 7. . . . . . . 7. . . . . . . 7. . . . . . .7 Conclusions . . . . . . . . . . . . . . .1 Terminal measure . . . . . . . . . . . . . . 7. . . 7. . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Generic calibration to correlation . . . . .B Appendix: Proof of Lemma 3 . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . .4. . . . . . . .4 Generic expressions for noarbitrage drift terms 7. . . . . . . . . . . . . . . . . . . . . 8 Conclusions Nederlandse samenvatting (Summary in Dutch) Bibliography Author index .
.
. . . . . . . Performance proﬁle for n = 30. . . . . . . . . .6 2. . .4 2. t = 0.6 4. . . . . . . . . . . . .2 4. . . . . . . . . . . . . . 5% strike. . . . . . . . . . . . . . . . . . .2 3. Performance proﬁle for n = 60. . xxiv 9 24 25 25 26 28 31 31 32 49 56 57 58 63 65 69 84 89 90 91 92 109 114 121 121 124 1. . equal weights. . . . . . . . .1 5. . . . . . d = 3. . Performance proﬁle for n = 10. . . . . . . . . . . . . t = 0. . . . . . . d = 20. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 4. .4 5. Observed change in swap rate instantaneous variance. . t = 2s. . .6 5. . . . . . . .8 3.3 2. . .4 3.2 5. . . . . Exercise boundaries for the eightyear deal. d = 2.7 2. . . . . . . . . . LIBORinarrears test. t = 2s. Performance proﬁle for n = 20. . . . . . . . .5 2. .3 4. . Timing inconsistency in the single time step framework for BGM. . . . 2. . . . . . . . . . . . . . . t = 3s. .1 3. . . . . . . . . . . .1s. . . . . . . . . . . . . . . . . Convergence runs. . . The equality Py(∞) (y(k) − y(∞) ) = δ (k) 1 − (δ (k) )2 /4. . . . . . . . . . . . Natural increment of Black implied swaption volatility. . equal weights. . . . .000 simulation paths. . . . . . . . . Performance proﬁle for n = 80. Comparison of LMM and SMM for swap vega per bucket. .1 4. . . . . . . . . . . equal weights. .3 3. The idea of majorization. . . . d = 3. .3 5.000 simulation paths. . . . Risk sensitivities.05s.5 4. . . . Shell representing the set of 3 × 3 correlation matrices of rank 2 or less. . Empirical standard errors of vega for 10. . . . . . Recalibration THFRV vega results for 1 million simulation paths. . . . . . . Performance proﬁle for n = 50. . . . . . . . . . . . . t = 1s. . . . . . . . . . . . .1 Payoﬀs of caplets and ﬂoorlets versus realized LIBOR. . . . . Convergence run of the power method versus lambda=max(eig(B)). . . . . t = 1s. .000 simulation paths. . . . . . . .List of Figures 1 Outline of the thesis. Performance proﬁle for n = 15. . . nonequal weights. . . . . . Monte Carlo convergence. . . . .5 3. . . . . . . . Swap vega results for 10. . . . . . . d = 4. . . . . . . . . . . . . . . .2 2. . .1 2. . . . . d = 5. . . . . . . . . . . . Comparison of LMM and SMM for total swap vega against strike. . . . . . . d = 4. .5 Recalibration swap vega results for 10. . .
. . . . . . . . . . . .1 7. . .5 7. . . . . . . . .3 LIST OF FIGURES Setup for inconsistency test. . . .1 6. . . . . . . . 169 An overview of the forward swap agreements for various market models. . . . . . Comparison of delta versus delta and vega hedging. 183 . . . . . . . .2 6. 125 Fitted aparameter versus βparameter. . . . 170 Test results of exact versus approximate drift terms in CMS(q) models. . . . . . . Deltavega hedge results. . . . . . . . . .xvi 5. . . . . . . . . . . . . . . . .4 6. . .3 6. . . . . . .2 7. . .6 6. . . . . . . . . . . . . . . . . Bermudan swaption values per trade date. . . 141 149 150 151 153 Swaptions from swaption matrix to which various models are calibrated. . . . ‘Large’ versus ‘small’ perturbation sizes and constant exercise method. . . .
. . . . . . . . . . . . (2004). . Results of the Bermudan swaption comparison deal. . . . . . . . . . . . . .8 6. . . . .7 6. . . Quality of drift approximations: various scenarios. . . . . . . . . . . . . . . . . . . . . .5 6. . . . Error analysis of the terminal correlation. . .3 6. . .8 6. The order eﬀect. .4 5. . . . . . . . . . . . . . . .6 6. . . . . Simulation rerun using precomputed exercise boundaries. . . . . .1 6. . Swap vega per bucket test results for varying strikes.7 5. . . . . . . . . . . . .1 Some short rate models and their speciﬁcation of short rate dynamics. . Smile swaption volatility USD data of 21 February 2003. . A simple numerical example. . . . . Comparative results of the parametrization and majorization Results for the ratchet cap and trigger swap. . . . . . . . . . . . . . Fitted swaption volatility and ﬁt errors with the displaced diﬀusion model. . . . . . .9 6. . . . . . . The Bermudan swaption deal used in the comparison. . . . . . . . . . . . . . . . . . . . . . . . Discount factors for the USD data of 21 February 2003. . . . . . . . . . . Quality of drift approximations: volatility/meanreversion 15%/10%. . . . . . Results for negative vega per bucket for twostock Bermudan Excerpt of Table 3 in De Jong et al. . Speciﬁcation of the Bermudan swaption comparison deal. . . . . .2 2. . . . .2 5. . . . Benchmark results for the displaced diﬀusion Markovfunctional model. .1 3. . . . . . . . . . . . . . . .6 5. . . Prices of Bermudan swaptions in smile versus nonsmile models. . .3 3.3 5. . . . . Deal description. . . . . .4 5. . . . . . . . . . . . . . . . . . . .10 Market European swaption volatilities. . . . . . . .4 3. . . Statistical description of the swaption volatility data. . . . . . . . . The Bermudan swaption deal used in the test of impact of smile. . .2 6. . . 23 33 35 36 55 59 60 62 116 118 119 120 122 123 126 126 145 146 146 147 148 155 155 157 158 159 . . . . . . . . . . . . . . . . algorithms.4 6. . . . . . . Fitted displaced diﬀusion parameters. .2 3.1 2. . . . . . . . Twofactor model comparison. . . . . . . . . . . . . . option. . . . . . . . . 13 2. .1 5. . . . . . . . . . .3 2.5 5. . . Computational times for the Bermudan swaption comparison deal. . .List of Tables 1. . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 LIST OF TABLES Example of a hybrid coupon swap payment structure for the ﬂoating side. 163 Deal description for test of exact versus approximate drift in CMS models. 184 .1 7.xviii 7. .
convergence criterium. N Normal distribution. R ν Vega: risk sensitivity with respect to Sn volatility. F Filtration. v A lower case bold denotes a vector. Expectation. Sn θ Drift in short rate models. caplet Ω (+1)/ﬂoorlet (−1) index.s) Performance ratio for algorithm s on test problem p. Q µ Drift. performance proﬁle. Ψ Error perentry in correlation matrix: Ψ = YYT − P. φ Cumulative normal distribution function. π Realized numeraire relative payoﬀ. Time discretization: τ1 < · · · < τm . η Pay (+1)/receive (−1) ﬁxed index. ω ∈ Ω.Notation s A lower case italic denotes a scalar. . (p. τi Discretization time point. ω Outcome of probability space. ϕ Objective function for rank reduction of correlation matrices. Arbitragefree pricing measure. Probability space. Set of real n × n symmetric matrices. Sphere in Rn+1 . M An upper case bold denotes a matrix. matrix form: P. σ Volatility. ∆ Tangent vector for geometric programming. Γ Lagrange multipliers.4). ε Perturbation size. α Day count fraction. Set of real numbers. ρ Target correlation. P Λ Diagonal eigenvalues matrix. χ Auxiliary majorization function. δ Delta: risk sensitivity with respect to underlying rate or asset price. correlation parameters. E λ Largest eigenvalue of B. β Correlation parameter. Realworld probability measure. γ Weights. see (2.
C Correlation matrix.g. k Strike rate. e End index of a forward rate. p PVBP. s:e NOTATION r Short rate. i (Subscript i) Associated with the period [ti . associated with ti (e. asset price. fs:e . S Swap market model.1).g.3). m Number of discretization time points. (Superscript (i + 1)) Associated with the forward measure for which bi+1 is the numeraire. I Identity matrix. bi ). Tenor structure: 0 = t0 < · · · < tn . xNCy x noncall y option. (Subscript s : e) Associated with the period [ts . o Option price. fi . ti Tenor time. z Normally distributed random variable. σs:e ). (i+1) d Inﬁnitesimal diﬀerential.01%. b Discount bond price. σi . with P a correlation matrix. years. start index of a forward rate. f Forward rate. d2 in Blacktype formulas. underlying with a maturity of x years from today but callable only after y P Target correlation matrix. numeraire value.xx O Order. weight coeﬃcient. αi ). i(t) Spot LIBOR index at time t. QQT = I. bp Basis point.g. . v Value of a security. the usual d1 . present value of a basis point. 0. asset or derivative. te ] (e. see (2.. with I the identity matrix. instantaneous continuously compounded interest rate. bi Price of a discount bond maturing at time ti . P&L Proﬁt and loss. w Brownian motion. B Helper matrix for majorization. t Time. d Number of stochastic factors. s Swap rate. a Model parameter. ti+1 ] (e. exercisable on an see (2. Q Orthogonal matrix. Y Decomposition matrix: YYT = P. P = (ρij )ij . c Model parameter. model parameter. · Scalar product of vectors.. n Number of forward rates.
Vector length: Square root of sum of squares of vector entries. quadratic crossvariation. · F xxi Frobenius norm. for matrices Y. T (Superscript T ) Matrix transpose. Cov Covariance.NOTATION ·. · Scalar product of vectors. · Quadratic variation. . . Var Variance. Y 2 F := tr(YYT ) Much smaller than.
.
we provide an outline of the thesis. The problem is the socalled rank reduction of correlation matrices. over lowrank correlation matrices attainable by the model. in turn. an objective value. The Risk article has been republished as part of a Risk book. The combined results are important and valuable to ﬁnancial institutions that need to select a calibration method for market models. The results however also show that socalled constant volatility leads to eﬃcient and stable estimates of risk sensitivities. lead to ﬂuctuations in a hedge portfolio that are spurious and have no economic meaning. The results show that care should be taken when selecting a calibration method: Certain choices. which is the error with the original given correlation matrix.. socalled timehomogeneous volatility. see Pietersz & Pelsser (2004b). Readers that are nonexperts in the ﬁeld of interest rate derivatives pricing could skip the outline at ﬁrst reading and return here after reading the introductory Chapter 1. which are known to be diﬃcult to solve: The problem is to minimize. e. and occurs as a key part of calibrating multifactor market models to correlation.g. We present two elegant solution algorithms. see Pietersz & Pelsser (2005b).Outline The purpose of this thesis is to further knowledge of eﬃcient valuation and risk management of interest rate derivatives (mainly of Bermudanstyle but other types are also included) by extending the theory on market models. may lead to noneﬃcient estimates of risk sensitivities. Chapter 2 investigates various popular calibration choices for the LIBOR market model and their eﬀect on the quality of risk sensitivities of Bermudan swaptions. rank reductions of correlation matrices are nonconvex optimization problems. Mathematically formulated. see Pietersz & Pelsser (2004a). the algorithms of Chapters 3 and 4 outperform existing algorithms. A schematic outline of the thesis is given in Figure 1. Chapters 3 and 4 solve the same problem in two completely diﬀerent ways. Poor and unstable estimates of risk. and the risk associated with the derivative is not adequately reduced. Here. The beneﬁt over existing algorithms is the enhanced eﬃciency: In terms of computational speed. An extended abstract of the chapter has been published in Risk Magazine. . Chapter 2 has been published in the Journal of Derivatives. with the aim to risk manage Bermudan swaptions and other interest rate derivatives. in the numerical tests we considered.
xxiv OUTLINE Ch. Here. 7: Development of a new model Ch. The new model denotes CMS and generic market models. . respectively. efficiency. 5 New calculation method Performance in practice: Reduction of risk.s 3&4 Model A Calibration method A Calibration method B New calibration method Calculation method A Model B New model Ch. Performance in practice Ch. models A and B denote market models and Markovfunctional models. 2: Comparison: performance of calibration methods Ch. etc. 6: Comparison: hedge performance of models Figure 1: Outline of the thesis.
in terms of computational speed. see Pietersz & Groenen (2004b). see Pietersz & Groenen (2004a). we are able to bring the gradient and Hessian to natural forms. enabling an eﬃcient implementation. By carefully selecting the manifold. from any starting point) to a local minimum. J¨ckel & Joshi (2001). instead of Monte Carlo.OUTLINE xxv Chapter 3 presents a solution for rank reduction of correlation matrices. see Grubiˇi´ & Pietersz sc (2005). As a multi step disa cretization. we show that a single time step discretization combined with a separability assumption on the volatility. we show that Brownian bridge converges weakly with order one.e. The resulting algorithm is globally convergent (i. An extended abstract of the chapter has been published in Risk Magazine. The multi step convergence is illustrated by numerical tests. and makes the problem of rank reduction of correlation matrices all the more interesting. allows for an even more eﬃcient implementation via pricing on a grid or on a recombining lattice.. Chapter 4 develops a solution for rank reduction of correlation matrices based on geometric programming. which is a general technique from optimization. The algorithm is shown to be straightforward to implement. The majorization algorithm is extremely eﬃcient. The beneﬁt of Brownian bridge is its accuracy when single or large time steps are used. which is optimization over curved space (manifolds). Additionally. This feature is very rare for nonconvex optimisation problems. we develop a novel method to immediately check whether a stationary point is a global minimum. by extending the Lagrange multiplier results of Zhang & Wu (2003) and Wu (2003). we show that it is leastsquares optimal to use Brownian bridge (in a to be deﬁned natural sense). based on majorization. For the working paper version. the Brownian bridge discretization. Chapter 5 introduces a new discretization for the LIBOR market model. Chapter 4 has been submitted. This is also conﬁrmed in the numerical LIBORinarrears test extended from Hunter. Extensive numerical tests show that geometric programming compares favourably with other existing algorithms. where m denotes the dimension of the manifold. The manifoldequivalents of Newton and conjugate gradient optimization algorithms are presented for the problem of rank reduction of correlation matrices. For single time steps. . Discretizations are required for implementation of a pricing algorithm. which makes its use accessible to nonexperts. The geometric curved algorithms enjoy the same superlinear local convergence properties as their Euclidean ﬂat counterparts: Quadratic convergence for Newton and msteps quadratic convergence for conjugate gradient. We perform the task of showing that majorization can be applied to rank reduction of correlation matrices. Finally. Chapter 3 has been published in Quantitative Finance. because of its low cost per iterate.
An extended abstract has been published in Wilmott Magazine. such as short rate models. of continuous trading in the underlying asset. see. This algorithm is required for market models. Pelsser & van Regenmortel (2005). Glasserman (2004. Section 7. correlation pricing eﬀects can be captured in both model types. termed constant exercise decision method. With our proposed modiﬁcation. a far greater reduction of variance of P&L is attained than with the original algorithm. we propose a novel adjustment of the Longstaﬀ & Schwartz (2001) algorithm. the algorithm of Longstaﬀ & Schwartz (2001) for estimating the optimal exercise decision of American options in Monte Carlo is investigated. but not for the Markovfunctional model. Second. The market models are representative of multifactor interest rate pricing models. The hedge tests show no signiﬁcant diﬀerences in terms of reduction of variance of P&L. the eﬀect of the number of stochastic factors and correlation speciﬁcation is investigated. which renders convergence of ﬁnite diﬀerence estimates of risk sensitivities to be slower. Vega hedging is the oﬀsetting of volatility risk by trading in underlying options. The proﬁt and loss (P&L) of hedge portfolios of Bermudan swaptions are recorded for USD swap rates and swaptions data over a one year period. Chapter 6 presents novel empirical comparisons on the performance of models in terms of reduction of risk. The Markovfunctional model is representative of singlefactor models. number of factors and correlation speciﬁcation. Our proposal thus enables market models to function properly as risk management tools of callable derivatives. the eﬀect of smile on pricing is investigated. We show that delta and vega hedging signiﬁcantly outperforms delta hedging. The reduction is comparable to the reduction in the Markovfunctional model. Kennedy & Pelsser (2000). Moreover.xxvi OUTLINE Chapter 5 has been published in the Journal of Computational Finance. widely applied by traders and practitioners. Pelsser & van Regenmortel (2004). the impact of smile is similar in both market models and Markovfunctional models. for example. see Pietersz. and the Markovfunctional model of Hunt. The results show that the impact of smile can be much larger than the impact of correlation. We show that the algorithm contains a discontinuity. We compare LIBOR and swap market models. and it is considered a ﬁnancial engineering trick. Volatility smile is the phenomenon that diﬀerent Black (1976) implied volatilities are quoted for diﬀerent strikes of otherwise equal options. Also. Vega hedging is not based on a replication argument.1). delta hedging is compared to delta and vega hedging. The hedge tests show that the less eﬀective estimation of risk sensitivities adversely aﬀects reduction of variance of P&L. Finally. Both market models and Markovfunctional models can be calibrated to relevant interest rate correlations. The three main conclusions of the hedge tests are quite remarkable: First. Third. in terms of reduction of variance of P&L. Delta hedging is theoretically justiﬁed by the replication argument of Black & Scholes (1973) and Merton (1973). across: models. . see Pietersz. Therefore.
for both terminal and spot measures. Chapter 7 has been submitted. calculates forward rates over time steps for CMS market models. For a working paper version. which allow for ease of volatility calibration for a whole new range of derivatives. In Chapter 7. such as ﬁxedmaturity Bermudan swaptions and Bermudan CMS swaptions. CMS and generic drift terms for the forward rates are derived. by use of matrix notation. see Pietersz & van Regenmortel (2005). For a working paper version. respectively. CMS and generic market models allow for a choice of forward rates other than the classical LIBOR and swap rates in the LIBOR and swap market models. A fast algorithm is presented that approximately. . new CMS and generic market models are developed. We present a theoretical result with necessary and suﬃcient conditions for an arbitrary structure of forward rates to be arbitragefree at all possible states of the model. but accurately. see Pietersz & Pelsser (2005a).OUTLINE xxvii Chapter 6 has been submitted.
.
driving up the arbitrage price. American option pricing with Monte Carlo simulation is discussed. If an arbitrage opportunity occurs. this causes the arbitrage to . then many investors buy the arbitrage opportunity. The relative valuation of an asset (most often a derivative) is in terms of other asset prices. we brieﬂy describe interest rate markets and options. An arbitrage is an opportunity to make a riskless proﬁt with positive probability. 1.1 Arbitragefree pricing In this thesis. Fourth.Chapter 1 Introduction Bermudanstyle interest rate derivatives are an important class of options. Pricing models are thus viewed as an ‘extrapolation tool’ that aim to extrapolate derivative prices from underlying assets. and life insurance products. with no costs at time of execution. The purpose of this thesis is to further knowledge of eﬃcient valuation and risk management of Bermudanstyle interest rate derivatives. Risk measurement allows for oﬀsetting market risk by hedging with underlying liquidly traded assets and options. The outline of this chapter is as follows. we provide a historic background and comprehensive framework for the chapters that are to follow. we provide an overview of interest rate derivatives pricing models relevant to the thesis. cancellable bonds. pricing models produce relative valuations. The abundance of these options makes evident that their proper valuation and risk measurement are important to banks and insurance companies. First. Key to relative valuation models is the exclusion of arbitrage. we introduce the use of models for arbitragefree pricing. Second. such as mortgages. Third. Eventually. contain Bermudan interest rate options associated with early redemption or cancellation of the contract. Many banking and insurance products. In this chapter.
In eﬀect. for large market participants. We call a model complete if all the derivatives that we added to the model are attainable. and also because of analytical tractability. However. since it is merely a particular dynamic portfolio of the base assets. on which is deﬁned a Fadapted Brownian motion w. for example. transaction costs are suﬃciently low relative to transaction sizes. arbitrageurs cannot exploit all theoretical arbitrage opportunities. A derivative that we added to the model is said to be attainable if its payoﬀ can be exactly replicated by a dynamically managed selfﬁnancing portfolio of the base assets. We then consider. s(t) . then it follows from the L´vyKhinchin theorem that the diﬀusion term is a e (time. P denotes the realworld measure. foremost because continuous models provide a more than suﬃciently accurate description of reality.. consist of two parts: An average part (drift) and a random part (diﬀusion). F) with ﬁltration F = (F(t))t≥0 . derivatives whose values derive from these base assets. This thesis therefore considers only continuous diﬀusion models. In the presence of transaction costs.2 CHAPTER 1. In a complete model. If it is then assumed that asset returns over disjoint time periods are independent. any added derivative is eﬀectively redundant (in this theoretical world). ω)dw(t). This assumption obviously does not reﬂect reality. Model dynamics of returns on asset prices. We consider a model with a ﬁltered probability space (Ω.g. A selfﬁnancing portfolio is a portfolio without injection or withdrawal of funds. The use of continuous diﬀusion models is widespread. The diﬀusion part can be modelled as either continuous (e. Brownian motion) or discontinuous (e. When we construct a relative valuation model. Next to absence of arbitrage. then we usually do so by specifying dynamics of certain base asset prices in a frictionless arbitragefree market. A realization in Ω is denoted by ω.and statedependent) volatility coeﬃcient times a Brownian motion. its Fadapted drift by µ and its Fadapted volatility by σ. the assumption of zero transaction costs is quite accurate for the market as a whole. Moreover. The asset price is denoted by s.. jumps as in Merton (1976)). in the realworld measure. Arbitrage opportunities are therefore not likely to occur in an eﬃcient and competitive economy. INTRODUCTION disappear. the assumption of absence of transaction costs leads to a theory that is still suﬃciently accurate. In a complete model. P. any derivative price is already known when the current underlying prices and their dynamics are known: The current derivative price is simply equal to the current value of a replicating portfolio. The added derivatives are said to be spanned by the underlying assets. We summarize these concepts in terms of stochastic diﬀerential equations (SDEs).g. We have: ds(t) = µ(t. ω)dt + σ(t. since transaction costs make some of these no longer proﬁtable. but much more tractable. Here. we also assume absence of transaction costs: there is no diﬀerence in price when buying or selling an asset.
6. A suﬃcient condition for noarbitrage is equality of market prices of risk for all assets. ARBITRAGEFREE PRICING 3 Also. ω): db(t) = r(t.1. respectively.41)) that the model is complete.. It can be shown (e. see. Baxter & Rennie (1996.4)) to construct the riskneutral measure. A martingale is a process with zero drift. Equation (19. we may apply Girsanov’s theorem (e. market prices of risk then turn out to disappear. b(t) The market price of risk (or Sharpe ratio. Theorem o 10.. then it can be shown (e. e. then we obtain the famous formula of Black & Scholes (1973). it is (drift[risk free rate])/volatility and (µ − r)/σ. The martingale property of numerairerelative asset prices implies that their future expectations take today’s value. Theorem 7.1. If s and n denote prices of an asset and a numeraire. ω)dt..6)).e.18). n(0) n(t) where the expectation is with respect to the riskneutral measure. then s(0) s(t) =E . If we calculate (1.32)) that the assumption of existence of such a measure automatically implies that the model is arbitragefree. The price v of a derivative (also an asset). Hull (2000. Hunt & Kennedy (2000.g. . then satisﬁes v(0) = n(0)E v(t) . Øksendal (1998.1) which is the fundamental arbitragefree pricing formula. e. for t ≥ 0. i.g. Moreover. Theorem 8. Asset prices may be denominated in terms of amounts of the numeraire. Hunt & Kennedy (2000. see. Bj¨rk (2004. for t ≥ 0.g. Suppose we can construct a new measure (the socalled risk neutral measure) such that all numeraireexpressed asset prices become martingales. We introduce some key concepts of arbitragefree pricing. necessarily sharing the same market price of risk.. Under the risk neutral measure. page 119)... see Sharpe (1964)) is deﬁned to be the excess average return over the risk free rate divided by the volatility of the asset. n(t) (1.g. e.g.. if there exists a single unique risk neutral measure. Theorem 7. we assume the existence of a money market account with price b and return r(t. every derivative security is attainable by a replicating portfolio in the underlying assets. Given the assumption of equality of all market prices of risk. The actual levels of market prices of risk then turn out not to matter for valuation. We note that the market price of risk does not occur in this formula. In other words.g. Therefore these do not aﬀect arbitragefree pricing.1) for a call option on a stock that is modelled as geometric Brownian motion. A numeraire is an asset with a strictly positive value at all times.
3) In other words.58) and (5.1. we directly model under a riskneutral measure. the diﬀusion part. and it is an example of a risk sensitivity: the risk sensitivity with respect to the underlying asset price. The amount δs can be calculated from (1. Joshi (2003a. 1.. ∂s (1. we can show that δs = ∂v . traders quote the diﬀusion part: socalled implied volatility. angle brackets · denote quadratic variation. Theorem 3. for example. which has a value equal to the payoﬀ of derivative v. Øksendal (1998. value changes due to trading in the asset s and riskfree asset b cancel exactly: sdδs + bdδb = 0.2).4) in the form of (1.59))): dv = δs ds + δb db. Noarbitrage requirements then fully ﬁx the drift term. and from Itˆ’s formula: If s is a stochastic o process. Suppose amounts δs of s and δb of b are held.3). (1. A European option is an option that is exercisable only at a single point in time (usually during a single trading day).3).3). then the value v is given by: v = sδs + bδb .3. The reason that the derivative price is fully ﬁxed by (1.g. the hedge is not rebalanced on a continuous basis. Equations (5.2) The change in value v of a selfﬁnancing portfolio thus satisﬁes (e. By rewriting (1.4) see.17). ∂s 2 ∂s2 (1. The replicating portfolio holds a dynamic amount of δs of the underlying asset s. Karatzas & Shreve (1991. Exercise 2.5) The quantity δs is called the delta. rather at . INTRODUCTION In practice.1 Use of models in practice A dynamic hedge of the derivative v consists of taking an opposite position in the replicating portfolio.g. It is the volatility to be used in the BlackScholes formula to obtain the European option price. in all possible future states of the economy. for European call and put options. e. In fact.1) is replication by a selfﬁnancing portfolio. and v : R → R is a function.4 CHAPTER 1. (1. In practice. (1. Here. then for the process v(s) we have: dv = ∂v 1 ∂2v ds + ds. by specifying the part that stems from the realworld measure. see. A replicating portfolio is a dynamically managed selfﬁnancing portfolio of underlying assets s and b.
we should thus use at least two factors. In that case.1. which may render the derivative equivalent to (or almost equivalent to) a market traded option. The important reason is again that of noarbitrage. volatility) of the model. Rebalancing usually takes place when delta risk exceeds a certain threshold level. Implied . The diﬀerence between the use of a model as a hedging tool or economic model has important implications. which should be avoided. While we use riskneutral pricing models as relative valuation and hedging tools. 3.e.. For an economic model. and to reproduce prices of underlying assets and options exactly. rather than timeseries estimates from historic data on asset returns. a pricing model is recalibrated to the most recent implied volatility data. for a model used as hedging tool. Extensive empirical research has shown that the term structure is driven by more than one stochastic factors. it is interesting to note that these models also make assertions on realworld price dynamics. see the review article of Dai & Singleton (2003). Riskneutral models can thus also be viewed as economic models. i. Second. option price data (implied volatility) determines the diﬀusion part (or. as long as it satisﬁes the above three properties. Though modelling of realworld price dynamics is a vital aspect of riskneutral models. The process of making the model consistent with market prices is called calibration. and a singlefactor model is simply not acceptable. and is the same as above for the use of implied volatility in models. equivalently. it is not the most important aspect. consider modelling the term structure of interest rates. within a limited amount of computational time. The adaption from continuoustime to discretetime hedging works extremely well in practice. The necessity that pricing models need reproduce prices of underlying assets and options has two further implications: First.e. it is perfectly sensible to consider a singlefactor model. and forms the basis for the success of arbitragefree pricing models. ARBITRAGEFREE PRICING 5 discrete points in time.1. For example. through the connection with the realworld measure. whenever the derivative needs be valued. to use implied volatility for the diﬀusion term in the model. However. The only way to avoid arbitrage is to make the model consistent with prices of underlying options. 2.. attempting to model economic reality. To adequately reduce variance of proﬁt and loss (P&L). To eﬃciently produce prices. More important to riskneutral models are: 1. when a hedge is set up. i. the model should produce a derivative price equal to the market traded option price. otherwise the ﬁnancial institution holding the derivative can incur arbitrage. To produce sensible prices of derivatives. The reason is straightforward: Certain features of a derivative may become redundant during the life of the derivative.
while not having to fully model all parts of the market. derivative traders face volatility risk (vega). since we hedge parameters that are input to the model. Another practice for arbitragefree pricing models is that of customizing a model to a certain product. (1. due to the practice of recalibration. relevant to the product. over an agreed period of time. The beneﬁt is that the product is priced correctly. As a result. We refer to the length of an interest rate agreement as its tenor. An interest rate agreement is an agreement to borrow or lend money. From an arbitragefree pricing perspective though. then a derivative value may change too. then the following portfolio has zero volatility risk (for small changes in σ): ∂v/∂σ one derivative (v) and − options (o). participants trade primarily in interest rate agreements. vega hedging is not inconsistent with arbitragefree pricing models: We are only holding a diﬀerent portfolio of derivatives and options that needs to be delta hedged. . The practice of recalibration to unpredictably changing volatility is not consistent with most pricing models (excluding stochastic volatility models). The risk may be oﬀset by vega hedging. Diﬀerent tenors may attract diﬀerent interest rates. In practice however. since the underlying asset price is a state variable of the model. which gives rise to the socalled term structure of interest rates. and it thus contributes to wealth preservation. are incorporated into the model. We construct the model in such way that all parts of economic reality.6) ∂o/∂σ A delta hedge with the underlying asset as in (1. Delta hedging with the underlying asset in (1. and o the price of an underlying option.5) can then be applied to the veganeutral portfolio in (1.5) is inmodel hedging. thereby often attaining a more eﬃcient implementation.2 Interest rate markets and options In interest rate markets. since most models assume volatility to be known over the model time horizon. vega hedging enables a signiﬁcant additional reduction of variance of P&L. against agreed periodical payments (interest rate payments) that are in some form denoted as a percentage (interest rate) of the underlying borrowed or lent amount (notional amount). Vega hedging is outofmodel hedging. INTRODUCTION volatility quotes change over time. 1.6 CHAPTER 1. there is simply no need to add the additional options (though such addition is allowed): the original derivative is already perfectly deltareplicable in the theoretical model world. If σ denotes the volatility.6). When implied volatility changes. Nonetheless.
and swaps. then we call such a bond a zero coupon bond. Rolling over money market deposits in such way. forward interest rate agreements. In fact. Eﬀectively.1 Linear products: Deposits. the other party returns the notional and makes the agreed interest rate payments. Interest rate swaps involve only exchanges of interest rate payments. one party pays this agreed interest rate (the ﬁxed rate). Swap rates can be seen as long term borrowing and lending rates. while the other party pays a ﬂoating interest rate. A bond is an agreement between two parties. 1. The ﬁxed rate at which market participants can enter into a swap agreement at otherwise zero cost is called the swap rate. At the end of the agreement (at maturity). and swaps are of particular relevance to this thesis. The two parties agree on an interest rate and one party deposits the notional amount. At initiation. bonds. and swaps Money market deposits usually have a maturity of one year or less. During the life of the bond. Money market deposits. the borrower receives a prenegotiated amount for the bond (not necessarily equal to the notional amount). If there are no coupon payments during the life of the bond. Interest rate swaps typically have a maturity of two years or more.2. semiannually. and monthly. We remark that the frequency of ﬁxed and ﬂoating payments may diﬀer. the borrower makes coupon payments on the notional amount. usually on the basis of a ﬁxed contractually agreed rate. and we enter into a swap in which we pay ﬁxed. the swap rate for a swap with a particular tenor is more or less the interest rate for that particular tenor. At the end of each deposit. bonds. the ﬁxed swap payments remain. but the coupon payments could also be based on a ﬂoating interest rate. bonds. but normally do not involve exchanges of notional. the borrower returns the notional amount to the lender. Typical frequencies are annually.2. At maturity of a bond. on a designated notional amount. The two parties agree on an interest rate. Periodically. in order to pay back the notional from the previous deposit agreement. therefore we explain their workings in some detail. . the borrower and the lender. We discuss the method for determining this ﬂoating interest rate below. we mean the prevailing market interest rate for the tenor spacing between the ﬂoating interest rate payments. INTEREST RATE MARKETS AND OPTIONS 7 The above description of an interest rate agreement includes money market deposits. the resulting deposit interest payments cancel against the ﬂoating interest we receive from the swap. quarterly. By a ﬂoating interest rate.1. The reason is that a swap can be used to create a synthetic borrowing or lending agreement at a single interest rate over the tenor period of the swap: Suppose we borrow money through deposits with ﬂoating interest rates. we borrow the same amount again in the deposit market. we then pay a ﬁxed interest rate on our loan over the life of the swap. on a notional equal to the amount borrowed from the deposit.
8 CHAPTER 1. pages 269–270). for various tenors and currencies. each trading day at noon (12. and swaptions The plainvanilla European interest rate options most relevant to this thesis are (i) caps and ﬂoors. such as interest rate derivatives. An interesting note is that the ﬁrst major swap took place only in 1981. Financial authorities determine reference rates normally along the following lines: A number of panel banks are consulted. each trading day at around 11. If LIBOR ﬁxes above (below) the respective strike rate. EURIBOR: Euro interbank oﬀered rate. likewise. and. published by the British Bankers’ Association (BBA). These reference rates are published for several tenors and currencies. between IBM and the World Bank. (ii) swaptions. but not the obligation. INTRODUCTION The two parties in a swap determine the ﬂoating interest rate usually via a reference interest rate. For each tenor and currency. A reference interest rate is a rate that is set by a ﬁnancial authority or calculation agent. Each panel bank provides rates at which it conceives it possible to borrow money in the interbank market. see Valdez (1997. some percentile of the top and bottom of the quotes are discarded.2 Interest rate options: Caps.1. published by the European Banking Federation (FBE) and by the Financial Markets Association (ACI).3. a ﬂoorlet) gives its holder at expiry the right.00am central European time. on LIBOR rates. Examples are: LIBOR: London interbank oﬀered rate. whereby it is sensible not to exercise the caplet (ﬂoorlet) and it ends worthless. Upon publication of the reference rate. since we then receive the positive diﬀerence LIBOR minus strike (strike minus LIBOR) at the deposit payment date. It is interesting to note there is now an interest rate derivatives pricing model that bears the name of a reference rate: the LIBOR market model. A caplet (respectively. 1.00am) London time. Swaptions are options on swap rates. respectively. then it is cheaper to borrow (more rewarding to lend) in the market. The option gains at the deposit payment date as dependent on realized LIBOR are displayed in Figure 1. see Section 1.2. practitioners say that the rate then ﬁxes. then it is sensible to exercise the caplet (ﬂoorlet). If LIBOR ﬁxes below (above) the respective strike rate. then we say that he or she exercises the option. and. Reference rates are used to calculate payments not only of swaps. The remaining quotes are averaged to form the reference rate for that tenor and currency. to enter into a borrowing deposit (lending deposit) at a prearranged strike rate. ﬂoors. A cap consists of a sequence of consecutive caplets. . If an option holder claims the option right. but also of other securities. Caplets and ﬂoorlets are call and put options. a ﬂoor consists of a sequence of consecutive ﬂoorlets.
1: Payoﬀs of caplets and ﬂoorlets versus realized LIBOR.1. INTEREST RATE MARKETS AND OPTIONS 9 Caplet Floorlet Strike rate LIBOR→ Strike rate LIBOR→ Figure 1.2. .
but not the obligation. Moreover. therefore we require a positive premium for the option. published for various tenors and currencies by the International Swaps and Derivatives Association (ISDA). Market participants invariably indicate the direction of swap cash ﬂows from the point of view of whether ﬁxed payments are payed or received. to enter into a swap with a ﬁxed rate equal to a prearranged strike rate. Let αi denote the day count fraction for the period [ti . The timet price of a discount bond with maturity ti is denoted by bi (t). we ﬁnd that an option always provides a nonnegative cash ﬂow. To calculate cap and ﬂoor or swaption premiums. see Rebonato (2004a. again a reference rate is used. The payoﬀ structures for payer and receiver swaptions are similar to those of caplets and ﬂoorlets in Figure 1. in that they pay the relevant diﬀerence between realized rate and strike.1: instead of ‘LIBOR’ read ‘swap rate’. To present the Black approach for caps and ﬂoors. we introduce some terminology: We set out by specifying a tenor structure. (1. via consultation of a group of panel banks. This approach however lacked a theoretical justiﬁcation for long. For a cash settled swaption at expiry. A discount bond is a hypothetical security that pays one unit of currency at its maturity. a receiver) swaption corresponds to a call option (put option). market participants initially used a Blacktype formula that is based on assuming a lognormal distribution for the LIBOR or swap rate at expiry. with a positive probability to provide a positive cash ﬂow. the approach is similar for swaptions. both parties need to agree on ‘the’ swap rate manifest in the market.10 CHAPTER 1. INTRODUCTION A European swaption gives its holder at expiry the right. We present the Black approach for caps and ﬂoors. if we hold a swaption that gives us the right to enter into a swap for which we pay or receive ﬁxed. ti+1 ]. Prior to that. 0 = t0 < t1 < · · · < tn+1 . Section 4(d)). if this is positive. A payer (respectively. we consider two tenor points 0 < t1 < t2 . Financial authorities calculate swap reference rates more or less in the same way as deposit reference rates are set.1. We deﬁne the forward LIBOR rate f1 (t) by b1 (t) − b2 (t) . respectively. Many researchers therefore considered the use of the Black formula for caps and ﬂoors or swaptions to be unsound for a considerable period: While the Black swaptions approach had already been justiﬁed in 1990. Usually. and instead of ‘caplet’ or ‘ﬂoorlet’ read ‘payer swaption’ or ‘receiver swaption’. Thus. LIBOR ﬁxes at time t1 and interest is paid at time t2 . but the approach later turned out to be valid. then such a swaption is a payer or receiver swaption. an assumption of jointly lognormal LIBOR and swap rates is inconsistent.7) f1 (t) = α1 b2 (t) . Cash settled contracts diﬀer from normal option contracts. and has no other cash ﬂows. articles establishing its validity kept appearing at least until 1997. An example of a reference swap rate is ISDAFIX. From Figure 1.
1. Continuing.g.3. Exercise of a Bermudan derivative is a tradeoﬀ between taking the option gains now or holding onto the option at possibly more favourable option gains later. To see this. and −1 for a ﬂoorlet. note that f1 (t) in (1. σ1 t1 . ﬂoors and European swaptions.9) in full. Equality (∗) holds since b2 (t2 ) = 1.9) = α1 b2 (0)E max(f1 (t1 ) − k.2 = ln f1 (0) k 2 ± 1 σ1 t1 2 √ . such as caps. annual or semiannual periods. Examples include Bermudanstyle interest rate derivatives. INTEREST RATE DERIVATIVES PRICING MODELS 11 We consider the forward measure.1). Bermudan means that the derivative is exercisable (equivalently: callable) at multiple discrete time points. which is the measure associated with the discount bond b2 maturing at the payment date of the LIBOR deposit. Here. values of Bermudan interest rate derivatives therefore depend on multiple interest rates. but on multiple interest rates. The payoﬀ v(t2 ) at time t2 of a caplet with strike rate k is then given by α1 max(f1 (t1 ) − k. 0) = b2 (0)E n(t2 ) b2 (t2 ) (1. f1 (t) = f1 (0) exp f1 (t) 1 2 − σ1 t + σ1 w(t) .1.8) where σ1 is a scalar constant. it follows that f1 is a martingale under its forward measure. we ﬁnd for the caplet value v(0): v(0) = n(0)E (∗) v(t2 ) α1 max(f1 (t1 ) − k. If we calculate (1.3 Interest rate derivatives pricing models Up to here we have examined options on a single interest rate.. usually separated by. There are however many interesting interest rate derivatives that depend not only on a single interest rate. By the assumption of absence of arbitrage. we obtain the formula of Black (1976): v(0) = α1 b2 (0)η f1 (0)φ ηd1 − kφ ηd2 d1. 0). 2 (1. There are of course also many nonBermudan interest rate derivatives that dependent on . φ(·) denotes the cumulative normal distribution function. Inherently. we assume that the forward LIBOR rate is a lognormal martingale under its forward measure: df1 (t) = σ1 dw(t).7) is the value of a portfolio ((b1 (t) − b2 (t))/α1 ) expressed in terms of amounts of the numeraire b2 (t). e. equivalently. From (1. 0) . η denotes +1 for a caplet.
12) . n(t) = n(0) exp n t r(s)ds . The reason for this abundance of interest rate models stems from diﬀerent speciﬁcations of the interest rate market. fn ): 1 + αi fi = bi .1): bi (t. discount bond prices may be given (implicitly or explicitly) in terms of a set of forward LIBOR rates f = (f1 . we need a model that features dynamics for the whole term structure of interest rates. as explained below. .12 CHAPTER 1. ω)dw(t). there are many short rate models.3. . equivalently. Discount bond prices satisfy. . . as explained in Section 1. for t < ti . bi (1. we select the bank account as numeraire. Short rate models are characterized by their speciﬁcation of dynamics for the short rate. bi+1 (1. and focus on the diﬀusion term: dbi (b) = · · · + σi (t.1. 0 In a short rate model. Therefore. r(ti ) =1 n(t) r(t) = r = E exp n(ti ) ti − t r(s)ds r(t) = r . we need only specify the volatility term σi . For example. (1. (b) To specify discount bond price dynamics. Next to short rate models. the drift term then follows from noarbitrage restrictions. INTRODUCTION multiple interest rates. ﬂoors and swaptions. the ﬁrst dynamic term structure models are short rate models.11) The bond price volatility may thus be statedependent.1. we can calculate discount bond prices. For most mainstream short rate models. we can ﬁnd explicit and analytical formulas for (1. once the arbitragefree dynamics of the short rate are known. there are many more other interest rate derivatives pricing models. r) = E bi ti . The short rate r is a hypothetical rate: it is the instantaneous rate of interest for the ﬂoating money market account (equivalently: bank account) with value n: dn = rdt. As can be seen from Table 1. Preferably.10) for discount bond prices given the associated short rate. To value such multirate dependent products. A set of discount bond prices may be alternatively given by a set of interest rates.10). by the fundamental arbitragefree pricing formula (1.10) From (1. we omit the drift.1 Short rate models Historically. Examples of short rate models are given in Table 1. 1. such dynamic term structure models need be consistent with the Black formula for caps.1 (this table is not complete).
10).3. ω) dw(t) (b) (b) (f ) (1.14) dbi bi dr r (1. ω) dw(t) = · · · + σ (r) (t.1. . and vice versa. see (1. Ingersoll & Ross (1985) Ho & Lee (1986) Hull & White (1990) Speciﬁcation dr = bdt + σdw dr = (b − ar)dt + σdw dr = ardt + σrdw dr = (b − ar)dt + σrdw √ dr = (b − ar)dt + σ rdw dr = θ(t)dt + σdw dr = (θ(t) − ar)dt + σdw √ dr = (θ(t) − ar)dt + σ rdw Black. (1. Here. b. the scalars a. ω) dw(t) r dbi (b) = · · · + σi (t) dw(t) ⇒ (1. and not state dependent.13) corresponds to the extension of Black & Scholes (1973) from stocks to bonds. Rebonato (2004a.1: Some short rate models and their speciﬁcation of short rate dynamics. INTEREST RATE DERIVATIVES PRICING MODELS 13 Table 1. ω) dw(t) = · · · + σi (t. Section 3) lists ﬁve speciﬁcations. it is the requirement of deterministic and known volatility for an interest rate speciﬁcation that determines a model. and (1. for one of the speciﬁcations.13) dfi = · · · + σ (f ) (t. and c denote model parameters.15) Speciﬁcation (1.14) corresponds to short rate models. Model Merton (1973) Vasicek (1977) Dothan (1978) Brennan & Schwartz (1979) Cox. More speciﬁcations are available: in fact. ω) dw(t) = · · · + σi (t. We restrict our exposition to these model classes. In fact. if we require volatility to be only time dependent σ(t). Dynamics for a set of forward LIBOR rates or for the short rate give rise to dynamics for discount bond prices. then we obtain stochastic volatility for the other speciﬁcations: dr = · · · + σ (r) (t. Derman & Toy (1990) dr = θ(t)rdt + σrdw Black & Karasinski (1991) dr = (ar − br log r)dt + σrdw √ Pearson & Sun (1994) dr = (b − ar)dt + σ r − c dw or in terms of the short rate r. Thus. ω) dw(t) bi fi i dr = · · · + σ (r) (t) dw(t) ⇒ r dfi (f ) = · · · + σi (t) dw(t) ⇒ fi dfi fi dbi bi = · · · + σi (t.15) to the LIBOR market model.
although some two factor short rate models exist too. INTRODUCTION Bond options can be viewed as caplets and ﬂoorlets. Such is. see Section 1. However. Moreover. as model parameters need be implied from market option prices via nonstraightforward numerical procedures.e. short rate models have only a single stochastic driver. . The classical model of Black & Scholes (1973) exhibits a socalled ﬂat volatility smile. in a sense. Section 4(b)). Equation 20. A disadvantage is the resulting diﬃculty to model instantaneous decorrelation. at least not without further modiﬁcation. model parameters need to be tweaked to ensure the model ﬁts to the relevant market rates and volatilities. Volatility smile is the phenomenon that diﬀerent implied volatility is quoted for options that have diﬀerent strikes but that are otherwise identical. as stated before. We thus expect from any interest rate model that aims to be the equivalent of Black & Scholes (1973) for interest rates to also produce a ﬂat volatility smile. Jamshidian (1989) decomposes the option on the coupon paying bond into several options on discount bonds. in the sense that the implied volatility is independent of strike. The straight extension of the model of Black & Scholes (1973) from stocks to bonds. see Hull (2000. the produced smile ought to correspond also qualitatively to the observed smile. The resulting numerical calibration procedures can be instable and computationally costly. deterministic and known instantaneous bond volatility. An advantage of single factor models is that recombining lattices can be used to eﬃciently price mildly pathdependent derivatives.10). i..2. and smile shapes are not controllable by model users. Another disadvantageous feature of short rate models is that they produce an arbitrary volatility smile. The reason is that short rate models are formulated in terms of an artiﬁcial short rate that is not directly observable in the market. There are also other related problems. see Rebonato (2004a.14 CHAPTER 1. also some drawbacks to short rate models: They are. Smile volatility is more realistic than ﬂat volatility. Typically. Consequently. and such is not the case for short rate models: the latter exhibit rather arbitrary smiles. diﬃcult to calibrate. suﬀers however from the problem that discount bond prices do not necessarily converge to one at maturity.14) does not correspond to market practice of quoting implied volatility for LIBOR and swap rates. deterministic volatility for an abstract short rate in (1. see. The initial success of short rate models is mainly due to their analytical tractability and numerical eﬃciency. An example of the indirect calibration of short rate models is when they are calibrated to swaption volatility: we then have to resort to the formula of Jamshidian (1989): A swaption is viewed as an option on a coupon paying bond. not the case for short rate models. including Americantype options. Therefore a direct application of Black & Scholes (1973) to bond prices yields an interest rate derivatives pricing model with many undesirable features. however. for example. since we can observe a pronounced volatility smile in interest rate markets. Longstaﬀ & Schwartz (1992) and Ritchken & Sankarasubramanian (1995). There are.
11). Section 4. we investigate the eﬀect of including such time homogeneous calibration procedures on the quality of risk sensitivities produced by the model. Time homogeneity of volatility means that the model preserves the cap or swaption volatility curves as modeltime progresses. presented in Section 1. Positivity of interest rates is guaranteed for the deterministic volatility LIBOR market model. Section 3. Consequently.2 Market models In recent years. Market models correspond to speciﬁcation (1. respectively. For an arbitragefree construction of the LIBOR market model starting from discount bond dynamics (1. the state variables of the model. This joint capswaption calibration potential of the LIBOR market model has very much contributed to its success.15). G¸tarek & Musiela (1997). Moreover. Fortunately for the LIBOR market model.. INTEREST RATE DERIVATIVES PRICING MODELS 15 For a more extensive discussion on advantages and disadvantages of short rate models (e.3. the LIBOR market model can be calibrated to swaption volatility. the reader is referred to Cairns (2004.2). such as time homogeneity of cap and swaption volatility.3. forward swap rates are generally not lognormally distributed. This seems to imply that the LIBOR market model produces nonﬂat swaption volatility. Market models therefore allow for straightforward calibration to prices of underlying options: the model parameter is simply equal to the market quoted volatility. In LIBOR market models. 1. It is speciﬁcation (1. SantaClara & Schwartz (2001)). because the LIBOR and swap market model produce ﬂat volatility for caps and swaptions. Kerkhof & Pelsser (2002) show that the two formulations are equivalent. and are thus based on a set of forward rates.g. Sandmann & Sondermann (1997) and Jamshidian a (1997)) or string models (see SantaClara & Sornette (2001) and Longstaﬀ. a successful class of models has appeared in the literature known as market models (LIBOR and swap market models. Miltersen. see Chapter 2. there also exist extremely accurate approximate formulas for swaption implied volatility. These forward rates are. market models are the equivalent of Black & Scholes (1973) for interest rates. An interesting use of the approximate swaption volatility calibration via semideﬁnite programming is given in Brace & Womersley (2000) and D’Aspremont (2003). The number of state .2. since forward LIBOR rates are lognormal martingales. Market models are based on the forward measure technique for caplets. see Brace. positivity of interest rates and nonexplosiveness of short rate models). in fact.1). and is thus not a canonical model for swaptions. see Pietersz (2001. The ease of calibration of market models allows for modelling of other market aspects.5. also referred to as BGM models. Such deviation from the lognormal paradigm however turns out to be extremely small in the LIBOR market model. Market models correspond to (1.1.15).15) with which traders quote prices of underlying options (caps and swaptions). In Chapter 2.
. e. The derivation methodology is fully generic and includes the previous market model speciﬁcations. Andersen & Andreasen (2000). which is useful for. that these can only be valued with Monte Carlo anyway.3 Markovfunctional models A model class that combines some of the features of short rate models and market models is the Markovfunctional model of Hunt et al. Monte Carlo simulation has to be used to value derivatives. and pricing can be eﬃciently performed on. for example. The traditional LIBOR market model is based on a set of forward LIBOR rates.16 CHAPTER 1. Volatility smile can be incorporated in the LIBOR market model. Section 19) and Pelsser (2000. 1. The numerical eﬃciency of pricing on a grid comes however with a price: a single stochastic Markov driver implies that the model has diﬃculty in attaining instantaneous . and a ﬂat volatility smile can be achieved. The CMS market model is key to the pricing of. Hunt & Kennedy (2000. In Chapter 7.. the model has sixty forward rates. see. We address reduction of dimensionality in Chapters 3 and 4. European options on interest rate spreads. and they do not suﬀer from the drawback of short rate models of producing an arbitrary smile: Volatility smile is very much controllable in Markovfunctional models.4) and Galluccio & Hunter (2004) extend with a set of forward swap rates that costart at the same initial date. whether a low factor (short rate or Markovfunctional) or multi factor model is used.g. Certain derivatives have become so complex and strongly pathdependent. (2000). Due to the dimensionality of market models. Section 9). In Chapter 5. e.3. Section 18. Though some derivatives can be valued without Monte Carlo when a short rate or Markovfunctional model is used. Moreover. INTRODUCTION variables can grow large for particular trades: For a thirty years model with semiannual forward rates. which is noneﬃcient compared to recombining lattices used in short rate models. the trend shows a growing complexity in derivatives. Markovfunctional models do not suﬀer from the computational burden of market models.g. we extend market models to include forward constant maturity swap (CMS) rates and other generic speciﬁcations of rates. ﬁxed maturity Bermudan swaptions and Bermudan CMS swaptions. Technological computer hardware developments have however recently enabled the use of Monte Carlo as a suﬃciently eﬃcient pricing and risk management tool. a grid. Jamshidian (1997) extends the market model technology to a set of forward swap rates that coend at the same ﬁnal date. see also Hunt & Kennedy (2000. This means that market models have large dimensionality. Markovfunctional models can be calibrated to interest rate option volatility much like market models. we study an eﬃcient and accurate approximation of the LIBOR market model that enables pricing on a recombining lattice. for example. An interest rate spread is a diﬀerence between separate interest rates.
Bermudan options are thus somewhat in between European and American options. see also Jamshidian (2003). AMERICAN OPTION PRICING WITH MONTE CARLO SIMULATION 17 decorrelation between interest rates. 1. . is whether a lack of decorrelation causes a signiﬁcant impact on the pricing and hedge performance of a model. The key is to make use of crosssectional information present in the simulation. require simulation within simulation. we show that the regressionbased algorithm of Longstaﬀ & Schwartz (2001) leads to ineﬃciently estimated risk sensitivities. The regressionbased approximate holdon value may then be used to formulate an exercise decision. In Chapter 6. This conditional expectation value is not known. The regression based techniques have been generalized to stochastic mesh methods by Broadie & Glasserman (2004) and Avramidis & Matzinger (2004). Other American option pricing techniques include the dual approach of Rogers (2002). We propose a modiﬁcation of the algorithm. while Monte Carlo simulation is of forward induction type. in principle. at a simulation node. in terms of pricing and hedge performance. deemed the constant exercise method.1. see. that enhances the quality of risk sensitivity estimates. A question that has not yet been addressed in the academic literature so far. An eﬃcient algorithm for pricing American options in Monte Carlo was therefore not known in the literature for long. Longstaﬀ & Schwartz (2001). and to calculate it would. by regressing the holdon value onto functions of explanatory variables present in the simulation. the value of holding onto the option. for example.4 American option pricing with Monte Carlo simulation American options feature a period during which the option can be exercised. which is most ineﬃcient. we address this research question by empirical comparison of market models and Markovfunctional models. The problem is that we need to know. Only recently has eﬃcient American option valuation with Monte Carlo been enabled by novel regressionbased methods. and the highdimensional grid approach of Berridge & Schumacher (2003). The valuation of American options typically involves a backward induction routine. In Chapter 6. Bermudan options can be exercised at several discrete points in time.4.
.
A. & Pelsser. R. Journal of Derivatives 11(3). which would make it a central interest rate model. stable. A. A. J. 2. R. as Pietersz. Risk Magazine pp.1 Introduction The LIBOR interest rate model discussed in Section 1. Risk Books. An alternative approach not based on recalibration comes out of comparison with the swap market model. 91–93.2 is popular among both academics and practitioners alike. Dunbar.3. while in the recalibration method the change in volatility is hidden and potentially unstable. A. 1 This chapter has been published in diﬀerent form as Pietersz. pp. An extended abstract of this chapter appeared as Pietersz. Swap vega in BGM: pitfalls and alternatives. A. ed. ‘Derivatives Trading and Option Pricing’. This does not happen in the case of constant swap rate volatility. London. and wellunderstood fashion. March issue.. It features lognormal LIBOR and almost lognormal swap rates. J. (2005b). ‘Riskmanaging Bermudan swaptions in a LIBOR model’. We will call this the BGM model. (2004b). The key to the method is that the perturbation in LIBOR volatility is distributed in a clear. (2004a). UK. & Pelsser. in N. ‘Swap vega in BGM: pitfalls and alternatives’. It accurately estimates vegas for any volatility function in few simulation paths. It shows that for some forms of volatility an approach based on recalibration may make estimated swap vega very uncertain. . This Risk article was republished as part of a Risk book. 51–62. as the instantaneous volatility structure may be distorted by recalibration. A. One reason the LIBOR BGM model is popular is that it can riskmanage interest rate derivatives that depend on both the cap and swaption markets. J. R. 277–285. & Pelsser.Chapter 2 Riskmanaging Bermudan swaptions in a LIBOR model 1 This chapter presents a new approach to calculating swap vega per bucket in a LIBOR model.
Dun & Barton (1998)). however. of course. It follows that there is higher correlation between the discounted payoﬀs along the original path and perturbed volatility. As the vega is the expectation of the diﬀerence between these payoﬀs (divided by the perturbation size). if we increase the number of simulation paths. recalibrate. The strength of the method is that it accurately estimates swap vegas for any volatility function and in few simulation paths. The diﬀerence in value divided by the perturbation size is then an estimate for the Greek. a Bermudan swaption is a complicated enough (swapbased) product (in a LIBORbased model) that depends nontrivially on the swap rate volatility dynamics. There remain a number of issues to be resolved to use BGM as a central interest rate model. with predictability at 10. it may (depending on the volatility function) yield estimates with high uncertainty. the standard error will be lower. It may be used to calculate swap vega in the presence of any volatility function. Second. which is probably the maximum in a practical environment. this additional volatility is naturally distributed evenly over time. while all other swap rate volatilities remain unaltered. while in the recalibration method the change in volatility is hidden and potentially unstable. the vega is estimated with low uncertainty. The key to the method is that the perturbation in the LIBOR volatility is distributed in a clear. For a constantvolatility calibration. and then revalue the option. The reason is that for certain calibrations. for example. First. For a constantvolatility calibration. The uncertainty disappears. but the number required for clarity can far exceed 10.20 CHAPTER 2. however. a Bermudan swaption is not as complicated as some other more exotic interest . We develop a method that is not based on recalibration to compute swap vega per bucket in the LIBOR BGM model.000 or fewer simulation paths. and wellunderstood fashion. In other words. One issue is the calculation of swap vega. its value depends also on swap rate correlation. under a perturbation.000. It is important to verify that a calculation method reproduces the correct numbers when the answer is known. The number of simulation paths needed for clarity of vega thus depends on the chosen calibration. the standard error of the vega is relatively high. BERMUDAN SWAPTIONS IN A LIBOR MODEL and thus also the marketstandard Black formula for caps and swaptions. of course. Approximate swaption volatility formulas such as in Hull & White (2000) have been shown to be of high quality (see Brace. The method is based on keeping swap rate correlation ﬁxed but increasing the instantaneous volatility of a single swap rate evenly over time. stable. A common and usually very successful method for calculating a Greek in a model equipped with a calibration algorithm is to perturb market input. the additional volatility is distributed unevenly and one might even say unstably over time. If this technique is applied to the calculation of swap vega in the LIBOR BGM model. We benchmark our swap vega calculation method using Bermudan swaptions for two reasons.
2 Recalibration approach We ﬁrst consider examples of the recalibration approach to computing swap vega. From (1. for 0 ≤ t ≤ ti . with the LIBOR volatility perturbation we ﬁnd using our method. 2. pi:j (t) where p denotes the present value of a basis point: j−1 pi:j (t) = k=i αk bk+1 (t). the forward rates are related to discount bond prices as follows: fi (t) = bi (t) 1 −1 . ti ] → Rd is the volatility vector function of the ith forward rate.12). the resulting vega is hard to estimate and many simulation paths are needed for clarity. fi (t) The positive integer d is referred to as the number of factors of the model. for two of the three methods. (2. Each forward rate is modeled as a geometric Brownian motion under its forward measure: dfi (t) = σ i (t) · dw(i+1) (t). αi bi+1 (t) The swap rate corresponding to a swap starting at ti and ending at tj is denoted by si:j . Our problem diﬀers from theirs in that we derive a method to calculate the perturbation of LIBOR volatility to obtain the correct swap rate volatility perturbation for swaption vega.2. RECALIBRATION APPROACH 21 rate derivatives. The swap rate is related to discount bond prices as follows: si:j (t) = bi (t) − bj (t) .2. A discount bond pays one unit of currency at maturity. Three calibration methods are considered. . We show for Bermudan swaptions that our method yields almost the same swap vega as found in a swap market model. and some intuition exists about its vega behavior. The kth component of this vector corresponds to the kth Wiener factor of the Brownian motion. The function σ i : [0. w(i+1) is a ddimensional Brownian motion under the forward measure Q(i+1) . Glasserman & Zhao (1999) provide eﬃcient algorithms for calculating risk sensitivities. We show that. given a perturbation of LIBOR volatility.1) It is understood that pi:j ≡ 0 whenever j ≤ i. The Glasserman and Zhao approach may then be applied to eﬃciently compute the swaption vega.
sn:n+1 corresponding to the swaps underlying a coterminal Bermudan swaption. .2) In general. We model LIBOR instantaneous volatility as constant in between tenor dates (piecewiseconstant). . .20) and Section 7. (1998). . 2. (THSRV)—Timehomogeneous swap rate volatility. . 2 A coterminal Bermudan swaption is an option to enter into an underlying swap at several exercise opportunities. there are as many parameters as market swaption volatilities. tn ] → {1. n}: i(t) = 1 + # ﬁxings in [0. . . ti ). Hull & White (2000). (CONST)—Constant forward rate volatility. although they are very close to lognormal as shown. all the underlying swaps that may possibly be entered into have the same ending date. for example. We may thus implicitly deﬁne its volatility vector σ i:n+1 by: dsi:n+1 (t) = σ i:n+1 (t) · dw(i:n+1) (t). Deﬁne i : [0. .22 CHAPTER 2.4 of Brigo & Mercurio (2001). t ∈ [ti−1 . ﬁrst deﬁne a ﬁxing to be one of the time points t1 . . t] . by Brace et al. To deﬁne this. This approach is based on ideas of Rebonato (2001). (THFRV)—Timehomogeneous forward rate volatility. BERMUDAN SWAPTIONS IN A LIBOR MODEL We consider the swap rates s1:n+1 . for example. The ﬁrst and the second stage are described in Equation (6.3) A volatility structure is said to be timehomogeneous if it depends only on the index to maturity i − i(t). Because of the timehomogeneity restriction. There are closedform formulas for the swaptions Black implied volatility. see. We note that constant forward rate volatility implies constant swap rate volatility. si:n+1 (t) (2. σ i:n+1 will be stochastic because swap rates are not lognormally distributed in the BGM model. (2. 3. The algorithm for calibrating with such a volatility function is a twostage bootstrap. tn .2 Swap rate si:n+1 is a martingale under its forward swap measure Q(i:n+1) . A volatility structure {σ i (·)}n is piece wiseconstant if: i=1 σ i (t) = (const). for 0 ≤ t ≤ ti . The holder of a Bermudan swaption has the right at each exercise opportunity to either enter into a swap or hold the option. the Black formula approximately holds for European swaptions. . Because of near lognormality. . The volatility will sometimes be modeled as timehomogeneous. . The corresponding calibration algorithm is similar to the second stage of the twostage bootstrap. . Three volatility calibration methods are considered: 1. A NewtonRhapson sort of solver may be used to ﬁnd the exact calibration solution (if there is one).
Two regression functions are employed.3 The model is recalibrated. At high levels of perturbation. the swap net present value (NPV).1: Market European swaption volatilities. The BGM tenor structure is 0 < 1 < 2 < · · · < 31.1. The Bermudan swaption is repriced through Monte Carlo simulation using the exact same random numbers. 28 3 20. The market European swaption volatilities were taken as displayed in Table 2.2.. see Hull & White (2000). p. Only a single explanatory variable is considered.4) where β is chosen to equal 5%. For each bucket a perturbation ∆σ(≈ 10−8 ) is applied to the swaption volatility in the calibration input data. we use the Longstaﬀ & Schwartz (2001) least squares Monte Carlo method. .0% 20..8% All calibration methods have in common that the forward rate correlation structure is calibrated to a historical correlation matrix using principal components analysis (PCA). At too low levels of volatility perturbation. . the vega is unstable.4% . The option is callable annually. Expiry (Y) Tenor (Y) 1 30 2 29 15. . vegagamma terms aﬀect the vega..2% 3 28 15. To determine the exercise boundary. All forward rates are taken to equal 5%. ﬂoating point number roundoﬀ errors aﬀect the vega. 63) as: ρij (0) = e−βti −tj  . The notation xNCy denotes an “x noncall y” Bermudan option.5) 3 It was veriﬁed that the resulting vega is stable for a wide range of volatility perturbation. Correlation is assumed to evolve timehomogeneously over time. which is exercisable into a swap with a maturity of x years from today but is callable only after y years.. namely. ∆σ (2. We consider a 31NC1 coterminal Bermudan payers swaption deal struck at 5% with annual compounding... For very extreme perturbation. and we check to see that the calibration error for all swaption volatilities is a factor 106 lower than the volatility perturbation.4% 29 2 30 1 23 Swaption Volatility 15. a constant and a linear term. Denote the original price by v and the perturbed price by vi:n+1 . Then the recalibration method of estimating swap vega νi:n+1 for bucket i is given by: νi:n+1 = vi:n+1 − v .6% 20. RECALIBRATION APPROACH Table 2.2. (2. The time zero forward rate instantaneous correlation is assumed following Rebonato (1998.
The standard errors (SEs) are displayed separately in Figure 2.volatility increment. For THSRV. respectively. This holds for all buckets.25.2. 2.1. Usually the swap vega is denoted in terms of a shift in the swaption volatility. BERMUDAN SWAPTIONS IN A LIBOR MODEL THFRV THSRV CONST 5 10 15 Bucket (Y) 20 25 30 Figure 2. The number of paths needed for THFRV to obtain the same SE as CONST is thus (6/0. Figure 2. . the instantaneous variance increment (in the limit) is completely diﬀerent from a constant.3 displays the THFRV vega for 1 million simulation paths. Swap vega results for a Monte Carlo simulation of 10.01) · νi:n+1 .1: Recalibration swap vega results for 10. 000 = 5.000 scenarios are displayed in Figure 2.4M paths are needed. For the THFRV and THSRV recalibration approaches. The swap vega 100bp scaled to a 100 bp shift νi:n+1 is then deﬁned by 100bp νi:n+1 = (0.24 35 30 Swap vega scaled to 100bp shift (bp) 25 20 15 10 5 0 −5 −10 −15 1 CHAPTER 2.3 Explanation The key to explanation of the vega results under recalibration is the change in swap rate instantaneous variance after recalibration. The levels of SE for THFRV and CONST are 6.25)2 × 10.00 and 0.000 simulation paths. we ﬁnd 1. For example. consider a 100 basis point (bp) shift in the swaption volatility.8M .
8M paths 4 3 3 2 1.2.000 simulation paths.3.2: Empirical standard errors of vega for 10. .3: Recalibration THFRV vega results for 1 million simulation paths.25 0 1 5 10 15 Bucket (Y) 20 25 30 Figure 2.4M paths 1 0. EXPLANATION 7 THFRV THSRV CONST 25 6 Standard error of swap vega (bp) 6 5 5. 35 THFRV 1M paths 30 Swap vega scaled to 100bp shift (bp) 25 20 15 10 5 0 −5 −10 CONST 10k paths 1 5 10 15 Bucket (Y) 20 25 30 Figure 2.
the distribution of the variance increment is concentrated in the begin and end time periods.6) where π and πi:n+1 are the payoﬀs along the path of the original and the perturbed model. This total variance increment has to be distributed over all time periods.26 Perc. π + Var π . This does not occur for a . This is at variance with the natural and intuitive even distribution in the CONST recalibration. From (2. which is associated with the calculation of swap vega corresponding to bucket 30. For THFRV. BERMUDAN SWAPTIONS IN A LIBOR MODEL Constant volatility re−calibration approach THFRV re−calibration approach 200% 100% 0% −100% 1 5 10 15 20 Time period (Y) 25 30 Figure 2. respectively. (2. The price diﬀerential has to be computed in the limit of the 30 × 1 swaption implied volatility perturbation ∆σ tending to zero. Here c := 0. This implies a swap rate instantaneous variance increment of 30∆σ 2 .01/∆σi:n+1 .4: Observed change in swap rate instantaneous variance for THFRV and CONST recalibration approach. it follows that the simulation variance of the vega is given by 100bp Var νi:n+1 = c2 Var πi:n+1 − π = c2 Var πi:n+1 − 2Cov πi:n+1 . and is even negative in the second time period. of increment in swap rate instantaneous variance CHAPTER 2. We note that for both data sets the sum of the variance increments equals 100%. For illustration. The vega standard error is thus minimized if there is high covariance between the discounted payoﬀs in the original and the perturbed model. we consider the volatility perturbation shown in Figure 2.4.5).
There is higher covariance between the payoﬀs under the perturbations of variance implied by the CONST calibration. however. Denote the corresponding swap market model by ε Sk:n+1 (ε).6). We consider a perturbation of the swap rate instantaneous volatility given by σε k:n+1 (·) = (1 + ε)σ k:n+1 (·). there are an uncountable number of perturbations of the swap rate instantaneous volatility that produce the same perturbation as the Black implied swaption volatility. 0 As may be seen in this equation.7) where the shift applies only to k : n + 1. From (2. a simple proportional increment. swap rates are lognormally distributed under their forward swap measure.2) are deterministic. leading in turn to a higher standard error of the vega. This will give us an alternative method to calculate swap vega per bucket. This is illustrated in Figure 2. The Black implied swaption volatility σk:n+1 is given by σk:n+1 = 1 tk tk σ k:n+1 (s)2 ds. a natural onedimensional parameterized perturbation of the swap rate instantaneous volatility. namely.4. We deﬁne swap vega in the swap market model as follows. no stochasticity is moved to other random sources. Then the swap . this leads to a reduced covariance. The latter is the derivative of the exotic price with respect to the Black swaption implied volatility. We note that the implied swaption volatility in Sk:n+1 (ε) is given by σk:n+1 = (1 + ε)σk:n+1 . There is. it then follows that the standard error is lower. because then each independent time period maintains approximately the same level of variance. We consider a swap market model S.2. This means that all swap rate volatility functions σ i:n+1 (·) of (2. In the model. (2. How much our dynamically managed hedging portfolio should hold in European swaptions is essentially determined by the swap vega per bucket. 2. Denote the price of the derivative in Sk:n+1 (ε) by vk:n+1 (ε). The ﬁrst step is to study the deﬁnition of swap vega in the swap market model. Denote the price of an interest rate derivative in a swap market model S by v. which we will extend to the LIBOR BGM model. Because the rate increments over diﬀerent time periods are independent.5. because the stochasticity in the simulation is basically moved around to other time periods (in our case from period 2 to period 1). SWAP VEGA AND THE SWAP MARKET MODEL 27 perturbation such as dictated by THFRV.4 Swap vega and the swap market model An alternative method for calculating swap vega has the advantage that the estimates of vega have a low standard error for any volatility function.
8) Equation (2. The swap rate volatility perturbation in (2. We note that the relative and .9) In (2. vega per bucket νk:n+1 is deﬁned as νk:n+1 = lim ε→0 vk:n+1 (ε) − v . respectively.28 CHAPTER 2. This ensures that the absolute level of the swap rate instantaneous volatility is increased by an amount ε. It is also possible to apply an absolute shift in the form of σε k:n+1 (·) = 1+ ε σ k:n+1 (·) σ k:n+1 (·). and vk:n+1 (ε) and v denote the prices of the derivative in models where the kth swaption volatility equals σk:n+1 + ∆σk:n+1 and σk:n+1 .5: Natural increment of Black implied swaption volatility. (2.7) deﬁnes a relative shift. In conventional notation we may write νk:n+1 = ∂v v(σk:n+1 + ∆σk:n+1 ) − v(σk:n+1 ) = lim ∂σk:n+1 ∆σk:n+1 →0 ∆σk:n+1 (2. εσk:n+1 (2.8) εσk:n+1 is equal to the swaption volatility perturbation ∆σk:n+1 . BERMUDAN SWAPTIONS IN A LIBOR MODEL Swap rate instantaneous volatility (1+ε)σ(⋅) ε ε σ(⋅) ε 0 Time T Figure 2.10) where the shift applies only to k : n + 1.8) is the derivative of the exotic price with respect to the Black implied swaption volatility.
Swap rates are not lognormally distributed in the LIBOR BGM model. bi (t) − bn+1 (t) pi:n+1 (t) where the weights wi:n+1 are in general statedependent. straightforward calculations reveal that the perturbed volatility satisﬁes ε σk:n+1 = σk:n+1 + ε 1 tk tk 0 σk:n+1 (s) ds ¯ σk:n+1 + O ε2 .11) at time zero.5.11) i:n+1 γj (t) = bi (t) pi:j (t) − . This means that swap rate instantaneous volatility is stochastic.2. The method for calculating swap vega per bucket is largely the same for both relative and absolute perturbation (but we will point out any diﬀerences). Hull and White derive an approximating formula for European swaption prices that is based on evaluating the weights in (2.9). The method is based on a perturbation in the forward rate volatility to match a constant swap rate volatility increment. Rebonato (2002) also derives this method in terms of covariance matrices. for example. . This is a good approximation by virtue of the near lognormality of swap rates in the LIBOR BGM model. (1998). The ﬁrst diﬀerence is in the change in swaption implied volatility ∆σk:n+1 of (2. by Brace et al. Hull & White (2000) show that the swap rate volatility vector is a weighted average of forward LIBOR volatility vectors: n σ i:n+1 (t) = j=i i:n+1 i:n+1 wj (t)σ j (t). D’Aspremont (2002) shows that the swap rate is uniformly close to a lognormal martingale. 2.12) i:n+1 i:n+1 When we write wj := wj (0) and adopt the convention that σ i (t) = σ i:n+1 (t) = 0 when t > ti . but our derivation is explicitly in terms of volatility vectors. 1 + αj fj (t) (2. ALTERNATIVE METHOD FOR CALCULATING SWAP VEGA 29 absolute perturbation are equivalent when the instantaneous volatility is constant over time.5 Alternative method for calculating swap vega An alternative method for calculating swap vega in the BGM framework may be applied to any volatility function to yield accurate vega with a small number of simulation paths. The stochasticity is almost invisible as shown empirically. (2. wj (t) = i:n+1 αj γj (t)fj (t) . We denote the resulting swap rate instantaneous volatility by σ HW as follows: i:n+1 n σ HW (t) = i:n+1 j=i i:n+1 wj (0)σ j (t). namely.
. The results are displayed in Figure 2. .13) σ HW (t) n:n+1 −1 = n:n+1 wn σ n (t) If W is the upper triangular nonsingular weight matrix (with upper triangular inverse W ).6. + wn σ n (t) 1:n+1 .. . for any vega with absolute value above 1 bp. . for the analytically tractable setup of a twostock Bermudan option.12) is: 1:n+1 1:n+1 σ HW (t) = w1 σ 1 (t) + . . 0 .30 CHAPTER 2. The swap rate volatility under relative perturbation (2. . (2. . and for any vega with absolute value below 1 bp. prices can be recomputed in the BGM model and the vegas calculated. . (2. We note that the swap rate correlation is left unaltered. BERMUDAN SWAPTIONS IN A LIBOR MODEL a useful form of (2.14) We note that only the volatility vectors σ k (t). .7) of the kth volatility is σ ·:n+1 → σ ·:n+1 + ε 0 . . . 2. . That is. which are the vectors that underlie σ k:n+1 (t) in the Hull and White approximation. these volatility vectors can be jointly related through the matrix equation: σ ·:n+1 = W σ · . . . . . 0 . the diﬀerence is less than 4%. The vegas were also calculated for the absolute perturbation method in results not displayed. 0 σ k:n+1 0 . negativity of vega occurs with correlation ≈1. the diﬀerence is always less than a third of a basis point. 0 σ k:n+1 0 . . In Appendix 2. We note that the approach yields slightly negative vegas for buckets 1730.7 Comparison with the swap market model The swap market model (SMM) is the canonical model for computing swap vega per bucket. We compare the LIBOR BGM model and a swap market model with the very .A. . . 2. With the new LIBOR volatility vectors. and volatilities for short expiration dates are higher than volatilities at longer expiration dates—this of course is in a typical interest rate setting. .000 paths. The diﬀerences in the vegas for the two methods are minimal. . we show that negative values are not a spurious result.6 Numerical results We demonstrate the algorithm in a simulation with 10. σ n (t) are aﬀected (due to the upper triangular nature of W−1 ). The corresponding perturbation in the BGM volatility vectors is given by σ · → σ · + εW−1 0 .
5% strike. 8 Swap vega scaled to 100bp shift (bp) BGM Libor Model Swap Market Model 7 6 Swap Vega (bp) 5 4 3 2 1 0 1 2 3 4 5 6 7 8 European Swaption Bucket (Y) 9 10 Figure 2. Error bars denote 95% conﬁdence bound based on the standard error.000 simulation paths.2.7. COMPARISON WITH THE SWAP MARKET MODEL 14 12 10 8 6 4 2 0 −2 1 5 10 15 20 Bucket (Y) 25 30 THFRV THSRV CONST 31 Figure 2. .7: Comparison of LMM and SMM for swap vega per bucket.6: Swap vega results for 10.
In the Monte Carlo simulation of the SMM we apply the discretization suggested in Lemma 5 of Glasserman & Zhao (2000). BERMUDAN SWAPTIONS IN A LIBOR MODEL BGM Libor Model Swap Market Model 20 Total Swap Vega (bp) 15 10 5 0 0% 3% 6% 9% Strike / Fixed Coupon 12% 15% Figure 2. same swap rate quadratic crossvariation structure.8.8)). Approximate equivalence between the two models has been established by Joshi & Theis (2002. as the instantaneous volatility structure may be distorted by recalibration.32 25 CHAPTER 2. and are displayed partially in Figures 2. Results appear in Table 2. In this particular case. . the BGM LIBOR model reproduces the swap vegas of the swap market model very accurately. A singlefactor LIBOR BGM model is used with constant volatility calibrated to the euro cap volatility curve of October 10. We show that for some forms of the volatility an approach based on recalibration may lead to great uncertainty in estimated swap vega.7 and 2. Equation (3.2.8 Conclusions We have presented a new approach to calculating swap vega per bucket in the LIBOR BGM model. We perform the test for an 11NC1 payﬁxed Bermudan option on a swap with annual ﬁxed and ﬂoating payments. 2001. The zero rates were taken to be ﬂat at 5%. 2.8: Comparison of LMM and SMM for total swap vega against strike. This does not happen in the case of constant swap rate volatility.
0 0.6 0.0 1.2 1.0 2.8 0.2 0.0 0.1 0.0 0.1 0.5 4.1 0.0 2.7 2.6 0.8 0.7 3.0 0.000 simulation paths.0 0.8 0.7 2.1 0.2 0.1 0.5% 585 (5) 11.1 0.1 0.7 0.7 0.2 0.6 4.0 0.5 0.2 0.7 1.0 0.3 0.0 0.7 2.3 3.0 1.7 2.5 0.9 2.2 1Y 2Y 3Y 4Y 5Y 6Y 7Y 8Y 9Y 10Y Total Vega 0.4 0.8.0 0.1 0.6 3.8 22.0 0.2 0.0 0.3 0.1 0.0 SWAP MARKET MODEL Fixed Rate Value 2% 2172 (6) 1.0 0.0 0.4 10% 21 (1) 0.5 10% 19 (1) 0.8 23.2 0.5 1.2 4.9 1.0 0.5 0.7 2.0 0.7 1.0 0.0 4.0 0.0 0.3 0.6 1.8 6.0 0.0 0.5 2.6 0.0 0.1 0.3 17.5% 1146 (6) 4.1 0.4 3.0 0.1 0.5 12% 7 (1) 0.2 0.1 0.0 0.4 0.2 0.1 0.2: Swap vega per bucket test results for varying strikes—10.0 6% 210 (3) 1.0 0.0 0.0 0.1 0.5 8.0 0.1 0.1 7% 112 (2) 0.3 0.3 1.6 1.9 .0 0.5% 592 (5) 11.6 1.2 0.0 0.0 3% 1480 (6) 0.8 0.0 0.1 0.0 3.4 2.0 0.0 4.0 4% 841 (5) 11.0 0.0 0.0 0.2 12.4 3.4 0.0 0. BGM LIBOR MODEL Fixed Rate Value 2% 2171 (4) 2.2 0.5 12.7 0.8 1.9 0.0 3% 1476 (5) 2.6 0.0 2.1 0.8 4.0 0.1 0.5 2.4 1.2.5 1.0 6.0 0.9 0.1 1.2 0.0 0.0 1.3 1.7 4.1 0.0 4.9 1.0 0.1 0.3 0.1 0.4 1.1 0.3 0.5 1.3 0.9 0.0 0.6 2.3 2.0 0.8 0.1 1.1 1.0 0.2 0.3 0.3 1.5 1.0 3.5 0.0 0.2 2.0 0.0 0.0 0.9 0.6 0.0 0.0 0.8 20.4 15% 1 (0) 0.6 1.8 0.0 0.9 12.0 6% 204 (4) 0.5 1.3 8% 61 (2) 0.0 0.4 9% 34 (1) 0.1 3.1 0.1 0.0 0.2 0.3 0.8 21.1 0.4 0.3 0.0 0.1 1.5 5.6 19.0 0.2 2.1 0.0 0.1 0.8 1.5 0.5 1.0 0.0 1.6 2.2 3.0 0.0 0.0 0.0 0.9 0.0 5% 410 (4) 7.0 0.1 0.2 7.1 7% 109 (3) 0.0 1.9 23.6 4.3 0.1 5.8 6.9 12.8 0.3 1Y 2Y 3Y 4Y 5Y 6Y 7Y 8Y 9Y 10Y Total Vega 0.2 3.3 8.4 12% 8 (1) 0.0 0.7 1.1 1.2 0.8 3.1 0.2 0.0 0.0 3.0 0.3 0.3 15% 2 (0) 0.5% 1138 (5) 2.3 0.3 0.1 1.0 1.0 0.8 1.2 1.6 0.3 0.0 0.0 0.3 8% 64 (2) 0.0 0.0 5% 411 (4) 6.3 0.7 1.1 0.1 4.0 0.3 0.6 0.3 9% 36 (1) 0.9 16. CONCLUSIONS 33 Table 2.3 2.0 2.0 4% 829 (5) 10.5 1.
using the swap market model. i = 1. It follows that the time t1 stock prices are distributed as follows: √ 1 2 si (t1 ) = f si (0). 0. t1 . 2.34 CHAPTER 2.17) max s1 (t1 ) − k1 + . We also show for a Bermudan swaption deal that our method yields almost the same swap vega as a swap market model. Therefore the (cashsettled) payoﬀ v(s1 (t1 ). but in the recalibration method the change in volatility is hidden and potentially unstable. i = 1. stable. Under the riskneutral measure. 2. and wi . if this right is not exercised. z2 ) is standard bivariate normally distributed with correlation ρ and where f (s. the value of the latter is given by the BlackScholes formula. on conditioning and involves a onedimensional numerical integration over the Black formula. BERMUDAN SWAPTIONS IN A LIBOR MODEL We derive an alternative approach that is not based on recalibration. (2. then the option becomes worthless. s2 (t1 ). or holding the option on the second stock. BS2 s2 (t1 ). t. .16) is the time t forward price for delivery at time u of a stock with current price s. The key to the method is that the perturbation in the LIBOR volatility is distributed in a clear.15) where the pair (z1 .A Appendix: Negative vega for twostock Bermudan options We examine a twostock Bermudan option to show that its vega per bucket is negative in certain situations. with correlation ρ. The method accurately estimates swaption vegas for any volatility function and at a small number of simulation paths. u) := s exp r(u − t) . Here t1 < t2 . The holder of a twostock Bermudan option has the right to call the ﬁrst stock s1 at strike k1 at time t1 . t1 exp σi t1 zi − σi t1 . 2 (2. t1 ) of the Bermudan at time t1 is given by: (2. 2. if the holder decides to hold the option. dw1 dw2 = ρdt. are Brownian motions under the riskneutral measure. 2. i = 1. The Bermudan option is valued under standard BlackScholes conditions. and wellunderstood fashion. si where σi is the volatility of the ith stock. At time t1 . the stock prices satisfy the stochastic diﬀerential equations: dsi = rdt + σi dwi . the holder of the Bermudan option will choose whichever of two alternatives has a higher value: either calling the ﬁrst stock. the right remains to call the second stock s2 at strike k2 at time t2 .
0) = e−rt1 E v t1 . s2 .2. As an example of vega negativity. 0) of the Bermudan option may thus be computed by a bivariate normal integration of the discounted version of the payoﬀ in (2. t) = e−r(ti −t) f (s. i = 1. s1 (t1 ).3. ∆σi for a small volatility perturbation ∆σi 1. t) = (i) (i) − ki φ d2 (i) . σi ) 2 + O ∆σi . ti )φ d1 d1. σi t where φ(·) is the cumulative normal distribution function. s2 (t1 ) . The time zero value v(s1 . σi + ∆σi ) − v(s1 .2 (s. t. ∂σi The vega may be numerically approximated by ﬁnite diﬀerences: νi = v(s1 . t. 0. Spot price for stock 1 Spot price for stock 2 Strike price for stock 1 Strike price for stock 2 Exercise time for stock 1 Exercise time for stock 2 Volatilities Correlation Riskfree rate s1 (0) s2 (0) k1 k2 t1 t2 σi ρ r 150 140 100 100 1Y 2Y Variable 0. Results are displayed in Table 2. we compute the vega per bucket for the deal described in Table 2. . APPENDIX: NEGATIVE VEGA TWOSTOCK BERMUDAN OPTIONS Table 2. 2. 2 ln f (s. s2 .A. We note that the vega per bucket may possibly be negative for both the ﬁrst and the second bucket. ti )/ki ± 1 σi t 2 √ .17): v(s1 . s2 . s2 .3: Deal description. 0. 0) .4. s2 . The vega per bucket νi is deﬁned as νi := ∂v(s1 . 2. i = 1. The volatility is perturbed by a small amount.9 5% 35 where BS is the BlackScholes formula: BSi (s.
the highvolatility call becomes out of the money. In several instances a vega per bucket is negative.36 CHAPTER 2. however.56 0. the Bermudan option is thus worth less. In the case of high correlation and one stock with signiﬁcantly higher volatility than the other. 0).44 The resulting vega is insensitive to either the perturbation size or the density of the 2D integration grid. Both stocks move down. in both the ﬁrst and the second bucket. and the lowvolatility call will be exercised. It is based on conditioning and involves a onedimensional numerical integration over the Black formula. we contend that the only added value of the additional option on the lowvolatility stock lies in oﬀering protection against a down move of both stocks (recall that the stocks are highly correlated). The alternative method yields the exact same results.53 30% 10% 65. the highvolatility call will be exercised. We consider a European maximum option on the two highly correlated stocks s1 and s2 .56 0. Therefore. Also.45 0. In total. In the case of a down move.4 that the negative vegas occur in the case of high correlation and for the bucket with the lowest volatility. the payoﬀ remains unaltered. the lowvolatility stock (volatility slightly increased) moves down more than in the unperturbed case.4: Results for negative vega per bucket for twostock Bermudan option. σ1 Scenario 1 Scenario 2 σ2 price 100bp ν1 100bp ν2 10% 30% 64. in the case of an up move. the payoﬀ of the protection call is reduced. BERMUDAN SWAPTIONS IN A LIBOR MODEL Table 2. we develop an alternative valuation of the twostock Bermudan option (available upon request).11 0. We give an alternative explanation of the source of vega negativity for Bermudan swaptions. Because the highvolatility stock moves up much more than the lowvolatility stock. Because the highvolatility stock moves down much more than the lowvolatility stock. To ensure that the negative vega is not due to an implementation error. . Both stocks move up. We note in Table 2. There are two scenarios: • Up move. • Down move. then in these scenarios the exercise strategy remains unchanged. struck at k > 0. with payoﬀ: max(s1 − k. s2 − k. If now the volatility of the lowvolatility stock is increased by a small amount.
. The spread option argument carefully shows a negative component in the vega of a Bermudan swaption.2. This causes a negative component in the total composition of the vega. 0) + 1{s1 >k} max(s2 − s1 . However the ordinary call option value increases when the volatility of the ﬁrst stock increases. We have: max(s1 − k. by which the value of the European spread option decreases. however it does not explain that this negative component can sometimes outweigh the other positive components of the vega. The maximum option is thus the sum of an ordinary European call option and a (conditional) European spread option. since the choice of calling either s1 or s2 corresponds to the choice of exercising at the ﬁrst or second exercise opportunity. which thus constitutes a positive component of the vega. The same argument can be applied to show that an increase in volatility of the second stock causes a negative component in the vega. APPENDIX: NEGATIVE VEGA TWOSTOCK BERMUDAN OPTIONS 37 We deem the risk behaviour of this option to be similar to the risk behaviour of a Bermudan swaption. 0). If the volatility of the ﬁrst stock increases then the volatility of the spread s2 − s1 decreases (for highly correlated stocks). s2 − k. 0) = max(s1 − k. This outweighing of the negative component is explained in the up and down moves argument above.A.
.
An example of such a derivative is a Bermudan swaption. A simulation study suggests that majorization compares favourably with competing approaches in terms of the quality of the solution within a ﬁxed computational time. First we explain how this problem occurs in an interest rate derivatives pricing setting. Such an application mainly concerns interest rates. P. ‘Rank reduction of correlation matrices by majorization’. . 3. P. R. The algorithm is based on majorization and. F. the 2 year swap rate. J. therefore. The algorithm is computationally eﬃcient. it is globally convergent. (2004a). 102. December issue.Chapter 3 Rank reduction of correlation matrices by majorization 1 A novel algorithm is developed for the problem of ﬁnding a lowrank correlation matrix nearest to a given correlation matrix. & Groenen. etc. ‘A major LIBOR ﬁt’. A Bermudan swaption gives its holder the right to enter into a ﬁxed maturity interest rate swap at certain exercise dates.1 Introduction In this chapter. and can handle arbitrary weights on the entries of the correlation matrix. Quantitative Finance 4(6). where the asset prices are modelled as correlated lognormal processes. 1 This chapter has been published in diﬀerent form as Pietersz. At an exercise opportunity. Risk Magazine p. (2004b). J. We will focus on interest rate derivatives that depend on several rates such as the 1 year LIBOR deposit rate. An extended abstract of this chapter appeared as Pietersz. & Groenen. F. R. we study the problem of ﬁnding a lowrank correlation matrix nearest to a given (correlation) matrix. is straightforward to implement. The problem of rank reduction of correlation matrices occurs when pricing a derivative dependent on a large number of assets. the holder has to choose between exercising then or holding onto the option with the chance of entering into the swap later at more favourable interest rates. 649–662.
the value of the caplet depends only on a single forward LIBOR rate. we will focus on derivatives depending on several rates. The matrix P = (ρij )ij should be positive semideﬁnite and should have a unit diagonal. yj = ρij . . In contrast. .. the correlation coeﬃcient between the returns on assets i and j. In other words.1).1) and which are most relevant to our discussion are the LIBOR and swap market models for valuation of interest rate derivatives. Given the model (3. .2) .1) by Monte Carlo we need a decomposition P = YYT . denotes the scalar product. as introduced in Section 1. dwj = ρij . Model (3. In this case. Here si denotes the price of the ith asset. Here. amongst others. . the derivative value can be calculated only by Monte Carlo simulation. also on the forward swap rates corresponding to future exercise dates. RANK REDUCTION BY MAJORIZATION Evidently. P should be a true correlation matrix. Suppose we model n correlated lognormal price processes. σi its volatility and wi ˜ denotes the associated driving Brownian motion. dwi . dsi = . Each of the asset prices is modelled as a lognormal martingale under its respective forward measure. To do so a model is set up that speciﬁes the behaviour of the asset prices. Jamshidian (1997) and Miltersen et al. too. for example. which can be viewed as a call option on LIBOR. Our discussion can however also be applied to the situation of a derivative depending on several assets. the asset prices are correlated. Because the number of assets is assumed to be high and the derivative is assumed complex in this exposition. dt denotes the drift term that stems from the change of measure under the nonarbitrage condition. ˜ ˜ ˜ (3. si yi . The models that ﬁt into the framework of (3.40 CHAPTER 3. In the latter case. . We then implement the scheme dsi = . In other words. (3. (1997).2. we consider 30 forward starting annualpaying swaps. the value depends not only on the current available swap rate but.1) si under a single measure. stocks. with Y an n × n matrix. then the decomposition reads yi . dt + σi dwi . These models were developed by Brace et al. To implement scheme (3. . then our model would consist of 30 annual forward LIBOR rates or 30 coterminal forward swap rates. the price of any derivative depending on the assets can be calculated by nonarbitrage arguments. if we denote the ith row vector of Y by yi . Additionally.3. In this case. For example.1) could however be applied to a derivative depending on a number of. if we model a 30 year Bermudan swaption with annual call and payment dates. (1997). dt + σi yi1 dw1 + · · · + yin dwn . yj = ρij . The term . an example of a derivative that is dependent on a single interest rate is a caplet. Brownian motions i and j are correlated with coeﬃcient ρij . starting at each of the 30 exercise opportunities and all ending after 30 years. . an asset price corresponds to a forward LIBOR or swap rate. where .
We could proceed in two possible ways. . yj = ρij .2) indeed corresponds to scheme (3. INTRODUCTION 41 where the wi are now independent Brownian motions. . . . The second involves approximating the correlation matrix while maintaining an exact ﬁt to the volatilities.2) contains a large number of almost redundant Brownian motions that cost expensive computational time to simulate. (3. in both cases of historic or marketimplied correlation. for the instantaneous covariance we have dsi /si .1) since both volatility and correlation are implemented correctly. In a derivatives pricing setting. we are more conﬁdent of the volatilities. it follows for the choice of c that ϕ is always between 0 and 1. The correlation is usually less known and can be obtained in two ways. n. It follows that an approximation be required. . to minimize ϕ(Y) := 1 i<j wij ρij − yi . with d < n and d typically between 2 and 6. For large interest rate correlation matrices. For ﬁnancial correlation matrices. it can be estimated from historical time series. The above considerations lead to solving the following problem: Find Y ∈ Rn×d .1. 2 . Because each term ρij − yi . (3. The ﬁrst way involves approximating the covariance matrix (σi σj ρij )ij . Instead of taking into account all Brownian motions. in a derivative pricing setting. Moreover. Second. For that reason. we would wish to do the simulation with a smaller number of factors. Such correlation sensitive products are not traded as liquidly as the European plainvanilla options. . i = 1.3) The n × d matrix Y is a decomposition of P. First. The 2 instantaneous variance is dsi /si = σi dt since yi = ρii = 1 and volatility is the square root of instantaneous variance divided by dt. dsj /sj = σi σj yi .3. The scheme then becomes dsi = . Scheme (3. usually almost all variance (say 99%) can be attributed to only 3–6 stochastic Brownian factors. This approach immediately implies that the rank of P be less than or equal to d. this rank restriction is generally not satisﬁed. we approximate the correlation matrix rather than the covariance matrix. d say. These can be calculated via a Blacktype formula from the European option prices quoted in the market. usually the volatilities are wellknown. it can be implied from correlation sensitive markettraded options such as spread options. si yi . Consequently. or mostly these volatilities are directly quoted in the market. A spread option is an option on the diﬀerence between two rates or asset prices. (3. dt + σi yi1 dw1 + · · · + yid dwd . . yj dt = σi σj ρij dt. Therefore.4) Here wij are nonnegative weights and c := 4 i<j wij . yj is always between 0 and 2. yj c subject to yi 2 = 1. The objective value ϕ is scaled by the constant c in order to make it independent of the problem dimension n.
then problems (3. since the added variance may change the importance of certain factors. i = 1. Section 10). n.5) by setting the weights in (3. (3.4) have been added for three reasons: • For squared diﬀerences. yj ˜ ˜ ˜ ˜ ˜ subject to yi 2 = ρii . then ˜ P = Diag(σ) P Diag(σ). therefore we focus on (3. If P = (˜ij )ij denotes the instantaneous covariance matrix.5) yields. a large diﬀerence constitutes a far greater part of the total error in (3. ˜ Approximating covariance with ﬁxed variance as in (3.5) are obviously identical.4) in the remainder of the thesis. We carefully state “in general” and “may change”. . to minimize ϕ(Y) := i<j ρij − yi .5) is thus a special case of the correlation problem (3. . we could be more conﬁdent about the correlation between the 1 and 2 year swap rates than about the correlation between the 8 and 27 year swap rates. for clarity of presentation): ˜ Find Y ∈ Rn×d .42 CHAPTER 3. in general. Section 9) provides an excellent discussion of the pros and cons of using weights.4) as wij = σi σj . ˜ 2 ˜ Here. • Financial reasons may sometimes compel us to assign higher weights to particular correlation pairs. . diﬀerent results than approximating correlation as in (3. yi relates to yi in (3. For example.5) Many thanks to Ton Vorst for pointing out this alternative. Rebonato (2002. . from (3.4). We note that the majorization algorithm in this chapter. and the geometric programming algorithm in Chapter 4. • The objective function with weights has been considered before in the literature.4) (with constant weights) and (3. The weights for small diﬀerences can then be appropriately increased to adjust for this. and σ denotes ρ the vector of volatilities. where Diag(σ) denotes a diagonal matrix with the diagonal ﬁlled with the vector σ. 2 2 . since if all variances are equal to one.3) via ˜ yi = σi yi . . See for example Rebonato (1999c.4) than a small diﬀerence. RANK REDUCTION BY MAJORIZATION An interesting alternative is to approximate the covariance matrix while keeping vari˜ ance ﬁxed2 .4). can be applied to the covariance 2 2 approximating problem in (3. The covariance problem (3. Approximating the covariance matrix while ensuring a perfect ﬁt to variance amounts to solving the following problem (we leave out weights w and the scalar factor c. The weights wij in (3.4).
These methods are outlined in the next section and are shown to have several disadvantages. we provide an overview of the methods available in the literature.2. Fourth. the idea of majorization is introduced and the majorizing functions are derived.4). The chapter ends with some conclusions. where · F denotes the Frobenius F norm. This is exactly the situation in practice: decisions based on derivative pricing calculations have to be made in a limited amount of time. LITERATURE REVIEW 43 The simplest case of ϕ is ϕ(Y) := c−1 P − YYT 2 . If not. an algorithm based on majorization is given along with reference to associated MATLAB code. then the most general objective function it can handle stems from the weighted Frobenius norm · F. We investigate empirically the eﬃciency of majorization in comparison to other methods in the literature.4). namely none of the methods is simultaneously (i) eﬃcient. First. we include here the discussion of the majorization method itself. Y 2 := tr(YYT ) for matrices Y. The algorithm is straightforward to implement. The method is based on iterative majorization that has the important property of guaranteed convergence to a stationary point. We show that the method can eﬃciently handle general weights.2 Literature review We describe seven existing algorithms available in the literature for minimizing ϕ. we develop a novel method to minimize ϕ that simultaneously has the four mentioned properties. Because a review of the majorization method is interesting from the point of view of Chapter 4 (Rank reduction by geometric programming). it is indicated whether it can handle general weights. Global convergence and the local rate of convergence are investigated. The benchmark tests that we will consider are based on the performance given a ﬁxed small amount of computational time. it corresponds to the case of all weights equal. Second.4) will be referred to as ‘general weights’. In this chapter. This objective function (which we shall also call F ‘Frobenius norm’) ﬁts in the framework of (3.3. 3. The remainder of this chapter is organized as follows. The objective function in (3. there exist six other algorithms for minimizing ϕ deﬁned in (3. (ii) straightforward to implement. we present empirical results. Third. (iii) able to handle general weights and (iv) guaranteed to converge to a local minimum.Ω with Ω a symmetric positive . In the literature. For each of the seven algorithms.
The modiﬁcation of PCA in this way is believed to be due to Flury (1988). (3. . n. where Y 2 := tr(YΩYT Ω).6) .44 CHAPTER 3. Modiﬁed PCA is popular among ﬁnancial practitioners and implemented in numerous ﬁnancial institutions. RANK REDUCTION BY MAJORIZATION deﬁnite matrix. A strong drawback of modiﬁed PCA is its nonoptimality: generally one may ﬁnd decompositions Y (even locally) for which the associated correlation matrix YYT is closer to the original matrix P than the T PCAapproximated correlation matrix YPCA YPCA . For ease of exposition.2.6) that is the ‘modiﬁed’ part. 3. with Q orthogonal and Λ the diagonal matrix with eigenvalues. we mention the ‘modiﬁed principal component analysis (PCA)’ method. For a description in ﬁnance related articles. The calculation is almost instant. The rate of convergence is sublinear. b). Modiﬁed PCA is based on an eigenvalue decomposition P = QΛQT .2.Ω 3. i = 1. we mention the majorization approach of Pietersz & Groenen (2004a. . Qd the ﬁrst d columns of Q. and Λd the principal submatrix of Λ of degree d. because almost all that is required is an eigenvalue decomposition.2 Majorization Second. F. and the approximation is reasonably accurate. Majorization can handle an entryweighted objective function and is guaranteed to converge to a stationary point. .1 Modiﬁed PCA First. see also this Chapter 3. .Ω YYT 2 will be referred to as ‘weighted Frobenius norm’ too. If the eigenvalues are ordered descendingly then a lowrank decomposition with associated approximated matrix close to the original matrix is found by {YPCA }i = z . z 2 Qd Λd 1/2 i (3. Sidenius (2000) and Hull & White (2000). Ordinary PCA stops with (3. however the method can be applied to the weighted Frobenius norm as well though not for general weights.7) and it is the scaling in (3. The objective function ϕ(Y) := c−1 R − F. The modiﬁed PCA approximation becomes worse when the magnitude of the left out eigenvalues increases. see. for example. . we restrict to the case of the Frobenius norm. ensuring that the resulting correlation matrices have unit diagonal. Modiﬁed PCA is easy to implement.7) z := Here {X}i denotes the ith row of a matrix X. Throughout this chapter we choose the starting point of any method considered (beyond modiﬁed PCA) to be the modiﬁed PCA solution.
in general. The alternating projections algorithm without normal correction stated in Grubiˇi´ (2002) and Morini & Webber sc (2004) however always converges to a feasible point. Section 5. Another advantage of geometric programming is that it can handle general weights. However. Higham (2002. Their results do not automatically hold for an alternating projections algorithm with normal correction for Problem (4.1)3 . Also. but not necessarily to a stationary point. The method is based on alternating projections onto the set of n × n matrices with unit diagonal and onto the set of n × n matrices of rank d or less. For projection onto the intersection of two convex sets. is the geometric programming approach of Grubiˇi´ sc & Pietersz (2005). ‘Concluding remarks’) mentions that he has been investigating alternative algorithms. Here. we consider the alternating projections algorithm of Grubiˇi´ (2002) and Morini & Webber (2004). The algorithm thus does not minimize the 3 The algorithm with normal correction for rank reduction has also been studied in Weigel (2004). since for d < n the set of n × n matrices of rank d or less is nonconvex. a drawback of the geometric programming approach is that it takes many lines of nonstraightforward code to implement.2. which can be seen as NewtonRhapson or conjugate gradient over curved space.3 Geometric programming The third algorithm that we discuss. . since we can eﬃciently calculate the projection onto the set of rankd matrices. see also Chapter 4. which may hinder its use for nonexperts.4 Alternating projections without normal correction sc Fourth.2. The discussion below of alternating projections applies only to the problem of rank reduction of correlation matrices. the alternating projections method without normal correction does not converge to a stationary point. the constraint set is equipped with a diﬀerentiable structure. Morini & Webber (2004) report that alternating projections with normal correction may fail in solving Problem (4. 3. Both these projections can be eﬃciently calculated.2. The alternating projections algorithm could in principle be extended to the case with rank restrictions. Some preliminary experimentation showed indeed that the extension to the nonconvex case did not work generally. such as to include rank constraints. Dykstra (1983) and Han (1988) have shown that convergence to a minimum can be obtained with alternating projections onto the individual convex sets if a normal vector correction is applied.3. Subsequently geometric programming is applied. The latter allows for an eﬃcient implementation.1). By formulating these algorithms entirely in terms of diﬀerential geometric means. a simple expression is obtained for the gradient. LITERATURE REVIEW 45 3. Convergence of the algorithm is however no longer guaranteed by the general results of Dykstra (1983) and Han (1988) because the constraint set {rank(C) ≤ d} is no longer convex for d < n. In fact.
3. In numerical experiments. For certain numerical settings. It is beyond the scope of this chapter to indicate how often this ‘nonconvergence’ occurs. in terms of Y. RANK REDUCTION BY MAJORIZATION objective function in (4. Subsequently. 3. Proposition 4. the condition that the dth and (d + 1)th eigenvalues are diﬀerent has not been guaranteed. As a result. for ease of exposition.4). then the resulting rankd approximation is a global minimizer of problem (3. we mention the ‘parametrization method’ of Rebonato (1999a.2. the resulting algorithm has been shown to perform not better and even worse than the geometric programming approach (Grubiˇi´ & Pietersz 2005). In essence. convergence of the Lagrange multiplier method to a global minimum or even to a stationary point is not guaranteed. We note that ϕY = 2ΨY can thus be eﬃciently calculated. this equaleigenvalues phenomenon occurs. This method lacks guaranteed convergence: Zhang & Wu (2003. Therefore. Brigo (2002). The major beneﬁt of geometric optimisation over the parametrization method is as follows. However. then the produced lowrank correlation matrix will not satisfy the diagonal constraint.1) and Wu (2003. ordinary nonlinear optimisation algorithms may be applied to minimize the objective value ϕ(Y(Θ)) over the angle parameters Θ. The set of correlation matrices of rank d or less {YYT : Y ∈ Rn×d . Diag(YYT )= I} is parameterized by trigonometric functions through spherical coordinates yi = yi (θ i ) with θ i ∈ Rd−1 . We consider. the case of equal weights. Brigo & Mercurio (2002). Theorem 3. Another sc drawback of the Lagrange multiplier algorithm is that only the weighted Frobenius norm can be handled and not general weights. The Lagrange multiplier algorithm produces a sequence of multipliers for which accumulation points exist. except for the key diﬀerence of optimising over Θ versus over Y.2.6 Parametrization Sixth. If the algorithm has not yet converged.1–20. If.1). 2004b (Sections 20. we mention the Lagrange multiplier technique developed by Zhang & Wu (2003) and Wu (2003). this approach is the same as geometric optimisation.6). with Ψ = YYT − P. see (4. The diﬀerential ϕY . 1999b. Section 6.46 CHAPTER 3.4)). Brigo & Mercurio (2001. The dif .24) below. 2002 (Section 9).1). The appropriate adaptation is to rescale the associated conﬁguration similarly to the modiﬁed PCA approach (3.9) and Rapisarda.5 Lagrange multipliers As the ﬁfth algorithm. the dth and (d + 1)th eigenvalues have diﬀerent absolute values. for the original matrix plus the Lagrange multipliers of an accumulation point.4) prove the following result. it only selects a feasible point satisfying the constraints of (4. is given simply as 2ΨY. 1999c (Section 10). the objective value ϕ(Y) becomes a function ϕ(Y(Θ)) of the angle parameters Θ that live in Rn×(d−1) .
We follow here the lines of Borg & Groenen (1997. To understand the methodology. is 2ΨY multiplied by the diﬀerential of Y with respect to Θ. The idea of majorization has been described. Kiers & Groenen (1996) and Kiers (2002).4). Hayden & Wells (1988) and Suﬀridge & Hayden (1993).4) with equal weights and d := n can be written as min{ P − C 2 .3.. This method can only be used when there are no rank restrictions (d := n) and only with the weighted Frobenius norm. Since the case d < n is the primary interest of this chapter. The latter diﬀerential is less eﬃcient to calculate since it involves numerous sums of trigonometric functions. Section 6) have shown empirically for a particular numerical setting with many randomly generated correlation matrices that the parametrization method is numerically less eﬃcient than either the geometric programming approach or the Lagrange multiplier approach. The key to majorization is to ﬁnd a simpler function that has the same function value at a supporting point x and anywhere else is larger than or equal to the objective function to be minimized. by the chain rule of diﬀerentiation. amongst others. Moreover. C F 0. in De Leeuw & Heiser (1977).3. iterative . the method of Higham (2002) will not be considered in the remainder.e.2. Section 8. 3. The two constraint sets {C 0} and {Diag(C) = I} are both convex. in terms of Θ however. Hong & Wells (1990). As a consequence. Glunt. 3. if the objective and majorization functions are once continuously diﬀerentiable (which turns out to hold in our case). The same technique has been applied in a diﬀerent context in Chu. to the problem of ﬁnding the nearest (possibly fullrank) correlation matrix. Such a function is called a majorization function. we brieﬂy describe the idea of majorization and apply majorization to the objective function ϕ of Problem (3. This procedure guarantees that the function value never increases along points generated by the algorithm.3 Majorization In this section. in which it was shown that the alternating projections algorithm of Dykstra (1983) and Han (1988) could be applied. then the properties above imply that the gradients should match at the supporting point x. The parametrization approach can handle general weights. Hayden. The algorithm of Higham (2002) is the alternating projection algorithm with normal correction applied to the case d = n. MAJORIZATION 47 ferential ϕΘ .7 Alternating projections with normal correction (d = n) The seventh important contribution is due to Higham (2002). Diag(C) = I}. The convexity was cleverly exploited by Higham (2002). note that minimization Problem (3. Funderlic & Plemmons (2003). By minimizing the majorization function – which is an easier task since this function is ‘simpler’ – we obtain the next point of the algorithm. i. Grubiˇi´ & Pietersz sc (2005. from any point where the gradient of the objective function is nonnegligible.4).
1 illustrates the majorization algorithm. x).48 CHAPTER 3. The ﬁrst step is to majorize ϕ(X) as a function of the ith row only and then to repeat this for each row. (ii) ϕ(y) ≤ χ(y. x). and (iii) the function χ(·. (iii) If ϕ y(k) − ϕ y(k+1) < ε then stop with y := y(k+1) . Figure 3.8) . We ﬁnd ϕ(Y) = 1 cj wj1 j2 ρj1 j2 − yj1 . RANK REDUCTION BY MAJORIZATION majorization will be able to ﬁnd a next point with a strictly smaller objective function value. This generic fact for majorization algorithms has been pointed out in Heiser (1995). A majorization algorithm is then given by (i) Start at y(0) . To formalize the notion of ‘ϕ(X) as a function of the ith row only’ we introduce the notation ϕi (y. Y) to denote the function ˆ ϕi (·. x) such that (i) ϕ(x) = χ(x. Y) : y → ϕ Yi (y) . it is straightforward to calculate the minimum of χ(·. Let ϕ(·) denote the function to be minimized. Let for each x in the domain of ϕ be given a majorization function χ(·. Set k := 0. We interpret Y as [y1 · · · yn ]T . y(k) . ˆ for (column)vectors y ∈ Rd with Yi (y) denoting the matrix Y with the ith row replaced T by y . that is. x) is ‘simple’. (iv) Set k := k + 1 and repeat from (ii). (ii) Set y(k+1) equal to the minimum argument of the function χ ·. We formalize the procedure somewhat more. (3. x) for all y. yj2 1 <j2 2 1 = cj T T wj1 j2 ρ21 j2 + (yj1 yj2 )2 − 2ρj1 j2 yj1 yj2 j 1 <j2 = (const in yi ) + 1 c T yi j:j=i T T wij yj yj yi − 2yi j:j=i (I) wij ρij yj (II) . Below we derive the majorizing function for ϕ(·) in (3.4).
y1) (y2) y2 y1 y0 Figure 3. This procedure is repeated to ﬁnd the point y2 etc. y0 ) is minimized to ﬁnd the next point y1 .3.) . y0 ) is ﬁtted by matching the value and ﬁrst derivative of ϕ(·) at y0 .3..y0) (y1)= (y1.1: The idea of majorization.y0) (y0)= (y0. Subsequently the function χ(·. (Figure adopted from Borg & Groenen (1997.y1) (y2..y1) (. The majorization function χ(·.4). Figure 8. (.y0) (y1.) The algorithm sets out at y0 . MAJORIZATION 49 (.
Ψ := YYT − P.11) is readily solved by y∗ := z/ z 2 . Combining (3. is denoted by x. ∀y. (3. RANK REDUCTION BY MAJORIZATION Part (I) is quadratic in yi whereas part (II) is linear in yi . the gradient is then given by Gradϕ = 4c−1 ΨY. which gives after some manipulations yT By ≤ 2λ − 2yT (λx − Bx) − xT Bx. and the current yi . that is. from which it would follow that the current point x is already a stationary point. We restrict to the case of all wij equal.4 The algorithm and convergence analysis Majorization algorithms are known to converge to a point with negligible gradient. As shown in Grubiˇi´ & Pietersz (2005) sc (see also Section 4.10) using the fact that yT y = xT x = 1. so that the following inequality holds: (y − x)T (B − λI)(y − x) ≤ 0. ∀y. Then.8) and (3. we shall denote Bi (Y) by B. Y). Y) ≤ − yT λx − Bx + c j:i=j + (const in y) = χi (y. Y). the current ith row vector of Y. as follows. y 2 = 1 (3. Y) is that it is linear in y and that the minimization problem min χi (y. We only have to majorize part (I).50 CHAPTER 3. ∀y. the matrix B − λI is negative semideﬁnite. 2 wij ρij yj ϕi (y. If z = 0 then this implies that the gradient is zero. (3. an expression for Gradϕ is needed. The advantage of χi (·.3). 3. Let λ denote the largest eigenvalue of B.9) For notational convenience. Deﬁne Bi (Y) := j:j=i T wij yj yj .10) we obtain the majorizing function of ϕi (y. the remaining term is constant in yi . This property holds also for the current situation. as will be shown hereafter. As the convergence criterion is deﬁned in terms of the gradient Gradϕ. Y) .12) . (3. that is. z := λx − Bx + j:j=i wij ρij yj . the running yi by y. Y) over ϕi (·.5.
Assume the speciﬁcation of a subset S ⊂ M called the solution set. n do T Set B := j=i wij yj yj .5. we investigate the local rate of convergence. W denotes the weight matrix. In Sections 3.5..4 and 3. An (autonomous) iterative algorithm is a map A : M → M ∪ {stop} such that A−1 ({stop}) = S. A point Y ∈ S is deemed a solution. The result is repeated here in a form adapted to the case of majorization.4. 2. Here P denotes the input matrix..6)–(3. In particular. 3. The proof of the following theorem is adapted from the proof of Theorem 1 in Zangwill (1969). First. do 3: 4: 5: 6: 7: 8: 9: 10: stop if the norm of the gradient of ϕ at X(k) := X is less than ε improvement in the function value ϕk−1 /ϕk − 1 is less than εϕ . n denotes its dimension. Second. If z = 0. Input: P. . 2. This order eﬀect will be addressed in Section 3. εϕ . .4.7). Calculate λ to be the largest eigenvalue of the d × d matrix B. Let M be a compact set.3. The majorization algorithm has been displayed in Algorithm 1. 1: Find starting point Y by means of the modiﬁed PCA method (3. then set the ith row yi of Y equal to z/ z 2 .3. W.1 Global convergence Zangwill (1969) developed generic suﬃcient conditions that guarantee convergence of an iterative algorithm. In the remainder of this section the convergence of Algorithm 1 is studied.. Set z := λyi − Byi + j=i wij ρij yj . we study the use of the power method. for i = 1.5. 2: for k = 0.5 we study diﬀerent ways of implementing the calculation of the largest eigenvalue of B in line 6 of Algorithm 1. we establish global convergence of the algorithm. d.. 1. ε Gradϕ . An expression for the gradient for the objective function with general weights can be found by straightforward diﬀerentiation. ε Gradϕ is the convergence criterion for the norm of the gradient and εϕ is the convergence criterion on the improvement in the function value. THE ALGORITHM AND CONVERGENCE ANALYSIS 51 Algorithm 1 The majorization algorithm for ﬁnding a lowrank correlation matrix locally nearest to a given matrix. d denotes the desired rank. . The rowwise approach of Algorithm 1 makes it dependent of the order of looping through the rows. n.. end for end for Gradϕ and the Output: the n×n matrix YYT is the rankd approximation of P satisfying the convergence constraints.
Let {Y(kj ) }∞ be any subsequence that converges to Y∗ . In practice. note that the algorithm A(·) is continuous in Y. when minimizing the majorization functions χi (·. Assume the contrary. Third. except for very rare degenerate cases. it may also be a stationary point. A point Y is called a solution if Gradϕ(Y) < ε.52 CHAPTER 3. global convergence to a point with zero ﬁrst derivative is the best one may expect from generic optimization algorithms. PROOF: Without loss of generality we may assume that the procedure generates an inﬁnite sequence of points {Y(k) } none of which are solutions. Suppose given a ﬁxed tolerance level ε on the gradient of ϕ. Then from any starting point Y(0) . Y) there will be at least one i for which we ﬁnd a strictly smaller objective value. First. the algorithm either stops at a solution or produces an inﬁnite sequence of points none of which are solutions. A(Y(kj ) ) → A(Y∗ ). for which the limit of any convergent subsequence is a solution point. we then have ϕ A Y(kj ) ϕ A Y∗ < ϕ(Y∗ ). it will converge to the stationary point (0. Therefore. note that the sequence {ϕ(Y(k) )}∞ has a limit since it is monotonically decreasing and bounded from below by k=0 0. It remains to be proven that the limit of any convergent subsequence must be a solution. Moreover. By the continuity of ϕ(·). we almost always obtain a local minimum. For example. y) = x2 − y 2 . the globally convergent version of the NewtonRhapson algorithm may converge to a stationary point. 2 The algorithm thus converges to a point with vanishing ﬁrst derivative. Thus Y(k+1) := A(Y(k) ) has a strictly smaller objective function value than Y(k) . but. which is in contradiction with ϕ(A(Y(kj ) )) → ϕ(Y∗ ). 0) starting from any point on the line {y = 0}. too: Applied to the function ϕ(x. in principle.2 Local rate of convergence The local rate of convergence determines the speed at which an algorithm converges to a solution point in a neighbourhood thereof.4. Second. note that if Y(k) is not a solution then ϕ Y(k+1) = ϕ A Y(k) < ϕ Y(k) . say. RANK REDUCTION BY MAJORIZATION Theorem 1 (Global convergence) Consider ﬁnding a local minimum of the objective function ϕ(Y) by use of Algorithm 1. By continuity of the iterative procedure. Let {Y(k) } be a sequence of points produced . we necessarily have that the gradients agree at Y(k) . however. We expect such a point to be a local minimum. 3. It must be shown j=1 that Y∗ is a solution. Since the objective and all majorization functions are diﬀerentiable. Namely if Y(k) is not a solution then its gradient is nonnegligible.
ζ 53 (3. Thus the majorization algorithm makes no contribution to existing literature for the case of indeﬁnite iteration.i) with α(δ (k.i) )δ (k. we study and assess the performance of the majorization algorithm in practice. eventually the algorithm with best rate of convergence will provide the best result. We may conclude that the local convergence is worse than linear. for Algorithm 1.A. As the proposition below will show. the convergence is deemed sublinear.i) = α(δ (k. Given a ﬁxed amount of time. It follows that the convergencetype deﬁning Equation (3.3. Among the algorithms available in the literature. Deﬁne δ (k. Suppose. Second.i) + O δ (k. both the geometric programming and parametrization approach can have a quadratic rate of convergence given that a NewtonRhapson type algorithm is applied.i) = yi − yi . let {Y(k) } denote the sequence of points generated by (k) (∞) Algorithm 1 converging to the point Y(∞) . More speciﬁcally. that is. Equation (3. for k large enough. Y(k+1) − Y(∞) ≤ α Y(k) − Y(∞) . respectively. However. but for α = 1 and not for any α < 1. Algorithm 1 has a sublinear local rate of convergence. Third. 2 3.i) ) → 1 as k → ∞. If the convergence estimate is worse than linear. thus sublinear. and the results of the next section on numerical experiments indeed show that majorization is the best performing algorithm in a number of ﬁnancial settings. we did not introduce the majorization algorithm for the purpose of indeﬁnite iteration.5.14) PROOF: The proof of Equation (3. worse than a linear rate of convergence. First.14) can be written as δ (k+1. α is called the linear rate of convergence. Proposition 1 (Local rate of convergence) Algorithm 1 has locally a sublinear rate of convergence. we explain and . we present an example with nonconstant weights. Such performance can almost invariably only be measured by empirical investigation. but rather for calculating a reasonable answer in limited time. The strength of majorization lies in the low costs of calculating the next iterate. When considering several algorithms and indeﬁnite iteration. (3. the performance of an algorithm is a tradeoﬀ between rate of convergence and computational cost per iterate.i) 2 . we numerically compare majorization with other methods in the literature. The next proposition establishes the local sublinear rate of convergence. For linear convergence. as is the case in practical applications of ﬁnancial institutions.13) If ζ = 1 and α < 1 or if ζ = 2 the local convergence is called linear or quadratic.14) may be found in Appendix 3. Then δ (k+1.i) = δ (k. NUMERICAL RESULTS by an algorithm converging to a solution point Y(∞) .13) holds.5 Numerical results In this section. with ζ = 1.
54
CHAPTER 3. RANK REDUCTION BY MAJORIZATION
investigate the order eﬀect. Fourth and ﬁfth, we consider and study alternative versions of the majorization algorithm. Algorithm 1 has been implemented in a MATLAB package called major. It can be downloaded from www.few.eur.nl/few/people/pietersz. The package consists of the following ﬁles: clamp.m, dF.m, F.m, grad.m, guess.m, major.m, P tangent.m and svdplus.m. The package can be run by calling [Yn,fn]=major(P,d,ftol,gradtol). Here P denotes the input correlation matrix, d the desired rank, Yn the ﬁnal conﬁguration matrix, fn denotes the ﬁnal objective function value, ftol the convergence tolerance on the improvement of ϕ, and gradtol the convergence tolerance on the norm of the gradient. The aforementioned webpage also contains a package majorw that implements nonconstant weights for the objective function ϕ.
3.5.1
Numerical comparison with other methods
The numerical performance of the majorization algorithm was compared to the performance of the Lagrange multiplier method, geometric programming4 and the parametrization method. Additionally, we considered the function fmincon available in the MATLAB optimization toolbox. MATLAB refers to this function as a ‘mediumscale constrained nonlinear program’. We have chosen to benchmark the algorithms by their practical importance, that is the performance under a ﬁxed small amount of computational time. In ﬁnancial applications, rank reduction algorithms are usually run for a very short time, typically 0.05 to 2 seconds, depending on the size of the correlation matrix. We investigate which method produces, in this limited amount of time, the best ﬁt to the original matrix. The ﬁve algorithms were tested on random ‘interest rate’ correlation matrices that are generated as follows. A parametric form for correlation matrices is posed in De Jong, Driessen & Pelsser (2004, Equation (8)). We repeat here the parametric form for completeness, that is, ρij = exp − γ1 ti − tj  − √ γ2 ti − tj  − γ4 t i − max(ti , tj )γ3 tj , (3.15)
with γ1 , γ2 , γ4 > 0 and with ti denoting the expiry time of rate i. (Our particular choice is ti = i, i = 1, 2, . . . ) This model was then subsequently estimated with USD historical interest rate data. In Table 3 of De Jong et al. (2004), the estimated γ parameters are listed, along with their standard errors. An excerpt of this table is displayed in Table 3.1. The random ﬁnancial matrix that we used is obtained by randomizing the γparameters
4 For geometric programming we used the MATLAB package LRCM MIN downloadable from www.few.eur.nl/few/people/pietersz. The Riemannian Newtonalgorithm was applied.
3.5. NUMERICAL RESULTS Table 3.1: Excerpt of Table 3 in De Jong et al. (2004). γ1 estimate standard error γ2 γ3 γ4
55
0.000 0.480 1.511 0.186 0.099 0.289 0.127
in (3.15). We assumed the γparameters distributed normally with mean and standard errors given by Table 3.1, with γ1 , γ2 , γ4 capped at zero. Hundred matrices were randomly generated, with n, d, and the computational time t varied as (n = 10, d = 2, t = 0.05s), (n = 20, d = 4, t = 0.1s) and (n = 80, d = 20, t = 2s). Subsequently the ﬁve algorithms were applied each with t seconds of computational time and the computational time constraint was the only stopping criterion. The results are presented in the form of performance proﬁles, as described in Dolan & Mor´ (2002). The e reader is referred there for the merits of using performance proﬁles. These proﬁles are an elegant way of presenting performance data across several algorithms, allowing for insight into the results. We brieﬂy describe the workings here. We have 100 test correlation matrices p = 1, . . . , 100 and 5 algorithms s = 1, . . . , 5. The outcome of algorithm s on problem p is denoted by Y(p,s) . The performance measure of algorithm s is deﬁned to be ϕ(Y(p,s) ). The performance ratio (p,s) is
(p,s)
=
ϕ(Y(p,s) ) . mins ϕ(Y(p,s) )
The cumulative distribution function φ(s) of the (‘random’) performance ratio p → ρ(p,s) is then called the performance proﬁle, φ(s) (τ ) = 1 # 100
(p,s)
≤ τ ; p = 1, . . . , 100 .
A rule of thumb is, that the higher the proﬁle of an algorithm, the better its performance. The quantity φ(s) (τ ) for τ > 1 is the empirical probability that the achieved performance measure of an algorithm s is less than τ times the performance measure of the algorithm with the smallest (i.e., best) performance measure. The proﬁles are displayed in Figures 3.2, 3.3 and 3.4. From the performance proﬁles we may deduce that majorization is the best overall performing algorithm in the numerical cases studied. The tests were also run with a strict convergence criterion on the norm of the gradient. Because the Lagrange multiplier algorithm has not been guaranteed to converge to a local minimum, we deem an algorithm not to have converged after 30 seconds of CPU time. The majorization algorithm still performs very well, but geometric programming and the
56
100 90 80 70 60 50 40 30 20 10 0 1 1.02
CHAPTER 3. RANK REDUCTION BY MAJORIZATION
major Newton param Lagrange
Lagrange Newton major param fmincon
Percentage of attained performance ratio
( )
fmincon
1.04 1.06 1.08 1.1
Performance ratio
( )
Figure 3.2: Performance proﬁle for n = 10, d = 2, t = 0.05s.
Lagrange multiplier method perform slightly better when running up to convergence. This can be expected from the sublinear rate of convergence of majorization versus the quadratic rate of convergence of the geometric programming approach. The results have not been displayed since these are not relevant in a ﬁnance setting. In ﬁnancial practice, no additional computational time will be invested to obtain convergence up to machine precision. Having found that majorization is the most eﬃcient algorithm in a ﬁnance setting for the numerical cases considered, with the tests of running to convergence we do warn the reader for using Algorithm 1 in applications outside of ﬁnance where convergence to machine precision is required. For such nonﬁnance applications, we would suggest a mixed approach: use majorization in an initial stage and ﬁnish with geometric programming. It is the low cost per iterate that makes majorization so attractive in a ﬁnance setting. To assess the quality of the solutions found in Figures 3.2–3.4, we checked whether the matrices produced by the algorithms were converging to a global minimum. Here, we have the special case (only for equal weights) that we can check for a global minimum,
3.5. NUMERICAL RESULTS
100 90 80 70
57
major Newton Lagrange
Percentage of attained performance ratio
( )
Lagrange 60 50 40 30 20 10 Newton
param
major param fmincon
fmincon
0 1 1.2 1.4 1.6 Performance ratio 1.8 2 2.2
( )
Figure 3.3: Performance proﬁle for n = 20, d = 4, t = 0.1s.
although in other minimization problems it may be diﬃcult to assess whether a minimum is global or not. For clarity, we point out that the majorization algorithm does not have guaranteed convergence to the global minimum, nor do any of the other algorithms described in Section 3.2. We only have guaranteed convergence to a point with vanishing ﬁrst derivative, and in such a point we can verify whether that point is a global minimum. If a produced solution satisﬁed a strict convergence criterion on the norm of the gradient, then it was checked whether such stationary point is a global minimum by inspecting the Lagrange multipliers, see Zhang & Wu (2003), Wu (2003) and Grubiˇi´ & Pietersz (2005, sc Lemma 6.1). The reader is referred to Lemma 1 in Section 4.5.4 for details. The percentage of matrices that were deemed global minima was between 95% and 100% for both geometric programming and majorization, respectively, for the cases n = 20, d = 4 and n = 10, d = 2. The Lagrange multiplier and parametrization methods did not produce any stationary points within 20 seconds of computational time. The percentage of global minima is high since the eigenvalues of ﬁnancial correlation matrices are rapidly decreasing. In eﬀect, there are large diﬀerences between the ﬁrst 4 or 5
58
100 90 80 70 60 50 40 30 20 10 0 1
CHAPTER 3. RANK REDUCTION BY MAJORIZATION
major Lagrange Newton
Lagrange Newton major param fmincon
Percentage of attained performance ratio
( )
fmincon
1.2 1.4 1.6 Performance ratio 1.8
param
2 2.2
( )
Figure 3.4: Performance proﬁle for n = 80, d = 20, t = 2s.
consecutive eigenvalues. For the case n = 80, d = 20 it was more diﬃcult to check the global minimum criterion since subsequent eigenvalues are smaller and closer to each other. In contrast, if we apply the methods for all cases to random correlation matrices of Davies & Higham (2000), for which the eigenvalues are all very similar, we ﬁnd that a much lower percentage of produced stationary points were global minima.
3.5.2
Nonconstant weights
We considered the example with nonconstant weights described in Rebonato (2002, Section 9.3), in which a functional form for the correlation matrix is speciﬁed, that is, ρij = LongCorr + (1 − LongCorr) exp − βti − tj  , i, j = 1, . . . , n. The parameters are set to n = 10, LongCorr = 0.6, β = 0.1, ti = i. Subsequently Rebonato presents the rank 2, 3, and 4 matrices found by the parametrization method for the case of equal weights. The majorization algorithm was also applied and its convergence criterion was set to machine precision for the norm of the gradient. Comparative results
In contrast.86×10−05 I II CPU major. respectively.1). majorization ﬁnds exact ﬁts. Rebonato proceeds by minimizing ϕ for rank 3 with two diﬀerent weights matrices. Columns I and II show that the matrices displayed in Rebonato (2002) are not yet fully converged up to machine precision. otherwise.01×10−04 0. otherwise (R) and the weights matrix W(T ) for the trigger swap has ones on the ﬁrst two rows and columns (T ) (T ) wij = 1 if i = 1. 2. The results are displayed in Table 3. Section 9.3. In Algorithm 1.137×10−04 1. since the roundoﬀ error from displaying only 6 digits is much smaller than the error in obtaining full convergence to the stationary point. etc.2.5. i.2: Comparative results of the parametrization and majorization algorithms for the example described in Rebonato (2002. Rebonato subsequently presents solution matrices found by the parametrization method.1s 2 3 4 41×10−04 15×10−04 70×10−04 0.85×10−05 ϕ Rebonato 5.26311×10−04 4.131×10−04 1.02×10−04 0. 5. Columns I and II denote PApprox − PApprox F and PApproxrounded − PApprox F .26307×10−04 4. ‘Approx’ stands for the rankreduced matrix produced by the algorithm and ‘rounded’ stands for rounding the matrix after 6 digits. the row index runs from 1 to n. wij = 0. then 2.5.3. d Gradϕ major. 0. NUMERICAL RESULTS 59 Table 3.4s 1.0s 2. 2 or j = 1. Here major major Reb major. 3.3. These solutions exhibit a highly accurate yet nonperfect ﬁt to the relevant portions of the correlation matrices.3 The order eﬀect The majorization algorithm is based on sequentially looping over the rows of the matrix Y. It would be equally reasonable to consider any permutation . which are interest rate derivatives. These weights matrices are chosen by ﬁnancial arguments speciﬁc to a ratchet cap and a trigger swap. as is the precision displayed in Rebonato (2002). There is however no distinct reason to start with row 1. 2×10−17 2×10−17 2×10−17 F ϕ major. The weights matrix W(R) for the ratchet cap is a tridiagonal matrix wij = 1 if j = i − 1. i + 1.01×10−04 for the parametrization and majorization algorithms are displayed in Table 3. (R) wij = 0.
798634 .842612 .779732 .868147 . obtained ϕ < 2 × 10−30 Row 1 tar.961928 .868128 .779705 .961935 .3).798669 .779732 .962004 .961935 .927513 . Reb.779732 . maj. maj.961867 .961944 (without .961977 .961935 . ‘maj.868128 . Reb.1)) .961935 .962098 .961961 .4s.961935 .896327 .962044 .’ and ‘Reb.961935 .798634 .779732 .819534 .961935 .842612 .819525 .961935 .798634 .961935 .842637 .819525 .762628 .2)) .’ denotes the target value.961935 .868128 . obtained ϕ < 2 × 10−30 tar.896355 .819525 .961935 .779730 .’ denote the resulting value obtained by the majorization algorithm and Rebonato (2002.868128 .842612 .961935 .798634 .927492 .896327 . Section 9.961880 .927492 . Ratchet cap First principal subdiagonal CPU time major: 2.842612 .961935 .798549 . Here ‘tar.961935 .961935 .961944 the unit entry (1.961935 . maj.961935 .962074 Trigger swap First two rows (or equivalently ﬁrst two columns) CPU time major: 2. Row 2 tar.927492 .762628 .3: Results for the ratchet cap and trigger swap.896327 .961935 .961935 .896285 .819525 .961935 . Reb.60 CHAPTER 3.961935 .962015 .762638 .842650 . RANK REDUCTION BY MAJORIZATION Table 3.961935 .8s. (without .961935 .868097 the unit entry (2. respectively.896327 .927565 . .927492 .819532 .961935 .
8 degrees and the standard deviation of the rotation was 0. The order eﬀect can have two consequences.0). . .15). . QQT = I. . that is. For the latter.4. For R2 . p(2). (Davies & Higham (2000) random correlation matrix. First. This dependency of the order is termed ‘the order eﬀect’. we show that this is unlikely to happen for ﬁnancial correlation matrices. which turned out to be a global minimum by inspection of the Lagrange multipliers.2 degrees. consider a n × d conﬁguration matrix Y and assume given any orthogonal d × d matrix Q. III. However. . The associated objective function values and the frequency at which the types occurred are displayed in Table 3. II. as follows. n} and then let the row index run as p(1). and IV. We investigated the order eﬀect for Algorithm 1 numerically. Second. .5. All produced matrices Y were diﬀerently rotated.) Essentially four types of produced solution correlation matrices could be distinguished. . The random correlation matrix generator gallery(’randcorr’. The results for the two diﬀerent correlation matrices are as follows. which we shall name I. Algorithm 1 was applied with d = 2 and a high accuracy was demanded: ε Gradϕ = εϕ = 10−16 .5. n. We also investigated the orthogonal transformation eﬀect.3. For each of the permutations. (Random interest rate correlation matrix as in (3. . . see Section 3. .. . Subsequently we generated 100 random permutations with p=randperm(n). there is nothing to guarantee or prevent that the resulting solution point produced with permutation p would diﬀer from or be equal to the solution point produced by the default loop 1.15).n). . A priori. We generated either a random matrix by (3. We inspected the Lagrange multipliers to ﬁnd that 5 The indeterminacy of the result produced by the algorithm can easily be resolved by either considering only YYT or by rotation of Y into its principal axes.n=30. . The maximum rotation was equal to 0. or a random correlation matrix in MATLAB by rand(’state’. the conﬁguration Y can diﬀer – in this case we have equal objective function values. p(n).R=gallery(’randcorr’.0). NUMERICAL RESULTS 61 p of the numbers {1. the produced solution correlation matrix can diﬀer – this generally implies a diﬀerent objective function value as well. To see this. We show empirically that the solutions produced by the algorithm can diﬀer when using a diﬀerent permutation. The order eﬀect is a bad feature of Algorithm 1 in general.randn(’state’. Then the principal axes representation is given by YQ.1. let YT Y = QΛQT be an eigenvalue decomposition.n) has been described in Davies & Higham (2000). Then the conﬁguration matrices Y and YQ are associated with the same correlation matrices5 : YQQT Y = YYT .) Only one type of produced solution correlation matrix could be distinguished. but no reﬂection occurred. an orthogonal transformation can be characterized by the rotation of the two basis vectors and then by 1 or +1 denoting whether the second basis vector is reﬂected in the origin or not. even when the produced solution correlation matrix is equal.
The maximum rotation was equal to 38 degrees and the standard deviation of the rotation was 7 degrees.5. we display the natural logarithm of the relative residual versus the computational time for the random Davies & Higham (2000) matrix R included in the major package.5. our implementation in the MATLAB function major implements lambda=max(eig(B)). see Golub & van Loan (1996). at least not for the numerical setting that we investigated. but no reﬂection occurred. we could consider ﬁnding an easytocalculate upper bound on the . the algorithm can be accelerated by calculating only the largest eigenvalue.110423 0.110630 0. This choice of implementation unnecessarily calculates all eigenvalues whereas only the largest is required. Instead. we conclude that the order eﬀect is not much of an issue for the case of interest rate correlation matrices. For type II. 3. Out of the 88 produced matrices Y that could be identiﬁed with type II. For example. Four types of produced correlation matrices could be distinguished.110730 7% 3% none of the four types was a global minimum.few. Instead of a full calculation. as follows. which can be implemented in several diﬀerent ways.5 Using an estimate for the largest eigenvalue In Algorithm 1. all were diﬀerently rotated. Such methods may be relatively expensive to apply. We numerically tested the use of the power method versus lambda=max(eig(B)). In Figure 3.4 Majorization equipped with the power method Line 6 in Algorithm 1 uses the largest eigenvalue of a matrix.nl/few/people/pietersz. the most frequently produced lowrank correlation matrix.eur.62 CHAPTER 3. From the results above. The power method is available as majorpower at www. which uses available MATLAB builtin functions. Here n = 30. As can be seen from the ﬁgure. for both the power method and lambda=max(eig(B)).110465 2% 88% 0. type ϕ frequency I II III IV 0. d = 2 and 100 random permutations were applied.5. for example with the power method. the largest eigenvalue of B is calculated by an eigenvalue decomposition or by the power method. The table displays the associated ϕ and frequency. we also investigated the orthogonal transformation eﬀect. 3.4: The order eﬀect. RANK REDUCTION BY MAJORIZATION Table 3. the power method causes a signiﬁcant gain of computational eﬃciency.
Replacing λ and its calculation by n − 1 in Algorithm 1 will result in a reduction of computational time by not having to calculate the eigenvalue decomposition. NUMERICAL RESULTS 0 1 2 LN( relative residual ) 3 4 5 6 7 8 9 10 Computational time (s) 0 5 10 15 major power major 20 63 Figure 3. Whether to use n − 1 instead of λ is thus a tradeoﬀ between computational time for the decomposition and the stepsize. The relative residual is Gradϕ(Y(i) ) F / Gradϕ(Y(0) ) F . We investigated d = 3. these results could be particular to our numerical setting. d = 40 and d = 70. without a single exception.5. However. Such upper bound is readily determined as n − 1 due to the unit length restrictions on the n − 1 vectors yi . The ‘n − 1’ version of the algorithm remains an interesting alternative and could potentially be beneﬁcial in certain experimental setups. largest eigenvalue of B. . In other words. This result suggests that a complete calculation of the largest eigenvalue is most eﬃcient.5: Convergence run for the use of the power method versus lambda=max(eig(B)). We tested replacing λ by n − 1 for 100 correlation matrices of dimension 80 × 80. the steps taken by the majorization algorithm will be smaller. We allowed both versions of the algorithm a computational time of less than 1 second. For all 400 cases. Here n = 80 and d = 3. A disadvantage is however that the resulting ﬁtted majorizing function might be much steeper causing its minimum to be much closer to the point of outset. d = 6. These matrices were randomly generated with the procedure of Davies & Higham (2000).3. the version of the algorithm with the full calculation of λ produced a matrix that had a lower value ϕ than the version with n − 1.
z(∞) y . Y(k) ). so y(k+1) = m(y(k) ). For ease of exposition we suppress the dependency on the row index i and current state Y(k) . by ﬁrst order Taylor approximation j:i=j wij ρij yj . z where B depends on Y according to (3. the Jacobian matrix equals Dm(y(∞) ) = I − y(∞) y(∞) T y(k) − y(∞) 2 . The algorithm is based on iterative majorization and this chapter is the ﬁrst to apply majorization to the area of derivatives pricing. The majorization method eﬃciently and straightforwardly allows for arbitrary weights.64 CHAPTER 3. RANK REDUCTION BY MAJORIZATION 3.9). y(k+1) − y(∞) ≈ Py(∞) = Py(∞) = Py(∞) = 1 (λI − B) y(k) − y(∞) z(∞) 1 (λI − B)y(k) + a − (λI − B)y(∞) + a z(∞) 1 z(k) − z(∞) (∞) z (3. As an addition to the previously available methods in the literature.A Appendix: Proof of Equation (3. the Lagrange multiplier technique or the parametrization method. 1 (λI − B). majorization is easier to implement than any method other than modiﬁed PCA. majorization was in our simulation setup more eﬃcient than either geometric programming. We have locally around y y(k+1) = y(∞) + Dm(y(∞) ) y(k) − y(∞) + O By straightforward calculation. 3. Furthermore. Then.16) z(k) P (∞) y(k) − y(∞) . z = (λI − B)y + a.13) (k+1) (k) Deﬁne the Algorithm 1 mapping yi = mi (yi . with z m(y) = .6 Conclusions We have developed a novel algorithm for ﬁnding a lowrank correlation matrix locally nearest to a given matrix. We showed theoretically that the algorithm converges to a stationary point from any starting point. z(∞) The matrix I − (y(∞) )(y(∞) )T is denoted by Py(∞) . up to ﬁrst order in δ (k) = y(k) − y(∞) . λ is the largest eigenvalue of B and a = (∞) .
6.11) 65 Figure 3.3.6. The resulting length Py(∞) (y(k) − y(∞) ) has been illustrated in Figure 3. up to ﬁrst order in δ (k) . If we denote this length by µ. where in the last equality we have used Py(∞) y(∞) = 0.17). where θ is the angle as denoted in the ﬁgure. We note that. It follows that µ = sin 2 arcsin δ (k) /2 = 2 sin = 2 = δ (k) arcsin δ (k) /2 1− 2 cos 2 arcsin δ (k) /2 δ (k) 2 δ (k) 2 1 − δ (k) /4 = δ (k) + O δ (k) 2 .17) 2 The result δ (k+1) = δ (k) + O((δ (k) )2 ) follows by combining (3. APPENDIX: PROOF OF EQUATION (3.16) and (3. . The projection operator Py(∞) sets any component in the direction of y(∞) to zero and leaves any orthogonal component unaltered. The term Py(∞) (y(k) − y(∞) ) can be calculated by elementary geometry. z(k) / z(∞) ≈ 1. (3. see Figure 3. then µ = sin(θ). Also sin(θ/2) = δ (k) /2 from which we obtain θ = 2 arcsin(δ (k) /2).A.6: The equality Py(∞) (y(k) − y(∞) ) = δ (k) 1 − (δ (k) )2 /4.
.
1 Introduction The problem of ﬁnding the nearest lowrank correlation matrix occurs in areas such as ﬁnance. The problem of ﬁnding the nearest lowrank correlation matrix occurs as part of the calibration of multifactor interest rate market models to correlation. . in numerical tests. . We show. along with an identiﬁcation of whether a local minimum is a global minimum. Let the desired rank d ∈ {1. . The mathematical formulation of this problem is stated in (3. An additional beneﬁt of the geometric approach is that any weighted norm can be applied. . The problem is then given by Find C ∈ Sn to minimize 1 P − C 2 2 subject to rank(C) ≤ d.Chapter 4 Rank reduction of correlation matrices by geometric programming Geometric optimisation algorithms are developed that eﬃciently ﬁnd the nearest lowrank correlation matrix.2) . that our methods compare favourably to the existing methods in the literature. n} be given. i<j (4. n. 4. = 1 2 wij (ρij − cij )2 . The connection with the Lagrange multiplier method is established. . we state the problem in terms of correlation matrices. . Here. physics and image processing. The most important instance is 1 P−C 2 2 (4. Let Sn denote the set of real symmetric n × n matrices and let P be a symmetric n×n matrix with unit diagonal. cii = 1. in terms of the conﬁguration matrix Y. For C ∈ Sn we denote by C 0 that C is positive semideﬁnite.1) 0. i = 1. C Here · denotes a seminorm on Sn .4). chemistry. . .
we solve (4.1980 −0. is based on geometric optimisation that can locally minimize the objective function in (4.1). The details of this representation are given in Section 4.0000 0. In fact.9782 0.1) with standard convex optimization methods.1) with P as above and d = 2. This novel technique we propose. As the ﬁgure suggests. the target matrix P and the solution point C∗ .1980 1.2416 .0000 and then produces a sequence of points on the constraint set that converges to the point 1.1.3827 −0. 0.3827 −0. The constraint set and the points generated by the algorithm are represented in Figure 4.6277 −0.4068 −0. consider the following example. and it is called the Hadamard seminorm. This approach allows us to use numerical methods on manifolds that are numerically stable and eﬃcient.0000 −0. The blue point in the center and the green point represent. which makes it not straightforward to solve Problem (4.0000 −0.0000 and W is the full matrix.68 CHAPTER 4. In words: Find the lowrank correlation matrix C nearest to the given n×n matrix P.2.4559 −0.1) and which incorporates the Hadamard seminorm.9782 1.0000 0.0000 −0. We note that the constraint set is nonconvex for d < n. We formulate the problem in terms of Riemannian geometry. −0. The seminorm in (4. The choice of the seminorm will reﬂect what is meant by nearness of the two matrices. that the numerical eﬃciency of geometric optimisation compares favourably to the other algorithms available in the literature. respectively. The only drawback of the practical use of geometric optimisation is that .9699 . The algorithm takes as initial input a matrix C(0) of rank 2 or less. C(0) 1.0000 ∗ that solves (4. RANK REDUCTION BY GEOMETRIC PROGRAMMING where W is a weights matrix consisting of nonnegative elements. for example.5.4068 1. With the algorithm developed in this chapter. wij = 1.8982 0. For concreteness. for the numerical tests we performed. in particular the RiemannianNewton method is applied. the algorithm has fast convergence and the constraint set is a curved space. We show.8982 = 0. our method can be applied to any suﬃciently smooth objective function. see Horn & Johnson (1990). Suppose P is 1.0000 −0.2. see the literature review in Section 3.1) can handle an arbitrary objective function. Not all other methods available in the literature that aim to solve (4.9699 1.4559 1.2416 1.6277 C = −0.2) is well known in the literature.
2. i.1: The shell represents the set of 3 × 3 correlation matrices of rank 2 or less. as high as the dimension of the matrix.2.nl/few/people/pietersz. This model is an interest rate derivatives pricing model. Miltersen et al. and that makes Problem (4. Jamshidian (1997) and Musiela & Rutkowski (1997).1). the implementation is rather involved. INTRODUCTION 69 Figure 4.eur.1) is important in ﬁnance. To overcome this drawback. The number of stochastic factors needed for the model to ﬁt to the given correlation matrix is equal to the rank of the correlation matrix.e.. Problem (4. by adaptation of Lagrange multiplier results of Zhang & Wu (2003). This rank can be as high as the number of forward LIBORs in the model. The fact that one may instantly identify whether a local minimum is a global minimum is very rare for nonconvex optimisation problems. We develop a technique to instantly check whether an obtained local minimum is a global minimum. whereas until now only the reverse direction (an expression for the matrix C given the Lagrange multipliers) was known. which is nonconvex for d < n. we have made available a MATLAB implementation ‘LRCM min’ (lowrank correlation matrices minimization) at www. (1997). all the more interesting. (1997).5. as it occurs as part of the calibration of the multifactor LIBOR market model of Brace et al. as explained in Section 1.4. The novelty consists of an expression for the Lagrange multipliers given the matrix C. and it is used in some ﬁnancial institutions for valuation and risk management of their interest rate derivatives portfolio. The details of this representation are given in Section 4.3. The number .few.1.
with n the number of LIBORs and d the number of factors. the empirical work shows that it is certainly not as high as. we discuss the convergence of the algorithms. One of the most famous problems comes from physics and is called Thomson’s problem. We parameterize the set of correlation matrices of rank at most d with a manifold named the Cholesky manifold. geodesics. In Section 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING of LIBORs in the model can grow large in practical applications. a lower factor model simply requires drawing less random numbers than a higher factor model. The importance of Problem (4. Due to its generality our method ﬁnds locally optimal points for a variety of other objective functions subject to the same constraints.3. This is one reason for using a model with a low number of factors. the constraints of the problem are formulated in terms of diﬀerential geometry. The chapter is organized as follows. In Section 4. There is much empirical evidence that the term structure of interest rates is driven by multiple factors (three. the Riemannian structure of the Cholesky manifold is introduced. Though the number of factors driving the term structure may be four or more.2. see the review article of Dai & Singleton (2003).5. say. Another reason is the enhanced eﬃciency when estimating the modelprice of an interest rate derivative through Monte Carlo simulation. the complexity of calculating LIBOR rates over a single time step in a simulation implementation is of order n × d.4. Second. Formulas are given for parallel transport. In comparison. for example.1) in ﬁnance has been recognized by many researchers. the application of the algorithms to the problem of ﬁnding the nearest lowrank correlation matrix is worked out . First.2 refers to twenty articles or books addressing the problem. which are made explicit. see Joshi (2003b).70 CHAPTER 4. Finally. we stress here that our approach considers a lower dimensional manifold. These are needed for the minimization algorithms. In Section 4. too. This is a canonical space for the optimisation of the arbitrary smooth function subject to the same constraints. Geometric optimisation techniques have previously been applied to the Thomson problem by Depczynski & St¨ckler (1998). The Thomson problem is concerned with minimizing the potential energy of n charged particles on the sphere in R3 (d = 3). in which the freedom of rotation has not been factored out. but these authors have only considered conjugate gradient o techniques on a ‘bigger’ manifold. This implies that the number of factors needed to ﬁt the model to the given correlation matrix can be high. gradient and Hessian. An implementation of geometric optimisation applied to the Thomson o problem has also been included in the ‘LRCM min’ package. or even more). In Section 4. for a literature review of interest rate models. the literature review of Section 3. a model with over 80 LIBORs is not uncommon. which allows for Newton’s algorithm (the latter not developed in Depczynski & St¨ckler (1998)). In fact. the reader is referred to Rebonato (2004a). 80. four.
1 Weighted norms We mention two reasons for assigning nonconstant or nonhomogeneous weights in the objective function of (4. . Here X 2 = tr(XΩXT Ω). It can thus be the case that we are more conﬁdent of speciﬁc entries of the matrix P. or (ii) a weighted Frobenius norm · F.2 Solution methodology with geometric optimisation Find C ∈ Sn to minimize ϕ(C) ˜ subject to rank(C) ≤ d. in Section 4. First. Such a projection. . the weighted norm of (4. In this chapter methods will be developed to solve Problem (4. i = 1. The geometric optimisation theory developed in this chapter. page 336). The seminorm in the objective function ϕ can be (i) a Hadamard seminorm with arbitrary weights per element of the matrix.2) has important applications in ﬁnance.3) for the case when ϕ is ˜ twice continuously diﬀerentiable. Rebonato (1999c) and Pietersz & Groenen (2004b). the reader is referred to Section 3. In Section 4. .2). cii = 1. can be eﬃciently applied to both cases.7.4.2). such an eﬃcient solution is not available. we assume d > 1. For the Hadamard seminorm. as deﬁned in (4. Higham (2002). The reason is that both these methods need to calculate a projection onto the space of matrices of rank d or less.2).1.1) is a special case of the following more general problem: (4.6. we numerically investigate the algorithms in terms of eﬃciency. C We note that Problem (4. see. Finally. since for d = 1 the constraint set consists of a ﬁnite number (2n−1 ) of points. can be eﬃciently found by an eigenvalue decomposition. to our knowledge. we conclude the chapter. The Lagrange multipliers and alternating projections methods however can only be eﬃciently extended to the case of the weighted Frobenius norm. SOLUTION METHODOLOGY WITH GEOMETRIC OPTIMISATION 71 in detail.Ω with Ω a positive deﬁnite matrix. in our setting P has the interpretation of measured correlation. Second. For a literature review on rank reduction methods. .3) 0.Ω weighted Frobenius norm is. and most of the algorithms mentioned in Section 3. In the remainder of the chapter.2. .2. for example. The F. 4.2. n. from a practical point of view. for the weighted Frobenius norm. 4. by far less transparent than the Hadamard or weightsperentry seminorm (4. and as also mentioned in Higham (2002.
1 Basic idea The idea for solving Problem (4.d = C ∈ Sn .2. Such geometric optimisation has been developed by Smith (1993).2.e. in Section 4. The set Cn. We may apply geometric optimisation on the Cholesky manifold. a topological isomorphism) between the latter topological space and the quotient space of n products of the d−1 sphere over the group of orthogonal transformations of Rd . In the following. and we make an identiﬁcation with a certain quotient space.2 Topological structure In this section.d . of the same dimension as the rankd manifold. that maps surjectively to the constraint set. Intuitively the correspondence is as follows: We can associate with an n×n correlation matrix of rank d a conﬁguration of n points of unit length in Rd such that the inner product of points i and j is entry (i. Elements of Od are denoted by a d×d orthogonal matrix Q. Diag(C) = I.d is a subset of Sn .d with the subspace topology.72 CHAPTER 4. the set of n×n correlation matrices of rank d or less is equipped with the subspace topology from Sn . Here I denotes the identity matrix and Diag denotes the map Rn×n → Rn×n . however a dense subset is shown to be a manifold. We subsequently establish a homeomorphism (i. Denote by Od the group of orthogonal transformations of dspace. the constraint set is equipped with a topology.d are denoted as a matrix Y ∈ Rn×d . rank(C) ≤ d. In Section 4. it will be shown that the constraint set as such is not a manifold.2. Any orthogonal rotation of the conﬁguration does not alter the associated correlation matrix. j) of the correlation matrix. Diag(C)ij = δij cij . we will deﬁne a larger manifold (named Cholesky manifold ).3. Elements of Tn. where δij denotes the Kronecker delta.2. · .3) is to parameterize the constraint set by a manifold.2. This idea is developed more rigorously below. the product of n unit spheres S d−1 is denoted by Tn. such as Newton’s algorithm and conjugate gradient algorithms.. which in turn deﬁnes a topology. We equip Cn. The connection between minima on the Cholesky manifold and on the constraint set will be established. In Section 4. namely the set of matrices of rank exactly d. 4. C 0 . Deﬁnition 1 The set of symmetric n×n correlation matrices of rank at most d is deﬁned by Cn. and subsequently utilize the recently developed algorithms for optimisation over manifolds.4. RANK REDUCTION BY GEOMETRIC PROGRAMMING 4.2. Subsequently. The latter space is equipped with the Frobenius norm F . with each row vector yi of unit length.
E Cn. Such Y can always be found as will be shown in Theorem 2 below.1. (Y.d and Mn.A.4) An equivalence class {YQ : Q ∈ Od } associated with Y ∈ Tn. 1 It is trivially veriﬁed that the map thus deﬁned is indeed an Od smooth action: YI = Y and Y(Q1 Q2 )−1 = (YQ−1 )Q1 −1 .d .d /Od is denoted by Mn. Ψ [Y] We consider a map Φ in the inverse direction of Ψ.d → Cn.d Φ S Φ Ψ = YYT . The quotient space Tn. 73 (4. We have the following: (i) The maps Ψ and Φ are well deﬁned.e. Standard matrix multiplication is smooth. 2 2 Although rather obvious.4. Finally. (iii) The map Ψ is a homeomorphism with inverse Φ. Ψ ◦ Π = S and Φ ◦ S = Π. SOLUTION METHODOLOGY WITH GEOMETRIC OPTIMISATION Deﬁnition 2 We deﬁne the following right Od action1 on Tn. deﬁne the map S : Tn.d . Q) → YQ.d is denoted by π. it will be shown in Theorem 2 that this map is well deﬁned. The canonical projection Tn. (ii) The diagram is commutative.d × Od → Tn.. Theorem 2 Consider the following diagram Tn.d is denoted by [Y] and it is called the orbit of Y.d . .d −→ Mn. deﬁned as follows: For C ∈ Cn.d .d such that YYT = C. It will be shown in Theorem 2 that this map is well deﬁned.d . i. Then set Φ(C) = [Y]. the proof has been deferred to Appendix 4.d . Deﬁne the map2 Ψ as Mn.d → Tn.2.d take Y ∈ Tn.d /Od = Mn.d Π Ψ © c (4.d : Tn. The following theorem relates the spaces Cn.5) Mn.d = Tn.d −→ Cn. S(Y) = YYT .d /Od with the objects and maps as in Deﬁnitions 1 and 2. Cn.
A. In our case we ∗ can explicitly give such a map Σ. However there is a subspace of Mn.d = Y ∈ Tn.d . jd ). It is obvious that UY and Π−1 (UY ) are open in the corresponding topologies. Since C is ˜ see for example Golub & positive deﬁnite. Duistermaat & Kolk (2000). RANK REDUCTION BY GEOMETRIC PROGRAMMING 4. The collection of all such I is denoted by IY . the reader is referred to do Carmo (1992). It is readily veriﬁed that IY is not empty. Let denote the lexicographical ordering. denoted by J(Y) = (j1 .d be the subspace deﬁned by ∗ Tn. ∗ ∗ ∗ A section on Mn. ) is a wellordered set. The proof of the following proposition has been deferred to Appendix 4. . ∗ ∗ ∗ 2. since for ∗ ∗ ∗ i ∈ JY . Then ΣY : UY → Π−1 (UY ). let p be the largest integer such that i > jp . but a socalled stratiﬁed space. such that Π ◦ Σ = ∗ idMn. .d is a map Σ : U → Tn.d the quotient space Tn.2. then (IY .2. Then Mn.d that will ultimately lead to the ﬁnal manifold over which will be optimized. Denote by Mn. It turns out that Mn. .d equipped with a diﬀerentiable structure For an exposition on diﬀerentiable manifolds. Deﬁne Y∗ = YQ. In the following. Deﬁne C = YYT . which is the manifold of equivalence classes of matrices of exactly rank d.d that is a manifold. we will ∗ study charts given by sections of the manifold Mn. We note that I is well deﬁned since any two Y(1) . thus yi = yji .2. a subset Mn.d . . with U open in Mn. . Deﬁne ˜ ˜ ˜˜ ˜ ˜ Y ∈ Rd×d by taking the rows of Y from JY .d is a manifold of dimension n(d − 1) − d(d − 1)/2.g. . yjp . Thus we can choose the smallest element.. .d of Mn. ∗ Proposition 2 Let Tn. ∗ Then deﬁne UY = {[Z] : J(Y) ∈ IZ } ⊂ Mn.d .d . We note that Y∗ is lowertriangular.d . ¯ van Loan (1996. see. / as JY is the smallest element from IY . .d ⊂ Tn. . and let I denote a subset of {1. Y(2) ∈ [Y] are coupled by an orthogonal transformation. e.d is a manifold. By Theorem 2. Tn. Then we have the following: ∗ 1. which implies a lowertriangular form for Y∗ . Such a map singles out a unique matrix in each equivalence class. . . such that dim(span({yi : i ∈ I})) = d. ∗ As shown in Proposition 2.d is not a manifold.d : rank(Y) = d . see the proof of Theorem 2. to obtain a unique lowertriangular matrix Y such that ¯¯ ˜ ¯ YYT = C and yii > 0. (4.d is a submanifold of Tn. Cholesky decomposition can be applied to C.5). n} with exactly d elements. then Yi is dependent on y1 .6) . and orthogonal transformations preserve independence. [Z] → Z∗ .3 A dense part of Mn.d .74 CHAPTER 4. for Y ∈ [Y]. there exists a unique orthogonal matrix Q ∈ Od ¯ ˜ such that Y = YQ.d /Od . . Let [Y] in Mn. Theorem 4.
d. we have that YYT attains a global minimum of ϕ on Cn.5. The image ΣY (UY ) is a smooth submanifold of Tn. . the proof of which has been relegated to Appendix 4. SOLUTION METHODOLOGY WITH GEOMETRIC OPTIMISATION 75 is a section of UY at Y. . . The map SΣY (UY ) is a homeomorphism.d → R.d is surjective.d . too. then there exists a Y ∈ Choln. . . Deﬁne the following submanifold of Tn.d . n . .2.d can be considered on Choln. . . we will write ϕ(Y) := ϕ(YYT ) viewed as a function on Choln.d . We can consider the map S : Tn. yi ∈ S i−1 . . . i = 2.A. . .d .d . . (2002). . we will show that. ∗ Proposition 3 The diﬀerentiable structure on Mn. it is suﬃcient to perform the optimisation on a compact manifold that contains one of the sections. . S d−1 is similarly embedded in Rd .4.d restricted to ΣY (UY ).d of dimension n(d − 1) − d(d − 1)/2. Choln.4 The Cholesky manifold In this section. Y → YYT → ϕ(YYT ). A function ϕ on Cn. From here on. Theorem 3 If C ∈ Cn. For simplicity we choose the section ΣY where J(Y) = {1. for the purpose of optimisation. The proof has been deferred to Appendix 4. i = d + 1. . but these authors do not consider nonEuclidean geometric optimisation.A.2.d is surjective. . . The Cholesky parametrization has been considered before by Rapisarda et al.d .d . in virtue of Theorem 2. . S ϕ ˜ . ˜ For a global minimum ϕ(Y) on Choln. The map SCholn. which we call the Cholesky manifold. d}. . .d → Cn. For a local minimum.d . i−1 with S+ embedded in Rd by the ﬁrst i coordinates such that coordinate i is bigger than 0 and with the remaining coordinates set to zero. yi ∈ S+ . yi ∈ S d−1 .d with the following representation in Rn×d T y1 . . in virtue of the following theorem. which is diﬀerentiable since ΣY (UY ) is a submanifold of Tn. .d → Cn. i = d + 1.d → Cn. . The proof has been deferred to Appendix 4. via the composition ˜ ˜ Choln. 0). For the purpose of optimisation. since the map S : Choln. The following proposition shows that the sections are the charts ∗ of the manifold Mn. 4. . 0.d = Y ∈ Rn×d : y1 = (1. we need a compact manifold which is surjective with Cn. we ˜ have the following theorem. n .d such that YYT = C. 0. Also.A. yi ∈ S d−1 . d. . . T yn T i−1 : y1 = (1.d is the one which makes ΣY : UY → Σ(UY ) into a diﬀeomorphism.3. . . 0).4. . i = 2.
.d . in turn.d .5 Choice of representation ˜ ˜ In principle. For the numerical computations however this choice is essential because the simplicity of formulas for the geodesics and parallel transport depends on the chosen coordinates. the geometric optimisation tools are developed for the Cholesky manifold. since they require a notion of diﬀerentiability. From a theoretical point of view. etc..d . 4. we might as well optimize ϕ over the manifold Choln.d ) = dim(Mn. In the next section.d .e. Our exposition follows do Carmo (1992). A tangent vector at a point Y is an element of TY Choln. the ∗ Cholesky manifold has the minimal dimension.d . parallel transport. these objects are introduced and made explicit for Choln. We insist however on explicit knowledge of the geodesics and parallel transport. such as geodesics.d show that. we can use such numerical methods.d . We found that simple expressions are obtained when Choln.1 Riemannian structure We start with a review of basic concepts of Riemannian geometry.2. is viewed as a subset of the ambient space Rn×d .d is denoted by TY Choln. for this is essential to obtaining an eﬃcient algorithm. We found that if we choose the Cholesky manifold then convenient expressions for geodesics. ˜ These considerations on global and local minima on Choln.d ). etc. there is no straightforward way to use numerical methods such as Newton and ˜ conjugate gradient. it is suﬃcient to calculate these on a single sphere.3 Optimisation over the Cholesky manifold For the development of minimization algorithms on a manifold. Moreover. which.d is viewed as a submanifold of Tn. but for optimisation of ϕ on Choln.d and is denoted by ∆. 4.d at a point Y ∈ Choln. we could elect another manifold M and a surjective open map M → Cn. dim(Choln. to optimize ϕ ˜ over Cn. For the optimisation of ϕ over Cn. This representation reveals that. to calculate geodesics and parallel transport on Choln.d .d if and only if YYT attains a local minimum of ϕ on Cn. A Riemannian . The tangent space of the manifold Choln. are obtained. Let M be an mdimensional diﬀerentiable manifold.d . it does not matter which coordinates we choose to derive the geometrical properties of a manifold. certain objects of the manifold need to calculated explicitly. RANK REDUCTION BY GEOMETRIC PROGRAMMING Theorem 4 The point Y attains a local minimum of ϕ on Choln.d .76 CHAPTER 4.3. In this section. i.d . 4.
. the tangent space at point Y. then ϕY (∆) can be expressed in this coordinate chart by m ∂ (4. OPTIMISATION OVER THE CHOLESKY MANIFOLD 77 structure on M is a smooth map Y → ·.11) . (4. · Y . In particular. ε). denoted by Gradϕ. A vector ﬁeld is a map on M that selects a tangent ∆ ∈ TY M at each point Y ∈ M . In particular. . χ smooth functions on M and Y1 . (4.j Xk . (4. for Newton and conjugate gradient methods. . Y3 vector ﬁelds on M .7) ϕY (∆) = (ϕ ◦ x−1 )(Υ) i ∂xi t=0 i=1 The linear space of linear functionals on TY M (the dual space) is denoted by (TY M )∗ .8) This vector ﬁeld is called the gradient of ϕ. namely the connection. . Then ϕY is a linear functional on TY M . A given family Y2 (t) of tangent vectors at the points Υ(t) is said to be parallel transported along Υ if Y1 Y2 = 0 on Υ(t). (4. . xm ) is equal to ∆. The Riemannian structure induces an isomorphism between TY M and (TY M )∗ . let ˙ Υ(t). Denote the diﬀerential of ϕ at a point Y by ϕY . respectively.3. such that ϕY (∆) = Gradϕ. Y2 are vector ﬁelds that coincide with Y1 (t) and Y2 (t). if (U. ˙ Let Υ(t) be a smooth curve on M with tangent vector Y1 (t) = Υ(t). Y2 . If the tangent vector Y1 (t) itself is parallel transported along Υ(t) then the curve Υ(t) is called a geodesic. satisfying the following two conditions: ϕY1 +χY2 Y3 =ϕ Y1 Y3 +χ Y2 Y3 . . Y2 on M a vector ﬁeld Y1 Y2 on M . . Xm } the corresponding vector ﬁelds then the aﬃne connection on U can be expressed by m Xi Xj = k=1 k γi. x1 . Let ϕ be a smooth function on a Riemannian manifold M . X Y for all X ∈ TY M. . x1 . To do this on a general manifold. be a smooth curve on M such that Υ(0) = Y and Γ(0) expressed in a coordinate chart (U. which guarantees the existence of a unique vector ﬁeld on M . Also. · Y on TY M . we need to be able to diﬀerentiate vector ﬁelds. A Riemannian manifold is a diﬀerentiable manifold with a Riemannian structure. we need to equip the manifold with additional structure. .10) where Y1 .4. on Υ(t). Y1 (ϕY2 ) =ϕ Y1 (Y2 ) + (Y1 ϕ)Y2 . we have to use second order derivatives. . which for every Y ∈ M assigns an inner product ·. . . A connection on a manifold M is a rule · · which assigns to each two vector ﬁelds Y1 . t ∈ (−ε. In particular. xm ) is a coordinate chart on M and {X1 .9) for ϕ.
This means that Christoﬀel symbols can be expressed as functions of a metric on M . It is straightforward to verify that if N = DY. .12) where xk are the coordinates of Υ(t).j are smooth functions. It follows that the normal space is n dimensional. once we have determined the equation for the geodesic.e..e.d .j xi xj = 0. then N is in the normal space.d (∆) = ∆ − Diag(∆YT )Y. Diag(∆YT ) = 0. ∆2 is deﬁned as ∆1 . D ∈ Rn×n diagonal .d at a point Y can be obtained by diﬀerentiating Diag(YYT ) = I yielding Diag(Y∆T +∆YT ) = 0. (4. we see that the normal space NY Tn.j=1 k γi. called the Christoﬀel symbols for the connection. the geodesic equation becomes m xk + ¨ i.d at Y ∈ Tn. it consists of the matrices N. the inner product ·.78 CHAPTER 4. ˙ ˙ (4. i. we can simply read oﬀ Christoﬀel symbols.d are given by ΠNY Tn.3.d = DY .. Both tangent spaces are identiﬁed with suitable subspaces of the ambient space Rn×d . · Y is independent of the point Y.d onto the normal and tangent spaces of Tn. With respect to an induced metric the geodesic is the curve of shortest length between two points on a manifold. for which tr∆NT = 0 for all ∆ in the tangent space. therefore we suppress the dependency on Y. 4. We use the LeviCivita connection. We note that (4. In components. and subsequently the inner product for two tangents ∆1 .2 Normal and tangent spaces An equation determining tangents to Tn.d is given by NY Tn. RANK REDUCTION BY GEOMETRIC PROGRAMMING k The functions γi. called the LeviCivita connection. On a Riemannian manifold there is a unique torsion free connection compatible with the metric. The dimension of the tangent space is n(d − 1). i. associated to the metric deﬁned as follows on the tangent spaces. We start by deﬁning Riemannian structures for Tn. We note that.12) implies as well that.d (∆) = Diag(∆YT )Y and ΠTY Tn.d and for the Cholesky manifold Choln. ∆2 = tr∆1 ∆T . For a manifold embedded in Euclidean space an equivalent characterization of a geodesic is that the acceleration vector at each point along a geodesic is normal to the manifold so long as the curve is traced with uniform speed.13) 2 which is the Frobenius inner product for n × d matrices. The normal space at the point Y is deﬁned to be the orthogonal complement of the tangent space at the point Y. in our special case. Since the dimension of the space of such matrices is n. The projections ΠNY Tn.d and ΠTY Tn. where D is n×n diagonal.
per component on the sphere.d onto the tangent space of Choln. OPTIMISATION OVER THE CHOLESKY MANIFOLD respectively.d with respect to the LeviCivita connection obey the second order diﬀerential equation ¨ ˙ ˙ Y + ΓY (Y.15) (4. Diag(YYT ) = I. substitute (4. ¨ ˙ ˙ ¨ Diag(YYT + 2YYT + YYT ) = 0.d is given by ΠTY Choln.d . Y) = 0. which yields (4.d in the direction ∆ ∈ TY(0) Tn. we obtain.17) (4. . with ΓY (∆1 . To obtain an expression for D. k The function ΓY is the matrix notation of the Christoﬀel symbols. . (4. γij .4.j=1 k γij (X1 )i (X2 )j . ¨ Y(t) = D(t)Y(t) (4. .3. the standard basis vectors of R .14). .14) for some diagonal matrix D(t). Ei Ej = nd γij Ek k=1 k with γij deﬁned by nd ΓY (X1 .18) for i = 1.d (∆) .19) Since Choln. we obtain an expression for the evolution of the tangent along the geodesic: ˙ Yi (t) = − ∆i sin ∆i t Yi (0) + cos ∆i t ∆i .d (∆) = Z ΠTY Tn.e. Y(t) must be in the normal space at Y(t). geodesics on Tn. . we begin with the condition that Y(t) remains on Tn.. 2 To see this. The geodesic at Y(0) ∈ Tn. with respect n×d k to E1 . ∆2 ) := Diag(∆1 ∆T )Y. n. i. ¨ In order for Y(·) to be a geodesic.16). Diﬀerentiating this equation twice. . In this coordinate system.d it has the same geodesics. End .17) into (4. Yi (t) = cos( ∆i t Yi (0) + 1 sin ∆i ∆i t ∆i .16) (4. By diﬀerentiating. Ek = i.3. The projection ΠTY Choln. . with Z(∆) deﬁned by zij (∆) = 0 δij for j > i or i = j = 1. (4. .3 Geodesics It is convenient to work with the coordinates of the ambient space Rn×d . 79 4.d is a Riemannian submanifold of Tn. More precisely.d is given by. X2 ). otherwise. .
22) .3. ΓY (∆1 . It follows that.d such that Hessϕ(∆. for all tangents X to Choln.d is parallel transported along a geodesic starting from Y(1) in the direction of ∆(1) ∈ TY(1) Tn. s=0 Newton’s method requires inverting the Hessian at minus the gradient. ∆i (t) = ∆i (0).8). X) = −Gradϕ. where ϕYY (∆1 .19) and Ri remains unchanged.d is a submanifold of Rn×d we can use coordinates of Rn×d to express the ∂F diﬀerential ϕY of ϕ at the point Y. (4.21) d dt Y = ∆1 .4 Parallel transport along a geodesic We consider this problem per component on the sphere. let ∆1 .d . ∆i (0) ∆i (t) + Ri . ∆2 ) = In local coordinates of Rn×d Hessϕ(∆1 . If ∆(2) ∈ TY(1) Tn. ∆2 (4.6 Hessian The Hessian Hessϕ of a function ϕ is a second covariant derivative of ϕ. t=0 d ds Y = ∆2 . ∆2 ) − ϕY . therefore we need to ﬁnd the tangent ∆ to Choln.5 The gradient Since Choln.d . The gradient Gradϕ of a function ϕ on Choln.d it has the same equations for parallel transport. then Hessϕ(∆1 . When it is clear in between which two points is transported. Gradϕ = ΠTY Choln. RANK REDUCTION BY GEOMETRIC PROGRAMMING 4.d → TY(2) Tn.20) 4.3. ∆2 ) = d d dt ds ϕ(Y(t. ∆2 ) = ϕYY (∆1 .d . Y(2) ) : TY(1) Tn.d can be determined by (4. (1) (2) (1) (2) (1) (1) 4. Then ∆i (t) changes according to (4.80 CHAPTER 4.3. s)).d is a Riemannian submanifold of Tn. (4. ∆2 be two vector ﬁelds. Ri ⊥∆i (0). More precisely.d (ϕY ) = Z ϕY − Diag(ϕY YT )Y . then decompose ∆(2) in terms of ∆(1) . Parallel transport from Y(1) to Y(2) deﬁnes an isometry T(Y(1) . ∆2 ) . X . Since Choln. namely (ϕY )ij = ∂Yij . then parallel transport is denoted simply by T. with t=s=0 ∆1 Gradϕ.
we discuss global convergence for the RiemannianNewton algorithm.23) 4. These algorithms are instances of the geometric programs presented in Smith (1993).4. a (k) line minimization of the objective value is performed in both directions. Subsequently.7 Algorithms We are now in a position to state the conjugate gradient algorithm.22). t=0 (4.3.d . 4. we discuss convergence properties of the geometric programs: global convergence and the local rate of convergence. X . X) = H.4. for all tangents X to Choln. So Algorithm 3 is adjusted in the following way. The standard way to resolve these issues. given as Algorithm 2. ∆(k) and ∆Steep . we have not seen any global convergence results for conjugate gradient . Such a steepest descent method with line minimization is well known to have guaranteed convergence to a local minimum. and the Newton algorithm. it is convenient to calculate the unique tangent vector H = H(∆) satisfying Hessϕ(∆. because the Hessian mapping could be singular. for optimisation over the Cholesky manifold. From (4.d (ϕYY (∆)) − Diag(ϕY YT )∆. 4.4.14) and (4. We then take as the next point of the algorithm whichever search direction ﬁnds the point with lowest objective value. DISCUSSION OF CONVERGENCE PROPERTIES 81 To solve (4.4 Discussion of convergence properties In this section.21). given as Algorithm 3. as displayed in Algorithm 3. Moreover. where the notation ϕYY (∆) means the tangent vector satisfying ϕYY (∆) = d dt ˙ ϕY (Y(t)). we obtain H(∆) = ΠTY Choln.22) becomes H(∆) = −Gradϕ. we discuss global convergence for conjugate gradient algorithms. the steps in Algorithm 3 may even not be well deﬁned. is to introduce jointly a steepest descent algorithm. since then the Newton Equation (4. When the new search direction ∆(k) has been calculated. then we also (k) consider the steepest descent search direction ∆Steep = −Gradϕ(Y(k) ).1 Global convergence First. For the Riemannian case. Second. Y(0) = ∆. for the particular case of the Cholesky manifold. It is well known that the Newton algorithm. is not globally convergent to a local minimum.
d and Z ϕYY (∆(k) ) − Diag ϕYY (∆(k) )(Y(k) )T Y(k) −Diag(ϕY (Y(k) )T )∆(k) = −G(k) . .82 CHAPTER 4.G(k) . do Compute G(k) = Gradϕ(Y(k) ) = Z ϕY − Diag(ϕY YT )Y . Require: Y(0) such that Y(0) (Y(0) )T = I.. FletcherReeves. PolakRibi`re.d . 2 end for Algorithm 3 Newton’s method for minimizing ϕ(Y) on Choln. k G(k) 2 Reset J(k+1) = −G(k+1) if k + 1 ≡ 0 mod n(d − 1) − 1 d(d − 1). 2. ϕ(·). 1. Compute the new search direction (k+1) −TG(k) . e (k+1) (k+1) (k) J = −G +γk TJ where γ = G(k+1) 2 .e. Compute G(k+1) = Gradϕ(Y(k+1) ) = Z ϕY − Diag(ϕY (Y(k+1) )T )Y(k+1) ... Move from Y(k) in direction ∆(k) to Y(k) (1) along the geodesic.G(k+1) γk = G G(k) . Set Y(k+1) = Y(k) (1). Compute G(0) = Gradϕ(Y(0) ) = Z(ϕY − Diag(ϕY (Y(0) )T )Y(0) ) and set J(0) = −G(0) . Input: Y(0) . 1. end for . ∆(k) ∈ TY Choln.. RANK REDUCTION BY GEOMETRIC PROGRAMMING Algorithm 2 Conjugate gradient for minimizing ϕ(Y) on Choln. for k = 0. 2. Compute ∆(k) = −H−1 G(k) . Parallel transport tangent vectors J(k) and G(k) to the point Y(k+1) . Require: Y(0) such that Diag(Y(0) (Y(0) )T ) = I. i. ϕ(·). . do Minimize ϕ Y(k) (t) over t where Y(k) (t) is a geodesic on Choln. for k = 0.d Input: Y(0) .d starting from Y(k) in the direction of J(k) . Set tk = tmin and Y(k+1) = Y(k) (tk ).
convergence runs are displayed in Figure 4. e 4. generated by Algorithm 2. Priouret & Malajovich (2003).4.d )steps quadratic j=0 ˆ convergence to Y. to minus the gradient. to be discussed in Section 4.3). If Y is a nondegenerate stationary point. the sequence of points produced ˆ by Algorithm 3 converges quadratically to Y. Figure 4. the following result is established for the Riemannianˆ Newton method.2. Steepest descent. see Smith (1993.4. the Levenberg (1944) & Marquardt (1963) method. such that starting from any Y(0) in U. 4. FletcherReeves conjugate gradient. converging to Y. Theorem 2. for which the search direction J(k+1) in Algorithm 2 is equal to −G(k+1) . The code that is used for this test is the package ‘LRCM min’. which is a Newtontype method.3 of Smith (1993). then there exists an open set ˆ U containing Y. PolakRibi`re conjugate gradient. 2. Lev. for suﬃciently large j. Newton. .4. FRCG. for reducing a 10 × 10 correlation matrix to rank 3. Arias & Smith (1999) and Dedieu. 5. In Theorem 4.. the sequence {Y(j) }∞ has dim(Choln. DISCUSSION OF CONVERGENCE PROPERTIES 83 algorithms in the literature.e. This package also contains the correlation matrix used for the convergence run test. Zoutendijk (1970) and AlBaali (1985) establish global convergence of the Fletcher & Reeves (1964) conjugate gradient method with line minimization.2 clearly illustrates the convergence properties of the various geometric programs.6. the following result is stated for the Riemannian Fletcher & Reeves (1964) and Polak & Ribi`re (1969) conjugate gradient methods. PRCG. The following algorithms are compared: 1.3 of Smith (1993). In Theorem 3. Edelman. e 3. j=0 Then. Supe ˆ ˆ pose Y is a nondegenerate stationary point such that the Hessian at Y is positive deﬁnite.6 below.. The eﬃciency of the algorithms is studied in Section 4. As a numerical illustration. Therefore we focus on the results obtained for the ﬂatEuclidean case. (j) ∞ ˆ Suppose {Y } is a sequence of points. i. The steepest descent method has a linear local rate of convergence.2 Local rate of convergence Local rates of convergence for geometric optimisation algorithms are established in Smith (1993).Mar. Gilbert & Nocedal (1992) establish alternative line search minimizations that guarantee global convergence of the Polak & Ribi`re (1969) conjugate gradient method.
2: Convergence runs: log relative residual ln( Gradϕ(Y(i) ) / Gradϕ(Y(0) ) ) versus the iterate i. RANK REDUCTION BY GEOMETRIC PROGRAMMING 0 −5 −10 Log relative residual −15 −20 −25 −30 Steepest descent PRCG FRCG Newton Lev. .− Mar 0 20 40 60 80 100 Iterate (i) 120 140 160 180 200 −35 Figure 4.84 CHAPTER 4.
5. Formally.4. n = 3 1 x y x 1 z .2). Second. some particular choices for n and d are examined. A SPECIAL CASE: DISTANCE MINIMIZATION 85 4. To get an intuitive understanding of the complexity of the problem. The objective function (4. The outline of this section is as follows. W◦1/2 ◦ Ψ . 4. Third. 2 . the primary concern of this chapter to minimize the objective function of (4. Fourth.5.5.5. 4. y z 1 A 3 × 3 symmetric matrix with ones on the diagonals is denoted by Its determinant is given by det = − x2 + y 2 + z 2 + 2xyz + 1. First.2 The case of d = 2. for two matrices A and B of equal dimensions. we discuss the PCA with rescaling method for obtaining an initial feasible point. By straightforward calculations it can be shown that det = 0 implies that all eigenvalues are nonnegative.5 A special case: Distance minimization In this section. this will lead to an identiﬁcation method of whether a local minimum is a global minimum.1 The case of d = n The case that P is a symmetric matrix and the closest positive semideﬁnite matrix C is to be found allows a successive projection solution. the Hadamard product A◦B is deﬁned by (A◦B)ij = aij bij . the connection with Lagrange multipliers is stated. the feasible region has been displayed in Figure 4. The Hadamard product denotes entrybyentry multiplication.2) can then be written as ϕ(Y) = 1 2 T wij (ρij − yi yj )2 = i<j 1 W◦1/2 ◦ Ψ 2 2 ϕ = 1 W◦1/2 ◦ Ψ. the diﬀerential and Hessian of ϕ are calculated.3 Formula for the diﬀerential of ϕ We consider the speciﬁc case of the weighted Hadamard seminorm of (4.2) is studied. which was shown by Higham (2002). The set of 3 by 3 correlation matrices of rank 2 may thus be represented by the set {det = 0}. This seminorm can be represented by a Frobenius norm by introducing the Hadamard product ◦. 4. in particular.1.
d .86 CHAPTER 4.24). W ◦ Ψ ∆. We note that the Lagrange theory is based on an eﬃcient expression of the lowrank projection by an eigenvalue decomposition.4 Connection normal with Lagrange multipliers The following lemma provides the basis for the connection of the normal vector at Y versus the Lagrange multipliers of the algorithm of Zhang & Wu (2003) and Wu (2003). Lemma 1 Let Y ∈ Tn. Deﬁne 1 diag ϕY YT 2 and deﬁne P(λ) := P + Diag(λ).1. ϕY . with ϕY in (4. The result below establishes the reverse direction. dt t=0 with Y(·) any curve starting from Y in the direction of ∆. This Lagrange result will allow us to identify whether a local minimum is also a global minimum. W◦1/2 ◦ Ψ = Ψ. . W ◦ Ψ ∆YT + Y∆T . Here. (4.A. 2(W ◦ Ψ)Y = ∆.5. That we are able to eﬃciently determine whether a local minimum is a global minimum. Then there exist a joint eigenvalue decomposition λ := P(λ) = QDQT . RANK REDUCTION BY GEOMETRIC PROGRAMMING √ wij . ϕYY (∆) = 4. Therefore the theory below can be extended eﬃciently only for the Hadamard norm with equal weights and for the weighted Frobenius norm. Gradϕ is the gradient of ϕ on Tn. is a very rare phenomenon in nonconvex optimisation. see also the discussion in Section 4.24) Similarly. ϕY = 2(W ◦ Ψ)Y. we may compute the second derivative d ϕY (Y(t)) = 2 (W ◦ Ψ)∆ + W ◦ (∆YT + Y∆T ) Y . and makes the rank reduction problem (nonconvex for d < n) all the more interesting.d be such that Gradϕ(Y) = 0. Then with Ψ := YYT − P and with (W◦1/2 )ij = d ϕ(Y(t)) = dt = = = Thus from (4.d (ϕY ) = ϕY − Diag(ϕY YT )Y.7) we have ˙ ˙ W◦1/2 ◦ Ψ. ∀∆. The proof of the following lemma has been deferred to Appendix 4. W ◦ Ψ ∆YT . Gradϕ(Y) = ΠTY Tn. YYT = QD∗ QT where D∗ can be obtained by selecting at most d nonnegative entries from D (here if an entry is selected it retains the corresponding position in the matrix). W ◦ Ψ + Y∆T .1.6. The result is novel since previously only an expression was known for the matrix Y given the Lagrange multipliers.
4.6. NUMERICAL RESULTS
87
The characterization of the global minimum for Problem (4.1) was ﬁrst achieved in Zhang & Wu (2003) and Wu (2003), which we repeat here: Denote by {C}d a matrix obtained by eigenvalue decomposition of C together with leaving in only the d largest eigenvalues (in absolute value). Denote for λ ∈ Rn : P(λ) = P + Diag(λ). The proof of the following theorem has been repeated for clarity in Appendix 4.A.7. Theorem 5 (Characterization of the global minimum of Problem (4.1), see Zhang & Wu (2003) and Wu (2003)) Let P be a symmetric matrix. Let λ∗ be such that there exists {P + Diag(λ∗ )}d ∈ Cn,d with Diag {P + Diag(λ∗ )}d = Diag(P). (4.25)
Then {P + Diag(λ∗ )}d is a global minimizer of Problem (4.1). This brings us in a position to identify whether a local minimum is a global minimum: Theorem 6 Let Y ∈ Tn,d be such that Gradϕ(Y) = 0 on Tn,d . Let λ and P(λ) be deﬁned as in Lemma 1. If YYT has the d largest eigenvalues from P(λ) (in absolute value) then YYT is a global minimizer to the Problem (4.1). PROOF: Apply Lemma 1 and Theorem 5. 2
4.5.5
Initial feasible point
To obtain an initial feasible point Y ∈ Tn,d we use the modiﬁed PCA method described in Section 3.2.1. To obtain an initial feasible point in Choln,d , we perform a Cholesky decomposition as in the proof of Theorem 3. We note that the condition in Section 3.2.1 of decreasing norm of the eigenvalues is thus key to ensure that the initial point is close to the global minimum, see the result of Theorem 6.
4.6
Numerical results
There are many diﬀerent algorithms available in the literature, as detailed in Section 3.2. Some of these have an eﬃcient implementation, i.e., the cost of a single iteration is low. Some algorithms have fast convergence, for example, the Newton method has quadratic convergence. Algorithms with fast convergence usually require less iterations to attain a predeﬁned convergence criterion. Thus, the realworld performance of an algorithm is a tradeoﬀ between costperiterate and number of iterations required. A priori, it is not clear which algorithm will perform best. Therefore, in this section, the numerical performance of geometric optimisation is compared to other methods available in the literature.
88
CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
4.6.1
Acknowledgement
Our implementation of geometric optimisation over lowrank correlation matrices ‘LRCM min’3 is an adoption of the ‘SG min’ template of Edelman & Lippert (2000) (written in MATLAB) for optimisation over the Stiefel and Grassmann manifolds. This template contains four distinct wellknown nonlinear optimisation algorithms adapted for geometric optimisation over Riemannian manifolds: Newton algorithm; dogleg step or Levenberg (1944) and Marquardt (1963) algorithm; Polak & Ribi`re (1969) conjugate gradient; and e Fletcher & Reeves (1964) conjugate gradient.
4.6.2
Numerical comparison
The performances of the following seven algorithms, all of these described in Sections 4.3 and 3.2, except for item 7 (fmincon), are compared: 1. Geometric optimisation, Newton (Newton). 2. Geometric optimisation, FletcherReeves conjugate gradient (FRCG). 3. Majorization, e.g., Pietersz & Groenen (2004b) (Chapter 3) (Major.). 4. Parametrization, e.g., Rebonato (1999b) (Param.). 5. Alternating projections without normal vector correction, e.g., Grubiˇi´ (2002) sc (Alt. Proj.). 6. Lagrange multipliers, e.g., Zhang & Wu (2003) (Lagrange). 7. fmincon, a MATLAB builtin mediumscale constrained nonlinear program (fmincon). We note that the ﬁrst two algorithms in this list have been developed in this chapter. The algorithms are tested on a large number (one hundred) of randomly generated correlation matrices. The beneﬁt of testing on many correlation matrices is, that the overall and generic performance of the algorithms may be assessed. The random ﬁnancial correlation matrices are generated as described in Section 3.5.1. As the benchmark criterion for the performance of an algorithm, we take its obtained accuracy of ﬁt given a ﬁxed amount of computational time. Such a criterion corresponds to ﬁnancial practice, since decisions based on derivative valuation calculations often need to be made within seconds. To display the comparison results, we use the stateoftheart and convenient performance proﬁles; see Dolan & Mor´ (2002). The reader is referred e
3
LRCM min can be downloaded from www.few.eur.nl/few/people/pietersz.
4.6. NUMERICAL RESULTS
100
Newton Major.
FRCG
89
90
fmincon
Lagrange
Preformance of attained performance ratio Ω
80
70
60
50
Alt. Proj.
40
Newton FRCG Major. Param. Alt. Proj. Lagrange fmincon
30
20
10
Param.
0
1
1.01
1.02
1.03
1.04 1.05 Performance ratio ξ
1.06
1.07
1.08
Figure 4.3: Performance proﬁle with n = 30, d = 3, 2 seconds of computational time, Hadamard norm with equal weights. A rule of thumb is, that the higher the graph of an algorithm, the better its performance.
there for details, but the idea is described brieﬂy in Section 3.5.1. A rule of thumb is, that the higher the proﬁle of an algorithm, the better its performance. The performance proﬁles are displayed in Figures 4.3–4.6, for various choices of n, d, and computational times. Each performance proﬁle represents a benchmark on 100 diﬀerent test interest rate correlation matrices. For Figures 4.3–4.5, an objective function with equal weights is used. For Figure 4.6, we use a Hadamard seminorm with nonconstant weights. These weights are chosen so as to reﬂect the importance of the correlation entries for a speciﬁc trigger swap, as outlined in, e.g., Rebonato (2004b, Section 20.4.3). For this speciﬁc trigger swap, the ﬁrst three rows and columns are important. Therefore the weights matrix W takes the form 1 if i ≤ 3 or j ≤ 3, wij = 0 otherwise. From Figures 4.3–4.6 it becomes clear that geometric optimisation compares favourably to the other methods available in the literature, with respect to obtaining the best ﬁt to the original correlation matrix within a limited computational time.
90
CHAPTER 4. RANK REDUCTION BY GEOMETRIC PROGRAMMING
100
FRCG
Major.
90
fmincon
Alt. Proj.
Preformance of attained performance ratio Ω
80
Newton
70
Lagrange
60
50
40
Newton FRCG Major. Param. Alt. Proj. Lagrange fmincon
30
20
10
Param.
0
1
1.1
1.2
1.3 Performance ratio ξ
1.4
1.5
Figure 4.4: Performance proﬁle with n = 50, d = 4, 1 second of computational time, Hadamard norm with equal weights.
4.7
Conclusions
We applied geometric optimisation tools for ﬁnding the nearest lowrank correlation matrix. The diﬀerential geometric machinery provided us with an algorithm more eﬃcient than any existing algorithm in the literature, at least for the numerical cases considered. The geometric approach also allows for insight and more intuition into the problem. We established a technique that allows one to straightforwardly identify whether a local minimum is a global minimum.
4.A
4.A.1
Appendix: Proofs
Proof of Theorem 2
PROOF of (i). The maps Ψ and Φ are well deﬁned: To show that Ψ is well deﬁned, we T T need to show that if Y2 ∈ [Y1 ], then Y2 Y2 = Y1 Y1 . From the assumption, we have
4.A. APPENDIX: PROOFS
100
FRCG
Newton
91
Major.
90
Lagrange
Alt. Proj.
Preformance of attained performance ratio Ω
80
70
60
50
fmincon
40
Newton FRCG Major. Param. Alt. Proj. Lagrange fmincon
30
20
10
Param.
0
1
1.05
1.1
1.15
1.2 1.25 1.3 Performance ratio ξ
1.35
1.4
1.45
1.5
Figure 4.5: Performance proﬁle with n = 60, d = 5, 3 seconds of computational time, Hadamard norm with equal weights.
that ∃Q ∈ Od : Y2 = Y1 Q. If follows that
T T Y2 Y2 = (Y1 Q)(Y1 Q)T = Y1 QQT Y1T = Y1 Y1 ,
which was to be shown. To show that Φ is well deﬁned, we need to show: (A) If C ∈ Cn,d then there exists Y ∈ Tn,d such that C = YYT . (B) If X, Y ∈ Tn,d , with XXT = YYT =: C then there exists Q ∈ Od such that X = YQ. Ad (A): Let C = QDQT , Q ∈ On , D = Diag(D), be an eigenvalue decomposition with dii = 0 for i = d + 1, . . . , n. We note that such a decomposition of the speciﬁed form is possible because of the restriction C ∈ Cn,d . Then note that √ √ Q D = (Q D)(:, 1 : d) 0 .
It follows that there exists an orthogonal rotation Q. xk . We note that then also Qxi = yi for x i = k + 1. The map Ψ is thus . yk+1 .d . QQT = I. trigger swap Hadamard seminorm. by linearity of Q and since the last n − k row vectors are linearly dependent on the ﬁrst k row vectors by assumption. then S(Y) = YYT and Φ(YYT ) = [Y] and also Π(Y) = [Y]. we obtain a basis {y1 . which was to be shown. fmincon Preformance of attained performance ratio Ω 80 70 60 Newton FRCG Major. . . . then Π(Y) = [Y] and Ψ([Y]) = YYT and also S(Y) = YYT . . which was to be shown. . .002 1. . . Similarly. . . . . √ Thus if we set Y = (Q D)(:. To show that Φ ◦ S = Π: Let Y ∈ Tn. Without loss of generality. . .006 1. such that d ˜ ˜ the latter forms a basis of R . .d . xd }.018 Figure 4. we may assume that the ﬁrst k rows of X and Y are independent. d). n. yd } of Rd . . xk+1 . .92 CHAPTER 4. Param. . k). We extend the set of ˜ ˜ k row vectors {x1 . .5) is commutative: To show Ψ ◦ Π = S: Let Y ∈ Tn. . .008 1. . . . Ad (B): Let rank(X) = rank(Y) = rank(C) = k ≤ d.d .016 1. RANK REDUCTION BY GEOMETRIC PROGRAMMING 100 FRCG Newton 90 Major. . 2 PROOF of (iii). 2 PROOF of (ii). such that Qxi = yi ˜ (i = 1. . Diagram (4. 1 : d) then YYT = C and Y ∈ Tn. yk . . Q˜ i = yi (i = k + 1.004 1. . The map Ψ is a homeomorphism with inverse Φ: It is straightforward to verify that Φ ◦ Ψ and Ψ ◦ Φ are both the identity maps. . It follows that XQ = Y.01 Performance ratio ξ 1. d = 3. 1 second of computational time. . 0 1 1.014 1. fmincon 50 40 30 20 10 Param. . .012 1.6: Performance proﬁle with n = 15. xk } to a set of d row vectors {x1 .
d ) − dim(Od ) = n(d − 1) − 1 d(d − 1). 4 For a deﬁnition.d .d × Od is bounded it −1 follows that Ξ (K) is compact. −1 ∗ ∗ By continuity of Ξ. Theorem 5. This theorem essentially states that for a smooth action of a Lie group on a manifold the quotient is a manifold if the action is proper and free. it follows from the proof of Theorem 2 (i) that there exists precisely one Q ∈ Od such that YQ = Y.4. we have an open neighbourhood U ⊆ Mn. . this Q must be the identity matrix. for every [Y] ∈ Mn. we show that the ∗ action of Od on Tn. Since rank(Y) = d. Q) → (YQ.d × Od .d and Q ∈ Od such that YQ = Y. It is suﬃcient to show that {Y ∈ Rn×d : rank(Y) = d} is open in Rn×d . V(Y)). it follows that {Y ∈ Rn×d : rank(Y) = d} = S−1 (rank−1 (d)) is an open subset of Rn×d .A. We have to show that Ξ−1 (K) is compact. Let Y ∈ Tn.8). Marsden & Ratiu (1988).A. 2 2.d × Od → Tn. 2 4.d × Tn. Y) ∗ ∗ and K a compact subset of Tn.d and a bijective map: U : Π−1 (U) → U × Od . In our case. To show that Ψ is continuous.d is open in Tn. Because Tn.4 of Duistermaat & Kolk (2000).3 Proof of Proposition 3 This part is a corollary of Theorem 1.4 of Duistermaat & Kolk (2000). ∗ ∗ The dimension of Mn. Let ∗ ∗ ∗ Ξ : Tn.d = dim(Tn.A. This part is a corollary of Theorem 1.11.6).d is proper4 .d if and only if {Y ∈ Rn×d : rank(Y) = d} is open in Rn×d . This theorem states that there is only one diﬀerentiable structure on the orbit space which satisﬁes the ∗ ∗ following: Suppose that.2 Proof of Proposition 2 1. APPENDIX: PROOFS 93 bijective with inverse Φ. Proposition 1.4. Since the rank of a symmetric matrix is a locally constant function. with S(Y) = YYT as in Deﬁnition 2. 2 2 4.d .11. page 53).d .d is free. Ξ (K) is closed in Tn. ∗ ∗ Second. Y → (Π(Y). First. Ψ ◦ Π = S with S(Y) = YYT is continuous. see Duistermaat & Kolk (2000. The proof now follows from a wellknown lemma from topology: A continuous bijection from a compact space into a Hausdorﬀ space is a homeomorphism (see for example Munkres (1975). we show that the Od action on Tn. (Y. Thus.d × Tn. since ∗ Tn. note that for quotient spaces we have: The map Ψ is continuous if and only if Ψ ◦ Π is continuous (see for example Abraham.
Then we deﬁne UY by UY (Z) = ([Z]. we have that U−1 ([Z]. For Z ∈ Π−1 ([Z]). Y∗ 0 n×(d−k) . The topology of Mn.d → Cn.4 Proof of Theorem 3 Let C ∈ Cn.d is open.d . The procedure in Section 4. . Q ∈ Od . but k should be read there in this case. Apply to Y the procedure5 outlined in Section 4.4 of Duistermaat & Kolk (2000) stated above. By deﬁnition. note that it is suﬃcient to show that Π : Choln. U(YQ) = (Π(Y). [Z] ∈ UY . to obtain a lowertriangular matrix Y∗ ∈ Tn. Q ∈ Od .d that satisﬁes YYT = C can now easily be obtained by setting ¯ Y= which was to be shown.d . We have Π−1 Π(U) 5 = YQ : Y ∈ U.A.d and suppose that rank(C) = k ≤ d.d → Cn.d and ΣY be a section over UY deﬁned in (4.2. item 3) and S = Ψ ◦ Π. then. since for any C = Y Y T ∈ S(U). Thus. then S(Y) = YYT attains a local minimum of ϕ on the open ˜ neighbourhood S(U) of YYT . A lowertriangular matrix ¯ ¯¯ Y ∈ Choln. QZ ).d is a homeomorphism (see Proposition 2.k such that YYT = C.d is open. For then if Y attains a local minimum of ϕ on the open neighbourhood U ⊂ Choln. We deﬁne UY : −1 Π (UY ) → UY ×Od as follows.d . ∗ Let Y ∈ Tn.d → Mn. It follows that UY is a diﬀeomorphism if and only if ΣY : U → ΣY (U) ∗ is a diﬀeomorphism. we prove the ‘only if’ part. The diﬀerentiable ∗ ∗ structure on Mn. 2 4. Then there is a Y ∈ Tn. Suppose.A.3 is stated in terms of d. ϕ(C ) = ϕ(Y Y T ) = ˜ ˜ T ϕ(Y ) ≥ ϕ(Y) = ϕ(YY ). ˜ To show that S : Choln. the diﬀerentiable structure on Mn.d is open. RANK REDUCTION BY GEOMETRIC PROGRAMMING such that.d . for every Y ∈ Π−1 (U). Q) = ΣY ([Z])Q. V(Y)Q). Since U−1 : Π−1 (UY ) → UY ×Od is a Y Y bijective map.5 Proof of Theorem 4 First. 2 4. by deﬁnition of the quotient topology of Mn. we have that UY is bijective. since Ψ : Mn.2.k . too. that U is open in Choln.11. such that Y∗ (Y∗ )T = C. QY Q) of Theorem 1.94 CHAPTER 4.d is the one which makes U into a diﬀeomorphism.6). We note that it is suﬃcient to show that the map S : Choln.d → Cn.3. We have to show that Π−1 (Π(U)) is open in Tn. It can be easily veriﬁed that UY satisﬁes the condition UY (YQ) = ([Y].d is the one which makes ΣY : UY → Σ(UY ) into a diﬀeomorphism.d obtained in this manner is equal to the quotient topology. by Theorem 2. there is a unique element QZ ∈ Od such that Z = ΣY ([Z])QZ .
28) and from the symmetry of P(λ) that ¯ (i) YYT P(λ) = 0 and also. ¯ From (ii). YYT and P(λ) admit a joint eigenvalue decomposition.6 Proof of Lemma 1 It is recalled from matrix analysis that C1 and C2 admit a joint eigenvalue decomposition ¯ if and only if their Lie bracket [C1 .A. with Z(ij ) → Z∗ ∈ U c and Q(ij ) → Q∗ . i→∞ i→∞ (4.d is continuous. Let {Y(i) } be a sequence in (Π−1 (Π(U)))c converging to Y. say. We can write Y(i) = Z(i) Q(i) with Z(i) ∈ U c and Q(i) ∈ Od .7 Proof of Theorem 5 Deﬁne the Lagrangian (C.28) (4.d (ϕY ) of ϕY onto the normal space at Y. From (4. C2 ] = C1 C2 − C2 C1 equals zero. Q(ij ) )}. Then. The reverse direction is obvious since the map S : Choln. We note also that ¯ YYT + P(λ) = P(λ).26) it follows that Z∗ = Y(Q∗ )T ∈ U c .27) The last equality follows from the assumption that Gradϕ(Y) = 0. limi→∞ Y(i) − Y = 0. 2 4. APPENDIX: PROOFS 95 It is suﬃcient to show that the complement (Π−1 (Π(U)))c is closed. ¯ (ii) [YYT .26) Since U c × Od is compact.e.d .A. there exists a convergent subsequence {(Z(ij ) . Suppose P(λ) = QDQT . i.) It follows from ¯ (4. and . From (i) we then have that d∗ and ii ¯ dii cannot both be nonzero. P(λ)] = 0. but then also jointly ¯ ¯ with P(λ) because of (4. (4. 2 4.4. λ) := − P − C 2 ϕ − 2λT diag(P − C).d → Cn. the diﬀerential ϕY is normal at Y.27). i.A. Gradϕ(Y) denotes the gradient on Tn. Deﬁne P(λ) := −Ψ + Diag(λ). We note that 2Diag(λ)Y is the projection ΠNY Tn. which implies Y ∈ (Π−1 (Π(U)))c . i→∞ lim Y(i) − Y = lim Z(i) Q(i) − Y = lim Z(i) − Y(Q(i) )T = 0.e. We calculate ¯ P(λ)Y = 1 1 − Ψ + Diag(λ) Y = − ϕY + ΠNY Tn. The result now follows since YYT is positive semideﬁnite and has rank less than or equal to d.d ϕY 2 2 = 0. (Here.
29) We note that the minimization problem in (4. Equation (30) of Wu (2003)).) Here (in)equality (a) is obtained from the property that C ∈ Cn. and (c) is by assumption of (4. 2 .d .29) is attained by any {P(λ)}d (see e.25). λ) : rank(C) = d . (This is the equation at the end of the proof of Theorem 4.. (4.g. For any C ∈ Cn.4 of Zhang & Wu (2003). RANK REDUCTION BY GEOMETRIC PROGRAMMING v(λ) := min (C.96 CHAPTER 4.d . P−C 2 (a) F = − (C. λ∗ ) ≥ −v(λ∗ ) = P − {P(λ)}d (b) (c) 2 F. (b) is by deﬁnition of v.
Chapter 5 Fast driftapproximated pricing in the BGM model 1 It is demonstrated that the forward rates process discretized by a single time step together with a separability assumption on the volatility function allows for representation by a lowdimensional Markov process. (2001). In an attempt to limit the computational time. In the BGM framework almost all prices are computed using Monte Carlo simulation. (1997) and Jamshidian (1997). Miltersen et al. We compare the single time step method for pricing on a grid with multistep Monte Carlo simulation for a Bermudan swaption. & van Regenmortel. Hunter et al. Wilmott Magazine 18... reporting a computational speed increase by a factor 10. A. Pelsser. Section 12. 93–124. J. R. which reduce the Monte Carlo stage to single timestep simulation. J¨ckel (2002.5) and Kurbanmuradov. it has the drawback of being computationally rather slow. is now one of the most popular models for pricing interest rate derivatives. This in turn leads to eﬃcient pricing by. (1997). (2004). 98–103. M. ‘Fast driftapproximated pricing in the BGM model’. 1 This chapter has been published in diﬀerent form as Pietersz. developed by Brace et al.1 Introduction The BGM framework. R. 5. A. A. yet maintaining suﬃciently accurate pricing. An extended abstract of this chapter appeared as Pietersz. However. (2005). Journal of Computational Finance 8(1). M. a Sabelfeld & Schoenmakers (2002) introduced predictorcorrector drift approximations. . ‘Bridging Brownian LIBOR’. Pelsser. J. & van Regenmortel. for example. We then develop a discretization based on the Brownian bridge that is especially designed to have high accuracy for single time stepping. A. The scheme is proven to converge weakly with order one. ﬁnite diﬀerences. An advantage of Monte Carlo is its applicability to almost any product.
which is proven to have leastsquares error in a certain sense (to be deﬁned) for single time step discretizations. separability. both for a one. After setting out some basic notation and the most important formulas for the BGM model. The latter is a nonrestrictive requirement on the form of the volatility function. This chapter also contains two other results: • The ﬁrst result is a new time discretization using a Brownian bridge. various discretization schemes are discussed and the Brownian bridge scheme is introduced. In Section 5. the single time step pricing framework is developed. In the second part of Section 5.6. who proposed the use of drift approximations.3. • The second result is a method for measuring the accuracy of single time stepping.9. This method. We show that much more eﬃcient numerical methods (either numerical integration or ﬁnite diﬀerences) may be used at the cost of a minor additional assumption. A comparison is made with prices obtained by leastsquares multitime step Monte Carlo simulation in the BGM model.98 CHAPTER 5.5 it is shown numerically that the Brownian bridge scheme outperforms (in the case of single time steps) various other discretizations for the LIBORinarrears density test. we compare the Brownian bridge scheme numerically with other discretizations for multitime steps. conclusions are drawn. This enables eﬃcient implementation. This is followed by an example of the pricing of Bermudan swaptions. The particular application of the likelihood ratio method to the LIBOR market model has been developed by Glasserman & Zhao (1999). The Brownian bridge scheme is then investigated theoretically and numerically for both single and multitime steps. eﬃciently estimates risk sensitivities for Monte Carlo pricing. The outline of this chapter is as follows. Next. The single time step together with separability renders the state of the BGM model completely determined by a lowdimensional Markov process. This is the timing inconsistency test as outlined in Section 5. The computational speed increase achieved with the use of ﬁnite diﬀerences for BGM single time step pricing is the main result. A further application of the Brownian bridge drift approximation is its use in the likelihood ratio method. In the ﬁrst part of Section 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM This chapter presents a signiﬁcant addition to the single time step pricing method. introduced by Broadie & Glasserman (1996). as introduced in Section 5. we prove theoretically that the Brownian bridge scheme converges weakly with order one when used for multitime step Monte Carlo. Finally.and a twofactor model. the proposed framework is worked out for the onefactor case. This includes the use of the Longstaﬀ & Schwartz (2001) method. A test is then developed to assess the quality of single time steps.6. We give an example of the fast single time step pricing framework for Bermudan swaptions. respectively. .
Denote by bi (t) the timet price of a discount bond that expires at time ti . we introduce the notion of all available forward rates at a given point in time. Bond prices and forward rates are linked by the relation 1 + αi fi (t) = bi (t) . . Deﬁne f to consist of all forward rates that have not yet expired at time t.2 Notation for BGM model In this section our notation of the BGM model is introduced. . . There is a speciﬁc reason why we use the terminal measure.2.e. This simply expresses the wellknown fact that a forward rate is a martingale under its associated forward measure. . i. . w. ˜ fi (t) (5. . fn ) = − ˜ αk fk σ k (t) · σ i (t) .2) For i = n the drift term is zero. Musiela & Rutkowski (1997). .1) Here σ i is the ddimensional volatility vector. fi .. f (t) = fi(t) (t).3. M2 Such a model features n forward rates. (5. e. the so called terminal measure. fn (t) . n. Throughout this chapter. . . 1 + αk fk k=i+1 n (5. NOTATION FOR BGM MODEL 99 5. 1 µi (t) = µi (t) − σ i (t) 2 . For the terminal measure. Deﬁne i(t) to be the smallest integer i such that t ≤ ti . For the remainder of this chapter it will be useful to have stochastic diﬀerential equation (SDE) (5. the drift term will have the following form for i < n: µi (t. where forward i accrues from time ti to time ti+1 . fi+1 . ti+1 ].3) Last. bi+1 (t) Each forward rate is driven by a ddimensional Brownian motion (where d is the number of stochastic factors in the BGM model). We consider a BGM model. whose form will ˜ in general depend on the choice of probability measure. Denote by αi the accrual factor over the period [ti .. and this is explained in Remark 2 of Section 5. Pelsser (2000) or Brigo & Mercurio (2001). we use the numeraire probability measure associated with the bond maturing at time tn+1 . .4) 2 The construction of such a model may be found in.g. as follows: dfi (t) = µi (t)dt + σ i (t) · dw(t).5. 0 < t1 < · · · < tn+1 . i = 1. and µi is the drift term. . . ˜ 2 (5.1) in logarithmic form: d log fi (t) = µi (t)dt + σ i (t) · dw(n+1) (t). .
These schemes will be elaborated on in Section 5. the predictorcorrector or the Brownian bridge scheme. Given a scheme for the log rates log fi (τj+1 ) = log fi (τj ) + µi τj .1 Justiﬁcation of the above assumptions Because the forward rates are approximated by a singlestep scheme. (5. The singlestep approximation is accurate enough for the pricing of derivatives.4). Here µ is the “drift approximation” and it is determined by the scheme applied. This timing inconsistency is addressed in Section 5. and we end by considering methods for pricing American style options with Monte Carlo methods. The vector z is deﬁned by analogy with f in (5.8.2 Notation We assume as given a time discretization τ0 < · · · < τm . Deﬁne zi (u.4.5. which may be the Euler. FAST DRIFTAPPROXIMATED PRICING FOR BGM 5. t) its single time stepapproximated equivalent.3. 5. which permits the dynamics of the single time step forward rates process to be represented by a lowdimensional Markov process. z(0. where it is shown that its impact is negligible for most cases. The A in f A stands for “approximated”. τj+1 . the model will in general no longer be arbitragefree. A single time step v discretization is a discretization with m = 1. f (τj ). after which we establish the lowdimensional Markov representation result. m ≥ 1. We proceed by ﬁrst introducing notation for the single stepapproximated forward rates process. and • the volatility structure should be separable.5) .9.3. t) + zi (0. Single time step discretizations are then discussed. τj+1 ) then denote by fiA (t) = fi (0) exp µi 0. z(τj . t. This is followed by a statement of the separability assumption.3 Single time step method for pricing on a grid The two key elements in the development of a method to price interest rate derivatives in the BGM model by lowdimensional ﬁnite diﬀerences are: • the forward rates process should be discretized by a single time step scheme. 5. Its superiority (for single time steps only) over other discretizations is established in Section 5. v) = u σ i (t) · dw(n+1) (t). as shown numerically in Section 5. f (0). At the end of this section we introduce a novel discretization scheme based on the Brownian bridge that is especially designed for single time stepping.100 CHAPTER 5. τj+1 ) + zi (τj .
3. . is called “separable” if there exists a vectorvalued function σ : [0. we mention the ﬁnitedimensional Markov realizations for stochastic volatility forward rate models (see Bj¨rk. Third.6) 5. such that σ i (t) = vi σ(t) (no vector product. . SINGLE TIME STEP METHOD FOR PRICING ON A GRID 101 5. ti ] → Rd . We give an example of a separable volatility function in the case of a onefactor model (d = 1). Here a necessary condition for a stochastic volatility model e to have a ﬁnitedimensional Markov realization is that the drift term and each component of the volatility term in the Stratonovich representation of the short rate SDE should be a sum of functions that are separable in time to expiry and the stochastic volatility driver. We mention three examples. n. (5. n. i = 1. If all φn are separable. . Second. . we mention Ritchken & Sankarasubramanian (1995. .4 Single time step method The following proposition shows that a single time step plus separability yields lowdimensional representability. (5.3. Separability appears regularly in the context of requiring a process to be Markov. . (2004). .3 Separability Deﬁnition 3 (Separability) A collection of volatility functions σ i : [0. . . they show that separability is a necessary and suﬃcient condition on the volatility structure such that the dynamics of the term structure may be represented by a twodimensional Markov process.1). Working in the HJM model (Heath. First. Example 1 (meanreversion) Following De Jong et al. Jarrow & Morton 1992).5. Proposition 2. we mention the Wiener chaos expansion framework of Hughston & Rafailidis (2005). n. . tn ] → Rd and vectors vi ∈ Rd . The nth chaos expansion is represented by a function φn : Rn → R that satisﬁes certain integrability conditions. o Land´n & Svensson (2004)). Then the single time step discretized forward rates process may be represented by a ddimensional Markov process. .3. . the instantaneous volatility may be speciﬁed as σi (t) = γi e−κ(ti −t) . i = 1. for which the instantaneous volatility structure is separable.7) The constant κ is usually referred to as the meanreversion parameter. Proposition 4 Suppose that M is a dfactor BGM model. . entrybyentry multiplication) for 0 ≤ t ≤ ti . i = 1. + the resulting interest rate model turns out to be Markov. In this framework any interest rate model is completely characterized by its socalled Wiener chaos expansion.
This is explained in the following remark. in virtue of the separability of the volatility structure: t t σ i (s) · dw(s) = 0 0 vi σ(s) · dw(n+1) (s) = vi · x(t). As proven in Proposition 4. for some function f .5) and v is a matrix of which row i is vi . However. This result holds for any choice of measure or numeraire. . FAST DRIFTAPPROXIMATED PRICING FOR BGM PROOF: Deﬁne the Markov process x : [0.8) is exactly equal to the stochastic part occurring in the BGM SDE (5. but this does not hold in the case of. (5. This is explained as follows. 2 Remark 1 The vector of single timestepped rates may be considered (if separability holds) to be a timedependent function of the Markov process x. the timet single timestepped forward rates are fully determined by x(t). Another essential building block for the fast single time step pricing framework is use of the terminal measure. bar a clarifying remark: The second term in the exponent of (5. Then the single time step process f A : [0.. Theorem 1) showed that this is impossible to achieve for the true BGM forward rates themselves in the case when x is onedimensional and under some technical restrictions. Hunt et al. (entrybyentry multiplication) where σ is as in Deﬁnition 3. (2000. the value of the numeraire at time t is fully determined by the forward rate values at time t. x(t) .1).e.8) Here µi is deﬁned implicitly in (5. i j=1 bj (tj−1 ) 1 t0 := 0.102 CHAPTER 5. The spot numeraire b0 rolls its holdings over by the spot LIBOR account. tn ] → Rd by t x(t) = 0 σ(s)dw(n+1) (s). f (0). for the terminal numeraire. t. for example. in that the latter is generally determined by bond values observed at earlier times. Remark 2 (Choice of numeraire) For the workings of the fast single time step pricing algorithm it is essential that the terminal measure be used. i. vx(t) + vi · x(t) . tn ] → (0. f A (t) = f t. Its timeti value is b0 (ti ) = . where the notation of Deﬁnition 3 has been used. The claim follows. ∞)n−i(t)+1 at time t satisﬁes fiA (t) = fi (0) exp µi 0. the spot numeraire.
The derivatives that may be valued include. f (τj ). and • Brownian bridge. ﬂoors.4 Discretizations We discuss four timediscrete approximation schemes of the log BGM SDE (5. f (τj ). • predictorcorrector. z(τj . τj+1 .5. • Milstein secondorder scheme. f (τj ). τj+1 ) We implicitly deﬁne µ by ˜ µi τj . The notation (Equation (5.3): • Euler. provided that the single timestepped rates are used and the separability assumption holds.4.5 Valuation of interest rate derivatives with the single time step method Interest rate derivatives with mild pathdependency may be valued by numerical integration. DISCRETIZATIONS 103 Put in another way.5)) for a discretization of SDE (5. so as to remove the term common to the Euler. 1 + αk fk (τj ) k=i+1 n . τj+1 ) − ˜ 1 2 τj+1 τj σ i (s) 2 ds. τj+1 ) = µi τj . Therefore the fast single time step framework requires the use of the terminal numeraire. z(τj .3. τj+1 . trigger swaps and discrete barrier caps. for example. z(τj . τj+1 ) + zi (τj .3) is recalled here: log fi (τj+1 ) = log fi (τj ) + µi τj .3. but are not restricted to: caps. whereas that of the terminal numeraire is not. 5. τj+1 . the value of the spot numeraire is pathdependent. predictorcorrector and Brownian bridge discretizations. by a lattice/tree or by ﬁnite diﬀerences.1)) sets µi τj . 5. Equation (9. European and Bermudan swaptions. τj+1 ) = − ˜ αk fk (τj )σ k (τj ) · σ i (τj ) (τj+1 − τj ). Kloeden & Platen (1999.1 Euler discretization The Euler discretization (see. z(τj .4. 5. τj+1 . For pricing on a grid it is essential that the numeraire value is known given the value of x(t). f (τj ).
µi τj . b and 2004). The Milstein scheme can indeed be used to obtain a single time step discretization of the forward rates process .5) used in the present work.3 Milstein discretization The secondorder Milstein scheme (see. Initially. The results of Glasserman and Merener thus justify the logtype discretization (5. τj+1 ) ˜ = − 1 αk fk (τj )σ k (τj ) · σ i (τj ) 2 k=i+1 1 + αk fk (τj ) + αk fk (τj+1 )σ k (τj+1 ) · σ i (τj+1 ) 1 (τj+1 − τj ). Kloeden & Platen (1999. τj+1 ). we set µn (τj .4. Equation (14. . (2001). Also. which is key to the development of the jumpdiﬀusion LIBOR market model. to the spot LIBOR rate. τj+1 )) = 0. m = k + 1. i(t). z(τj . .but it is not particularly suited to single large time steps. Moreover. . an iterative procedure may be applied that loops from the terminal forward rate. FAST DRIFTAPPROXIMATED PRICING FOR BGM 5. z(τj . . The key idea is to use predicted information to more accurately estimate the contribution of the drift to the increment of the log rate. ˜ Then. n. in terms of the bonds. . Therefore we omit here the exact form of the scheme. relative discount bond prices and logrelative discount bond prices. τj+1 . f (τj ). .2. for example.4. n.2 Predictorcorrector discretization The predictorcorrector discretization was introduced to the setting of LIBOR market models by Hunter et al.5. f (τj ).1)) was introduced to the setting of LIBOR market models in the series of papers by Glasserman & Merener (2003a. these papers extended the convergence results to the case of jumpdiﬀusion with thinning. . logforward rates.104 CHAPTER 5. such as forward rates. . 2004) it is shown numerically that the timediscretization bias of the logEuler scheme is less than the bias of other discretizations. these papers considered discretizations in various diﬀerent sets of state variables.and hence it may be applied to the single time step pricing framework . 5. For the terminal measure. i(t). τj+1 . as shown in the numerical comparisons for single time step accuracy in Section 5. for i = n − 1. for example. 2 k=i+1 1 + αk fk (τj+1 ) n n with fk (τj+1 ) dependent on fm (τj ) and zm (τj . In Glasserman & Merener (2003b. .
z(τj . The Brownian bridge scheme thus takes into account the speciﬁc form of drift terms for LIBOR market model forward rates. The idea is to calculate the expectation of the drift integral given the (timechanged) Wiener increment.9) The Brownian bridge scheme uses information present in the Wiener increment z(τj . τj+1 ) ds. f (τj ).9) can be calculated in practice.4 Brownian bridge discretization Here we develop a novel discretization for the drift term. 1 + αk fk (s) k=i+1 n n (5. We note that once this assumption has been made. the forward rates are approximated with a singlestep Euler discretization. In Section 5.1. z(τj .6.4. expression (5.4.10) − τj E (n+1) This is a straightforward application of Fubini’s theorem (see. DISCRETIZATIONS 105 5. as will be shown in the ﬁrst part of Section 5. z(τj . the ﬁrst step is to note that the order of the expectation and integral may be interchanged.6.5. z(τj . whereby it can outperform general discretization schemes. τj+1 ) to approximately determine the most likely value for the stochastic drift term. we show how expression (5. for example.9). Step 2 (approximation) For the purposes of calculating the conditional expected value of expressions of the form f /(1 + αf ). it converges weakly with order one. This is shown theoretically and numerically in Section 5. In practice.2)).5. The calculation proceeds in four steps (it is indicated when a step contains an approximation): Step 1 To calculate expression (5. Williams (1991. 1 + αk fk (s) k=i+1 n (5. −E(n+1) τj+1 τj+1 τj αk fk (s)σ k (s) · σ i (s) ds F(τj ). In the remainder of this section. Viewed as a numerical scheme for multistep discretizations. Section 8.5. µi τj . τj+1 ) = ˜ −E(n+1) τj+1 τj αk fk (s)σ k (s) · σ i (s) ds F(τj ). we establish that the Brownian bridge scheme has leastsquares error (in a yet to be deﬁned sense). the drift no longer aﬀects the calculation. τj+1 . τj+1 ) . we show that the bias is signiﬁcantly less than for the Euler discretization. This stems from a property of . The Brownian bridge discretization is superior when a single time step is applied. τj+1 ) = 1 + αk fk (s) k=i+1 αk fk (s)σ k (s) · σ i (s) F(τj ).9) can be approximated with high accuracy. In the multistep Monte Carlo numerical experiments of the second part of Section 5.
. Step 3 The conditional mean and conditional variance of the log forward rates are calculated.9). Thus the estimation of the drift integral (5. z(τj . τj+1 ) ds ≈ 1 + αk fk k=i+1 (n+1) BB αk fk σ k · σ i F(τj ). (n+1) [f BB F(τ ). See Appendix 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM the Brownian bridge: a Wiener process with deterministic drift conditioned to pass through a given point at some future time is always a Brownian bridge.e.11) is thus obtained by approximating (5.106 CHAPTER 5. z(τ . z(τj . τj+1 − τj E(n+1) τj+1 αk fk σ k · σ i F(τj ). Step 4 (approximation) The drift expression (5. In principle. independently of its drift prior to conditioning. the loss in accuracy is negligible on an absolute level. the average of the begin and end points) is used to evaluate the time integral in expression (5. the expectation term is approximated by inserting the conditional mean of the forward rates process:3 τj+1 − τj E(n+1) τj+1 BB αk fk σ k · σ i F(τj ). the Brownian bridge reduces to 3 Alternatively. A theoretical error analysis of the meaninsertion approximation is given in Appendix 5. τj+1 )]σ k · σ i ds.10) by using single step Euler dynamics for the forward rates instead of the true LIBOR market model dynamics.5 with the meaninsertion approximation (“BB”). the approximation could aﬀect the quality of the discretization. The full numerical integration (“BB alternative”) has been compared numerically in Section 5.9) may be approximated by a single numerical integration over time. and where we have suppressed the dependence of time s. z(τj . τj+1 ) ds ≈ BB 1 + αk fk k=i+1 BB αk E(n+1) [fk F(τj ). τj+1 ) ds. We show numerically that this is not the case in the LIBORinarrears case considered in Section 5. τ 1 + αk E j j j+1 )] k k=i+1 n n − τj Remark 3 If a twopoint trapezoidal rule (i. the expectation term could be evaluated by numerical integration as well. z(τj .B. Formula (5. Single step Euler dynamics imply Brownian bridge dynamics.9) is the same whether it is assumed that the forward rates are driftless or whether these follow a single time step Euler approximation.5. but this is computationally expensive.A for details. since we condition on timeτj+1 values. We note that the assumption of singestep Euler discretization for the calculation of expression (5. BB 1 + α k fk k=i+1 n n (5.11) − τj E where BB indicates the use of the Brownian bridge.9) renders this calculation an approximation. .
9) has leastsquares error for a single time step over all discretizations that use information about the Gaussian increment for the drift term. MATLAB code is given in Appendix 5. as shown by Avramidis & Matzinger (2004). The method used is the regressionbased method of Longstaﬀ & Schwartz (2001). we establish theoretically and numerically that the Brownian bridge scheme has superior accuracy for single time steps. τj+1 ) + z(τj . We end this section with a discussion of the method used in this chapter for pricing Americanstyle options with Monte Carlo. We note that Euler. z(τj . (5. The next theorem states that. The code implements a single timestep in a singlefactor model with constant volatility.5 The Brownian bridge scheme for single time steps In this section. for the BGM setting.C. of course. implementing the Brownian bridge scheme.5. the Brownian bridge scheme (5. τ (5.5. These simpliﬁcations are for clarity of exposition only and are.5. THE BROWNIAN BRIDGE SCHEME FOR SINGLE TIME STEPS 107 the predictorcorrector scheme. x(t) dt + Σ(t)dw(t). We consider a certain class of discretizations: ¯ Deﬁnition 4 Let the function µ(·.13) Here z(τj . not a restriction imposed by the Brownian bridge scheme. 5. ·. y(τj ). τj+1 ) = τjj+1 σ(s)dw(s).1 Theoretical result Consider a stochastic diﬀerential equation of the form dx(t) = µ t. which is a method of stochastic mesh type (see Broadie & Glasserman (2004)). Convergence of the method to the correct price follows generically from the asymptotic convergence property of stochastic mesh methods.12) with the following form: ¯ y(τj+1 ) = y(τj ) + µ τj .3) is of the above form. Any such discretization is said to use information about the Gaussian increment to estimate the drift term. ·) denote a single time step discretization of SDE (5. . 5. In this sense. For illustration. τj+1 ). the predictorcorrector scheme is a special case of the Brownian bridge scheme. predictorcorrector and Brownian bridge are such schemes.12) We note that the BGM log SDE (5.
over all possible discretizations that use information about the Gaussian increment to estimate the drift term is deﬁned by ¯ µ∗ τj . for details the reader is referred to Hunter et al. z(τj .14) µ s.12) starting from (t. and the latter has. ¯ ¯ For ease of exposition we write z = z(τj . we ﬁnd s2 {y } = Eτj ¯ µ −i 2 = E τj E τj ¯ µ −i 2 z ≥ Eτj Eτj ¯ µ∗ − i 2 z = s2 {y∗ } . Continuing. by deﬁnition. s2 . i. τj+1 ) and µ = µ(τj . Then the discretization {y∗ } that yields least squared error. τj+1 ) = E PROOF: Deﬁne i := τj τj+1 τj τj+1 µ s.y(τj )} (s) ds F(τj ). Then let {y } with ¯ drift term µ be a discretization of the form of Deﬁnition 4. As y was an arbitrary discretization of the form 2 of Deﬁnition 4. τj+1 ) . the result follows. y(τj ). z(τj . We consider the distribution of a forward rate under the measure associated with the numeraire of a discount bond maturing at the ﬁxing time of the forward. y∗ has less squared error than y . z}measurable drift terms. FAST DRIFTAPPROXIMATED PRICING FOR BGM Lemma 2 Let {y} be a single time step discretization of SDE (5. is known. z). x{τj .5. we condition on z: Eτj ¯ µ −i 2 z ≥ Eτj Eτj [iz] − i 2 z = Eτj ¯ µ∗ − i 2 z . We extend here the LIBORinarrears test of Hunter et al. We note that the forward rate is not a martingale under such a measure as the natural payment time of the forward is not the same as its ﬁxing time.y(τj )} (τj+1 ) F(τj ) . First. y). x{τj .y} denotes the solution of SDE (5. least squared error over all possible {F(τj ). (2001) by including the Milstein and Brownian bridge schemes. (5.2 LIBORinarrears case We estimate numerically the accuracy in the LIBORinarrears test of the various schemes of Section 5. (2001). y(τj ). but we keep in ¯ mind that µ is {F(τj ).108 CHAPTER 5. 5. We can thus compare the density obtained from . Also write Et [·] := E[·F(t)].y(τj )} (s) ds. z}measurable.4.. Consider the discretization expected squared error 2 s2 {y} := E y(τj+1 ) − x{τj . The test is designed to measure the accuracy of a single time step discretization. however. Here x{t.e. An analytical formula for the associated density. The inequality holds since expectation equals projection.12) that uses information about the Gaussian increment for the drift term. The idea of the test is brieﬂy described here.
We note that three densities have been added to the above ﬁgures compared with Figure 1 of Hunter et al.40% f 0. The results of this test are displayed in Figure 5.5. the threemonth forward rate ﬁxing 30 years from today is set initially to 8% and its volatility to 24%.60% error in density 0.5 0.00% f 10.6 analytical minus Euler analytical minus predictor corrector analytical minus Milstein analytical minus Brownian bridge analytical minus Brownian bridge alternative analytical minus 20 20 0. the latter being the second best scheme. a single time step discretization with the analytical formula for the density. (2001).00% 100.2 0. respectively. (2001): Milstein and the two Brownian bridge schemes. The maximum absolute error in the density for the Brownian bridge and its alternative are 10−3 and 6 × 10−4 .5. The most notable addition is the Milstein density. The legend key “BB” denotes Brownian bridge and “BB alternative” denotes full numerical integration of the expectation term. the Milstein scheme reaches a maximum absolute error that is around twice the maximum absolute error for the Euler scheme.00% f 10.1.1: Plots of the estimated densities and absolute errors in densities of various single time step discretizations.01% 0.3 analytical 10 Euler predictcorr Milstein 5 BB BB alternative Series8 0 0.1 0 0. The deal setup is the same as in Hunter et al.10% 1.30% 0.00% 100. Outside of the error graph. On both ﬁgures.50% 0. In this particular test the Brownian bridge scheme thus achieves a reduction by a factor 100 in the maximum absolute error over the predictorcorrector scheme.4 19 15 density 0.1 0. however. the diﬀerences between the analytical and Brownian bridge densities are indiscernible to the eye.10% 1. It is shown (for the particular setup) that the Brownian bridge scheme reduces the maximum error in the density by a factor 100 over the predictorcorrector scheme. . THE BROWNIAN BRIDGE SCHEME FOR SINGLE TIME STEPS 21 109 0.01% 0.00% 0.00% Figure 5.
Therefore we are interested mainly in weak convergence of Monte Carlo simulations. for suﬃciently small ε. Here we develop the remainder of the theory in terms of approximating an autonomous SDE. The deﬁnition is recalled here and may be found in.19) ≤ c(ε). Second. E g x(t) − E g yε (t) ≤ c · εβ . First.1 Weak convergence of the Brownian bridge scheme In a ﬁnancial context.18) yε (τj+1 ) − yε (τj ) yε (τj+1 ) − yε (τj ) F(τj ) 2 (5. The deﬁnition of weak consistency is recalled here. Deﬁnition 6 (Weak consistency) A scheme {yε (τj )} with maximum step size ε is weakly consistent if there exists a function c = c(ε) with lim c(ε) = 0 ε↓0 (5.16) (5.15) A criterion that is easier to verify than the above deﬁnition is the concept of weak consistency. we estimate numerically the convergence behavior of the various schemes of Section 5. 5. and may be found for example on page 327 of Kloeden & Platen (1999). x(0) deterministic. for each function g with 2(β + 1) polynomially bounded derivatives. for example. we show theoretically that the Brownian bridge scheme converges weakly with order one. say. the theory holds in more general cases too.7). Section 9. −Σ yε (τj ) Σ yε (τj ) . the interest lies in calculating the prices of derivatives.4. and under quite natural conditions it follows that weak consistency implies weak convergence.17) 2 such that E and E E 1 ∆τj E yε (τj+1 ) − yε (τj ) F(τj ) − µ yε (τj ) ∆τj ≤ c(ε) (5. Kloeden & Platen (1999.6 The Brownian bridge scheme for multitime step Monte Carlo This section consists of two parts. there exists a constant c such that.110 CHAPTER 5.6. However. Deﬁnition 5 (Weak convergence) A scheme {yε (τj )} with maximum step size ε is said to convergence weakly with order β to x if. (5. dx(t) = µ x(t) dt + Σ x(t) dw(t). FAST DRIFTAPPROXIMATED PRICING FOR BGM 5. which are in certain cases expectations of payoﬀ functions.
it follows that the function g{i. . We need only verify the consistency Equation (5.τ.τ. Taylor’s formula states that there exists an error term e{i. deﬁne for i and for all τ ∈ [0. Theorem 7 (Linking weak consistency to weak convergence) Suppose that µ and Σ in (5.16).3).f } is analytical in t. PROOF: Without loss of generality.20) where c(ε) is as in Deﬁnition 6. The above theorem then allows us to deduce that the Brownian bridge scheme converges weakly.τ.f } : [0. tn ] and for all f the function g{i.τ.τ.τ. (5. Proposition 5 (Brownian bridge scheme is weakly consistent) Assume that the volatility functions σ i (·) are piecewise analytical on the model horizon [0. q = 1. 2. .18) for the drift term. 111 Kloeden and Platen prove the following theorem (see Theorem 9. we may assume that the volatility functions are analytical.16) are four times continuously diﬀerentiable with polynomial growth and uniformly bounded derivatives. To achieve this. .7. Then yε converges weakly to x. Let {yε (τj )} be a weakly consistent scheme with equitemporal steps ∆τj = ε and initial value yε (0) = x(0) which satisﬁes the moment bounds E max yε (τj )2q ≤ k 1 + x(0)2q .f } (0) + t ∂g{i.6.τ.21) .f } (0) + e{i. we can break up the problem into subproblems for which each has analytical volatility functions. We note as well that all derivatives of the volatility functions are bounded because the interval [0. Otherwise.f } (t) = g{i. THE BROWNIAN BRIDGE SCHEME FOR MULTITIME STEPS Here {F(t)} is the ﬁltration generated by the Brownian motion driving SDE (5. Then the Brownian bridge scheme deﬁned by (5.4 is weakly consistent with the forward rates process deﬁned in (5. g{i. tn − τ ] → R. tn ]. 0 Due to the assumption that the volatility functions are analytical.f } (t) ∂t (5.f } (·) depending on i.f } (t) = − αk fk 1 + αk fk k=i+1 n t σ k (τ + s) · σ i (τ + s)ds. due to the piecewise property of the volatility functions.5.4 of Kloeden & Platen (1999)) linking weak consistency to weak convergence. tn ] is compact. τ and f such that g{i. and j E 1 ε y (τj+1 ) − yε (τj ) ε 6 ≤ c(ε).τ. In the proposition below we show that the Brownian bridge scheme with the proposed calculation method is weakly consistent.9) and by the fourstep calculation method described in Section 5.
boundedness and limiting behaviour of the function h(x) = x/(1+ x). (5. It follows that consistency Equation (5.f } (t) < ∞. We note that the term within braces is exactly drift term i evaluated at (τj . tn ] × Rn → R. All these requirements can be met for the Brownian bridge scheme with a quadratic function c.τ.9) and by the fourstep calculation method described in Section 5.τ. 2 .f } /∂t2 .f } ∂t ε n = t=0 − k=i+1 σ k (τ ) · σ i (τ ) αk fk . with c(·) satisfying the requirements (5. The function c(·) is then quadratic in ε. FAST DRIFTAPPROXIMATED PRICING FOR BGM e{i.τj .17).τ.15) is less than c(ε).f } (0) = 0 and ∂g{i. Here we have used g{i. It has order of convergence one. then ε ε E yi (τj+1 ) − yi (τj ) F(τj ) = g{i.22). Viewed as a function [0. 1 + αk fk If Y denotes the Brownian bridge scheme. (5.20). Taking the square root then yields ﬁrstorder weak convergence for the Brownian bridge scheme.3).18) holds with c(ε) equal to (e(ε)/ε)2 . In the proof of Theorem 7 in Kloeden & Platen (1999). f ).18).21) may be chosen independently of τ and f . namely h ↑ 1 (h ↓ 0) as x → ∞ (x → −∞.7 of Apostol (1967) then states that the error term in (5. the Brownian bridge scheme deﬁned by (5. respectively). (5.22) t2 Due to the analyticity. τ. with e satisfying the secondorder Equation (5. independent of (τ. Hence we ﬁnd that n g{i. it is shown that the error term in the weak convergence criterion (5. we have that all its derivatives are bounded. lim t↓0 (t. f ) → g{i.Yε (τj )} (ε) = ε − ε αk yk (τj )σ k (τj ) · σ i (τj ) ε 1 + αk yk (τj ) k=i+1 n + e(ε).τ. we can thus ﬁnd a bound on the second derivative ∂ 2 g{i. Yε (τj )).f } (t). Theorem 7. tn ] × [0.19) and (5.f } (t) = t − k=i+1 σ k (τ ) · σ i (τ ) α k fk 1 + αk fk + e(t).τ.τ. PROOF: We only need verify the claim with regards to the order of convergence. 2 Corollary 1 (Brownian bridge scheme converges weakly with order one) Under the assumptions of Proposition 5.4 converges weakly to the forward rates process deﬁned in (5.112 with CHAPTER 5.
then (5. πfwd Var πterminal (5.6. however. under the forward measure.5.2 Numerical results We now turn to the second part of Section 5. since a bias that stems from an implementation error is more easily ﬁltered out from the random noise of the MC simulation. which yields the unbiased estimator of the bias πterminal − πanalytical . there is no drift term and therefore the associated payoﬀ is an unbiased estimator of the value of the contract.000 paths. The benchmark against the forward measure payoﬀ is thus also a useful tool when validating an implementation of a LIBOR market model.003%.2.000 paths is 0. They show that the predictorcorrector. This large number of paths was used because the time discretization bias for the log rates is small compared to the standard error often observed with 10.25) (5. has a clear time discretization bias for larger time steps.23) (5. πfwd Var πterminal − πanalytical = Var πterminal If we assume Var πterminal ≈ Var πfwd .02% and 0. If we denote by πterminal and πfwd the numerairedeﬂated contract payoﬀ in the terminal and forward measure. The results are presented in Figure 5. In our numerical LIBOR tests we found ρ ≈ 0.999. whereas twice the standard error at 10.01%. if the correlation term ρ[πterminal . Such procedure ﬁlters out the discretization bias from the random noise in the simulation. respectively. in which the various discretization schemes are compared numerically. we jointly simulate the values of individual payments in the ﬂoating leg and cap under their respective forward measures. The Euler scheme.6. For example. πfwd ] is larger than 1 . A ﬂoating leg and a cap were valued with 10 million simulation paths.23) becomes Var πterminal − πfwd ≈ 2 1 − ρ πterminal .07% and 0. Alternatively. To obtain a biasestimate with minimal standard error. We classify the schemes from best suited to worst suited (for the particular numerical cases under consideration) using the criterion of the minimal computational time required to achieve a bias that is indistinguishable . we have variance reduc2 tion. we can benchmark against the analytical value πanalytical of the ﬂoating leg or cap. Milstein and Brownian bridge schemes have a time discretization bias that is hardly distinguishable from the standard error of the estimate. We note that. then an unbiased estimator of the bias is πterminal − πfwd .24) Therefore.6. THE BROWNIAN BRIDGE SCHEME FOR MULTITIME STEPS 113 5. the Euler onestepperaccrual discretization relative bias for the ﬂoating leg and the cap was estimated at 0. which means that the variance is reduced by a factor of 500. respectively. The variances of the two estimators are Var πterminal − πfwd = Var πterminal + Var πfwd − 2Cov πterminal .
The net present values of the ﬂoating leg and cap are 0. The ﬂoating leg is a sixyear deal. .2 0. predictorcorrector. . with the ﬁxings at 1. which in turn is faster than the Brownian bridge. Brownian bridge.2 0. 5.2: Plots of the estimated biases for a ﬂoating leg and a cap for the Euler. . .3 Tim e step (years) Figure 5. 6 years.25.5 years. .000 paths. We stress here that this classiﬁcation might be particular to the numerical cases that we considered. FAST DRIFTAPPROXIMATED PRICING FOR BGM Floating leg Cap 6E07 7E05 Euler 6E05 5E05 4E05 3E05 2E05 1E05 0E+00 0 1E05 Tim e step (years) 0. second. . 1. . 0.5. . . 1.4 0. 0. Euler. and all volatility is constant at 20%. . and fourth. We also stress that the strength of the Brownian bridge lies in single time steps rather than in multitime steps. The error bars denote a 95% conﬁdence bound based on twice the sample standard error. The market conditions are the same for both deals: all initial forward rates are 6%.5year deal.7 An example: onefactor BGM framework with drift approximations This section illustrates the framework for fast single time step pricing in BGM by setting it up in the special case of a onefactor model with a volatility structure as in Example . . Milstein and Brownian bridge schemes.75. with the ﬁxings at 0.6 0. and payments of annual LIBOR at 2. A singlefactor model was applied.2 1E07 0E+00 0 Euler 5E07 Predictorcorrector Milstein BB Estimated bias Predictorcorrector Milstein 4E07 BB Estimated bias 3E07 2E07 1E07 0. on a notional of one unit of currency. The cap is a 1.24 and 0. .25 years. . we obtain: ﬁrst. respectively.000.1 0.5. . .013.25 0.15 0. Milstein. from the standard error at 10. predictorcorrector.05 0.114 CHAPTER 5. As Milstein is slightly faster than predictorcorrector. 5 years. . third.8 1 1. and payments of quarterly LIBOR above 5% (if at all) at 0.
Forward rate i accrues from year i until year i + 1. and exercise boundary conditions at the exercise times. All computations are displayed in Table 5. the “driftfrozen” forward rates (where the drift is evaluated at time zero) are displayed in column (V). x(0) = −∞ π(t. Column (II) is determined by (5. the process x jumps to 1. To evaluate the eﬀect of the Brownian bridge scheme over the Euler scheme. µ.1 A simple numerical example We will evolve ﬁve annual (αi = 1) forward rates over a oneyear period.2). . In case of ﬁnite diﬀerences. Then. x(t) ∼ N 0. x. 0. EXAMPLE: ONEFACTOR DRIFTAPPROXIMATED BGM 1. In the case of numerical integration. 2κ where v(t) = 0 e2κs ds = t. . κ = 0. Prices may now be computed by either numerical integration or ﬁnite diﬀerences. κ = 0. we have ∞ π 0. x) denotes the numerairedeﬂated value of the contingent claim.7. ˜ 115 for certain constants γi . if π(t.26) with use of appropriate boundary conditions. FeynmanKac yields the following PDE for the price relative to the terminal bond: ∂π 1 2κt ∂ 2 π + e = 0. for a Bermudan payer swaption we have π(·. Suppose that. ∂t 2 ∂x2 (5. . To compute the drift term integral at time 1 for forward rate 4. v) denotes the Gaussian √ density with mean µ and standard deviation v. t e2κt −1 .1. Take fi (0) = 7%. This structure may be written as follows: σi (t) = γi eκt . using the equation (V) = (I) exp ((II) + (III) + (IV)). then v(1) ≈ 1. v(t) dx where t denotes the expiry of the contingent claim and p(·.5.7. γi = 25% and ˜ κ = 15%. Forward rate 5 is easily computed as no drift terms are involved. x)p x. is then deﬁned as and ˜ characterized by t x(t) = 0 eκs dw(s). we start with computing the Brownian bridge scheme forward rate 5 and work back to forward rate 1. thus x(1) = 1. 5. The corresponding Markov factor. 5. −∞) ≡ 0. i = 1. we compute the drift term integral of .166196. v(t) . after one year. . For example. zero convexity ∂ 2 π/∂x2 ≡ 0 at x = ∞.
01227 −0.56% 8.00000 −0.116 CHAPTER 5.60% 8.00562 (VII) Brownian bridge fi (1) 8.53% 8.25 0.00567 −0.03644 −0.25 .25 0.25 0.00% 7.53% (VI) Equation (5.00% 0.03644 0.1: A simple numerical example.47% 5 4 3 2 1 7.67% 8.03644 −0.00% 7.01636 −0.03644 −0.25 0.00% 7.03644 −0.57% 8.9)i −0.67% 8.63% 8.00569 −0.00409 −0. FAST DRIFTAPPROXIMATED PRICING FOR BGM Table 5.00% 7. (I) i fi (0) (II) µi (0) (III) 2 − 1 γi v(1) 2˜ (IV) γi x(1) ˜ (V) Drift frozen fi (1) 8.00818 −0.00564 −0.9)i−1 −(5.62% 8.
2.4. Continuing. And so on until all forward rates have been computed.5. Computational times may be found in Table 5. where it is assumed that b < 1. The onefactor setup introduced in the previous section was used with zero meanreversion. Exercise boundaries for the 8 year deal are displayed in Figure 5. the Europeans through numerical integration. EXAMPLE: BERMUDAN SWAPTION 117 (5. Hence the exercise boundary s∗ may be computed from the regression coeﬃcients by the above formula.8. Callable Bermudan and European payer swaptions were priced in a onefactor BGM model for various tenors and noncall periods. an analysis is made for Bermudan swaptions in comparison with a BGM model combined with the leastsquares Monte Carlo method introduced by Longstaﬀ & Schwartz (2001). The Bermudans were priced on a grid. . s. and the volatility of the forwards was set ﬂat at 15%. The zero rates were taken to be ﬂat at 5%.8 Example: Bermudan swaption As an example of the single time step pricing framework. The eﬀects of these phenomena have been analyzed by comparing the exercise boundaries4 and risk sensitivities of LongstaﬀSchwartz and single time step BGM. Since the framework is not arbitragefree. is larger than a certain value s∗ . The explanatory variable in the leastsquares Monte Carlo was taken to be the net present value (NPV) of the underlying swap. For a full description of the deal see Table 5. In both models the exercise rule turned out to be of the following form: exercise whenever the NPV of the underlying swap. This was regressed on to a constant and a linear term.3. 5. This we may then use to compute the Brownian bridge scheme forward rate 4 (see column (VII)). These two basis functions yield suﬃciently accurate results because the value of a Bermudan swaption increases almost linearly with the value of the underlying swap. The result is displayed in column (VI). Problems may possibly occur for Americanstyle derivatives in the single time step framework. which is then deﬁned to be the exercise boundary. So the option is exercised whenever s > a + bs ⇔ s > a/(1 − b) =: s∗ .3. we compute the drift for forward rate 3 using only the Brownian bridge forward rates 4 and 5. including conﬁdence bounds on the LongstaﬀSchwartz 4 In the LongstaﬀSchwartz case. the future discounted cashﬂows are regressed against the NPV of the underlying swap with a constant and linear term – say. with coeﬃcients a and b. The PDE was solved using an explicit ﬁnitediﬀerence scheme. where we use the equation (VII)i = (I)exp({ n j=i+1 (VI)j } + (III) + (IV)). spurious early or delayed exercise may take place to collect the arbitrage opportunity. Results have been summarized in Table 5.9) for forward rate 5. which turns out to hold in practice.
5 5.06978% (ATM) Semiannual None Half year = 0. FAST DRIFTAPPROXIMATED PRICING FOR BGM Table 5.5 0% 10.2: Speciﬁcation of the Bermudan swaption comparison deal. Callable Bermudan swaption Market data Zero rates Volatility Product speciﬁcation Tenor Noncall period Call dates Pay/receive Fixed leg properties Frequency Date roll Day count Fixed rate Floating leg properties Frequency Date roll Day count Margin Numerics Simulation paths Finitediﬀerence scheme LongstaﬀSchwartz Explanatory variable Basis function type No. of basis functions Flat at 5% Flat at 15% Variable (28 years) Variable Semiannual Pay ﬁxed Semiannual None Half year = 0.118 CHAPTER 5.000 Explicit Swap NPV Monomials 2 (constant and linear) .
78 101.55 1.78 78.79 221.5.08 2.73 52.94 151.33 101.84 266. In case of a European swaption.09 141. The notation xNCy in the ﬁrst column denotes an xyear underlying swap with a noncall period of y years.55 151.13 53.88 52. Bermudan Driftapprox.40 181.19 2.24 0.77 42.61 2.15 97.22 2.48 136.95 86.36 123.27 120.00 190.85 62.88 50.20 101.96 European DriftApprox.60 50.31 80.71 96.15 3.70 Standard error 0.98 140.22 89.50 3.53 1.64 3.25 182. BGM 2NC1 3NC1 4NC1 4NC3 5NC1 5NC3 6NC1 6NC3 6NC5 7NC1 7NC3 7NC5 8NC1 8NC3 8NC5 8NC7 29.51 43.8.87 2.75 179.96 .23 54.63 226.38 185.38 0.43 50.3: Results of the Bermudan swaption comparison deal.40 64.20 LongstaﬀSchwartz 28.06 50.11 100.04 42.43 0.55 99.01 0.53 1.59 266.85 83. All prices and standard errors are in basis points.38 Monte Carlo BGM 26.93 224. it means that the swaption is exercisable after y years exactly.92 2.34 2.20 137.36 53.38 177.36 1.28 159.93 100.57 161.41 0.70 1.83 1.07 142.71 1.93 156.16 134.95 53.65 2. BGM 27.92 78. EXAMPLE: BERMUDAN SWAPTION 119 Table 5.66 44.09 140.86 2.12 Standard error 0.42 0.14 2.68 1.03 0.29 0.83 123.35 226.59 137.83 1.08 122.66 153.
All computational times are in seconds.4 0.2 4. In the single time step framework Bermudans are priced on a grid and Europeans are priced through numerical integration. 1−b a 1−b We thus obtain the empirical variance of s∗ (as well as its standard error).2 0.9 18.4: Computational times for the Bermudan swaption comparison deal for a computer with a 700 MHz processor.6 5.1 0. Denote random errors in a and b by a and b.8 9.2 18.4 9.6 23.2 0. respectively. FAST DRIFTAPPROXIMATED PRICING FOR BGM Table 5.5 7.120 CHAPTER 5.9 3.2 1. a 95% conﬁdence interval is given by plus and minus twice the standard error.4 45.6 0. The notation xNCy in the ﬁrst column denotes an xyear underlying swap with a noncall period of y years.6 0.9 30.8 13.1 Longstaﬀ Schwartz 3.0 6.3 2.4 0.0 0.5 8.5 17.5 14.4 0. Bermudan Driftapproximated BGM 2NC1 3NC1 4NC1 4NC3 5NC1 5NC3 6NC1 6NC3 6NC5 7NC1 7NC3 7NC5 8NC1 8NC3 8NC5 8NC7 0.2 0.8 boundaries.4 7.1 0.6 11.1 0.2 1.2 0.8 13.1 21. a Taylor expansion yields (ignoring secondorder terms) s∗ ≈ a a b 1+ + .8 4.8 0.4 0.3 5.6 0.3 9.2 11. Assuming that s∗ is normally distributed.0 0.7 0.8 16.4.1 6.6 2.7 0.5 We looked at exercise boundaries for other deals as well and these revealed a similar picture. .0 24.0 1. Risk sensitivities for the various deals are displayed in Figure 5.1 3.4 0.2 12. If it is assumed that these errors are relatively small.2 0.6 0.8 33.4 0.1 4.7 6.0 Monte Carlo BGM 1.4 European Driftapproximated BGM 0. 5 The empirical covariance matrix of the regressionestimated coeﬃcients a and b may be used to obtain the empirical variance of s∗ .
6 0.2 0 2NC1 3NC1 4NC1 4NC3 5NC1 5NC3 6NC1 6NC3 6NC5 7NC1 7NC3 7NC5 8NC1 8NC3 8NC5 8NC7 Longstaff Schwartz 200 150 100 50 0 2NC1 3NC1 4NC1 4NC3 5NC1 5NC3 6NC1 6NC3 6NC5 7NC1 7NC3 7NC5 8NC1 8NC3 8NC5 8NC7 Bermudan swaption Bermudan swaption Figure 5.4: Risk sensitivities: deltas and vegas with respect to a parallel shift in the zero rates and caplet volatilities.2 1 0.6 1. .8. The error bars for the LongstaﬀSchwartz prices represent a 95% conﬁdence bound based on twice the empirical standard error.4 1.scaled to shift of 10 bp) 250 Longstaff Schwartz Drift Approx BGM 1.4 0.scaled to shift of 100 bp) 2 Drift Approx BGM Delta (Parallel shift .5.8 0.3: Exercise boundaries for the eightyear deal.8 1. 300 Vega (Parallel shift . respectively. EXAMPLE: BERMUDAN SWAPTION 121 700 Drift approximated exercise boundary Longstaff Schwartz exercise boundary 600 Swap NPV exercise level (bp) 500 400 300 200 100 0 1 2 3 4 5 Exercise point (Y) 6 7 Figure 5.
51 263.49 265. the BGM pricing simulation was rerun for 500.05 DA precomputed exercise boundaries 28. BGM simulation price LS precomputed exercise boundaries 2NC1 3NC1 4NC1 5NC1 6NC1 7NC1 8NC1 28.122 CHAPTER 5.80 99. LongstaﬀSchwartz may possibly yield better exercise boundaries when it is regressed on to more explanatory variables. Also. The results.08 221.27 Standard error 0.30 0. The nonarbitragefree issue is investigated further in the next section.62 62. All prices and standard errors are in basis points.41 222. FAST DRIFTAPPROXIMATED PRICING FOR BGM Table 5.000 paths using the precomputed exercise boundaries.5. therefore only a single standard error is reported. given in Table 5. To determine which approach computed the best exercise boundaries. risk sensitivities for longermaturity deals (seven to eight years) can be outside of the twostandarderror conﬁdence bound. 6 . show that the driftapproximated exercise boundaries are not worse than their LongstaﬀSchwartz counterparts and are even slightly better. Moreover. This does not necessarily mean that the DA framework outperforms LongstaﬀSchwartz because we only regress on the NPV of the underlying swap. The Brownian bridge drift approximation thus becomes worse for longermaturity deals.77 99.24 0. the computational time involved is less by a factor 10. The standard errors for both prices were virtually the same in all cases.51 138.63 62.9. We note that the exercise boundary is calculated slightly diﬀerently by the LongstaﬀSchwartz and driftapproximated (DA) approach. This section ends with the results for a twofactor model.18 0.6 Hence there is no problem with the spurious early exercise opportunities arising from the absence of noarbitrage in the fast single time step framework.38 178. In all cases the price diﬀerence is within twice the standard error of the simulation.5: BGM pricing simulation rerun for 500. as also explained in Section 5.42 The results show that the single time step BGM pricing framework indeed prices the Bermudan swaptions close to LongstaﬀSchwartz.55 179.36 0. including correct estimates of risk sensitivities for shortermaturity deals.58 138.06 0.12 0.000 paths using precomputed exercise boundaries.
08 93.59 283.2 0.68 0.97 242.76 210.5. EXAMPLE: BERMUDAN SWAPTION 123 Table 5.2 dw2 (t). 50. the driftapproximated prices agree more with the benchmark than with prices obtained when LongstaﬀSchwartz regresses on the NPV of a single swap.64 58. which we now take to be dfi (t) (i+1) (i+1) = vi.6: Twofactor model comparison. We therefore take the results with regression on all forward swap rates to be the benchmark.41 212.89 202.26) we used the hopscotch method (see paragraph 48.54 124. Results for the twofactor model are displayed in Table 5.7 0. respectively. For a model with forward expiry structure t1 < · · · < tn .5 5.79 89.3 1.3 0.8.1 Twofactor model We consider a twofactor model with the same setup as above with the exception of the volatility structure. fi (t) Here vi  = 15%.49 292.62 23.89 251.1 dw1 (t) + vi. Fast LongstaﬀSchwartz drift Swap NPV All forward rates Standard approximation only (benchmark) error 2NC1 3NC1 4NC1 5NC1 6NC1 7NC1 8NC1 9NC1 25.5 0. ai = i tn − t1 This instantaneous volatility structure is purely hypothetical.8.42 169.6.88 294. vi = (15%) ai .79 162.00 129.1 1.35 171. Indeed.45 59.27 55.5 of Wilmott (1998)). The computational time for .000 paths were used for the LongstaﬀSchwartz simulation. In a twofactor model (with decorrelation) the exercise decision no longer depends only on the NPV of the underlying swap but also on all forward swap rates.22 94. All prices and standard errors are in basis points.67 132. we take the vi ∈ R2 to be ti − t1 . To solve the twodimensional PDE version of (5.9 1. 1 − a2 .15 252. It has the property that correlation steadily drops between more separated forward rates. “Swap NPV only” and “All forward rates” indicate that LongstaﬀSchwartz regressed on only the NPV of the swap and on all forward swap rates.89 24.
t]. 5. 2 the fast driftapproximated pricing twodimensional grid was. We could jump immediately to year 2 and calculate the forwards there.9 Test of accuracy of drift approximation Besides the approximation of the drift. then almost surely our guess for the most likely value of fi (t) will be diﬀerent. in two years. In general. In a way. The value of any other forward rate fi (t) does not depend solely on the value of x(t) but is dependent on the whole path that x traversed on the interval [0.5: Timing inconsistency in the single time step framework for BGM. Given the value of x(t). FAST DRIFTAPPROXIMATED PRICING FOR BGM x(1) x x(2) 0 1 time Figure 5. We do. . In this way. In the following. The framework for fast single time step pricing simply calculates the most likely value of fi (t) given the value of x(t). say. on average. Consider the following. any lowdimensional approximation of BGM will exhibit this timing inconsistency. a test will be proposed to evaluate the size of the inconsistency error. the framework (Proposition 4) contains a timing inconsistency.124 CHAPTER 5. we cannot determine all timet forward rates. know the value of fn (t) because fn has zero drift under the terminal measure n + 1. We consider computing the value of the forwards at year 2. we could consider ﬁrst calculating the forwards at time 1 (under the assumption that x jumps to some value x(1)) and from this point calculate the forwards at time 2 (assuming that x then jumps to the very same x(2)). but we will nonetheless investigate it.5). only a quarter of the computational time for Monte Carlo. it is not really fair to consider this timing inconsistency. if we start from the state determined by x(1)). Suppose that the underlying Markov process x jumps to x(2). If we start from a diﬀerent initial model state (for example. The inconsistency is best described by an example (see Figure 5. the so computed forwards at time 2 will be diﬀerent. Alternatively. however.
fi (0) was taken to be 5%. . . . 5. Under the notation of the previous section. respectively. See also Figure 5.1 Test of accuracy of drift approximations based on noarbitrage The accuracy test is described by an example. which was varied at 10%. and that the framework is arbitragefree. . The “arbitragefree” forward rates fiAF (s. We consider some time t at which forwards i.27) i+1 αi bAF (s)/bAF (s) αi E(n+1) bDA (t) x(s) = x n+1 i+1 DA (t) b n+1 where each of the abovestated t random variables should be evaluated at (t. 5% and 10%. . κ. x) = −1 = −1 (5. x) obtained by single time stepping. The framework for fast driftapproximated pricing yields timet forward rates as a function of x(t). was varied at 0%. . 29. x(t)). Ten annual forward rates were considered where forward rate i accrued from year i to i + 1. n have not yet expired. x(10) 7 Here the notations “AF” and “DA” indicate “arbitragefree” and “driftapproximated”. Under the assumptions that the model state is determined by the Markov process x. t was taken to be 20 years and tn+1 was taken to be 30 years.5. 10 20 30 5. The second equality follows from bAF /bAF being a martingale by the assumption of no ari n+1 bitrage. TEST OF ACCURACY OF DRIFT APPROXIMATION 125 f20 0 time (Y) s t tn+1 Figure 5.6: Setup for inconsistency test. x) obtained in this way may then be compared with forward rates fiDA (s. s was taken to be 10 years. . for i = 20.6.9. The γi were chosen such that the ˜ volatility of the corresponding caplet was equal to some general volatility level v. 15% and 20%.2 Numerical results for single time step test The inconsistency test was performed under the following setup. and meanreversion. the fundamental arbitragefree pricing formula will yield values of forward rates at time s < t as a function of x(s) given by the following formula:7 DA (n+1) bi (t) x(s) = x 1 bAF (s)/bAF (s) 1 E bDA (t) i n+1 n+1 fiAF (s.9. . . Let sd denote the standard deviation of x(10).9.
79 10.7. ±sd/2.47 3. We note that the error for f29 is always zero as it is fully determined by x.07 (%) 3.27 4.32 1.92 3.81 9. Results for the volatility/meanreversion scenario 15%/10% are given in Table 5.21 5.75 4.126 CHAPTER 5. sd denotes the standard deviation of x(10). for less volatile market scenarios. with errors only up to a few basis points.91 AF DA Table 5.74 4.94 8.86 2.81 4.23 4.56 Predictorcorrector MeanVolatility level (v) reversion 10% 15% 20% 0% 5% 10% 2.81 4.7: Quality of drift approximations: comparison of f20 (10) and f20 (10) under diﬀerent x(10) moves for the volatility/meanreversion scenario 15%/10%.79 5.8: Quality of drift approximations: Maximum of f20 (10) − f20 (10) over x(10) moves 0.11 4. The test was performed for both the Brownian bridge and predictorcorrector schemes.45 53.73 19.11 28. Brownian Bridge x(10) −sd −sd/2 0 +sd/2 +sd AF f20 DA f20 DA f20 Predictorcorrector − (bp) AF f20 x(10) −sd −sd/2 0 +sd/2 +sd (%) 3.60 12. . FAST DRIFTAPPROXIMATED PRICING FOR BGM AF DA Table 5. In Table 5.46 9.69 8. . The results show that the former outperforms the latter in the timing inconsistency test. ±sd/2. Diﬀerences are denoted in basis points.59 moves were considered for 0.27 4.8 the maximum error (over the ﬁve considered x(10) DA AF moves) between f20 (10) and f20 (10) is reported. and therefore its corresponding error is the largest among i = 20.19 4.38 6.03 7. sd denotes the standard deviation of x(10).46 12. ±sd for diﬀerent volatility/meanreversion scenarios.17 7. .29 10.34 8. and ±sd.56 1.77 5.70 5.38 6.05 3. The comparison is only reported for f20 because this forward rate contains the most drift terms. All variables are evaluated at time t=10.91 37.79 5.03 2. The inconsistency test results show that.85 44. Brownian Bridge MeanVolatility level (v) reversion 10% 15% 20% 0% 5% 10% 2. . . 29.03 AF f20 (%) DA f20 (%) DA AF f20 − f20 (bp) 5.28 5.37 0. the single time step framework performs very accurately.38 6.97 2.
5. We have shown that. The proposed drift approximation framework was applied to the pricing of Bermudan swaptions.8). 5. namely.10 Conclusions We have introduced a fast approximate pricing framework as an addition to the predictorcorrector drift approximation developed by Hunter et al. the diﬀerence between the forward index i and n. y(t∗ ) = y ∗ . v(t) := 0 σ(s) 2 ds. such as numerical integration or ﬁnite diﬀerences.) The solution of y (unconditional of timet∗ ) is given by y(t) = y0 ex(t)− 2 v(t) . the single time step approximation will break up. These authors used the drift approximation only to speed up their Monte Carlo by reducing it to single time step simulation. y(t) (Compare with (5. The additional cost is a nonrestrictive assumption. We note that 1 ω ∈ Ω. CONCLUSIONS 127 For more volatile market scenarios the approximation deteriorates. etc.A Appendix: Mean of geometric Brownian bridge In this appendix. for which it yielded very accurate prices with much lower computation times. given by dy(t) = σ(t) · dw(t). the tenor of the deal. we may determine the timet mean of the process y. at a slight cost. Equivalently. Care should be taken in applying the single time step framework for BGM that the market scenario does not violate the realm where the single time step approximation is reasonably valid. But for realistic yield curve and forward volatility scenarios there are no problems with respect to pricing (see Section 5. The worsening of the approximation for more volatile scenarios is what may be expected from the nature of the drift approximations: as the model dimensions increase. y(t∗ ) = y ∗ = ω ∈ Ω. where x(t) := 0 t t 1 σ(s) · dw(s).9).10. (2001). much faster computational methods may be used.9) is determined. By “model dimensions” we mean the volatility level.5. x(t∗ ) = log(y ∗ /y0 ) + v(t∗ ) =: x∗ . separability of the volatility function. y(0) = y0 . or time zero forward rates. 2 . the timet mean of the process fk deﬁned in (5.
v(t) − ∗) v(t v(t∗ ) . In this appendix an error bound for this approximation is derived. The expectation term can always be rewritten as g(µ. Working in the timechanged time coordinates. and it is shown that the approximation is of order two in volatility in the neighbourhood of zero. (5. With this. It is straightforward to verify that the above function g : R2 → R is inﬁnitely diﬀerentiable at every point of the whole real plane. this translates to x(t) x(t∗ ) = x∗ ∼ N v(t) v(t) ∗ x .28) where the following simple rule has been used: E[ez ] = eβ+τ distributed. We note that approximating the above expectation at the mean signiﬁes that the above function is approximated as g(µ. we may evaluate the mean of (y(t)y(t∗ ) = y ∗ ) to be E y(t) y(t ) = y ∗ ∗ = y0 y∗ y0 v(t) v(t∗ ) exp 1 v(t) v(t∗ ) − v(t) 2 v(t∗ ) 2 /2 . we have that x(τ (·)) is a Brownian motion.9) is described. whenever z is normally 5. v(t) > s}.9) In Section 5. x(·)x(τ ∗ ) = x∗ is a standard Brownian bridge.τ − ∗ .6. according to Section 5. σ) = E exp{µ + σz} . x(τ ) x(τ ∗ ) = x∗ ∼ N τ ∗ τ2 x . where the time change τ is deﬁned by τ (t) = inf{s ≥ 0.B Appendix: Approximation of substituting the mean in the expectation of expression (5. FAST DRIFTAPPROXIMATED PRICING FOR BGM According to the martingale time change theorem (for example Theorem 4. 0) = exp{µ} . 1 + exp{µ} .3 a fourstep method for the calculation of expression (5.6 of Karatzas & Shreve (1991)). An approximating fourth step is proposed that evaluates the expectation of the BGM drift inserting the mean. z ∼ N (β. σ) ≈ g(µ.B of Karatzas & Shreve (1991). τ∗ τ 2 Back in the original time coordinates. τ 2 ). 1 + exp{µ + σz} where z is distributed standard normally. and so.128 CHAPTER 5.
% creates zero array with n entries .z) % Calculates forward LIBOR rates in onefactor model with Brownian bridge % drift approximation & single time step. 0) = 0. array with n elements.4. vol[i] = volatility of forward LIBOR i t.f0. it follows from Theorem 7.7 of Apostol (1967) that the constant ¯ c may then be chosen independently of µ for all σ ∈ [0. a positive integer f0. σ ].5. of forward LIBORs. σ ]. The interchange of diﬀerentiation and expectation is a subtle argument that may.vol. ¯ 5. σ) − exp{µ} ≤ cσ 2 . 2 Due to the odd nature of the above integrand at the point σ = 0.1). We carefully veriﬁed that in the above case all the requirements for interchange are satisﬁed. paragraph A. array with n elements. be found in Williams (1991. Gaussian increment ~N(0. APPENDIX: MATLAB CODE FOR BROWNIAN BRIDGE SCHEME 129 Fix µ and calculate the derivative of g with respect to σ. no. day count fractions vol.16. We then ﬁnd ∂g exp{µ + σz} (µ.t.1). given the normal increment z. 1 + exp{µ} Because a bound on the second derivative of σ → g(µ. ∂σ Taylor’s formula then states that there exists c ≥ 0 (possibly depending on µ) such that g(µ. scalar % f is used to store result f=zeros(n. we ﬁnd that ∂g (µ.a. array with n elements. function result = fBB(n. is displayed below. % % % % % % n. time (scalar) z. time zero forward LIBORs a. σ) = E z ∂σ 1 + exp{µ + σz} .C.1). for example. σ) may be found independently of µ on some interval [0.C Appendix: MATLAB code for Brownian bridge scheme MATLAB code illustrating the Brownian bridge scheme and the foursteps calculation method in Section 5.
0..5*vol(n)^2*t+vol(n)*sqrt(t)*z).p1. % This function will be integrated over time. run_drift=run_drift .. . % Loop from penultimate LIBOR down to first LIBOR. % used for efficient calculation of drift for i=n1:1:1 zt=log(f(i+1)/f0(i+1))+0.5*vol(i+1)^2*t. % % % % % % s.0.p1..t.*zt0.) over s from a to b with convergence criteria tol and % trace. scalar. % quad is a standard integration routine in MATLAB.f0. the predictorcorrector scheme is a special case of % the Brownian bridge scheme if the crudest integrator (twopoint % trapezoid) is used.*s. scalar. % Adjusting the convergence criterion of the numerical integrator % allows for a tradeoff between accuracy and computational speed.^2. run_drift=0. % Equation (5./t. FAST DRIFTAPPROXIMATED PRICING FOR BGM % First do ultimate forward LIBOR => martingale! f(n)=f0(n)*exp(0.1. Equation (5.0. volatility of forward LIBOR t.t../t+log(f0)+log(a).b. end result = f. % For example. scalar. help variable associated with LIBOR predicted at time t % Mean of Brownian bridge. time zero forward LIBOR a. scalar.a(i+1). % Of course..tol. quad(@driftBB.f0(i+1).zt).*s.5*vol(i)^2*t)+vol(i)*sqrt(t)*z).zt) % Calculates drift term evaluated at the mean of the Brownian bridge.p2.t.trace.) integrates the function % f(s.vol.3) in exp form f(i)=f0(i)*exp((run_drift*vol(i)0. one can use any integration routine instead of quad.5.vol(i+1)... % quad(@f. % For definitions of tol and trace we refer to MATLAB documentation. scalar.0e6.130 CHAPTER 5. % return result f function result = driftBB(s. scalar.*vol.a.. time (at which forward LIBOR has already been predicted) zt. day count fraction vol.a.p2.26) in logform: m=s. current (intermediate) time f0. % Needed for driftBB.0.
)): result=vol*exp(m).C./(1. APPENDIX: MATLAB CODE FOR BROWNIAN BRIDGE SCHEME % Essential form of BGM drift in terms of log rates:exp(. 131 .5.0+exp(m)).)/(1+exp(.
132 CHAPTER 5. FAST DRIFTAPPROXIMATED PRICING FOR BGM .
Chapter 6 A comparison of single factor Markovfunctional and multi factor market models We compare single factor Markovfunctional and multi factor market models for hedging performance of Bermudan swaptions. we show that the impact of smile can be much larger than the impact of correlation. A cancellable swap can be valued by the following parity relation. Moreover. thereby supporting the claim that Bermudan swaptions can be adequately riskmanaged with single factor models. We show that hedging performance of both models is comparable. Such callable swap options are referred to as Bermudan swaptions. Thus a cancellable swap can be valued when the callable swap can be valued. Institutional debt issuers use interest rate swaps to revert from ﬂoating to ﬁxed interest rate payments. A cancellable interest rate swap is equal to a plainvanilla interest rate swap plus a callable interest rate swap with reversed cash ﬂows. and vice versa. The hedge results show that this new method enables proper functioning of market models as riskmanagement tools. which is a modiﬁcation of the leastsquares Monte Carlo method. We propose a new method for calculating risk sensitivities of callable products in market models.1 Introduction Bermudan swaptions form a popular class of interest rate derivatives. in which periodic ﬁxed payments are exchanged for ﬂoating LIBOR payments. 6. The underlying is a plainvanilla interest rate swap. Bermudan means that the exercise opportunities are at a discrete set of time points. . A European swaption is an option to enter into a swap at only a single exercise date. Often the issuers want to reserve the right to cancel the swap.
Markovfunctional models and market models. Examples of shortrate models include the models of Vasicek (1977). We distinguish three categories: shortrate models. A disadvantage is then that the instantaneous correlation between interest rates can only be 1. that are consistent with the market practice of using the formula of Black (1976) for caps (options on LIBOR) and swaptions (options on swap rates). four. Market models however eﬃciently allow for any number of stochastic variables to be used. which points to possibly better hedge performance. The question addressed in this chapter is whether the increase in hedge performance due to use of a multi factor model is signiﬁcant. Shortrate and Markovfunctional models are usually1 implemented as models with a single stochastic process driving the term structure of interest rates. (1997). so that any instantaneous correlation structure can be captured. thereby preserving wealth in the most stable manner. lead to suboptimal exercise strate1 Two factor short rate models exist too. because of supposedly misspeciﬁed dynamics. we will study the pricing and hedging performance of two popular models for Bermudan swaptions. (1990). A more realistic description of reality may thus be expected from multi factor models. The name ‘market model’ refers to the modelling of market observable variables such as LIBOR rates and swap rates. Ho & Lee (1986) and Hull & White (1990). Dothan (1978). see for example Ritchken & Sankarasubramanian (1995). Shortrate models model the dynamics of the term structure of interest rates by specifying the dynamics of a single rate (the short rate) from which the whole term structure at any point in time can be calculated. We mention four articles that compare single and multi factor models. or even more). Longstaﬀ et al. First. . rather they are models that most reduce variance of proﬁt and loss (P&L). (2000) assumes that the discount factors are a function of some underlying Markov process. we say: Models that are best for managing an interest rate derivatives book are not necessarily models that are most realistic. (1985). COMPARISON OF SINGLE AND MULTI FACTOR MODELS In this chapter. (2001) claim that shortrate models. To those that a priori dismiss the use of single factor models due to their economic irrelevance by failure in capturing the multi factor dynamics of the term structure of interest rates.134 CHAPTER 6. see the review article of Dai & Singleton (2003). Market models were introduced by Brace et al. There is substantial evidence that the term structure of interest rates is driven by multiple factors (three. Miltersen et al. The Markovfunctional model of Hunt et al. Many models have been proposed in the literature for valuation and risk management of Bermudan swaptions. The explicit modelling of market rates allows for natural formulas for interest rate option volatility. (1997) and Jamshidian (1997). Cox et al. Black et al. The model is then fully determined by noarbitrage arguments and by requiring a ﬁt to the initial yield curve and interest rate option volatility. in favour of multi factor models.
i. Second. The two practices of (i) ﬁtting to an appropriate set of swaptions. (2003) and Fan et al.1. They show that if the number of hedge instruments is equal to the number of factors. discount bonds. (2003) are thus consistent with the ﬁndings of Driessen et al. In fact. (2003). and (ii) a large set of discount bonds. The argument of Andersen & Andreasen (2001). There is one drawback of using high factor models however. If. (2003). A European product depends solely on the marginal distributions of the swap rates. For valuation in high factor models. is that their choice of calibration does not correspond to market practice and leads to models that are poorly ﬁtted to market. Valuation by MC is not a problem. Gupta & Ritchken (2003) show. whereas a Bermudan product depends on the joint distribution. This claim is supported by empirical evidence performed with the shortrate models of Black et al. which is lesser tractability than low (one or two) factor models. (1990) and Black & Karasinski (1991). which is the case in practice. The authors then conclude that the costs to Wall Street ﬁrms of following single factor exercise strategies could be several billion dollars. too. we show that the variance of P&L is signiﬁcantly reduced when a vega hedge has been set up additional to a delta hedge. against the claim of Longstaﬀ et al. in favour of single factor models. These authors investigate two types of delta hedge instruments. The results of Fan et al. (i) a number of delta hedge securities. Fourth. we ﬁt the models exactly to a subset of European swaptions particular to a Bermudan swaption rather than attempting to ﬁt to the whole swaption volatility surface.6. are probably more close in spirit to ﬁnancial practice. (2003). (2001). for the case of the number of hedge instruments equal to the number of factors. INTRODUCTION 135 gies. Relative to Driessen et al. Driessen. and (ii) vega hedging. Andersen & Andreasen (2001) claim that the exercise strategy obtained from a properly calibrated single factor model only leads to insigniﬁcant losses when applied in a two factor model. Moreover. that higher factor models perform better than lower factor models in terms of delta hedging of European swaptions and European swaption straddles2 . then multi factor models outperform single factor models. . we must resort to Monte Carlo (MC) simulation. one for each security spanning the yield curve. we make the contribution of also considering vega hedging and Bermudanstyle swaptions rather than only delta hedging and only Europeanstyle swaptions. the large set of hedging instruments is used. then single factor models perform as well as multi factor models in terms of delta hedging of European swaptions. Klaassen & Melenberg (2003) are the ﬁrst to investigate hedge performance. however. Fan. but the estimation 2 A European swaption straddle consists of a position of long a payer swaption and long an otherwise identical receiver swaption.e. as Driessen et al. Third. (2003) and Fan et al. and also ours. equal to the number of factors.
see. it then no longer matters for pricing Bermudan swaptions whether the single factor Markovfunctional model is a realistic or unrealistic model of other parts of reality in the interest rate market. determined by the joint distribution of only the underlying spot coterminal swap rates at the exercise dates. We then equip a full factor swap market model with a parameterized instantaneous correlation matrix. Essentially. The novel formula allows the Markovfunctional model to be calibrated to terminal correlation. the failure of not capturing a realistic instantaneous correlation structure can be remedied. we only need to specify correlation. With the thus ﬁtted Markovfunctional model. Since such correct correlation speciﬁcation more or less determines the price of the Bermudan swaption. we have projected all relevant parts of reality correctly onto the single factor Markovfunctional model. Thus. it can be tweaked such that it is ﬁtted to product speciﬁc terminal correlation. . as follows. In theory the price of a coterminal Bermudan swaption is dependent of and fully determined by the joint distribution of the forward coterminal swap rates at each of the exercise dates. whereby. as can sometimes be the case as shown by Pietersz & Pelsser (2004a) (see Chapter 2). This is not due to the choice of calibration. for Bermudan swaptions and perhaps for other derivatives.136 CHAPTER 6. we subsequently compare hedge performance of Bermudan swaptions with real market data over a 1 year period. There are only n such spot coterminal swap rates. In this chapter. COMPARISON OF SINGLE AND MULTI FACTOR MODELS of sensitivities (Greeks) can be less eﬃcient. We consider two methods to improve the eﬃciency of sensitivity estimates. in a lognormal model. We will call their correlation the terminal correlation. We show that such discontinuity appears in the Longstaﬀ & Schwartz (2001) algorithm for valuation of Bermudanstyle options. The marginal distributions of these swap rates are governed by the associated European swaption volatility quoted in the market. The less eﬃcient estimation of sensitivities occurs if the payoﬀ along the path can change discontinuously as dependent on initial parameters. The comparison of hedge performance of single and multi factor models thus entails a tradeoﬀ between more realistic modelling and tractability. since in this chapter the safe option of timeconstant volatility (but dependent on the forward rates) is used. In eﬀect there are thus n(n + 1)/2 stochastic variables that determine the price. Glasserman (2004.1). and also with swap and LIBOR market models. page 67). although the Markovfunctional model fails to capture instantaneous correlation. up to ﬁrst order approximation. we use the observation that the price of a Bermudan swaption is. Piterbarg (2004. too. For the Markovfunctional model. outside of the volatilities and correlations of the relevant swap rates. in some sense. Section 7. see. for example..g. e. The accuracy of the new formula is tested numerically. calculate the resulting terminal correlation and ﬁt the Markovfunctional model to this terminal correlation. A novel approximating formula is derived for the terminal correlation in the Markovfunctional model.
Smile is the phenomenon that for European options diﬀerent Blackimplied volatility is quoted for diﬀerent strikes of the option. The remainder of the chapter is organized as follows. inverse ﬂoater (max(k − f. The smile Markovfunctional model and smile swap market model are subsequently ﬁtted to USD swaption smile data. since these have evolved from standard Bermudan swaptions. Relative to Bennett & Kennedy (2004) this chapter makes the contribution of also comparing multi factor models with the Markovfunctional model. INTRODUCTION 137 The research in this chapter is not aimed at comparing the model generated Bermudan swaption prices to reallife market quoted prices.2). In any case. First. last paragraph of Section 3. These nonstandard Bermudan swaptions are called callable LIBOR exotics. (2004) (see also Chapter 5) and Hunter et al. The model is then used as an extrapolation tool to determine a Bermudan swaption price consistent with swap and European swaption prices. Examples of such exotic coupon payments are capped ﬂoater (min( f. the hypothetical viewpoint is taken that swaps and European swaptions are liquidly traded in the market. For both the swap market model and the Markovfunctional model we initially use the basic wellknown nonsmile versions. we show how multi factor models can a priori be compared to the Markovfunctional model which is not a straightforward extension from the onedimensional case. We provide details. the results of this chapter are interesting for the study of callable LIBOR exotics. we outline the comparison methodology for the two models. for which the underlying has more exotic coupon payments.6. and Bermudan swaptions are less liquidly traded. Rather. These authors show that the one factor LIBOR Markovfunctional model with mean reversion and the one factor separable LIBOR market model are largely similar in terms of dynamics and pricing. Second. The results of this chapter may apply to many types of callable LIBOR exotics. Moreover. the study in this chapter is relevant for nonstandard Bermudan swaptions. with the fraction for which LIBOR within the accrual period is within a certain range). The LIBOR Markovfunctional model has been compared with the LIBOR market model before by Bennett & Kennedy (2004). but further research will have to provide a deﬁnitive answer. as introduced by Pietersz et al. They also show this for an approximated version of the LIBOR market model by drift approximations. the Markovfunctional model can be ﬁtted to smile. Third. and show that the resulting smileﬁtting procedure is numerically eﬃcient and straightforward to implement. k) for some cap rate k and leverage ).1. (2000. also for the swap market model. as well as the two Greeks calculation methods for market models. we numerically test the accuracy of an approximating formula for the terminal correlation in the Markovfunctional . The LIBOR and swap market models and Markovfunctional model are discussed. and such that the risk sensitivities provide a hedge of the former in terms of the latter securities. (2001). We then compare empirically the impact of smile versus the impact of correlation. 0)) and range accrual ( f . As mentioned in Hunt et al. the data is described. Nonetheless.
for example. . . We contend that the main driver for the price of Bermudan swaptions is the joint distribution of the realizations of the coterminal swap rates {si:n+1 (ti ) . Introduce the variable η ∈ {−1. Denote the notional amount by q and the day count fraction for accrual period [ti . . the ﬁxed maturity version. . Ostrovsky (2002) calls this the diagonal process. . n}. COMPARISON OF SINGLE AND MULTI FACTOR MODELS model. The type of Bermudan swaption that is considered here is the coterminal version. The underlying swap makes a payment πi at time ti+1 depending on the LIBOR rate f (ti ) ﬁxed at time ti for i = 1. Here si:j denotes the forward swap rate for a swap that start at ti and ends at tj+1 . 1} by η = 1 for a pay ﬁxed swap and η = −1 for a receive ﬁxed swap. . sn:n+1 } are called coterminal since they all coend at the same termination date. In contrast. sn:n+1 (tn ). . the holder can nonetheless decide to hold on to the option in view of a more favourable forward swap rate sj:n+1 (ti ). . . The payment πi is then ηαi (f (ti ) − k)q. The economic argument is that prima facta. . . . The holder will thus only exercise the Bermudan at time ti if η(si:n+1 (ti ) − k) > 0. . t2 . i = 1. . Alternatively. . Fifth. . . A coterminal Bermudan swaption is an option to enter into an underlying swap at several exercise opportunities. Second. . . Sixth. Fourth. . n. πn . . where each swap ends at the same contractually determined end date. . . the holder of the option has to choose between receiving the payoﬀs of entering into the swaps starting at t1 . The maturity of the swap entered into thus becomes smaller as the option is exercised later. It follows that the price of a Bermudan swaption is dependent of and fully determined by the joint distribution of the variables {sj:n+1 (ti ) . each swap that can be entered into has the same contractually speciﬁed maturity and the respective end dates then diﬀer.138 CHAPTER 6. s2:n+1 (t2 ). tn and the associated payoﬀs are determined fully by s1:n+1 (t1 ). . .2 Methodology In this section. as opposed to. we set up the framework that enables a comparison between multi factor and single factor models. . Associated with this swap is a tenor structure 0 < t1 < · · · < tn+1 . . . n. j > i. empirical comparison results are presented. i = 1. ti+1 ] by αi . for a ﬁxed maturity Bermudan swaption. . If the holder exercises the option at time ti . . j = i. The forward swap rates {s1:n+1 . we ﬁrst introduce some notation. we conclude. . . But even when the immediate exercise value is positive. . . the impact of smile is investigated. . The holder of the Bermudan swaption has the right to enter into the swap at the dates t1 . 6. . n}. tn . We consider a Bermudan swaption on an underlying swap with n payments and a ﬁxed rate k. . then he or she will receive the payments πi . in the market the holder could have entered into an otherwise equal swap but with ﬁxed rate equal to the swap rate si:n+1 (ti ).
respectively. For example. . This means that the variance of the variables {s1:n+1 (t1 ). Here σ i:n+1 (·) denotes the instantaneous volatility function and w(i:n+1) denotes a Brownian motion under the ith forward swap measure. In the next three sections. n forward LIBORs are modelled as lognormal processes under their respective forward measure. . The terminal correlation itself is determined both by the instantaneous correlation and the term structure of instantaneous volatility. with forward swap rate si:n+1 satisfying. i = 1. . We assume that any valuation model for the Bermudan swaption is calibrated to the socalled diagonal of European swaptions that start at ti and end at tn+1 . by the instantaneous volatility than by the instantaneous correlation. . .4) it is shown that the terminal correlation is inﬂuenced just as much. sn:n+1 (tn )} is already fully determined. The idea of terminal correlation is not new to ﬁnance. weighted by the respective day count fractions. .2. with maturity times corresponding to the payment times of the swap.2) shows that it is the terminal and not the instantaneous correlation that directly aﬀects the price of swaptions. This correlation matrix will be called the terminal correlation. Thus the diagonal process is fully determined (given a normal or lognormal distribution) if we specify the correlation matrix for the variables {si:n+1 (ti ) . . The latter measure is associated with a portfolio of discount bonds.j:n+1 (t)dt. 6. . We show how the terminal correlation can approximately be calculated in the swap market model and the Markovfunctional model. . . we calibrate models to only those sections of the market that are relevant to the product. we discuss the LIBOR and swap market models and the Markovfunctional model. . METHODOLOGY 139 As is common in ﬁnancial practice. with forward LIBOR fi satisfying.1. Here σ i (·) denotes the instantaneous volatility function and w(i+1) denotes a Brownian motion under the ith forward measure. dw(j:n+1) (t) = ρi:n+1. Section 7. si:n+1 (t) dw(i:n+1) (t). In Rebonato (1999c. and even more.2. rather than attempting to ﬁt the models to all available market data. dsi:n+1 (t) = σ i:n+1 (t) · dw(i:n+1) (t). n forward swap rates are modelled as lognormal processes under their respective forward measure. Rebonato (2002. dw(j+1) (t) = ρij (t)dt. Section 11. . The latter measure is associated with a discount . n. dfi (t) = σ i (t) · dw(i+1) (t). Within the LIBOR market model. The value of such a portfolio of discount bonds is named the present value of a basis point (PVBP). For the Markovfunctional model we show how the model can be calibrated to the terminal correlation. fi (t) dw(i+1) (t).6.1 The LIBOR and swap market models Within the swap market model. n}. i = 1.
where n = 10 corresponds to the setting in the forthcoming hedge tests. see.. for a = 0. In fact. . σ i:n+1 (t) = σ i:n+1 and ρi:n+1.1) This parametrization of instantaneous correlation allows for a simple calibration of the Markovfunctional model to the terminal correlation of the swap market model. Wu (2003). for a suitable choice of a. ﬁx β. Grubiˇi´ & Pietersz (2005) (see Chapter 4). for 10 × 10 correlation matrices. via an approximation of swaption volatility in terms of LIBOR volatility. a constant instantaneous volatility assumption leads to eﬃciently estimated risk sensitivities. i.e. First. Second. b) sc (see Chapter 3). Hull & White (2000). In other words. for i < j. Rebonato (2002. Section 9) or Brigo (2002). and ρij (a) ≡ 1. When an arbitrary correlation matrix has been speciﬁed. see also Chapter 2. (6.1) is nonetheless a good choice.g. We parameterize the instantaneous correlation matrix by.2) We numerically ﬁtted the form of (6. impact the results. as shown by Pietersz & Pelsser (2004a). we consider only either rank 1 or fullrank correlation matrices. for example. ρij (a) = (e2ati − 1)/ti for a > 0. for some β ≥ 0. (e2atj − 1)/tj (6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS bond that matures at ti+1 .. but then if a number of factors d < n be required. Within both market models. generally such matrix has full rank n.140 CHAPTER 6.1) to (6. To test the two extreme cases.5). see. or only favourably. ρij (β) = exp − βti − tj  . page 83). 3 For solving such rank reduction problems the reader is referred to Pietersz & Groenen (2004a. the payment time of the ith LIBOR deposit. our choice of parametrization of the correlation matrix is both a constant and timehomogeneous parametrization. as explained by the following two arguments. These choices. the resulting calibration algorithm reduces to a simple bootstrap algorithm for determining the LIBOR volatility levels. The rank of the correlation matrix P = (ρij )n i. will not.2). Equation (4.1) has been chosen such that the resulting terminal correlation of the swap market model exactly matches the terminal correlation of a Markovfunctional model with mean reversion parameter a.1) corresponds to a form that is often quoted in the literature. we set the instantaneous volatility and correlation constant over time.j:n+1 for the swap model. since we will show that. relative to the timehomogeneous case. whereas certain speciﬁc timehomogeneous speciﬁcations may not. (6. By assumption of constant volatility and constant correlation (see below).j=1 determines the number of Brownian motions (number of factors) driving the model. and σ i (t) = σ i and ρij (t) = ρij for the LIBOR model. The correlation structure (6.j:n+1 (t) = ρi:n+1. Rebonato (1998. parametrization (6. allowing respectively correlation constant at 1 or a full ﬁt to any correlation matrix. e. we are led to solve a rank reduction problem3 . The LIBOR market model is calibrated approximately to swaption volatility.
obtaining an average absolute error over the entries in the correlation matrix that is less than 0.2. . The ﬁt error is the average absolute error over the entries. satisfying dx(t) = τ (t)dw(t). for a forward swap rate we have si:n+1 (tj ) = si:n+1 (tj .02 for typical values of β and a. matrix (right axis) 0. see Hunt et al. The relationship between the ﬁtted a as dependent on β is displayed in Figure 6. the ﬁt is of good quality.2 The Markovfunctional model We consider the swap variant of the Markovfunctional model.2). (2000.1. x(tj )).4) for details on this variant.00 Figure 6. As can be seen from the ﬁgure. METHODOLOGY 25% 0. We assume that the driving Markov process of the model is a deterministically timechanged Brownian motion. and then ﬁnd a that solves n n min a≥0 i=1 j=1 ρij (a) − ρij (β) .1: Fitted aparameter of parametrization (6.6. any model variable is a function of an underlying Markov process x. For example. Section 3.03 141 a (left axis) 20% avg.1) (left axis) and ﬁt error (right axis) versus the βparameter of the Rebonato (1998) parametrization (6.2. 6.02 15% 10% 0. abs. error per entry of corr. Within the (swap) Markovfunctional model.01 5% 0% 0% 5% beta 10% 15% 0.
142 CHAPTER 6.4) In fact.1).1) of the instantaneous correlation of the swap market model was constructed such that the (approximate) terminal correlation (6. see Section 4 of Hunt et al.6) The speciﬁcation (6.e. ln sj:n+1 (tj ) By straightforward calculation. COMPARISON OF SINGLE AND MULTI FACTOR MODELS Here τ (·) denotes a deterministic function (that can be chosen piecewise constant) and w denotes a Brownian motion. from (6. with a denoting the mean reversion parameter. Since correlation is unaltered by a linear transformation. x(tj )) = Var(x(ti ))Var(x(tj )) ti 0 tj 0 ≈ ρ x(ti ). and in a later section we investigate the accuracy of the approximating formula. (2000). In this case we have. we have ln si:n+1 (ti . (6.3) τ 2 (t)dt τ 2 (t)dt . x(tj ) = e2ati − 1 . x(tj ) . ρ ln si:n+1 (ti ). i. ρ x(ti ).4. any functional of the Markov process can be linearized by a Taylor expansion and. We now present an approximate formula for the terminal correlation. as will be shown numerically in Section 6. τ (t) = exp(at). By a Taylor expansion. x) ≈ (0) (1) si:n+1 (ti ) + si:n+1 (ti )x.5) To verify that the Markovfunctional model is properly calibrated to terminal correlation. In principle. for i < j. For ease of exposition we will however restrict our attention to the case of mean reversion. (6. ρ x(ti ). ti 0 σi:n+1 (t)σj:n+1 (t)ρij (t)dt 2 σi:n+1 (t)dt tj 0 2 σj:n+1 (t)dt ti 0 = σi:n+1 σj:n+1 ρij ti 2 2 σi:n+1 ti σj:n+1 tj = ρij ti = tj e2ati − 1 . The above theoretical argument is therefore not very strong. x(tj ) = Cov(x(ti ). the Markovfunctional model can thus be approximately ﬁtted to the terminal correlation by minimization of the ﬁtting error given a marketimplied or historically estimated terminal correlation matrix. e2atj − 1 (6. e2atj − 1 (6. would exhibit the same approximate terminal correlation (6. the terminal correlation of the swap rates is thus approximately equal to the terminal correlation of the underlying Markov process. according to the argument above. for i < j. An argument explaining the formula is given. The approximation however turns out to be accurate.5) of the Markovfunctional . in the swap market model this correlation is approximately calculated to be. The parameters for this minimization problem are for example the n parameters governing the piecewise constant function τ (·).4). for i < j.
The ﬁnite diﬀerence estimate of the Greek is then (v(ε) − v)/ε. The mean square error (MSE) of the ﬁnite diﬀerence estimator is dependent on the chosen perturbation size ε. Initial market data is perturbed by amount ε. leading to an optimal (‘large’ and positive) choice of ε that attains least MSE. We note that this correspondence does not necessarily hold for the LIBOR market model. We describe two methods that enhance the eﬃciency of ﬁnite diﬀerence estimates. Method (ii) that we propose. Method (i). 6. Such a discrete choice is inherently discontinuously dependent on initial input. The two methods are discussed below in more detail. These are: (i) Finite diﬀerences with optimal perturbation size. Section 7. the second of which is novel. Therefore. the change in value due to a change . but rather use the very same exercise strategy as in the base valuation case. Any discontinuity in a simulation may cause ﬁnite diﬀerence estimates of sensitivities to be less eﬃcient.6) in the swap market model with parameter a. see Glasserman (2004). then there is a tradeoﬀ between increasing and decreasing ε.1). If the payoﬀ is discontinuous however. The constant exercise boundary method is a heuristic. The bias is likely to be small. In the perturbed model. for the base valuation we record per path when the exercise decision takes place. 0. the model is recalibrated and subsequently priced at v(ε). The discontinuity in the LS algorithm stems from the estimated optimal exercise index chosen from a discrete set of possible exercise opportunities.2. then least MSE is obtained when ε is selected as small as possible (though larger than machine precision). (ii) Constant exercise decision heuristic. we found perturbation sizes of roughly 1 basis point (bp. is named the constant exercise decision method. though we nonetheless employ it in the comparison tests. After some preliminary testing. The bias stems from not taking into account the change in value of the derivative as a result of a change in the (approximate) exercise decision. the value of the derivative in the unperturbed model. We denote by v the base value of the derivative.2. see Glasserman (2004).3 Estimating Greeks for callable products in market models The algorithm of Longstaﬀ & Schwartz (2001) (LS) renders the numeraire relative payoﬀ along a simulated path discontinuously dependent on initial input. we no longer perform LS leastsquares Monte Carlo. because the exercise decision is close to optimal by construction.01%) for delta and 5 bp for vega. since its estimate is stable but biased. the ﬁnite diﬀerences method is best described as the bumpandrevalue approach.6. Here. i.e. see Glasserman (2004. METHODOLOGY 143 model with mean reversion parameter a is equal to the (approximate) terminal correlation (6.. If the numeraire relative payoﬀ along the path is continuously dependent on initial input.
we nevertheless consider it in our tests. we describe the data used in the comparison test. {s1:n+1 (t1 ). which rules out inverting the instantaneous volatility matrix. The pathwise method cannot handle discontinuous payoﬀs. we use an arbitrarily chosen timespan. we have an n × d matrix with n the number of forward rates and d the number of stochastic factors. A statistical description of the swaption volatility data is displayed in Table 6. the associated column with four entries . From the discussion on perturbation sizes for method (i). We use 10−5 bp for both delta and vega. We end this section by a brief discussion of other methods for calculation of Greeks available in the literature. For the market model setting. We note that the constant exercise method renders a revaluation continuously dependent on initial market data. 6. 7Y. Finally. Section 4. .1. The likelihood ratio and Malliavin calculus method both require that the matrix of instantaneous volatility be invertible. For the comparison test. We use the 1 and 12 months deposit rates and the 2Y. First. Any discount factors required at dates not available from the bootstrap are calculated by means of linear interpolation on zero rates. swap rates and atthemoney (ATM) swaption volatility. The discount factors are bootstrapped from market data. Lasry. . which is the case for the Bermudan swaption studied in this chapter. Moreover.3 Data We describe the data used in the empirical comparison and smileimpact tests. Lions & Touzi 1999) and the utility minimization e approach (Avellaneda & Gamba 2001). COMPARISON OF SINGLE AND MULTI FACTOR MODELS in exercise decision is likely to be small.2) have resolved the noninvertibility issue only for a particular case. of USD data of midquotes for deposit rates. provided the underlying swap payoﬀ is continuous. 16 June 2003–2004. sn:n+1 (tn )}. the method is straightforward to implement. and more eﬃcient. Though the method is biased. that does not apply to our case: When the payoﬀ is dependent only on the rates at their ﬁxing times. 10Y and 15Y swap rates.144 CHAPTER 6. since in revaluations linear regressions for the LS algorithm are no longer required. These methods could not straightforwardly be extended to the situation of our investigations. Discussed are the pathwise method (Glasserman & Zhao 1999). For each available tenor and expiry (Exp. All market data was kindly provided by ABN AMRO Bank. rather it is reduction of variance of P&L. the importance is not bias. 5Y. the likelihood ratio method (Glasserman & Zhao 1999). . Glasserman & Zhao (1999. Usually d < n and most often d n. . the Malliavin calculus approach (Fourni´. it then follows that a leastMSE ﬁnite diﬀerence estimate of sensitivities is obtained by employing perturbation sizes that are as small as possible.). 4Y. Lebuchoux. 3Y. the utility minimization approach simply calculates a diﬀerent sort of risk sensitivity and is thus altogether biased. In ﬁnance. s2:n+1 (t2 ).
1 (2.2. 28.0) [19. 30. 24.2.0] 51.5 (3.1] 20.0.5) [16.8 (4.1 (3.0.5 (3.2) [20.7) [24.6] 27. 34. 56.6.4 (3.5 (1.3) [30. Tenor (Years) 1 2 46. 49.4 (1.0] 45.3] 16.9 (2.3. 24.8.6.9) [34.3.1] 14. 27. 26. 15. 27.4] 19.1.9. 57.4] 31.7.1) [19. 38.9] 46.8 (1.4] 29.4.8.3] 37.0. 33.1) [23. 44.8) [17.9) [27. 24. 39.3 (2.1] 20.0.5 (1. 46.6] 25. 31.0 (4. 61.9 (1. 34.5 (2.5 (2.8. 20.1) [16.2) [14.8] 25.9 (5. 40. 20.5 (1. 52. 30.5 (3. 35.6] 33. 39.7) [18.5) [16.0.7 (4. 47.3] 28.6) [17.7. 50.6 (2.9.7] 36.0 (4.5) [15.7.7.2) [23. 55. 17.0] 17.5.6.0.1) [26.7 (3.1] 44.4] 27.0] 27. 20.9] 17.9. 42.4 (2.8] 30. 22. 32.9.5 (4.1] 39.8) [18.2.7] 145 Exp. 37.2. 1M 3 45.0 (2.0] 7 32.3 (6.7.1] 40.0] 23.6) [27.0) [12.5) [16.8] 22.7 (2. 28. 57.9 (1.1.4.5) [34.6] 16.1. 50.5] 49.2.3 (2.6) [17.6.8 (2.8] 2M 3M 6M 1Y 2Y 3Y 4Y 5Y .6 (1.0.9 (3.5] 36.0] 28.3 (1.5) [30.1] 31.5] 25.4 (1.4.1) [29.0 (2.5] 24.6) [16.7] 18.8] 18.3] 10 27.2. 24.2] 15 22. 34.8) [36.2) [20.4 (1.7] 21.0.8) [13.4 (1.2) [34.6] 22.1. 25. 68.0] 22. 65.7 (2. 54.6.6 (5.3. 19. 25. 29.9) [27.5.4) [25. 22.7] 30 19. 27. DATA Table 6.0] 22.9.8] 5 37. 28. 20.9. 37.2) [23.7. 44.9 (2.4] 18.7] 48. 43.1.4) [25.8] 45.8) [23.5) [19.2.6] 23.0 (2.7. 22.3. 26. 18.8. 24. 53.3) [26.5] 33. 19.0.9] 22.9. 37.0 (1. 60.6) [33.3) [14.6 (1.7.0) [32.2] 18.7) [29.6 (5.0) [11.3 (3.3] 25.8) [18.1.0 (5.2] 21.9 (3.9] 24.9) [19.7 (6.0 (4.4) [34.7) [21.5) [21.3 (1.1) [19.6.7) [13.9 (3.6] 19.6.2) [13.7.3) [27.3.6.1) [20.1) [13.2 (3.9.8.6 (1.7] 37.2) [19.2) [14. 31.2] 30.8.7.9 (1.4 (4.4) [20.1.9.1) [23.3) [14.0] 44.5) [19.4 (1.0 (2.5] 21.9 (3.6.6) [21.1: Statistical description of the swaption volatility data.6.4 (1.3 (4. 48.8] 41.7 (1.0.3] 46.7] 4 40.4] 19.9 (6.6) [17. 23.8 (4.1] 41. 62.2 (4. 55.7 (1.0) [16. 36.0) [11. 25. 48.7.9) [36.8.3) [34.4.4 (1.6) [22. 16.0) [27.0 (2.3] 50.7) [26.8.5 (3.2] 36.9.5] 31. 63. 29.3 (1.3.3] 27. 30. 27.8 (1. 32.5) [16.3] 34.1. 30.9 (3.3) [15.8) [19.1.2) [13.8) [22.8 (1.2 (1.1 (2.7) [17.1.5 (1.5] 15.1) [12.3) [15.0.8] 22.2) [30.0 (4.8.6) [26.0] 15.4 (2.3 (2.4] 26.9] 13.9) [19.6 (1.7] 18. 47. 56.9] 22.2.7.6] 26.0 (3.8) [22.2] 20.0.
46 25.36 0 33.95 25. . COMPARISON OF SINGLE AND MULTI FACTOR MODELS Table 6. Strike.78 43.7) can be calculated on a lattice.’ denotes Expiry.7) The above equality follows from the F(ti )measurability of ln si:n+1 (ti ).41 38. The discount factors are displayed in Table 6.62 35. the standard deviation (in parentheses).96223 0.30 24.12 32.43 reports.92 100 37.32 27. respectively.99 28.146 CHAPTER 6. Here ‘Exp. We estimate (6. We have.3.19 30.84286 0.72 22.59 25. for any measure.23 24.08 50 32.01 27.41 30. All displayed swaptions coterminate 6 years from today.31 24. Second.72 38.96 26.82 28.1 are calculated by means of linear surface interpolation. The swaption volatility against strike and expiry is displayed in Table 6. Any volatility required at expiries and tenors not available from Table 6. in oﬀset in basis points from the ATM forward swap rate Exp.34 32.46 26. form) and the maximum (in ·] form). subsequently we integrate the result multiplied by ln si:n+1 (ti ) to obtain the required expectation. 1Y 2Y 3Y 4Y 5Y 300 58. for i < j.17 200 45. the mean.79986 Table 6.55 28. we describe the data used in the smileimpact test. We use USD data for 21 February 2003.3: Swaption volatility.15 29.57 30.66 50 35.10 25.7) by calculating for each grid point at time ti the conditional expectation E[ln sj:n+1 (tj ) F(ti )]. (6.59 26.65 37. against strike and expiry for the USD data of 21 February 2003.12 24. in percentages. in which we will consider a 6 year deal.98585 2Y 3Y 4Y 5Y 6Y 0.13 26.2: Discount factors for the USD data of 21 February 2003. 1Y 0.75 23.63 300 31. Expression (6. the minimum (in [·.52 22.88571 0.21 27.31 23.20 200 31.63 100 31.03 23. 6.4 Accuracy of the terminal correlation formula The terminal correlation in the Markovfunctional model is estimated via the terminal covariance.65 40.75 23.2. E ln si:n+1 (ti ) ln sj:n+1 (tj ) = E ln si:n+1 (ti )E ln sj:n+1 (tj ) F(ti ) .92697 0.
0 × 10−6 9. As basis functions we use a constant and one linear term .e.4. for a 40 years annualpaying deal.. over the entries in the matrix. for which the swaption volatility level is on average 14%. Avg.4. we report the results of our empirical comparison. M.6 × 10 5. for relative. err. err. thus for a 40 × 40 correlation matrix. especially considered over a 40 years horizon.3).1 × 10−5 1.5 × 10−5 1.8 × 10−7 4. The deal description is given in Table 6.0190% 0.0072% 0.3). for error. with all forward rates as explanatory variables.0030% 0.r.6.0032% 0. for absolute. with EUR market data of 8 February 1999. for mean reversion. i. abs.0012% 0. 10%. for a = 0% a single factor model must be used).r.3). For market models we use the terminal measure. for average. In Table 6. err.4: Error analysis of the terminal correlation measured in the Markovfunctional model versus given by the approximate formula (6. max.0003% The accuracy of the approximate formula (6. we use the leastsquares Monte Carlo algorithm of Longstaﬀ & Schwartz (2001). The terminal correlation matrix within the Markovfunctional model is calculated numerically on a lattice under the terminal measure and subsequently compared to the correlation matrix given by the approximate formula (6.0018% 0. for maximum. all available LIBOR rates for the LIBOR market model and all available swap rates for the swap market model. these errors are quite small. err. 0% 5% 10% 15% 20% 1. To determine the exercise boundary in market models. bar when a = 0%.0011% Avg. the maximum absolute and relative errors. 4. the numerical error inherent in the lattice calculation. various descriptive data for the comparison test are displayed.000 antithetic) and 10 stochastic factors (a full factor model).0006% 0.7 × 10−6 −4 Max. The reason for using all available rates as explanatory variables is that the multi factor nature of the market models needs be retained (if at all present. 5%. and 20%. The test is performed at various mean reversion levels.5. abs.0076% 0.5 Empirical comparison results In this section. abs. and avg.5. We note that the comparison contains two sources of error: First. Abbreviations used are m.000 simulation paths (5. we use a single factor model.3) is tested for a 40 years deal.0 × 10−5 5. EMPIRICAL COMPARISON RESULTS 147 Table 6. As can be seen from Table 6. 0%.000 plus 5. and the average absolute and relative errors. second. rel.0 × 10−5 2. rel. 15%. 0.2 × 10−5 3. 10. 6.0 × 10−7 0. the approximation (6. rel. err. Max. and. Reported are.
5% and 10%. The reasons that we employ bucket hedging rather than factor hedging are twofold. First.148 CHAPTER 6. Driessen et al. one discount bond for each tenor time associated with the deal. where m denotes the number of explanatory variables.2% Per Annum ACT/365 Modiﬁed Following At Fixing Dates per explanatory variable. With bucket hedging.5: The Bermudan swaption deal used in the comparison. A price comparison is displayed in Figure 6. As can be seen from the ﬁgure. for the mean reversion levels 0%. {1. and prices comove and stay together over time. The NPVs. (ii) Delta and vega hedging. The delta hedge is set up in terms of discount bonds. With respect to hedging.5. deltas and vegas of the deal are calculated at each trade date from 16 June 2003 till 15 June 2004. there are 11 such . the Markovfunctional and market models are similar in terms of NPV. inclusive. more importantly. . the number of hedge instruments equals the number of factors in the model. COMPARISON OF SINGLE AND MULTI FACTOR MODELS Table 6. . bucket hedging corresponds to ﬁnancial practice. Section VII.2. and then by revaluation of the derivative in a model recalibrated to the perturbed market data. Risk sensitivities are calculated by perturbing only the model intrinsic factors.C) show that bucket hedging outperforms factor hedging for caps and European swaptions (for delta hedging). x1 . Trade: Trade Type: Notional: Start Date: End Date: Fixed Rate: Index Coupon: Index Basis: Roll Type: Callable: Bermudan Swaption Receive Fixed USD 100m 16Jun2004 16Jun2014 3. the number of hedge instruments equals the number of market traded instruments to which the model has been calibrated to. Second. In the case of the deal of Table 6. xm }. With factor hedging. Two types of hedges are considered: (i) Delta hedging only. compared in terms of hedge performance. The models are. (2003. . . Risk sensitivities are calculated by perturbing the value of a market traded asset. we use socalled bucket hedging rather than factor hedging.
is calculated. of the Bermudan and European swaptions. discount bonds. we calculate the amount of each of the European swaptions needed to have zero portfolio vega for all underlying volatilities. see Joshi (2003b) for LIBOR models.6.000. around 42 seconds for market models with constant exercise decision method.000 3.000 Bermudan swaption value (USD) 2. Section 5) (see also Section 7. with perturbation sizes 10−5 bp for both delta and vega (referred to as ‘small’ perturbation sizes).500. Third. Second. and. we calculate the vegas of the 10 underlying European swaptions. (i) ﬁnite diﬀerences with perturbation sizes 1 bp for delta and 5 bp for vega (referred to as ‘large’ perturbation sizes). Needless to say.5. we used these fast algorithms. Fourth.000 500.500. as detailed in Section 6.2.000 2.000 1. The risk sensitivities are calculated in two ways.5 in this thesis) for swap models. 4 Feb04 Jan04 Jun04 Oct03 Jul03 Oct03 Apr04 Apr04 .000 1. EMPIRICAL COMPARISON RESULTS 3. the aggregate delta position. the 11 deltas and the 10 vegas.3. To set up a joint delta and vega hedge. and Pietersz & van Regenmortel (2005. for various models and correlation speciﬁcations.000 0 May04 May04 149 MF MR=0% MF MR=5% MF MR=10% SMM MR=0% SMM MR=5% SMM MR=10% LMM MR=0% LMM MR=5% LMM MR=10% May04 Jun03 Jun03 Jul03 Aug03 Aug03 Sep03 Sep03 Dec03 Dec03 Dec03 Feb04 Mar04 Mar04 Jan04 Nov03 Nov03 Trade day Figure 6. versus 3 seconds for the Markovfunctional model. We note here that the computational time of calculating the NPV. (ii) constant exercise decision method. This diﬀerence of compuThere are fast algorithms for implementation of market models with Monte Carlo. at any particular trade date.000. we proceed in the following four steps.2: Bermudan swaption values per trade date.000.500. First. is around 92 seconds for market models4 with ordinary LS. discount bonds are acquired to obtain zero delta exposure for all 11 delta buckets.
we use the constant exercise decision method.150 400.000 0 100. Boxwhisker plots provide a convenient representation of a distribution. Delta hedging signiﬁcantly decreases variance of P&L. Here. Of course. Boxwhisker plots for the change in value (in USD) of the hedged portfolio.000 200.000 CHAPTER 6.5.000 300. for various models and mean reversion or correlation parametrization parameters.000 500. and maximum values. Boxwhisker plots. For market models. MF. by displaying ﬁve of its key characteristics: the minimum. tational time is inherent to the (least squares) Monte Carlo implementation of market models versus the lattice implementation of Markovfunctional models. are displayed in Figure 6.000 100.3.000 200. The percentages denote the mean reversion level (MF) or correlation parametrization parameter (LMM and SMM). LIBOR market model and swap market model. We draw the following conclusions from the boxwhisker plots in Figure 6. Markovfunctional model. and SMM denote respectively. COMPARISON OF SINGLE AND MULTI FACTOR MODELS 1st qrt min median max 3rd qrt MF 0% MF 0% (unhedged) (delta hedged) MF 0% SMM 5% SMM 5% SMM 5% LMM 10% LMM 10% LMM 10% (delta & (delta & (unhedged) (delta (delta & (unhedged) (delta vega hedged) hedged) vega vega hedged) hedged) hedged) Figure 6. median.000 300.000 400. . such lattice implementation is allowed only because of the mild pathdependency of Bermudan swaptions. The hedge portfolios are set up at each trade day and the change in portfolio value on the next trade day is recorded. LMM. for the change in hedge portfolio value. with ‘small’ perturbation sizes. The hedge test results are ordered in three subsections. 6.3: Comparison of delta versus delta and vega hedging.1 Delta hedging versus delta and vega hedging The performance of delta hedging versus delta and vega hedging is compared. and the ﬁrst and third quartiles.3: 1.
sizes pert. 2. are displayed in Figure 6..6. We draw the following conclusions from the boxwhisker plots in Figure 6. 'small' ex.5.4.000 200. respectively.’ and ‘pert. sizes ex.000 MF (delta&vega hedged) LMM LMM SMM SMM (delta&vega (delta&vega (delta&vega (delta&vega hedged) 'Large' hedged) Const. It is clear that a joint delta and vega hedge by far outperforms a delta hedge. 1. The analogous boxwhisker plots for mean reversion or correlation parameters 5% and 10% are similar.’ denote ‘constant exercise decision method’ and ‘perturbation’. further study of delta hedges without a vega hedge.4. ‘const. Mean reversion or correlation parameter of 0%. sizes 151 1st qrt min median max 3rd qrt Figure 6. ..000 0 50.000 100. Boxwhisker plots for the change in value (in USD) of the hedged portfolio.000 100. pert. The estimation of sensitivities by ﬁnite diﬀerences over MC with ‘large’ perturbation sizes adversely aﬀects the variance of P&L for hedging in market models. with a mean reversion of 0% or a correlation parameter of 0%. in the remainder of the chapter.4: ‘Large’ perturbation sizes versus constant exercise decision method with ‘small’ perturbation sizes.2 ‘Large’ perturbation sizes versus constant exercise decision method with ‘small’ perturbation sizes The performance of joint deltavega hedging is compared as dependent on the method used to calculate risk sensitivities. 6. Therefore we omit.5.000 50.000 200. sizes pert.000 150. Here. Vega hedging additional to delta hedging signiﬁcantly further decreases variance of P&L. EMPIRICAL COMPARISON RESULTS 250. ex. Boxwhisker plots for the change in value of the deltavega hedged portfolios.000 150. hedged) 'Large' hedged) Const. 'small' pert.
We draw the following conclusions from the boxwhisker plots in Figure 6. ‘larger’ perturbation sizes cause more variance in the ﬁnite diﬀerence estimate of a sensitivity.152 CHAPTER 6. The use of the constant exercise decision method enables proper functioning of market models as risk management tools. In a displaced . we use the constant exercise decision method with ‘small’ perturbation sizes. further study of ordinary LS with ‘large’ perturbation sizes. Boxwhisker plots for the change in value of the deltavega hedged portfolios are displayed in Figure 6. for which we approximately obtain similar results as with the Markovfunctional model. Second. These two effects lead to more Monte Carlo caused randomness in the contents of the hedge portfolio. as can be seen in Figure 6.3 Deltavega hedge results The performance of joint deltavega hedging is compared across models and mean reversion or correlation speciﬁcations. is the constant exercise decision method. which renders ﬁnite difference estimates of sensitivities to be less eﬃcient. we provide details on how the Markovfunctional and swap market models can be ﬁtted to smile and investigate the impact of smile relative to the impact of correlation cq. which ultimately leads to increased variance of P&L. 3. The theoretical explanation of this outperformance is related to two issues. The best performing Greek calculation method. 6.5. We omit. The hedge performance for all three models is very similar.4. 1. It is clear that the constant exercise decision method with ‘small’ perturbation sizes by far outperforms ordinary LS with ‘large’ perturbation sizes.6 The impact of smile In this section. First. in the remainder of the chapter. For the market models. mean reversion on the prices of Bermudan swaptions. for deltavega hedging. the displaced diﬀusion smile dynamics of Rubinstein (1983) are considered.5. The impact of mean reversion or correlation parameter speciﬁcation on hedge performance is not very large. COMPARISON OF SINGLE AND MULTI FACTOR MODELS 2. the classical LS algorithm causes a discontinuity in the numeraire relative payoﬀ along the path. since the correlation between the payoﬀ in the original and perturbed models becomes smaller. As a concrete example.5. 6. for callable products on underlying assets that are continuously dependent on initial market data. 2.
Here αi denotes the day count fraction for period [ti . with ‘small’ perturbation sizes.000 20. the forward swap rate is modelled as si:n+1 (t) = si:n+1 (t) − ri .000 10. (6.6.5: Deltavega hedge results. we use the constant exercise decision method. The percentages denote the mean reversion level (MF) or correlation parametrization parameter (LMM and SMM).000 0 10. ˜ d˜i:n+1 (t) s = σi:n+1 dw(i:n+1) (t).000 40.000 20. pi:n+1 = k=i αi bi+1 (t). si:n+1 (t) ˜ (6.10) where φ(·) denotes the cumulative normal distribution function and where pi:n+1 denotes n the present value of a basis point. Boxwhisker plots for the change in value (in USD) of the hedged portfolio. The Markovfunctional model is ﬁtted to volatility by ﬁtting the digital swaptions.9) The displaced diﬀusion extension is ﬁrst discussed for the Markovfunctional model and second for the swap market model. The value v (i) of the digital swaption on swap rate si:n+1 (ti ) with strike k is given by the familiar formula in the Black world v (i) = pi:n+1 (0)φ d2 (i) .000 1st qrt MF 0% (delta & min vega median hedged) max 3rd qrt 153 LMM MF 5% MF 10% SMM 0% SMM 5% SMM LMM 0% LMM 5% (delta & (delta & 10% 10% (delta & (delta & (delta & (delta & (delta & vega vega vega vega vega vega (delta & vega hedged) hedged) hedged) hedged) hedged) hedged) vega hedged) hedged) Figure 6. For market models. diﬀusion setting.6. THE IMPACT OF SMILE 30.000 30.8) is si:n+1 (t) = −ri + si:n+1 (0) + ri exp 1 2 σi:n+1 w(i:n+1) (t) − σi:n+1 t 2 . ti+1 ] and bi (t) denotes the timet value of a discount bond . d2 = (i) 2 log k/si:n+1 (0) − 1 σi:n+1 ti √ 2 σi:n+1 ti (6. The solution to stochastic diﬀerential equation (SDE) (6.8) with ri the displacement parameter and w(i:n+1) a Brownian motion under the forward swap measure associated with si:n+1 .
In a displaced diﬀusion setting. (6. Next. . First. ·). .7.6. using the equation bn+1 (tn ) = 1 . 1 + αn sn:n+1 (tn ) (6.9) into (6.13) this is exactly the penultimate equation on page 399 of Hunt et al. x(ti )) = −ri + si:n+1 (0) + ri exp √ 1 2 j (i) (x(ti )) − σi:n+1 ti − σi:n+1 ti φ−1 2 pi:n+1 (0) . d2 = ˜ log si:n+1 (0)+ri k+ri σi:n+1 ti √ 2 − 1 σi:n+1 ti 2 . we invert (6. COMPARISON OF SINGLE AND MULTI FACTOR MODELS for payment of one unit of currency at time ti . with j (i) (x) denoting the value of a digital swaption with strike x in the model. the displaced diﬀusion swap market model is made reference to.10) and obtain si:n+1 (ti . (2000).154 CHAPTER 6. Equation (6). i = 1. . page 320).13) becomes ˜n+1 (tn . As can be seen from the table. x(ti )) = si:n+1 (0) exp √ 1 2 j (i) (x(ti )) − σi:n+1 ti − σi:n+1 ti φ−1 2 pi:n+1 (0) . The dynamics of the forward swap rates under the terminal measure in general smile models can be found in Jamshidian (1997.11) The implementation of a nonsmile Markovfunctional model has to be changed only in two places to incorporate displaced diﬀusion smile dynamics. The ﬁtted volatility and ﬁt errors are displayed in Table 6. .12) and then (6. x(tn )) = b 1 2 1 + αn − ri + sn:n+1 (0) + ri exp − 1 σn:n+1 tn + 2 σn:n+1 x(tn ) e2atn −1 . the functional form of the terminal discount bond bn+1 at time tn is determined.12) In a nonsmile Markovfunctional model. n−1 are determined. . by inverting the value of the digital swaption against strike. x(tn )) = 1 2 1 + αn sn:n+1 (0) exp − 1 σn:n+1 tn + 2 σn:n+1 x(tn ) e2atn −1 . Second. we substitute (6. In the displaced diﬀusion world. the value v (i) of the digital swaption is given by a displaced forward swap rate and strike ˜ ˜(i) ˜(i) v (i) = pi:n+1 (0)φ d2 . We ﬁt the displaced diﬀusion model to the market data of Table 6. .3 and ﬁnd the volatility parameters σi:n+1 and displacement parameters ri as listed in Table 6. In a displaced diﬀusion setting. we invert (6. In a nonsmile Markovfunctional model. calculated by induction from i = n − 1. (6. we then have bn+1 (tn . . 1. the functional forms of the swap rates si:n+1 (ti .11) to obtain si:n+1 (ti . . .
in oﬀset in basis points from the ATM forward swap rate Exp.23 30.87 0.58 4.63 0 33.6: Displaced diﬀusion parameters ﬁtted to the USD market data of 21 February 2003 of Table 6.32 32.35 200 32.54 27. All displayed swaptions coterminate 6 years from today.54 4.45 50 1.37 22.96 9. THE IMPACT OF SMILE 155 Table 6.98 0.’ denotes Expiry.66 28.83 26.82 34.23 24.67 0.71% 2 3 4 5 2Y 3Y 4Y 5Y 4Y 3Y 2Y 1Y 21.33% 2.14 50 32.07 28.58 27.06 50 0.31 8.3.60 0.99 1.98 25.46 23.50 1. in oﬀset in basis points from the ATM forward swap rate Exp.01 0.41 0.02 0. 1Y 2Y 3Y 4Y 5Y 300 37.34 29.45 0.21 50 33.74 100 33.21 0.10 300 0.49 8.82 23.72 100 32.92 25.09 100 0.03 27.46 2.90 0.6.45 25.29% 0.34 0.57 24.74 26.89 28. i Expiry Tenor σi:n+1 ri 1 1Y 5Y 28.30 27.76% 18.6.28 1.55% 2.23 Absolute ﬁt errors.33 0.39% Table 6.06 0.11 26. in percentages.7: Fitted swaption volatility and ﬁt errors with the displaced diﬀusion model.15 0.18 100 3.21 24.73 300 32.33 8.85 22.89% 3.16 23.11 31.44 0. Fitted swaption volatility Strike.00 200 10. 1Y 2Y 3Y 4Y 5Y 300 20.71 1.28% 16.00 0.57 29.51 25.43 26.88 30.63 23.14 0.14 200 0. model volatility minus market volatility Strike.29 7.02 0.48 29.13 0.02 26.82 0.73 0 0.15 0.00 0.62% 1.05 5.15 29.67 1. Here ‘Exp.08% 14.20 .72 24.17 200 35.29 25. against strike and expiry for the USD data of 21 February 2003.
Table 11). European swaptions are valued in (i) a constant volatility model with the volatility associated with the expiry and strike of the swaption and (ii) the smile model. and 1% to 6%. The benchmark is of high quality. for perstrike volatilities. for which the model underﬁts the market up to 21%. Appendix) (see also Appendix 2. regardless of the strike of the Bermudan swaption. We note here that the disability of obtaining a perfect ﬁt to the smile volatility data is due solely to the displaced diffusion model. the impact of smile can thus be even higher. and in their displaced diﬀusion counterparts. Bermudan swaptions are priced with varying strikes and otherwise speciﬁed in Table 6. Subsequently. though there are some slight diﬀerences due to numerical errors in the grid calculation. Holmes. since a 10% change in mean reversion can cause a change in value equal to a parallel volatility shift of 1%. The impact of correlation is comparable to that reported by Choy. with the relativeentropy minimization framework of Avellaneda. An exact ﬁt to the swaption smile surface