You are on page 1of 32

mathematics

Article
Between Scylla and Charybdis: The Bermudan Swaptions
Pricing Odyssey
Dariusz Gatarek 1 *,† and Juliusz Jabłecki 2,†

1 Polish Academy of Sciences, Systems Research Institute, 01-447 Warszawa, Poland


2 Quantitative Finance Division, Faculty of Economic Sciences, University of Warsaw and National Bank of
Poland, 00-919 Warszawa, Poland; juliusz.jablecki@nbp.pl
* Correspondence: gatarek@ibspan.waw.pl
† The authors would like to thank two anonymous referees for insightful comments.

Abstract: Bermudan swaptions are options on interest rate swaps which can be exercised on one
or more dates before the final maturity of the swap. Because the exercise boundary between the
continuation area and stopping area is inherently complex and multi-dimensional for interest rate
products, there is an inherent “tug of war” between the pursuit of calibration and pricing precision,
tractability, and implementation efficiency. After reviewing the main ideas and implementation
techniques underlying both single- and multi-factor models, we offer our own approach based on
dimension reduction via Markovian projection. Specifically, on the theoretical side, we provide a
reinterpretation and extension of the classic result due to Gyöngy which covers non-probabilistic,
discounted, distributions relevant in option pricing. Thus, we show that for purposes of swaption
pricing, a potentially complex and multidimensional process for the underlying swap rate can be
collapsed to a one-dimensional one. The empirical contribution of the paper consists in demonstrating
that even though we only match the marginal distributions of the two processes, Bermudan swaptions
prices calculated using such an approach appear well-behaved and closely aligned to counterparts
from more sophisticated models.


 Keywords: Bermudan swaptions; term structure models; interest rate modeling

Citation: Gatarek, D.; Jabłecki, J.


MSC: 91B24; 91B28
Between Scylla and Charybdis:
The Bermudan Swaptions Pricing
JEL Classification: G12; G13
Odyssey. Mathematics 2021, 9, 112.
https://doi.org/10.3390/math9020112

Received: 5 December 2020


Accepted: 29 December 2020
1. Outline
Published: 6 January 2021 Interest rate contracts constitute the largest chunk of the global over-the-counter
derivatives market, with an estimated total notional outstanding of just over $480 trillion,
Publisher’s Note: MDPI stays neu- of which roughly $350 trillion is traded in swaps and almost $50 trillion is traded in interest
tral with regard to jurisdictional clai- rate options. The total market value of interest rate swaps and options tallied together
ms in published maps and institutio- exceeds $8 trillion—or about 10% of world GDP.
nal affiliations. Anticipating a more formal discussion, an interest rate swap is simply a contract in
which two parties exchange two flows of payments (“legs”)—one calculated on the basis of
a fixed, pre-determined rate (coupon), and one calculated on the basis of a floating interest
Copyright: © 2021 by the authors. Li-
rate (typically the interbank market benchmark, often referred to as- London Interbank
censee MDPI, Basel, Switzerland.
Offered Rate (LIBOR). Swap options, or swaptions, give the owner the right—but not
This article is an open access article
the obligation—to enter into an interest rate swap with a given final maturity and a pre-
distributed under the terms and con- determined strike rate at a certain, specific time in the future called option expiration
ditions of the Creative Commons At- and corresponding to the fixing date of the swap. Swaptions come in many shapes and
tribution (CC BY) license (https:// forms, some more complex (“exotic”) than others. A prototypical example of an exotic
creativecommons.org/licenses/by/ interest rate derivative is a Bermudan swaption. Somewhat, the name has nothing to do
4.0/). with geography, but rather with the option exercise profile, as Bermudans, unlike their

Mathematics 2021, 9, 112. https://doi.org/10.3390/math9020112 https://www.mdpi.com/journal/mathematics


Mathematics 2021, 9, 112 2 of 32

simpler—so-called European—counterparts, can be exercised not just on a single date,


but on any one of a given set of dates prior to the final maturity of the swap (an option that
can be exercised at any time is called American).
While we do not know exactly how much of the total market value of interest rate
derivatives is made up by Bermudan swaptions, such a breakdown would not do full
justice to their importance anyway, as the numbers cited do not account for options and
option-like features of Bermudan nature built into simpler fixed income instruments,
like callable bonds and mortgage backed securities. Consider that roughly 60% of bonds
included in the Bloomberg Barclays Global Aggregate Credit Index—a widely followed
benchmark for the corporate bond universe—are issued with call provisions giving issuers
the right to repurchase them at par at certain dates in the future. The call provision may
be added to increase the coupon, and hence the perceived attractiveness of the issue,
but it rarely stays with the issuer, who will instead typically seek to strip the call option
and sell it—e.g., in the form of a cancelable swap—to an exotic dealer desk to lower
the effective cost of the issuance. For example, Goldman Sachs analysts estimate that
Bermudans embedded in 70% of the so-called Formosa bonds (bonds that are denominated
in a foreign currency, but registered and listed on the Taipei Exchange) acquire a new life
being passed on to exotic dealer desks who tend to actively manage this risk by selling
European swaptions with risks most closely mimicking that of the Bermudan (effectively
providing “Vega supply” to the market) (“Formosa issuance and USD rate vol supply”,
Global Rates Insight, Goldman Sachs, 13 February 2020). Other notable natural buyers of
such stripped Bermudan swaptions are mortgage originators or issuers of mortgage-backed
securities (such as, e.g., Fannie Mae and Freddie Mac in the US), who seek to offset the risk
of prepayment options inherent in typical fixed rate mortgage products.
What is special about Bermudan swaptions, and what explains their important role
in the financial system, is the complex risk profile stemming from the multiple exercise
feature. It turns out that accounting properly for this exotic feature poses a considerable
challenge and has been a major driver of the (r)evolution in interest rate modeling—a story
that we intend to tell below.
From a mathematical standpoint, the early exercise property gives rise to a free
boundary problem not unlike the so-called Stefan problem encountered when analyzing
the temperature of water bordering a melting block of ice: if there are N exercise dates
T1 , T2 , ..., TN , then at each Ti , i ∈ {1, 2, ..., N }, the Bermudan holder must decide whether to
retain the option or exercise by entering into the i-th co-terminal swap. Naturally, for each
Ti there will be some critical value of the underlying instrument below which it no longer
makes sense to exercise. This reduces the problem of valuing the Bermudan to determining
the optimal exercise boundary that separates the “region” where exercise is optimal from
the “region” where it is optimal to continue holding the swaption—with obvious analogies
to Stefan problems for the location of the interface between water and melting ice (see,
e.g., [1]). Such parallels have been known for decades and fruitfully exploited, at least
in the case of early exercise boundaries for American-style equity options. And while an
early exercise boundary as such cannot be solved for in closed form, it can be represented
using an integral equation to be solved iteratively [2] and its upper and lower bounds
can be rigorously determined. Barles, Burdeau, Romano, and Samsoen [3] even show that
near expiration these upper and lower bounds approach each other, which leads them to
postulate an approximation for the value of the option.
Unfortunately, those results do not carry through to interest rate options in a straight-
forward way. For one, the partial differential equation governing the evolution of the stock
option price (the Black–Scholes Partial Differential Equation(PDE)) has been derived under
the assumption that the underlying instrument, a share of common stock, is a tradable
asset. After all, as shown by Black and Scholes [4], the self-financing dynamic trading
strategy replicating the final payoff of the option is implemented by taking positions in,
i.e., buying and selling, units of the underlying instrument. That underlying instrument
may be—as initially considered—a stock or a commodity futures contract, but not an
Mathematics 2021, 9, 112 3 of 32

interest rate that is not tradable—one cannot “buy” or “sell” a yield on, say, a 10-year bond,
a swap rate or the LIBOR. As it turns out, this obstacle can be overcome by a judicious
application of the change-of-measure technique ([5]; more on this below). But even if one
can come up with an equivalent probability measure under which a given underlying
interest rate is a martingale, and thus price the option by simply taking expectation of
the terminal payoff, a more profound problem arises: how to ensure consistency between
options on different interest rate underlyings? The crucial difference between interest
rates and, say, stocks—other than tradability—is that individual interest rates are simply
pieces of a greater whole called the term structure, bound together by no-arbitrage relations
and complex correlation patterns. This becomes immediately clear when we consider a
swaption where the underlying is a (stochastic) average of a number of forward LIBOR
rates. As explained above, a Bermudan can even be viewed as a “best of” chooser option
to optimally select and enter into one of potentially many co-terminal swaps spanned by
the contract. As such, it is by definition an instrument driven by complex dependence
patterns among interest rate underlyings—a phenomenon we shall have more to say on
later. All this implies that handling interest rate derivatives, and Bermudans especially,
requires a consistent model of the term structure. And since different approaches have been
proposed to the modeling of interest rates, so too there have been—and still are—different
ideas about pricing and risk-managing Bermudan swaptions.
Although it is impossible within the confines of this essay to do justice to the large and
still growing literature in the area of interest rate modeling ([6,7] and vol. II are very good
general references), we may, at least at this stage, identify two broad schools of thought:
low-factor short-rate models and multi-factor market models. The first camp, drawing
inspiration from Vasicek [8], starts from a fundamental assumption that the dynamics of
the whole yield curve is driven by the “instantaneous spot rate”, or short rate, r (t), at which
interest accrues continuously in a locally risk-free way. This definition leads to the so-called
money market account, B(t), which evolves according to:

dB(t) = r (t) B(t)dt, B(0) = 1, (1)

and induces the risk-neutral measure Q making all discounted asset prices martingales. Al-
though the short rate is a purely artificial mathematical construct (all be it with an intuitive
interpretation), it determines prices of all discount bond that are simply Q-expectations of
RT
the future path of r (t), t r (s)ds. And since all interest rates are derived from bond prices,
it follows that the entire discount curve can in principle be parameterized by stipulating
a suitable stochastic process for r (t)—a point supported by a large body of empirical
evidence suggesting that the first principal component typically accounts for about 90%
of the variation in the entire term structure. With two factors—two state variables feed-
ing through to the evolution of r (t)—one would expect to cover well over 95% of yield
curve variation. Apart from parsimony and consistency, short rate models have another
considerable virtue that came to explain their initial success as tools-of-choice for the
valuation of path-dependent interest rate instruments such as Bermudan swaptions: they
are relatively easy to implement numerically in lattices, i.e., finite differences or trees, which
can efficiently handle the early exercise boundary through backward induction.
However, these virtues do not come without vices to match. With only one or two
factors, it might be difficult to generate all practically relevant yield curve shapes (two
Principal Components Analysis PCAs would still leave about 5% of variation unexplained)
and match a large set of market prices—especially with intuitive and economically jus-
tifiable parameter values (using a large set of calibration instruments is necessary if one
wishes to consistently apply a single calibrated model to many kinds of exotic interest rate
derivatives). More importantly, a low number of factors (one or two) may limit the degree
of de-correlation achievable among interest rates, artificially lowering prices of Bermudan
swaptions (when correlation among rates is high, the right to choose among different
swaptions is lower than in the case when interest rates are decorrelated and subsequent
payoffs are unpredictable). Longstaff, Santa-Clara, and Schwartz [9] even thought that the
Mathematics 2021, 9, 112 4 of 32

errors and pitfalls of using single-factor models amounted to “throwing away a billion
dollars” and in their influential paper argued for a move towards a multi-factor setting.
In fact, the stage in that respect had already been set in the form of so-called LIBOR/swap
market models [10–12] and their later extensions. Models in this class are generally ex-
pressed in terms of discretely compounded market rates (hence the term “market models”),
each of which can be shown to be a martingale under an appropriate forward measure.
This perspective naturally allows for multiple sources of randomness (often there are more
than 50 state variables), with almost arbitrarily complex dependence structure, and offers
many degrees of freedom to fit observed market patterns, even including the so-called
implied volatility smiles (e.g., [13,14] or [15]; cf. also [16–22]). Whether the number of fac-
tors, in and of itself, is enough to improve the pricing of Bermudans has been persuasively
contested [23] and we shall review the main arguments presented in this debate. It is clear,
however, that while increasing dimensionality improves calibration, it does so at the cost of
efficiency and numerical accuracy: with a high number of state variables, lattice techniques
are no longer viable, and even if viable (cf. e.g., [24])—not really practicable. One therefore
is left with Monte Carlo simulations (with some variance reduction technique to reduce
bias). This presents an obvious problem: by evolving the underlying state variables forward
in time, the simulation cannot easily determine the optimal time to exercise. And while
efficient techniques of circumventing that problem have since been proposed (cf. especially
the classic approach of [25,26] for a very good general reference), the exercise region and
corresponding Bermudan swaption prices will necessarily be estimated—not exact.
The two approaches sketched broadly above, the single- and multi-factor ones, are
like Scylla and Charybdis, the mythical sea monsters, lurking for the Bermudan swap-
tion modeler: pass too close to Scylla, and your model shall crash against the rocks of
computational challenges and estimation biases of the multidimensional setting; move
towards Charybdis and risk getting sucked by a whirlpool of artificial parameter values
and unachievable calibration targets. And yet, unlike in the classical Greek tragedy, we
believe there is safe passage between the two hazards, and one that does not involve losing
six strongest and bravest companions—a cost tragically borne by Odysseus. Thus, in what
follows we try to tell the story—the odyssey—behind the evolution of Bermudan swap-
tion pricing in greater detail, reviewing in a much more formal way the main ideas and
solutions underlying both single- and multi-factor models. We discuss their relative merits
and drawbacks, presenting especially the crux of the consensus-shattering debate between
Longstaff, Santa-Clara, and Schwartz [9] and Andersen and Andreasen [23]. Finally, build-
ing on Gatarek and Jabłecki [27], we show our dimension reduction approach based on the
concept of Markovian projection pioneered by Gyöngy [28] and independently adapted to
the financial landscape by Dupire [29] under the guise of “local volatility”. Our idea con-
sists essentially in showing that for purposes of pricing Bermudan swaptions, a potentially
complex and multidimensional process for the swap rate can—using the tools of Markovian
projection—be collapsed to a one-dimensional one. Naturally, the diffusions of the two
processes will differ, but it turns out the early exercise boundaries will approximately agree,
with pricing errors within normal bid–ask spreads.
The rest of the essay proceeds as follows. Section 2 presents notation and formally
introduces the relevant interest rate instruments. Section 3 presents the evolution of term
structure modeling, from single- to multi-factor approaches, zooming in on implemen-
tation and calibration aspects. Section 4 then discusses the specific challenges related to
applying the models in the context of pricing Bermudan swaptions, and revisits the issue
of their factor dependence with some numerical tests. Finally, Section 5 presents a novel
argument in the factor-sensitivity debate, demonstrating an elegant extension of the classic
result due to Gyöngy [28] on Markovian projection and dimension reduction. Section 6
draws conclusions.
Mathematics 2021, 9, 112 5 of 32

2. Notation, Modeling Setting and Instruments


Consider a continuous-time economy together with a probability space (Ω, F , P) with
a frictionless and arbitrage-free market for zero coupon bonds, i.e., contracts that guarantee
holders the payment of one unit of currency at maturity, with no intermediate payments.
Let P(t, T ) be the time t price of a zero coupon bond maturing at time T such that P(t, t) = 1
for every t. We assume that such a P(t, T ) exists for every 0 < t < T < ∞ and for a given t,
P(t, T ) is differentiable with respect to maturity time T. The instantaneous forward rate
f (t, T ) with maturity T contracted at t is defined by:
 Z T 
∂ ln P(t, T )
f (t, T ) ≡ − ⇐⇒ P(t, T ) = exp − f (t, s)ds . (2)
∂T t

The instantaneous spot rate r (t)—i.e., the short rate—has already been loosely defined
above and can now be formalized as:

r (t) ≡ f (t, t). (3)

As discussed, r (t) can be thought of as capturing the locally


nR risk-free
o return from a
t
continuously compounded money market account B(t) ≡ exp 0 r (s)ds with dynamics
dB(t) = r (t) B(t)dt, B(0) = 1. The significance of the money market account in modern
finance stems from the fact that it establishes a connection between the economic concept
of absence of arbitrage and the mathematical property of existence of the equivalent
martingale measure (risk-neutral measure). Specifically, as shown by Harrison and Kreps
(1979) and Harrison and Pliska (1981, 1983), postulating that our market is arbitrage-free is
equivalent to stating that there exists a measure Q on (Ω, F ), equivalent to P, such that all
asset prices discounted by B(t) are Q-martingales. In other words, in our economy, for any
T we have:
      Z T 
P(0, T ) P( T, T ) 1
= P(0, T ) = EQ = EQ = EQ exp − r (s)ds , (4)
B (0) B( T ) B( T ) 0

so that time-zero bond (asset) prices are Q-expectations of their terminal values (payoffs).
We will explore (4) in Section 3 in greater detail as it gives rise to an important class of
term structure models, but for now let us simply observe that when we allow for stochastic
interest rates—as we naturally should when dealing with interest rate derivatives—the
martingale measure Q may not be the most convenient or natural to work with. Thus,
Geman et al. (1995) prove that for any positive non-dividend-paying asset N (so-called
numeraire) there exists a measure Q N , which is equivalent to the original martingale mea-
sure Q, making the prices of all claims normalized by N Q N -martingales. Furthermore,
the Radon–Nikodym derivative defining Q N is given simply by B( T ) N ( T )/N (0).
We now move on to introducing other instruments. For this we define a uniformly
spaced tenor structure:
0 = T0 < T1 < ... < TN , (5)
and set δn = Tn − Tn−1 for n = 1, ..., N. A spot LIBOR rate over the interval ( Tn−1 , Tn ) is
given by the formula:
1 − P( Tn−1 , Tn )
L( Tn−1 , Tn ) = (6)
δn P( Tn−1 , Tn )
whereas its forward counterpart, forward LIBOR, is given by:

P(t, Tn−1 ) − P(t, Tn )


Fn (t) = F (t, Tn−1 , Tn ) = . (7)
δn P(t, Tn )
A fixed-for-floating interest rate swap (IRS) with unit notional, fixed rate (coupon) K,
β
and a specified tenor structure T = { Tn }n=α+1 is a contract whereby two parties exchange
differently indexed cashflows over a pre-agreed time span. Specifically, on each date
Mathematics 2021, 9, 112 6 of 32

Tn ∈ T , the fixed leg pays δn K, whereas the floating leg pays the floating LIBOR rate given
by (6). When the fixed leg is paid, the IRS is called a “payer”, and conversely the swap is
called a “receiver.”
An interest rate cap/floor is a payer (receiver) IRS in which each payment is executed
only if it has positive value. It has become market practice to quote prices of caps in terms
of Black’s implied volatilities, i.e., parameters σα,β , which plugged into the formula:
   F (0)     F (0)  
β ln nK + 12 σα,β
2 T
n −1 ln nK − 21 σα,β
2 T
n −1
∑ P(0, Tn )δn  Fn (0)Φ √
σα,β Tn−1
 − KΦ √
σα,β Tn−1
 (8)
i = α +1

give the market price. The Black model itself is inconsistent with market quotes, which
tend to exhibit a pronounced implied volatility smile/skew.
The forward swap rate Sα,β (t) corresponding to the tenor structure T is the rate in
the fixed leg that sets it equal to the floating leg and hence makes the net present value of
the transaction equal zero:

P(t, Tα ) − P(t, Tβ )
Sα,β (t) ≡ β
. (9)
∑n=α+1 P(t, Tn )δn

The portfolio of zero-coupon bonds in the denominator of Equation (9) is the so-called
annuity factor, for which we will sometimes use a continuous-time definition, writing
R Tβ
Nα,β (t) ≡ Tα P(t, u)du along with:
R Tβ
P(t, Tα ) − P(t, Tβ ) Tα P(t, s) f (t, s)ds
Sα,β (t) = = . (10)
Nα,β (t) Nα,β (t)

Note, in passing, that since Nα,β (t) is positive and P(t, Tα ) − P(t, Tβ ) is a tradable
asset (a portfolio of “long” the Tα bond and “short” the Tβ one), it follows that Sα,β (t) is a
martingale under Qα,β induced by Nα,β (t).
A European payer (receiver) swaption with strike K, maturity Tα and tenor Tβ − Tα
(henceforth referred to also as Tα × ( Tβ − Tα ), or Tα -into-( Tβ − Tα )) is simply an option
that gives the holder the right to enter at Tα into a payer (receiver) swap that matures at
Tβ and entitles to pay (receive) a fixed rate K in exchange for a floating LIBOR rate on the
tenor dates T . Thus, the time zero price of a European payer swaption with unit notional
is given by:

Nα,β ( Tα )
  h
+ α,β + i
EQ
Sα,β ( Tα ) − K = Nα,β (0)EQ Sα,β ( Tα ) − K (11)
B( Tα )

Analogously, as with caps/floors, market practice is to quote swaption prices in terms


of Black (or more recently—Bachelier) implied volatilities; i.e., parameters σα,β , which
plugged into the Black formula retrieve the market price, e.g., for a payer swaption:
      
Sα,β (0) Sα,β (0)
 
1 2 1 2
β ln K + σ
2 α,β Tα ln K − σ
2 α,β Tα
∑ P(0, Tn )δn Sα,β (0)Φ √  − KΦ √  . (12)
    
i = α +1 σ α,β Tα σ α,β Tα

Again, from a theoretical point of view, applying the Black model to price swaptions
would be inconsistent as the swaptions market exhibits a pronounced implied volatility smile.
Finally, a Bermudan receiver (payer) swaption is an option to enter at any time Ti ,
i ∈ {α, α + 1, ..., β − 1} into a swap that terminates at Tβ and gives the holder the right to
receive (pay) a pre-determined fixed rate K in exchange for floating LIBOR. The period
up to Tα is called the lockout or no-call period, and hence a Bermudan swaption with
final exercise date Tβ−1 and first exercise Tα is often called “Tβ no-call Tα ”, or “Tβ ncTα .”
For example, a 11nc1 swaption with annually spaced exercise dates can be exercised at
Mathematics 2021, 9, 112 7 of 32

the beginning of any year, starting from year 1. By exercising the option, the holder enters
a swap starting at the time of exercise (i.e., years 1, 2, 3,..., 10) and ending at year 11.
Accordingly, the price of the “Tβ no-call Tα ” Bermudan receiver is given by:

Nτ,β (τ )
 
+
RBSα,β (t, K ) = sup E Q
K − Sτ,β (τ ) . (13)
τ ∈T B(τ )

3. The Modeling Panorama


This section presents an overview of the main approaches to modeling the dynamics
interest rates and their term structure. We focus on the distinction between (low-factor)
short-rate models and the (multi-factor) market models before presenting the main argu-
ments in the debate as to which school of thought is more appropriate for dealing with
contracts with early exercise features. Our treatment is purposefully general and selective,
so we refer readers to the excellent magnum opuses of Brigo and Mercurio [7] or Andersen
and Piterbarg [6] for a more detailed account.

3.1. One-Factor Short-Rate Models


As we have seen in (4), in an arbitrage-free market bond prices are given as risk-
neutral expectations of a functional of the process of the short rate. And since all kinds of
interest rate instruments are themselves defined in terms of bond prices, the whole term
structure, or zero-coupon curve, can, in principle, be characterized by the distributional
properties of a single state variable—the short rate.

3.1.1. Vasicek
In a seminal contribution, Vasicek [8] leveraged the latter idea by postulating that
the dynamics of the short rate (in the real-world measure) is governed by a stochastic
differential equation made up of a deterministic mean-reverting component and a stochastic
part with a constant diffusion coefficient (later on a variant of the same model with a
diffusion coefficient proportional to the short rate itself was proposed by Cox et al., 1985).
Formally, Vasicek [8] assumed that r follows an Ornstein–Uhlenbeck process of the form
(for practical purposes we present the model in the risk-neutral setting, equivalent to the
original formulation provided a specific form of the market price of the risk parameter
is chosen):
dr (t) = k [θ − r (t)]dt + σdW (t), r (0) = r0 (14)
where k, θ and σ are positive constants representing, respectively, the mean reversion speed,
the long-term average rate and volatility. The above equation can be integrated to express
r (t) as a function of its value at any chosen prior time point s:
  Z t
r ( t ) = r ( s ) e−k(t−s) + θ 1 − e−k(t−s) + σ e−k(t−u) dW (u). (15)
s

 Thus, r (t
) is normally distributed
 with a conditional mean given by r (s)e−k(t−s) +
2
θ 1 − e−k(t−s) and variance σ2k 1 − e−2k(t−s) . Initially, the positive probability of r < 0
drew some criticism of the model but this property can hardly be held against it in current
conditions. And while negative values may appear, they are unlikely to persist, as the
instantaneous drift term k[θ − r (t)] ensures that the process is pulled towards its long-term
(positive) mean θ at a reversion speed k. Another pleasant feature of the model is that the
expectation (4) can be calculated explicitly yielding bond prices of the form:

P(t, T ) = A(t, T )e− B(t,T )r(t) , (16)


Mathematics 2021, 9, 112 8 of 32

where:

σ2 σ2
  
2
A(t, T ) = exp θ − 2 [ B(t, T ) − T + t] − B(t, T )
2k 4k
1h i
B(t, T ) = 1 − e−k( T −t) .
k
Hence, the whole zero-coupon curve is given by a function T 7→ P(t, T; k, θ, σ, r (t)).
To use the model for valuation purposes, we need to parameterize this function to make
sure that the model incorporates a given observed zero-coupon bond curve T 7→ P M (0, T ).
This typically consist in running an optimization to find {k, θ, σ } such that T 7→ P(t, T; k, θ, σ,
r (t)) is sufficiently close to the market. More generally, consistency with the market is
not an inherent feature of the model, but rather needs to be ensured by a judicious choice
of parameters. Unfortunately, this is sometimes rather difficult to do, and even outright
impossible for some yield curve shapes (e.g., a so-called inverted yield curve typically
considered to be a leading indicator of recessions).

3.1.2. Hull and White


The inability of the model (15) to reproduce some typical yield curve shapes presented
a major obstacle: if it is impossible to fit the current term structure of interest rates, how can
the model be relied upon to price consistently more complex products? To remedy this
problem Hull and White [30] have suggested a simple yet profound extension of Vasicek [8]
assuming the following dynamics for the short rate:

dr (t) = [θ (t) − kr (t)]dt + σdW (t), (17)

with the mean reversion now depending on time (the most general version of the Gaus-
sian single-factor model allows all parameters to be functions of time; see [6], vol. II).
Integrating (17) now gives:
Z t Z t
r ( t ) = r ( s ) e−k(t−s) + e−k(t−u) θ (u)du + σ e−k(t−u) dW (u) (18)
s s

which, plugged into (4), implies after straightforward, if tedious, algebra that ensuring
consistency with the spot curve requires:

∂ f (0, t) σ2  
θ (t) = + k f (0, t) + 1 − e−2kt . (19)
∂T 2k
The last term in the above is an adjustment required to preclude arbitrage [31], and
∂ f (0,t)
it tends to be very small, so that the drift of the short rate is approximately ∂T +
k[ f (0, t) − r (t)]—and hence r follows the slope of the time zero instantaneous forward
curve. Moreover, much like in the Vasicek model, we can write down bond reconstitution
formulas explicitly:
P(t, T ) = A(t, T )e− B(t,T )r(t) , (20)
where this time:

σ2 
 
P(0, T ) −2kt

2
A(t, T ) = exp B(t, T ) f (0, t) − 1−e B(t, T )
P(0, t) 4k
1h i
B(t, T ) = 1 − e−k( T −t) .
k

3.1.3. Black and Karasinski (1991)


As already hinted above, the possibility to generate negative interest rates has his-
torically been viewed as an important drawback of any interest rate model. And while
the Hull–White model improved upon the original Vasicek version in several important
Mathematics 2021, 9, 112 9 of 32

respects, it still implies a normal distribution for the short rate, and with it a non-zero
probability of negative rates. Accordingly, various attempts have been suggested to over-
come that feature, perhaps most notably by Black and Karasinski [32], who postulated the
following general process for the logarithm of r (t) under the risk-neutral measure:

d ln(r (t)) = [θ (t) − k(t) ln(r (t))]dt + σ (t)dW (t), r (0) = r0 , (21)

whereby of course typically one would set σ (t) = σ and k (t) = k, i.e., positive constants.
A straightforward application of Ito’s lemma shows that:

σ2
 
dr (t) = r (t) θ (t) + − k ln r (t) dt + σr (t)dW (t), (22)
2

with the solution, for each s ≤ t,


 Z t Z t 
r (t) = exp ln(r (s))e−k(t−s) + e−k(t−u) θ (u)du + σ e−k(t−u) dW (u) . (23)
s s

Now r (t) is lognormally distributed, precluding the possibility of negative rates.


This certainly made sense in the 1990s and early 2000s given the market practice of quoting
caps and swaptions prices in terms of (lognormal) Black volatilities, and goes a long way
towards explaining the general appeal and early popularity of the model among market
practitioners. However, a clear drawback of the model is that it no longer provides closed
form bond reconstitution formulas, which renders it much less analytically tractable than
Hull and White [30].

3.2. Extensions: Two-Factor Hull–White and Cheyette Models


Although relatively simple, intuitive and analytically tractable, single-factor models
also tend to share some undesirable properties that become apparent especially when one
considers their application to pricing instruments with more involved payoffs. It is perhaps
easiest to appreciate this point by recalling that in the Vasicek [8] model all bond prices can
be computed directly from Equation (16) and—by extension—all zero coupon yields/spot
rates are given as simple affine functions of a single state variable r. This implies in
particular that at any instant the model-implied interest rates of all maturities are perfectly
correlated—hardly an accurate description of economic reality. Exactly how big a problem
that is for the valuation of Bermudan swaptions has been at the very core of the famous
debate in the literature (i.e., [9,23] and others), which has hugely influenced both the theory
and practice of pricing Bermudans, and one to which we shall return in greater detail in
Section 4.2. However, at this point let us simply acknowledge that striving for a more
realistic correlation pattern has driven interest in multifactor models, of which we discuss
below two important examples: the two-factor Hull–White model and the quasi-Gaussian
model of Cheyette [33].

3.2.1. Two-Factor Hull–White (2FHW)


The two-factor Hull–White model (described in [34], see also the classic exposition
in [7]) is described by the following system of stochastic differential equations (without
loss of generality we change the formulation slightly to better present the two additive
factors):

r (t) = x (t) + y(t) + θ (t)


dx (t) = −k1 x (t)dt + σ1 (t)dW1 (t), x (0) = 0
dy(t) = −k2 y(t)dt + σ2 (t)dW2 (t), y(0) = 0

where 0 < k1 < k2 , σ1 (·), σ2 (·) are either constants or deterministic functions of time,
θ (t) is deterministic, and finally—(W1 , W2 ) is a two-dimensional Brownian motion with
Mathematics 2021, 9, 112 10 of 32

instantaneous correlation of the form dW1 (t)dW2 (t) = ρdt with ρ constant and obviously
ρ ∈ [−1, 1]. Simple integration shows that:
Z t
x (t) = e−k1 (t−u) σ1 (u)dW1 (u)
0
Z t
y(y) = e−k2 (t−u) σ2 (u)dW2 (u)
0

which in turn implies that letting t → ∞ both our state variables mean revert to their initial
values of zero. It follows that:
Z t Z t
r (t) = θ (t) + e−k1 (t−u) σ1 (u)dW1 (u) + e−k2 (t−u) σ2 (u)dW2 (u) (24)
0 0

mean reverts to θ (t). Despite the two-dimensional setting, many features carry through
from the single factor case. In particular, we can derive the following bond reconstitution
formula:
( Z )
T 1 − e−k1 ( T −t) 1 − e−k2 ( T −t) 1
P(t, T ) = exp − θ (u)du − x (t) − y(t) + V (t, T ) , (25)
t k1 k2 2

where the convexity correction term V (t, T ) can be calculated exactly to be a deterministic
function of k1 , k2 and ρ (see, e.g., [7] for details). Using (25) we can also determine the
functional form of θ (t), which guarantees matching exactly the time-zero term structure of
discount factors by solving the following:
T
 Z 
1
P(0, T ) = exp − θ (u)du + V (0, T ) . (26)
0 2

3.2.2. Cheyette
Another two-factor (or perhaps more accurately, one-and-a-half-factor) represen-
tation of the term structure is afforded by a family of models considered originally
by Cheyette [33], Ritchken and Sankarasubramanian [35] and later also others (cf. [36–39]).
These models start from the Heath, Jarrow, and Morton [40] specification of instantaneous
forward rates dynamics:

∂σP (t, T ) ∂σ (t, T )


d f (t, T ) = σP (t, T )dt + P dW (t), 0 ≤ t ≤ T < ∞. (27)
∂T ∂T
and assume that the instantaneous volatility function is separable into a time- and rate-
dependent term—i.e., the short rate volatility σ (t)—and a maturity-dependent term, h( T ):

∂σP (t, T ) σ(t)


σ f (t, T ) = = h ( T ). (28)
∂T h(t)

Such volatility of the term structure obviously implies:


Z T
σ(t)
σ f (t, u)du = ( H ( T ) − H (t)), (29)
t h(t)
RT
where H ( T ) = 0 h(u)du. Substituting (28) and (29) into (27) and integrating yields:
"Z  2 #
t σ(s)
Z t
σ(s)
f (t, T ) = f (0, T ) + h( T ) ( H ( T ) − H (s))ds + dW (s) (30)
0 h(s) 0 h(s)
Mathematics 2021, 9, 112 11 of 32

so that:
"Z  2 #
t σ(s)
Z t
σ(s)
r (t) = f (t, t) = f (0, t) + h(t) ( H (t) − H (s))ds + dW (s) . (31)
0 h(s) 0 h(s)

Solving the above for the stochastic integral and plugging into (30) gives:
Z t 2
h( T ) σ(s)
f (t, T ) = f (0, T ) + (r (t) − f (0, t)) + h( T )( H ( T ) − H (t)) ds. (32)
h(t) 0 h(s)

Let x (t) denote the deviation of the short rate from its value implied by the forward
curve at inception:
"Z  #
t σ(s) 2 Z t

σ(s)
x (t) = r (t) − f (0, t) = h(t) ( H (t) − H (s))ds + dW (s) (33)
0 h(s) 0 h(s)

and y(t) denote the cumulative short rate volatility:


Z t 2
2 σ(s)
y(t) = h (t) ds. (34)
0 h(s)

The forward rate can now be written in terms of just these two state variables, x (t)
and y(t), as:
H ( T ) − H (t)
 
h( T )
f (t, T ) = f (0, T ) + x (t) + y(t) , (35)
h(t) h(t)
dH ( T )
which, along with (2) and the fact that dT = h( T ), yields the following reconstitution
formula for the price of any zero coupon bond:
!
H ( T ) − H (t) 1 H ( T ) − H (t) 2
   
P(0, T )
P(t, T ) = P(t, T, x (t), y(t)) = exp − x (t) − y(t) . (36)
P(0, t) h(t) 2 h(t)

As suggested by Cheyette [33] and Ritchken and Sankarasubramanian [35], a nat-


ural choice is to set h(t) = exp(−κt) with κ being some small and constant number.
Then (36) becomes:
 
P(0, T ) 1 −κ ( T − t )
 1  −κ ( T − t )
2
P(t, T ) = exp − 1 − e x (t) − 2 1 − e y(t) . (37)
P(0, t) κ 2κ

Furthermore, the dynamics of the two state variables can be expressed in terms of
their current values through differentiation of (33) and (34) as:

dx (t) = (y(t) + −κx (t))dt + σ(t)dW (t) (38)


 
dy(t) = σ2 (t) − 2κy(t) dt (39)

Thus, with the two state variables ( x (t), y(t)) we obtain a two-dimensional Markovian
model of the term structure. The yield curve model features an upward drift representing
forward curve steepening due to volatility (a “convexity correction”) as well as mean
reversion captured by κ. Note that although y(t) does not have a diffusion term, it is
not deterministic—i.e., not known with certainty at time 0—as it captures information
revealed over the time interval [0, t]. Thus, the Cheyette model can be thought of as having
one-and-a-half factors. So far, and in the original formulations of the model [33,35], no
restrictions were placed on the volatility of the short rate. The Markovian representation is
valid even if σ (t) is stochastic or depends on the entire term structure at time t. The funda-
mental problem in calibration and pricing consists of course in determining the function
Mathematics 2021, 9, 112 12 of 32

σ (·), allowing it to be either state- and time-dependent (as in [36,39,41] or [38]) or fully
stochastic [42].

3.3. Multi-Factor Market Models: LMM and DD-LMM


The final important category of models that we discuss are the market models of the
type originally proposed by Brace, Gatarek, and Musiela [10], Jamshidian [11], Miltersen,
Sandmann, and Sondermann [12] and later developed and extended in multiple directions
by many others. The common conceptual thread of these models is a departure from
modeling an unobserved, artificial concept such as the short rate in favor of quantities
directly observable and quoted by the market—the LIBOR or swap rates. To fix ideas,
consider a tenor structure such as in (5) on which a family of forward LIBOR rates is
defined for each of the expiry–maturity pairs ( Tn−1 , Tn ), n = 1, ..., N as in (7) (each rate is
“alive” until expiration whereupon it becomes the spot LIBOR rate introduced in (6)). Since
the discount bonds P(t, Tn−1 ) and P(t, Tn ) are tradable assets, Fn (t) P(t, Tn ) expresses the
price of a tradable asset and thus must be a martingale under the forward measure Qn
associated with the numeraire P(·, Tn ). Thus it is natural to assume the following driftless
dynamics for each Fn :

dFn (t) = σn (t) Fn (t)dW n (t), n = 1, ..., N, t ≤ Tn−1 , (40)

where W n (t) is a standard Brownian motion under Qn and the instantaneous correlation
between individual shocks is given by:

dW n (t)dW m (t) = ρn,m dt. (41)

Even such a brief introduction allows us to comment on the main virtues of the
model. First, the driftless lognormal dynamics (40) ensures that the model-implied prices
of caps/floors are given by Black’s formula—in line with market practice (an analogous
model written for the forward swap rate prices swaptions with Black’s formula—again
consistently with market practice). None of the previously discussed short rate models
could boast such a feat. Second, unlike the one-factor short rate models [30,32], which
unrealistically imply perfect instantaneous correlation between forward rates, LMM can
easily produce a much richer and more realistic instantaneous correlation structure. On the
other hand, with potentially a large number of factors involved, recombining trees or PDE
methods is not feasible, and pricing needs to be based on Monte Carlo simulations. To this
end, the dynamics of all rates need to be expressed in a single measure Qi :

dFn (t) = µin (t) Fn (t) + σn (t) Fn (t)dW i (t) n = 1, ..., N, t ≤ Tn−1 (42)

with drifts calculated using the measure change technique as:



i n δk Fk (t)σk σn ρnk
 µ n = ∑ k = i +1 1+ δ F ( t )

 k k
i<n
µin = 0 i=n (43)
δk Fk (t)σk σn ρnk

 µi = − ∑i

i>n
n k = n +1 1+δk Fk (t)

Although the original LMM formulation is not consistent with the implied volatility
smile/skew observed in interest rate markets, and precludes the existence of now perva-
sive negative interest rates, a relatively straightforward extension can be introduced that
addresses both issues. The quick fix entails shifting the lognormal dynamics (40) by a
forward-rate specific parameter αn (typically treated as constant) so that:

dFn (t) = σn (t)( Fn (t) + αn )dW n (t) (44)

The distribution of each Fn (t) thus becomes a shifted lognormal distribution and
while the quantities ( Fn (t) + αn ) still have to be positive, rates as such are permitted to
be negative and bounded from below by −αn . More importantly, a shifted lognormal
Mathematics 2021, 9, 112 13 of 32

process can consistently generate a (negatively) skewed pattern in Black implied volatilities.
Yet another feature of market reality to take into account, especially in the post-crisis
environment, is the existence of “basis” between different money market curves reflecting
LIBOR rates’ lack of risk-free character (see [43] esp. chap. 4 for a good general overview).
In practice, this leads to so-called dual-curve pricing, whereby the true risk-free rates (e.g.,
overnight index swaps, OIS) are modeled and evolved under Monte Carlo and the basis
adjustment is applied to the OIS rates in calibration and the pricing of derivatives is derived
from LIBOR-based underlyings. In the simplified case of a constant (time-independent)
basis the forward LIBOR rates are derived by adding a fixed spread to the simulated paths
of the forward OIS rates with no need for extra simulation. To simplify presentation and
given that counterparty credit risk issues are not central to the main thrust of the argument,
we shall simply assume a flat zero basis.
Thus, the model parameters to be calibrated are the functions σn (·), the shift values αn
and the instantaneous correlations ρn,m . For some products this would entail finding the
best fit among potentially hundreds of “free” parameter candidates, rendering the whole
procedure prohibitively costly. Therefore, for practical purposes, parametric forms for the
volatility term structure and correlation are typically imposed (see, e.g., [44] or [7] for a
good discussion of potential choices). In line with well-established practice, we opt for
a piecewise constant volatility function (leaving us with N ( N + 1)/2 parameters), and
Rebonato’s classic two-parameter full-factor correlation structure of the form ρn,m = ρ∞ +
(1 − ρ∞ ) exp(−ρdec |n − m|), where ρ∞ is the base correlation to which off-diagonal matrix
entries exponentially decay at a rate of ρdec . To price Bermudans in LMM (in the displaced
diffusion form), we calibrate the model to the entire swaptions matrix, including both
ATM and OTM volatilities, making use of approximate swaption pricing formulas [45,46].
Specifically, we take advantage of the fact that the swap rate is a stochastic average of
forward rates and thus follows itself in approximately (shifted) lognormal dynamics.

4. Bermudan Swaption Pricing


Before we turn to pricing Bermudan swaptions more formally, it is useful first to
consider some stylized facts related to the economic risk profile of the instrument. As
is evident from the preceding discussion, each Bermudan swaption spans a basket of
European co-terminal options, all struck at the same rate K, of which only one can be
exercised. This basket nature of Bermudan swaptions has two implications.
First, compared to European counterparts, Bermudans typically have much higher
sensitivity to changes in implied volatility, i.e., Vega, and their volatility risk profile is
also more distributed along the whole term/tenor structure, rather than concentrated at a
single point. On the one hand this underscores the attractiveness of Bermudan swaptions
as a hedging instrument for investors in callable products, but on the other it suggests
that hedging or replicating the risk profile of a Bermudan requires multiple European
swaptions to match the former’s Vega buckets. For example, the 11nc1 Bermudan receiver
swaption priced using current data and a calibrated HW1F model has 1 bp Vega (defined
as the change in market value for a 1 bp shift in volatilities of the co-terminal swaptions) of
0.06% of notional, while Vegas of its co-terminal Europeans range from 0.009% (10Y × 1Y)
to 0.048% (3Y × 8Y). More importantly, as shown in Figure 1 for a 20Y Bermudan receiver
swaption, the volatility risk profile is distributed along the spectrum of term/tenor buckets.
Second, to preclude straightforward arbitrage, a Bermudan swaption must be at
least as valuable as any European swaption spanned by it. Consequently, the price of
the most expensive European swaption spanned by a given Bermudan tenor structure
must be the lower bound on the price of the Bermudan swaption itself. This valuation
difference between a Bermudan and its Most Expensive European (MEE) swaption—called
the Bermudan-MEE gap, or B/E basis—is a useful simple handle for the economic value
of the Bermudan exercise right, i.e., the privilege of having all other exercise possibilities
available. Indeed, Qu [41] suggests analyzing the MEE basis in historical perspective as an
Mathematics 2021, 9, 112 14 of 32

indication of potential pricing anomalies and a handle for the model risk inherent in the
chosen Bermudan pricer (see also [47]).
Figure 2 shows the price of the 10Y Bermudan receiver plotted against the prices of
its co-terminal European swaptions, all struck at the same rate, K = 1.66%. Evidently,
the Bermudan price is higher than each of the Europeans in the “basket” and the gap
is tightest for the 3Y-into-8Y swaption, which is therefore the MEE. Both the size of the
MEE gap and the MEE swaption itself are likely to depend on a combination of market
factors, most importantly yield curve level and steepness, as well as forward swap rate
volatilities and correlations. For example, while the 3 × 8 swaption is the MEE for the
1 × 10 Bermudan at current prices, under a significant fall in market interest rates, a 1 × 10
swaption would become the MEE, and should rates rise a 6 × 5 European would become
the MEE. The MEE basis in these scenarios would fluctuate from 30 to 200 bp (Figure 3).

Figure 1. The distribution of Vega risk across maturity and tenor buckets for a 21nc1 receiver ATM
Bermudan swaption (in % of total swaption Vega).

Figure 2. Price of a 11nc1 Bermudan receiver swaption plotted against the prices of co-terminal
Europeans (all options struck at K = 1.66%).
Mathematics 2021, 9, 112 15 of 32

Figure 3. The 21nc1 Bermudan swaption and its most expensive Europeans in different mar-
ket regimes.

4.1. Numerical Methods for Pricing Models


β
Consider now a tenor structure T = { Tn }n=α and a “Tβ no-call Tα ” Bermudan receiver
swaption introduced above with time t value RBSα,β (t, K ). Assuming no prior exercise, at
any time point Tn the swaption holder has the right to receive the exercise value Ve of the
swaption, i.e., present value of the underlying swap:

β

+
Ve ( Tn ) ≡ K − Sn,β ( Tn ) P( Tn , Tk )δk . (45)
k = n +1

The exercise value has to be compared to the so-called continuation value, Vc , of


holding the option beyond Tn :
 

Vc ( Tn ) ≡ E RBSα,β ( Tn+1 , K ) Sn,β ( Tn ) .
(46)

The value of the Bermudan swaption can now be given in terms of (45) and (46) via a
dynamic programming recursion:
+
RBSα,β ( Tβ−1 , K ) = P( Tβ−1 , Tβ )δβ K − S β−1,β ( Tβ−1 )
 (47)
RBSα,β ( Tj , K ) = max Ve ( Tj ), Vc ( Tj ) ,

for j = β − 2, β − 3, ..., n. The evaluation of (47) proceeds backward in time: at Tβ−1 the value
of the Bermudan is known and determined by the standard swaption payoff. This allows us to
update the continuation value at Tβ−2 by discounting and comparing it to the exercise value
prevailing at the time. The procedure of comparing “backwardly-cumulated” continuation
value with the immediate exercise value and deciding upon a swaption exercise is repeated
until the initial valuation date is reached, at which point the algorithm yields a price estimate
for the Bermudan swaption. The calculation of the continuation value is clearly model-
dependent and the choice of modeling framework itself often determines the scope of available
Mathematics 2021, 9, 112 16 of 32

numerical techniques. Again without pretense of being exhaustive, we shall consider and
briefly discuss three main approaches—binomial/trinomial trees, finite difference schemes
based on the derivative PDE and Monte Carlo simulation—traditionally applied in the context
of the model categories described above. Our focus will not be on the technical details—for
which we refer readers to Glasserman [26], Tavella and Randall [48] as well as Mitchell and
Griffiths [49]—but rather on the connection between the modeling framework and numerical
implementation techniques.

4.1.1. Tree-Based Methods


Perhaps the most classic approach for pricing derivatives with early exercise features,
including Bermudan swaptions, is by building a binomial/trinomial tree for the underlying
variable using its discretized SDE (forward induction method). Such methods are only
feasible for low-dimensional processes (ideally one- or two-dimensional), which is why
they have been extensively used in modeling one-factor short rate models, all of which can
be approximated with lattices. Specifically, when approximating a one-factor interest rate
diffusion on a trinomial tree, it is assumed that at each time ti from a discretized domain
of time points 0 = t0 < t1 < ... < tn = T (note that the time points in general do not
need to be uniformly spaced) there is a finite number of equispaced interest rate states
ri,j (i corresponds to the time point and j to the space index), and at each node (i, j) the
short rate can move to ri+1,k+1 , with probability pu ,ri+1,k with probability pm , or ri+1,k−1 ,
with probability pd , whereby the branching parameters are determined in such a way as to
ensure that the conditional mean and variance match as closely as possible their model
values. For example, in a simple brute-force implementation of the Hull–White model (17),
the probabilities associated with the node (i, j) are:

σ2 ∆t + (µi,j + ( j − k)∆r )2 µi,j + ( j − k )∆r


pu = 2
+
2∆r 2∆r
σ2 ∆t + (µi,j + ( j − k)∆r )2
pm = 1 −
∆r2
pd = 1 − pu − pm

where µi,j = (θ (i∆t) + κri,j )∆t is the drift and we have omitted the dependence of transition
probabilities on nodes to ease notation. Clearly, building a tree requires as an input the
values for short rate volatility σ and mean√reversion speed κ, while for numerical stability
and convergence considerations ∆r = σ 3∆t (in fact, this procedure can be improved
upon as explained, e.g., in Hull and White [50]).
The Hull–White model is particularly convenient for pricing Bermudans because
knowledge of the short rate is tantamount to the knowledge of all bond prices through the
closed reconstitution Formula (20), and thus once an interest rate tree is built along the
lines sketched above, we can immediately value all relevant payoffs. Thus, starting from
Tβ−1 , i.e., the last time the swaption can be exercised, and using the readily available bond
j
prices, we calculate for all nodes j in the current column the value RBSα,β ( Tβ−1 , K ) as:
!+
j

j
+
j
1 − P j ( Tβ−1 , Tβ )
P ( Tβ−1 , Tβ )δβ K − Sβ−1,β ( Tβ−1 ) = P ( Tβ−1 , Tβ )δβ K − (48)
δβ P j ( Tβ−1 , Tβ )

We then propagate these continuation values backward in time through the interval
Tβ−1 = td( β−1) > td( β−1)−1 > ... > td(0) = Tβ−2 using the Formula (46) and the transition
probabilities, obtaining “backwardly cumulated” continuation values for each node:
j
Vc (ti ) = pu Vck+1 (ti+1 ) + pm Vck+1 (ti+1 ) + pd Vck+1 (ti+1 ), i = d(0), ..., d( β − 1) − 1 (49)
Mathematics 2021, 9, 112 17 of 32

Once the exercise point Tβ−2 is reached we check the exercise opportunity by compar-
j
ing Vc ( Tβ−2 ) with the exercise value:
 + h i
j j
Ve ( Tβ−2 ) = K − S β−2,β ( Tβ−2 ) P j ( Tβ−2 , Tβ−1 )δβ−1 + P j ( Tβ−2 , Tβ )δβ (50)

and update the option value accordingly. Proceeding in this way we finally reach the first
exercise date Tα and subsequently the initial node of the tree with the time 0 swaption price.
While the method sketched in general terms above can be optimized and/or general-
ized along various dimensions (e.g., varying the discretization step, or including backward
propagation of bond prices where no analytical reconstitution formulas are available as in
the Black–Karasiński model) it should come as no surprise that the tree can only efficiently
handle low-factor short rate rate models. Building a binomial tree for the Hull–White two-
factor model is still relatively straightforward; however the added layers of computational
complexity seem to defeat the purpose of using a simple (simplistic?) model of the term
structure in the first place.

4.1.2. Finite Difference Schemes


The trinomial trees discussed above are in fact a special kind of the so-called explicit
finite difference scheme and as such are only conditionally stable. A more efficient method
of handling Bermudans in low-factor models, by directly discretizing the partial differential
equation for the swaption price, can be deployed as follows. Recall first that if x (t) is the
state variable of the model—driving the discount curve and all its derivables—then by
no-arbitrage arguments, the swaption price V (t, x ) can be expressed as the risk-neutral
expectation of the terminal payoff V ( T, x ( T )). Now, since V (t, x ) is a martingale, its drift
must be zero (in the appropriate measure), and hence applying Ito’s lemma and writing
down the dynamics dV, we find that V (t, x ) must satisfy the following parabolic partial
differential equation (PDE):
∂V (t, x )
+ AV (t, x ) = 0, (51)
∂t
where
EQ (V (t + ε, x + ε| x (t) = x ) − V (t, x )
A = lim (52)
ε →0 ε
is the infinitesimal operator for the SDE describing the dynamics of the state variable. For
example, in the special case of the Hull–White model, the European swaption price solves
the following parabolic PDE:

∂V (t, r (t)) 1 ∂2 V (t, r (t)) ∂V (t, r (t))


+ σ ( t )2 2
+ (θ (t) − κr (t)) = r (t)V (t, r (t))
∂t 2 ∂r ∂r
Analogously, in the Cheyette model, i.e., Equations (38) and (39), the pricing PDE
takes the following form:

∂V (t, x (t), y(t))


( x (t) + f (0, t))V (t, x (t), y(t)) =
∂t
1 ∂2 V (t, x (t), y(t))
+ σ ( t )2
2 ∂x2
∂V (t, x (t), y(t))
+(y(t) − κx (t))
∂x
∂V ( t, x (t), y(t))
+(σ(t)2 − 2κy(t)) .
∂y

Associated with both equations is of course the known terminal swaption payoff
condition.
Mathematics 2021, 9, 112 18 of 32

To solve Equation (51) numerically, we discretize it on some finite rectangular domain


[0, T ] × [ xmin , xmax ], where the grid boundaries xmin and xmax are either determined a priori
as some multiples of the variance of x ( T ) or within the PDE itself. The partial derivatives
are then replaced by corresponding finite difference operators, with schemes differing i.a.
in the nature of approximation. The central difference approximation of ∂V/∂t, known as
the Crank–Nicolson scheme, is often the method of choice (second-order convergent in ∆t),
whereas the backward approximation—given the explicit scheme—should be avoided for
stability and convergence considerations. Specifically, for the Crank–Nicolson scheme,
     
1 t i +1 + t i 1 t i +1 + t i
I − ∆t T V ( ti ) = I + ∆ t T
b b ( t i +1 ),
V (53)
2 2 2 2

b ( t ) = [V
where I is the identity matrix, V b (t, xmax )]| and V
b (t, xmin ), ..., V b (t, x ) stands for the
approximation to the true solution V (t, x ), and T is a tri-diagonal matrix of coefficients
derived from approximating PDE terms. For a known V b (ti+1 ), (53) is simply a linear system
of equations to be solved by standard methods. Thus, starting from the terminal condition
V ( T, x )—where the swaption payoff is known—we use Equation (53) to iterate backwards
in time until we get to Vb (0). Of course, since we are dealing with Bermudan swaptions,
at each time step corresponding to exercise time, we additionally need to compare the
rollback value with the exercise value—just as has been done on the trinomial tree.
Multi-dimensional PDEs for pricing within the two-factor Hull–White or Cheyette
models can still be handled efficiently using the so-called alternating direction implicit
method (ADI), introduced by Craig and Sneyd [51]. However, it should be borne in mind
that the computational complexity of such problems grows exponentially in the dimension,
making it prohibitively inefficient to use finite difference schemes for pricing Bermudans
in truly multi-factor models of the term structure.

4.1.3. Monte Carlo


An obvious conclusion would therefore be to switch to Monte Carlo simulation, where
the computational cost grows only linearly in dimension. Unfortunately, by design, Monte
Carlo simulation works forward in time and therefore dynamic programming of the kind
we used above to value Bermudans is not straightforward to implement. A brute-force
approach to computing the conditional expectation underlying the continuation value at
Tn , as per Equation (46), would consist in nesting a new simulation— within the original
one—with an initial condition Sn,β ( Tn ), for every Tn and each simulation path, and then
calculating the expectation directly. Thus, if the simulation consistently uses M paths, then
at time 0 we have M paths, at t1 we generate additional M2 paths, and so on until at the
penultimate exercise date t N = Tβ−1 we produce another M N simulation paths (this time
spanning just the final fixing period [ Tβ−1 , Tβ ]). Clearly, even for a modest choice of M and
N the computational burden of such an embedded scheme becomes prohibitive.
This subtle yet crucial point had arguably been one of the key hindrances holding
back the financial industry’s embrace of multi-factor market models. Whether many
factors per se are desirable for accurate valuation of Bermudans or not—we shall see
below that the question is in general not trivial—if they are not amenable to efficient
numerical implementation, then they are of little practical significance. A breakthrough
came with Longstaff and Schwartz [25], who adapted earlier ideas found in Tilley [52] and
Carriere [53], suggesting that the continuation value (46) can be interpreted as a regression
of the one-step-ahead payout on the state variables. Indeed, recall that
 

Vc ( Tn ) = E RBSα,β ( Tn+1 , K ) Sn,β ( Tn ) , n = β − 1, β − 2, ..., 1.
(54)

Since the conditional expectation of the swaption payoff is an element of the space of
square-integrable functions, it has a countable orthonormal basis and can be represented
Mathematics 2021, 9, 112 19 of 32

as a linear function of the elements of the basis. Moreover, in a Markovian model, only
current values of the state variables are necessary, so we can write:

Vc ( Tn ) ≡ Ψn ( x ( Tn )) = Ψn ( x1 ( Tn ), ..., xd ( Tn )), (55)

for some unknown function Ψn and d Markov state variables x1 (t), ..., xd (t). Longstaff
and Schwartz [25] suggest estimating Ψn —and hence the continuation value—by a linear
combination of a finite set of J basis functions
J
∑ λn,j ψj (x(Tn )),
J
Ψn ( x ( Tn )) ≈ Ψn ( x ( Tn )) = (56)
j =1

where the weights λn,j are determined by least-squares regression on the Monte Carlo
paths (typically, to improve quality of fit and run-time performance, only in-the-money
paths may be considered, which limits the region on which the conditional expectation is
to be estimated). Since the values of the basis functions are independently and identically
cJ
distributed, it can be shown that the fitted value Ψn ( x ( Tn )) converges in mean-square
J
and in probability to Ψn ( x ( Tn )) as the number of paths goes to infinity, which justifies
the method.
Once the basis functions and their loadings are determined, valuation follows the
dynamic programing method applied above in the context of finite difference schemes
and lattices. Thus the algorithm to perform least-squares Monte Carlo (LSMC) to value a
Bermudan might look as follows (for concreteness we assume the pricing model is LMM):
1. Start by specifying and calibrating a desired form of the LMM (number of factors,
correlation etc.) as well as the numeraire for the simulation;
2. Simulate the desired number of paths ω1 , ..., ω M for the underlying LIBOR rates;
3. Calculate the numeraire, the value of the underlying swap, the values of the basis
functions and the exercise values Ve along each path and on all relevant exercise dates;
 +
4. Set Vbc ( Tβ−1 ) = 0 and Ve ( Tβ−1 ) = RBSα,β ( Tβ−1 , K ) = P( Tβ−1 , Tβ )δβ K − Sβ−1,β ( Tβ−1 ) ;
5. Going backward in time, i.e., for n = β − 2, ..., 1, calculate the weights on the basis functions
J
Vbc ( Tn ) = ∑
j =1λ
d n,j ψ j ( x ( Tn )) along all paths ω through standard error minimization:

 2
J
∑RBSωα,β (Tn+1 , K) − ∑ λd
n,j ψ j ( x ( Tn ))
 → min (57)
ω j =1

and use the resulting estimates


 to update the swaption values
RBSα,β ( Tn , K ) = max Ve ( Tj ), Vbc ( Tn ) ;
6. Return RBSα,β ( T0 , K ).
Several remarks are in order at this point. First, as should be clear from the discussion
above, the robustness of our valuation scheme is critically affected by the choice of the
explanatory variables in the regression, i.e., the form and number of the basis functions.
In their formulation, Longstaff and Schwartz [25] recommend using a polynomial basis,
i.e., monomials of the state variables. While sound in principle, this approach might be
impractical in the context of LMM due to the possibly high dimension of the vector of
state variables (core LIBOR rates). Note also that with too many regressors, the quality
of the parameter estimates is likely to deteriorate, given a fixed number of Monte Carlo
paths. For example, Glasserman and Yu [54] show that the number of regression variables
for which accurate estimation is possible from M paths is O(log M). Thus with, say,
50,000 paths we should not use more than four variables. A popular choice is to choose
i.a. the front LIBOR rate (fixing on the exercise date), and several swap rates capturing the
overall level and shape of the yield curve.
Second, unlike in lattices or finite difference schemes, in LSMC, exercise decisions will
be sub-optimal by design as they rest on estimates of continuations values. At the same
Mathematics 2021, 9, 112 20 of 32

time, to the extent that the same set of paths is used to determine both the parameters
λ n,j and continuation values, the price estimate might be biased upwards. To reduce bias
d
and make sure that the regression generates a lower bound on the option price, we could
use two independent Monte Carlo simulations: the first pilot run to fit the coefficients
and then a second run to estimate continuation values for a newly seeded set of paths
based on the previously determined decision rule. To gauge the quantitative effect of
sub-optimal exercise decisions we conducted a simple experiment in which we priced
the same 20-non-call-10 Bermudan swaption with a one-factor Hull–White model using a
standard Crank–Nicolson finite-difference grid and LSMC. The model was calibrated to
ATM European swaptions with final 20Y maturity, and the mean reversion rate was in each
case set at 0.03. The results are shown in Figure 4 and clearly demonstrate the downward
bias of the Monte Carlo with spreads relative to the exact finite-difference scheme in the
order of 40–50 bps.

Figure 4. Comparison of “20-non-call-10” Bermudan swaption prices calculated with a one-factor


Hull–White model using a finite difference scheme and Least Square Monte Carlo (in percent; USD
market data as of September 2020). Note: for reference, the LSMC uses 25,000 presimulation paths to
fit the coefficients and then another 25,000 with antithetic variates (a total of 50,000 paths) to estimate
continuation values based on the previously determined decision rule; we used twelve time steps
per year and standard Euler discretization; the Crank–Nicolson finite difference gird has a resolution
of 20 time steps per year and 400 space steps.

4.2. Factor Dependence of Bermudan Swaptions


Having presented the main categories of models and implementation techniques
that can be used to price Bermudan swaptions we now turn to the issue of their factor
dependence. In particular, using a set of numerical experiments we revisit the debate
between Longstaff, Santa-Clara, and Schwartz [9] and Andersen and Andreasen [23], and
the threats posed by the modeling equivalents of Scylla and Charybdis mentioned in the
introduction.
Consider once again a “Tβ no-call Tα ” Bermudan receiver swaption, which, as we
know, can be exercised into one of the so-called core swap rates:

Sα,β ( Tα ), Sα+1,β ( Tα+1 ), ..., S β−1,β ( Tβ ) (58)

at either one of the designated exercise times Tα , Tα+1 , ..., Tβ , at the same strike rate K. In an
important paper Longstaff, Santa-Clara, and Schwartz [9] made a point that this “chooser-
option” character of a Bermudan makes is critically sensitive to the correlation among the
Mathematics 2021, 9, 112 21 of 32

core rates. Drastically simplifying their argument, we can imagine that when the individual
Si,β , S j,β are perfectly correlated, so too should be the respective exercise values. Hence,
there is likely to be little value added provided by the option to postpone the exercise.
By contrast, the argument goes, when correlation between Si,β , S j,β is low or even negative,
a Bermudan’s right to choose the exercise date will have more value. Indeed, if the exercise
value of the first swaption happens to be positive, it is worth exercising sooner, as payoffs
further down the line are likely to turn negative. A straightforward conclusion follows that
since single-factor models by design imply perfect correlation between core swap rates,
they should artificially under-price Bermudan swaptions, making traders “throw away
billions of dollars.” From a theoretical point of view, the result—coupled with advances in
numerical techniques under the guise of LSMC— provided a powerful argument in favor
of switching from short-rate to multi-factor market models.
And yet, Andersen and Andreasen [23] pointed out a subtle flaw in the above reason-
ing: to the extent that our Bermudan is viewed as a chooser option, one has to bear in mind
that the choice we refer to is made at different points in time: Tα , Tα+1 , ..., Tβ . Consequently,
what matters for pricing is not the instantaneous correlation
 
E dSi,β (t)dS j,β (t) = ρi,j (t)dt (59)

between the Brownian motions driving the dynamics of the core swap rates, but rather the
forward or serial correlation between the rates setting at different dates Ti and Tj , which can
be very roughly approximated as:
R Ti
0 σi (t)σj (t)ρi,j dt
qR r , (60)
Ti 2 R Tj 2
0 σi ( t ) dt 0 σj ( t ) dt

and depend not just on ρi,j but also on the volatilities of Ti - and Tj -maturity rates. This shows
that while single-factor models do indeed imply perfect instantaneous correlation, they can
de-correlate core rates setting at different times (through the time-dependent volatilities),
with non-trivial consequences for Bermudan swaptions prices.
To appreciate this last point, we conducted our own set of numerical experiments
using the modeling tools described above. Specifically, we priced Bermudan swaptions
on a 5Y and 10Y swap with quarterly payment frequency and time to first exercise set
at 1, 5 and 10 years. The strikes in each case were set to ATM levels based on the USD
interest rate curve as of 11 September 2020. We used the Hull–White one-factor model
(HW1F) implemented on a finite-difference grid as a benchmark and considered its two-
factor extension (HW2F), and four versions of the LIBOR market model assuming 2-,
3-, 4- and full-factor parameterization. In line with standard practice, we used a piece-
wise constant volatility function and Rebonato’s two-parameter correlation structure of
the form ρn,m = ρ∞ + (1 − ρ∞ ) exp(−ρdec |n − m|), for the full-factor model. We used
25,000 pre-simulation paths to fit the coefficients and then another 25,000 with antithetic
variates (a total of 50,000 paths) to estimate continuation values based on the previously
determined decision rule.
The results of our numerical experiments, shown in Figure 5, are generally in line with
those reported in a much more comprehensive study by Andersen and Andreasen [23]
and seem to confirm the de-correlation possibilities of low-factor models. Indeed, prices
produced by the HW1F model are consistently higher than those derived from the two-,
three- and very high-factor market models. For longer-maturity products this pattern can
be quite pronounced with the relative error for the 20-nc-10 swaption reaching almost 15%.
Whereas a part of this pattern may be due to different numerical procedures employed
(grid solution for HW1F and LSMC for other models), our earlier tests, reported in Figure 4,
show that this cannot be the entire explanation. Indeed, to assess the dependence of
swaption prices on rate correlation, we used our calibrated full-factor LMM to reprice
the same 10 × 10 Bermudan receiver (struck at 1.33%) varying base correlation ρ∞ from
Mathematics 2021, 9, 112 22 of 32

0.1% to 0.9% (Figure 6). Again, the results show that increasing correlation increases, not
decreases, Bermudan swaption prices.

Figure 5. Factor dependence: prices of Bermudan swaptions on 5Y (left-hand panel) and 10Y swaps (right-hand panel)
with time to first exercise set at 1, 5 and 10 years calculated using single- and multi-factor models calibrated to USD market
data as of 11 September 2020 (in percent).

Figure 6. Prices of 20-nc-10 Bermudan swaption in a full-factor LIBOR market model under different
base correlation assumptions.

To provide some intuition for these results, we recall that a forward swap rate is a
complex average of underlying forward rates, so we can write:

β
Sα,β (t) = ∑ wi (t) Fi (t), (61)
i = α +1
Mathematics 2021, 9, 112 23 of 32

where the weights are given by:

∏ij=α+1 1
1+δj Fj (t)
wi ( t ) = . (62)
∑k=α+1 δk ∏kj=α+1 1
β
1+δj Fj (t)

If we now freeze the weights at their time-zero values—as originally proposed by


Jackel and Rebonato [55], Rebonato [56] and justified by the fact that the variability of wi s
is significantly smaller than the variability of the LIBOR rates—we can differentiate (61)
to obtain:
β β
dSα,β (t) ≈ ∑ wi (0)dFi (t) = (...)dt + ∑ wi (0)σi (t) Fi (t)dWi (t). (63)
i = α +1 i = α +1

Calculating the quadratic variation and approximating it further by freezing all rates
at their time-zero values, we obtain the following approximate expression for integrated
percentage variance, or (squared) swap rate volatility:

β
Z Tα
dSα,β (t)dSα,β (t) wi (0)w j (0) Fi (0) Fj (0)ρi,j Z Tα
0 Sα,β (t)2
≈ ∑ Tα Sα,β (0)2 0
σi (t)σj (t)dt (64)
i,j=α+1

which must be plugged into the Black Formula (12) to recover the market price of a
European swaption on the swap fixing in Tα and maturing in Tβ . Now, if the volatilities of
the forward rates spanning the swap are held fixed (e.g., through thorough calibration),
then decreasing ρi,j by moving to a lower-factor model or by adjusting the base correlation
parameter ρ∞ will necessarily also reduce the swap rate volatility—and hence ultimately
the price of the Bermudan.
Although other subtle effects can also play a role, justifying more extensive tests
than allowed by the scope of this essay (see, e.g., [57,58] and as usual [7] for a general
reference), we can nevertheless safely conclude at this point that factor dependency does
seem to be an issue in pricing Bermudan swaptions. This empirical conclusion validates
our perspective on the evolution of Bermudan swaption pricing as driven by a difficult
choice between a more cumbersome and difficult to sensibly calibrate low-factor model
that can be more efficiently implemented (Charybdis) and a multi-factor model that allows
much more calibration freedom, but raises problems on the numerical implementation
front and produces sub-optimal exercise decisions.

5. Markovian Projection and Dimension Reduction


We now consider a novel approach to deal with the “curse of dimensionality”, which
rests on the idea of Markovian projection and the seminal contribution of Gyöngy [28],
who showed that for every stochastic process with an Ito differential:

dξ (t) = δ(t)dW (t) + β(t)dt, ξ (0) = 0 (65)

there exists a stochastic differential equation:

dχ(t) = γ(t, χ(t))dW (t) + b(t, χ(t))dt (66)

having the same one-dimensional probability distribution as ξ (t) for every t, with γ(t, χ)2 =
E(δ(t)2 |ξ (t) = χ) and b(t, χ) = E( β(t)|ξ (t) = χ). Below we extend and generalize this
result to non-probabilistic measures. This will be more relevant to option pricing, which
Mathematics 2021, 9, 112 24 of 32

requires taking expectations of discounted terminal payoffs. Concretely, we will show


below that for every pair of stochastic processes:

dξ (t) = δ(t)dW (t) + β(t)dt, ξ (0) = 0


dZ (t) = z(t)dW (t) + ζ (t)dt

there exist Markovian counterparts (χ, U ) satisfying the following Ito equations:

dχ(t) = δ(t, χ(t))dW (t) + β(t, χ(t))dt


dU (t) = U (t)ζ (t, χ(t))dt
h i h i
such that the values E Z (t)1{ξ (t)>K } and E U (t)1{χ(t)>K } coincide for every t and K.
h i
If the function ζ is negative, then E U (t)1{χ(t)>K } = Prob(χ(t) > K ), where χ is a
Markov process with Ito differential dχ(t) = δ(t, χ(t))dW (t) + β(t, χ(t))dt and killing
rate −ζ (t, χ). Furthermore, the infinitesimal generator A of such a process is given by:

1
Ah(χ) = δ(t, χ)h00 (χ) + β(t, χ)h0 (χ) + ζ (t, χ)h(χ). (67)
2
The practical significance of this result is easy to see. Since ξ (·) and χ(·) have the same
marginal probability distributions, it follows that prices of all European options on ξ (·)
and χ(·) will be equal, for any expiration time t. For our present purposes this means that
even if the swap rate evolves according to some very high-factor model with a stochastic
volatility, we can replace it with a much simpler model by “localizing” its parameters,
effectively reducing dimensionality. And while in theory this only guarantees that we
match European options prices, we shall see below that a Markovian process for the swap
rate—with a number of freezing approximations—provides a simple and robust tool for
pricing Bermudan swaptions.

5.1. Markovian Projection for the Swap Rate


Consider once again the forward swap rate Sα,β defined in continuous-time convention
by Equation (10), and the corresponding Tα -expiry European swaption with time-zero price:

Nα,β ( Tα )
 
+
E Q
Sα,β ( Tα ) − K . (68)
B( Tα )
If both the expectation and the underlying swap rate are viewed, counter-intuitively,
as functions of Tα = t, then (for fixed Tβ = T) there arise two spot processes:
 
N (t) +
C (t, K ) ≡ E Q
(S(t) − K ) , S(t) ≡ St,T (t) (69)
B(t)

N (t)
For notational convenience denote D (t) = B(t)
and F (t) = D (t)S(t) so that:
h i
C (t, K ) = EQ D (t)(S(t) − K )+ . (70)

Now, suppose that the dynamics of S(t) is described by some potentially complicated
high-factor model with the following SDE:

dS(t) = ϕ(t)dW (t) − (...)dt (71)

Note that the notation suggesting that dW (t) is one-dimensional involves no loss of
generality since for any multi-dimensional process we can always find a diffusion and
one-dimensional Brownian motion such that laws of both processes exactly match.
Mathematics 2021, 9, 112 25 of 32

It is straightforward to calculate that:

dD (t) = (...)dW1 (t) − B(t)dt (72)

and
dF (t) = (...)dW2 − B(t)r (t)dt. (73)
for some Wiener processes W1 and W2 .
We now present the main result in the form of the following theorem.

Theorem 1. Let S(t), D (t) and F (t) be stochastic processes with general Ito differentials (71)–(73).
Then there exists a pair of processes ( Z (t), Q(t)) with dynamics

dZ (t) = ϕ(t, Z (t))dW (t) + ( g(t, Z (t)) − Z (t)h(t, Z (t)))dt (74)

dQ(t) = Q(t)h(t, Z (t)) (75)


suchh that the discounted
i one-dimensional
h distributions
i of S(t) and Z (t) are equivalent, i.e.,
Q Q
E D (t)1{S(t)>K} = E Q(t)1{Z(t)>K} for every t, K. The coefficients ϕ,g, h are non-
ϕ2 (t, t
 2 
random
h and have theifollowing simple interpretation:
h i x ) = E ϕ (t)|S(t) = x , g(t, x ) =
−Et Nr((tt)) S(t) = x , and h(t, x ) = −Et N1(t) S(t) = x , where Et [·] is the expectation under

t-forward measure defined by:


EQ [ D (t)ξ ]
Et ( ξ ) ≡ Q (76)
E [ D (t)]
for any ξ.

Remark 1. The Theorem above can be seen as a generalization of the result due to Gyöngy [28], who
first observed that it is possible to construct a process that mimics the one-dimensional distributions
of a given Ito process, but is simpler in the sense that its drift and diffusion terms are non-random,
localized versions of the drift and diffusion of the original process (which in general may themselves
be stochastic processes). We offer a reinterpretation and extension of the result to cover non-
probabilistic and discounted distributions relevant in option pricing. Therefore, the proof we
present below is finance-motivated and starts by considering the swaption payoff spot process as per
Equation (70).

Proof. By Tanaka’s formula,

1
d(S(t) − K )+ = 1{S(t)>K } dS(t) + δ(S(t) − K )dhS(t)it . (77)
2
Hence, recalling Equation (70), we can calculate the dynamics of C (t, K ) as: use

 1
d D (t)(S(t) − K )+ = ϕ2 (t)δ(S(t) − K ) D (t)dt + 1{S(t)>K } [ D (t)dS(t) + dhS, D it ] + (S(t) − K )+ dD (t)
2
1 2
= ϕ (t)δ(S(t) − K ) D (t)dt + 1{S(t)>K} [ D (t)dS(t) + dhS, D it + (S(t) − K )dD (t)]
2
1 2
= ϕ (t)δ(S(t) − K ) D (t)dt + 1{S(t)>K} [dF (t) − KdD (t)].
2
Taking the expected value of both sides under the risk neutral measure Q yields:
 
∂C (t, K ) 1 h i
= EQ D (t)δ(S(t) − K ) ϕ2 (t) + EQ 1{S(t)>K} (− B(t)r (t) − KB(t)) . (78)
∂t 2
Mathematics 2021, 9, 112 26 of 32

Since C (t, x ) = EQ [ D (t)(S(t) − x )+ ] = EQ [ D (t)]Et [(S(t) − x )+ ], the probability dis-


tribution of S in the t-forward measure Qt is:

1
∂2 C (t, x ). (79)
E [ D (t)] xx
Q

Hence,
 
Q 1 1 h i
E D (t)δ(S(t) − K ) ϕ (t) = EQ [ D (t)]Et δ(S(t) − K ) ϕ2 (t)
2
2 2
1 h i
= E[ D (t)]Et [δ(S(t) − K )]Et ϕ2 (t)|S(t) = K
2
1 ∂2 C (t, x ) t h 2 i
= E ϕ ( t )| S ( t ) = x . (80)
2 ∂x2
Observe now that:
 
h i D (t)
EQ 1{S(t)>K} (− B(t)r (t) − KB(t)) = EQ 1{S(t)>K } (− B(t)r (t) − KB(t))
D (t)
− B(t)r (t) − KB(t)
 
= EQ [ D (t)]Et 1{S(t)>K} .
D (t)

Pick a sequence ( xn )∞
n=0 of evenly spaced numbers such that x0 = K, xi +1 − xi = ∆
and limn→∞ xn = ∞. Then we may write:
  ∞  
B ( t )r ( t ) B ( t )r ( t )
t
E 1{S(t)>K} = ∑ E 1{ xi <S(t)< xi+1 }
t
D (t) i =0
D (t)
∞  
t B ( t )r ( t )
= ∑E x < S ( t ) < x i +1 Pt ( x i < S ( t ) < x i +1 ).

i =0
D (t) i

Letting ∆ → 0, and using Equation (79), we obtain:


  Z ∞ 2  
t B ( t )r ( t ) 1 ∂ C (t, x ) t B(t)r (t)
E 1{S(t)>K} = Q E S(t) = x dx. (81)
D (t) E [ D (t)] K ∂x2 D (t)

Analogously,
Z ∞
∂2 C (t, x )
   
KB(t) 1 t B(t)
Et 1{S(t)>K} ∂2xx

= Q K E S ( t ) = x dx. (82)
D (t) E [ D (t)] K ∂x2 D (t)

Using (80)–(82) we can finally re-express Equation (78) as:


Z ∞ 2
∂C (t, x ) 1 ∂2 C (t, x ) 2 ∂ C (t, x )
= ϕ (t, x ) + [ g(t, x ) − Kh(t, x )]dx. (83)
∂t 2 ∂x2 K ∂x2
Differentiating (83) twice with respect to the strike rate gives us the Fokker–Planck
equation:

∂3 C (t, x ) 1 ∂2 ∂2 C (t, x ) 2
 
= ϕ ( t, x ) (84)
∂t∂x2 2 ∂K2 ∂x2
 2
∂2 C (t, K )

∂ ∂ C (t, x )
+ 2
[ g ( t, x ) − Kh ( t, x )] − h(t, K )
∂K ∂x ∂K2
Mathematics 2021, 9, 112 27 of 32

which describes the time evolution of the probability density implied by swaptions prices
{C (t, K }t,K . Note that the unique Markovian solution to Equation (84) is a pair of processes
( Z (t), Q(t)) with the following dynamics:

dZ (t) = ϕ(t, Z (t))dW (t) + ( g(t, Z (t)) − Z (t)h(t, Z (t)))dt (85)

dQ(t) = Q(t)h(t, Z (t)) (86)


It follows that ∂2xx C (t, x )
is the “Q-discounted” density of Z (t) and clearly also the
“D-discounted” density of S(t), as desired.

Remark 2. The discounting referred to in the proof is crucial: while the distributions of S(t) and
Z (t) are not equivalent, it nonetheless holds that:
h i h i
EQ D (t)1{S(t)>K} = EQ Q(t)1{Z(t)>K} (87)

and non-probabilistic measures PD [S(t) > K ] and PQ [ Z (t) > K ] are equivalent.

As a corollary, observe that the equivalence of non-probabilistic measures implies the


equivalence of swaption prices:
h i h i
C (t, K ) = EQ D (t)(S(t) − K )+ = EQ Q(t)( Z (t) − K )+ , (88)

which in turn implies that functions h(t, K ) and g(t, K ) are given in closed form. Indeed,

∞ ∂2 C (t, x )
  Z t   Z
EQ exp − r (s)ds 1{S(t)>K } = − h(t, x )dx (89)
0 K ∂x2

and h  R  i
−1 ∂EQ t

∂2 C (t, x ) exp − 0 r (s)ds 1{S(t)>K }
h(t, K ) = . (90)
∂x2 ∂K
Analogously,
  Z t   Z ∞ 2
∂ C (t, x )
E Q
exp − r (s)ds 1{S(t)>K } r (t) = − g(t, x )dx, (91)
0 K ∂x2

and h  R  i
−1 ∂EQ t

∂2 C (t, x ) exp − 0 r (s)ds 1{S(t)>K } r (t)
g(t, K ) = . (92)
∂x2 ∂K
Note that owing to the matching of one-dimensional (discounted) distributions,
the equality of European options prices on S(·) and Z (·) is exact. However, Markovian
projection does not preserve higher-order distributions, and therefore, in general, (88) does
not extend automatically to Bermudan swaptions. That said, we can hope that:

Nτ,β (τ )
 
+
RBSα,β (t, K ) = sup EQ K − Sτ,β (τ ) (93)
τ ∈T B(τ )
h i h i
= sup EQ D (τ )(K − S(τ ))+ ≈ sup EQ Q(τ )(K − Z (τ ))+ (94)
τ ∈T τ ∈T

Whether and to what extent Equation (93) holds is ultimately an empirical matter,
but our numerical experiments reported below suggest that the approximation is in fact
good enough to justify its practical use.
Mathematics 2021, 9, 112 28 of 32

5.2. Parameter Approximations and Numerical Experiments


Our strategy now is to calculate the right-hand side of (93) and compare the obtained
price—across a range of strikes—to RBSα,β (t, K ) determined using a proper multi-factor
model. Perhaps the most straightforward way involves stripping the functions g(·, ·) and
h(·, ·) off the interest rate curve and market swaptions prices using Formulas (90)–(92). But
we outline here an even simpler strategy suggested in Gatarek and Jabłecki [59]. Concretely,
we assume that
P(0, T )
P(t, T ) ≈ , (95)
P(0, t)
which implies that
Z T Z T
P(0, t) P(t, s)ds ≈ P(0, s)ds, (96)
t t
so that

r (t) r (t)S(t) f (0, t) P(0, t)


= ≈ S(t) (97)
N (t) 1 − P(t, T ) P(0, t) − P(0, T )
and
1 P(0, t)
≈ RT (98)
N (t)
t P (0, s ) ds
f (0,t) P(0,t) P(0,t)
Set p(t) ≡ P(0,t)− P(0,T )
and q(t) ≡ RT . Under these approximations, the inte-
t P(0,s)ds
grand in (83) simplifies and so does the related Fokker-Planck equation which now admits
a unique solution  
e (t) + Se(t)(q(t) − p(t))dt,
dSe = σ t, Se(t) dW (99)

with
e (t) = − Q
dQ e (t)q(t)dt. (100)
Note, in passing, that the dynamics (99) is not only one-dimensional but also resambles
one typically used in the equity space to model the evolution of stock prices where the
risk neutral drift of a share of stock is made up of the risk-free rate adjusted by the stock’s
dividend yield. Here, in the drift term q(t) − p(t), q(t) can be thought
 of as the “short rate”
 + 
and p(t)—the “dividend yield.” The calculation of supτ ∈T E Q Q e (τ ) K − Se(τ ) should
now be a straightforward exercise, and can be handled either in a finite difference pricer or
LSMC simulation. We favor the former approach. Concretely, we now set Tα1 = 1, Tβ1 = 6
and Tα2 = 1, Tβ2 = 11 for two sample Bermudans: a 6-nc-1 and an 11-nc-1, respectively.
As above, we use USD interest rate curve and volatility data as of September 2020.
First, we calculate p(t) and q(t) using yield curve data which we then use—along
with market prices of co-terminal swaptions—to derive the local volatility surface from the
approximated PDE:

∂t C (t, K ) − (q(t) − p(t))K∂K C (t, K ) + p(t)C (t, K )


σ(t, K )2 ≈ 2 . (101)
∂2KK C (t, K )

The result of this stripping exercise is shown in Figures 7 and 8.


Mathematics 2021, 9, 112 29 of 32

Figure 7. Implied volatility surface and its corresponding local volatility surface stripped using (101). Note: Terminal swap maturity is
fixed at 6 years; US market data as of 28 September 2020; to avoid “sinkholes” local volatility floored at zero.

Figure 8. “Short-rate-like” and “dividend-like” parameters q(t) and p(t) of the process (99) (stripped
f (0,t) P(0,t) P(0,t)
from yield curve data as of 28 September 2020). Note: p(t) = P(0,t)− P(0,T ) and q(t) = R T ,
t
P(0,s)ds
T = 6; plot truncated due to explosive behavior of p with t → T; q(6) = 12.

Once the local volatility has been calculated (separately for each of the swap maturity
dates T = 6 and T = 11), we set up the finite difference gridp
for the process (99). In each case
the boundaries are selected such that Smax = E(S( T )) + 4 Var (S( T )) and Smin = −Smax
which translate to ±0.06 for T = 6 and ±0.08 for T = 11. For reference, we use 400 time
steps and 400 grid intervals and solve the valuation PDE using the Crank-Nicolson method
with Gauss-Seidel successive overrelaxation for improved convergence. Calculation time
in both cases is instantaneous (Figure 9).
We benchmark valuation results against the whole class of models discussed above:
the market standard Hull-White one-factor model, the two-factor Cheyette model, as well
as the multi-factor Libor Market Model, obtaining the results shown in Figure 10.
Perhaps the first most striking observation is that the prices for strike rates ranging
from 0% to 3% derived from the respective models tend to be generally closely aligned
(Cheyette appears to underprice high-strike swaptions which may be due to the Monte
Carlo implementation). Remarkably, the approximated model—i.e., the “poor man’s”
approach, featuring an evolution of only a single process—produces results which seem
to be within normal market bid-ask spreads (not to mention model-based uncertainty) of
those calculated using a fully-fledged multi-factor Libor Market Model, or a local-volatility-
enhanced Cheyette term structure model. As we have stressed above, the “poor man’s”
Mathematics 2021, 9, 112 30 of 32

model can be readily implemented in a finite difference scheme, with minimal adaptations
to standard American equity option pricers, which ensures faster convergence as compared
to the Cheyette Monte Carlo implementation (see again Figure 9). This demonstrates not
only the fitting capability of the model, but—perhaps equally important—that Markovian
projection in our extended framework can be fruitfully employed to valuing derivatives
with early exercise features.

Figure 9. Convergence profile of the Crank-Nicolson scheme implemented for the 6nc1 Bermu-
dan swaption.

Figure 10. Model prices of a 6nc1 (lhs panel) and 11nc1 (rhs panel) Bermudan receiver swaptions
with 100 notional.

6. Conclusions
We have tried above to sketch our perspective on several decades of evolution in
modeling the term structure of interest rates, with a particular focus on the travails involved
in pricing Bermudan swaptions. While our treatment is by no means exhaustive, we hope
to have shed light on the fascinating “tug of war” between the pursuit of calibration and
pricing precision, tractability and implementation efficiency. The curse of dimensionality
presented with recourse to Greek mythology boils down to the fact that the exercise
boundary between the continuation area and stopping area is inherently complex and
multi-dimensional for interest rate products. Ironically, therefore, the better we can capture
and describe the product itself (i.e., the more factors we use), the poorer and less exact
our handle on the exercise boundary is. And yet, we believe that our extension of the
classic approach due to Gyöngy [28] promises a potentially attractive way past both the
“Scylla” and “Charybdis” paths. Specifically, on the theoretical side, we show that it is
possible to construct a process that mimics the discounted marginal distributions of a given
Ito process, but is considerably simpler in the sense that its drift and diffusion terms are
Mathematics 2021, 9, 112 31 of 32

non-random, localized versions of the drift and diffusion of the original process (which in
general may themselves be stochastic processes). This means that for purposes of swaption
pricing, a potentially complex and multidimensional process for the underlying swap rate
can be collapsed to a one-dimensional one. The empirical contribution complementing
our theoretical observation consists in demonstrating that even though we only match
the marginal distributions of the two processes, Bermudan swaption prices calculated
using such an approach appear well-behaved and closely aligned to counterparts from
more sophisticated models. And while much remains to be done in this area—not least
in understanding what exactly enforces the alignment between the true early exercise
boundary and the approximated ones—we believe that Markovian projection can be
further fruitfully exploited in pricing interest rate products with early exercise features.

Author Contributions: All authors contributed equally. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Guenther, R.; Lee, J. Heat conduction with radiating boundary conditions. J. Comput. Appl. Math. 1998, 88, 119–124. [CrossRef]
2. Kuske, R.A.; Keller, J.B. Optimal exercise boundary for an American put option. Appl. Math. Financ. 1998, 5, 107–116. [CrossRef]
3. Barles, G.; Burdeau, J.; Romano, M.; Samsoen, N. Critical stock price near expiration. Math. Financ. 1995, 5, 77–95. [CrossRef]
4. Black, F.; Scholes, M. The pricing of options and corporate liabilities. J. Political Econ. 1973, 81, 637–654. [CrossRef]
5. Geman, H.; Karoui, N.E.; Rochet, J.-C. Changes of numeraire, changes of probability measure and option pricing. J. Appl. Probab.
1995, 32, 443–458. [CrossRef]
6. Andersen, L.; Piterbarg, V. Interest Rate Modelling Volume; Atlantic Financial Press: London, UK, 2010.
7. Interest Rate Models-Theory and Practice: With Smile, Inflation and Credit; Springer Science & Business Media: Berlin/Heidelberg,
Germany, 2007.
8. Vasicek, O. An equilibrium characterization of the term structure. J. Financ. Econ. 1977, 5, 177–188. [CrossRef]
9. Longstaff, F.A.; Santa-Clara, P.; Schwartz, E.S. Throwing away a billion dollars: The cost of suboptimal exercise strategies in the
swaptions market. J. Financ. Econ. 2001, 62, 39–66. [CrossRef]
10. Brace, A.; Gatarek, D.; Musiela, M. The market model of interest rate dynamics. Math. Financ. 1997, 7, 127–155. [CrossRef]
11. Jamshidian, F. LIBOR and swap market models and measures. Financ. Stochastics 1997, 1, 293–330. [CrossRef]
12. Miltersen, K.R.; Sandmann, K.; Sondermann, D. Closed form solutions for term structure derivatives with log-normal interest
rates. J. Financ. 1997, 52, 409–430. [CrossRef]
13. Andersen, L.; Andreasen, J. Volatility skews and extensions of the Libor market model. Appl. Math. Financ. 2000, 7, 1–32.
[CrossRef]
14. Brigo, D.; Mercurio, F. Analytical pricing of the smile in a forward LIBOR market model. Quant. Financ. 2003, 3, 15–27. [CrossRef]
15. Zühlsdorff, C. Extended Libor Market Models with Affine and Quadratic Volatility; Verlag Dr. Kovac: Hamburg, Germany, 2002.
16. Andersen, L.B.; Brotherton-Ratcliffe, R. Extended Libor Market Models with Stochastic Volatility. J. Comput. Financ. 2005, 9.
[CrossRef]
17. Joshi, M.S.; Rebonato, R. A displaced-diffusion stochastic volatility LIBOR market model: motivation, definition and implementa-
tion. Quant. Financ. 2003, 3, 458–469. [CrossRef]
18. Piterbarg, V.V. Stochastic Volatility Model with Time-dependent Skew. Appl. Math. Financ. 2005, 12, 147–185. [CrossRef]
19. Wu, L.; Zhang, F. The stochastic volatility Libor market model. Risk 2001, 14, 105–109. [CrossRef]
20. Wu, L.; Zhang, F. Libor market model with stochastic volatility. J. Ind. Manag. Optim. 2006, 2, 199. [CrossRef]
21. LIBOR Market Model with Semimaringales. In Handbooks in Mathematical Finance, Topics in Option Pricing, Interest Rates and Risk
Management; Cambridge University Press: Cambridge, UK, 2001.
22. Jarrow, R.; Li, H.; Zhao, F. Interest rate caps “smile” too! but can the Libor market models capture the smile? J. Financ. 2007, 62,
345–382. [CrossRef]
23. Andersen, L.; Andreasen, J. Factor dependence of bermudan swaptions: Fact or fiction? J. Financ. Econ. 2001, 62, 3–37. [CrossRef]
24. Gatarek, D. Pricing of American swaptions in approximate lognormal model. In University of New South Wales Working Paper;
University of New South Wales: Sydney, Australia, 1996.
25. Longstaff, F.A.; Schwartz, E.S. Valuing American options by simulation: A simple least-squares approach. Rev. Financ. Stud. 2001,
14, 113–147. [CrossRef]
26. Glasserman, P. Monte Carlo Methods in Financial Engineering; Springer Science & Business Media: Berlin/Heidelberg, Germany,
2013; Volume 53.
27. Towards a general local volatility model for all asset classes. J. Deriv. 2019, 27, 14–30. [CrossRef]
Mathematics 2021, 9, 112 32 of 32

28. Gyöngy, I. Mimicking the one-dimensional marginal distributions of processes having an Itô differential. Probab. Theory Relat.
Fields 1986, 71, 501–516. [CrossRef]
29. Dupire, B. Pricing with a smile. Risk 1994, 7, 18–20.
30. Hull, J.; White, A. Pricing interest-rate-derivative securities. Rev. Financ. Stud. 1990, 3, 573–592. [CrossRef]
31. Boyle, P.P. Immunization under stochastic models of the term structure. J. Inst. Actuar. (1886–1994) 1978, 105, 177–187. [CrossRef]
32. Black, F.; Karasinski, P. Bond and option pricing when short rates are lognormal. Financ. Anal. J. 1991, 47, 52–59. [CrossRef]
33. Cheyette, O. Term structure dynamics and mortgage valuation. J. Fixed Income 1992, 1, 28–41. [CrossRef]
34. Hull, J.C.; White, A.D. Numerical procedures for implementing term structure models II: Two-factor models. J. Deriv. 1994, 2,
37–48. [CrossRef]
35. Ritchken, P.; Sankarasubramanian, L. Volatility structures of forward rates and the dynamics of the term structure1. Math. Financ.
1995, 5, 55–72. [CrossRef]
36. Gatarek, D.; Jabłecki, J.; Qu, D. Non-parametric local volatility formula for interest rate swaptions. Risk 2016, 20, 120–124.
37. Andreasen, J. Turbo Charging the Cheyette Model (1 June 2001). Available online: https://ssrn.com/abstract=1719142orhttp:
//dx.doi.org/10.2139/ssrn.1719142 (accessed on 5 January 2021).
38. Chibane, M.; Law, D. A quadratic volatility Cheyette model. Risk 2013, 7, 68–71. [CrossRef]
39. Gatarek, D.; Jabłecki, J. A nonparametric local volatility model for swaptions smile. J. Comput. Financ. 2018, 21, 35–62. [CrossRef]
40. Heath, D.; Jarrow, R.; Morton, A. Bond pricing and the term structure of interest rates: A new methodology for contingent claims
valuation. Econom. J. Econom. Soc. 1992, 77–105. [CrossRef]
41. Qu, D. Manufacturing and Managing Customer-Driven Derivatives; John Wiley & Sons: Chichester, UK, 2016.
42. Anonymous. Back to the future. Risk 2005, 104–109.
43. Morini, M. Understanding and Managing Model Risk: A Practical Guide for Quants, Traders and Validators; John Wiley & Sons:
Hoboken, NJ, USA, 2011.
44. Volatility and Correlation: The Perfect Hedger and the Fox; John Wiley & Sons: Hoboken, NJ, USA, 2005.
45. Gatarek, D. Modelling without tears. Risk 2000, 20–24.
46. Gatarek, D.; Bachert, P.; Maksymiuk, R. The LIBOR Market Model in Practice; John Wiley & Sons: Hoboken, NJ, USA, 2007;
Volume 408.
47. Hagan, P.S. Methodology for Callable Swaps and Bermudan “Exercise into” Swaptions; Discussion Paper; Bloomberg: New York, NY,
USA, 2000.
48. Tavella, D.; Randall, C. Pricing Financial Instruments: The Finite Difference Method; John Wiley & Sons: Hoboken, NJ, USA, 2000;
Volume 13.
49. Mitchell, A.R.; Griffiths, D.F. The Finite Difference Method in Partial Differential Equations; Wiley: Hoboken, NJ, USA, 1980.
50. Hull, J.C.; White, A.D. Numerical procedures for implementing term structure models I: Single-factor models. J. Deriv. 1994, 2,
7–16. [CrossRef]
51. Craig, I.J.; Sneyd, A.D. An alternating-direction implicit scheme for parabolic equations with mixed derivatives. Comput. Math.
Appl. 1988, 16, 341–350. [CrossRef]
52. Tilley, J.A. Valuing American options in a path simulation model. Trans. Soc. Actuar. 1993, 45, 83–104.
53. Carriere, J.F. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insur. Math. Econ.
1996, 19, 19–30. [CrossRef]
54. Glasserman, P.; Yu, B. Number of paths versus number of basis functions in American option pricing. Ann. Appl. Probab. 2004,
14, 2090–2119.
55. Jackel, P.; Rebonato, R. The link between caplet and swaption volatilities in a Brace-Gatarek-Musiela/Jamshidian framework:
Approximate solutions and empirical evidence. J. Comput. Financ. 2003, 6, 41–60. [CrossRef]
56. Rebonato, R. Interest Rate Options Models; Wiley: Hoboken, NJ, USA, 1998.
57. Lvov, D.N. Monte Carlo Methods for Pricing and Hedging: Applications to Bermudan Swaptions and Convertible Bonds. Ph.D.
Thesis, ISMA Centre University of Reading, Reading, UK, 2005.
58. Svenstrup, M. On the suboptimality of single-factor exercise strategies for Bermudan swaptions. J. Financ. Econ. 2005, 78, 651–684.
[CrossRef]
59. Gatarek, D.; Jabłecki, J. Swap Rate à la Stock: Bermudan Swaptions Made Easy. Wilmott 2020, 2020, 44–51. [CrossRef]

You might also like