Diploma Thesis
Author: Ιωάννης Κοκοτσάκης
Supervisor: Β. Παπαδόπουλος, Professor, Ε.Μ.Π.
Athens
August 31, 2021
Introduction
The "time value of money" and uncertainty are the central elements that influence the value of financial instruments. When only the time aspect of finance is considered, the tools of calculus and differential equations are adequate. When only the uncertainty is considered, the tools of probability theory illuminate the possible outcomes. When time and uncertainty are considered together, we begin the study of mathematical finance. Finance is the study of the behavior of economic agents in allocating financial resources and managing risks across alternative financial instruments and in an uncertain environment. Well-known examples of financial instruments are bank accounts, loans, stocks, government bonds and corporate bonds. Mathematical finance is often characterized as the study of the more sophisticated financial instruments called derivatives. A derivative is a financial agreement between two parties that depends on something that happens in the future, for example the price or the performance of an underlying asset. The underlying asset could be a stock, a bond or a currency.
Summary
The "time value of money" and uncertainty are the central elements that influence the value of financial instruments. When only the time aspect of finance is considered, the tools of calculus and differential equations are adequate. When only the uncertainty is considered, the tools of probability theory illuminate the possible outcomes. When time and uncertainty are considered together we begin the study of mathematical finance. Finance is the study of economic agents' behavior in allocating financial resources and managing risks across alternative financial instruments and in time in an uncertain environment. Well-known examples of financial instruments are bank accounts, loans, stocks, government bonds and corporate bonds. Mathematical finance is often characterized as the study of the more sophisticated financial instruments called derivatives. A derivative is a financial agreement between two parties that depends on something that occurs in the future, for example the price or performance of an underlying asset. The underlying asset could for example be a stock, a bond or a currency. In the context of financial derivative pricing, there is a stage in which the asset model needs to be calibrated to market data. In other words, the open parameters in the asset price model need to be fitted. This is typically not done by historical asset prices, but by means of option prices, by matching the market prices of heavily traded options to the option prices from the mathematical model, under the so-called risk-neutral probability measure. In the case of model calibration, thousands of option prices need to be determined in order to fit these asset parameters.
Acknowledgements
I would like to thank Mr. Βησσαρίων Παπαδόπουλος for assigning me this thesis and giving me the opportunity to work on financial mathematics and its numerical applications. I would also like to thank Mr. Οδυσσέας Κόκκινος for his guidance and advice, which were crucial to the completion of this work.
Contents
1 Introductory definitions
  1.1 Interest rates
  1.2 Futures and Forwards
    1.2.1 Theoretical Relationships Between Spot, Forwards and Futures
    1.2.2 Accounting for Dividends
    1.2.3 No Arbitrage Range
  1.3 Options
    1.3.1 Foundations
    1.3.2 Put-Call Parity
    1.3.3 American Option
4 Optimization algorithms
  4.1 Genetic algorithm
Results
A Python codes
References
List of Figures
1.1 The no arbitrage range for the market price of a financial future
Chapter 1
Introductory definitions
For instance, $(1 + \tfrac{1}{2}R)^2$ is the annual compounding factor when interest payments are semi-annual and $R$ is the 1-year interest rate. In general, if a principal amount $N$ is invested at a discretely compounded annual interest rate $R$, which has $n$ compounding periods per year, then its value after $m$ compounding periods is
$$V = N\left(1 + \frac{R}{n}\right)^{m}.$$
It is worth noting that (taking $m = n$, i.e. one year of compounding)
$$\lim_{n \to \infty}\left(1 + \frac{R}{n}\right)^{n} = \exp(R),$$
and this is why the continuously compounded interest rate takes an exponential form.
Example 1.1.1. Find the value of $500 in 3.5 years' time if it earns interest of 4% per annum and interest is compounded semi-annually. How does this compare with the continuously compounded value?
$$V = 500 \times \left(1 + \frac{0.04}{2}\right)^{7} = \$574.34$$
But under continuous compounding the value will be greater: $500\exp(0.04 \times 3.5) = \$575.14$.
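These compounding calculations are easy to check numerically; the following Python sketch reproduces the figures of Example 1.1.1:

```python
import math

def compound_value(N, R, n, m):
    """Value of principal N after m periods at annual rate R with n periods per year."""
    return N * (1 + R / n) ** m

# Example 1.1.1: $500 over 3.5 years at 4% per annum, semi-annual compounding (7 periods)
v_discrete = compound_value(500.0, 0.04, 2, 7)
# The continuously compounded value over the same horizon
v_continuous = 500.0 * math.exp(0.04 * 3.5)
```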
$$y(t,T) = \frac{C(t,T)}{S(t)(T-t)},$$
where $C(t,T)$ is the present value at time $t$ of the dividends accruing between time $t$ and time $T$, including any reinvestment income. Consider a US stock with quarterly dividend payments. We hold the stock for 6 months, i.e. $T = 0.5$. Let the current share price be $100 and suppose that we receive dividends of $2 per share in 1 month's time and again in 4 months' time. Assuming that the dividends are not reinvested, that the 1-month zero coupon interest rate is 4.75% and the 4-month zero coupon rate is 5%, calculate:
i. the present value of the dividends, $C(t,T)$;
ii. the continuously compounded dividend yield over the 6-month period.
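A short Python sketch of this calculation, discounting each dividend at its own zero coupon rate and then applying the dividend yield formula above:

```python
import math

# Dividend example: $100 stock held for T = 0.5 years, $2 dividends paid
# in 1 month and in 4 months; zero coupon rates 4.75% (1-month) and 5% (4-month).
S, T = 100.0, 0.5
C = 2.0 * math.exp(-0.0475 * 1 / 12) + 2.0 * math.exp(-0.05 * 4 / 12)  # PV of dividends
y = C / (S * T)  # dividend yield y(t, T) = C(t, T) / (S(t) (T - t))
```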
We now adjust the formula for the fair value of the forward to take account of the dividends that accrue to the stockholder between now and the expiry of the future. In the general case when dividends or coupons are paid, the value of strategy (a) is no longer equal to its value in the no-dividend case.
Equivalently, the difference between the market price and the fair price of the forward, expressed as a proportion of the spot price, is
$$x(t,T) = \frac{F(t,T) - F^{*}(t,T)}{S(t)}.$$
Many authors refer to this as the mispricing of the market price of the forward compared with its fair value. But, as noted, it is usually the futures contract that is most actively traded and which therefore serves the dominant price discovery role. Thus it is really the spot that is mispriced relative to the futures contract, rather than the futures contract that is mispriced relative to the spot. In practice it is only possible to make a profitable arbitrage between the spot and the futures contract if the mispricing is large enough to cover transactions costs. In commodity markets there can be considerable uncertainty about the net convenience yield. And in equity markets there can be uncertainties about the size and timing of dividend payments and about the risk free interest rate during the life of a futures contract. All these uncertainties can affect the return from holding the spot asset. As a result there is not just one single price at which no arbitrage is possible. In fact there is a whole range of futures prices around the fair price of the futures for which no arbitrage is possible. We call this the no arbitrage range.
Example 1.2.1. We know that on 16 December 2005 the futures had a fair value of 5551.44 based on the spot price of 5531.63. But the 3-month FTSE 100 futures actually closed at 5546.50. How do you account for this difference?
$$\frac{5546.50 - 5551.44}{5531.63} = -8.9\ \text{bps}$$
But the usual no arbitrage range for the FTSE index is approximately ±25 bps because the transactions costs are very small in such a liquid market. So the closing market price of the futures falls well inside the usual no arbitrage range. However, calculations of this type can lead to a larger mispricing, especially in less efficient markets than the FTSE 100.
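The basis-point calculation of Example 1.2.1 in Python:

```python
# Example 1.2.1: mispricing of the futures close relative to its fair value,
# expressed in basis points of the spot price.
fair_value = 5551.44
market_close = 5546.50
spot = 5531.63
mispricing_bps = (market_close - fair_value) / spot * 10_000
```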
• the prices should be contemporaneous. The spot market often closes before the futures,
in which case closing prices are not contemporaneous. It is possible that the futures price
changes considerably after the close of the spot market.
• the fair value of a futures contract is based on the assumption of a zero margin, so that the fair value of the futures is set equal to the fair value of the forward. But in practice the exchange requires margin payments, and in this case a negative correlation between the returns on the spot index and the zero coupon curve (up to 3 months) would have the effect of decreasing the fair value of the futures.
Any apparent "mispricing" of the futures relative to the spot index should always be viewed with
the above two comments in mind.
Figure 1.1: The no arbitrage range for the market price of a financial future
Figure 1.1 illustrates the no arbitrage range for a 1-year forward on a non-dividend-paying asset, assuming that two-way arbitrage is possible, i.e. that the spot can be sold as well as bought. The subscripts A and B refer to 'ask' and 'bid' prices or rates. Two no arbitrage strategies are depicted:
1. Receive funds at $r_B$ by selling the spot at $S_B$ and buying the forward at $F_B$. By the no arbitrage condition, $F_B = S_B\exp(r_B)$.
2. Borrow funds at $r_A$ to buy the spot at $S_A$ and sell the forward at $F_A$. By the no arbitrage condition, $F_A = S_A\exp(r_A)$.
The no arbitrage range for the forward price is marked on the diagram with the grey double-headed arrow. If the market bid price $F^{M}_{B}$ or the market ask price $F^{M}_{A}$ of the forward lies outside the no arbitrage range $[F_B, F_A]$ then an arbitrage profit can be made.
1.3 Options
An option is a contract that gives the purchaser the right to enter into another contract during a
particular period of time. We call it a derivative because it is a contract on another contract. The
underlying contract is a security such as a stock or a bond, a financial asset such as a currency, or
a commodity, or an interest rate. The option contract that gives the right to buy the underlying
is termed a call option, and the right to sell the underlying is termed a put option. Interest rate
options give the purchaser the right to pay or receive an interest rate, and in the case of swaptions
the underlying contract is a swap.
1.3.1 Foundations
The market price of a liquid traded option at any time prior to expiry is determined by the supply
and demand for the option at that time, just like any other liquid traded asset. But when options
are illiquid the price that is quoted in the market must be derived from a model. The most fun-
damental model for an option price is based on the assumption that the underlying asset price
is log-normally distributed and thus follows a stochastic differential equation called a geometric
Brownian motion. Equivalently, the log price and also the log return are normally distributed and
follow an arithmetic Brownian motion. This section opens with a review of the characteristics of
Brownian motion processes. Then we describe the concepts of hedging and no arbitrage that
lead to the risk neutral valuation methodology for pricing options. We introduce the risk neutral
measure and explain why the asset price is not a martingale under this measure. However, we can
change the numeraire to the money market account, and then the price of a non-dividend paying
asset becomes a martingale. The change of numeraire technique is useful in many circumstances,
for instance when interest rates are stochastic or when pricing an exchange option. We do not
usually price options under the assumption that the price process is a geometric Brownian motion with constant volatility. Instead, we use a stochastic volatility in the option pricing model, and sometimes we assume there are other stochastic parameters, and/or a mean reversion in the drift, or a jump in the price and/or the volatility. Hence the option pricing model typically contains several
parameters that we calibrate by using the model to price liquid options (usually standard Euro-
pean calls and puts) and then changing the model parameters so that the model prices match the
observed market prices. Then, with these calibrated parameters, we can use the model to price
path-dependent and exotic options. However, if we are content to assume the price process is a
geometric Brownian motion with constant volatility, European exotics have prices that are easy to derive.
• Portfolio I: Buy a European put on the share with strike K and maturity T, and buy the share.
• Portfolio II: Buy a European call on the share with strike K and maturity T and lend an amount equal to the present value of the strike, discounted at the risk free interest rate r.
It is easy to see that both portfolios have the same pay-off when the options mature. If it turns
out that S(T ) > K then both strategies pay-off S(T ) because:
• The put is worth nothing so portfolio I pays S(T), and
• The call is worth S(T) − K but you receive K from the pay-back on the loan.
If it turns out that S(T) < K then both strategies pay off K because:
• The put is worth K − S(T) while the share has value S(T), and
• The call is worth nothing but you still get K from the loan.
Thus both portfolios have the pay-off max(S(T), K). This means that the two portfolios
must have the same value at any time t prior to the options' expiry date T. Otherwise it would
be possible to make a risk free profit (i.e. an arbitrage) by shorting one portfolio and buying the
other. This no arbitrage argument implies that a standard European call price C(K, T ) and a
standard European put price P (K, T ) of the same strike and maturity on the same underlying
must satisfy the following relationship at any time t prior to the expiry date T :
P (S, t∣K, T ) + S(t) = C(S, t∣K, T ) + exp(−r(T − t))K.
Now suppose the share pays dividends with continuous dividend yield y. Then a similar no arbitrage argument, but with portfolio I buying exp(−y(T − t)) shares and reinvesting the dividends in the share, gives the relationship:
P (S, t∣K, T ) + exp(−y(T − t))S(t) = C(S, t∣K, T ) + exp(−r(T − t))K
Or, equivalently,
C(S, t∣K, T ) − P (S, t∣K, T ) = exp(−y(T − t))S(t) − exp(−r(T − t))K.
The put–call parity relationship implies the following:
• Given the price of a European call it is simple to calculate the fair price of the corresponding
put, and vice versa.
• If market prices do not satisfy this relationship there is, theoretically, an opportunity for
arbitrage. However the trading costs may be too high for it to be possible to profit from the
arbitrage.
The put–call parity relationship can also be used to derive lower bounds for prices of European options. Since the price of an option is never negative, it implies that:
C(S, t∣K, T ) ≥ exp(−y(T − t))S(t) − exp(−r(T − t))K
And similarly
P (S, t∣K, T ) ≥ exp(−r(T − t))K − exp(−y(T − t))S(t).
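Put–call parity can be used computationally to recover one price from the other. A minimal Python sketch; the put price, rates, yield and maturity below are illustrative assumptions, not values from the text:

```python
import math

def call_from_put(put, S, K, r, y, tau):
    """European call price implied by put-call parity with continuous dividend
    yield y:  C = P + exp(-y*tau)*S - exp(-r*tau)*K."""
    return put + math.exp(-y * tau) * S - math.exp(-r * tau) * K

# Illustrative inputs (assumed): put at 5.0, spot 100, strike 100,
# r = 5%, dividend yield 2%, one year to expiry.
call = call_from_put(5.0, 100.0, 100.0, 0.05, 0.02, 1.0)
```

Because the put price is non-negative, the computed call automatically satisfies the lower bound derived above.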
If there are no dividends then the right-hand side is the intrinsic value of the option, so it
never pays to exercise a non-dividend paying American call early. It also never pays to exercise an
American call, or a put, on a forward contract provided that we do not have to pay the premium.
This is the case when the options are margined, like futures contracts, and the settlement value
for the option is determined by the difference between the initial and final option prices. When
the option is on the forward price F (t, T ) and the option is margined, the put–call parity for European options becomes
$$C(F,t\mid K,T) - P(F,t\mid K,T) = F(t,T) - K.$$
It follows that
$$C^{A}(F,t\mid K,T) \ge C^{E}(F,t\mid K,T) \ge F(t,T) - K$$
and
$$P^{A}(F,t\mid K,T) \ge P^{E}(F,t\mid K,T) \ge K - F(t,T),$$
so in both cases we never gain by early exercise.
Chapter 2
Mathematical models for options pricing
As S(0) is the current price of the stock, we may assume it is known. The random variable W(T) is normally distributed with mean 0 and variance T; this is also the distribution of √T Z if Z is a standard normal random variable (mean 0, variance 1). We may therefore represent the terminal stock price as
$$S(T) = S(0)\exp\left(\left[r - \tfrac{1}{2}\sigma^{2}\right]T + \sigma\sqrt{T}\,Z\right). \tag{2.3}$$
The logarithm of the stock price is thus normally distributed, and the stock price itself has a lognormal distribution.
The expectation E[e^{−rT}(S(T) − K)^+] is an integral with respect to the lognormal density of S(T). This integral can be evaluated in terms of the standard normal cumulative distribution function Φ as BS(S(0), σ, T, r, K) with
$$BS(S(0),\sigma,T,r,K) = S\,\Phi\!\left(\frac{\log(S/K) + (r + \tfrac{1}{2}\sigma^{2})T}{\sigma\sqrt{T}}\right) - e^{-rT}K\,\Phi\!\left(\frac{\log(S/K) + (r - \tfrac{1}{2}\sigma^{2})T}{\sigma\sqrt{T}}\right). \tag{2.4}$$
This is the Black-Scholes formula for a call option.
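Formula (2.4) translates directly into code. A minimal Python implementation, using the error function to evaluate Φ:

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, sigma, T, r, K):
    """Black-Scholes price of a European call, equation (2.4)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - math.exp(-r * T) * K * norm_cdf(d2)
```

For example, bs_call(100, 0.2, 1.0, 0.05, 100) gives roughly 10.45.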
In light of the availability of this formula, there is no need to use Monte Carlo to compute E[e^{−rT}(S(T) − K)^+]. Moreover, we noted earlier that Monte Carlo is not a competitive method for computing one-dimensional integrals. Nevertheless, we now use this example to illustrate the key steps in Monte Carlo. From (2.3) we see that to draw samples of the terminal stock price S(T) it suffices to have a mechanism for drawing samples from the standard normal distribution. For now we simply assume the ability to produce a sequence Z₁, Z₂, . . . of independent standard normal random variables.
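With such a source of standard normals, the Monte Carlo estimator of E[e^{−rT}(S(T) − K)^+] based on (2.3) can be sketched as follows (Python's built-in generator stands in for the unspecified normal sampler):

```python
import math
import random

def mc_call_price(S0, sigma, T, r, K, n_paths, seed=0):
    """Monte Carlo estimate of E[exp(-rT) (S(T) - K)^+], sampling S(T) via (2.3)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        Z = rng.gauss(0.0, 1.0)               # one standard normal draw per path
        ST = S0 * math.exp(drift + vol * Z)   # terminal price, equation (2.3)
        total += max(ST - K, 0.0)
    return math.exp(-r * T) * total / n_paths
```

With enough paths the estimate should be close to the Black-Scholes value of about 10.45 for these parameters.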
• the payoff of a derivative security may depend explicitly on the values of underlying assets
at multiple dates;
• we may not know how to sample transitions of the underlying assets exactly and thus need
to divide a time interval [0, T ] into smaller subintervals to obtain a more accurate approx-
imation to sampling from the distribution at time T .
Before turning to a detailed example of the first case, we briefly illustrate the second. Consider a generalization of the basic model (2.1) in which the dynamics of the underlying asset S(t) are given by
$$dS(t) = rS(t)\,dt + \sigma(S(t))S(t)\,dW(t). \tag{2.5}$$
In other words, we now let the volatility σ depend on the current level of S. Except in very special cases, this equation does not admit an explicit solution of the type in (2.2) and we do not have an exact mechanism for sampling from the distribution of S(T). In this setting, we might instead partition [0, T] into m subintervals of length ∆t = T/m and over each subinterval [t, t + ∆t] simulate a transition using a discrete (Euler) approximation to (2.5) of the form
$$S(t + \Delta t) = S(t) + rS(t)\Delta t + \sigma(S(t))S(t)\sqrt{\Delta t}\,Z,$$
with Z a standard normal random variable. This relies on the fact that W(t + ∆t) − W(t) has mean 0 and standard deviation √∆t. For each step, we would use an independent draw from the normal distribution. Repeating this for m steps produces a value of S(T) whose distribution approximates the exact (unknown) distribution of S(T) implied by (2.5). We expect that as m becomes larger (so that ∆t becomes smaller) the approximating distribution of S(T) draws closer to the exact distribution.
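The Euler scheme above can be sketched as follows; the state-dependent volatility function passed in is the user's choice (a constant function recovers the basic model):

```python
import math
import random

def euler_terminal_price(S0, r, sigma_fn, T, m, rng):
    """One Euler path of dS = r S dt + sigma(S) S dW over m steps of size T/m."""
    dt = T / m
    sqrt_dt = math.sqrt(dt)
    S = S0
    for _ in range(m):
        Z = rng.gauss(0.0, 1.0)   # independent standard normal draw per step
        S += r * S * dt + sigma_fn(S) * S * sqrt_dt * Z
    return S
```

With a constant volatility function, the sample mean of many such paths should be close to S0·exp(rT), which gives a simple sanity check of the discretization.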
Even if we assume the dynamics (2.1) of the Black-Scholes model, it may be necessary to
simulate paths of the underlying asset if the payoff of a derivative security depends on the value
of the underlying asset at intermediate dates and not just the terminal value. Asian options are
arguably the simplest path-dependent options for which Monte Carlo is a competitive compu-
tational tool. These are options with payoffs that depend on the average level of the underlying
asset. This includes, for example, the payoff (S̄ − K)^+ with
$$\bar{S} = \frac{1}{m}\sum_{j=1}^{m} S(t_j) \tag{2.6}$$
for some fixed set of dates 0 = t₀ < t₁ < ⋯ < t_m = T, with T the date at which the payoff is received.
To calculate the expected discounted payoff E[e^{−rT}(S̄ − K)^+], we need to be able to generate samples of the average S̄. The simplest way to do this is to simulate the path S(t₁), . . . , S(t_m) and then compute the average along the path. We saw in (2.3) how to simulate S(T) given S(0); simulating S(t_{j+1}) from S(t_j) works the same way:
$$S(t_{j+1}) = S(t_j)\exp\left(\left[r - \tfrac{1}{2}\sigma^{2}\right](t_{j+1} - t_j) + \sigma\sqrt{t_{j+1} - t_j}\,Z_{j+1}\right) \tag{2.7}$$
where Z₁, . . . , Z_m are independent standard normal random variables. Given a path of values, it is a simple matter to calculate S̄ and then the discounted payoff e^{−rT}(S̄ − K)^+.
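Combining the path recursion (2.7) with the averaging in (2.6) gives a simple Monte Carlo pricer for the Asian call; the grid here is equally spaced, t_j = jT/m:

```python
import math
import random

def asian_call_mc(S0, sigma, T, r, K, m, n_paths, seed=0):
    """Monte Carlo price of an arithmetic Asian call: paths via (2.7),
    average via (2.6), discounted payoff exp(-rT) (S_bar - K)^+."""
    rng = random.Random(seed)
    dt = T / m
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        S, path_sum = S0, 0.0
        for _ in range(m):
            S *= math.exp(drift + vol * rng.gauss(0.0, 1.0))  # step of (2.7)
            path_sum += S
        total += max(path_sum / m - K, 0.0)                   # (S_bar - K)^+
    return math.exp(-r * T) * total / n_paths
```

Since the average is less volatile than the terminal price, the Asian call is worth less than the corresponding standard European call.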
$$\frac{dS_i(t)}{S_i(t)} = \mu_i(S(t),t)\,dt + \sigma_i(S(t),t)^{\top}dW^{o}(t), \tag{2.8}$$
with W^o a k-dimensional Brownian motion, each σᵢ taking values in ℝᵏ, and each µᵢ scalar-valued. We assume that the µᵢ and σᵢ are deterministic functions of the current state S(t) = (S₁(t), . . . , S_d(t))^⊺ and time t, though the general theory allows these coefficients to depend on past prices as well. Let
$$\Sigma_{ij} = \sigma_i^{\top}\sigma_j, \qquad i,j = 1,\ldots,d; \tag{2.9}$$
this may be interpreted as the covariance between the instantaneous returns on assets i and j.
A portfolio is characterized by a vector θ ∈ Rd with θi representing the number of units held
of the ith asset. Since each unit of the ith asset is worth Si (t) at time t, the value of the portfolio
at time t is
θ1 S1 (t) + ⋯ + θd Sd (t),
which we may write as θ⊺ S(t). A trading strategy is characterized by a stochastic process θ(t)
of portfolio vectors. To be consistent with the intuitive notion of a trading strategy, we need to
restrict θ(t) to depend only on information available at t; this is made precise through a measur-
ability condition (for example, that θ be predictable).
If we fix the portfolio holdings at θ(t) over the interval [t, t + h], then the change in value over this interval of the holdings in the ith asset is given by θᵢ(t)[Sᵢ(t + h) − Sᵢ(t)]; the change in the value of the portfolio is given by θ(t)^⊺[S(t + h) − S(t)]. This suggests that in the continuous-time limit we may describe the gains from trading over [0, t] through the stochastic integral
$$\int_{0}^{t} \theta(u)^{\top}dS(u),$$
subject to regularity conditions on S and θ. Notice that we allow trading of arbitrarily large or
small, positive or negative quantities of the underlying assets continuously in time; this is a con-
venient idealization that ignores constraints on real trading.
A trading strategy is self-financing if it satisfies
$$\theta(t)^{\top}S(t) - \theta(0)^{\top}S(0) = \int_{0}^{t}\theta(u)^{\top}dS(u) \tag{2.10}$$
for all t. The left side of this equation is the change in portfolio value from time 0 to time t and
the right side gives the gains from trading over this interval. Thus, the self-financing condition
states that changes in portfolio value equal gains from trading: no gains are withdrawn from the
portfolio and no funds are added. By rewriting (2.10) as
$$\theta(t)^{\top}S(t) = \theta(0)^{\top}S(0) + \int_{0}^{t}\theta(u)^{\top}dS(u),$$
we can interpret it as stating that from an initial investment of V(0) = θ(0)^⊺S(0) we can achieve a portfolio value of V(t) = θ(t)^⊺S(t) by following the strategy θ over [0, t].
Consider, now, a derivative security with a payoff of f (S(T )) at time T ; this could be a
standard European call or put on one of the d assets, for example, but the payoff could also depend
on several of the underlying assets. Suppose that the value of this derivative at time t, 0 ≤ t ≤
T , is given by some function V (S(t), t). The fact that the dynamics in (2.8) depend only on
(S(t), t) makes it at least plausible that the same might be true of the derivative price. If we
further conjecture that V is a sufficiently smooth function of its arguments, Itô's formula gives
$$V(S(t),t) = V(S(0),0) + \sum_{i=1}^{d}\int_{0}^{t}\frac{\partial V(S(u),u)}{\partial S_i}\,dS_i(u) + \int_{0}^{t}\left[\frac{\partial V(S(u),u)}{\partial u} + \frac{1}{2}\sum_{i,j=1}^{d}\Sigma_{ij}(S(u),u)S_i(u)S_j(u)\frac{\partial^{2} V(S(u),u)}{\partial S_i\partial S_j}\right]du, \tag{2.11}$$
with Σ as in (2.9). If the value V (S(t), t) can be achieved from an initial wealth of V (S(0), 0)
through a self-financing trading strategy θ, then we also have
$$V(S(t),t) = V(S(0),0) + \sum_{i=1}^{d}\int_{0}^{t}\theta_i(u)\,dS_i(u). \tag{2.12}$$
Comparing terms in (2.11) and (2.12), we find that both equations hold if
$$\theta_i(u) = \frac{\partial V(S(u),u)}{\partial S_i}, \qquad i = 1,\ldots,d, \tag{2.13}$$
and
$$\frac{\partial V(S(u),u)}{\partial u} + \frac{1}{2}\sum_{i,j=1}^{d}\Sigma_{ij}(S,u)S_iS_j\frac{\partial^{2} V(S(u),u)}{\partial S_i\partial S_j} = 0. \tag{2.14}$$
Since we also have V (S(t), t) = θ⊺ (t)S(t), (2.13) implies
$$V(S,t) = \sum_{i=1}^{d}S_i\frac{\partial V(S,t)}{\partial S_i}. \tag{2.15}$$
Finally, at t = T we must have
V (S, T ) = f (S) (2.16)
if V is indeed to represent the value of the derivative security.
2.4 Black-Scholes model
In the Black-Scholes model the first asset follows the dynamics dS(t)/S(t) = µ dt + σ dW^o(t), with W^o a one-dimensional Brownian motion. The second asset (often called a savings account
or a money market account) is riskless and grows deterministically at a constant, continuously
compounded rate r; its dynamics are given by
$$\frac{d\beta(t)}{\beta(t)} = r\,dt.$$
Clearly, β(t) = β(0)ert and we may assume the normalization β(0) = 1. We are interested in
pricing a derivative security with a payoff of f (S(T )) at time T . For example, a standard call
option pays (S(T ) − K)+ , with K a constant.
If we were to formulate this model in the notation of (2.8), Σ would be a 2 × 2 matrix with only one nonzero entry, σ². Making the appropriate substitutions, (2.14) thus becomes
$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^{2}S^{2}\frac{\partial^{2} V}{\partial S^{2}} = 0. \tag{2.17}$$
Equation (2.15) becomes
$$V(S,\beta,t) = \frac{\partial V}{\partial S}S + \frac{\partial V}{\partial \beta}\beta. \tag{2.18}$$
These equations and the boundary condition V (S, β, T ) = f (S) determine the price V .
This formulation describes the price V as a function of the three variables S, β, and t. Because β depends deterministically on t, we are interested in values of V only at points (S, β, t) with β = e^{rt}. This allows us to eliminate one variable and write the price as Ṽ(S, t) = V(S, e^{rt}, t). Making this substitution in (2.17) and (2.18), noting that
$$\frac{\partial \tilde{V}}{\partial t} = r\beta\frac{\partial V}{\partial \beta} + \frac{\partial V}{\partial t},$$
the mean is a deterministic function of time, so the mean of the asset price distribution at some
fixed future point in time is just the risk free return from now until that time, less any benefits or
plus any costs associated with holding the underlying. But some pricing models assume that the
drift term has a mean-reverting component and/or that the interest rate or the dividend yields
are stochastic. Making a parameter into a risk factor introduces other parameters to the option
pricing model, because these are required to describe the distribution of this new risk factor. For
instance, the parameters of a typical stochastic volatility model include the spot volatility, the
long run volatility, the rate of volatility mean reversion, the volatility of volatility, and the price–
volatility correlation. In addition, we could include a volatility risk premium. When volatility
is assumed to be the only stochastic parameter there are two risk factors in the option pricing
model, the price and the volatility. To price the option we need to know their joint distribution at
every point in time, unless it is a standard European option in which case we only need to know
their joint distribution at the time of expiry. The easiest way to do this is to describe the evolution
of each risk factor in continuous time using a pair of stochastic differential equations. The model
parameters are the parameters in the price and volatility equations, and the price–volatility corre-
lation. The model parameters are calibrated by matching the model prices of standard European
options to the observable market prices of these options. There are usually very many standard
European options prices that are quoted in the market but an option pricing model typically has
only a handful of parameters. Hence, the model prices will not match every single market price
exactly. The calibration of parameters is therefore based on a numerical method to minimize a
root mean square error between the model price and the market price of every standard European
option for which there is a market price. More precisely, we can calibrate the model parameters
λ by applying a numerical method to the optimization problem:
$$\min_{\lambda}\sqrt{\sum_{K,T} w(K,T)\left(f^{m}(K,T) - f(K,T\mid\lambda)\right)^{2}}$$
where $f^{m}(K,T)$ and $f(K,T\mid\lambda)$ denote the market and model prices of a European option with
strike K and expiry time T , w(K, T ) is a weighting function (for instance, we might take it to
be proportional to the option gamma, to place more weight on short term near at-the-money
options) and the sum is taken over all options with liquid market prices. Most of the trading is
on options that are near to at-the-money. The observed prices of deep in-the-money and out-of-
the-money options may be stale if the market has moved since they were last traded. Therefore
it is common to use only those options that are within 10% of at-the-money to calibrate the model. Once calibrated, the option pricing model can be used to price exotic and path-dependent
options. Traders may also use the model to dynamically hedge the risks of standard European
options, although there is no clear evidence that this would be any better than using the Black–
Scholes–Merton model for hedging.
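As an illustration of this calibration step in its simplest possible form, the sketch below fits a single constant-volatility parameter by minimizing the weighted root mean square pricing error with a golden-section search; a real calibration would involve a richer model with several parameters and a multidimensional optimizer:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, sigma, T, r, K):
    """Black-Scholes call price, playing the role of the model price f(K, T | lambda)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return S * norm_cdf(d1) - math.exp(-r * T) * K * norm_cdf(d1 - sigma * math.sqrt(T))

def rmse(sigma, quotes, S, r):
    """Weighted root mean square error between market and model prices.
    Each quote is a tuple (K, T, market_price, weight)."""
    return math.sqrt(sum(w * (pm - bs_call(S, sigma, T, r, K)) ** 2
                         for K, T, pm, w in quotes))

def calibrate(quotes, S, r, lo=0.01, hi=1.0, tol=1e-6):
    """Golden-section search for the volatility minimizing the calibration objective."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if rmse(c, quotes, S, r) < rmse(d, quotes, S, r):
            b = d
        else:
            a = c
    return 0.5 * (a + b)
```

Calibrating to quotes generated from the model itself recovers the volatility that produced them, which is a useful correctness check before fitting real market prices.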
2.6 Volatility
A set of standard European options of different strikes and maturities on the same underlying
has two related volatility surfaces, the implied volatility surface, and the local volatility surface.
When we use the market prices of options to derive these surfaces we call them the market implied
volatility surface and the market local volatility surface. When we use option prices based on a
stochastic volatility model we call them the model implied volatility surface and the model local
volatility surface. Implied volatility is a transformation of a standard European option price. It is
the volatility that, when input into the Black–Scholes–Merton (BSM) formula, yields the price of
the option. In other words, it is the constant volatility of the underlying process that is implicit in
the price of the option. For this reason some authors refer to implied volatility as implicit volatility.
The BSM model implied volatilities are constant. And if the assumptions of the BSM model were
valid then all options on the same underlying would have the same market implied volatility.
However, traders do not believe in these assumptions, hence the market prices of options yield
a surface of market implied volatilities, by strike (or moneyness) and maturity of the option, that is not flat. In particular, the market implied volatility of all options of the same maturity but different strikes has a skewed smile shape when plotted as a function of the strike (or moneyness) of the options. This is called the (market) volatility smile. And the market implied volatility of all options of the same strike (or moneyness) but different maturities converges to the long term
implied volatility when plotted as a function of maturity. This is called the (market) term structure
of implied volatility. An implied volatility is a deterministic function of the price of a standard
European option (the transformation that is implicitly specified by the BSM formula). Thus a
dynamic model of implied volatility is also a dynamic model of the option price and hedge ratios.
If we can forecast market implied volatility successfully then we can also forecast the market price
of the option and hedge the option accurately. Hence, practitioners spend considerable resources
on developing models of market implied volatility dynamics.
Stochastic volatility and other extended option pricing models generalize the BSM model so that the price process is no longer assumed to follow a geometric Brownian motion with constant volatility. The aim of these models is to price OTC options that may have
exotic and/or path-dependent pay-offs so that their prices are consistent with the market prices
of standard European options. This precludes the possibility of arbitrage between an exotic op-
tion and a replicating portfolio of standard calls and puts. Hence, the market prices of European
options are used to calibrate the model parameters. We may calibrate the model by changing the
parameters so that the model prices of European options are set equal to, or as close as possible to,
the market prices. Alternatively, the model’s parameters may be calibrated by fitting the model
implied volatilities to the market implied volatility smile.
\[
\min_{\lambda} \sqrt{\sum_{K,T} w(K,T)\,\bigl(\theta_m(K,T) - \theta(K,T \mid \lambda)\bigr)^2}
\]
where \(\theta_m(K,T)\) and \(\theta(K,T \mid \lambda)\) denote the market and model implied volatilities of a European option with strike K and expiry time T, and \(w(K,T)\) is some weighting function that ensures that more weight is placed on the options with more reliable prices.
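A minimal sketch of this calibration, using a toy quadratic-in-log-moneyness smile as the model θ(K, T ∣ λ). The parametric form, the weighting function, and the synthetic market data below are assumptions for illustration only, not a market model.

```python
import numpy as np
from scipy.optimize import minimize

def model_vol(K, T, lam, S=100.0):
    """Toy model implied volatility theta(K, T | lam): a quadratic in
    log-moneyness m = log(K/S).  Purely illustrative."""
    a, b, c = lam
    m = np.log(K / S)
    return a + b * m + c * m**2

# Synthetic "market" vols generated from known parameters plus tiny noise.
rng = np.random.default_rng(0)
K = np.linspace(80.0, 120.0, 9)
T = np.full_like(K, 0.5)
true_lam = (0.20, -0.10, 0.30)
market_vol = model_vol(K, T, true_lam) + rng.normal(0.0, 1e-4, K.size)

# Weighting function: more weight near the money, where quotes are reliable.
w = np.exp(-np.log(K / 100.0)**2 / 0.02)

def objective(lam):
    return float(np.sum(w * (market_vol - model_vol(K, T, lam))**2))

fit = minimize(objective, x0=(0.3, 0.0, 0.0))
```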
\[
\begin{aligned}
\frac{dS(t)}{S(t)} &= (r - y)\,dt + \sigma(t)\,dW_1(t),\\
d\sigma(t) &= \alpha(\sigma, t)\,dt + \beta(\sigma, t)\,dW_2(t)
\end{aligned}
\tag{2.19}
\]
The property (2.20) implies a non-zero constant price–volatility correlation ρ between the log
returns on the underlying and the changes in volatility.
When volatility is stochastic there are two sources of risk that must be hedged, the price risk
and the volatility risk. In this subsection we set up a risk free portfolio just as in the constant
volatility case, but now the portfolio contains three assets: the underlying asset with price S and
two options on S. The second option is needed to hedge the second source of uncertainty. In
the following we write the price of a general claim on S as g(S, σ), thus making explicit its de-
pendence on the volatility σ but ignoring the other variables and parameters that may affect the
option price.
We set up a portfolio with price Π consisting of a short position of one unit in an option with
price g1 (S, σ), a long position of δ1 units of the underlying with price S and a position of δ2 units
18 CHAPTER 2. MATHEMATICAL MODELS FOR OPTIONS PRICING
in another option on S. Denoting the price of the second option by g2 (S, σ), we may write down
the price of the portfolio as
\[
\Pi(S,\sigma) = -g_1(S,\sigma) + \delta_1 S + \delta_2 g_2(S,\sigma). \tag{2.21}
\]
To eliminate both sources of uncertainty we must choose δ1 and δ2 in such as way that the price
remains constant with respect to changes in both the price S and the volatility σ.
Differentiating (2.21) with respect to S and σ and setting to zero gives two first order condi-
tions that are easily solved, namely
\[
\Pi_S(S,\sigma) = 0:\quad g_{1S}(S,\sigma) = \delta_1 + \delta_2 g_{2S}(S,\sigma) \;\Rightarrow\; \delta_1 = g_{1S}(S,\sigma) - \delta_2 g_{2S}(S,\sigma)
\]
and
\[
\Pi_\sigma(S,\sigma) = 0:\quad g_{1\sigma}(S,\sigma) = \delta_2 g_{2\sigma}(S,\sigma) \;\Rightarrow\; \delta_2 = \frac{g_{1\sigma}(S,\sigma)}{g_{2\sigma}(S,\sigma)} \tag{2.22}
\]
Substituting these values for δ1 and δ2 back into (2.21) yields
\[
\Pi(S,\sigma) = \bigl(S\,g_{1S}(S,\sigma) - g_1(S,\sigma)\bigr) - \frac{g_{1\sigma}(S,\sigma)}{g_{2\sigma}(S,\sigma)}\,\bigl(S\,g_{2S}(S,\sigma) - g_2(S,\sigma)\bigr). \tag{2.23}
\]
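The hedge ratios δ1 and δ2 can be computed numerically from any two pricing functions. The sketch below uses BSM call prices as stand-in pricing functions g1 and g2 (in a genuine stochastic volatility setting these would come from the model); the finite-difference step and the strikes are illustrative.

```python
import numpy as np
from scipy.stats import norm

def bsm_call(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def partials(g, S, sigma, h=1e-5):
    """Central finite differences for g_S and g_sigma."""
    gS = (g(S + h, sigma) - g(S - h, sigma)) / (2.0 * h)
    gsig = (g(S, sigma + h) - g(S, sigma - h)) / (2.0 * h)
    return gS, gsig

S, sigma, r, T = 100.0, 0.2, 0.05, 1.0
g1 = lambda S_, sig: bsm_call(S_, 100.0, T, r, sig)   # option to be hedged
g2 = lambda S_, sig: bsm_call(S_, 110.0, T, r, sig)   # second (hedging) option

g1S, g1sig = partials(g1, S, sigma)
g2S, g2sig = partials(g2, S, sigma)

delta2 = g1sig / g2sig          # units of the second option, from (2.22)
delta1 = g1S - delta2 * g2S     # units of the underlying
```

With these ratios, the portfolio price is insensitive to small simultaneous moves in S and σ, which is exactly the first-order condition the derivation imposes.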
This portfolio will be risk free, so it must earn a net return equal to the risk free rate, r. Since
holding the portfolio may earn income such as dividends or incur costs such as carry costs at the
rate y, the portfolio's price must grow at the rate r − y. We assume both r and y are constant and set
\[
\frac{d\Pi(S,\sigma)}{\Pi(S,\sigma)} = (r - y)\,dt. \tag{2.24}
\]
Now, just as in the constant volatility case, we apply Itô’s lemma to (2.23) using the approxi-
mation that δ1 and δ2 are unchanged over a small time interval dt. Then we use the resulting total
derivative in (2.23) and (2.24) to derive a PDE that must be satisfied by every claim. However, we
have only one condition (2.24) from which to derive the values of two unknowns, i.e. g1 (S, σ)
and g2 (S, σ). So there are infinitely many solutions.
We resolve this problem by introducing a parameter corresponding to a premium that investors demand for holding the risky asset, volatility. This is called the volatility risk premium, or the market price of volatility risk, and is here denoted λ. The resulting PDE for the price of the claim may be written
\[
g_t + (r - y)S g_S + \tfrac{1}{2}\sigma^2 S^2 g_{SS} + \bigl\{(\alpha - \lambda\beta)g_\sigma + \beta\rho\sigma S\, g_{S\sigma} + \tfrac{1}{2}\beta^2 g_{\sigma\sigma}\bigr\} = (r - y)g, \tag{2.25}
\]
where g can be either g1 (S, σ) or g2 (S, σ) and we have dropped the dependence on variables
from our notation for simplicity. If volatility were constant the term in braces would be zero
and the PDE would reduce to the BSM PDE. When volatility is not constant the
general solution to (2.25) may be written in the form of the stochastic volatility model (2.19).
The presence of a volatility risk premium in the stochastic volatility PDE (2.25) indicates that
we are in an incomplete market. That is, it is not possible to replicate the value of every claim
with a self-financing portfolio. Then the price of a claim is not unique. Different investors have
different claim prices depending on their risk attitude. However, it is possible to complete the
market by adding all options on a tradable asset, so that we can observe the price of the second
option instead of having to solve for it. In this case, there should be no volatility risk premium.
We shall see below that the volatility risk premium appears as a parameter in the drift term
of the volatility or variance diffusion in parametric stochastic volatility models. For instance, in
the Heston model the volatility risk premium affects the rate of mean reversion. When the price–
volatility correlation is positive, most investors have a positive volatility risk premium and the
mean reversion speed is slow, and when the price–volatility correlation is negative, most investors
have a negative volatility risk premium and mean reversion is rapid. To see why this is intuitive,
suppose that we are modelling the equity index price and volatility processes with the Heston
model. The price–volatility correlation in equity indices is large and negative, so when the index
falls volatility is high. Suppose all investors have the same negative volatility risk premium – which
means they like to hold volatility. Then after a market fall investors will buy the index, because
of the high volatility it adds to their portfolio, so the index price rises again and volatility will
come down very quickly, i.e. the mean reversion rate will be rapid. But if investors had the same
positive volatility risk premium, they would not buy into high volatility and therefore it would
take longer to mean revert.
When we calibrate a stochastic volatility model to market data on option prices we often
assume the market has been completed (by adding all options on S to the tradable assets) and so
the volatility risk premium is zero. If we do not assume it is zero, we usually find that the volatility
risk premium is negative, i.e. that investors appear to like to hold volatility. This makes sense, for
the same reason as the variance risk premium that is calculated from the market prices of variance
swaps is usually negative. Investors are usually prepared to accept low or even negative returns for
holding volatility because the negative correlation between prices and volatility makes volatility
a wonderful diversification instrument. Hence, returns on volatility do not need to be high for
volatility to be an attractive asset.
Chapter 3
Machine learning methods
Machine learning (ML) is the study of computer algorithms that can improve automatically through
experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning
algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms
are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or infeasible to develop conventional algorithms
to perform the needed tasks.
A feedforward neural network is an artificial neural network where connections between the
nodes do not form a cycle. The feedforward neural network was the first and simplest type of
artificial neural network devised. Back-propagation is the algorithm that is used to train such networks.
22 CHAPTER 3. MACHINE LEARNING METHODS
[Figure: model of an artificial neuron — inputs, synaptic weights, a constant (bias) term, a summing function, an activation function φ, and the output.]
• Input and output layers. The network can also have one or more hidden layers of neurons.
• Each neuron has its activation function. Usually activation functions are non-linear differ-
entiable functions.
• The layers of the network (input, hidden and output) are connected; the connections can be full or partial.
[Figure: a feedforward network with an input layer, a first and a second hidden layer, and an output layer.]
Back-Propagation algorithm
Let us consider the following unconstrained optimization problem: find the matrix (or vector) w that minimizes a function E = E(w), known as the cost function or energy function. We are given a set of m input–target pairs (inp, tar) ∈ Rn × Rk, known in machine learning as the training data set; the first coordinate lies in the input space and the second in the desired output space. Back-propagation is the most commonly used algorithm to train the neural network that represents the relationship connecting the two spaces, Rn → Rk. During the training of the neural network we want to find a set of parameters, called weights w, that solve the minimization problem below:
\[
E(w) = \frac{1}{m}\sum_{i=1}^{m} E_i(w) \tag{3.1}
\]
Here \(E_i(w)\) is the total instantaneous error energy of the i-th training pair, given by \(E_i(w) = \sum_{j=1}^{k} E_{i,j}(w)\), where \(E_{i,j}(w) = \frac{1}{2} e_{i,j}(w)^2\) is the instantaneous error energy of the j-th output neuron for the i-th training pair and \(e_{i,j}(w) = tar_{i,j} - out_j(inp_i, w)\) is the corresponding error signal. Here \(inp_i \in \mathbb{R}^n\) is the input of the i-th training pair, \(tar_{i,j}\) is the desired output of the j-th neuron of the network for the i-th training pair, and \(out_j\) is the output of the j-th neuron.
In each iteration of the algorithm the weights of the network are updated as
\[
w_{jk}^{t+1} = w_{jk}^{t} + \Delta w_{jk}^{t},
\]
with the initial weights \(w_{jk}^{0}\) small random numbers, where j is the number of the neuron from which the signal originates; the signal is multiplied by the weight \(w_{jk}\) and, after being summed with the other corresponding products, enters neuron k of the next layer of the network. The update is \(\Delta w^{t} = \eta d^{t}\), with \(\eta\) the learning rate and \(d^{t}\) the search direction of the t-th epoch (the t-th iteration of the algorithm is called an epoch). The direction \(d^{t}\) uses the gradient of the energy function \(E^{t}(w)\), which is calculated with back-propagation as
\[
\frac{\partial E^{t}}{\partial w_{jk}^{t}} = \delta_k\, out_j,
\]
where \(\delta_k\) is the local gradient given by:
\[
\delta_k = \begin{cases}
-e_k\,\varphi'(y_k) & \text{if neuron } k \text{ is an output neuron}\\[4pt]
\bigl(\sum_{L}\delta_L w_{kL}\bigr)\varphi'(y_k) & \text{if neuron } k \text{ is a hidden neuron}
\end{cases} \tag{3.2}
\]
where \(\varphi\) is the activation function and \(y_k\) is the input of neuron k. Algorithms that use global information about the state of the network, such as the direction of the full weight-update vector, are called global techniques. In contrast, local strategies are based on weight-specific information, such as the behavior of the partial derivative of a given weight. The second category of methods is more closely related to the concept of the neural network as a distributed processor, in which the calculations are done independently of each other. In addition, in many applications local strategies achieve faster and more reliable convergence than global methods, despite the fact that they use less information.
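The back-propagation equations above can be sketched for a single hidden layer with tanh activations and a linear output; the architecture, learning rate, and target function below are arbitrary illustrations, not choices made in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of back-propagation: one tanh hidden layer, linear outputs,
# cost E(w) = mean squared error over the training pairs.
def forward(W1, W2, inp):
    y_hidden = np.tanh(inp @ W1)     # hidden-layer activations
    return y_hidden, y_hidden @ W2   # linear output layer

def backprop_step(W1, W2, inp, tar, eta=0.1):
    y_hidden, out = forward(W1, W2, inp)
    e = tar - out                          # error signals e_{i,j}
    delta_out = -e                         # local gradients, linear outputs
    delta_hid = (delta_out @ W2.T) * (1.0 - y_hidden**2)  # tanh' chain rule
    m = inp.shape[0]
    W2 -= eta * y_hidden.T @ delta_out / m  # gradient descent on E(w)
    W1 -= eta * inp.T @ delta_hid / m
    return float(np.mean(e**2))

# Learn a smooth mapping R^2 -> R (an arbitrary illustrative target).
inp = rng.normal(size=(200, 2))
tar = np.sin(inp[:, :1] + inp[:, 1:])
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
errors = [backprop_step(W1, W2, inp, tar) for _ in range(2000)]
```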
• Newton's method: \(d^t = -H(w^t)^{-1}\nabla E(w^t)\), where \(\nabla E(w^t)\) is the gradient and \(H(w^t)\) is the Hessian matrix of the function \(E(w^t)\).

The convergence properties of the previous algorithms depend on the properties of the first and/or second derivative of the function to be optimized. For example, the steepest descent and conjugate gradient algorithms determine the search direction based on the first derivative, so their convergence rate depends indirectly on the properties of the second derivative. Accordingly, Newton's method requires the first derivative and the Hessian matrix to determine the search direction.
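On a quadratic cost the difference between the two kinds of direction is easy to see: steepest descent follows the negative gradient, while the Newton direction reaches the minimizer in a single step. A small sketch (the matrix and vector are illustrative):

```python
import numpy as np

# Quadratic cost E(w) = 1/2 w^T A w - b^T w with gradient A w - b and
# Hessian A (A symmetric positive definite, chosen for illustration).
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])
w = np.zeros(2)

grad = A @ w - b
d_sd = -grad                           # steepest descent direction
d_newton = -np.linalg.solve(A, grad)   # Newton direction -H^{-1} grad

w_newton = w + d_newton                # a single Newton step
```

Because the gradient of a quadratic is linear, the Newton step lands exactly on the minimizer A⁻¹b, whereas steepest descent would need many iterations at a rate governed by the conditioning of A.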
Quickprop Algorithm
This method is a training algorithm developed by Fahlman, based in part on Newton's method. The Quickprop algorithm is widely used to train neural networks. The iterative update of the weights is based on an estimate of the position of the minimum for each weight. The weights are updated according to the following relationship:
\[
\Delta w_{jk}^{t} = \frac{\dfrac{\partial E^{t}}{\partial w_{jk}^{t}}}{\dfrac{\partial E^{t-1}}{\partial w_{jk}^{t-1}} - \dfrac{\partial E^{t}}{\partial w_{jk}^{t}}}\;\Delta w_{jk}^{t-1} \tag{3.3}
\]
The computational cost of training is significantly improved compared to that of global methods.
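Update (3.3) is a secant (one-dimensional Newton) step per weight; on a quadratic cost, whose gradient is linear, it lands exactly on the minimum. A minimal sketch, with an illustrative 1-D cost:

```python
def quickprop_step(grad_now, grad_prev, dw_prev):
    """One Quickprop update for a single weight, equation (3.3): a secant
    approximation of Newton's method from two successive gradients."""
    return grad_now / (grad_prev - grad_now) * dw_prev

# 1-D quadratic E(w) = 0.5 * a * w^2: the gradient a*w is linear in w, so
# the secant step lands exactly on the minimum at w = 0 (a is illustrative).
a = 2.0
grad = lambda w: a * w
w_prev, w_now = 1.0, 0.8          # assume some first step has already been taken
dw = quickprop_step(grad(w_now), grad(w_prev), w_now - w_prev)
w_next = w_now + dw
```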
Rprop Algorithm
Another local training algorithm, developed by Riedmiller and Braun, is the Rprop algorithm (an abbreviation of Resilient back-propagation). The weights are updated in each iteration by the following relationship:
\[
\Delta w_{jk}^{t} = -\eta_{jk}^{t}\,\operatorname{sgn}\!\left(\frac{\partial E^{t}}{\partial w_{jk}^{t}}\right) \tag{3.4}
\]
where
⎧
⎪ t−1
, ηmax ) ∂E t−1
⋅ ∂E t
>0
⎪
⎪
⎪
min(aηjk ∂wjkt t
∂wjk
⎪
⎪ t−1
t
= ⎨max(βηjk ∂E t
ηjk t−1
, ηmin ) ∂E
⋅ <0
⎪
⎪
⎪
∂wjkt t
∂wjk
⎪
⎪
⎪ t−1
⎩ηjk
with a = 1.2, β = 0.5, η_max = 50 and η_min = 0.001. It is noteworthy that, unlike other algorithms, the Rprop method uses the sign rather than the magnitude of the derivatives. Whenever the partial derivative with respect to a weight w_jk changes sign, which indicates that the previous change was too large and the algorithm jumped over a local minimum, the coefficient η_jk is decreased by the factor β. If the partial derivative retains its sign, the rate η_jk is increased slightly in order to accelerate convergence. The coefficients of change η_jk are usually bounded to avoid numerical problems.
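A sketch of the Rprop update for a vector of weights, applied to a simple quadratic cost; the cost function, starting point and initial step sizes are illustrative. This is the variant without weight backtracking, matching update (3.4).

```python
import numpy as np

def rprop_update(w, eta, grad_now, grad_prev,
                 a=1.2, beta=0.5, eta_max=50.0, eta_min=0.001):
    """One Rprop iteration, equation (3.4): only the sign of the gradient is
    used, and every weight keeps its own step size eta_jk."""
    prod = grad_now * grad_prev
    eta = np.where(prod > 0, np.minimum(a * eta, eta_max), eta)
    eta = np.where(prod < 0, np.maximum(beta * eta, eta_min), eta)
    return w - eta * np.sign(grad_now), eta

# Minimize E(w) = sum(w^2) (an illustrative cost) from a poor starting point.
w = np.array([5.0, -3.0])
eta = np.full(2, 0.1)
grad_prev = np.zeros(2)
for _ in range(100):
    grad_now = 2.0 * w
    w, eta = rprop_update(w, eta, grad_now, grad_prev)
    grad_prev = grad_now
```

The step sizes first grow geometrically while the gradient sign is stable, then shrink each time the sign flips, so the iterates oscillate around the minimum with decreasing amplitude.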
3.2.1 Background
Tree-based methods partition the feature space into a set of rectangles, and then fit a simple model
(like a constant) in each one. They are conceptually simple yet powerful. We first describe a
popular method for tree-based regression and classification called CART, and later contrast it
with C4.5, a major competitor.
Let’s consider a regression problem with continuous response Y and inputs X1 and X2 , each
taking values in the unit interval. The top left panel of Figure (3.4) shows a partition of the feature
space by lines that are parallel to the coordinate axes. In each partition element we can model
Y with a different constant. However, there is a problem: although each partitioning line has a
simple description like X1 = c, some of the resulting regions are complicated to describe.
Figure 3.3
3.2. TREE-BASED METHODS 27
Figure 3.4: Partitions and CART. Top right panel shows a partition of a two-dimensional feature
space by recursive binary splitting, as used in CART, applied to some fake data. Top left panel
shows a general partition that cannot be obtained from recursive binary splitting. Bottom left
panel shows the tree corresponding to the partition in the top right panel, and a perspective plot
of the prediction surface appears in the bottom right panel.
To simplify matters, we restrict attention to recursive binary partitions like that in the top
right panel of Figure (3.4). We first split the space into two regions, and model the response by
the mean of Y in each region. We choose the variable and split-point to achieve the best fit. Then
one or both of these regions are split into two more regions, and this process is continued, until
some stopping rule is applied. For example, in the top right panel of Figure (3.4), we first split at
X1 = t1 . Then the region X1 ≤ t1 is split at X2 = t2 and the region X1 > t1 is split at X1 = t3 .
Finally, the region X1 > t3 is split at X2 = t4 . The result of this process is a partition into the
five regions R1 , R2 , . . . , R5 shown in the figure. The corresponding regression model predicts Y
with a constant cm in region Rm , that is,
\[
\hat f(X) = \sum_{m=1}^{5} c_m\, I\{(X_1, X_2) \in R_m\}. \tag{3.5}
\]
This same model can be represented by the binary tree in the bottom left panel of Figure (3.4).
The full dataset sits at the top of the tree. Observations satisfying the condition at each junction
are assigned to the left branch, and the others to the right branch. The terminal nodes or leaves
of the tree correspond to the regions R1 , R2 , . . . , R5 . The bottom right panel of Figure (3.4) is a
perspective plot of the regression surface from this model. For illustration, we chose the node
means c1 = 5, c2 = 7, c3 = 0, c4 = 2, c5 = 4 to make this plot.
A key advantage of the recursive binary tree is its interpretability. The feature space partition
is fully described by a single tree. With more than two inputs, partitions like that in the top right
panel of Figure (3.4) are difficult to draw, but the binary tree representation works in the same
28 CHAPTER 3. MACHINE LEARNING METHODS
way. This representation is also popular among medical scientists, perhaps because it mimics the
way that a doctor thinks. The tree stratifies the population into strata of high and low outcome,
on the basis of patient characteristics.
In general, if we have a partition into M regions R1 , R2 , . . . , RM , we model the response as
\[
\hat f(X) = \sum_{m=1}^{M} c_m\, I(x \in R_m). \tag{3.6}
\]
If we adopt as our criterion minimization of the sum of squares \(\sum (y_i - f(x_i))^2\), it is easy to
see that the best \(\hat c_m\) is just the average of \(y_i\) in region \(R_m\):
\[
\hat c_m = \operatorname{ave}(y_i \mid x_i \in R_m).
\]
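The splitting criterion can be sketched as an exhaustive one-dimensional search: for each candidate split point, each side is predicted by its mean and the total sum of squares is compared. The synthetic data below are purely illustrative.

```python
import numpy as np

def best_split(x, y):
    """Exhaustive search for the split point minimizing the total sum of
    squared errors when each side is predicted by its mean (CART criterion)."""
    best_sse, best_s = np.inf, None
    for s in np.unique(x)[:-1]:
        left, right = y[x <= s], y[x > s]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best_sse:
            best_sse, best_s = sse, s
    return best_s

# Piecewise-constant data with a jump at x = 0.5 (synthetic, for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.where(x <= 0.5, 5.0, 0.0) + rng.normal(0.0, 0.1, 200)
s = best_split(x, y)
```

Recursive application of this search to each resulting region yields the binary tree described above.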
1. Bias: This error is caused by unrealistic assumptions. When bias is high, the ML algo-
rithm has failed to recognize important relations between features and outcomes. In this
situation, the algorithm is said to be “underfit.”
2. Variance: This error is caused by sensitivity to small changes in the training set. When
variance is high, the algorithm has overfit the training set, and that is why even minimal
changes in the training set can produce wildly different predictions. Rather than modeling
the general patterns in the training set, the algorithm has mistaken noise for signal.
3. Noise: This error is caused by the variance of the observed values, like unpredictable changes
or measurement errors. This is the irreducible error, which cannot be explained by any
model.
The mean squared error of a model \(\hat f\) can be decomposed as
\[
E\!\left[(y_i - \hat f[x_i])^2\right] = \underbrace{\left(E\!\left[\hat f[x_i]\right] - f[x_i]\right)^2}_{\text{bias}^2} + \underbrace{V\!\left[\hat f[x_i]\right]}_{\text{variance}} + \underbrace{\sigma_\varepsilon^2}_{\text{noise}}
\]
An ensemble method is a method that combines a set of weak learners, all based on the same
learning algorithm, in order to create a (stronger) learner that performs better than any of the
individual ones. Ensemble methods help reduce bias and/or variance.
The variance of the average of the predictions \(\phi_i[c]\) of N such estimators is
\[
V\!\left[\frac{1}{N}\sum_{i=1}^{N}\phi_i[c]\right]
= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\sigma_{i,j}
= \frac{1}{N^2}\sum_{i=1}^{N}\Bigl(\sigma_i^2 + \sum_{j\neq i}\sigma_i\sigma_j\rho_{i,j}\Bigr)
= \frac{1}{N^2}\sum_{i=1}^{N}\Bigl(\bar\sigma^2 + \underbrace{\sum_{j\neq i}\bar\sigma^2\bar\rho}_{=(N-1)\bar\sigma^2\bar\rho\ \text{for a fixed } i}\Bigr)
= \frac{\bar\sigma^2 + (N-1)\bar\sigma^2\bar\rho}{N}
= \bar\sigma^2\left(\bar\rho + \frac{1-\bar\rho}{N}\right) \tag{3.8}
\]
where \(\sigma_{i,j}\) is the covariance of the predictions of estimators i and j; \(\bar\sigma^2\) is the average variance, \(\sum_{i=1}^{N}\bar\sigma^2 = \sum_{i=1}^{N}\sigma_i^2 \Leftrightarrow \bar\sigma^2 = N^{-1}\sum_{i=1}^{N}\sigma_i^2\); and \(\bar\rho\) is the average correlation, \(\sum_{j\neq i}\bar\sigma^2\bar\rho = \sum_{j\neq i}\sigma_i\sigma_j\rho_{i,j} \Leftrightarrow \bar\rho = \bigl(\bar\sigma^2 N(N-1)\bigr)^{-1}\sum_{i=1}^{N}\sum_{j\neq i}\sigma_i\sigma_j\rho_{i,j}\).
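Formula (3.8) can be checked directly in the equicorrelated case, where every estimator has the same variance σ² and every pair the same correlation ρ:

```python
import numpy as np

def var_of_mean(N, sigma, rho):
    """Exact variance of the equal-weight average of N estimators with common
    standard deviation sigma and common pairwise correlation rho."""
    cov = sigma**2 * ((1.0 - rho) * np.eye(N) + rho * np.ones((N, N)))
    w = np.full(N, 1.0 / N)          # equal-weight average
    return float(w @ cov @ w)

N, sigma, rho = 10, 2.0, 0.3
lhs = var_of_mean(N, sigma, rho)
rhs = sigma**2 * (rho + (1.0 - rho) / N)   # right-hand side of (3.8)
```

Note that as N grows the variance does not vanish but tends to σ²ρ, which is why reducing the correlation between the weak learners matters as much as adding more of them.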
Chapter 4
Optimization algorithms
• The highest ranking solution’s fitness is reaching or has reached a plateau such that succes-
sive iterations no longer produce better results
• Manual inspection
Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for
numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for nu-
merical optimization of non-linear or non-convex continuous optimization problems. They be-
long to the class of evolutionary algorithms and evolutionary computation. An evolutionary
algorithm is broadly based on the principle of biological evolution, namely the repeated interplay
of variation (via recombination and mutation) and selection: in each generation (iteration) new
individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic
way, of the current parental individuals. Then, some individuals are selected to become the par-
ents in the next generation based on their fitness or objective function value f (x). Like this, over
the generation sequence, individuals with better and better f -values are generated.
In an evolution strategy, new candidate solutions are sampled according to a multivariate normal distribution in Rn. Recombination amounts to selecting a new mean value for the distribution. Mutation amounts to adding a random vector, a perturbation with zero mean. Pairwise
dependencies between the variables in the distribution are represented by a covariance matrix.
The covariance matrix adaptation (CMA) is a method to update the covariance matrix of this
distribution. This is particularly useful if the function f is ill-conditioned.
Adaptation of the covariance matrix amounts to learning a second order model of the un-
derlying objective function similar to the approximation of the inverse Hessian matrix in the
quasi-Newton method in classical optimization. In contrast to most classical methods, fewer as-
sumptions on the nature of the underlying objective function are made. Only the ranking between
candidate solutions is exploited for learning the sample distribution and neither derivatives nor
even the function values themselves are required by the method.
4.2. COVARIANCE MATRIX ADAPTATION EVOLUTION 33
4.2.1 Algorithm
In the following the most commonly used (μ/μw , λ)-CMA-ES is outlined, where in each iteration step a weighted combination of the μ best out of λ new candidate solutions is used to update the distribution parameters. The main loop consists of three main parts: (1) sampling of new solutions, (2) re-ordering of the sampled solutions based on their fitness, and (3) updating the internal state variables based on the re-ordered samples.
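A deliberately simplified evolution strategy makes this loop concrete. The sketch below uses isotropic Gaussian sampling and a crude fixed step-size decay instead of the full covariance-matrix and step-size adaptation, so it illustrates the (μ/μ_w, λ) selection scheme, not CMA-ES itself; all numeric settings are assumptions.

```python
import numpy as np

def es_minimize(f, x0, sigma=1.0, lam=12, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))  # recombination weights
    w /= w.sum()
    for _ in range(iters):
        X = mean + sigma * rng.normal(size=(lam, mean.size))  # 1. sample offspring
        order = np.argsort([f(x) for x in X])                 # 2. rank by fitness
        mean = w @ X[order[:mu]]                              # 3. update the mean
        sigma *= 0.97    # crude fixed decay in place of CMA step-size control
    return mean

sphere = lambda x: float(np.sum(x**2))
xopt = es_minimize(sphere, x0=[3.0, -2.0, 1.0])
```

Note that only the ranking of the candidate solutions enters the update, in line with the derivative-free character of the method.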
Bayesian optimization is a sequential design strategy for global optimization of black-box func-
tions that does not assume any functional forms. It is usually employed to optimize expensive-
to-evaluate functions.
Bayesian optimization is typically used on problems of the form maxx∈A f (x), where A is
a set of points whose membership can easily be evaluated. Bayesian optimization is particularly
advantageous for problems where f(x) is difficult to evaluate, is a black box with some unknown structure, has fewer than 20 dimensions, and where derivatives are not evaluated.
Since the objective function is unknown, the Bayesian strategy is to treat it as a random func-
tion and place a prior over it. The prior captures beliefs about the behavior of the function. After
gathering the function evaluations, which are treated as data, the prior is updated to form the
posterior distribution over the objective function. The posterior distribution, in turn, is used to
construct an acquisition function (often also referred to as infill sampling criteria) that determines
the next query point.
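A minimal one-dimensional sketch of this loop: a Gaussian-process posterior with an RBF kernel over a fixed grid, with expected improvement as the acquisition function. The kernel length-scale, the grid, and the toy objective are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, ell=0.3):
    """Squared-exponential (RBF) kernel matrix between point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def bayes_opt(f, grid, n_iter=15, noise=1e-6):
    X = list(grid[[0, -1]])              # start by evaluating the endpoints
    y = [f(x) for x in X]
    for _ in range(n_iter):
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + noise * np.eye(len(Xa))   # jittered Gram matrix
        Ks = rbf(grid, Xa)
        mu = Ks @ np.linalg.solve(K, ya)            # GP posterior mean
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        sd = np.sqrt(np.maximum(var, 1e-12))        # GP posterior st. dev.
        z = (mu - ya.max()) / sd
        ei = (mu - ya.max()) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
        x_next = grid[np.argmax(ei)]                # next query point
        X.append(x_next)
        y.append(f(x_next))
    return X[int(np.argmax(y))]

f = lambda x: -(x - 0.6)**2              # "unknown" objective; maximum at 0.6
grid = np.linspace(0.0, 1.0, 201)
xbest = bayes_opt(f, grid)
```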
4.3. BAYESIAN OPTIMIZATION 35
Chapter 5
Applications with real options data
Our objective is to compare the different methods for option pricing. We split the above dataset into training and test datasets (with ratio 80%–20%). One approach is to use the stochastic models described in Chapter 2 (the BSM and Heston models). Another approach is to use machine learning models (ANN and RF). We will compare the above methods using the MSE (mean squared error) on the test dataset.
\[
\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat y_i)^2 \tag{5.1}
\]
where, yi is the real option price and ŷi the predicted option price.
\[
\min_{p}\ \frac{1}{N}\sum_{n=1}^{N}\bigl(C_n^{*} - C_n^{\mathrm{mod}}(p)\bigr)^2 \tag{5.2}
\]
where, Cn∗ and Cnmod are the market price and the model price of the nth option, respectively.
p is the parameter set provided as input to the option pricing model. We will use the methods of Chapter 4 to solve the above problem; those algorithms can solve optimization problems with black-box functions. In our case we want to minimize the MSE on the training dataset.
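A sketch of this black-box calibration using SciPy's `differential_evolution` (a GA-style global optimizer) in place of the algorithms of Chapter 4, on synthetic BSM quotes with known parameters; the data, bounds, and the parameter set p = (r, σ) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import differential_evolution

def bsm_call(S, K, T, r, sigma):
    """Black-Scholes-Merton price of a European call (no dividends)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Synthetic "market" quotes generated from known parameters (for illustration).
S0, T = 100.0, 0.5
strikes = np.array([85.0, 95.0, 100.0, 105.0, 115.0])
true_r, true_sigma = 0.03, 0.25
market = bsm_call(S0, strikes, T, true_r, true_sigma)

# Treat the pricing model as a black box and minimize the training MSE (5.2)
# over the parameter set p = (r, sigma) within the stated bounds.
def mse(p):
    r, sigma = p
    return float(np.mean((market - bsm_call(S0, strikes, T, r, sigma))**2))

res = differential_evolution(mse, bounds=[(0.0, 0.10), (0.05, 1.0)], seed=0)
```

The optimizer only ever calls `mse` as a function of p, which is exactly the black-box setting described above.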
5.2.1 BSM
Consider now the Black-Scholes-Merton model in its dynamic form, as described by the stochastic differential equation (SDE)
\[
dS_t = r S_t\,dt + \sigma S_t\,dZ_t,
\]
where \(Z_t\) is a standard Brownian motion. The SDE is called a geometric Brownian motion. The values of \(S_t\) are lognormally distributed and the (marginal) returns \(dS_t/S_t\) normally distributed.
Figure 5.3: Parameters, upper and lower bounds for the BSM model
Figure 5.5: Results for the BSM model calibration with the GA algorithm
Figure 5.6: Convergence for the BSM model calibration with the GA algorithm
Figure 5.7: Results for the BSM model calibration with the CMAES algorithm
5.2. CALIBRATION OF BSM AND HESTON MODEL 41
Figure 5.8: Convergence for the BSM model calibration with the CMAES algorithm
Figure 5.9: Results for the BSM model calibration with the Bayesian optimization algorithm
5.2.2 Heston
One of the major simplifying assumptions of the Black-Scholes-Merton model is the constant
volatility. However, volatility in general is neither constant nor deterministic; it is stochastic.
Therefore, a major advancement with regard to financial modeling was achieved in the early 1990s
with the introduction of so-called stochastic volatility models. One of the most popular models
that fall into that category is that of Heston (1993), which is presented in (5.4)
\[
\begin{aligned}
dS_t &= r S_t\,dt + \sqrt{\nu_t}\, S_t\, dZ_t^1\\
d\nu_t &= \kappa_\nu(\theta_\nu - \nu_t)\,dt + \sigma_\nu\sqrt{\nu_t}\,dZ_t^2\\
dZ_t^1\, dZ_t^2 &= \rho\, dt
\end{aligned}
\tag{5.4}
\]
The meaning of the individual variables and parameters can now be inferred easily from the discussion of the geometric Brownian motion and the square-root diffusion. The parameter ρ represents the instantaneous correlation between the two standard Brownian motions \(Z_t^1\) and \(Z_t^2\). This
allows us to account for a stylized fact called the leverage effect, which in essence states that volatil-
ity goes up in times of stress (declining markets) and goes down in times of a bull market (rising
markets).
For the Heston model we have six parameters: V0 , κV , θV , σV , ρ, r.
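The Heston dynamics (5.4) can be simulated with a simple Euler scheme; the full-truncation treatment of the variance and all parameter values below are illustrative choices, not specifications from the text.

```python
import numpy as np

def heston_paths(S0, V0, r, kappa, theta, sigma_v, rho, T, steps, n, seed=0):
    """Euler discretization of (5.4) with full truncation of the variance."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    S = np.full(n, float(S0))
    V = np.full(n, float(V0))
    for _ in range(steps):
        z1 = rng.standard_normal(n)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)  # corr(z1,z2)=rho
        Vp = np.maximum(V, 0.0)                # full truncation keeps the sqrt real
        S = S * np.exp((r - 0.5 * Vp) * dt + np.sqrt(Vp * dt) * z1)
        V = V + kappa * (theta - Vp) * dt + sigma_v * np.sqrt(Vp * dt) * z2
    return S, np.maximum(V, 0.0)

S, V = heston_paths(S0=100.0, V0=0.04, r=0.03, kappa=2.0, theta=0.04,
                    sigma_v=0.3, rho=-0.7, T=1.0, steps=250, n=20000)
call_mc = np.exp(-0.03 * 1.0) * np.maximum(S - 100.0, 0.0).mean()  # ATM call by Monte Carlo
```

The negative ρ used here reproduces the leverage effect mentioned above: variance tends to rise on downward moves of the index.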
Figure 5.10: Parameters, upper and lower bounds for the Heston model
Figure 5.12: Results for the Heston model calibration with the GA algorithm
Figure 5.13: Convergence for the Heston model calibration with the GA algorithm
Figure 5.14: Results for the Heston model calibration with the CMAES algorithm
Figure 5.15: Convergence for the Heston model calibration with the CMAES algorithm
5.3. MACHINE LEARNING 45
Figure 5.16: Results for the Heston model calibration with the Bayesian optimization algorithm
For the machine learning models we can use more features than the three input features of BSM
and Heston model (S0 initial stock price, K strike, T time of maturity in years). For this reason
we have created the following extra features:
For the above six combinations we will use the following parameters in grid search:
• Number of epochs
• Learning rate
• Drop-out rate
• Activation functions
Figure 5.18: Parameters, upper and lower bounds for the NN model
• Max depth
• Number of estimators
• Max samples
• Max features
Figure 5.20: Parameters, upper and lower bounds for the RF model
• From the above we can see that the machine learning models have slightly better results.
• For the stochastic models we see that the BSM and Heston models gave roughly equivalent results. We would expect the Heston model to outperform the BSM model; the reason this does not happen is that we used the same number of iterations for both models.
• We were forced to limit the iterations in this way due to the extremely heavy computational cost of the Heston model. Normally it would need three times the number of iterations of the BSM model (since one has two parameters and the other has six).
• Regarding optimization algorithms (GA, CMA-ES and Bayesian optimization) the better
the results the heavier the computational cost.
• Neural Network performs better with extra features and two hidden layers.
• Random Forest performed better with the three basic features (stock price, time of maturity, strike).
• Random Forest gave the best results of all the methods we used.
Appendix A
Python codes
1. Abate, J., Choudhury, G., and Whitt, W. (1999) An introduction to numerical transform in-
version and its application to probability models in Computational Probability, W.Grassman,
ed., Kluwer Publisher, Boston.
3. Acworth, P., Broadie, M., and Glasserman, P. (1998) A comparison of some Monte Carlo and quasi-Monte Carlo methods for option pricing, pp. 1–18 in Monte Carlo and Quasi-Monte Carlo Methods 1996, P. Hellekalek, G. Larcher, H. Niederreiter, and P. Zinterhof, eds., Springer-Verlag, Berlin.
4. Ahrens, J.H., and Dieter, U.(1974) Computer methods for sampling from the gamma, beta,
Poisson, and binomial distributions.
6. Alexander, C. and Barbosa, A. (2008) Hedging exchange traded funds. Journal of Banking
and Finance 32(2), 326–337.
7. Alexander, C. and Nogueira, L. (2004) Hedging with stochastic and local volatility. ICMA
Discussion Papers in Finance 2004-11.
8. Alexander, C. and Nogueira, L. (2008) Stochastic local volatility. ICMA Discussion Papers
in Finance DP2008-02.
9. Andersen, L., and Broadie, M. (2001) A primal-dual simulation algorithm for pricing multi-dimensional American options, working paper, Columbia Business School, New York.
10. Anderson, T. W. (1984) An Introduction to Multivariate Statistical Analysis, Second Edition, Wiley, New York.
11. Bishop, Christopher M. (1995). Neural networks for pattern recognition. Clarendon Press.
62 REFERENCES
13. Alexander, C. Market Risk Analysis: Pricing, Hedging and Trading Financial Instruments, Wiley.
14. Cecchetti, S.G., Cumby, R.E. and Figlewski, S. (1988) Estimation of optimal futures hedge.
Review of Economics and Statistics 70.
16. Cox, J., Ingersoll, J. and Ross, S. (1985) A theory of the term structure of interest rates.
Econometrica 53
17. Hansen, N. (2006), ”The CMA evolution strategy: a comparing review”, Towards a new
evolutionary computation. Advances on estimation of distribution algorithms, Springer,
pp. 1769–1776
18. Haykin, Simon S. (1999). Neural networks : a comprehensive foundation. Prentice Hall
19. Jonas Mockus: On Bayesian Methods for Seeking the Extremum. Optimization Techniques
1974: 400-404
20. Glasserman, P. (2003) Monte Carlo Methods in Financial Engineering (Stochastic Modelling and Applied Probability, v. 53), Springer.
21. Rudin, C. and K. L. Wagstaff (2014) “Machine learning for science and society.” Machine
Learning, Vol. 95, No. 1, pp. 1–9.
22. Schmitt, Lothar M. (2001). ”Theory of Genetic Algorithms”. Theoretical Computer Sci-
ence. 259 (1–2): 1–61
23. Shoshani, A. and D. Rotem (2010): “Scientific data management: Challenges, technology,
and deployment.” Chapman & Hall/CRC Computational Science Series. CRC Press.
24. Snir, M. et al. (1998): MPI: The Complete Reference. Volume 1, The MPI-1 Core. MIT
Press.
25. Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009) The Elements of Statistical Learn-
ing, Springer-Verlag