Dr Carolyn Phelan
1 Introduction
1.1 How to use this primer
1.2 Basics
1.3 Trigonometric identities
1.4 Other useful equations: factorials, the Gamma function and binomial coefficients
2 Differentiation
2.1 Basic rules
2.1.1 Common equations
2.1.2 Chain rule and product rule
2.1.3 Higher order derivatives
2.1.4 Implicit differentiation
2.2 Uses of differentiation
2.2.1 Maxima and minima of functions
2.2.2 Taylor and Maclaurin series
2.2.3 Taylor series approximations
2.2.4 L'Hôpital's rule
3.6 Curve sketching
5 Integration
5.1 Basic rules
5.1.1 Common equations
5.1.2 Integration by substitution
5.1.3 Integration by parts
5.1.4 Partial fractions
7 Multivariate calculus
7.1 Notation
7.2 Limits and continuity
7.3 Basic rules of differentiation for multivariate functions
7.3.1 Product rule
7.3.2 Chain rule
7.3.3 Clairaut's theorem (simplified)
7.4 Taylor series
7.4.1 Taylor series and differentials
7.5 Integration in two variables
7.5.1 Fubini's theorem
7.5.2 Differentiation of integrals: Leibniz's integral rule
7.5.3 Change of variables in a double integral
8 Complex numbers
8.1 Introduction
8.2 Polar notation
8.2.1 Euler's formula and Euler's identity
8.3 Logarithms of complex and negative numbers
8.4 Complex differentiation
8.4.1 Cauchy-Riemann equations
8.5 Complex integration
Chapter 1
Introduction
This primer is intended to provide a guide to the mathematical knowledge that you need
to be successful in your MSc modules. You should all have seen most of the contents of this course before. However, in order to get the most out of your other courses, it needs to be familiar to you. This is the starting point that your other lecturers will expect you to be at, and you should be comfortable with these concepts in an exam setting.
This course will provide you with explanations and worked examples which you can use if you come across things that you do not understand in your modules. However, it will
be of even more benefit if you study it in advance of taking the modules so that you are
prepared beforehand. Certainly, you should study the information in Section 1.2 in such
detail that you know it without having to continuously reference it during your study for
your other courses.
1.2 Basics
Before we get started, here is a list of things that you should pretty much know in your sleep. They cover basic relationships between logarithms and exponentials and are the sort of things that students can get muddled up with under exam conditions.
\[ e^{a+b} = e^a e^b \tag{1.1} \]
\[ e^{ab} = (e^a)^b = (e^b)^a \tag{1.2} \]
\[ \log a + \log b = \log ab \tag{1.3} \]
\[ \log a - \log b = \log \frac{a}{b} \tag{1.4} \]
\[ \log a^b = b \log a \tag{1.5} \]
\[ \log_b a = \frac{\log_d a}{\log_d b} \tag{1.6} \]
\[ a = e^{\log a} \tag{1.7} \]
\[ a^r = e^{r \log a} \tag{1.8} \]
For the quadratic equation ax^2 + bx + c = 0, the roots are given by the standard formula
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{1.9} \]
The sign of the term b^2 - 4ac determines whether your roots are real (positive), complex (negative) or repeated (zero).
There are very many trig identities; here are a few that you should be familiar with:
1.4 Other useful equations: factorials, the Gamma function and binomial coefficients
The factorial of a non-negative integer n is n! = 1 \times 2 \times 3 \times \cdots \times n, with the convention that 0! = 1 and the recurrence relation n! = n(n-1)!. In your probability courses you are also likely to come across the Gamma function
\[ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x} \, dx, \tag{1.18} \]
which can be conceptualised as the generalisation of the factorial function to complex numbers (other than the non-positive integers). For \alpha equal to a positive integer, \Gamma(\alpha) = (\alpha - 1)!, and it has a recurrence relation similar to that of the factorial for any value of \alpha, i.e. \alpha\Gamma(\alpha) = \Gamma(\alpha + 1). We prove this recurrence relation in the second example in Section 5.1.3.
The binomial coefficient has several applications in mathematics and probability, and one way to think of it is as the number of ways to choose k elements from a set of n elements. It is defined as
\[ \binom{n}{k} = \frac{n!}{(n-k)!k!}, \tag{1.19} \]
and it has two useful recurrence relations. One is multiplicative:
\[ \binom{n+1}{k+1} = \frac{n+1}{k+1}\binom{n}{k} = \frac{n+1}{k+1} \cdot \frac{n!}{(n-k)!k!} = \frac{(n+1)!}{(n-k)!(k+1)!}. \tag{1.20} \]
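These relationships are easy to check numerically. The sketch below uses only the standard library (`math.gamma`, `math.comb`); the helper `binom` is an illustrative re-implementation of Eq. (1.19), not part of the course material:

```python
import math

# Gamma generalises the factorial: Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# The recurrence alpha * Gamma(alpha) = Gamma(alpha + 1) also holds for non-integer alpha.
for alpha in (0.5, 1.7, 3.2):
    assert math.isclose(alpha * math.gamma(alpha), math.gamma(alpha + 1))

def binom(n, k):
    # Binomial coefficient from Eq. (1.19).
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

# The multiplicative recurrence of Eq. (1.20): (k+1) C(n+1, k+1) = (n+1) C(n, k).
for n in range(1, 10):
    for k in range(n):
        assert binom(n, k) == math.comb(n, k)
        assert (k + 1) * binom(n + 1, k + 1) == (n + 1) * binom(n, k)
```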
Chapter 2
Differentiation
To denote the derivatives more succinctly, Lagrange or Newtonian notation may also be used. Lagrange notation, also called prime notation, denotes the derivative with respect to x using a prime, i.e.
\[ \frac{df}{dx} = f', \qquad \frac{d^2 f}{dx^2} = f'', \qquad \frac{d^3 f}{dx^3} = f'''. \tag{2.3} \]
For fourth order derivatives and above, the prime is no longer used and is instead replaced by the order of differentiation in brackets, e.g.
\[ \frac{d^4 f}{dx^4} = f^{(4)}. \tag{2.4} \]
Newtonian notation denotes the derivative with respect to time using a dot over the function, i.e. for y = f(t),
\[ \frac{df}{dt} = \dot{f}, \qquad \frac{d^2 f}{dt^2} = \ddot{f}, \qquad \frac{d^3 f}{dt^3} = \dddot{f}. \tag{2.5} \]
This notation becomes unwieldy when we are working with fourth order derivatives and
above. However, it can be useful, especially when looking at something like the heat
equation where we wish to distinguish between derivatives with respect to x and derivatives
with respect to t.
2.1 Basic rules
This can be shown simply by substituting f (x) = g(x) + h(x) and f (x + ∆x) = g(x +
∆x) + h(x + ∆x) into Eq. (2.1) and separately grouping the terms for g(·) and h(·).
For a power function f(x) = ax^n,
\[ \frac{df(x)}{dx} = anx^{n-1}. \tag{2.7} \]
Example 1:
\[ f(x) = 3x^2, \qquad \frac{df(x)}{dx} = 6x. \tag{2.8} \]
Example 2:
\[ f(x) = \frac{1}{x}, \qquad \frac{df(x)}{dx} = -\frac{1}{x^2}. \tag{2.9} \]
For exponentials f(x) = e^x, \frac{df(x)}{dx} = e^x.
For trigonometric functions there are several identities that are useful and you should
know these by heart.
\[ f(x) = \cos x, \qquad \frac{df(x)}{dx} = -\sin x, \tag{2.10} \]
\[ f(x) = \sin x, \qquad \frac{df(x)}{dx} = \cos x. \tag{2.11} \]
Chain rule gives a formula for the derivative of nested functions f(g(x)) as
\[ \frac{df(g(x))}{dx} = \frac{df}{dg}\frac{dg}{dx}. \tag{2.12} \]
Example:
\[ f(x) = a^x, \qquad a > 0. \tag{2.13} \]
First define b such that a = e^b (i.e. b = \log a); then f(x) = e^{bx}, so we can define g(x) = bx and f(g(x)) = e^{g(x)}, then
\[ \frac{df(x)}{dx} = \frac{df}{dg}\frac{dg}{dx} = e^{g(x)} \cdot b = a^x \log a. \tag{2.14} \]
Product rule gives a formula for the derivative of the product of functions f(x) = u(x)v(x) as
\[ \frac{df(x)}{dx} = u(x)\frac{dv(x)}{dx} + v(x)\frac{du(x)}{dx}. \tag{2.15} \]
Quotient rule can be derived from chain rule and product rule. For f(x) = \frac{u(x)}{v(x)},
\begin{align*}
\frac{df(x)}{dx} &= u(x)\frac{d}{dx}\frac{1}{v(x)} + \frac{1}{v(x)}\frac{du(x)}{dx} \\
&= u(x)\frac{d}{dv}\left(\frac{1}{v(x)}\right)\frac{dv(x)}{dx} + \frac{1}{v(x)}\frac{du(x)}{dx} \\
&= -u(x)\frac{1}{v(x)^2}\frac{dv(x)}{dx} + \frac{1}{v(x)}\frac{du(x)}{dx} \\
&= \frac{v(x)\frac{du(x)}{dx} - u(x)\frac{dv(x)}{dx}}{v(x)^2}. \tag{2.16}
\end{align*}
Example:
Using quotient rule and the trigonometric identities we can find the derivative of \tan x:
\begin{align*}
\tan x &= \frac{\sin x}{\cos x} \quad \text{so} \quad u(x) = \sin x, \; v(x) = \cos x, \\
\frac{du(x)}{dx} &= \cos x, \qquad \frac{dv(x)}{dx} = -\sin x \\
\implies \frac{d \tan x}{dx} &= \frac{\sin^2 x + \cos^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x. \tag{2.17}
\end{align*}
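Results like Eq. (2.17) can be sanity-checked numerically with a central finite difference; the sketch below is illustrative, with an arbitrary sample point and step size:

```python
import math

def central_diff(f, x, h=1e-6):
    # Central finite-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # arbitrary sample point away from the poles of tan
numeric = central_diff(math.tan, x)
analytic = 1 / math.cos(x) ** 2  # sec^2 x from Eq. (2.17)
assert abs(numeric - analytic) < 1e-6
```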
Chain and product rules also exist for higher order derivatives.
Example:
The chain rule for a second order derivative is obtained using the first order product and chain rules:
\begin{align*}
\frac{d^2 f(g(x))}{dx^2} &= \frac{d}{dx}\left(\frac{df}{dg}\frac{dg}{dx}\right) \\
&= \frac{df}{dg}\frac{d}{dx}\frac{dg}{dx} + \frac{dg}{dx}\frac{d}{dg}\left(\frac{df}{dg}\right)\frac{dg}{dx} \\
&= \frac{df}{dg}\frac{d^2 g}{dx^2} + \frac{d^2 f}{dg^2}\left(\frac{dg}{dx}\right)^2. \tag{2.18}
\end{align*}
The product rule for higher order derivatives is obtained by repeatedly applying the first order product rule.
Example:
The product rule for a second order derivative is
\begin{align*}
\frac{d^2 u(x)v(x)}{dx^2} &= \frac{d}{dx}\left(u(x)\frac{dv(x)}{dx} + v(x)\frac{du(x)}{dx}\right) \\
&= u(x)\frac{d^2 v(x)}{dx^2} + 2\frac{du(x)}{dx}\frac{dv(x)}{dx} + v(x)\frac{d^2 u(x)}{dx^2}. \tag{2.19}
\end{align*}
In fact the product rule can be generalised to any higher derivative using the binomial coefficient, i.e. \binom{n}{i} = \frac{n!}{i!(n-i)!}, to give
\[ \frac{d^n u(x)v(x)}{dx^n} = \sum_{k=0}^{n} \binom{n}{k} \frac{d^k u(x)}{dx^k} \frac{d^{n-k} v(x)}{dx^{n-k}}. \tag{2.20} \]
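For polynomials, Eq. (2.20) can be verified exactly with coefficient arithmetic. The following sketch represents a polynomial as a list of coefficients (lowest degree first); the helpers `deriv`, `mul` and `pad` are illustrative, not library routines:

```python
from math import comb

def deriv(p, k=1):
    # k-th derivative of a polynomial given as coefficients, lowest degree first.
    for _ in range(k):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p

def mul(p, q):
    # Coefficient list of the product of two polynomials.
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pad(p, m):
    return p + [0] * (m - len(p))

u = [1, 2, 0, 3]  # u(x) = 1 + 2x + 3x^3
v = [4, 0, 5]     # v(x) = 4 + 5x^2
n = 3

lhs = deriv(mul(u, v), n)  # d^n(uv)/dx^n computed directly
rhs = [0]
for k in range(n + 1):
    # Sum of binom(n, k) * u^(k) * v^(n-k) terms from Eq. (2.20).
    term = [comb(n, k) * c for c in mul(deriv(u, k), deriv(v, n - k))]
    m = max(len(rhs), len(term))
    rhs = [a + b for a, b in zip(pad(rhs, m), pad(term, m))]

m = max(len(lhs), len(rhs))
assert pad(lhs, m) == pad(rhs, m)
```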
If we define y = f(x), then implicit differentiation describes the technique whereby, rather than calculating \frac{dy}{dx} directly, it can be easier to calculate \frac{dx}{dy} and then calculate \frac{dy}{dx} = \left(\frac{dx}{dy}\right)^{-1}, also substituting in y = f(x) to obtain the required result in terms of x.
Example 1:
\begin{align*}
y &= \log x, \quad \text{so} \quad x = e^y, \quad \frac{dx}{dy} = e^y \\
\implies \frac{dy}{dx} &= \frac{1}{e^y} = \frac{1}{x}. \tag{2.21}
\end{align*}
This result can be combined with chain rule to give the derivative of the log of a function, i.e.
\[ \frac{d \log f(x)}{dx} = \frac{1}{f(x)}\frac{df(x)}{dx}. \tag{2.22} \]
Example 2:
\begin{align*}
y &= \arcsin x, \quad \text{so} \quad x = \sin y, \quad \frac{dx}{dy} = \cos y \\
\implies \frac{dy}{dx} &= \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}. \tag{2.23}
\end{align*}
Example 3:
\begin{align*}
y &= \arctan x, \quad \text{so} \quad x = \tan y, \quad \frac{dx}{dy} = \frac{1}{\cos^2 y} \\
\implies \frac{dy}{dx} &= \cos^2 y = \frac{\cos^2 y}{\cos^2 y + \sin^2 y} = \frac{1}{\tan^2 y + 1} = \frac{1}{x^2 + 1}. \tag{2.24}
\end{align*}
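Both inverse-function derivatives can be confirmed numerically with a finite difference (an illustrative sketch with an arbitrary sample point):

```python
import math

def central_diff(f, x, h=1e-6):
    # Central finite-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.4  # arbitrary point inside (-1, 1)
# d(arcsin x)/dx = 1/sqrt(1 - x^2), Eq. (2.23)
assert abs(central_diff(math.asin, x) - 1 / math.sqrt(1 - x**2)) < 1e-6
# d(arctan x)/dx = 1/(x^2 + 1), Eq. (2.24)
assert abs(central_diff(math.atan, x) - 1 / (x**2 + 1)) < 1e-6
```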
2.2 Uses of differentiation
The calculation of derivatives has very many applications within financial mathematics
and applied mathematics as a whole and covering them all here is well beyond the scope
of this course. However, we run through a couple of the most important ones that you
need to be familiar with.
One of the most important uses of derivatives is finding the local maxima and minima of
functions. This is explained in more detail in Section 3.2 of Chapter 3.
An analytic function f(x) can be described by its Taylor series expansion around a point a:
\[ f(x) = f(a) + f'(a)(x-a) + \frac{1}{2!}f''(a)(x-a)^2 + \frac{1}{3!}f'''(a)(x-a)^3 + \frac{1}{4!}f^{(4)}(a)(x-a)^4 + \cdots. \tag{2.25} \]
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n. \tag{2.26} \]
Writing f(x) = c_0 + c_1(x-a) + c_2(x-a)^2 + \cdots, then by repeatedly differentiating and setting x = a we can obtain the values of the coefficients in terms of the derivatives of f(x):
\begin{align*}
f(a) &= c_0 \\
f'(x) &= c_1 + 2c_2(x-a) + 3c_3(x-a)^2 + 4c_4(x-a)^3 + \cdots \\
f'(a) &= c_1 \\
f''(x) &= 2c_2 + 2 \cdot 3 c_3(x-a) + 3 \cdot 4 c_4(x-a)^2 + \cdots \\
f''(a) &= 2c_2 \\
f'''(x) &= 2 \cdot 3 c_3 + 2 \cdot 3 \cdot 4 c_4(x-a) + \cdots \\
f'''(a) &= 3! c_3 \tag{2.28}
\end{align*}
and so on. A special case of the Taylor series is when a = 0; this is also referred to as a Maclaurin series and is
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n. \tag{2.29} \]
Example 1:
Exponential function: use the Maclaurin series to expand e^x around 0.
\begin{align*}
f(x) &= e^x \\
\implies f^{(n)}(x) &= e^x \quad \forall n = 1, 2, 3, \ldots \\
f^{(n)}(0) &= 1 \quad \text{for } n = 0, 1, 2, 3, \ldots \\
f(x) &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \tag{2.30}
\end{align*}
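The partial sums of Eq. (2.30) converge quickly to the true exponential; a quick numerical sketch (the helper name is illustrative):

```python
import math

def exp_maclaurin(x, terms):
    # Partial sum of the Maclaurin series for e^x, Eq. (2.30).
    return sum(x**n / math.factorial(n) for n in range(terms))

# 15 terms already agree with math.exp to ~1e-10 for moderate x.
assert abs(exp_maclaurin(1.0, 15) - math.e) < 1e-10
assert abs(exp_maclaurin(-0.5, 15) - math.exp(-0.5)) < 1e-10
```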
Example 2:
Log function: use the Taylor series to expand \log x around 1.
In general, for n \ge 1, f^{(n)}(x) = (-1)^{n+1}\frac{(n-1)!}{x^n}, so f^{(n)}(1) = (-1)^{n+1}(n-1)! and
\begin{align*}
\log x &= 0 + 1 \cdot (x-1) - \frac{1}{2}(x-1)^2 + \frac{2!}{3!}(x-1)^3 - \frac{3!}{4!}(x-1)^4 + \cdots \\
&= (x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{3}(x-1)^3 - \frac{1}{4}(x-1)^4 + \cdots. \tag{2.33}
\end{align*}
Example 3:
Log function: use the Maclaurin series to expand \log(x+1) around 0.
\[ \log(x+1) = x - \frac{1}{2}x^2 + \frac{2!}{3!}x^3 - \frac{3!}{4!}x^4 + \cdots = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots. \tag{2.35} \]
Example 4:
Geometric series: use the Maclaurin series to expand \frac{1}{1-x} around 0.
\begin{align*}
f(x) &= \frac{1}{1-x}, & f(0) &= 1, \\
f'(x) &= \frac{1}{(1-x)^2}, & f'(0) &= 1, \\
f''(x) &= \frac{2}{(1-x)^3}, & f''(0) &= 2, \\
f'''(x) &= \frac{3!}{(1-x)^4}, & f'''(0) &= 3!, \\
f^{(4)}(x) &= \frac{4!}{(1-x)^5}, & f^{(4)}(0) &= 4!. \tag{2.36}
\end{align*}
The coefficients f^{(n)}(0)/n! are therefore all equal to 1, giving
\[ \frac{1}{1-x} = 1 + x + x^2 + x^3 + x^4 + \cdots = \sum_{n=0}^{\infty} x^n, \qquad |x| < 1. \tag{2.37} \]
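A quick sketch confirms the geometric series against the closed form for a few values of x inside the radius of convergence:

```python
def geom_partial(x, terms):
    # Partial sum of the Maclaurin series 1/(1-x) = 1 + x + x^2 + ..., |x| < 1.
    return sum(x**n for n in range(terms))

for x in (0.5, -0.3, 0.9):
    assert abs(geom_partial(x, 300) - 1 / (1 - x)) < 1e-9
```

Note that convergence is slower the closer |x| is to 1, which is why 300 terms are used above.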
We can use the Taylor and Maclaurin series in order to approximate functions with a few polynomial terms, and this also allows us to make predictions about the error of these approximations.
Example:
Approximate \sin x up to the fifth order term using the Taylor series.
For small x, i.e. such that the value of x^n decreases with n, we can approximate \sin x by
\[ \sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}. \tag{2.39} \]
The error of this approximation is O(x^7). Note that although we cut the series off after the 5th order term, the error is not O(x^6), as the even order terms in the Taylor series for \sin x are always 0. The notation O(x^n) is called Landau notation and we discuss this in further detail in Chapter 4, but in this context it means that as x \to 0, the error of the approximation reduces at a rate of x^n.
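The O(x^7) behaviour can be observed numerically: halving x should divide the error by roughly 2^7 = 128 (a rough sketch; the exact ratio drifts slightly because of higher order terms):

```python
import math

def sin_approx(x):
    # Fifth order approximation of Eq. (2.39).
    return x - x**3 / math.factorial(3) + x**5 / math.factorial(5)

# For an O(x^7) error, halving x divides the error by about 2^7 = 128.
err_large = abs(math.sin(0.2) - sin_approx(0.2))
err_small = abs(math.sin(0.1) - sin_approx(0.1))
assert 100 < err_large / err_small < 160
```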
We can look at the general first and second order approximations using the Taylor series. In the following we replace x - a with \Delta x. The exact expansion is
\[ f(x) = f(a) + f'(a)\Delta x + \sum_{n=2}^{\infty} f^{(n)}(a)\frac{\Delta x^n}{n!}, \tag{2.41} \]
so the first order approximation f(x) \approx f(a) + f'(a)\Delta x has error O(\Delta x^2). Similarly, the second order approximation is
\[ f(x) \approx f(a) + f'(a)\Delta x + \frac{f''(a)\Delta x^2}{2}, \tag{2.42} \]
with error O(\Delta x^3).
The error orders for the two approximations are an upper bound. For cases where we can
show that some of the terms in the Taylor series are always zero, such as in the example
above, we may be able to specify a better error bound.
2.2.4 L’Hôpital’s rule
In the case where we have a quotient whose value is undefined, i.e. the numerator and denominator are both zero or both infinite, we can use L'Hôpital's rule to determine the value. This states that where
\[ \left.\frac{f(x)}{g(x)}\right|_{x=c} \quad \text{is not defined, but} \quad \left.\frac{f'(x)}{g'(x)}\right|_{x=c} \quad \text{is defined, then} \]
\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \left.\frac{f'(x)}{g'(x)}\right|_{x=c}. \tag{2.43} \]
Example 1:
Evaluate \frac{\sin x}{x} at x = 0.
At x = 0 both \sin x = 0 and x = 0, so the quotient is undefined. However \frac{d \sin x}{dx} = \cos x and \frac{dx}{dx} = 1, so
\[ \lim_{x \to 0} \frac{\sin x}{x} = \left.\frac{\cos x}{1}\right|_{x=0} = \frac{1}{1} = 1. \tag{2.44} \]
Example 2:
Evaluate \frac{e^x}{x} as x \to \infty.
As x \to \infty both e^x \to \infty and x \to \infty, so the quotient is undefined. However \frac{de^x}{dx} = e^x and \frac{dx}{dx} = 1, so
\[ \lim_{x \to \infty} \frac{e^x}{x} = \left.\frac{e^x}{1}\right|_{x=\infty} = \infty. \tag{2.45} \]
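The first limit can be sketched numerically: sin(x)/x approaches 1 as x shrinks, matching the value given by L'Hôpital's rule:

```python
import math

# sin(x)/x should approach the L'Hopital value of 1 as x -> 0,
# with each smaller x giving a value closer to the limit.
values = [math.sin(x) / x for x in (0.1, 0.01, 0.001)]
assert abs(values[-1] - 1.0) < 1e-6
assert abs(values[1] - 1.0) < abs(values[0] - 1.0)
assert abs(values[2] - 1.0) < abs(values[1] - 1.0)
```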
Chapter 3
When we wish to understand the behaviour of a function f(x) there are several different aspects that we can examine, such as turning points, asymptotes, poles and convexity, to gain more of an insight into its properties.
The roots of an equation are values of x for which f(x) = 0. For quadratic equations, i.e. equations of the form
\[ f(x) = ax^2 + bx + c, \tag{3.1} \]
we can use the standard formula for the roots of a quadratic equation in Eq. (1.9). However, in the case of a polynomial of order greater than 2 we need to use other methods in order to find the roots.
For polynomial functions with rational roots we can use polynomial division (also known as synthetic division). For example, if we wish to find the rational roots of
\[ f(x) = 2x^3 + 11x^2 + 17x + 6, \tag{3.2} \]
then we can use polynomial division. The first thing we must do is come up with an ansatz for one factor, i.e. (\alpha x + \gamma), where \alpha must be a factor of 2, i.e. 1 or 2, and \gamma must be a factor of 6, i.e. 1, 2, 3 or 6. We can then try dividing by this and see if it produces a remainder. For our first ansatz we choose 2x + 3. Dividing f(x) by 2x + 3 we obtain

              x^2 +  4x + 5/2
2x + 3 ) 2x^3 + 11x^2 + 17x + 6
        -2x^3 -  3x^2
                 8x^2 + 17x
               - 8x^2 - 12x
                         5x + 6
                       - 5x - 15/2
                            - 3/2                    (3.3)

The final line, -3/2, is a non-zero remainder, so 2x + 3 is not a factor of f(x). Trying the ansatz 2x + 1 instead and dividing again, we obtain

              x^2 +  5x + 6
2x + 1 ) 2x^3 + 11x^2 + 17x + 6
        -2x^3 -   x^2
                10x^2 + 17x
              - 10x^2 -  5x
                        12x + 6
                      - 12x - 6
                              0                      (3.4)

where the final line of 0 means that we have no remainder, and therefore 2x + 1 is a factor of f(x) and x = -1/2 is a root. Moreover, from Eq. (3.4) we can see that the other roots of f(x) are the roots of x^2 + 5x + 6. Using Eq. (1.9) to find the roots of x^2 + 5x + 6 gives the factors of f(x) as
\[ f(x) = (2x + 1)(x + 2)(x + 3). \tag{3.5} \]
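The two divisions above can be checked with a short script; `polydiv` below is an illustrative helper implementing long division on coefficient lists (highest degree first), not a library routine:

```python
def polydiv(num, den):
    # Long division of polynomials; coefficients highest degree first.
    num = list(num)
    quot = []
    while len(num) >= len(den):
        c = num[0] / den[0]
        quot.append(c)
        num = [a - c * b for a, b in zip(num, den + [0] * (len(num) - len(den)))][1:]
    return quot, num  # quotient, remainder

# f(x) = 2x^3 + 11x^2 + 17x + 6 divided by 2x + 1: quotient x^2 + 5x + 6, no remainder.
q, r = polydiv([2, 11, 17, 6], [2, 1])
assert q == [1, 5, 6] and r == [0]

# Divided by 2x + 3: remainder -3/2, so 2x + 3 is not a factor.
q, r = polydiv([2, 11, 17, 6], [2, 3])
assert r == [-1.5]
```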
The method in Section 3.1.1 assumed that at least one of the roots of the polynomial was rational. This is not necessarily true of all, or even some, of the roots of a polynomial; therefore we also describe some iterative methods to find the roots of functions.
Newton's method Probably the most well known iterative method for finding the roots of an equation is Newton's method. This requires knowledge of the derivative f'(x) of f(x), and uses the following iterative relationship
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \tag{3.6} \]
until the value of f(x_n) is closer to zero than some predefined tolerance. For example, to find the roots of the function from Section 3.1.1, i.e.
\[ f(x) = 2x^3 + 11x^2 + 17x + 6, \tag{3.7} \]
we first calculate
\[ f'(x) = 6x^2 + 22x + 17, \tag{3.8} \]
to obtain
\[ x_{n+1} = x_n - \frac{2x_n^3 + 11x_n^2 + 17x_n + 6}{6x_n^2 + 22x_n + 17}. \tag{3.9} \]
Starting with x0 = 0 we have
x0 = 0,
x1 = −0.3529,
x2 = −0.4814,
x3 = −0.4996,
x4 = −0.5000, (3.10)
which gives us the root -\frac{1}{2} in only 4 steps. However, this is only one root, so we must choose other starting places to find the other two roots. For example,
x0 = −1.5,
x1 = −2.1,
x2 = −1.9949,
x3 = −2.000 (3.11)
and
x0 = −3.5,
x1 = −3.1667,
x2 = −3.0284,
x3 = −3.0011,
x4 = −3.0000. (3.12)
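The iterations above can be reproduced with a few lines of code; this is a minimal sketch of Newton's method, with an arbitrary tolerance and iteration cap:

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    # Newton's method, Eq. (3.6): iterate until f(x) is within tolerance of zero.
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            break
        x = x - f(x) / fprime(x)
    return x

f = lambda x: 2 * x**3 + 11 * x**2 + 17 * x + 6
fp = lambda x: 6 * x**2 + 22 * x + 17

# The three starting points used in the text recover the roots -1/2, -2 and -3.
assert abs(newton(f, fp, 0.0) - (-0.5)) < 1e-8
assert abs(newton(f, fp, -1.5) - (-2.0)) < 1e-8
assert abs(newton(f, fp, -3.5) - (-3.0)) < 1e-8
```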
Secant method Newton's method assumes that we can easily find the derivative of the function. In cases where this is not easily done, we can use the secant method, where we replace the derivative with its finite difference approximation. This gives our iterative procedure as
\[ x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}. \tag{3.13} \]
In order to use this we must first select two starting values. For the example in the previous
section, we select x0 = 0 and x1 = −0.2 and we have
x0 = 0,
x1 = −0.2,
x2 = −0.4032,
x3 = −0.4766,
x4 = −0.4978,
x5 = −0.4999,
x6 = −0.5000. (3.14)
In general the convergence of the secant method is slightly slower than Newton’s method.
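The secant iteration is just as short to implement; a minimal sketch, with a guard against a zero finite-difference denominator:

```python
def secant(f, x0, x1, tol=1e-10, max_iter=100):
    # Secant method, Eq. (3.13): Newton's method with the derivative
    # replaced by a finite difference of the last two iterates.
    for _ in range(max_iter):
        if abs(f(x1)) < tol:
            break
        denom = f(x1) - f(x0)
        if denom == 0:
            break
        x0, x1 = x1, x1 - f(x1) * (x1 - x0) / denom
    return x1

f = lambda x: 2 * x**3 + 11 * x**2 + 17 * x + 6
# Starting from x0 = 0, x1 = -0.2 as in the text recovers the root -1/2.
assert abs(secant(f, 0.0, -0.2) - (-0.5)) < 1e-8
```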
\[ \frac{d^2 f}{dx^2} > 0: \text{ minimum}, \tag{3.15} \]
\[ \frac{d^2 f}{dx^2} < 0: \text{ maximum}. \tag{3.16} \]
Figure 3.2: We plot f(x) = x^3, f'(x) = 3x^2 and f''(x) = 6x. We can see that f'(x) = 0 occurs for the same value of x as f''(x) = 0 and, although the curve becomes parallel to the x-axis at x = 0, it remains non-decreasing for the entire domain of x.
The inflection point, i.e. the value of x where f''(x) = 0, also has significance if we consider the concept of the convexity and concavity of functions. A concave function is one where for all \alpha \in (0,1)
\[ f(\alpha x_1 + (1-\alpha)x_2) \ge \alpha f(x_1) + (1-\alpha)f(x_2), \tag{3.17} \]
that is, the function of an average of two points is greater than or equal to the average of the function at those two points. Conversely, a convex function is one where for all \alpha \in (0,1)
\[ f(\alpha x_1 + (1-\alpha)x_2) \le \alpha f(x_1) + (1-\alpha)f(x_2). \tag{3.18} \]
Functions can also be strictly concave or convex, in which case the \ge and \le in Eqs. (3.17) and (3.18) are replaced by their respective strict inequalities. Concave and convex functions
are illustrated in Figure 3.3.
Much of mathematical finance concerns calculating the maximum of a function (e.g.
return) or the minimum of a function (e.g. risk). Therefore as convex functions only have
a single global minimum (i.e. there are no local minima) and concave functions only have a
single global maximum, it is important to be able to show whether functions are concave
or convex if we are carrying out optimisation. This is related to the second derivative
f 00 (x) so that if f 00 (x) ≥ 0 ∀xthe function is convex and if f 00 (x) ≤ 0 ∀x the function is
concave.
Functions may also have convex and concave regions, the point where a function moves
Figure 3.3: Convex and concave functions. The left hand plot shows the convex function f(x) = e^x and \alpha f(x_1) + (1-\alpha)f(x_2) for x_1 = -1, x_2 = 2 and \alpha varying from 0 to 1. The right hand plot shows the concave function f(x) = 2 - x^2 and \alpha f(x_1) + (1-\alpha)f(x_2) for x_1 = -\sqrt{2}, x_2 = 1 and \alpha varying from 0 to 1. Notice that the straight line lies above f(x) in the case of the convex function and below f(x) in the case of the concave function.
from a convex to a concave region is the inflection point, i.e. where f''(x) = 0. This is illustrated in Figure 3.4, where we show how \alpha f(x_1) + (1-\alpha)f(x_2) is above or below f(x) depending on where we are relative to the inflection point.
Figure 3.4: This figure shows f (x) = 3x3 − 10x2 + 4x + 4 and αf (x1 ) + (1 − α)f (x2 )
for values of x above and below the inflection point plotted against the left hand y-axis.
Notice how the function is concave for values of x below the inflection point and convex
for values of x above the inflection point. The second order derivative f 00 (x) = 18x − 20 is
plotted against the right hand side y-axis and shows that the inflection point corresponds
to the value of x where f 00 (x) = 0.
Asymptotes are defined as a line or a curve which approaches a given function arbitrarily
closely. We will look at horizontal, vertical, oblique and curvilinear asymptotes separately.
3.4.1 Horizontal asymptotes
Probably the situation where you have most frequently come across asymptotes is in the
context of horizontal asymptotes, i.e. where a function approaches a constant as x → +∞
or x → −∞. A function can have at most two horizontal asymptotes and can approach
2x2 +3
them from above or below. Figure 3.5 shows an example for f (x) = x2 +1
and we can
see that the horizontal asymptote for both x → −∞ and x → +∞ is 2 and the function
approaches this value from above. There are several straightforward rules governing the
Figure 3.5: The function f(x) = \frac{2x^2+3}{x^2+1} and its horizontal asymptote of f(x) \to 2 as x \to \pm\infty.
Rational functions If f(x) is of the form of a ratio between two polynomials of the form
\[ f(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}, \tag{3.19} \]
then if p \le q there is a horizontal asymptote as x \to \pm\infty. If p = q then the asymptote is \frac{a}{d}, and if p < q the asymptote is equal to zero. The p = q case is illustrated in Figure 3.5 above, where the asymptote is 2.
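The behaviour in Figure 3.5 can be sketched numerically: the gap between f(x) and the asymptote is exactly 1/(x^2 + 1), which shrinks like 1/x^2 and is always positive (approach from above):

```python
def f(x):
    # The p = q example from Figure 3.5; asymptote a/d = 2.
    return (2 * x**2 + 3) / (x**2 + 1)

# f approaches its horizontal asymptote of 2 from above, with gap 1/(x^2 + 1).
for x in (10.0, 100.0, 1000.0, -1000.0):
    assert f(x) > 2
    assert abs(f(x) - 2) < 2 / x**2
```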
Exponential functions If f(x) is of the form \alpha^{g(x)}, where g(x) is a rational function, then it is straightforward to determine if one or two horizontal asymptotes exist. If the rational function g(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f} is such that p < q then there are two asymptotes, as x \to \infty and as x \to -\infty, and these are both equal to 1. If p > q then asymptotes may exist depending on the values of p, q, a and d. For odd values of p - q there is one asymptote, as x \to -\infty if \frac{a}{d} > 0 or as x \to \infty if \frac{a}{d} < 0; these are both equal to zero. For even values of p - q, asymptotes only exist if \frac{a}{d} < 0, and in these cases there are two asymptotes, at x \to \pm\infty, both equal to 0. For the final case of p = q there are two asymptotes at x \to \pm\infty, both equal to \alpha^{a/d}. Examples of these cases are illustrated in Figure 3.6. Notice that the left hand plot also has a black vertical line at x = 0; this denotes the existence of a vertical asymptote, and vertical asymptotes are covered in Section 3.4.2.
Figure 3.6: Examples of horizontal asymptotes of exponential functions, including f(x) = e^{(2x^2+3)/(x^2+1)} in the left hand plot.
Vertical asymptotes (also often referred to as poles) of a function are defined as a vertical line x = x_0 where at least one of the following is true:
\[ \lim_{x \to x_0^-} f(x) = \pm\infty, \qquad \lim_{x \to x_0^+} f(x) = \pm\infty. \tag{3.22} \]
Here x \to x_0^- indicates that the value of x is approaching the constant x_0 from the left and x \to x_0^+ indicates that it is approaching it from the right. Unlike horizontal asymptotes, where there can be at most two, there can be an infinite number of vertical asymptotes. In general, a function has a vertical asymptote when it can be expressed as a fraction and the denominator is equal to zero.
Rational functions If f(x) is of the form of a ratio between two polynomials of the form
\[ f(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}, \tag{3.23} \]
then, in general, vertical asymptotes exist at values of x where the denominator is zero, i.e. at the roots of dx^q + ex^{q-1} + \cdots + f (we will cover the exceptions to this rule below).
For example, for
\[ f(x) = \frac{x^2 - 2x - 3}{x^3 - x^2 - 4x + 4}, \tag{3.24} \]
we can factorise the denominator into x^3 - x^2 - 4x + 4 = (x-1)(x+2)(x-2), which suggests we should have vertical asymptotes at x = -2, 1, 2. This is shown in Figure 3.7; notice that the function has a different value depending on which direction (left or right) the asymptotes are approached from, e.g.
\[ \lim_{x \to 1^-} f(x) = -\infty, \qquad \lim_{x \to 1^+} f(x) = +\infty. \tag{3.27} \]
Figure 3.7: The function f(x) = \frac{x^2-2x-3}{x^3-x^2-4x+4} and its vertical asymptotes (marked in red) of x = -2, x = 1 and x = 2.
We explore the effect of the direction of approach and the number of asymptotes in more detail using the example f(x) = \frac{1}{x^3 - 3x + 2}. The denominator is a third order polynomial, so we would expect the function to have at most 3 vertical asymptotes. We have plotted f(x) and the denominator in Figure 3.8 and we can notice several things. Firstly, there are only two asymptotes, at x = -2, 1, rather than the three we might expect. Secondly, for the asymptote at x = 1 we notice that f(x) tends towards +\infty regardless of the direction of approach; conversely, we can observe that there is a change of sign either side of the asymptote at x = -2. These can be linked back to the way that the denominator can be separated into its factors. In this case x^3 - 3x + 2 = (x-1)^2(x+2), i.e. we have a double root at x = 1, which is the reason why we only have two asymptotes rather than three. Moreover, we can also see from Figure 3.8 that the double root coincides with a minimum of the denominator, and therefore the sign of the denominator is unchanged either side of x = 1. As the sign of the denominator does not change, the sign of f(x) does not change either side of the asymptote, and f(x) tends towards +\infty regardless of whether we approach from the left or right. Contrast this with the behaviour of the denominator close to the other asymptote at x = -2. On the left hand side of x = -2 the denominator is less than zero, on the right hand side it is greater than zero, and therefore the sign of f(x) is different on the left hand and right hand side of the asymptote.
Figure 3.8: Vertical asymptotes of f(x) = \frac{1}{x^3 - 3x + 2}, where x^3 - 3x + 2 = (x-1)^2(x+2); the left hand figure shows the behaviour of f(x) close to its asymptotes at x = -2 and x = 1, and the right hand figure shows the behaviour of the denominator around its roots.
We saw that a function can have fewer asymptotes than the polynomial order of its denominator in the case of a double root. However, there is another situation where we may have fewer asymptotes than we might expect from the order of the denominator alone. For example, consider
\[ f(x) = \frac{x^2 + 2x - 3}{x^3 - x^2 - 4x + 4}, \tag{3.28} \]
which is plotted in Figure 3.9. Looking at the order of the denominator we might expect there to be three vertical asymptotes, at x = -2, x = 1 and x = 2, because x^3 - x^2 - 4x + 4 = (x-1)(x+2)(x-2). However, we can see from Figure 3.9 that there are only two asymptotes, at x = \pm 2. We can understand this by finding the roots of the numerator as well as the denominator, so that f(x) = \frac{(x-1)(x+3)}{(x-1)(x+2)(x-2)} = \frac{x+3}{(x+2)(x-2)}, i.e. the root of the denominator at x = 1 is also a root of the numerator, so they cancel out, leaving only two vertical asymptotes.
Trigonometric functions The functions \sin x and \cos x have periodic zeros at k\pi and \pi/2 + k\pi respectively, for integer k. Therefore, any function which divides by one of these will have vertical asymptotes anywhere where the denominator is zero (and not cancelled by the numerator). For example, consider
Figure 3.9: The function f(x) = \frac{x^2+2x-3}{x^3-x^2-4x+4} and its vertical asymptotes (marked in red) of x = \pm 2.
\[ f(x) = \tan x = \frac{\sin x}{\cos x}, \tag{3.29} \]
which is plotted in Figure 3.10. It can be seen that there are regularly spaced asymptotes at x = \pi/2 + k\pi, where k is an integer. These coincide with the values of x where \cos x = 0. Note also that, as there are a countably infinite number of values of x where \cos x = 0, \tan x has a countably infinite number of vertical asymptotes.
Figure 3.10: The function f (x) = tan x, notice its vertical asymptotes at x = π/2 + kπ,
where k is an integer.
Log functions It is well known that for a given function f(x), the plot of y = f^{-1}(x) is a mirror image of y = f(x) about the line x = y. Therefore, if f(x) is an injective (i.e. one to one) function with horizontal asymptotes, then f^{-1}(x) has vertical asymptotes. An example of this is f(x) = e^x, which has a single horizontal asymptote at 0 as x \to -\infty. Therefore g(x) = f^{-1}(x) = \log x has a vertical asymptote at x = 0, where \log x \to -\infty as x \to 0^+. This is illustrated in Figure 3.11.
Figure 3.11: The functions f (x) = ex and f (x) = log x, notice the way that the two
functions are mirrored around f (x) = x and that the horizontal asymptote of f (x) = ex
corresponds to the vertical asymptote of f (x) = log x.
Oblique asymptotes can occur for rational functions where the order of the numerator is one greater than that of the denominator, for example
\[ f(x) = \frac{2x^2 + 3x + 1}{x + 5}, \tag{3.30} \]
where p = 2 and q = 1. This function is displayed in Figure 3.12 along with a diagonal line which is an asymptote for f(x) as x \to \pm\infty; there is also a vertical asymptote at x = -5. In
Figure 3.12: The function f(x) = \frac{2x^2+3x+1}{x+5}. Notice that we have a vertical asymptote at x = -5 and an oblique asymptote of 2x - 7 as x \to \pm\infty.
order to find the asymptote we divide the numerator by the denominator, i.e.

              2x -  7
x + 5 ) 2x^2 + 3x +  1
       -2x^2 - 10x
             - 7x +  1
               7x + 35
                    36                 (3.31)

so that f(x) = 2x - 7 + \frac{36}{x+5}; as x \to \pm\infty the remainder term vanishes, leaving the oblique asymptote 2x - 7.
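The division can be confirmed numerically: the identity 2x^2 + 3x + 1 = (x + 5)(2x - 7) + 36 holds for every x, and the gap between f(x) and the oblique asymptote is 36/(x + 5), which vanishes as |x| grows (a quick sketch):

```python
def f(x):
    return (2 * x**2 + 3 * x + 1) / (x + 5)

def asymptote(x):
    # The oblique asymptote found by polynomial division.
    return 2 * x - 7

# The division identity: numerator = (x + 5)(2x - 7) + 36.
for x in (-2.0, 0.0, 3.5, 10.0):
    assert abs((x + 5) * (2 * x - 7) + 36 - (2 * x**2 + 3 * x + 1)) < 1e-9

# f(x) - (2x - 7) = 36/(x + 5), which vanishes as x -> +/- infinity.
assert abs(f(1e6) - asymptote(1e6)) < 1e-3
assert abs(f(-1e6) - asymptote(-1e6)) < 1e-3
```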
So far we have solely looked at continuous functions. However, we may also be required to analyse functions with jumps, particularly CDFs of discrete random variables. A simple example of a function with a jump is
\[ f(x) = \begin{cases} 1 & x < 1 \\ 2 & 1 \le x, \end{cases} \tag{3.32} \]
and this is plotted in Figure 3.13. It is particularly important to understand the relationship between the use of \le and <, and the influence this has on the behaviour of the function. The function described in Eq. (3.32) and plotted in Figure 3.13 is described
as càdlàg, which is short for "continue à droite, limite à gauche". This means that the function is continuous from the right and has limits from the left, i.e. from every value of x the function can be followed continuously to the right, whereas there are values of x where you cannot move to the left without encountering a jump. For example, say you are at x = 0.9; then f(x) = 1, and you can also move right to x = 0.99, for example, and f(x) remains equal to 1. Furthermore, you can increase the value of x to 0.999, 0.9999 and 0.99999, etc., and the function remains continuous. In contrast, if we are at x = 1 then we cannot reduce the value of x at all without encountering the jump.
Notice the way that f (x) is plotted with respect to the continuous and limited sides of
the discontinuity: on the continuous side, we have an open circle and on the limited side
we have a closed circle. Note that the opposite situation is described as càglàd, short for
“continue à gauche, limite à droite”.
Figure 3.13: A càdlàg function with a jump at x = 1. Notice that the continuous side of
the function is denoted with an open circle and the limited side of the function is denoted
with a closed circle.
For continuous random variables, the cumulative distribution function (CDF) is continu-
ous. However for discrete random variables the CDF has jumps; in this section we see how
the definition of CDFs means that they are always càdlàg for discrete random variables.
The definition of the CDF for a random variable X is
$$F(x) = P(X \le x); \qquad (3.33)$$
notice the use of ≤ in the above equation as this is significant for the right continuity of
the function. In order to see that the definition in Eq. (3.33) automatically leads to a
càdlàg function we use the example of a fair 6-sided die. The probability mass function is P(X = x) = 1/6 for x ∈ {1, 2, 3, 4, 5, 6}. As before, in order to understand the continuity of the function we can consider what happens on either side of a discontinuity. For example, let x = 1.9; then P(X ≤ 1.9) = 1/6 and we can increase the value of x to 1.99, 1.999, 1.9999 and the CDF remains equal to 1/6. However, if we are at x = 2 then P(X ≤ 2) = 2/6 and we cannot reduce our value of x without the CDF changing to a value of 1/6.
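The right continuity of the die CDF can be observed directly by evaluating the definition numerically; a minimal Python sketch (the sample points either side of the jump are arbitrary choices):

```python
# The CDF of a fair six-sided die, evaluated directly from the definition
# F(x) = P(X <= x); the <= in the sum is what makes F right-continuous.
def die_cdf(x):
    """P(X <= x) for a fair six-sided die."""
    return sum(1 for k in range(1, 7) if k <= x) / 6

# Approaching x = 2 from the left, the CDF stays at 1/6 ...
print(die_cdf(1.9), die_cdf(1.999))
# ... but at x = 2 itself it jumps to 2/6.
print(die_cdf(2.0))
```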
3.6 Curve sketching
Using the techniques in this chapter we can sketch graphs of functions in order to visualise their characteristics. We should take the following steps:
1. Find the value where the function crosses the y-axis, i.e. set x = 0 and evaluate.
2. Find the points where the function crosses the x-axis. For monotonic functions with
an inverse f −1 (x) we can use f −1 (0). For example if f (x) = log x then we can find
the point it crosses the x-axis with f −1 (0) = e0 = 1. For quadratic equations we
use Eq. (1.9) and for higher order polynomials the direct and iterative techniques
described in Section 3.1 can be employed. For more complex functions we can also
use the iterative methods described in Section 3.1.2.
3. Using the methods described in Section 3.2 find the turning points of the function
and identify if they are minima or maxima.
4. Identify the horizontal/vertical and oblique asymptotes and from which direction the function approaches them.
Note that these are the features that must be considered; they do not necessarily all exist for all functions.
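As an illustrative sketch of these steps, the polynomial from Example 2 below, f(x) = x³ − 9x² + 23x − 15, can be worked through numerically (the use of NumPy here is an arbitrary tooling choice):

```python
import numpy as np

coeffs = [1, -9, 23, -15]          # f(x) = x^3 - 9x^2 + 23x - 15
p = np.poly1d(coeffs)

# Step 1: y-axis crossing
print(p(0))                         # -15

# Step 2: x-axis crossings (roots) at 1, 3 and 5
print(np.sort(np.roots(coeffs)))

# Step 3: turning points from the roots of the derivative, 3 +/- 2/sqrt(3)
tp = np.sort(np.roots(p.deriv().coeffs))
print(tp)
# classify with the second derivative: maximum first, then minimum
print([bool(p.deriv(2)(x) > 0) for x in tp])   # [False, True]
```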
Example 1: Sketch the graph for
$$f(x) = \frac{x}{1-x}. \qquad (3.34)$$
Following the steps above:
1. f(0) = 0/(1 − 0) = 0, so the function crosses the y-axis at x = 0, y = 0.
2. To find the points where a rational function crosses the x-axis we only need examine the numerator, in this case x, which clearly has a single root of x = 0.
3. The derivative is
$$f'(x) = \frac{1}{(1-x)^2}. \qquad (3.35)$$
This only becomes zero at x = ±∞ and so there are no local turning points.
4. As x → ±∞, f(x) → x/(−x) = −1, so a horizontal asymptote of f(x) = −1 exists for both −∞ and +∞. We can see from Eq. (3.35) that the gradient is positive for all values of x, so the function approaches the asymptote from above as x → −∞ and from below as x → +∞. The denominator of f(x) is zero at x = 1 and so we have a vertical asymptote there. Again, as the gradient is always positive, we can see that f(x) → +∞ as x → 1⁻ and f(x) → −∞ as x → 1⁺.
A plot of the function is shown on the left hand side of Figure 3.14.
Example 2: Sketch the graph for
$$f(x) = x^3 - 9x^2 + 23x - 15. \qquad (3.36)$$
Following the steps above:
1. f(0) = −15, so the function crosses the y-axis at y = −15.
2. Using the polynomial division method described in Section 3.1.1 we can produce guesses for roots at x = ±1, ±3, ±5. If we pick x = 3 we can attempt to divide f(x) by (x − 3):

             x² − 6x + 5
    x − 3 ) x³ − 9x² + 23x − 15
           −x³ + 3x²
                −6x² + 23x
                 6x² − 18x
                        5x − 15
                       −5x + 15
                              0          (3.37)

This divides perfectly with no remainder and we can use Eq. (1.9) to factorise x² − 6x + 5 = (x − 1)(x − 5), so the three roots of f(x) are 1, 3 and 5. Thus f(x) = 0 at x = 1, x = 3 and x = 5.
3. The derivative is f′(x) = 3x² − 18x + 23, which has roots at x = 3 ± 2/√3, and the second derivative is f″(x) = 6x − 18. Thus for x = 3 + 2/√3, f″(x) > 0 and this point is a minimum, and for x = 3 − 2/√3, f″(x) < 0 and so this point is a maximum. The values of f(x) at the turning points are f(3 + 2/√3) = −3.0792 and f(3 − 2/√3) = 3.0792.
4. As x → ±∞, f(x) → ±∞, so there are no horizontal asymptotes, and as there are no finite values of x where f(x) → ∞, there are also no vertical asymptotes.
A plot of the function is shown on the right hand side of Figure 3.14.
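The long division carried out in Example 2 can be double-checked numerically; a short sketch using NumPy's polynomial division (an arbitrary tooling choice):

```python
import numpy as np

# (x^3 - 9x^2 + 23x - 15) / (x - 3) should give x^2 - 6x + 5 with
# zero remainder, matching the long division worked above.
quotient, remainder = np.polydiv([1, -9, 23, -15], [1, -3])
print(quotient)    # coefficients of x^2 - 6x + 5
print(remainder)   # remainder is zero
```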
Figure 3.14: Plots of the functions from the curve sketching examples 1 and 2 above. The left hand side plot shows f(x) = x/(1 − x) and the right hand plot shows f(x) = x³ − 9x² + 23x − 15.
Chapter 4
So far, when solving equations f(x) = 0 for x, we have only considered functions with an equality; for example x² − 1 = 0 gives the solutions x = −1, 1. However, we may also come
across functions written as inequalities, e.g.
x2 + 3x − 2 > 0 or (4.1)
x−1 ≤ e−x , (4.2)
where the solution is the value, or range of values, of x that solve the equation. The
inequality is written as < or > for strict inequalities where the two sides of the equa-
tion cannot be equal and ≤ or ≥ for “less than or equal” and “greater than or equal”
respectively.
In the same way that we can solve equations by performing the same operations on both the left hand and right hand sides of the equation (e.g. addition, subtraction, etc.), we can do this with inequalities. However, care must be taken as some operations may change the direction of the inequality.
Operations preserving the direction of the inequality There are several oper-
ations that can be performed on both sides of the inequality which preserve the direction
of the inequality.
– If a ≥ b and c ∈ R then a + c ≥ b + c
– If a > b and c > 0 then ac > bc
– If a ≤ b and c > 0 then a/c ≤ b/c
– If a > b and f (x) is strictly monotonically increasing then f (a) > f (b)
– If a > b and f (x) is monotonically increasing but not strictly so then f (a) ≥
f (b)
Note that when we are applying a function to both sides of an inequality, we must be
careful that the domain of x for which f (x) is monotonic at least covers the domain
of a and b. For example if a, b > 0 then if a < b then a² < b². However, if a, b ∈ R then there are values of a and b for which a < b but a² ≮ b². We can generalise this as follows
– For positive a and b then raising both sides of the inequality to a positive
power n preserves both strict and non-strict inequalities. That is if a < b and
a, b, n > 0 then an < bn .
– There is also a special case when n is a positive, odd integer where the inequality
is preserved for a, b ∈ R.
Operations switching the direction of the inequality There are several oper-
ations that switch the direction of the inequality
– If a > b and f (x) is strictly monotonically decreasing then f (a) < f (b)
– If a > b and f (x) is monotonically decreasing but not strictly so then f (a) ≤
f (b)
Again we must be careful that the domain of monotonicity is the same as the domain
of the expressions on either side of the inequality and for negative powers we have
– For positive a and b then raising both sides of the inequality to a negative power
−n switches both strict and non-strict inequalities. For example, if a < b and
a, b, n > 0 then a−n > b−n .
– There is also a special case where −n is a negative, odd integer where the
inequality is switched for a, b ∈ R.
In order to better understand these concepts in practice we can look at some examples.
Example 1: Raising both sides to a positive power. Say a = 2, b = 4, n = 2; then a = 2 < 4 = b and a² = 4 < 16 = b², so the strict inequality is preserved. However, what if we move a and b out of the range where aⁿ and bⁿ are strictly monotonically increasing functions, for example a = −2, b = −4, n = 2? Then b = −4 < −2 = a but b² = 16 > 4 = a², so the direction of the inequality is not preserved.
Example 3: Logarithmic function. Logarithms, e.g. log(x) are strictly monotonically in-
creasing for positive values of x. The stipulation that values must be greater than 0 is
not necessarily a limitation when we are working with financial data as there are many
quantities that will not go below zero, for example asset prices, exchange rates, etc. So if
x ≥ y then log(x) ≥ log(y).
Example 4: Exponentials. Exponential functions e.g. ex are strictly monotonic for all
x ∈ R, so if x < y then ex < ey .
We have looked at solving equations f(x) = 0 for x; however, we can also solve equations involving inequalities, usually to produce a range of values.
For example, how would we solve x2 + 3x < −2 for x? First move everything to one
side of the inequality for
$$x^2 + 3x + 2 < 0. \qquad (4.3)$$
We can then factorise the left hand side using Eq. (1.9) to find the roots of the equation, i.e.
$$(x + 1)(x + 2) < 0. \qquad (4.4)$$
Having factorised the expression we know that x² + 3x + 2 crosses the x-axis at x = −2 and x = −1, so these provide the boundary between the range of x values where the inequality holds and the range where it does not hold. However, we also need to see which side of
these values the range of x which solves the inequality is, there are several ways of doing
this:
• By inspection: we can see that in order for (x + 1)(x + 2) to be negative then one (but not both) of (x + 1) and (x + 2) must be negative. The only values of x which meet this condition are between −2 and −1 and therefore x ∈ (−2, −1) is the solution to the inequality. Note that the solution is an open rather than a closed set. This is because the inequality is strict; if the inequality were non-strict then the solution would be a closed set.
• For more complicated expressions you can differentiate and find the direction of the function (i.e. positive or negative gradient) at the roots, which will tell you whether the function is going from positive to negative or vice versa.
• For very complicated expressions it can be useful to sketch the curve to understand
the values of x where you are above and below the x axis.
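The solution x ∈ (−2, −1) can be sanity-checked numerically; a minimal Python sketch (the sample points are arbitrary):

```python
# Check x^2 + 3x + 2 < 0 at points inside and outside (-2, -1).
def holds(x):
    return x**2 + 3*x + 2 < 0

print(holds(-1.5))               # True: inside the interval
print(holds(-2.5), holds(-0.5))  # False, False: outside
print(holds(-2.0), holds(-1.0))  # False at the roots (strict inequality)
```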
We saw Landau notation, also known as "big O" notation, in Chapter 2, Section 2.2.3, with reference to the error seen in truncated Taylor series.
More formally, it is a way of describing the behaviour of a function f (x) as x approaches
some value. Very often we are concerned with the behaviour of the function as x → ∞
but also, as we saw for the work on Taylor series, we may wish to examine the behaviour
as x → 0. In fact we can look at the behaviour at the limit x → a, where a can be any
value but 0 and ∞ are the most commonly used.
So, for a real or complex valued function f(x) and a real positive valued function g(x), we say
$$f(x) = O\big(g(x)\big) \quad \text{as } x \to a \qquad (4.5)$$
if there exists a positive constant c such that
$$|f(x)| \le c\,g(x) \qquad (4.6)$$
for x sufficiently close to a.
Example 1: Recall the Maclaurin series for cosine,
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}. \qquad (4.7)$$
If we truncate this to
$$\cos x \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!}, \qquad (4.8)$$
we can see this has an error O(x⁶) as x → 0 because, as x becomes very small, the first truncated term (x⁶/6!) will become the most significant term in the error.
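A quick numerical check of this error behaviour (in Python; the sample values of x are arbitrary small numbers):

```python
import math

# Two-term truncation of the cosine Maclaurin series.
def approx(x):
    return 1 - x**2/2 + x**4/24

# As x -> 0 the truncation error should be dominated by x^6/6!,
# so the ratio of the two should approach 1.
for x in (0.1, 0.05):
    err = abs(math.cos(x) - approx(x))
    print(err / (x**6 / math.factorial(6)))   # close to 1
```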
Example 2: How does the function
$$f(x) = 3x^3 + 4x^2 + 5x + 2 \qquad (4.9)$$
behave as x → ∞? We can see that for x ≥ 1, 3x³ + 4x² + 5x + 2 ≤ 14x³, therefore we can say that
$$f(x) = O(x^3) \quad \text{as } x \to \infty. \qquad (4.10)$$
We can notice that we do not need to state the constant c = 14 anywhere in the Landau notation; it is enough that a finite constant exists in order to use Landau notation.
Landau notation is also used to define the computational complexity of algorithms. If an algorithm with n input variables is defined as having a complexity of O(n²), this means that as the number of input variables increases, the computational time grows at most proportionally to the square of the number of input variables.
When we are measuring output errors or computational times and we wish to see how
these are bounded, it can be useful to make use of log graphs, where both axes have a log
scale and semi-log graphs where only one of the x or y axes have a log scale. These can
indicate trends more clearly than graphs with the usual linear scale. The disadvantage is that you cannot plot negative numbers on a log scale. However, if we are assessing the trend of the functions for the purpose of Landau notation then we can see from Eq. (4.6) that we would wish to plot |f(x)|, and so the only value that could not be plotted on a log axis is f(x) = 0 and/or x = 0, depending on the type of log plot used.
When we plot a power function of the form xn on a loglog graph this gives a straight
line with the gradient depending on the value of n, see Figure 4.1 for examples of both
positive and negative values of n.
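This straight-line behaviour can be verified numerically: fitting a straight line to log f against log x recovers the exponent n. A Python sketch (the sample grid and choice of exponents are arbitrary):

```python
import numpy as np

# On a log-log scale f(x) = x^n is a straight line of gradient n.
x = np.linspace(1.0, 100.0, 50)
for n in (2.0, 4.0, -1.0, -3.0):
    y = np.abs(x**n)
    slope = np.polyfit(np.log(x), np.log(y), 1)[0]
    print(round(slope, 6))   # recovers n in each case
```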
Figure 4.1: Log-log graphs showing power functions of the form xn where n is positive for
the left hand plot and negative for the right hand plot.
When we plot an exponential of the form abx , where a is a positive constant and b ∈ R,
on a graph with a linear x-axis and a logarithmic y-axis we obtain a straight line with the
gradient depending on the values of a and b, see Figure 4.2
When we plot a logarithmic function of the form log(axᵇ), where a is positive and b ∈ R, on a graph with a logarithmic x-axis and a linear y-axis, we obtain a straight line where a determines the intercept with the y-axis and b determines the gradient.
Figure 4.2: Semi-log graph (linear x-axis, logarithmic y-axis) showing exponential functions f(x) = eˣ and f(x) = e²ˣ.
Figure 4.3: Graph showing logarithmic functions of the form log axb .
4.3.2 Ordering functions by bound
Often we are interested in how functions behave relative to each other as x approaches a
limit, especially as x → ∞. If we have the following functions on at least a semi-infinite
domain D with bounds:
$$|f_1(x)| > c_1|f_2(x)| > c_2|f_3(x)| > c_3|f_4(x)| > c_4|f_5(x)| > c_5|f_6(x)| > c_6|f_7(x)| \quad \text{as } x \to \infty. \qquad (4.18)$$
Furthermore, if the functions have a finite bound over their entire domain D, then we can find positive constants κ₁, κ₂, ... so that
$$|f_1(x)| > \kappa_1|f_2(x)| > \kappa_2|f_3(x)| > \kappa_3|f_4(x)| > \kappa_4|f_5(x)| > \kappa_5|f_6(x)| > \kappa_6|f_7(x)| \quad \forall\, x \in D. \qquad (4.19)$$
However, we are not able to find positive constants a₁, a₂ so that |f₁(x)| < a₁|f₂(x)| or |f₅(x)| < a₂|f₆(x)|, for example.
Chapter 5
Integration
If g(x) = f 0 (x) then we can specify two different types of integrals, indefinite and definite.
The calculation of an indefinite integral means that you are calculating the antiderivative
of a function, i.e.
$$\int g(x)\,dx = f(x) + c. \qquad (5.1)$$
Notice that there are no limits on the integral sign and the presence of the constant c. The output of an indefinite integral is a function of x. In mathematical finance we often calculate definite integrals, i.e.
$$\int_a^b g(x)\,dx = \big[f(x)\big]_a^b = f(b) - f(a). \qquad (5.2)$$
Notice that we obtain a single value, the area between g(x) and the x-axis from x = a to x = b, rather than a function of x, and the constant c is no longer required. This is illustrated in Figure 5.1.
As might be expected looking at the definite integral in Eq. (5.2), swapping the limits on the integral changes the sign of the result, i.e.
$$\int_a^b g(x)\,dx = -\int_b^a g(x)\,dx. \qquad (5.3)$$
Note that although we have two integrals we can combine the arbitrary constants into a
single value c.
For definite integrals we must also take into account the limits on the integral when we
are combining functions. Where the limits are the same, it is straightforward to combine
the two integrals, e.g.
$$\int_a^b f'(x)\,dx + \int_a^b g'(x)\,dx = \int_a^b \big(f'(x) + g'(x)\big)\,dx.$$
Where the limits of the two integrals are not the same then we must take slightly more
care, for example with the limits a < b < c < d
$$\int_a^c f'(x)\,dx + \int_b^d g'(x)\,dx = \int_a^b f'(x)\,dx + \int_b^c \big(f'(x) + g'(x)\big)\,dx + \int_c^d g'(x)\,dx. \qquad (5.6)$$
Notice that we can only combine the integrals where the limits overlap. Where there is no overlap between the limits, for example for $\int_a^b f'(x)\,dx + \int_c^d g'(x)\,dx$ with a < b < c < d, we cannot combine the integrals at all.
5.1.1 Common equations
We can infer from Section 2.1.1 a similar set of common equations for integrals. For
polynomials of the form axⁿ,
$$\int ax^n\,dx = \frac{ax^{n+1}}{n+1} + c, \qquad (5.7)$$
$$\int_u^v ax^n\,dx = \left[\frac{ax^{n+1}}{n+1}\right]_u^v = \frac{a(v^{n+1} - u^{n+1})}{n+1}. \qquad (5.8)$$
For exponentials of the form ae^{kx},
$$\int ae^{kx}\,dx = \frac{ae^{kx}}{k} + c, \qquad (5.9)$$
$$\int_u^v ae^{kx}\,dx = \left[\frac{ae^{kx}}{k}\right]_u^v = \frac{a(e^{kv} - e^{ku})}{k}. \qquad (5.10)$$
We can also use some of the standard results for derivatives to obtain the integrals of
functions, e.g.
$$f(x) = \cos x, \quad \int_0^\theta f(x)\,dx = \big[\sin x\big]_0^\theta = \sin\theta - \sin 0 = \sin\theta, \qquad (5.11)$$
$$f(x) = \sin x, \quad \int_0^\theta f(x)\,dx = \big[-\cos x\big]_0^\theta = -\cos\theta + \cos 0 = 1 - \cos\theta. \qquad (5.12)$$
For tan x we can write
$$\tan x = \frac{\sin x}{\cos x} = \frac{-\frac{d}{dx}\cos x}{\cos x}. \qquad (5.13)$$
5.1.2 Integration by substitution
When integrals become more complex we cannot simply find the result by looking at common formulas for derivatives and we need to look at other techniques; one of these is integration by substitution. The idea is that we should transform an integral
$$\int_a^b g(x)\,dx, \qquad (5.15)$$
which is hard to solve, into
$$\int_c^d h(u)\,du, \qquad (5.16)$$
Example: Solve
$$\int_a^b \frac{1}{\sqrt{1-x^2}}\,dx.$$
Define
$$x = \sin u \;\Rightarrow\; \frac{dx}{du} = \cos u \;\Rightarrow\; dx = \cos u\,du, \qquad d = \arcsin b, \quad c = \arcsin a.$$
Therefore
$$\int_a^b \frac{1}{\sqrt{1-x^2}}\,dx = \int_{\arcsin a}^{\arcsin b} \frac{\cos u}{\sqrt{1-\sin^2 u}}\,du = \int_{\arcsin a}^{\arcsin b} \frac{\cos u}{\cos u}\,du = \big[u\big]_{\arcsin a}^{\arcsin b} = \arcsin b - \arcsin a. \qquad (5.19)$$
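The result in Eq. (5.19) is easy to check numerically; a midpoint-rule sketch in Python (limits and step count are arbitrary choices):

```python
import math

# Midpoint-rule approximation of the integral of 1/sqrt(1-x^2) over [a, b].
def integral(a, b, n=100000):
    h = (b - a) / n
    return sum(1/math.sqrt(1 - (a + (i + 0.5)*h)**2) for i in range(n)) * h

a, b = 0.1, 0.8
print(integral(a, b))                  # numerical value
print(math.asin(b) - math.asin(a))     # arcsin(b) - arcsin(a), the same
```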
5.1.3 Integration by parts
This method is useful when the integrand is a product of two functions; it can be derived from the product rule for differentiation:
Integrating and then rearranging, we obtain
$$\big[u(x)v(x)\big]_a^b = \int_a^b u(x)\frac{dv(x)}{dx}\,dx + \int_a^b v(x)\frac{du(x)}{dx}\,dx, \qquad (5.21)$$
$$\Rightarrow \int_a^b u(x)\frac{dv(x)}{dx}\,dx = \big[u(x)v(x)\big]_a^b - \int_a^b v(x)\frac{du(x)}{dx}\,dx. \qquad (5.22)$$
Example 1: Find the expected value of an exponential random variable with parameter λ, i.e. solve
$$E[X] = \int_0^\infty x\lambda e^{-\lambda x}\,dx. \qquad (5.23)$$
Choose
$$u(x) = x \;\Rightarrow\; \frac{du(x)}{dx} = 1, \qquad \frac{dv(x)}{dx} = \lambda e^{-\lambda x} \;\Rightarrow\; v(x) = -e^{-\lambda x}. \qquad (5.24)$$
Therefore,
$$\int_0^\infty x\lambda e^{-\lambda x}\,dx = \big[-xe^{-\lambda x}\big]_0^\infty - \int_0^\infty -e^{-\lambda x}\,dx = 0 - \left[\frac{e^{-\lambda x}}{\lambda}\right]_0^\infty = \frac{1}{\lambda}. \qquad (5.25)$$
Example 2: Prove the recurrence relation for the Gamma function, i.e. αΓ(α) = Γ(α + 1), using integration by parts. By the definition of the Gamma function in Section 1.4 we have
$$\alpha\Gamma(\alpha) = \alpha\int_0^\infty x^{\alpha-1}e^{-x}\,dx. \qquad (5.26)$$
Choose
$$u(x) = e^{-x} \;\Rightarrow\; \frac{du(x)}{dx} = -e^{-x}, \qquad \frac{dv(x)}{dx} = x^{\alpha-1} \;\Rightarrow\; v(x) = \frac{1}{\alpha}x^\alpha. \qquad (5.27)$$
So we have
$$\alpha\Gamma(\alpha) = \alpha\int_0^\infty x^{\alpha-1}e^{-x}\,dx = \alpha\left[\frac{1}{\alpha}x^\alpha e^{-x}\right]_0^\infty + \alpha\int_0^\infty \frac{1}{\alpha}x^\alpha e^{-x}\,dx = \int_0^\infty x^\alpha e^{-x}\,dx = \Gamma(\alpha+1). \qquad (5.28)$$
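The recurrence can be spot-checked with the standard library's gamma function (the chosen values of α are arbitrary):

```python
import math

# alpha * Gamma(alpha) should equal Gamma(alpha + 1) for any alpha > 0.
for alpha in (0.5, 1.7, 4.0):
    print(alpha * math.gamma(alpha), math.gamma(alpha + 1))  # equal pairs
```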
5.1.4 Partial fractions
Partial fractions allow us to rewrite a rational function such as
$$f(x) = \frac{Dx^3 + Ex^2 + Fx + G}{(x+A)(x+B)(x^2+C)} \qquad (5.29)$$
as a sum of simpler fractions
$$f(x) = \frac{a}{(x+A)} + \frac{b}{(x+B)} + \frac{cx+d}{(x^2+C)}, \qquad (5.30)$$
which are straightforward to integrate using the common integral equations and the linearity property.
Example: Integrate
$$f(x) = \frac{3x+5}{(x+1)^2(x+2)}. \qquad (5.31)$$
First find a, b and c such that
$$f(x) = \frac{a}{(x+1)} + \frac{b}{(x+1)^2} + \frac{c}{(x+2)}. \qquad (5.32)$$
We do this by recombining the partial fractions into a single fraction and equating terms. If we were to combine the terms in Eq. (5.32) then we would obtain
$$f(x) = \frac{a(x+1)(x+2) + b(x+2) + c(x+1)^2}{(x+1)^2(x+2)} = \frac{(a+c)x^2 + (3a+b+2c)x + (2a+2b+c)}{(x+1)^2(x+2)}. \qquad (5.33)$$
Equating terms we obtain three independent equations in the three unknowns, i.e.
$$a + c = 0, \quad 3a + b + 2c = 3, \quad 2a + 2b + c = 5, \qquad (5.34)$$
which are solved by a = 1, b = 2, c = −1, giving
$$f(x) = \frac{1}{(x+1)} + \frac{2}{(x+1)^2} - \frac{1}{(x+2)}. \qquad (5.35)$$
Chapter 6
Ordinary differential equations
Much of the work in mathematical finance and applied mathematics as a whole involves
differential equations where we need to solve an equation involving a function and its
derivative. When the function is of a single variable and the derivative is with respect to
that same variable, these are known as ordinary differential equations (ODEs). The order
of the ODE is determined by the highest order derivative present in the equation so first
order ODEs involve the function and a first order derivative of the function.
A separable first order ODE has the form
$$\frac{dy}{dx} = \frac{g(x)}{f(y)}. \qquad (6.1)$$
We arrange the equation so that all the x terms are on one side and all of the y terms are on the other side, i.e.
$$f(y)\,dy = g(x)\,dx. \qquad (6.2)$$
Notice that these are both indefinite integrals and therefore we must either include a constant in the solution or apply some boundary conditions in order to calculate an exact solution. It is for this reason that this type of problem is often referred to as an initial value problem (IVP) when the boundary conditions are functions of zero, e.g. f(0) = 1, f′(0) = 3, or a boundary value problem (BVP) when one or more of the boundary conditions are not functions of zero, e.g. f(2π) = 0.
Example: A simple first order ODE, relevant to mathematical finance, is the equation for a constant interest rate on a bank account, i.e.
$$\frac{dB_t}{dt} = rB_t. \qquad (6.4)$$
That is, the rate of change of a bank account, dB_t/dt, is equal to a constant interest rate r multiplied by the amount in the account B_t. We can collect all the B_t terms on one side of the equation and solve as follows:
$$\frac{dB_t}{B_t} = r\,dt, \qquad (6.5)$$
$$\int \frac{1}{B_t}\,dB_t = \int r\,dt, \qquad (6.6)$$
$$\log B_t + c_1 = rt + c_2, \qquad (6.7)$$
$$B_t = c_3 e^{rt}. \qquad (6.8)$$
Applying the condition at t = 0 gives
$$B_0 = c_3 e^{r\times 0} = c_3. \qquad (6.9)$$
Therefore we obtain an equation for the amount of money in a bank account at time t with an initial deposit of B₀ and a constant interest rate of r as
$$B_t = B_0 e^{rt}. \qquad (6.10)$$
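As a sketch, we can Euler-step Eq. (6.4) and compare the result against the exact solution in Eq. (6.10) (the rate, deposit, horizon and step count here are all arbitrary choices):

```python
import math

# Euler scheme for dB/dt = r*B, compared with B_t = B_0 * exp(r*t).
r, B0, T, n = 0.05, 100.0, 10.0, 100000
dt, B = T / n, B0
for _ in range(n):
    B += r * B * dt             # one Euler step
print(B)                        # numerical solution at t = T
print(B0 * math.exp(r * T))     # exact solution, very close to the above
```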
A linear first order ODE has the form
$$\frac{df(x)}{dx} + f(x)p(x) = g(x). \qquad (6.11)$$
We define F(x) so that
$$p(x)F(x) = F'(x). \qquad (6.13)$$
The first step in finding a solution to this is to solve Eq. (6.13) for F(x) using separation of variables, i.e.
$$\frac{dF(x)}{dx} = p(x)F(x), \qquad (6.16)$$
$$\int \frac{dF(x)}{F(x)} = \int p(x)\,dx, \qquad (6.17)$$
$$\log F(x) + c_1 = \int p(x)\,dx, \qquad (6.18)$$
$$F(x) = e^{c_2}e^{\int p(x)\,dx}. \qquad (6.19)$$
Example: Solve
$$\frac{df(x)}{dx} + 10f(x) = 15. \qquad (6.23)$$
This is a simple example as both p(x) and g(x) are constants; nonetheless it illustrates the method. First find F(x):
$$F(x) = e^{\int 10\,dx} = e^{10x}. \qquad (6.24)$$
Then
$$f(x) = \frac{\int 15e^{10x}\,dx + c_3}{e^{10x}} = \frac{\frac{15}{10}e^{10x} + c_3}{e^{10x}} = 1.5 + c_3e^{-10x}. \qquad (6.25)$$
In order to find a particular solution we need the boundary conditions. Say we have f(0) = 3; then we have
$$1.5 + c_3 = 3, \qquad (6.26)$$
so c₃ = 1.5.
A Bernoulli ODE has the form
$$\frac{df(x)}{dx} + p(x)f(x) = g(x)f(x)^\alpha, \qquad (6.28)$$
where α ∈ R. The first step to finding a solution for this class of first order ODEs is to rearrange this as
$$\frac{df(x)}{dx} = g(x)f(x)^\alpha - p(x)f(x). \qquad (6.29)$$
We also define u(x) = f(x)^{1−α} so that by the chain rule
$$\frac{du(x)}{dx} = (1-\alpha)f(x)^{-\alpha}\frac{df(x)}{dx}. \qquad (6.30)$$
Substituting the expression on the right hand side of Eq. (6.29) for df(x)/dx into Eq. (6.30), we obtain
$$\frac{du(x)}{dx} = (1-\alpha)f(x)^{-\alpha}\big(g(x)f(x)^\alpha - p(x)f(x)\big) = (1-\alpha)\big[g(x) - p(x)f(x)^{1-\alpha}\big]. \qquad (6.31)$$
We have u(x) = f(x)^{1−α} and substituting this into Eq. (6.31) gives
$$\frac{du(x)}{dx} = (1-\alpha)g(x) - (1-\alpha)p(x)u(x), \qquad (6.32)$$
$$\Rightarrow \frac{du(x)}{dx} + (1-\alpha)p(x)u(x) = (1-\alpha)g(x). \qquad (6.33)$$
This is in the form of a linear ODE and can be solved using the integrating factor method described in Section 6.2.
Example: Solve
$$\frac{df(x)}{dx} - 2f(x) = xf(x)^4. \qquad (6.34)$$
Here α = 4, so define u(x) = f(x)^{−3}; then
$$\frac{du(x)}{dx} = -3f(x)^{-4}\frac{df(x)}{dx} = -3f(x)^{-4}\big(2f(x) + xf(x)^4\big) = -3\big(2f(x)^{-3} + x\big) = -3\big(2u(x) + x\big), \qquad (6.35)$$
$$\Rightarrow \frac{du(x)}{dx} + 6u(x) = -3x. \qquad (6.36)$$
Solving this linear ODE with the integrating factor e^{6x} gives
$$f(x)^{-3} = u(x) = -\frac{1}{2}\left(x - \frac{1}{6}\right) + ce^{-6x}, \qquad (6.38)$$
and if we have a boundary condition we can apply it now to find the value of c. Say f(0) = 2; then f(0)^{−3} = 1/8 and
$$\frac{1}{8} = \frac{1}{12} + c \;\Rightarrow\; c = \frac{1}{8} - \frac{1}{12} = \frac{1}{24}. \qquad (6.39)$$
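A finite-difference sketch in Python confirming that this u(x), with c = 1/24, does satisfy the linear ODE u′ + 6u = −3x (the step size h and sample points are arbitrary choices):

```python
import math

c = 1 / 24
def u(x):
    return -(x - 1/6)/2 + c*math.exp(-6*x)

def du(x, h=1e-6):
    """Central finite-difference estimate of u'(x)."""
    return (u(x + h) - u(x - h)) / (2*h)

# u' + 6u should equal -3x at every point; also u(0) = 1/8 = f(0)^-3.
for x in (0.0, 0.5, 1.0):
    print(du(x) + 6*u(x), -3*x)   # left and right sides agree
print(u(0.0))                     # 0.125
```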
Chapter 7
Multivariate calculus
So far we have only looked at rules of calculus for functions in a single variable. We now
move on to look at differentiating and integrating functions in more than one variable.
7.1 Notation
For a function f(x, y), its first order partial derivatives are written
$$\frac{\partial f(x,y)}{\partial x} = \frac{\partial f}{\partial x} = f_x. \qquad (7.1)$$
For higher orders we write
$$\frac{\partial^2 f}{\partial x^2} = f_{xx} \quad \text{for second order derivatives in the same variable, and} \qquad (7.2)$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = f_{yx} \quad \text{for mixed second order derivatives in different variables.} \qquad (7.3)$$
Notice the order of the variables in the notation for mixed second order derivatives.
Even if a function is continuous in each argument separately, this is not a sufficient condition for multivariate continuity. Example: the expression x²y/(x⁴ + y²) gives a different limit at (0, 0) depending on whether the origin is approached via a straight line y = kx or a parabola y = x², i.e.
$$\left.\frac{x^2y}{x^4+y^2}\right|_{y=kx} = \frac{kx^3}{x^4+k^2x^2} = \frac{kx}{x^2+k^2} \to 0 \quad \text{as } x \to 0, \qquad (7.4)$$
$$\left.\frac{x^2y}{x^4+y^2}\right|_{y=x^2} = \frac{x^4}{x^4+x^4} = \frac{1}{2}. \qquad (7.5)$$
If a function is described as continuous it means that these conditions are true at all
points in its domain:
The product rule stays the same: for u(x, y), v(x, y) and w = uv,
$$\frac{\partial w}{\partial x} = u(x,y)\frac{\partial v(x,y)}{\partial x} + v(x,y)\frac{\partial u(x,y)}{\partial x}. \qquad (7.6)$$
Chain rule changes for multivariate functions; there are three variants depending on the
form of the equations:
2. The outer function is of several variables and the inner one is a function of a single
variable
In financial applications you often see functions of the form f(t, u(t), v(t)), in which case
$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial t}. \qquad (7.9)$$
It may seem confusing having df/dt on the left hand side and ∂f/∂t on the right hand side, but this can be easily understood if we define a function s(t) = t and then ∂f/∂s · ∂s/∂t = ∂f/∂s = ∂f/∂t.
3. Both functions are of several variables, i.e. f(u₁(x₁, ..., x_m), ..., u_n(x₁, ..., x_m)); then
$$\frac{\partial f}{\partial x_i} = \frac{\partial f}{\partial u_1}\frac{\partial u_1}{\partial x_i} + \cdots + \frac{\partial f}{\partial u_n}\frac{\partial u_n}{\partial x_i}, \quad i = 1, 2, ..., m. \qquad (7.10)$$
7.3.3 Clairaut’s theorem (simplified)
If the two mixed second order partial derivatives are continuous at (x₀, y₀), then f_{xy}(x₀, y₀) = f_{yx}(x₀, y₀).
We look at the Taylor series in two variables. The factorial scaling of the terms from the single variable Taylor series is multiplied by the binomial coefficient, i.e.
$$\binom{n}{i} = \frac{n!}{i!(n-i)!}. \qquad (7.11)$$
The binomial coefficient gives the number of ways that the partial derivative can be obtained (assuming the conditions for applying Clairaut's theorem are met). For example, there are three ways that you can obtain the mixed third order term f_{xxy}, because f_{xxy} = f_{xyx} = f_{yxx}.
$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy. \qquad (7.13)$$
Only the first two terms remain, as Δxⁿ and Δyⁿ for n ≥ 2 approach zero more quickly than Δx and Δy. This leads to Euler's method for the approximation of a function, i.e.
$$\Delta f = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y. \qquad (7.14)$$
However, for a function f(t, W_t), where W_t is a Wiener process, this changes slightly. Starting out with the Taylor series we have terms in f_t, f_w and f_{ww}, where f_w and f_{ww} are the first and second order derivatives with respect to W_t. Taking the limit as Δt, ΔW_t → 0, we obtain
$$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial W_t}dW_t + \frac{1}{2}\frac{\partial^2 f}{\partial W_t^2}dW_t^2. \qquad (7.16)$$
Notice the extra term ½(∂²f/∂W_t²)dW_t², which appears because, although Δt² and ΔtΔW_t approach zero more quickly than Δt, ΔW_t² approaches zero at the same rate as Δt. In fact, from Itô, at the limit Δt → 0 we have dW_t² = dt. This gives Itô's lemma which, in simplistic terms, states that the differential of any twice continuously differentiable function can be written
$$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial W_t}dW_t + \frac{1}{2}\frac{\partial^2 f}{\partial W_t^2}dt. \qquad (7.17)$$
For a function of two Wiener processes X_t and Y_t, second order terms must be included for both processes, plus a mixed second order term. As before, dX_t² = dt and dY_t² = dt, and from Itô, dX_t dY_t = ρ dt, where ρ is the correlation between the processes.
where X defines the range of integration over x and Y defines the range of integration
over y. Notice that the integral is written in a nested way, i.e. the first integral over X
corresponds to the final dx term.
7.5.1 Fubini’s theorem
One of the most useful theorems when we are integrating over two variables is Fubini’s
theorem. Simply, this states that
$$\int_X\left[\int_Y f(x,y)\,dy\right]dx = \int_Y\left[\int_X f(x,y)\,dx\right]dy, \qquad (7.21)$$
There are several different situations that we may come across when we are differentiating an integral. The most basic is where the variable only appears in the range of a definite integral, i.e.
$$\frac{d}{dx}\int_c^x f(z)\,dz = f(x). \qquad (7.23)$$
The sketch proof of this is as follows: say f(x) = dg(x)/dx, then
$$\frac{d}{dx}\int_c^x f(z)\,dz = \frac{d}{dx}\big[g(x) - g(c)\big] = \frac{dg(x)}{dx} = f(x). \qquad (7.24)$$
The second situation we examine is the differentiation of the integral of a function of two
variables, i.e.
$$u(x) = \int_a^b f(x,t)\,dt. \qquad (7.25)$$
From the definition of the derivative,
$$\frac{du}{dx} = \lim_{\Delta x \to 0}\frac{u(x+\Delta x) - u(x)}{\Delta x} = \lim_{\Delta x \to 0}\frac{\int_a^b f(x+\Delta x,t)\,dt - \int_a^b f(x,t)\,dt}{\Delta x} = \lim_{\Delta x \to 0}\int_a^b \frac{f(x+\Delta x,t) - f(x,t)}{\Delta x}\,dt. \qquad (7.26)$$
Provided the limit can be passed across the integral (dominated convergence theorem), then
$$\frac{du}{dx} = \int_a^b \lim_{\Delta x \to 0}\frac{f(x+\Delta x,t) - f(x,t)}{\Delta x}\,dt = \int_a^b \frac{\partial f(x,t)}{\partial x}\,dt. \qquad (7.27)$$
The final situation we look at is when the parameter we are differentiating over appears
in the limits and in the integrand e.g.
$$g(t, a(t), b(t)) = \int_{a(t)}^{b(t)} f(z,t)\,dz \qquad (7.28)$$
differentiated over t. To do this we use Leibniz’s integral rule which makes use of the
second type of chain rule described above, i.e.
$$\frac{dg}{dt} = \frac{\partial g}{\partial t} + \frac{\partial g}{\partial a}\frac{\partial a}{\partial t} + \frac{\partial g}{\partial b}\frac{\partial b}{\partial t}. \qquad (7.29)$$
Making use of the rules, described above, for differentiation across the integral and differ-
entiation of an integral with limits which are a function of the variable we obtain
$$\frac{dg}{dt} = \int_{a(t)}^{b(t)} \frac{\partial f(z,t)}{\partial t}\,dz + f(b(t),t)\frac{\partial b}{\partial t} - f(a(t),t)\frac{\partial a}{\partial t}. \qquad (7.30)$$
Here
$$J = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[6pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v}$$
and is named the Jacobian determinant.
Example: Solve
$$\int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy. \qquad (7.32)$$
Define x = r cos θ and y = r sin θ. The new domains for r and θ are therefore (0, ∞) and (0, π/2). Then we can also calculate the Jacobian determinant using
$$\frac{\partial x}{\partial r} = \cos\theta, \quad \frac{\partial y}{\partial r} = \sin\theta, \quad \frac{\partial x}{\partial\theta} = -r\sin\theta, \quad \frac{\partial y}{\partial\theta} = r\cos\theta, \qquad (7.33)$$
so we obtain
$$\begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial\theta} \\[6pt] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial\theta} \end{vmatrix} = \cos\theta\cdot r\cos\theta - \sin\theta\cdot(-r\sin\theta) = r\cos^2\theta + r\sin^2\theta = r. \qquad (7.34)$$
Substituting for x and y in the original integral thus gives
$$\int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy = \int_0^{\pi/2}\!\!\int_0^\infty e^{-r^2(\cos^2\theta+\sin^2\theta)}\,r\,dr\,d\theta = \int_0^{\pi/2}\!\!\int_0^\infty re^{-r^2}\,dr\,d\theta = \int_0^{\pi/2}\left[\frac{-e^{-r^2}}{2}\right]_0^\infty d\theta = \int_0^{\pi/2}\frac{1}{2}\,d\theta = \frac{\pi}{4}. \qquad (7.35)$$
Chapter 8
Complex numbers
8.1 Introduction
$$i = \sqrt{-1}. \qquad (8.1)$$
to give a real part of su − tv and an imaginary part of i(ut + sv). The conjugate of a complex number z = u + iv is defined as $\bar{z} = u - iv$ (sometimes z* is used). An illustration of this on the complex plane is shown in Figure 8.2.
$$z\bar{z} = u^2 + v^2. \qquad (8.5)$$
Dividing by a complex number z = u + iv then amounts to multiplying through by the conjugate:
$$\frac{y}{z} = \frac{s+it}{u+iv} = \frac{(u-iv)(s+it)}{u^2+v^2} = \frac{us+vt}{u^2+v^2} + i\,\frac{ut-vs}{u^2+v^2}. \qquad (8.6)$$
Up to now we have been using Cartesian notation. The calculation of products and
quotients is easier if we introduce the polar notation. Instead of defining the real and
imaginary parts, we have a magnitude and an angle in radians (argument) relative to the
real axis.
z = u + iv = reiθ . (8.7)
It is clear from Figure 8.3 that we can use simple trigonometry to obtain the relationship between the parameters of the two forms, i.e.
$$\bar{z} = u - iv = re^{-i\theta}, \qquad (8.13)$$
and this can be easily observed by looking at the complex plane in Figure 8.4. It can be observed from the complex plane that re^{iθ} = re^{iθ+i2kπ}, where k is any integer. Furthermore, −re^{iθ} = re^{iθ+i(2k+1)π}, where k is any integer.
$$e^{i\pi} + 1 = 0. \qquad (8.15)$$
Combining Euler’s formula in Eq. (8.14) for θ and −θ, we can see how the complex
exponential representations of trigonometric functions are arrived at, for example:
$$\cos\theta = \frac{e^{i\theta}+e^{-i\theta}}{2}, \qquad (8.16)$$
$$\sin\theta = \frac{e^{i\theta}-e^{-i\theta}}{2i}. \qquad (8.17)$$
These can then be used to show how the results for the differentiation of trigonometric functions are obtained.
We can also see how some of the common identities listed in Section 1.3 are obtained.
Example 1:
$$\sin(\theta+\alpha) = \frac{e^{i(\theta+\alpha)} - e^{-i(\theta+\alpha)}}{2i}. \qquad (8.20)$$
$$\sin\theta\cos\alpha = \frac{(e^{i\theta}-e^{-i\theta})(e^{i\alpha}+e^{-i\alpha})}{4i} = \frac{e^{i(\theta+\alpha)} - e^{-i(\theta+\alpha)} - e^{-i(\theta-\alpha)} + e^{i(\theta-\alpha)}}{4i}. \qquad (8.21)$$
$$\cos\theta\sin\alpha = \frac{(e^{i\theta}+e^{-i\theta})(e^{i\alpha}-e^{-i\alpha})}{4i} = \frac{e^{i(\theta+\alpha)} - e^{-i(\theta+\alpha)} + e^{-i(\theta-\alpha)} - e^{i(\theta-\alpha)}}{4i}. \qquad (8.22)$$
Adding Eqs. (8.21) and (8.22) gives
$$\sin(\theta+\alpha) = \sin\theta\cos\alpha + \cos\theta\sin\alpha. \qquad (8.23)$$
Example 2:
$$\cos^2\theta = \left(\frac{e^{i\theta}+e^{-i\theta}}{2}\right)^2 = \frac{e^{i2\theta} + 2e^{i\theta}e^{-i\theta} + e^{-i2\theta}}{4} = \frac{1}{2}(\cos 2\theta + 1). \qquad (8.24)$$
Similarly we can show that sin²θ = ½(1 − cos 2θ). Adding the identities together gives us sin²θ + cos²θ = 1, which we used when carrying out integration by substitution in the example in Section 5.1.2.
Using polar notation also makes it very easy to find the powers of complex numbers. However, unlike for real numbers, roots are multivalued functions, as re^{iθ} = re^{iθ+i2kπ} for integer k. So, for example, w = re^{iπ/4} = re^{iπ/4+i2π}, but taking the square root gives different results depending on the value of k.
In many applications we only deal with logarithms of positive numbers. However, using
the polar representation for complex numbers above we can also produce logarithms of
complex and negative numbers. For a real number
$$y = \log x \iff x = e^y. \qquad (8.30)$$
We have seen that all complex and negative numbers can be written as a scalar multiplied
by an exponential of a complex number and we can use this to find the logarithm of
negative and complex numbers.
Negative numbers This is the simplest example and makes use of Euler's identity, defined above, which we can rearrange as e^{iπ} = −1. For a negative number −A, where A is positive, we can write −A = Ae^{iπ} and thus, with a = log A,
$$\log(-A) = \log(Ae^{i\pi}) = a + i\pi.$$
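This result can be checked with Python's `cmath` module, whose `log` returns the principal value:

```python
import cmath, math

# The principal logarithm of a negative number -A is log(A) + i*pi,
# matching log(-A) = log(A e^{i*pi}).
A = 5.0
print(cmath.log(-A))                    # (log 5) + i*pi
print(complex(math.log(A), math.pi))    # the same value
```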
Complex numbers We can write the polar representation of a complex number z as
Complex analysis is a huge subject, so this section is intended to give you a little familiarity with the subject rather than provide a comprehensive guide. For more information, Kreyszig's book provides a detailed introduction.
i.e. f(z) = u(x, y) + iv(x, y): the sum of a pair of functions, u(x, y) and v(x, y), which give the
real and imaginary parts of f in terms of the real and imaginary parts of z. Therefore, differentiating a complex function is linked to
the work we did on partial derivatives in the previous lecture.
Conversely, if you let ∆x → 0 first, you are left with
As we can equate the real and imaginary parts, we have as a condition of analyticity that
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} (8.39)
\frac{\partial u}{\partial r} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial r}
= \frac{\partial u}{\partial x}\cos\theta + \frac{\partial u}{\partial y}\sin\theta
= \frac{\partial v}{\partial y}\cos\theta - \frac{\partial v}{\partial x}\sin\theta
= \frac{1}{r}\left(\frac{\partial v}{\partial y}\,r\cos\theta - \frac{\partial v}{\partial x}\,r\sin\theta\right)
= \frac{1}{r}\left(\frac{\partial v}{\partial y}\frac{\partial y}{\partial \theta} + \frac{\partial v}{\partial x}\frac{\partial x}{\partial \theta}\right) = \frac{1}{r}\frac{\partial v}{\partial \theta} (8.40)
where the third line uses the substitution of the Cauchy-Riemann equations. A similar
method can be used to show that
\frac{\partial v}{\partial r} = -\frac{1}{r}\frac{\partial u}{\partial \theta}. (8.41)
Similarly to the section on differentiation, this is not intended to provide you with a
comprehensive guide, but rather is an introduction to some of the concepts and notation
you may see. Complex integration is carried out along a path through the complex plane and is written
\int_C f(z)\,dz for an open path and \oint_C f(z)\,dz for a closed path. In general the value of the
integral depends on the path through the complex plane; however, there are exceptions to
this.
If the function is analytic within a simply connected domain D containing points a and b,
which are joined by a path C lying completely within D, and we have f(z) = F'(z), then
\int_C f(z)\,dz = \int_a^b f(z)\,dz = F(b) - F(a) (8.42)
and the integral gives the same value for any path in D between a and b.
This implies, for a simple closed path within a simply connected domain, that if the first
part of the path runs from an arbitrary point a to b, then the rest of the curve runs from
b back to a, and
\oint_C f(z)\,dz = \int_a^b f(z)\,dz + \int_b^a f(z)\,dz = \int_a^b f(z)\,dz - \int_a^b f(z)\,dz = 0. (8.43)
Consider, for example, the integral
\oint_C \frac{dz}{z}. (8.44)
We define the path z = \cos t + i\sin t = e^{it} over the range t \in [0, 2\pi]. Then \frac{dz}{dt} = ie^{it} and we
substitute dz, z and the limits to give
\int_0^{2\pi} \frac{ie^{it}\,dt}{e^{it}} = i\int_0^{2\pi} dt = 2\pi i. (8.45)
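The result in Eq. (8.45) can be approximated numerically. This sketch parametrises the unit circle as in the text and evaluates the contour integral with a simple midpoint Riemann sum (the step count is an arbitrary choice).

```python
import cmath

def contour_integral(f, n=100000):
    # Integrate f around the unit circle z = e^{it}, t in [0, 2*pi],
    # using dz = i e^{it} dt and a midpoint Riemann sum.
    total = 0.0
    dt = 2 * cmath.pi / n
    for k in range(n):
        t = (k + 0.5) * dt
        z = cmath.exp(1j * t)
        total += f(z) * 1j * z * dt
    return total

result = contour_integral(lambda z: 1 / z)
assert abs(result - 2j * cmath.pi) < 1e-6  # the closed integral of dz/z is 2*pi*i
```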
Chapter 9
af'' + bf' + cf = 0 (9.1)
where f'' = \frac{d^2 f}{dx^2} and f' = \frac{df}{dx} as before.
We can solve this using an ansatz (guess) of f = e^{\lambda x}; then f' = \lambda e^{\lambda x} and f'' = \lambda^2 e^{\lambda x}.
Then Eq. (9.1) can be rewritten as
e^{\lambda x}\left(a\lambda^2 + b\lambda + c\right) = 0.
This means that we must have a\lambda^2 + b\lambda + c = 0, which we can solve using Eq. (1.9) above.
There are three possibilities, real roots, complex roots or equal roots, which we illustrate
with examples.
Example 1. Real roots: Solve
f'' + 5f' + 4f = 0 (9.3)
with initial conditions f(0) = 5, f'(0) = 1 (note that now we have two possible solutions
we need two initial conditions). Using the formula for the roots of a quadratic equation
in Eq. (1.9) we have
\lambda = \frac{-5 \pm \sqrt{25 - 16}}{2}. (9.4)
Therefore, the two possible roots are \lambda = -4, \lambda = -1, which gives us two possible solutions
that we call f_1 and f_2. Therefore, by linearity of differentiation, the general solution
is
f(x) = c_1 e^{-x} + c_2 e^{-4x}.
Using the initial conditions gives us c_1 + c_2 = 5 for f(0) and f'(0) = -c_1 - 4c_2 = 1, which
gives c_2 = -2 and c_1 = 7 for a solution of
f(x) = 7e^{-x} - 2e^{-4x}.
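The solution of Example 1 can be verified numerically; this sketch checks the two initial conditions and that f'' + 5f' + 4f vanishes at a few sample points.

```python
import math

def f(x):
    # candidate solution: f(x) = 7 e^{-x} - 2 e^{-4x}
    return 7 * math.exp(-x) - 2 * math.exp(-4 * x)

def fp(x):   # first derivative f'
    return -7 * math.exp(-x) + 8 * math.exp(-4 * x)

def fpp(x):  # second derivative f''
    return 7 * math.exp(-x) - 32 * math.exp(-4 * x)

assert abs(f(0) - 5) < 1e-12 and abs(fp(0) - 1) < 1e-12  # initial conditions
for x in [0.0, 0.5, 1.0, 2.0]:
    assert abs(fpp(x) + 5 * fp(x) + 4 * f(x)) < 1e-12    # f'' + 5f' + 4f = 0
```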
Example 2. Complex roots: Solve
f'' + 2f' + 4f = 0 (9.7)
with initial conditions f(0) = 1, f'(0) = 1. Solving for \lambda using Eq. (1.9) gives \lambda = -1 \pm \sqrt{3}i,
so we have a general solution of
f(x) = e^{-x}\left(c_1 e^{i\sqrt{3}x} + c_2 e^{-i\sqrt{3}x}\right). (9.8)
If we redefine our constants as c_1 = \frac{c_a}{2} + \frac{c_b}{2i} and c_2 = \frac{c_a}{2} - \frac{c_b}{2i}, then using the exponential
formulae for sin and cos we can rewrite this as
f(x) = e^{-x}\left(c_a\cos(\sqrt{3}x) + c_b\sin(\sqrt{3}x)\right). (9.9)
Applying the first initial condition gives
f(0) = e^{-0}(c_a \cdot 1 + c_b \cdot 0) = c_a = 1. (9.10)
For the second initial condition we differentiate:
f'(0) = \left[-e^{-x}\left(c_a\cos(\sqrt{3}x) + c_b\sin(\sqrt{3}x)\right) + e^{-x}\left(-c_a\sqrt{3}\sin(\sqrt{3}x) + c_b\sqrt{3}\cos(\sqrt{3}x)\right)\right]_{x=0}
= -c_a + \sqrt{3}c_b = 1, (9.11)
which gives c_b = 2/\sqrt{3} for a general solution of
f(x) = e^{-x}\left(\cos(\sqrt{3}x) + \frac{2}{\sqrt{3}}\sin(\sqrt{3}x)\right). (9.12)
Example 3. Single roots: When b^2 = 4ac we only obtain one solution to Eq. (1.9) above.
In this case, to obtain our two solutions, we define one as f_1(x) = c_1 e^{\lambda x}, where \lambda is the
only calculated root, i.e. \lambda = -b/2a, and we define another solution as f_2(x) = v(x)e^{-bx/2a},
where v(x) is to be determined. By the chain and product rules we get the first and second derivatives of
f_2(x):
f_2'(x) = v'(x)e^{-bx/2a} - \frac{b}{2a}v(x)e^{-bx/2a}, (9.13)
f_2''(x) = v''(x)e^{-bx/2a} - \frac{b}{2a}v'(x)e^{-bx/2a} - \frac{b}{2a}v'(x)e^{-bx/2a} + \left(\frac{b}{2a}\right)^2 v(x)e^{-bx/2a}
= v''(x)e^{-bx/2a} - \frac{b}{a}v'(x)e^{-bx/2a} + \left(\frac{b}{2a}\right)^2 v(x)e^{-bx/2a}. (9.14)
Putting this into our original ODE, factoring out the exponential term and rearranging
gives
a\left(v''(x) - \frac{b}{a}v'(x) + \left(\frac{b}{2a}\right)^2 v(x)\right) + b\left(v'(x) - \frac{b}{2a}v(x)\right) + cv(x) = 0 (9.15)
av''(x) - \frac{1}{4a}\left(b^2 - 4ac\right)v(x) = 0. (9.16)
As b^2 = 4ac here, the second term vanishes and we are left with v''(x) = 0, so v(x) is linear in x and we can take f_2(x) = c_2 x e^{-bx/2a}.
Example 4. Solve
f'' + 4f' + 4f = 0 (9.20)
with initial conditions f(0) = 1, f'(0) = 3. This gives a single root of \lambda = -2 and so a
general solution of
f(x) = (c_1 + c_2 x)e^{-2x}.
For the first boundary condition f(0) = 1 we can set x = 0 and obtain c_1 = 1 by inspection.
For the second boundary condition of f'(0) = 3 we differentiate our general solution for
f'(x) = c_2 e^{-2x} - 2(c_1 + c_2 x)e^{-2x}, so f'(0) = c_2 - 2c_1 = 3, giving c_2 = 5.
For Euler equations of the form ax^2 f'' + bxf' + cf = 0 (Eq. (9.25)) we use the substitution t = \log x and the chain rule for the first and second order derivatives:
\frac{df}{dx} = \frac{df}{dt}\frac{dt}{dx} = \frac{1}{x}\frac{df}{dt} (9.26)
\frac{d^2 f}{dx^2} = \frac{df}{dt}\frac{d^2 t}{dx^2} + \frac{d^2 f}{dt^2}\left(\frac{dt}{dx}\right)^2 = -\frac{1}{x^2}\frac{df}{dt} + \frac{1}{x^2}\frac{d^2 f}{dt^2} = \frac{1}{x^2}\left(\frac{d^2 f}{dt^2} - \frac{df}{dt}\right) (9.27)
Using Newtonian (dot) notation for the derivative with respect to t, i.e. \frac{df}{dt} = \dot{f} and \frac{d^2 f}{dt^2} = \ddot{f},
we can rewrite Eq. (9.25) as
a\ddot{f} + (b - a)\dot{f} + cf = 0.
Defining b_0 = b - a, we can then solve in the same way as for the homogeneous equations
in Section 9.1 above to give f as a function of t and then use the substitution t = \log x to
obtain the required answer.
Example 5. Real roots
Solve the following for boundary conditions f(1) = 0 and f'(1) = 1.
The roots are -3 and \frac{5}{2}, giving a general solution for f(t) of
f(t) = c_1 e^{-3t} + c_2 e^{\frac{5}{2}t}. (9.31)
f(x) = c_1 e^{-3\log x} + c_2 e^{\frac{5}{2}\log x} = c_1 x^{-3} + c_2 x^{5/2}. (9.32)
Applying the boundary condition f(1) = 0 gives c_1 + c_2 = 0. And for f'(1) = 1 we have
f'(1) = \left[-3c_1 x^{-4} + \frac{5}{2}c_2 x^{3/2}\right]_{x=1} = -3c_1 + \frac{5}{2}c_2 = \left(3 + \frac{5}{2}\right)c_2 = 1. (9.33)
Thus c_2 = \frac{2}{11} and c_1 = -\frac{2}{11}, giving the solution
f(x) = -\frac{2}{11}x^{-3} + \frac{2}{11}x^{5/2}. (9.34)
Example 6. Complex roots: Solve the following for boundary conditions f(1) = 1 and f'(1) = 2.
x^2 f'' + 3xf' + 4f = 0. (9.35)
The roots are -1 \pm i\sqrt{3}, giving the general solution in terms of t as
f(t) = e^{-t}\left(c_1\cos(\sqrt{3}t) + c_2\sin(\sqrt{3}t)\right). (9.37)
f(x) = c_1 x^{-1}\cos(\sqrt{3}\log x) + c_2 x^{-1}\sin(\sqrt{3}\log x). (9.38)
f(1) = \left[c_1 x^{-1}\cos(\sqrt{3}\log x) + c_2 x^{-1}\sin(\sqrt{3}\log x)\right]_{x=1}
= c_1\cos(\sqrt{3}\cdot 0) + c_2\sin(\sqrt{3}\cdot 0) = c_1, (9.39)
so c_1 = 1. For the second boundary condition f'(1) = 2 we have
f'(1) = \left[-\frac{c_1}{x^2}\left(\cos(\sqrt{3}\log x) + \sqrt{3}\sin(\sqrt{3}\log x)\right) + \frac{c_2}{x^2}\left(\sqrt{3}\cos(\sqrt{3}\log x) - \sin(\sqrt{3}\log x)\right)\right]_{x=1}
= -c_1 + \sqrt{3}c_2 (9.40)
and thus substituting in c_1 = 1 we have \sqrt{3}c_2 - 1 = 2 and so c_2 = \sqrt{3}, giving our solution
f(x) = x^{-1}\cos(\sqrt{3}\log x) + \sqrt{3}x^{-1}\sin(\sqrt{3}\log x). (9.41)
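Example 6 can be checked with finite differences (the step sizes h are arbitrary choices); along the solution the residual of x²f'' + 3xf' + 4f should be close to zero, and the boundary conditions should hold.

```python
import math

def f(x):
    # candidate solution (9.41)
    s3 = math.sqrt(3)
    return (math.cos(s3 * math.log(x)) + s3 * math.sin(s3 * math.log(x))) / x

def deriv(g, x, h=1e-5):
    return (g(x + h) - g(x - h)) / (2 * h)

def deriv2(g, x, h=1e-4):
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2

assert abs(f(1) - 1) < 1e-9          # f(1) = 1
assert abs(deriv(f, 1) - 2) < 1e-6   # f'(1) = 2
for x in [0.5, 1.0, 2.0, 5.0]:
    residual = x**2 * deriv2(f, x) + 3 * x * deriv(f, x) + 4 * f(x)
    assert abs(residual) < 1e-4      # x^2 f'' + 3x f' + 4f = 0
```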
The first boundary condition gives c_1 = 1 as \log(1) = 0. For the second boundary condition
we have
f'(1) = \left[4c_1 x^3 + 4c_2 x^3\log x + c_2\frac{x^4}{x}\right]_{x=1} = 4c_1 + c_2 = 5, (9.46)
so c_2 = 1.
The final set of functions we look at for second order ODEs are equations of the form
af'' + bf' + cf = p(x). (9.48)
We solve these using the linearity property of differentiation, i.e. we can split the solution
into two parts, f = f_p + f_g, and treat the two problems separately. We solve af_g'' + bf_g' + cf_g = 0 using the
techniques for finding general solutions in Section 9.1. We also need to find a solution for
af_p'' + bf_p' + cf_p = p(x), and this depends on the form of p(x). The classes of solutions
for different classes of p(x) are shown in Table 9.1. These solutions are used according to the following rules:
1. If p(x) is in one of the forms in the left hand column then pick the corresponding
function on the right hand side as your ansatz and equate the terms.
2. If your ansatz is a solution of the homogeneous (general) equation then multiply it by x for a single
root or by x^2 for a double root.
3. If p(x) is a combination of the terms in the left hand column of the table then your
ansatz for f_p is a linear combination of the corresponding terms in the right hand
column.
Example 8: Solve the following for boundary conditions f(0) = 0, f'(0) = 1.5.
The roots are \pm i, which leads to a general solution of
where we have used constants k_1 and k_2 to avoid confusion with the constants c_1 and c_2
used in Table 9.1. From the table we use f_p(x) = c_2 x^2 + c_1 x + c_0. We can substitute f_p
and f_p''(x) = 2c_2 back into the original problem in Eq. (9.50) for
Applying the first boundary condition f (0) = 0 gives k1 = 0.002. Applying the second
boundary condition f 0 (0) = 1.5 gives
Solving this for our general solution gives real roots of -2 and 6, for a general solution
of f_g = k_1 e^{-2x} + k_2 e^{6x}. As the ansatz for f_p is c_1 e^{6x}, it is the same as one of our general
solutions, so we use the second rule above and multiply the solution by x to give f_p = c_1 xe^{6x}.
To find c_1 we differentiate and put back into the original ODE: f' = c_1 e^{6x} + 6xc_1 e^{6x} and
f'' = 6c_1 e^{6x} + 6c_1 e^{6x} + 36xc_1 e^{6x} = 12c_1 e^{6x} + 36xc_1 e^{6x}, giving
12c_1 e^{6x} + 36xc_1 e^{6x} - 4\left(c_1 e^{6x} + 6xc_1 e^{6x}\right) - 12c_1 xe^{6x} = 8c_1 e^{6x} = e^{6x}, (9.59)
so c_1 = \frac{1}{8} for a solution of
f = k_1 e^{-2x} + k_2 e^{6x} + \frac{1}{8}xe^{6x}. (9.60)
The first boundary condition gives k_1 + k_2 = 0; the second boundary condition gives
f'(0) = \left[-2k_1 e^{-2x} + 6k_2 e^{6x} + \frac{1}{8}e^{6x} + \frac{6}{8}xe^{6x}\right]_{x=0} = 8k_2 + \frac{1}{8} = 0, (9.61)
so k_2 = -\frac{1}{64} and k_1 = \frac{1}{64}.
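As a numerical check, assuming the ODE in this example was f'' − 4f' − 12f = e^{6x} with f(0) = f'(0) = 0 (this form is inferred from the roots −2 and 6 and the right hand side in Eq. (9.59); the original statement of the problem is not shown above), the solution with k₁ = 1/64, k₂ = −1/64 satisfies it:

```python
import math

# Assumed ODE: f'' - 4f' - 12f = e^{6x}, f(0) = 0, f'(0) = 0 (see lead-in)
def f(x):
    return math.exp(-2 * x) / 64 - math.exp(6 * x) / 64 + x * math.exp(6 * x) / 8

def d1(g, x, h=1e-6):
    return (g(x + h) - g(x - h)) / (2 * h)

def d2(g, x, h=1e-4):
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2

assert abs(f(0)) < 1e-12 and abs(d1(f, 0)) < 1e-6       # boundary conditions
for x in [0.0, 0.3, 0.6]:
    # residual of the assumed ODE should vanish along the solution
    assert abs(d2(f, x) - 4 * d1(f, x) - 12 * f(x) - math.exp(6 * x)) < 1e-3
```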
Chapter 10
We are going to look at second order PDEs; as motivating examples we will concentrate
on the wave equation and the heat equation. In both cases finding a general solution is
quite a simple procedure; the trickier part is to apply the boundary conditions.
In order to solve second order partial differential equations we need to be aware of Fourier
transforms. Fourier analysis is a huge subject, so we simply present here the basic concepts.
A good reference for further information is Kreyszig.
The basic idea is that a periodic function with a period of 2\pi can be represented as an
infinite sum of trigonometric functions, i.e.
It can be shown (e.g Kreyszig) that the Fourier coefficients can be calculated using the
formulae
a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)\,dy (10.3)
a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(y)\cos ny\,dy (10.4)
b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(y)\sin ny\,dy (10.5)
We can generalise these equations to functions which are periodic over a period of 2l by
using the change of variables y = πl x → dy = πl dx to give
f(y) = a_0 + a_1\cos\frac{\pi}{l}x + b_1\sin\frac{\pi}{l}x + a_2\cos\frac{2\pi}{l}x + b_2\sin\frac{2\pi}{l}x + a_3\cos\frac{3\pi}{l}x + b_3\sin\frac{3\pi}{l}x + \ldots (10.6)
= a_0 + \sum_{n=1}^{\infty}\left(a_n\cos\frac{n\pi}{l}x + b_n\sin\frac{n\pi}{l}x\right) (10.7)
It can be shown (e.g Kreyszig) that the Fourier coefficients can be calculated using the
formulae
a_0 = \frac{1}{2l}\int_{-l}^{l} f(x)\,dx (10.8)
a_n = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi}{l}x\,dx (10.9)
b_n = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi}{l}x\,dx (10.10)
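The coefficient formulas above can be evaluated numerically. This sketch computes b_n for an odd square wave on (−l, l) by a midpoint Riemann sum (the choice of function and l = 2 are illustrative) and compares against the closed form 4/(nπ) for odd n, 0 for even n.

```python
import math

l = 2.0  # half-period (illustrative choice)

def f(x):
    # odd square wave: +1 on (0, l), -1 on (-l, 0)
    return 1.0 if x >= 0 else -1.0

def fourier_bn(n, samples=200000):
    # b_n = (1/l) * integral over (-l, l) of f(x) sin(n*pi*x/l) dx, midpoint rule
    dx = 2 * l / samples
    total = 0.0
    for k in range(samples):
        x = -l + (k + 0.5) * dx
        total += f(x) * math.sin(n * math.pi * x / l) * dx
    return total / l

# Known coefficients of the odd square wave: b_n = 4/(n*pi) for odd n, 0 for even n
assert abs(fourier_bn(1) - 4 / math.pi) < 1e-3
assert abs(fourier_bn(2)) < 1e-3
assert abs(fourier_bn(3) - 4 / (3 * math.pi)) < 1e-3
```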
Even functions have only cosine components and odd functions have only sine components,
which gives us
a_0 = \frac{1}{l}\int_0^l f(x)\,dx (10.11)
a_n = \frac{2}{l}\int_0^l f(x)\cos\frac{n\pi}{l}x\,dx (10.12)
b_n = 0 (10.13)
for even functions, and
a_0 = 0 (10.14)
a_n = 0 (10.15)
b_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi}{l}x\,dx (10.16)
for odd functions. We can also let l \to \infty, which changes our summation into an
integral, and we obtain
f(x) = \int_0^{\infty}\left[A(\xi)\cos\xi x + B(\xi)\sin\xi x\right]d\xi (10.17)
with
A(\xi) = \frac{1}{\pi}\int_{-\infty}^{\infty} f(x)\cos\xi x\,dx (10.18)
B(\xi) = \frac{1}{\pi}\int_{-\infty}^{\infty} f(x)\sin\xi x\,dx (10.19)
For completeness we also show how the Fourier transform can be expressed in exponential
form.
f(x) = \frac{1}{2}\int_0^{\infty}\left[A(\xi)e^{i\xi x} + A(\xi)e^{-i\xi x} - iB(\xi)e^{i\xi x} + iB(\xi)e^{-i\xi x}\right]d\xi (10.20)
= \frac{1}{2}\int_0^{\infty}\left[(A(\xi) - iB(\xi))e^{i\xi x} + (A(\xi) + iB(\xi))e^{-i\xi x}\right]d\xi (10.21)
= \frac{1}{2}\int_0^{\infty}\left[A(\xi) - iB(\xi)\right]e^{i\xi x}\,d\xi + \frac{1}{2}\int_{-\infty}^{0}\left[A(-\xi) + iB(-\xi)\right]e^{i\xi x}\,d\xi (10.22)
and
\frac{A(-\xi) + iB(-\xi)}{2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)\cos(-\xi x)\,dx + i\frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)\sin(-\xi x)\,dx (10.25)
= \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)\left[\cos\xi x - i\sin\xi x\right]dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-i\xi x}\,dx (10.26)
so that f(x) = \int_{-\infty}^{\infty} C(\xi)e^{i\xi x}\,d\xi, where
C(\xi) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-i\xi x}\,dx. (10.28)
Notice that the integration bounds on the equation for f(x) are now between \pm\infty.
An important distribution is the Dirac delta function. Despite the name it is not a function
in the ordinary sense, being known as a distribution or generalised function. It has the
properties
\delta(x - a) = \begin{cases} \infty & x = a \\ 0 & x \neq a \end{cases} (10.29)
and
\int_{-\infty}^{\infty}\delta(x - a)\,dx = 1. (10.30)
It also has the following properties that can be significant in some of the areas of quantitative
finance that you will come across. Firstly, integrating a function multiplied by
the delta function obtains the value of the function at a single point, i.e.
\int_{-\infty}^{\infty}\delta(x - a)g(x)\,dx = g(a). (10.31)
Secondly, we can see that the Fourier transform of \delta(x) is therefore equal to 1:
\int_{-\infty}^{\infty}\delta(x)e^{ix\xi}\,dx = 1. (10.32)
The Dirac delta function is important in mathematical finance as it is used as the limit
of a transition density when the time parameter goes to zero.
A "transition density" is the probability density of the difference between a process at
different times, i.e. the probability that a process will change (or transition) from
value X_{t_1} = x to X_{t_2} = y between times t_1 and t_2. In the case of a Wiener process this
transition density is a normal distribution with variance t_2 - t_1 = \tau, i.e.
f(y - x) = \frac{1}{\sqrt{2\pi\tau}}e^{-\frac{(y-x)^2}{2\tau}}. (10.33)
We can see that as the times t_2 and t_1 get closer and closer together, \tau \to 0; this means
that the transition density gets narrower until eventually it is a single infinite value at
x = y and zero elsewhere. As it is a probability density function it still retains the
property that integrating over all possible values gives a result of 1.
\frac{\partial^2 f}{\partial t^2} = c^2\frac{\partial^2 f}{\partial x^2} (10.34)
For our example the equation represents the movement of a string tied at both ends, 0
and l, in the x dimension. The boundary conditions imposed are therefore f(0, t) = 0 and
f(l, t) = 0; we also have an initial displacement function f(x, 0) = g(x).
The first step in solving the wave equation is to separate it into two functions, one only a
function of x and one only a function of t, i.e. f(x, t) = u(x)v(t). We can then insert this
into the PDE in Eq. (10.34) and rearrange as
\frac{\partial^2 u(x)v(t)}{\partial t^2} = c^2\frac{\partial^2 u(x)v(t)}{\partial x^2} (10.35)
\Rightarrow u(x)\frac{d^2 v(t)}{dt^2} = c^2 v(t)\frac{d^2 u(x)}{dx^2} (10.36)
\Rightarrow \frac{d^2 v(t)}{dt^2}\frac{1}{v(t)} = c^2\frac{d^2 u(x)}{dx^2}\frac{1}{u(x)} (10.37)
As the left hand side of the equation only depends on t and the right hand side of the equation
only depends on x, we can say that both sides are equal to a constant, i.e.
\frac{1}{c^2}\frac{d^2 v(t)}{dt^2}\frac{1}{v(t)} = \frac{d^2 u(x)}{dx^2}\frac{1}{u(x)} = -p^2 (10.38)
(the reason for the negative sign on p2 will become apparent). This gives two second order
ordinary differential equations
\frac{d^2 u(x)}{dx^2} + p^2 u(x) = 0 \quad\text{and} (10.39)
\frac{d^2 v(t)}{dt^2} + p^2 c^2 v(t) = 0 (10.40)
which we can solve using some of the same techniques that we discussed in the previous
lectures. These are homogeneous second order ODEs; however, unlike our previous examples
where we were given the coefficients, in this case we must infer a value (or a range of values)
from the initial conditions. The first, most general question is whether we have a positive
or negative value of p^2, as this will determine whether our roots are real or imaginary. So
in the case of real roots (p^2 < 0), we have the general solution for u(x) of
We can first separate the initial conditions into f(0, t) = u(0)v(t) = 0 and f(l, t) =
u(l)v(t) = 0, so unless v(t) = 0 \;\forall t (the trivial solution) this gives us u(0) = 0 and
u(l) = 0, and the only way that we could obtain this for Eq. (10.41) is if c_1 = c_2 = 0 (again
the trivial solution). Therefore we must have imaginary roots (p^2 > 0), giving a general
solution of
Solving this for the initial conditions gives c_1 = 0 due to u(0) = 0, and so u(l) = c_2\sin(pl) =
0, implying that if we wish to avoid the trivial solution of c_1 = c_2 = 0 then \sin(pl) = 0,
i.e. pl = k\pi where k is any integer. Therefore we have
u(x) = c_2\sin\left(\frac{k\pi}{l}x\right) (10.43)
We will deal with the value of c_2 when we have the full general solution for f(x, t). We can
then solve the other ODE in t. Using the fact that p^2 c^2 > 0 by definition
of the problem, we have roots of \pm ipc = \pm ic\frac{k\pi}{l} and a general solution of
f(x, t) = u(x)v(t) = \left(c_a\cos\left(c\frac{k\pi}{l}t\right) + c_b\sin\left(c\frac{k\pi}{l}t\right)\right)\sin\left(\frac{k\pi}{l}x\right) (10.44)
where the constant c2 has been absorbed into ca and cb . As the above Eq. (10.44) holds
for every value of k then, by the linearity property of differentiation, it is also true that
the summation over all values of k can be used, i.e.
f(x, t) = \sum_{k=1}^{\infty}\left(c_{ak}\cos\left(c\frac{k\pi}{l}t\right) + c_{bk}\sin\left(c\frac{k\pi}{l}t\right)\right)\sin\left(\frac{k\pi}{l}x\right) (10.45)
In order to find the value of c_{ak} (notice that we have a different value for each value of k)
we use the initial condition f(x, 0) = g(x) for
f(x, 0) = \sum_{k=1}^{\infty} c_{ak}\sin\left(\frac{k\pi}{l}x\right) = g(x) (10.46)
The summation is also a Fourier series. This means that we can find c_{ak} using
c_{ak} = \frac{2}{l}\int_0^l g(x)\sin\left(\frac{k\pi}{l}x\right)dx. (10.47)
To find c_{bk} we need to consider the first derivative of the initial condition, \frac{\partial f(x,t)}{\partial t}\big|_{t=0}:
\frac{\partial f(x, t)}{\partial t}\bigg|_{t=0} = \sum_{k=1}^{\infty}\left(-c_{ak}c\frac{k\pi}{l}\sin\left(c\frac{k\pi}{l}t\right) + c_{bk}c\frac{k\pi}{l}\cos\left(c\frac{k\pi}{l}t\right)\right)\sin\left(\frac{k\pi}{l}x\right)\bigg|_{t=0} (10.48)
= \sum_{k=1}^{\infty} c_{bk}c\frac{k\pi}{l}\sin\left(\frac{k\pi}{l}x\right). (10.49)
Very often \frac{\partial f(x,t)}{\partial t}\big|_{t=0} \equiv 0 (i.e. the string is stationary at t = 0) and then c_{bk} \equiv 0. However,
in the case that the string is moving at t = 0, then using the notation \frac{\partial f(x,t)}{\partial t}\big|_{t=0} = h(x),
Eq. (10.49) is a Fourier series and we can find c_{bk} using
c_{bk} = \frac{2}{ck\pi}\int_0^l h(x)\sin\left(\frac{k\pi}{l}x\right)dx. (10.50)
\frac{\partial f}{\partial t} = d^2\frac{\partial^2 f}{\partial x^2} (10.51)
We again assume that f (x, t) = u(x)v(t) and separate the variables to obtain two ODEs,
as before.
\frac{d^2 u(x)}{dx^2} + p^2 u(x) = 0 \quad\text{and} (10.52)
\frac{dv(t)}{dt} + p^2 d^2 v(t) = 0. (10.53)
Depending on the boundary conditions, there are many different possible solutions to these
equations. However two of the most common boundary conditions are f (0, t) = u(0)v(t) =
0 and f (l, t) = u(l)v(t) = 0 leading to the boundary conditions of u(0) = u(l) = 0, as
before. By similar reasoning to that which we used for the wave equation, we can say that
p^2 must be positive. Therefore the general solution for u(x) is
u(x) = c_1\cos(px) + c_2\sin(px)
and the first order ODE for v(t) has the solution
v(t) = c_3 e^{-p^2 d^2 t}. (10.55)
Combining these gives
f(x, t) = \left(c_1\cos(px) + c_2\sin(px)\right)e^{-p^2 d^2 t} (10.56)
where c_3 has been absorbed into the constants c_1 and c_2. The boundary conditions for
x = 0, l give c_1 = 0; we ignore the possibility of setting c_2 = 0 as this gives the trivial solution
f(x, t) \equiv 0. Therefore we must have p_k = \frac{k\pi}{l} where k is a positive
integer, so we define \kappa_k = p_k^2 d^2, giving a general solution
f(x, t) = \sum_{k=1}^{\infty} c_k e^{-\kappa_k t}\sin\left(\frac{k\pi}{l}x\right). (10.57)
Say we have the initial time condition f(x, 0) = g(x); then
f(x, 0) = \sum_{k=1}^{\infty} c_k\sin\left(\frac{k\pi}{l}x\right) (10.58)
which is a Fourier series with c_k the Fourier coefficients. We can then obtain c_k using the
integral
c_k = \frac{2}{l}\int_0^l g(x)\sin\left(\frac{k\pi}{l}x\right)dx. (10.59)
Example:
Solve the heat equation when the boundary conditions are u(0, t) = u(l, t) = 0 and
u(x, 0) = g(x) = 1:
c_k = \frac{2}{l}\int_0^l\sin\left(\frac{k\pi x}{l}\right)dx (10.60)
= -\frac{2}{l}\frac{l}{k\pi}\left[\cos\left(\frac{k\pi x}{l}\right)\right]_0^l (10.61)
= \frac{2}{k\pi}\left(1 - \cos(k\pi)\right). (10.62)
For even k, \cos(k\pi) = 1 \Rightarrow c_k = 0. For odd k, \cos(k\pi) = -1 \Rightarrow c_k = \frac{4}{k\pi}.
Our solution is therefore
f(x, t) = \sum_{n=0}^{\infty}\frac{4}{(2n+1)\pi}e^{-\kappa_{2n+1}t}\sin\left(\frac{(2n+1)\pi x}{l}\right) (10.63)
where \kappa_{2n+1} = \left[\frac{(2n+1)\pi d}{l}\right]^2.
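A partial sum of the series in Eq. (10.63) can be evaluated directly. The sketch below (with illustrative values l = d = 1 and an arbitrary truncation) checks that at t = 0 the series reproduces g(x) = 1 away from the ends, that the boundary conditions hold, and that the solution decays over time.

```python
import math

l, d = 1.0, 1.0  # bar length and diffusion coefficient (illustrative values)

def heat_solution(x, t, terms=2000):
    # Partial sum of Eq. (10.63) with kappa_{2n+1} = ((2n+1)*pi*d/l)^2
    total = 0.0
    for n in range(terms):
        k = 2 * n + 1
        kappa = (k * math.pi * d / l) ** 2
        total += 4 / (k * math.pi) * math.exp(-kappa * t) * math.sin(k * math.pi * x / l)
    return total

# At t = 0 the series reproduces the initial condition g(x) = 1 (away from the ends)
assert abs(heat_solution(0.5, 0.0) - 1.0) < 1e-3
# The boundary conditions f(0, t) = f(l, t) = 0 hold for all t
assert abs(heat_solution(0.0, 0.1)) < 1e-12
# The solution decays towards zero over time
assert heat_solution(0.5, 1.0) < 1e-3
```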
For solutions on an infinitely long bar, solutions can be found using a similar technique
to the one above but replacing the Fourier series with the Fourier transform (see Kreyszig,
for example); however, there are also simpler methods taking advantage of the properties
of the Fourier transform. See Dr Guido Germano's notes for stochastic processes week 1,
for example.
Chapter 11
We introduce some of the key concepts of vectors and matrices and look at ways to solve
systems of linear equations.
11.1 Basics
Vectors and matrices are often written using bold text with upper case used for matrices
and lower case for vectors, e.g.
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix} \quad C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} (11.1)
Notice the row, column convention for addressing the elements and that we drop the column
number for vectors.
where D is a 2 \times 2 matrix, 0 is a 2 \times 2 matrix with all zero entries and -A = \begin{pmatrix} -a_{11} & -a_{12} \\ -a_{21} & -a_{22} \end{pmatrix}.
Multiplying by a scalar c is done by multiplying every element of a matrix with c, i.e.
cA = \begin{pmatrix} ca_{11} & ca_{12} \\ ca_{21} & ca_{22} \end{pmatrix} (11.7)
c(A + C) = cC + cA (11.8)
(c + k)A = cA + kA (11.9)
c(kA) = (ck)A (11.10)
1A = A (11.11)
w_{ij} = \sum_{k=1}^{n} u_{ik}v_{kj}. (11.13)
The transpose of a matrix, written A^T or A', is a mirror of its elements along the diagonal,
i.e. using the definitions of A and B above
A^T = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix} \quad B^T = \begin{pmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \\ b_{13} & b_{23} \end{pmatrix} (11.18)
Combining the transpose with multiplication we have (AB)T = BT AT , notice the change
in order. An important matrix when we look at multiplication is the identity matrix,
commonly written I or sometimes In to indicate that it is an n × n identity matrix and
has the property IA = AI = A. For example
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad I_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} (11.19)
where a_{\cdot} and b_{\cdot} are scalar coefficients. Creating matrices from the coefficients as follows
A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} (11.21)
we can now write the system of linear equations as a single matrix equation
Ax = b. (11.22)
Solve the following system of equations
x1 + x2 + x3 = −2
2x1 + 4x2 − 3x3 = 0
2x1 + 2x2 − x3 = 2 (11.23)
The next step is to linearly combine the rows so that you end up with an upper triangular
matrix (i.e. put the system of equations into triangular form). We first subtract 2 \times row 1
from row 3 to obtain (2\;\;2\;\;{-1}\;\;2) - 2(1\;\;1\;\;1\;\;{-2}) = (0\;\;0\;\;{-3}\;\;6), so we now have
\tilde{A} = \begin{pmatrix} 1 & 1 & 1 & -2 \\ 2 & 4 & -3 & 0 \\ 0 & 0 & -3 & 6 \end{pmatrix} (11.26)
The next step is to subtract 2 \times row 1 from row 2 to obtain (2\;\;4\;\;{-3}\;\;0) - 2(1\;\;1\;\;1\;\;{-2}) = (0\;\;2\;\;{-5}\;\;4), so we now have
\tilde{A} = \begin{pmatrix} 1 & 1 & 1 & -2 \\ 0 & 2 & -5 & 4 \\ 0 & 0 & -3 & 6 \end{pmatrix} (11.27)
We can then extract 3 new equations from our augmented matrix and thus obtain a
solution.
−3x3 = 6 → x3 = −2 (11.28)
2x2 − 5x3 = 4 → 2x2 = 4 + 5(−2) = −6 → x2 = −3 (11.29)
x1 + x2 + x3 = −2 → x1 = −2 + 3 + 2 = 3 (11.30)
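The elimination and back-substitution steps above can be sketched as a short routine (no pivoting, so it assumes nonzero pivots, which holds for this example); applied to the system in Eq. (11.23) it recovers x = (3, −3, −2).

```python
def gaussian_solve(A, b):
    # Forward elimination to upper-triangular form, then back substitution.
    # A is a list of rows; no pivoting, so nonzero pivots are assumed (as here).
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for i in range(n):
        for j in range(i + 1, n):
            factor = M[j][i] / M[i][i]
            M[j] = [mj - factor * mi for mj, mi in zip(M[j], M[i])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

x = gaussian_solve([[1, 1, 1], [2, 4, -3], [2, 2, -1]], [-2, 0, 2])
assert all(abs(a - b) < 1e-12 for a, b in zip(x, [3, -3, -2]))
```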
The system of equations in this problem has a single solution associated with it. If we
were to rearrange our augmented array but find something like
\begin{pmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ 0 & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 0 \end{pmatrix} (11.31)
then there are infinitely many possible solutions. However, if it is something like
\begin{pmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ 0 & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & b_3 \end{pmatrix} (11.32)
with b_3 \neq 0, then there are no solutions, as the last row states 0 = b_3.
A closely related subject to the number of possible solutions is the rank of a matrix. If we
consider a matrix
A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix} (11.33)
then it has a rank r if r \leq n of its row vectors a_j = (a_{j1}\;\;a_{j2}\;\;\ldots\;\;a_{jn}) are linearly
independent from each other. For example, for a_1 linear independence means that there
is no set of constants c_j such that a_1 = c_2 a_2 + c_3 a_3 + \ldots + c_n a_n. For example, for
the following matrix
A = \begin{pmatrix} 1 & 0 & 4 & 2 \\ 2 & 3 & 0 & 5 \\ 5 & 3 & 12 & 11 \end{pmatrix} (11.34)
the first two row vectors a_1 = (1\;\;0\;\;4\;\;2) and a_2 = (2\;\;3\;\;0\;\;5) are linearly independent
of each other, but a_3 = 3a_1 + a_2. Therefore the matrix is of rank 2.
It can be shown (see e.g. Kreyszig) that the column rank is the same as the row rank and
therefore for an n \times m matrix where m \neq n the maximum possible rank is the smaller
dimension, i.e. \min(n, m). The rank has some important implications for the solution of
systems of linear equations. If we have our matrix equation Ax = b as before then the
following rules apply:
2. If the rank of A is the same as the number of unknowns then there is a single unique
solution.
3. If the rank is less than the number of unknowns then there are infinitely many
solutions.
Most of you will have seen the simple equation for the inverse of a 2 \times 2 matrix
A^{-1} = \frac{1}{\det A}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix} (11.37)
where \det A is the determinant of A, written \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, and is calculated as a_{11}a_{22} - a_{12}a_{21}.
We can also calculate the inverse of a matrix with n > 2, but the calculations become much
more computationally heavy and complicated.
Say A and B are square n × n matrices of full rank (we will explain the reason for this
condition later) then
We look at a general definition of the inverse of an n \times n matrix in the next section. However,
there are techniques which are computationally easier to use. One of the most well-known
is Gauss-Jordan elimination, which uses a similar technique to Gaussian elimination to
find the solution to a system of linear equations (i.e. Ax = b with unknown x). Instead,
we wish to find the solution to the equation AX = I, where I is the identity matrix, and
we can see from Eq. (11.36) that the solution to this will give X = A^{-1}. We illustrate
this with the following example (from Kreyszig).
Example. Find the inverse of the 3 \times 3 matrix
\begin{pmatrix} -1 & 1 & 2 \\ 3 & -1 & 1 \\ -1 & 3 & 4 \end{pmatrix}
First create the augmented matrix \tilde{A} made up of [A\;\;I]:
\tilde{A} = \begin{pmatrix} -1 & 1 & 2 & 1 & 0 & 0 \\ 3 & -1 & 1 & 0 & 1 & 0 \\ -1 & 3 & 4 & 0 & 0 & 1 \end{pmatrix} (11.40)
We first create the upper triangular matrix using Gaussian elimination as before, i.e. first
add 3 \times row 1 to row 2 and take 1 \times row 1 from row 3 to remove the first entries in those
rows, to give
\tilde{A} = \begin{pmatrix} -1 & 1 & 2 & 1 & 0 & 0 \\ 0 & 2 & 7 & 3 & 1 & 0 \\ 0 & 2 & 2 & -1 & 0 & 1 \end{pmatrix}. (11.41)
Then take (new) row 2 from (new) row 3 to remove the second entry in the third row to
give
\tilde{A} = \begin{pmatrix} -1 & 1 & 2 & 1 & 0 & 0 \\ 0 & 2 & 7 & 3 & 1 & 0 \\ 0 & 0 & -5 & -4 & -1 & 1 \end{pmatrix}. (11.42)
If this was the usual technique of Gaussian elimination we would stop here; however, we
now carry on to turn the leftmost 3 \times 3 matrix in \tilde{A} into the identity matrix. First multiply
row 1 by -1, row 2 by 0.5 and row 3 by -0.2 in order to obtain 1's on the diagonal:
\tilde{A} = \begin{pmatrix} 1 & -1 & -2 & -1 & 0 & 0 \\ 0 & 1 & 3.5 & 1.5 & 0.5 & 0 \\ 0 & 0 & 1 & 0.8 & 0.2 & -0.2 \end{pmatrix}. (11.43)
Next remove the entries in the third column of rows 1 and 2 by subtracting 3.5 \times row 3
from row 2 and adding 2 \times row 3 to row 1 to give
\tilde{A} = \begin{pmatrix} 1 & -1 & 0 & 0.6 & 0.4 & -0.4 \\ 0 & 1 & 0 & -1.3 & -0.2 & 0.7 \\ 0 & 0 & 1 & 0.8 & 0.2 & -0.2 \end{pmatrix}. (11.44)
The final step is to add row 2 to row 1 in order to remove the second entry in row 1 and
finally recover the identity matrix:
\tilde{A} = \begin{pmatrix} 1 & 0 & 0 & -0.7 & 0.2 & 0.3 \\ 0 & 1 & 0 & -1.3 & -0.2 & 0.7 \\ 0 & 0 & 1 & 0.8 & 0.2 & -0.2 \end{pmatrix}. (11.45)
This result can be easily confirmed by calculating AA−1 using a high level programming
language such as MATLAB.
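The same confirmation can be done in pure Python; multiplying A by the inverse computed in Eq. (11.45) should give the identity matrix.

```python
# Multiply A by the inverse found by Gauss-Jordan elimination in Eq. (11.45)
A = [[-1, 1, 2], [3, -1, 1], [-1, 3, 4]]
A_inv = [[-0.7, 0.2, 0.3], [-1.3, -0.2, 0.7], [0.8, 0.2, -0.2]]

product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert abs(product[i][j] - expected) < 1e-12  # A * A^{-1} = I
```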
We now look at the general definition of a matrix inverse. In order to do this we need to know about the determinant and
cofactors of a matrix.
We first calculate the “minors” of a row or column in a matrix. The minor of a matrix
element is the determinant of the matrix with the corresponding row and column removed.
Using b_{23} as an example, the minor is
M_{23} = \begin{vmatrix} b_{11} & b_{12} \\ b_{31} & b_{32} \end{vmatrix} = b_{11}b_{32} - b_{12}b_{31} (11.48)
i.e. the determinant of a matrix which is equal to B with the second row and third column
removed. The cofactor of a matrix element b_{ij} is equal to C_{ij} = (-1)^{i+j}M_{ij}. Selecting
any row i of the matrix we calculate the determinant as
\det B = \sum_{j=1}^{3} b_{ij}C_{ij}
with the same result obtained regardless of the row selected. Similarly we can select any
column j and calculate the determinant as
\det B = \sum_{i=1}^{3} b_{ij}C_{ij}
This not only gives the same result regardless of the column selected but also gives the
same result as the row calculation.
The inverse of the 3 \times 3 matrix B is therefore
B^{-1} = \frac{1}{\det B}\begin{pmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{pmatrix} (11.51)
Notice the swapping of the indices of the cofactors in B−1 relative to the elements of B.
where C_{ij} are the cofactors of A and \det A is the determinant of A (which is also calculated
using the cofactors). The determinant is calculated as
D = \sum_{i=1}^{n} a_{ij}C_{ij} \quad\text{or} (11.53)
D = \sum_{i=1}^{n} a_{ji}C_{ji} (11.54)
where j can be selected as any integer between 1 and n. There are several rules for the
determinants of a matrix (see e.g. Kreyszig), but one of the most important ones for our
purposes is that the determinant does not change if you add a (positive or negative) multiple of one row
to another row. Therefore for an n \times n matrix with rank r < n there is at least one row
that can be changed to zero by adding multiples of one or more of the other rows. We can
choose any row to calculate our determinant, and it is clear that if we selected a row where
all entries a_{ij} are zero, the determinant would also be zero and therefore the inverse, as
defined in Eq. (11.52), would be undefined.
This leads to a very important relationship between the rank and the determinant and
inverse of a matrix.
If the rank r of an n × n matrix is less than n, the determinant is zero and the
inverse is undefined.
An important calculation for many applications is to find the eigenvectors and eigenvalues
of a matrix. The eigenvalue problem for a given n × n matrix A is concerned with the
equation
Ax = λx (11.55)
where \lambda is an unknown scalar and x is a vector of size n. That is, the problem is to find \lambda
and the corresponding vector x (x \neq 0, as this is the trivial solution) such that multiplying
A by x gives the same result as scaling x by \lambda. To solve the problem we first find
the eigenvalues; to do this we can see that
which can be written as the homogeneous system of equations
As explained in Section 11.4, in order to have a solution to this other than the trivial one
x = 0, the rank of A - \lambda I must be less than n. As explained in Section
11.5.3, a rank of less than n directly implies that the determinant is equal to 0, so to find
the eigenvalues we solve the equation \det(A - \lambda I) = 0.
Example: For the 2 \times 2 matrix \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} we have the "characteristic matrix" \begin{pmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{pmatrix}
and thus a "characteristic polynomial" of
\begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1) (11.59)
with roots \lambda_1 = 3 and \lambda_2 = 1. Substituting \lambda_1 = 3 back in gives
\begin{pmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -x_1 + x_2 \\ x_1 - x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} (11.60)
This does not give a unique solution for x as (by definition) (A - \lambda I) is of rank less than
2. However, by selecting a value for x_1 = 1 we can find an eigenvector \begin{pmatrix} 1 \\ 1 \end{pmatrix}. Similarly for
\lambda_2 = 1 we obtain
\begin{pmatrix} x_1 + x_2 \\ x_1 + x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} (11.61)
and setting x_1 = 1 we obtain \begin{pmatrix} 1 \\ -1 \end{pmatrix}. Clearly, from the rules of multiplying a matrix by a
scalar, if x is an eigenvector then cx is also an eigenvector, where c is any scalar constant.
Therefore we can arbitrarily scale our eigenvector; it is common to normalise it so that
\sqrt{\sum_{i=1}^n x_i^2} = 1. So in the case of the two eigenvectors we calculated above we divide by
\sqrt{x_1^2 + x_2^2} = \sqrt{2} for our eigenvectors to be
\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \text{ for } \lambda_1 = 3 \qquad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \text{ for } \lambda_2 = 1 (11.62)
It is also possible to have complex solutions to the eigenvalue problem. For example, the
matrix \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} yields the characteristic polynomial
\begin{vmatrix} 0-\lambda & 1 \\ -1 & 0-\lambda \end{vmatrix} = \lambda^2 + 1 (11.63)
which has possible solutions \lambda_1 = i, \lambda_2 = -i. Feeding them back into the equation (A -
\lambda I)x = 0 gives
\begin{pmatrix} -i & 1 \\ -1 & -i \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -ix_1 + x_2 \\ -x_1 - ix_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} (11.64)
for \lambda_1 = i, giving a (normalised) eigenvector of \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}. Similarly we obtain
\begin{pmatrix} i & 1 \\ -1 & i \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} ix_1 + x_2 \\ -x_1 + ix_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} (11.65)
for \lambda_2 = -i, giving a (normalised) eigenvector of \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}.
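Both sets of eigenpairs can be verified directly by checking Ax = λx; the helper below is a minimal pure-Python sketch (Python's built-in complex numbers handle the second example).

```python
import math

def matvec(A, x):
    # matrix-vector product for a list-of-rows matrix
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

s = 1 / math.sqrt(2)

# Real symmetric example: eigenpairs (3, (1,1)/sqrt(2)) and (1, (1,-1)/sqrt(2))
A = [[2, 1], [1, 2]]
for lam, v in [(3, [s, s]), (1, [s, -s])]:
    Av = matvec(A, v)
    assert all(abs(avi - lam * vi) < 1e-12 for avi, vi in zip(Av, v))

# Complex example: eigenpairs (i, (1,i)/sqrt(2)) and (-i, (1,-i)/sqrt(2))
B = [[0, 1], [-1, 0]]
for lam, v in [(1j, [s, s * 1j]), (-1j, [s, -s * 1j])]:
    Bv = matvec(B, v)
    assert all(abs(bvi - lam * vi) < 1e-12 for bvi, vi in zip(Bv, v))
```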
So far we have only looked at the situation where there is a single linearly independent
eigenvector for each eigenvalue. However, it is the case that many different linearly independent
eigenvectors may exist for a single eigenvalue. An extreme example of this is the
identity matrix I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. Finding the characteristic polynomial yields
\begin{vmatrix} 1-\lambda & 0 \\ 0 & 1-\lambda \end{vmatrix} = (1-\lambda)^2 (11.66)
which has a single repeated root \lambda = 1. Putting this back into (A - \lambda I)x = 0 yields
\begin{pmatrix} 1-\lambda & 0 \\ 0 & 1-\lambda \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \times x_1 + 0 \times x_2 \\ 0 \times x_1 + 0 \times x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} (11.67)
which is clearly true for an arbitrary choice of x1 and x2 . In fact this matches what we
already know about the identity matrix which is that
Ix = 1 × x (11.68)
A = PDP^{-1} (11.69)
where D is a diagonal matrix containing the eigenvalues and P contains the eigenvectors
as column vectors. Note that you must have n linearly independent eigenvectors to
diagonalise a matrix. However, P is not necessarily unique when an eigenvalue has
more than one linearly independent eigenvector; in that case the eigenvalue is repeated
on the diagonal of D, once for each linearly independent eigenvector. In order
to convince yourself of this, think about "diagonalising" the identity matrix using XIX^{-1},
where X can be any invertible matrix.
Not every matrix can be diagonalised; however, in general the following rules apply.
2. If you have fewer than n distinct eigenvalues then you need at least n linearly independent
eigenvectors in total in order to be able to diagonalise the matrix.
The diagonalisation of a matrix can be useful for finding the powers of a matrix in a
computationally efficient way. We first recognise that if we have a diagonal matrix D
which we wish to raise to the power p then we do not need to go through the usual matrix
multiplication procedure; we simply need to calculate
D^p = \begin{pmatrix} d_{11}^p & 0 & 0 & \ldots & 0 \\ 0 & d_{22}^p & 0 & \ldots & 0 \\ 0 & 0 & d_{33}^p & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & d_{nn}^p \end{pmatrix}. (11.70)
A^p = \left(PDP^{-1}\right)^p = PDP^{-1}PDP^{-1}PDP^{-1}\ldots PDP^{-1}PDP^{-1} (11.71)
A^p = \left(PDP^{-1}\right)^p = PDIDID\ldots DIDP^{-1} = PDDD\ldots DDP^{-1} = PD^p P^{-1}. (11.72)
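The A^p = PD^pP^{-1} shortcut can be demonstrated with the 2 × 2 matrix from the eigenvalue example above, whose eigenvectors are (1, 1) and (1, −1) with eigenvalues 3 and 1; the result agrees with repeated matrix multiplication.

```python
def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

# A = P D P^{-1} for A = [[2,1],[1,2]] with eigenvalues 3 and 1
P = [[1, 1], [1, -1]]
P_inv = [[0.5, 0.5], [0.5, -0.5]]

p = 5
D_p = [[3**p, 0], [0, 1**p]]          # powers of a diagonal matrix, Eq. (11.70)
A_p = matmul(matmul(P, D_p), P_inv)   # A^p = P D^p P^{-1}, Eq. (11.72)

# Compare with repeated multiplication
A = [[2, 1], [1, 2]]
direct = A
for _ in range(p - 1):
    direct = matmul(direct, A)
assert all(abs(A_p[i][j] - direct[i][j]) < 1e-9 for i in range(2) for j in range(2))
```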
xT Ax > 0 (11.73)
However, it is possible to create so called pseudo-inverses which give an identity matrix
when multiplied from the left or right only.
Let A be an m \times n matrix, where m < n, which is of rank m; then we can use the
property of matrix ranks that
We can also see, by the properties of matrix multiplication, that AA^T gives an m \times m
matrix which is of rank m according to Eq. (11.75) and is thus of full rank and therefore
invertible. Therefore we can say that
A\left(A^T(AA^T)^{-1}\right) = (AA^T)(AA^T)^{-1} = I_m
and so we call A^T(AA^T)^{-1} the right hand pseudo-inverse of A, as it gives the identity
matrix when it multiplies A from the right hand side.
Similarly, let A be an m \times n matrix, where m > n, which is of rank n; then using the
property of matrix ranks in Eq. (11.75) we can also see that A^T A gives an n \times n
matrix which is of rank n and thus of full rank and therefore invertible. Therefore we can
say that
\left((A^T A)^{-1}A^T\right)A = (A^T A)^{-1}(A^T A) = I_n
and so we call (A^T A)^{-1}A^T the left hand pseudo-inverse of A, as it gives the identity
matrix when it multiplies A from the left hand side.
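The right pseudo-inverse can be sketched in a few lines of pure Python (the 2 × 3 matrix A below is an illustrative choice of rank 2, not from the text); multiplying A by A^T(AA^T)^{-1} gives the 2 × 2 identity.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    # inverse of a 2x2 matrix via Eq. (11.37)
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# A 2x3 matrix of rank 2 (m < n), so it has a right pseudo-inverse A^T (A A^T)^{-1}
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
At = transpose(A)
right_pinv = matmul(At, inv2(matmul(A, At)))
product = matmul(A, right_pinv)  # should be the 2x2 identity
assert all(abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
```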