
Mathematics primer for MSc Computational Finance and

MSc Financial Risk Management

Dr Carolyn Phelan

September 25, 2019


Contents

1 Introduction
  1.1 How to use this primer
  1.2 Basics
  1.3 Trigonometric identities
  1.4 Other useful equations: factorials, the Gamma function and binomial coefficients

2 Differentiation
  2.1 Basic rules
    2.1.1 Common equations
    2.1.2 Chain rule and product rule
    2.1.3 Higher order derivatives
    2.1.4 Implicit differentiation
  2.2 Uses of differentiation
    2.2.1 Maxima and minima of functions
    2.2.2 Taylor and Maclaurin series
    2.2.3 Taylor series approximations
    2.2.4 L'Hôpital's rule

3 Study of functions and curve sketching
  3.1 Roots of polynomials
    3.1.1 Polynomial division
    3.1.2 Iterative methods
  3.2 Maxima and minima of functions
  3.3 Inflection points, concavity and convexity
  3.4 Asymptotes and poles
    3.4.1 Horizontal asymptotes
    3.4.2 Vertical asymptotes
    3.4.3 Oblique asymptotes
  3.5 Jumps: left and right continuity and holes
    3.5.1 Cumulative distribution functions
  3.6 Curve sketching

4 Inequalities and Landau notation
  4.1 Basic rules of inequalities
  4.2 Solving inequalities
  4.3 Landau notation
    4.3.1 Log graphs
    4.3.2 Ordering functions by bound

5 Integration
  5.1 Basic rules
    5.1.1 Common equations
    5.1.2 Integration by substitution
    5.1.3 Integration by parts
    5.1.4 Partial fractions

6 First order ordinary differential equations
  6.1 Separable ordinary differential equations
  6.2 Integrating factor method
  6.3 Bernoulli differential equations

7 Multivariate calculus
  7.1 Notation
  7.2 Limits and continuity
  7.3 Basic rules of differentiation for multivariate functions
    7.3.1 Product rule
    7.3.2 Chain rule
    7.3.3 Clairaut's theorem (simplified)
  7.4 Taylor series
    7.4.1 Taylor series and differentials
  7.5 Integration in two variables
    7.5.1 Fubini's theorem
    7.5.2 Differentiation of integrals: Leibniz's integral rule
    7.5.3 Change of variables in a double integral

8 Complex numbers
  8.1 Introduction
  8.2 Polar notation
    8.2.1 Euler's formula and Euler's identity
  8.3 Logarithms of complex and negative numbers
  8.4 Complex differentiation
    8.4.1 Cauchy-Riemann equations
  8.5 Complex integration

9 Second order ordinary differential equations
  9.1 Homogeneous ordinary differential equations
  9.2 Cauchy-Euler equations
  9.3 Non-homogeneous ordinary differential equations

10 Second order partial differential equations
  10.1 A brief introduction to Fourier transforms
  10.2 Dirac delta function
  10.3 Wave equation
  10.4 Heat equation

11 Linear algebra, vectors and matrices
  11.1 Basics
  11.2 Solving systems of linear equations: Gaussian elimination
  11.3 Matrix rank
  11.4 Implication of rank for homogeneous systems of equations
  11.5 Matrix inversion and determinants
    11.5.1 Inverse of a product of matrices
    11.5.2 Gauss-Jordan elimination
    11.5.3 General definition of a matrix inverse
  11.6 Eigenvectors and eigenvalues
    11.6.1 Complex solutions
    11.6.2 Multiple eigenvectors
  11.7 Diagonalisation of a matrix and finding matrix powers
  11.8 Positive-definite matrices
  11.9 Right and left pseudo-inverses
Chapter 1

Introduction

1.1 How to use this primer

This primer is intended to provide a guide to the mathematical knowledge that you need
to be successful in your MSc modules. You should have seen most of the contents of
this course before. However, in order to get the most out of your other courses, it
needs to be familiar to you. This is the starting point that your other lecturers will expect
you to be at, and you should be comfortable with these concepts in an exam setting.

This course will provide you with explanations and worked examples which you can
use if you come across things that you do not understand in your modules. However, it will
be of even more benefit if you study it in advance of taking the modules so that you are
prepared beforehand. Certainly, you should study the information in Section 1.2 in such
detail that you know it without having to continuously reference it during your study for
your other courses.

1.2 Basics

Before we get started, here is a list of things that you should pretty much know in your
sleep. They cover basic relationships between logs and exponents and are the sort of things
that students can get muddled up with under exam conditions.

$e^{a+b} = e^a e^b$    (1.1)
$e^{ab} = (e^a)^b = (e^b)^a$    (1.2)
$\log a + \log b = \log ab$    (1.3)
$\log a - \log b = \log \frac{a}{b}$    (1.4)
$\log a^b = b \log a$    (1.5)
$\log_b a = \frac{\log_d a}{\log_d b}$    (1.6)
$a = e^{\log a}$    (1.7)
$a^r = e^{r \log a}$    (1.8)

Roots of a quadratic equation $ax^2 + bx + c = 0$:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$    (1.9)

The sign of the term $b^2 - 4ac$ determines whether your roots are real (positive), complex
(negative) or repeated (zero).
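
As a quick numerical illustration of Eq. (1.9), here is a minimal Python sketch (the helper name quadratic_roots is ours, not from any library); cmath.sqrt is used so that complex roots are returned automatically when $b^2 - 4ac < 0$.

    import cmath  # complex square root handles a negative discriminant

    def quadratic_roots(a, b, c):
        """Return the two roots of a*x^2 + b*x + c = 0 via Eq. (1.9)."""
        disc = b**2 - 4*a*c        # positive: real, negative: complex, zero: repeated
        root = cmath.sqrt(disc)
        return (-b + root) / (2*a), (-b - root) / (2*a)

    print(quadratic_roots(1, -3, 2))   # real roots 2 and 1
    print(quadratic_roots(1, 2, 5))    # complex roots -1+2j and -1-2j
    print(quadratic_roots(1, -2, 1))   # repeated root 1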

A comment on notation. Many of you with an engineering background will be used to
the convention that $\log(x)$ is shorthand for $\log_{10}(x)$ and that $\ln(x)$ is reserved for natural
log, i.e. $\log_e(x)$. Instead, we use the convention that $\log(x)$ is natural log, i.e. the same as
$\ln(x)$. This matches MATLAB syntax, where log(x) is the function for natural log and
log10(x) is the function for base 10 log.

1.3 Trigonometric identities

There are very many trig identities; here are a few that you should be familiar with:

$\cos(-\theta) = \cos\theta$,    (1.10)
$\sin(-\theta) = -\sin\theta$,    (1.11)
$\tan(-\theta) = -\tan\theta$,    (1.12)
$\sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi$,    (1.13)
$\cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi$,    (1.14)
$\cos^2\theta = \frac{1}{2}\left(1 + \cos 2\theta\right)$,    (1.15)
$\sin^2\theta = \frac{1}{2}\left(1 - \cos 2\theta\right)$,    (1.16)
$\cos^2\theta + \sin^2\theta = 1$.    (1.17)
1.4 Other useful equations: factorials, the Gamma function and binomial coefficients

The factorial of a non-negative integer n is $n! = 1 \times 2 \times 3 \times \cdots \times n$, with the convention that
$0! = 1$ and the recurrence relationship $n! = n(n-1)!$. In your probability courses you are
also likely to come across the Gamma function

$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx$,    (1.18)

which can be conceptualised as the generalisation of the factorial function to non-integer
(and complex) arguments. For $\alpha$ equal to a positive integer, $\Gamma(\alpha) = (\alpha - 1)!$, and it has a
recurrence relation similar to that of the factorial for any value of $\alpha$, i.e. $\alpha\Gamma(\alpha) = \Gamma(\alpha + 1)$.
We prove this recurrence relation in the second example in Section 5.1.3.

The binomial coefficient has several applications in mathematics and probability; one
way to think of it is as the number of ways to choose k elements from a set of n elements. It
is defined as

$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$,    (1.19)

and it has two useful recurrence relations.
One is multiplicative:

$\binom{n+1}{k+1} = \frac{n+1}{k+1}\binom{n}{k} = \frac{n+1}{k+1}\cdot\frac{n!}{(n-k)!\,k!} = \frac{(n+1)!}{(n-k)!\,(k+1)!}$.    (1.20)

The other is additive:

$\binom{n}{k} + \binom{n}{k+1} = \frac{n!}{(n-k)!\,k!} + \frac{n!}{(n-k-1)!\,(k+1)!}$
$\qquad = \frac{n!}{(n-k)!\,k!} + \frac{n!\,(n-k)}{(n-k)!\,k!\,(k+1)}$
$\qquad = \frac{n!}{(n-k)!\,k!}\left(\frac{k+1}{k+1} + \frac{n-k}{k+1}\right)$
$\qquad = \frac{n!}{(n-k)!\,k!}\cdot\frac{n+1}{k+1}$
$\qquad = \binom{n+1}{k+1}$.    (1.21)
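
A minimal numerical check of these relationships, using only the Python standard library (math.gamma, math.factorial and math.comb):

    import math

    # Gamma generalises the factorial: Gamma(n) = (n-1)! for positive integers n
    for n in range(1, 6):
        assert math.isclose(math.gamma(n), math.factorial(n - 1))

    # Recurrence alpha * Gamma(alpha) = Gamma(alpha + 1), here for non-integer alpha
    alpha = 2.5
    assert math.isclose(alpha * math.gamma(alpha), math.gamma(alpha + 1))

    # Additive (Pascal) recurrence, Eq. (1.21)
    n, k = 10, 4
    assert math.comb(n, k) + math.comb(n, k + 1) == math.comb(n + 1, k + 1)

    # Multiplicative recurrence, Eq. (1.20), cross-multiplied to stay in integers
    assert (n + 1) * math.comb(n, k) == (k + 1) * math.comb(n + 1, k + 1)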

Chapter 2

Differentiation

The derivative of a function f(x) is defined as

$\frac{df(x)}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$.    (2.1)

It may also be written $\frac{df}{dx}$, $\frac{d}{dx}f(x)$ and, where $y = f(x)$, $\frac{dy}{dx}$. Higher order derivatives are
written using the notation

$\frac{d^n f}{dx^n} = \frac{d^n y}{dx^n} = \frac{d^n}{dx^n}f(x)$,    (2.2)

where n is the order of differentiation.

To denote derivatives more succinctly, Lagrange or Newtonian notation may also be
used. Lagrange notation, also called prime notation, denotes the derivative with respect
to x using $'$, i.e.

$\frac{df}{dx} = f', \quad \frac{d^2 f}{dx^2} = f'', \quad \frac{d^3 f}{dx^3} = f'''$.    (2.3)

For fourth order derivatives and above, $'$ is no longer used and is instead replaced by the
order of differentiation in brackets, e.g.

$\frac{d^4 f}{dx^4} = f^{(4)}$.    (2.4)

Newtonian notation denotes the derivative with respect to time using a dot over the
function, i.e. for $y = f(t)$,

$\frac{df}{dt} = \dot{f}, \quad \frac{d^2 f}{dt^2} = \ddot{f}, \quad \frac{d^3 f}{dt^3} = \dddot{f}$.    (2.5)

This notation becomes unwieldy when we are working with fourth order derivatives and
above. However, it can be useful, especially when looking at something like the heat
equation where we wish to distinguish between derivatives with respect to x and derivatives
with respect to t.

2.1 Basic rules

Differentiation is a linear operator, so for $f(x) = g(x) + h(x)$,

$\frac{df(x)}{dx} = \frac{dg(x)}{dx} + \frac{dh(x)}{dx}$.    (2.6)

This can be shown simply by substituting $f(x) = g(x) + h(x)$ and $f(x + \Delta x) = g(x + \Delta x) + h(x + \Delta x)$
into Eq. (2.1) and separately grouping the terms for $g(\cdot)$ and $h(\cdot)$.

2.1.1 Common equations

For polynomials of the form $f(x) = ax^n$ the derivative is

$\frac{df(x)}{dx} = anx^{n-1}$.    (2.7)

Example 1:
$f(x) = 3x^2, \quad \frac{df(x)}{dx} = 6x$.    (2.8)

Example 2:
$f(x) = \frac{1}{x}, \quad \frac{df(x)}{dx} = -\frac{1}{x^2}$.    (2.9)

For exponentials $f(x) = e^x$, $\frac{df(x)}{dx} = e^x$.
For trigonometric functions there are several identities that are useful and you should
know these by heart:

$f(x) = \cos x, \quad \frac{df(x)}{dx} = -\sin x$,    (2.10)
$f(x) = \sin x, \quad \frac{df(x)}{dx} = \cos x$.    (2.11)

Notice the different signs in the two cases.

2.1.2 Chain rule and product rule

The chain rule gives a formula for the derivative of nested functions $f(g(x))$ as

$\frac{df(g(x))}{dx} = \frac{df(g)}{dg}\frac{dg(x)}{dx}$.    (2.12)

Example:
$f(x) = a^x, \quad a > 0$.    (2.13)

First define b such that $a = e^b$; then $f(x) = e^{bx}$, so we can define $g(x) = bx$ and
$f(g(x)) = e^{g(x)}$, then

$\frac{df(g(x))}{dx} = \frac{df(g)}{dg}\frac{dg(x)}{dx} = e^{g(x)} \cdot b = b\,e^{bx} = \log a \cdot e^{\log a \cdot x} = \log a \cdot a^x$.    (2.14)

The product rule gives a formula for the derivative of the product of functions $f(x) = u(x)v(x)$
as

$\frac{df(x)}{dx} = u(x)\frac{dv(x)}{dx} + v(x)\frac{du(x)}{dx}$.    (2.15)

The quotient rule can be derived from the chain rule and product rule; for $f(x) = \frac{u(x)}{v(x)}$,

$\frac{df(x)}{dx} = u(x)\frac{d}{dx}\frac{1}{v(x)} + \frac{1}{v(x)}\frac{du(x)}{dx}$
$\qquad = u(x)\frac{d}{dv}\left(\frac{1}{v(x)}\right)\frac{dv(x)}{dx} + \frac{1}{v(x)}\frac{du(x)}{dx}$
$\qquad = -u(x)\frac{1}{v(x)^2}\frac{dv(x)}{dx} + \frac{1}{v(x)}\frac{du(x)}{dx}$
$\qquad = \frac{v(x)\frac{du(x)}{dx} - u(x)\frac{dv(x)}{dx}}{v(x)^2}$.    (2.16)

Example:
Using the quotient rule and the trigonometric identities we can find the derivative of $\tan x$:

$\tan x = \frac{\sin x}{\cos x}, \quad \text{so} \quad u(x) = \sin x, \quad v(x) = \cos x$,
$\frac{du(x)}{dx} = \cos x, \quad \frac{dv(x)}{dx} = -\sin x$
$\implies \frac{d\tan x}{dx} = \frac{\sin^2 x + \cos^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x$.    (2.17)
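
If SymPy is available, these rules can be checked symbolically; a minimal sketch for the tan x result of Eq. (2.17) and the a^x result of Eq. (2.14):

    import sympy as sp

    x = sp.symbols('x')

    # Quotient rule result, Eq. (2.17): d/dx tan x = sec^2 x = 1/cos^2 x
    d_tan = sp.diff(sp.tan(x), x)
    assert sp.simplify(d_tan - 1 / sp.cos(x)**2) == 0

    # Chain rule result, Eq. (2.14): d/dx a^x = log(a) * a^x
    a = sp.symbols('a', positive=True)
    assert sp.simplify(sp.diff(a**x, x) - sp.log(a) * a**x) == 0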

2.1.3 Higher order derivatives

Chain and product rules also exist for higher order derivatives.
Example:
The chain rule for a second order derivative is obtained using the first order product and
chain rules:

$\frac{d^2 f(g(x))}{dx^2} = \frac{d}{dx}\left(\frac{df}{dg}\frac{dg}{dx}\right)$
$\qquad = \frac{df}{dg}\frac{d}{dx}\left(\frac{dg}{dx}\right) + \frac{dg}{dx}\frac{d}{dg}\left(\frac{df}{dg}\right)\frac{dg}{dx}$
$\qquad = \frac{df}{dg}\frac{d^2 g}{dx^2} + \frac{d^2 f}{dg^2}\left(\frac{dg}{dx}\right)^2$.    (2.18)

The product rule for higher order derivatives is obtained by repeatedly applying the first
order product rule.
Example:
The product rule for a second order derivative is

$\frac{d^2 u(x)v(x)}{dx^2} = \frac{d}{dx}\left(u(x)\frac{dv(x)}{dx} + v(x)\frac{du(x)}{dx}\right)$
$\qquad = u(x)\frac{d^2 v(x)}{dx^2} + 2\frac{du(x)}{dx}\frac{dv(x)}{dx} + v(x)\frac{d^2 u(x)}{dx^2}$.    (2.19)

In fact the product rule can be generalised to any higher derivative using the binomial
coefficient, i.e. $\binom{n}{i} = \frac{n!}{i!(n-i)!}$, to give

$\frac{d^n u(x)v(x)}{dx^n} = \sum_{k=0}^{n}\binom{n}{k}\frac{d^k u(x)}{dx^k}\frac{d^{n-k} v(x)}{dx^{n-k}}$.    (2.20)

2.1.4 Implicit differentiation

If we define $y = f(x)$ then implicit differentiation describes the technique whereby, rather
than calculating $\frac{dy}{dx}$ directly, it can be easier to calculate $\frac{dx}{dy}$ and then calculate
$\frac{dy}{dx} = \left(\frac{dx}{dy}\right)^{-1}$, also substituting in $y = f(x)$ to obtain the required result in terms of x.
Example 1:

$y = \log x, \quad \text{so} \quad x = e^y, \quad \frac{dx}{dy} = e^y$
$\implies \frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}$.    (2.21)

This result can be combined with the chain rule to give the derivative of the log of a function,
i.e.

$\frac{d\log f(x)}{dx} = \frac{1}{f(x)}\frac{df(x)}{dx}$.    (2.22)

Example 2:

$y = \arcsin x, \quad \text{so} \quad x = \sin y, \quad \frac{dx}{dy} = \cos y$
$\implies \frac{dy}{dx} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}$.    (2.23)

Example 3:

$y = \arctan x, \quad \text{so} \quad x = \tan y, \quad \frac{dx}{dy} = \frac{1}{\cos^2 y}$
$\implies \frac{dy}{dx} = \cos^2 y = \frac{\cos^2 y}{\cos^2 y + \sin^2 y} = \frac{1}{\tan^2 y + 1} = \frac{1}{x^2 + 1}$.    (2.24)
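
Again assuming SymPy is available, a quick symbolic confirmation of the inverse-function derivatives in Eqs. (2.23) and (2.24):

    import sympy as sp

    x = sp.symbols('x')

    # Eq. (2.23): d/dx arcsin(x) = 1/sqrt(1 - x^2)
    assert sp.simplify(sp.diff(sp.asin(x), x) - 1 / sp.sqrt(1 - x**2)) == 0

    # Eq. (2.24): d/dx arctan(x) = 1/(x^2 + 1)
    assert sp.simplify(sp.diff(sp.atan(x), x) - 1 / (x**2 + 1)) == 0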

2.2 Uses of differentiation

The calculation of derivatives has very many applications within financial mathematics
and applied mathematics as a whole and covering them all here is well beyond the scope
of this course. However, we run through a couple of the most important ones that you
need to be familiar with.

2.2.1 Maxima and minima of functions

One of the most important uses of derivatives is finding the local maxima and minima of
functions. This is explained in more detail in Section 3.2 of Chapter 3.

2.2.2 Taylor and Maclaurin series

An analytic function f(x) can be described by its Taylor series expansion around a point a:

$f(x) = f(a) + f'(a)(x-a) + \frac{1}{2!}f''(a)(x-a)^2 + \frac{1}{3!}f'''(a)(x-a)^3 + \frac{1}{4!}f^{(4)}(a)(x-a)^4 + \cdots$.    (2.25)

This can be written more compactly as

$f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(a)}{n!}(x-a)^n$,    (2.26)

where we use the usual convention that $0! = 1$.


Sketch proof of the Taylor series. Assume that the function can be represented by a
polynomial series around a, i.e.

$f(x) = c_0 + c_1(x-a) + c_2(x-a)^2 + c_3(x-a)^3 + c_4(x-a)^4 + \cdots$,    (2.27)

then by repeatedly differentiating and setting $x = a$ we can obtain the values of the
coefficients in terms of the derivatives of f(x) at a:

$f(a) = c_0$
$f'(x) = c_1 + 2c_2(x-a) + 3c_3(x-a)^2 + 4c_4(x-a)^3 + \cdots$
$f'(a) = c_1$
$f''(x) = 2c_2 + 2\cdot 3\,c_3(x-a) + 3\cdot 4\,c_4(x-a)^2 + \cdots$
$f''(a) = 2c_2$
$f'''(x) = 2\cdot 3\,c_3 + 2\cdot 3\cdot 4\,c_4(x-a) + \cdots$
$f'''(a) = 3!\,c_3$    (2.28)

and so on. A special case of the Taylor series is when $a = 0$; this is also referred to as a
Maclaurin series and is

$f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n$.    (2.29)

Example 1:
Exponential function: use the Maclaurin series to expand $e^x$ around 0.

$f(x) = e^x \implies f^{(n)}(x) = e^x \ \forall\, n = 1, 2, 3, \ldots$
$f^{(n)}(0) = 1 \quad \text{for } n = 0, 1, 2, 3, \ldots$
$f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$    (2.30)

Example 2:
Log function: use the Taylor series to expand $\log x$ around 1.

$f(x) = \log x, \quad f(1) = 0$,
$f'(x) = \frac{1}{x}, \quad f'(1) = 1$,
$f''(x) = \frac{-1}{x^2}, \quad f''(1) = -1$,
$f'''(x) = \frac{2}{x^3}, \quad f'''(1) = 2$,
$f^{(4)}(x) = \frac{-2\cdot 3}{x^4}, \quad f^{(4)}(1) = -3!$.    (2.31)

In general,

$f^{(n)}(1) = (-1)^{n-1}(n-1)! \quad \text{for } n = 1, 2, 3, \ldots$    (2.32)

So for the first 4 terms,

$\log x = 0 + 1\cdot(x-1) - \frac{1}{2}(x-1)^2 + \frac{2!}{3!}(x-1)^3 - \frac{3!}{4!}(x-1)^4 + \cdots$
$\qquad = (x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{3}(x-1)^3 - \frac{1}{4}(x-1)^4 + \cdots$    (2.33)

Example 3:
Log function: use the Maclaurin series to expand $\log(x + 1)$ around 0.

$f(x) = \log(x+1), \quad f(0) = 0$,
$f'(x) = \frac{1}{x+1}, \quad f'(0) = 1$,
$f''(x) = \frac{-1}{(x+1)^2}, \quad f''(0) = -1$,
$f'''(x) = \frac{2}{(x+1)^3}, \quad f'''(0) = 2$,
$f^{(4)}(x) = \frac{-2\cdot 3}{(x+1)^4}, \quad f^{(4)}(0) = -3!$.    (2.34)

So for the first 4 terms,

$\log(x+1) = x - \frac{1}{2}x^2 + \frac{2!}{3!}x^3 - \frac{3!}{4!}x^4 + \cdots$    (2.35)

Example 4:
Geometric series: use the Maclaurin series to expand $\frac{1}{1-x}$ around 0.

$f(x) = \frac{1}{1-x}, \quad f(0) = 1$,
$f'(x) = \frac{1}{(1-x)^2}, \quad f'(0) = 1$,
$f''(x) = \frac{2}{(1-x)^3}, \quad f''(0) = 2$,
$f'''(x) = \frac{3!}{(1-x)^4}, \quad f'''(0) = 3!$,
$f^{(4)}(x) = \frac{4!}{(1-x)^5}, \quad f^{(4)}(0) = 4!$.    (2.36)

so, as the coefficients are $f^{(n)}(0)/n! = 1$, we can write the geometric series as

$\frac{1}{1-x} = 1 + x + x^2 + x^3 + x^4 + \cdots = \sum_{n=0}^{\infty}x^n \quad \text{for } |x| < 1$.    (2.37)
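
The expansions in Examples 1 to 4 can be reproduced with SymPy's series function, assuming SymPy is available; a short sketch:

    import sympy as sp

    x = sp.symbols('x')

    print(sp.series(sp.exp(x), x, 0, 5))      # 1 + x + x**2/2 + x**3/6 + x**4/24 + O(x**5)
    print(sp.series(sp.log(x), x, 1, 4))      # expansion of log x around x = 1
    print(sp.series(sp.log(1 + x), x, 0, 5))  # x - x**2/2 + x**3/3 - x**4/4 + O(x**5)
    print(sp.series(1 / (1 - x), x, 0, 5))    # 1 + x + x**2 + x**3 + x**4 + O(x**5)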

2.2.3 Taylor series approximations

We can use the Taylor and Maclaurin series to approximate functions with a few
polynomial terms, and this also allows us to make predictions about the error of these
approximations.

Example:
Approximate $\sin x$ up to the fifth order term using the Taylor series.

$f(x) = \sin x, \quad f(0) = 0$
$f'(x) = \cos x, \quad f'(0) = 1$
$f''(x) = -\sin x, \quad f''(0) = 0$
$f'''(x) = -\cos x, \quad f'''(0) = -1$
$f^{(4)}(x) = \sin x, \quad f^{(4)}(0) = 0$
$f^{(5)}(x) = \cos x, \quad f^{(5)}(0) = 1$    (2.38)

So for small x, i.e. such that the value of $x^n$ decreases with n, we can approximate $\sin x$
by

$\sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}$.    (2.39)

The error of this approximation is $O(x^7)$. Note that although we cut the series off after
the 5th order term the error is not $O(x^6)$, as the even order terms in the Taylor series for
$\sin x$ are always 0. The notation $O(x^n)$ is called Landau notation; we discuss this in
further detail in Chapter 4, but in this context it means that as $x \to 0$ the error of the
approximation reduces at a rate of $x^n$.
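
We can verify the $O(x^7)$ claim numerically: halving x should reduce the truncation error by roughly $2^7 = 128$, and the ratio error/$x^7$ should approach $1/7! \approx 1.98\times 10^{-4}$. A minimal sketch:

    import math

    def sin_approx(x):
        # Fifth order Taylor approximation from Eq. (2.39)
        return x - x**3 / math.factorial(3) + x**5 / math.factorial(5)

    for x in [0.4, 0.2, 0.1, 0.05]:
        err = abs(math.sin(x) - sin_approx(x))
        print(f"x = {x:5.2f}  error = {err:.3e}  error/x^7 = {err / x**7:.3e}")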

We can look at the general first and second order approximations using the Taylor
series. In the following we replace $x - a$ with $\Delta x$ to give the first order approximation as

$f(x) \approx f(a) + f'(a)\Delta x$.    (2.40)

The error of the approximation is therefore

$\sum_{n=2}^{\infty}\frac{f^{(n)}(a)}{n!}\Delta x^n$,    (2.41)

which is of order $O(\Delta x^2)$.

The second order approximation is

$f(x) \approx f(a) + f'(a)\Delta x + \frac{f''(a)\Delta x^2}{2}$,    (2.42)

which has an error of order $O(\Delta x^3)$.

The error orders for the two approximations are an upper bound. For cases where we can
show that some of the terms in the Taylor series are always zero, such as in the example
above, we may be able to specify a better error bound.

2.2.4 L'Hôpital's rule

In the case where we have a quotient whose value is undefined, i.e. the numerator and
denominator are both zero or both infinite, we can use L'Hôpital's rule to determine the
limit. This states that where

$\left.\frac{f(x)}{g(x)}\right|_{x=c}$ is not defined, but $\left.\frac{f'(x)}{g'(x)}\right|_{x=c}$ is defined, then

$\lim_{x\to c}\frac{f(x)}{g(x)} = \left.\frac{f'(x)}{g'(x)}\right|_{x=c}$.    (2.43)

Example 1:
Evaluate $\frac{\sin x}{x}$ at $x = 0$.
At $x = 0$ both $\sin x = 0$ and $x = 0$, so the quotient is undefined. However $\frac{d\sin x}{dx} = \cos x$ and $\frac{dx}{dx} = 1$, so

$\lim_{x\to 0}\frac{\sin x}{x} = \left.\frac{\cos x}{1}\right|_{x=0} = \frac{1}{1} = 1$.    (2.44)

Example 2:
Evaluate $\frac{e^x}{x}$ as $x \to \infty$.
As $x \to \infty$ both $e^x \to \infty$ and $x \to \infty$, so the quotient is undefined. However $\frac{de^x}{dx} = e^x$ and $\frac{dx}{dx} = 1$, so

$\lim_{x\to\infty}\frac{e^x}{x} = \left.\frac{e^x}{1}\right|_{x\to\infty} = \infty$.    (2.45)
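
Both examples can be cross-checked with SymPy's limit function, assuming SymPy is available:

    import sympy as sp

    x = sp.symbols('x')
    print(sp.limit(sp.sin(x) / x, x, 0))      # 1
    print(sp.limit(sp.exp(x) / x, x, sp.oo))  # oo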

Chapter 3

Study of functions and curve sketching

When we wish to understand the behaviour of a function f(x), there are several different
aspects that we can examine, such as turning points, asymptotes, poles and convexity, to
gain more of an insight into its properties.

3.1 Roots of polynomials

The roots of an equation are the values of x for which f(x) = 0. For quadratic equations, i.e.
equations of the form

$f(x) = ax^2 + bx + c$,    (3.1)

we can use the standard formula for the roots of a quadratic equation in Eq. (1.9). However,
in the case of a polynomial of order greater than 2 we need to use other methods in order
to find the roots.

3.1.1 Polynomial division

For polynomial functions with rational roots we can use polynomial division (also known
as synthetic division). For example, if we wish to find the rational roots of

$f(x) = 2x^3 + 11x^2 + 17x + 6$    (3.2)

then we can use polynomial division. The first thing we must do is come up with an
ansatz for one root, i.e. a factor $(\alpha x + \gamma)$. We can say that $\alpha$ must be a factor of 2,
i.e. 1 or 2, and $\gamma$ must be a factor of 6, i.e. 1, 2, 3 or 6. We can then try dividing by this
and see if it produces a remainder. For our first ansatz we choose $2x + 3$. Dividing f(x)
by this candidate factor gives quotient $x^2 + 4x + \frac{5}{2}$ and the following steps:    (3.3)

    2x^3 + 11x^2 + 17x + 6  divided by  2x + 3
    -(2x^3 +  3x^2)
    ---------------
            8x^2 + 17x
          -(8x^2 + 12x)
          ------------
                  5x + 6
                -(5x + 15/2)
                ------------
                      -3/2

The final line of Eq. (3.3) shows $-\frac{3}{2}$, indicating that we have a remainder and that
therefore $2x + 3$ is not a factor of f(x). In contrast, if we attempt to divide f(x) by $2x + 1$
we obtain quotient $x^2 + 5x + 6$:    (3.4)

    2x^3 + 11x^2 + 17x + 6  divided by  2x + 1
    -(2x^3 +   x^2)
    ---------------
           10x^2 + 17x
         -(10x^2 +  5x)
         --------------
                 12x + 6
               -(12x + 6)
               ----------
                       0

where the final line of 0 means that we have no remainder; therefore $2x + 1$ is a factor of
f(x) and $x = -\frac{1}{2}$ is a root. Moreover, from Eq. (3.4) we can see that the other roots of
f(x) are the roots of $x^2 + 5x + 6$. Using Eq. (1.9) to find the roots of $x^2 + 5x + 6$ gives
the factorisation of f(x) as

$2x^3 + 11x^2 + 17x + 6 = (2x + 1)(x + 2)(x + 3)$    (3.5)

and therefore the roots are $x = -\frac{1}{2}, -2, -3$.
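
The factorisation in Eq. (3.5) can be cross-checked numerically, for example with NumPy's polynomial root finder (assuming NumPy is available):

    import numpy as np

    # Coefficients of 2x^3 + 11x^2 + 17x + 6, highest power first
    coeffs = [2, 11, 17, 6]
    print(np.roots(coeffs))   # approximately [-3. , -2. , -0.5]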

3.1.2 Iterative methods

The method in Section 3.1.1 assumed that at least one of the roots of the polynomial was
rational. This is not necessarily true of all, or even some, of the roots of a polynomial,
therefore we also describe some iterative methods to find the roots of functions.

Newton's method. Probably the most well known iterative method for finding the roots
of an equation is Newton's method. This requires knowledge of the derivative of f(x), f'(x),
and uses the iterative relationship

$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$,    (3.6)

until the value of $f(x_n)$ is closer to zero than some predefined tolerance. For example, to
find the roots of the function from Section 3.1.1, i.e.

$f(x) = 2x^3 + 11x^2 + 17x + 6$,    (3.7)

we first calculate

$f'(x) = 6x^2 + 22x + 17$,    (3.8)

to obtain

$x_{n+1} = x_n - \frac{2x_n^3 + 11x_n^2 + 17x_n + 6}{6x_n^2 + 22x_n + 17}$.    (3.9)

Starting with $x_0 = 0$ we have

$x_0 = 0$,
$x_1 = -0.3529$,
$x_2 = -0.4814$,
$x_3 = -0.4996$,
$x_4 = -0.5000$,    (3.10)

which gives us the root $-\frac{1}{2}$ in only 4 steps. However, this is only one root, so we must
choose other starting points to find the other two roots. For example,

$x_0 = -1.5$,
$x_1 = -2.1$,
$x_2 = -1.9949$,
$x_3 = -2.0000$,    (3.11)

and

$x_0 = -3.5$,
$x_1 = -3.1667$,
$x_2 = -3.0284$,
$x_3 = -3.0011$,
$x_4 = -3.0000$,    (3.12)

which gives us the three roots of the equation.

Secant method. Newton's method assumes that we can easily find the derivative of
the function. In cases where this is not easily done we can use the secant method,
where we replace the derivative with its finite difference approximation. This gives our
iterative procedure as

$x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$.    (3.13)

In order to use this we must first select two starting values. For the example in the previous
section, we select $x_0 = 0$ and $x_1 = -0.2$ and we have

$x_0 = 0$,
$x_1 = -0.2$,
$x_2 = -0.4032$,
$x_3 = -0.4766$,
$x_4 = -0.4978$,
$x_5 = -0.4999$,
$x_6 = -0.5000$.    (3.14)

In general the convergence of the secant method is slightly slower than Newton's method.
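
A minimal Python sketch of both iterations (the function names are our own, not from any library), reproducing the root x = -1/2 found above via Eqs. (3.6) and (3.13):

    def newton(f, fprime, x0, tol=1e-8, max_iter=50):
        """Newton's method, Eq. (3.6)."""
        x = x0
        for _ in range(max_iter):
            x = x - f(x) / fprime(x)
            if abs(f(x)) < tol:
                break
        return x

    def secant(f, x0, x1, tol=1e-8, max_iter=50):
        """Secant method, Eq. (3.13): derivative replaced by a finite difference."""
        for _ in range(max_iter):
            x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
            x0, x1 = x1, x2
            if abs(f(x1)) < tol:
                break
        return x1

    f = lambda x: 2*x**3 + 11*x**2 + 17*x + 6
    fp = lambda x: 6*x**2 + 22*x + 17

    print(newton(f, fp, 0.0))    # -> -0.5
    print(secant(f, 0.0, -0.2))  # -> -0.5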

3.2 Maxima and minima of functions

To find the local minima and maxima of a function, find the value(s) of x such that $\frac{df}{dx} = 0$
and then substitute this/these x value(s) into $\frac{d^2f}{dx^2}$ and determine the sign:

$\frac{d^2f}{dx^2} > 0$, minimum,    (3.15)
$\frac{d^2f}{dx^2} < 0$, maximum.    (3.16)

This is illustrated in Figure 3.1.

Figure 3.1: We plot f(x) = 3x^3 - 10x^2 + 4x - 1, f'(x) = 9x^2 - 20x + 4 and f''(x) = 18x - 20.
We can see that f'(x) = 0 occurs at both the local maximum and the local minimum of f(x),
with f''(x) negative at the local maximum and positive at the local minimum.
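
The classification in Eqs. (3.15) and (3.16) is easy to automate; a short SymPy sketch using the function from Figure 3.1 (our own helper code, not a library routine):

    import sympy as sp

    x = sp.symbols('x')
    f = 3*x**3 - 10*x**2 + 4*x - 1

    fp = sp.diff(f, x)        # 9x^2 - 20x + 4
    fpp = sp.diff(f, x, 2)    # 18x - 20

    for x0 in sp.solve(fp, x):   # stationary points, where f'(x) = 0
        kind = "minimum" if fpp.subs(x, x0) > 0 else "maximum"
        print(x0, kind)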

3.3 Inflection points, concavity and convexity

In addition to the two situations described in Eqs. (3.15) and (3.16), we can also consider
the situation where $\frac{d^2f}{dx^2} = 0$; this means that the function has an inflection point. If this
point coincides with a point where $\frac{df}{dx} = 0$ then the function is parallel with the x-axis
but does not change direction. This is illustrated in Figure 3.2.

Figure 3.2: We plot f(x) = x^3, f'(x) = 3x^2 and f''(x) = 6x. We can see that f'(x) = 0
occurs for the same value of x as f''(x) = 0 and, although the curve becomes parallel to
the x-axis at x = 0, it remains non-decreasing over the entire domain of x.

The inflection point, i.e. the value of x where f''(x) = 0, also has significance if we
consider the concepts of the convexity and concavity of functions. A convex function is
one where for all $\alpha \in (0, 1)$

$f(\alpha x_1 + (1-\alpha)x_2) \le \alpha f(x_1) + (1-\alpha)f(x_2)$.    (3.17)

That is, the function of a weighted average of two points is less than or equal to the same
weighted average of the function at those two points. Conversely, a concave function is one
where for all $\alpha \in (0, 1)$

$f(\alpha x_1 + (1-\alpha)x_2) \ge \alpha f(x_1) + (1-\alpha)f(x_2)$.    (3.18)

Functions can also be strictly convex or concave, in which case $\le$ and $\ge$ in Eqs. (3.17) and
(3.18) are replaced by their respective strict inequalities. Convex and concave functions
are illustrated in Figure 3.3.
Much of mathematical finance concerns calculating the maximum of a function (e.g.
return) or the minimum of a function (e.g. risk). Therefore, as convex functions only have
a single global minimum (i.e. there are no other local minima) and concave functions only have a
single global maximum, it is important to be able to show whether functions are concave
or convex if we are carrying out optimisation. This is related to the second derivative
f''(x): if f''(x) $\ge$ 0 $\forall x$ the function is convex and if f''(x) $\le$ 0 $\forall x$ the function is
concave.

Figure 3.3: Convex and concave functions. The left hand plot shows the convex function
f(x) = e^x and $\alpha f(x_1) + (1-\alpha)f(x_2)$ for $x_1 = -1$, $x_2 = 2$ and $\alpha$ varying from 0 to 1. The
right hand plot shows the concave function f(x) = 2 - x^2 and $\alpha f(x_1) + (1-\alpha)f(x_2)$ for
$x_1 = -\sqrt{2}$, $x_2 = 1$ and $\alpha$ varying from 0 to 1. Notice that the straight line lies above f(x)
in the case of the convex function and below f(x) in the case of the concave function.

Functions may also have convex and concave regions; the point where a function moves
from a convex to a concave region is the inflection point, i.e. where f''(x) = 0. This is
illustrated in Figure 3.4, where we show how $\alpha f(x_1) + (1-\alpha)f(x_2)$ is above or below f(x)
depending on where we are relative to the inflection point.
Figure 3.4: This figure shows f(x) = 3x^3 - 10x^2 + 4x + 4 and $\alpha f(x_1) + (1-\alpha)f(x_2)$
for values of x above and below the inflection point, plotted against the left hand y-axis.
Notice how the function is concave for values of x below the inflection point and convex
for values of x above the inflection point. The second order derivative f''(x) = 18x - 20 is
plotted against the right hand y-axis and shows that the inflection point corresponds
to the value of x where f''(x) = 0.

3.4 Asymptotes and poles

An asymptote is a line or a curve which approaches a given function arbitrarily
closely. We will look at horizontal, vertical and oblique asymptotes separately.

3.4.1 Horizontal asymptotes

Probably the situation where you have most frequently come across asymptotes is in the
context of horizontal asymptotes, i.e. where a function approaches a constant as $x \to +\infty$
or $x \to -\infty$. A function can have at most two horizontal asymptotes and can approach
them from above or below. Figure 3.5 shows an example for $f(x) = \frac{2x^2 + 3}{x^2 + 1}$; we can
see that the horizontal asymptote for both $x \to -\infty$ and $x \to +\infty$ is 2 and the function
approaches this value from above.

Figure 3.5: The function $f(x) = \frac{2x^2 + 3}{x^2 + 1}$ and its horizontal asymptote of $f(x) \to 2$ as
$x \to \pm\infty$.

There are several straightforward rules governing the value and behaviour of horizontal
asymptotes:

Rational functions. If f(x) is of the form of a ratio between two polynomials of the
form

$f(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}$    (3.19)

then if $p \le q$ there is a horizontal asymptote as $x \to \pm\infty$. If $p = q$ then the asymptote is
$\frac{a}{d}$, and if $p < q$ the asymptote is equal to zero. The $p = q$ case is illustrated in Figure 3.5
above, where the asymptote is 2.

Exponential functions. If f(x) is of the form $\alpha^{g(x)}$, where g(x) is a rational function,
then it is straightforward to determine whether one or two horizontal asymptotes exist
(for concreteness take $\alpha > 1$, e.g. $\alpha = e$ as in Figure 3.6). If the
rational function $g(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}$ is such that $p < q$ then there are two asymptotes,
as $x \to \infty$ and as $x \to -\infty$, and these are both equal to 1. If $p > q$ then asymptotes may exist
depending on the values of p, q, a and d. For odd values of $p - q$ there is one asymptote,
as $x \to -\infty$ if $\frac{a}{d} > 0$ or as $x \to \infty$ if $\frac{a}{d} < 0$; these are both equal to zero.
For even values of $p - q$, asymptotes only exist if $\frac{a}{d} < 0$, in which case there are two
asymptotes as $x \to \pm\infty$ and these are both equal to 0. For the final case of $p = q$ there
are two asymptotes as $x \to \pm\infty$ and these are both equal to $\alpha^{a/d}$. Examples of these cases
are illustrated in Figure 3.6. Notice that the left hand plot also has a black vertical line
at x = 0; this denotes the existence of a vertical asymptote, which are covered in Section
3.4.2.
Figure 3.6: Asymptotes for functions of the form $f(x) = e^{\frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}}$; the left hand
figure shows an example of the behaviour when $p < q$, the middle figure when $p > q$ and
the right hand figure when $p = q$.

3.4.2 Vertical asymptotes

Vertical asymptotes (also often referred to as poles) of a function are defined as a vertical
line $x = x_0$ where at least one of the following is true:

$\lim_{x\to x_0^+} f(x) = \pm\infty$,    (3.20)
$\lim_{x\to x_0^-} f(x) = \pm\infty$.    (3.21)

Here $x \to x_0^-$ indicates that the value of x is approaching the constant $x_0$ from the left and
$x \to x_0^+$ indicates that it is approaching it from the right. Unlike horizontal asymptotes,
where there can be at most two, there can be an infinite number of vertical asymptotes.
In general, a function has a vertical asymptote when it can be expressed as a fraction and
the denominator is equal to zero.

Rational functions. If f(x) is of the form of a ratio between two polynomials of the
form

$f(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}$    (3.23)

then, in general, vertical asymptotes exist at values of x where the denominator is zero,
i.e. at the roots of $dx^q + ex^{q-1} + \cdots + f$ (we will cover the exceptions to this rule below).
For example, for

$f(x) = \frac{x^2 - 2x - 3}{x^3 - x^2 - 4x + 4}$    (3.24)

we can factorise the denominator into $x^3 - x^2 - 4x + 4 = (x-1)(x+2)(x-2)$, which
suggests we should have vertical asymptotes at x = -2, 1, 2. This is shown in Figure 3.7;
notice that the function tends to a different infinity depending on which direction (left or right)
the asymptote is approached from, e.g.

$\lim_{x\to -2^+} f(x) = +\infty$,    (3.25)
$\lim_{x\to -2^-} f(x) = -\infty$.    (3.26)

Figure 3.7: The function $f(x) = \frac{x^2 - 2x - 3}{x^3 - x^2 - 4x + 4}$ and its vertical asymptotes (marked in red)
of x = -2, x = 1 and x = 2.
of x = −2, x = 1 and x = 2.

We explore the effect of the direction of approach and the number of asymptotes in more
detail using the example $f(x) = \frac{1}{x^3 - 3x + 2}$. The denominator is a third order polynomial
so we would expect the function to have at most 3 vertical asymptotes. We have plotted
f(x) and the denominator in Figure 3.8 and we can notice several things. Firstly, there
are only two asymptotes, at x = -2 and x = 1, rather than the three we might expect. Secondly,
for the asymptote at x = 1 we notice that f(x) tends towards $+\infty$ regardless of the
direction of approach; conversely, we can observe that there is a change of sign either side
of the asymptote at x = -2. These observations can be linked back to the way that the denominator
separates into its factors. In this case $x^3 - 3x + 2 = (x - 1)^2(x + 2)$, i.e. we have a
double root at x = 1, which is the reason why we only have two asymptotes rather than
three. Moreover, we can also see from Figure 3.8 that the double root coincides with a
minimum of the denominator and therefore the sign of the denominator is unchanged either side of
x = 1. As the sign of the denominator does not change, the sign of f(x) does not change
either side of the asymptote and f(x) tends towards $+\infty$ regardless of whether we
approach from the left or right. Contrast this with the behaviour of the denominator close
to the other asymptote at x = -2. On the left hand side of x = -2 the denominator is
less than zero; on the right hand side it is greater than zero; therefore the sign of f(x) is
different on the left hand and right hand sides of the asymptote.

Figure 3.8: Vertical asymptotes of $f(x) = \frac{1}{x^3 - 3x + 2}$, where $x^3 - 3x + 2 = (x - 1)^2(x + 2)$; the left hand figure
shows the behaviour of f(x) close to its asymptotes at x = -2 and x = 1 and the right
hand figure shows the behaviour of the denominator around its roots.

We saw that functions can have fewer asymptotes than the polynomial order of their
denominators in the case of a double root. However, there is another situation where we
may have fewer asymptotes than we might expect from looking at the order of the denominator
alone. For example, consider

$f(x) = \frac{x^2 + 2x - 3}{x^3 - x^2 - 4x + 4}$,    (3.28)

which is plotted in Figure 3.9. Looking at the order of the denominator we might expect
there to be three vertical asymptotes, at x = -2, x = 1 and x = 2, because $x^3 - x^2 - 4x + 4 =
(x-1)(x+2)(x-2)$. However, we can see from Figure 3.9 that there are only two asymptotes,
at $x = \pm 2$. We can understand this by finding the roots of the numerator as well as the
denominator, so that $f(x) = \frac{(x-1)(x+3)}{(x-1)(x+2)(x-2)} = \frac{x+3}{(x+2)(x-2)}$, i.e. the root x = 1 of the denominator
is also a root of the numerator, so the factors cancel out, leaving only two vertical asymptotes.

Trigonometric functions. The functions sin x and cos x have periodic zeros at $k\pi$ and
$\pi/2 + k\pi$ respectively, for integer k. Therefore, any function which divides by one of these
will have vertical asymptotes anywhere where the denominator is zero (and not cancelled

Figure 3.9: The function $f(x) = \frac{x^2 + 2x - 3}{x^3 - x^2 - 4x + 4}$ and its vertical asymptotes (marked in red)
of $x = \pm 2$.

out by the numerator), as before. One example of this is

$f(x) = \tan x = \frac{\sin x}{\cos x}$,    (3.29)

which is plotted in Figure 3.10. It can be seen that there are regularly spaced asymptotes
at $x = \pi/2 + k\pi$, where k is an integer. These coincide with the values of x where cos x = 0.
Note also that as there are a countably infinite number of values of x where cos x = 0, tan x
has a countably infinite number of vertical asymptotes.

Figure 3.10: The function f(x) = tan x; notice its vertical asymptotes at $x = \pi/2 + k\pi$,
where k is an integer.

Log functions. It is well known that for a given function f(x), the plot of $y = f^{-1}(x)$
is a mirror image of y = f(x) about the line y = x. Therefore, if f(x) is an injective
(i.e. one to one) function with a horizontal asymptote, then $f^{-1}(x)$ has a vertical
asymptote. An example of this is $f(x) = e^x$, which has a single horizontal asymptote of
0 as $x \to -\infty$. Therefore $g(x) = f^{-1}(x) = \log x$ has a vertical asymptote at x = 0, where
$g(x) \to -\infty$. This is illustrated in Figure 3.11.

Figure 3.11: The functions $f(x) = e^x$ and $f(x) = \log x$; notice the way that the two
functions are mirrored around f(x) = x and that the horizontal asymptote of $f(x) = e^x$
corresponds to the vertical asymptote of $f(x) = \log x$.

3.4.3 Oblique asymptotes

For rational functions of the form $f(x) = \frac{ax^p + bx^{p-1} + \cdots + c}{dx^q + ex^{q-1} + \cdots + f}$ we have seen that horizontal
asymptotes occur when $p \le q$. However, in the case that $p = q + 1$ we obtain something
known as an oblique asymptote, where the function approaches a straight line which is not
parallel with the x or y axes. For example,

$f(x) = \frac{2x^2 + 3x + 1}{x + 5}$,    (3.30)

where p = 2 and q = 1. This function is displayed in Figure 3.12 along with a diagonal
line which is an asymptote for f(x) as $x \to \pm\infty$; there is also a vertical asymptote at x = -5.

Figure 3.12: The function $f(x) = \frac{2x^2 + 3x + 1}{x + 5}$. Notice that we have a vertical asymptote at
x = -5 and an oblique asymptote of 2x - 7 as $x \to \pm\infty$.

In order to find the asymptote we divide the numerator by the denominator, which gives
quotient 2x - 7 and remainder 36:    (3.31)

    2x^2 + 3x + 1  divided by  x + 5
    -(2x^2 + 10x)
    -------------
           -7x +  1
         -(-7x - 35)
         -----------
                 36

so $f(x) = 2x - 7 + \frac{36}{x + 5}$ and the oblique asymptote is 2x - 7.

3.5 Jumps: left and right continuity and holes

So far we have solely looked at continuous functions. However, we may also be required to
analyse functions with jumps, particularly the CDFs of discrete random variables. A simple
example of a function with a jump is

$f(x) = \begin{cases} 1 & x < 1 \\ 2 & 1 \le x, \end{cases}$    (3.32)

and this is plotted in Figure 3.13. It is particularly important to understand the relationship
between the use of $\le$ and $<$ and the influence this has on the behaviour of the
function. The function described in Eq. (3.32) and plotted in Figure 3.13 is described
as càdlàg, which is short for "continue à droite, limite à gauche". This means that the
function is continuous to the right and limited to the left, i.e. from every value of x you
can move some distance to the right with the function remaining continuous, whereas there
are values of x from which you cannot move to the left without encountering a jump. For example, say
you are at x = 0.9; then f(x) = 1, and you can also move right to x = 0.99, for
example, and f(x) remains equal to 1. Furthermore, you can increase the value of x to
0.999, 0.9999 and 0.99999, etc., and the function remains continuous. In contrast, if we
are at x = 1 then we cannot reduce the value of x at all without encountering the jump.
Notice the way that f(x) is plotted with respect to the continuous and limited sides of
the discontinuity: on the continuous side the attained value is marked with a closed circle
and on the limited side the unattained limit is marked with an open circle. Note that the
opposite situation is described as càglàd, short for "continue à gauche, limite à droite".

Figure 3.13: A càdlàg function with a jump at x = 1. Notice that the continuous side of
the function is denoted with a closed circle and the limited side of the function is denoted
with an open circle.

3.5.1 Cumulative distribution functions

For continuous random variables, the cumulative distribution function (CDF) is continuous.
However, for discrete random variables the CDF has jumps; in this section we see how
the definition of the CDF means that it is always càdlàg for discrete random variables.
The definition of the CDF for a random variable X is

$F(x) = P(X \le x)$;    (3.33)

notice the use of $\le$ in the above equation, as this is significant for the right continuity of
the function. In order to see that the definition in Eq. (3.33) automatically leads to a
càdlàg function we use the example of a fair 6-sided dice. The probability
mass function is $P(X = x) = \frac{1}{6}$ for $x \in \{1, 2, 3, 4, 5, 6\}$. As before, in order to understand
the continuity of the function we can consider what happens when we are either side of
a discontinuity. For example, let x = 1.9; then $P(X \le 1.9) = \frac{1}{6}$ and we can increase the
value of x to 1.99, 1.999, 1.9999 and the CDF remains equal to $\frac{1}{6}$. However, if we
are at x = 2 then $P(X \le 2) = \frac{2}{6}$ and we cannot reduce our value of x without the CDF
changing to a value of $\frac{1}{6}$.
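
A small sketch of this CDF makes the right continuity concrete (the helper F below is our own construction, not a library function):

    import math

    def F(x):
        """CDF of a fair 6-sided dice: F(x) = P(X <= x)."""
        return min(max(math.floor(x), 0), 6) / 6

    # Right continuous at the jump: stepping right from x = 2 stays at 2/6
    print(F(2.0), F(2.001), F(2.0001))   # all 2/6

    # but any step to the left of x = 2 drops the value back to 1/6
    print(F(1.9999), F(1.99))            # both 1/6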

3.6 Curve sketching

Using the techniques in this chapter we can make sketch graphs of functions in order to
visualise their characteristics. We should take the following steps:

1. Find the value where the function crosses the y-axis, i.e. set x = 0 and evaluate.

2. Find the points where the function crosses the x-axis. For monotonic functions with
an inverse $f^{-1}(x)$ we can use $f^{-1}(0)$. For example, if $f(x) = \log x$ then we can find
the point where it crosses the x-axis with $f^{-1}(0) = e^0 = 1$. For quadratic equations we
use Eq. (1.9), and for higher order polynomials the direct and iterative techniques
described in Section 3.1 can be employed. For more complex functions we can also
use the iterative methods described in Section 3.1.2.

3. Using the methods described in Section 3.2, find the turning points of the function
and identify whether they are minima or maxima.

4. Identify the horizontal, vertical and oblique asymptotes and from which direction
the function approaches them.

5. Identify any discontinuities and whether the function is càdlàg or càglàd.

Note that these are the things that must be considered; they do not necessarily all
exist for all functions.
Example 1: Sketch the graph of

$f(x) = \frac{x}{1 - x}$.    (3.34)

Following the steps above:

1. $f(0) = \frac{0}{1-0} = 0$, so the function crosses the y-axis at x = 0, y = 0.

2. To find the points where a rational function crosses the x-axis we only need examine
the numerator, in this case x, which clearly has a single root of x = 0.

3. Differentiating the equation using the quotient rule gives

$f'(x) = \frac{1}{(1 - x)^2}$.    (3.35)

This is never zero for finite x and so there are no turning points.

4. As $x \to \pm\infty$, $f(x) \to \frac{-x}{x} \cdot (-1)^{-1} \to -1$, i.e. $f(x) \to -1$, so a horizontal asymptote of f(x) = -1 exists for
both $-\infty$ and $+\infty$. We can see from Eq. (3.35) that the gradient is positive for all
values of x, so the function approaches the asymptote from above as $x \to -\infty$ and
from below as $x \to +\infty$. The denominator of f(x) is zero at x = 1 and so we have a
vertical asymptote there. Again, as the gradient is always positive, we can see that
$f(x) \to +\infty$ as $x \to 1^-$ and $f(x) \to -\infty$ as $x \to 1^+$.

A plot of the function is shown on the left hand side of Figure 3.14.
Example 2: Sketch the graph of

$f(x) = x^3 - 9x^2 + 23x - 15$.    (3.36)

Following the steps above:

1. f(0) = -15, so the function crosses the y-axis at x = 0, y = -15.

2. Using the polynomial division method described in Section 3.1.1 we can produce
guesses for roots at $x = \pm 1, \pm 3, \pm 5, \pm 15$. If we pick x = 3 we can attempt to divide f(x)
by (x - 3), which gives quotient $x^2 - 6x + 5$:    (3.37)

    x^3 - 9x^2 + 23x - 15  divided by  x - 3
    -(x^3 - 3x^2)
    -------------
          -6x^2 + 23x
        -(-6x^2 + 18x)
        --------------
                 5x - 15
               -(5x - 15)
               ----------
                       0

This divides perfectly with no remainder, and we can use Eq. (1.9) to factorise
$x^2 - 6x + 5 = (x - 1)(x - 5)$, so the three roots of f(x) are 1, 3 and 5. Thus f(x) = 0 at
x = 1, x = 3 and x = 5.

3. Differentiating f(x) gives

$f'(x) = 3x^2 - 18x + 23$.    (3.38)

Solving f'(x) = 0 using Eq. (1.9) gives turning points of $x = 3 \pm \frac{2}{\sqrt{3}}$. Differentiating
again we have

$f''(x) = 6x - 18$.    (3.39)

Thus for $x = 3 + \frac{2}{\sqrt{3}}$, f''(x) > 0 and this point is a minimum, and for $x = 3 - \frac{2}{\sqrt{3}}$,
f''(x) < 0 and so this point is a maximum. The values of f(x) at the turning points
are $f(3 + \frac{2}{\sqrt{3}}) = -3.0792$ and $f(3 - \frac{2}{\sqrt{3}}) = 3.0792$.

4. As $x \to \pm\infty$, $f(x) \to \pm\infty$, so there are no horizontal asymptotes, and as there are no
finite values of x where $f(x) \to \pm\infty$, there are also no vertical asymptotes.

A plot of the function is shown on the right hand side of Figure 3.14.
Figure 3.14: Plots of the functions from the curve sketching examples 1 and 2 above.
The left hand side plot shows f(x) = x/(1 - x) and the right hand plot shows f(x) =
x^3 - 9x^2 + 23x - 15.

Chapter 4

Inequalities and Landau notation

So far, when solving equations f(x) = 0 for x we have only considered equalities; for
example $x^2 - 1 = 0$ gives the solutions x = -1, 1. However, we may also come
across expressions written as inequalities, e.g.

$x^2 + 3x - 2 > 0$ or    (4.1)
$x^{-1} \le e^{-x}$,    (4.2)

where the solution is the value, or range of values, of x that satisfies the inequality. The
inequality is written as $<$ or $>$ for strict inequalities, where the two sides of the
inequality cannot be equal, and $\le$ or $\ge$ for "less than or equal" and "greater than or equal"
respectively.

4.1 Basic rules of inequalities

In the same way that we can solve equations by performing the same operations on
both the left hand and right hand sides of the equation (e.g. addition, subtraction, etc.), we can do
this with inequalities. However, care must be taken as some operations may change
the direction of the inequality.

Operations preserving the direction of the inequality. There are several operations
that can be performed on both sides of an inequality which preserve its direction.

• Addition and subtraction, e.g.:

  – If $a \ge b$ and $c \in \mathbb{R}$ then $a + c \ge b + c$

  – If $a < b$ and $c \in \mathbb{R}$ then $a - c < b - c$

• Multiplication and division by a positive constant, e.g.:

  – If $a > b$ and $c > 0$ then $ac > bc$

  – If $a \le b$ and $c > 0$ then $\frac{a}{c} \le \frac{b}{c}$

• Application of an increasing monotonic function to both sides of an inequality preserves
the direction of the inequality but may change it from a strict inequality to a
non-strict one depending on whether the function is strictly monotonic or not. For
example:

  – If $a > b$ and f(x) is strictly monotonically increasing then $f(a) > f(b)$

  – If $a > b$ and f(x) is monotonically increasing but not strictly so then $f(a) \ge f(b)$

  – If $a \le b$ and f(x) is strictly monotonically increasing then $f(a) \le f(b)$

  – If $a \le b$ and f(x) is monotonically increasing but not strictly so then $f(a) \le f(b)$

Note that when we are applying a function to both sides of an inequality, we must be
careful that the domain of x for which f(x) is monotonic at least covers the domain
of a and b. For example, if $a, b > 0$ and $a < b$ then $a^2 < b^2$. However, if $a, b \in \mathbb{R}$
then there are values of a and b for which $a < b$ but $a^2 \nless b^2$. We can generalise this
as follows:

  – For positive a and b, raising both sides of the inequality to a positive
power n preserves both strict and non-strict inequalities. That is, if $a < b$ and
$a, b, n > 0$ then $a^n < b^n$.

  – There is also a special case when n is a positive, odd integer, where the inequality
is preserved for all $a, b \in \mathbb{R}$.

Operations switching the direction of the inequality. There are several operations
that switch the direction of the inequality.

• Multiplication and division by a negative constant, e.g.:

  – If $a > b$ and $c > 0$ then $-ac < -bc$

  – If $a \le b$ and $c > 0$ then $-\frac{a}{c} \ge -\frac{b}{c}$

• Application of a decreasing monotonic function to both sides of an inequality
switches the direction of the inequality but may change it from a strict inequality
to a non-strict one depending on whether the function is strictly monotonic or
not. For example:

  – If $a > b$ and f(x) is strictly monotonically decreasing then $f(a) < f(b)$

  – If $a > b$ and f(x) is monotonically decreasing but not strictly so then $f(a) \le f(b)$

  – If $a \le b$ and f(x) is strictly monotonically decreasing then $f(a) \ge f(b)$

  – If $a \le b$ and f(x) is monotonically decreasing but not strictly so then $f(a) \ge f(b)$

Again, we must be careful that the domain of monotonicity covers the domain
of the expressions on either side of the inequality, and for negative powers we have:

  – For positive a and b, raising both sides of the inequality to a negative power
$-n$ switches both strict and non-strict inequalities. For example, if $a < b$ and
$a, b, n > 0$ then $a^{-n} > b^{-n}$.

  – There is also a special case where $-n$ is a negative, odd integer, where the
inequality is switched for $a, b \in \mathbb{R}$, provided a and b are non-zero and
have the same sign.

In order to better understand these concepts in practice we can look at some examples.
Example 1: Raising both sides to a positive power. Say $a = 2$, $b = 4$, $n = 2$; then
$a = 2 < 4 = b$ and $a^2 = 4 < 16 = b^2$, so the strict inequality is preserved. However,
what if we move a and b out of the range where $a^n$ and $b^n$ are strictly monotonically
increasing functions, for example $a = -2$, $b = -4$, $n = 2$? Then $b = -4 < -2 = a$ but
$b^2 = 16 > 4 = a^2$, so the direction of the inequality is not preserved.

Example 2: Raising both sides to a negative power. Say that again $a = 2$, $b = 4$, $n = 2$;
then $a = 2 < 4 = b$ and $a^{-2} = \frac{1}{4} > \frac{1}{16} = b^{-2}$, so the strict inequality switches direction.

Example 3: Logarithmic functions. Logarithms, e.g. log(x), are strictly monotonically
increasing for positive values of x. The stipulation that values must be greater than 0 is
not necessarily a limitation when we are working with financial data, as there are many
quantities that will not go below zero, for example asset prices, exchange rates, etc. So if
$x \ge y > 0$ then $\log(x) \ge \log(y)$.

Example 4: Exponentials. Exponential functions, e.g. $e^x$, are strictly monotonically
increasing for all $x \in \mathbb{R}$, so if $x < y$ then $e^x < e^y$.

4.2 Solving inequalities

We have looked at solving equations f(x) = 0 for x; however, we can also solve inequalities,
usually to produce a range of values.

For example, how would we solve $x^2 + 3x < -2$ for x? First move everything to one
side of the inequality to give

$x^2 + 3x + 2 < 0$.    (4.3)

We can then factorise the expression using Eq. (1.9) to find the roots of the equation on
the left hand side of the inequality, i.e.

$(x + 1)(x + 2) < 0$.    (4.4)

Having factorised the expression we know that $x^2 + 3x + 2$ crosses the x-axis at x = -2 and
x = -1, so these provide the boundary between the range of x values where the inequality
holds and the range where it does not hold. However, we also need to see on which side of
these values the range of x which solves the inequality lies; there are several ways of doing
this:

• By inspection 1: we can see that in order for (x + 1)(x + 2) to be negative, exactly one
of (x + 1) and (x + 2) must be negative. The only values of x which
meet this condition are between -2 and -1 and therefore $x \in (-2, -1)$ is the solution
to the inequality. Note that the solution is an open rather than a closed set. This
is because the inequality is strict. If the inequality were non-strict then the solution
would be a closed set.

• By inspection 2: we see that as $x \to \pm\infty$, $x^2 + 3x + 2 \to \infty$, and therefore, as the
function only crosses the x-axis twice, the expression must be less than zero between
its two roots and therefore $x \in (-2, -1)$.

• For more complicated expressions you can differentiate and find the direction of the
function (i.e. positive or negative gradient) at the roots, which will tell you whether
the function is going from positive to negative or vice versa.

• For very complicated expressions it can be useful to sketch the curve to understand
the values of x where you are above and below the x-axis.
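
SymPy can solve such inequalities directly, which makes a useful cross-check (assuming SymPy is available):

    import sympy as sp

    x = sp.symbols('x', real=True)
    solution = sp.solve_univariate_inequality(x**2 + 3*x + 2 < 0, x)
    print(solution)   # (-2 < x) & (x < -1), i.e. the open interval (-2, -1)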

4.3 Landau notation

We saw Landau notation, which is also known as "big O" notation, in Chapter 2, Section
2.2.3, with reference to the error seen in truncated Taylor series.
More formally, it is a way of describing the behaviour of a function f(x) as x approaches
some value. Very often we are concerned with the behaviour of the function as $x \to \infty$,
but also, as we saw in the work on Taylor series, we may wish to examine the behaviour
as $x \to 0$. In fact we can look at the behaviour in the limit $x \to a$, where a can be any
value, but 0 and $\infty$ are the most commonly used.

So, for a real or complex valued function f(x) and a real positive valued function
g(x), we say

$f(x) = O\big(g(x)\big)$ as $x \to a$    (4.5)

if and only if there exist positive numbers c and $\delta$ such that

$|f(x)| \le c\,g(x)$ for all x with $|x - a| < \delta$.    (4.6)

(For $a = \infty$ the condition becomes $|f(x)| \le c\,g(x)$ for all sufficiently large x.) In general
we choose the tightest bound. For example, if $f(x) = O(x^3)$ as $x \to 0$ then, by definition,
we can find a constant c such that $|f(x)| \le cx^3$ as $x \to 0$; it also stands to
reason that we can find a constant $c_2$ such that $|f(x)| \le c_2 x^2$ as $x \to 0$, but we would
write $f(x) = O(x^3)$ as this is the tighter bound.

Example 1: The Taylor series for cos x is

$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty}(-1)^n\frac{x^{2n}}{(2n)!}$.    (4.7)

If we approximate this for small x by the three terms

$\cos x \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!}$,    (4.8)

we can see this has an error $O(x^6)$ as $x \to 0$, because as x becomes very small the first
truncated term ($\frac{x^6}{6!}$) will become the most significant term in the error.
Example 2: How does the function

$f(x) = 3x^3 + 4x^2 + 5x + 2$    (4.9)

behave as $x \to \infty$? We can see that for $x \ge 1$, $3x^3 + 4x^2 + 5x + 2 \le 14x^3$, therefore we can
say that

$f(x) = O(x^3)$ as $x \to \infty$.    (4.10)

Notice that we do not need to state the constant c = 14 anywhere in the Landau
notation; it is enough that a finite constant exists in order to use Landau notation.
Landau notation is also used to define the computational complexity of algorithms.
If an algorithm with n input variables is defined as having a complexity of $O(n^2)$, this
means that as the number of input variables increases, the computational time increases
at worst with the square of the number of input variables.

4.3.1 Log graphs

When we are measuring output errors or computational times and we wish to see how
these are bounded, it can be useful to make use of log graphs, where both axes have a log
scale, and semi-log graphs, where only one of the x or y axes has a log scale. These can
indicate trends more clearly than graphs with the usual linear scale. The disadvantage is
that you cannot plot negative numbers on a log scale. However, if we are assessing the
trend of functions for the purpose of Landau notation, then we can see from Eq. (4.6)
that we would wish to plot |f(x)|, and so the only value that could not be plotted on a log
axis is f(x) = 0 and/or x = 0, depending on the type of log plot used.
When we plot a power function of the form $x^n$ on a log-log graph this gives a straight
line with the gradient depending on the value of n; see Figure 4.1 for examples of both
positive and negative values of n.

Figure 4.1: Log-log graphs showing power functions of the form $x^n$, where n is positive for
the left hand plot and negative for the right hand plot.

When we plot an exponential of the form ab^x, where a is a positive constant and b ∈ R,
on a graph with a linear x-axis and a logarithmic y-axis, we obtain a straight line, with b
determining the gradient and a the intercept; see Figure 4.2.
When we plot a logarithmic function of the form log(ax^b), where a is positive and b ∈ R,
on a graph with a logarithmic x-axis and a linear y-axis, we obtain a straight line where
a determines the intercept with the y-axis and b determines the gradient.

40
Figure 4.2: Semi-log graph (logarithmic y-axis) showing exponential functions of the
form e^{bx}: f(x) = e^x and f(x) = e^{2x}.

Figure 4.3: Semi-log graph (logarithmic x-axis) showing logarithmic functions of the
form log(ax^b): f(x) = log x, f(x) = log 2x and f(x) = log x^2.
4.3.2 Ordering functions by bound

Often we are interested in how functions behave relative to each other as x approaches a
limit, especially as x → ∞. If we have the following functions on at least a semi-infinite
domain D with bounds:

f1(x) = O(x^{−1})  as x → ∞    (4.11)
f2(x) = O(x^{−2})  as x → ∞    (4.12)
f3(x) = O(e^{−x})  as x → ∞    (4.13)
f4(x) = O(e^{−2x})  as x → ∞    (4.14)
f5(x) = O(e^{−x^2})  as x → ∞    (4.15)
f6(x) = O(e^{−x^3})  as x → ∞    (4.16)
f7(x) = O(e^{−e^x})  as x → ∞    (4.17)

then we can find positive constants c1 , c2 , ... so that

|f1 (x)| > c1 |f2 (x)| > c2 |f3 (x)| > c3 |f4 (x)| > c4 |f5 (x)| > c5 |f6 (x)| > c6 |f7 (x)| as x → ∞.
(4.18)
Furthermore, if the functions have a finite bound over their entire domains D, then we
can find positive constants κ1 , κ2 , ... so that

|f1 (x)| > κ1 |f2 (x)| > κ2 |f3 (x)| > κ3 |f4 (x)| > κ4 |f5 (x)| > κ5 |f6 (x)| > κ6 |f7 (x)| ∀ x ∈ D.
(4.19)
However we are not able to find positive constants a1 , a2 so that |f1 (x)| < a1 |f2 (x)| or
|f5 (x)| < a2 |f6 (x)|, for example.

Chapter 5

Integration

If g(x) = f′(x) then we can specify two different types of integrals, indefinite and definite.
The calculation of an indefinite integral means that you are calculating the antiderivative
of a function, i.e.

∫ g(x) dx = f(x) + c.    (5.1)

Notice the absence of limits on the integral sign and the presence of the constant c.
The output of an indefinite integral is a function of x. In mathematical finance we often
calculate definite integrals, i.e.

∫_a^b g(x) dx = [f(x)]_a^b = f(b) − f(a).    (5.2)

Notice that we obtain a single value, the area between g(x) and the x-axis from x = a
to x = b, rather than a function of x, and the constant c is no longer required. This is
illustrated in Figure 5.1.
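As a quick numerical sanity check (a sketch, not part of the original notes), scipy's quad routine can evaluate the definite integral of the curve in Figure 5.1 and compare it with the antiderivative evaluated at the limits:

    from scipy.integrate import quad

    g = lambda x: 3*x**3 - 10*x**2 + 4*x + 10
    F = lambda x: 3*x**4/4 - 10*x**3/3 + 2*x**2 + 10*x  # an antiderivative of g

    numeric, _ = quad(g, 0.5, 2)
    print(numeric, F(2) - F(0.5))  # both approximately 8.2031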

5.1 Basic rules

As might be expected looking at the definite integral in Eq. (5.2), swapping the limits on
the integral changes the sign of the result, i.e.

∫_a^b g(x) dx = −∫_b^a g(x) dx.    (5.3)

Similarly to differentiation, integration is also a linear operation.

For indefinite integrals, using Lagrange notation for the derivative of a function, if h′(x) =
f′(x) + g′(x) then

∫ f′(x) dx + ∫ g′(x) dx = ∫ [f′(x) + g′(x)] dx = ∫ h′(x) dx = h(x) + c = f(x) + g(x) + c.    (5.4)

Figure 5.1: The area of the hatched section between the curve f(x) = 3x^3 − 10x^2 + 4x + 10
and the x-axis is equal to the definite integral ∫_{0.5}^{2} (3x^3 − 10x^2 + 4x + 10) dx.

Note that although we have two integrals we can combine the arbitrary constants into a
single value c.

For definite integrals we must also take into account the limits on the integral when we
are combining functions. Where the limits are the same, it is straightforward to combine
the two integrals, e.g.
∫_a^b f′(x) dx + ∫_a^b g′(x) dx = ∫_a^b [f′(x) + g′(x)] dx
  = [f(x) + g(x)]_a^b = f(b) + g(b) − (f(a) + g(a)).    (5.5)

Where the limits of the two integrals are not the same then we must take slightly more
care, for example with the limits a < b < c < d
∫_a^c f′(x) dx + ∫_b^d g′(x) dx = ∫_a^b f′(x) dx + ∫_b^c [f′(x) + g′(x)] dx + ∫_c^d g′(x) dx.    (5.6)

Notice that we can only combine the integrals where the limits overlap. Where there is
no overlap between the limits, for example for ∫_a^b f′(x) dx + ∫_c^d g′(x) dx with a < b < c < d,
we cannot combine the integrals at all.

The linearity property of integrals is important in mathematical finance as it allows us to


separate the different parts of the pricing equation when we are calculating the expected
value of a payoff, etc. It also means that integral transforms such as Fourier and Laplace
transforms which you will use for more advanced topics are linear operators.

5.1.1 Common equations

We can infer from Section 2.1.1 a similar set of common equations for integrals. For
polynomials of the form ax^n,

∫ ax^n dx = ax^{n+1}/(n+1) + c,    (5.7)
∫_u^v ax^n dx = [ax^{n+1}/(n+1)]_u^v = a(v^{n+1} − u^{n+1})/(n+1).    (5.8)

For exponentials ae^{kx}, where a and k are constants,

∫ ae^{kx} dx = ae^{kx}/k + c,    (5.9)
∫_u^v ae^{kx} dx = [ae^{kx}/k]_u^v = a(e^{kv} − e^{ku})/k.    (5.10)

We can also use some of the standard results for derivatives to obtain the integrals of
functions, e.g.

f(x) = cos x:  ∫_0^θ f(x) dx = [sin x]_0^θ = sin θ − sin 0 = sin θ,    (5.11)
f(x) = sin x:  ∫_0^θ f(x) dx = [−cos x]_0^θ = −cos θ + cos 0 = 1 − cos θ.    (5.12)

Again, notice the different signs in the two cases.


To calculate the integral of tan x we can make use of the rule for differentiating log f(x),
i.e. d log f(x)/dx = f′(x)/f(x). Therefore if we write

tan x = sin x/cos x = −(d cos x/dx)/cos x    (5.13)

we can see that

∫_0^θ tan x dx = [−log cos x]_0^θ = [log sec x]_0^θ = log sec θ − log sec 0 = log sec θ.    (5.14)
0

5.1.2 Integration by substitution

When integrals become more complex we cannot simply find the result by looking at
common formulas for derivatives and we need to look at other techniques; one of these is
integration by substitution. The idea is that we should transform an integral

∫_a^b g(x) dx,    (5.15)

which is hard to solve, into

∫_c^d h(u) du,    (5.16)

which is easy to solve.


We first define the following relationships:

x = v(u),  u = v^{−1}(x),  h(u) = g(v(u)),
dx/du = v′(u),  dx = v′(u) du,
d = v^{−1}(b),  c = v^{−1}(a).    (5.17)

So we can rewrite the integral we wish to solve as follows:

∫_a^b g(x) dx = ∫_c^d g(v(u)) v′(u) du.    (5.18)

Example: Solve

∫_a^b 1/√(1 − x^2) dx.

Define

x = sin u  ⇒  dx/du = cos u  ⇒  dx = cos u du,
d = arcsin b,  c = arcsin a.

Therefore

∫_a^b 1/√(1 − x^2) dx = ∫_{arcsin a}^{arcsin b} cos u/√(1 − sin^2 u) du
  = ∫_{arcsin a}^{arcsin b} (cos u/cos u) du
  = [u]_{arcsin a}^{arcsin b}
  = arcsin b − arcsin a.    (5.19)
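If sympy is available, the result of the substitution can be confirmed symbolically (an illustrative sketch, not part of the original example):

    import sympy as sp

    x = sp.symbols('x')
    # sympy reproduces the antiderivative arcsin(x) used above
    print(sp.integrate(1 / sp.sqrt(1 - x**2), x))                          # asin(x)
    print(sp.integrate(1 / sp.sqrt(1 - x**2), (x, 0, sp.Rational(1, 2))))  # pi/6 = asin(1/2) - asin(0)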

5.1.3 Integration by parts

This method is useful when the integrand is a product of two functions; it can be derived
from the product rule for differentiation:

d[u(x)v(x)]/dx = u(x) dv(x)/dx + v(x) du(x)/dx.    (5.20)

Integrating and then rearranging, we then obtain

[u(x)v(x)]_a^b = ∫_a^b u(x) (dv(x)/dx) dx + ∫_a^b v(x) (du(x)/dx) dx,    (5.21)
⇒ ∫_a^b u(x) (dv(x)/dx) dx = [u(x)v(x)]_a^b − ∫_a^b v(x) (du(x)/dx) dx.    (5.22)

Example 1:

Find the expected value of an exponential random variable with parameter λ, i.e. solve

E[x] = ∫_0^∞ xλe^{−λx} dx.    (5.23)

In order to solve via integration by parts we define the following:

u(x) = x  ⇒  du(x)/dx = 1,
dv(x)/dx = λe^{−λx}  ⇒  v(x) = −e^{−λx}.    (5.24)

Therefore,

∫_0^∞ xλe^{−λx} dx = [−xe^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 − [e^{−λx}/λ]_0^∞ = 1/λ.    (5.25)

Example 2: Prove the recurrence relation for the Gamma function, i.e. αΓ(α) = Γ(α + 1),
using integration by parts. By the definition of the Gamma function in Section 1.4 we
have

αΓ(α) = α ∫_0^∞ x^{α−1} e^{−x} dx.    (5.26)

We then define the following:

u(x) = e^{−x}  ⇒  du(x)/dx = −e^{−x},
dv(x)/dx = x^{α−1}  ⇒  v(x) = (1/α) x^α.    (5.27)
So we have

αΓ(α) = α ∫_0^∞ x^{α−1} e^{−x} dx
      = α [(1/α) x^α e^{−x}]_0^∞ + α ∫_0^∞ (1/α) x^α e^{−x} dx
      = ∫_0^∞ x^α e^{−x} dx
      = Γ(α + 1).    (5.28)

5.1.4 Partial fractions

If we wish to integrate a quotient of two polynomials, e.g.

f(x) = (Dx^3 + Ex^2 + Fx + G) / ((x + A)(x + B)(x^2 + C)),    (5.29)

then we can split them into partial fractions of the form

f(x) = a/(x + A) + b/(x + B) + (cx + d)/(x^2 + C),    (5.30)

which are straightforward to integrate using the common integral equations and the lin-
earity property.

Example: Integrate

f(x) = (3x + 5) / ((x + 1)^2 (x + 2)).    (5.31)

First find a, b and c such that

f(x) = a/(x + 1) + b/(x + 1)^2 + c/(x + 2).    (5.32)

We do this by recombining the partial fractions into a single fraction and equating terms.
If we were to combine the terms in Eq. (5.32) then we would obtain

a/(x + 1) + b/(x + 1)^2 + c/(x + 2)
  = [a(x + 1)(x + 2) + b(x + 2) + c(x + 1)^2] / [(x + 1)^2 (x + 2)]
  = [a(x^2 + 3x + 2) + b(x + 2) + c(x^2 + 2x + 1)] / [(x + 1)^2 (x + 2)]
  = [(a + c)x^2 + (3a + 2c + b)x + 2a + 2b + c] / [(x + 1)^2 (x + 2)]
  = (3x + 5) / ((x + 1)^2 (x + 2)).    (5.33)
Equating terms we obtain three independent equations in the three unknowns, i.e.

a + c = 0,
3a + b + 2c = 3,
2a + 2b + c = 5. (5.34)

Solving these gives a = 1, b = 2 and c = −1, and so we obtain

f(x) = 1/(x + 1) + 2/(x + 1)^2 − 1/(x + 2).    (5.35)

Therefore, the indefinite integral of f(x) is

∫ f(x) dx = ∫ 1/(x + 1) dx + ∫ 2/(x + 1)^2 dx − ∫ 1/(x + 2) dx
  = log(x + 1) − 2/(x + 1) − log(x + 2) + c
  = log((x + 1)/(x + 2)) − 2/(x + 1) + c.    (5.36)
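The partial fraction decomposition and the resulting integral can be checked with sympy (a sketch added for illustration):

    import sympy as sp

    x = sp.symbols('x')
    f = (3*x + 5) / ((x + 1)**2 * (x + 2))
    print(sp.apart(f))         # the three terms of Eq. (5.35)
    print(sp.integrate(f, x))  # agrees with Eq. (5.36) up to the arbitrary constant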

Chapter 6

First order ordinary differential


equations

Much of the work in mathematical finance and applied mathematics as a whole involves
differential equations where we need to solve an equation involving a function and its
derivative. When the function is of a single variable and the derivative is with respect to
that same variable, these are known as ordinary differential equations (ODEs). The order
of the ODE is determined by the highest order derivative present in the equation so first
order ODEs involve the function and a first order derivative of the function.

6.1 Separable ordinary differential equations

For separable ODEs we have a function of the form

dy/dx = g(x)/f(y).    (6.1)

We arrange the equation so that all the x terms are on one side and all of the y terms
are on the other side, i.e.

f(y) dy = g(x) dx.    (6.2)

We then integrate both sides to find a solution:

∫ f(y) dy = ∫ g(x) dx.    (6.3)

Notice that these are both indefinite integrals and therefore we must either include a
constant in the solution or apply some boundary conditions in order to calculate an exact
solution. This type of problem is often referred to as an initial value problem (IVP) when
the boundary conditions are specified at zero, e.g. f(0) = 1, f′(0) = 3, or a boundary value
problem (BVP) when one or more of the boundary conditions are specified away from
zero, e.g. f(2π) = 0.

Example: a simple first order ODE, relevant to mathematical finance, is the equation for
constant interest rate on a bank account, i.e.

dBt/dt = rBt.    (6.4)

That is, the rate of change of the bank account, dBt/dt, is equal to a constant interest rate r
multiplied by the amount in the account Bt. We can collect all the Bt terms on one side
of the equation and solve as follows:

dBt/Bt = r dt,    (6.5)
∫ (1/Bt) dBt = ∫ r dt,    (6.6)
log Bt + c1 = rt + c2,    (6.7)

where c1 and c2 are both arbitrary constants. Rearranging for Bt we obtain

Bt = ec2 −c1 ert = c3 ert , (6.8)

where c3 is a new arbitrary constant. In order to obtain a particular solution we need to


make use of a boundary condition. For this equation an obvious choice is the amount of
money in the bank account at t = 0, B0 . So we have

B0 = c3 er×0 = c3 . (6.9)

Therefore we obtain an equation for the amount of money in a bank account at time t
with an initial deposit of B0 and a constant interest rate of r as

Bt = B0 ert . (6.10)
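As a sketch (parameter values chosen arbitrarily for illustration), the analytic solution can be compared against a numerical ODE solver:

    import numpy as np
    from scipy.integrate import solve_ivp

    r, B0 = 0.05, 100.0  # assumed interest rate and initial deposit
    sol = solve_ivp(lambda t, B: r * B, (0, 10), [B0], dense_output=True)

    t = np.linspace(0, 10, 5)
    print(sol.sol(t)[0])       # numerical solution
    print(B0 * np.exp(r * t))  # analytic solution B0*exp(r*t), nearly identical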

6.2 Integrating factor method

For linear first order ODEs of the form

df(x)/dx + f(x)p(x) = g(x)    (6.11)

we multiply both sides by a factor F (x) (to be determined) to obtain

F(x)f′(x) + f(x)p(x)F(x) = g(x)F(x).    (6.12)

We define F(x) so that

p(x)F(x) = F′(x).    (6.13)

Then Eq. (6.12) can be written as

F(x)f′(x) + f(x)F′(x) = g(x)F(x),    (6.14)

so using the product rule in Eq. (2.15) we have

(f(x)F(x))′ = g(x)F(x).    (6.15)

The first step in finding a solution to this is to solve Eq. (6.13) for F(x) using separation
of variables, i.e.

dF(x)/dx = p(x)F(x),    (6.16)
∫ dF(x)/F(x) = ∫ p(x) dx,    (6.17)
log F(x) + c1 = ∫ p(x) dx,    (6.18)
F(x) = e^{c2} e^{∫p(x)dx}.    (6.19)

We then solve Eq. (6.15) as

f(x)F(x) = ∫ g(x)F(x) dx + c3,    (6.20)
f(x) = (∫ g(x)F(x) dx + c3) / F(x).    (6.21)

Substituting in the solution for F(x) from Eq. (6.19) gives

f(x) = (∫ g(x) e^{c2} e^{∫p(x)dx} dx + c3) / (e^{c2} e^{∫p(x)dx})
     = (∫ g(x) e^{∫p(x)dx} dx + c) / e^{∫p(x)dx}.    (6.22)

Example: solve a linear ODE with p = 10 and g = 15, i.e.

df(x)/dx + 10f(x) = 15.    (6.23)

This is a simple example as both p(x) and g(x) are constants; nonetheless it illustrates
the method. First find F(x):

F(x) = e^{∫10 dx} = e^{10x}.    (6.24)
Then

f(x) = (∫ 15e^{10x} dx + c3) / e^{10x}
     = ((15/10)e^{10x} + c3) / e^{10x}
     = 1.5 + c3 e^{−10x}.    (6.25)

In order to find a particular solution we need the boundary conditions. Say we have
f (0) = 3 then we have
1.5 + c3 = 3, (6.26)

so c3 = 1.5 for a solution of


f (x) = 1.5(1 + e−10x ). (6.27)
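The same result can be obtained symbolically with sympy's ODE solver (an illustrative sketch):

    import sympy as sp

    x = sp.symbols('x')
    f = sp.Function('f')
    ode = sp.Eq(f(x).diff(x) + 10*f(x), 15)
    # dsolve returns f(x) = 3/2 + (3/2)*exp(-10*x), i.e. 1.5*(1 + exp(-10*x))
    print(sp.dsolve(ode, f(x), ics={f(0): 3}))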

6.3 Bernoulli differential equations

Bernoulli differential equations are first order ODEs of the form

df(x)/dx + p(x)f(x) = g(x)f(x)^α,    (6.28)

where α ∈ R. The first step to finding a solution for this class of first order ODEs is to
rearrange this as

df(x)/dx = g(x)f(x)^α − p(x)f(x).    (6.29)

We also define u(x) = f(x)^{1−α} so that by the chain rule

du(x)/dx = (1 − α)f(x)^{−α} df(x)/dx.    (6.30)
Substituting the expression on the right hand side of Eq. (6.29) for df(x)/dx into Eq. (6.30),
we obtain

du(x)/dx = (1 − α)f(x)^{−α} [g(x)f(x)^α − p(x)f(x)]
         = (1 − α)[g(x) − p(x)f(x)^{1−α}].    (6.31)

We have u(x) = f(x)^{1−α} and substituting this into Eq. (6.31) gives

du(x)/dx = (1 − α)g(x) − (1 − α)p(x)u(x),    (6.32)
du(x)/dx + (1 − α)p(x)u(x) = (1 − α)g(x).    (6.33)

This is in the form of a linear ODE and can be solved using the integrating factor method
described in Section 6.2.

Example: Solve

df(x)/dx − 2f(x) = xf(x)^4.    (6.34)

First define u(x) = f(x)^{−3}, so

du(x)/dx = −3f(x)^{−4} df(x)/dx
         = −3f(x)^{−4} [2f(x) + xf(x)^4]
         = −3[2f(x)^{−3} + x]
         = −3[2u(x) + x].    (6.35)

Thus the ODE which we need to solve is

du(x)/dx + 6u(x) = −3x.    (6.36)

Using p(x) = 6 and g(x) = −3x in Eq. (6.22) we obtain

u(x) = (∫ −3xe^{6x} dx + c) / e^{6x}
     = (−(1/2)xe^{6x} + (1/12)e^{6x} + c) / e^{6x}
     = −(1/2)x + 1/12 + ce^{−6x}.    (6.37)

So

f(x)^{−3} = −(1/2)x + 1/12 + ce^{−6x}    (6.38)

and if we have a boundary condition we can apply it now to find the value of c. Say
f(0) = 2; then f(0)^{−3} = 1/8 and

1/8 = 1/12 + c  ⇒  c = 1/8 − 1/12 = 1/24.    (6.39)
24

The solution of this Bernoulli equation is therefore

f(x) = (1/12 − (1/2)x + (1/24)e^{−6x})^{−1/3}.    (6.40)

Chapter 7

Multivariate calculus

So far we have only looked at rules of calculus for functions in a single variable. We now
move on to look at differentiating and integrating functions in more than one variable.

7.1 Notation

For a function f(x, y), its 1st order partial derivatives are written

∂f(x, y)/∂x = ∂f/∂x = fx.    (7.1)

The second order partial derivatives are written

∂^2 f/∂x^2 = fxx for second order derivatives in the same variable and    (7.2)
∂^2 f/∂x∂y = fyx for mixed second order derivatives in different variables.    (7.3)

Notice the order of the variables in the notation for mixed second order derivatives.

7.2 Limits and continuity

Even if a function is continuous in each argument, this is not a sufficient condition for
x2 y
multivariate continuity. Example: The expression x4 +y 2
gives a different limit at (0,0)
whether it is approached via a straight line y = kx or a parabola y = x2 , i.e.

d3 kx3
x2 y kx3

dx
6k
= 4 = d3 (x4 +k2 x2 ) = = ∞, (7.4)

x4 + y 2 x=0,y=0
y=kx x + k 2 x2 x=0 12x x=0
dx x=0
x2 y x4 x4

1
= 4 = = . (7.5)
x4 + y 2 x=0,y=0
y=x2 x + x4 x=0 2x4 x=0 2

The third step in Eq. (7.4) comes from L’Hopital’s rule.

If a function is described as continuous it means that it is continuous at all points in
its domain. The following facts are useful:

1. Polynomials are continuous on R2 .

2. Rational functions are continuous where the denominator is non zero.

3. If g is continuous and f (x, y) is continuous then g(f (x, y)) is continuous.

4. Products and sums of continuous functions are also continuous.

7.3 Basic rules of differentiation for multivariate functions

7.3.1 Product rule

The product rule stays the same: for u(x, y), v(x, y) and w = uv,

∂w/∂x = u(x, y) ∂v(x, y)/∂x + v(x, y) ∂u(x, y)/∂x.    (7.6)

7.3.2 Chain rule

Chain rule changes for multivariate functions; there are three variants depending on the
form of the equations:

1. The outer function f is of a single variable, the inner function u is of multiple
variables:

   f(u),  u(x1, x2, ..., xn),  ∂f/∂xi = (df/du)(∂u/∂xi).    (7.7)

2. The outer function is of several variables and the inner ones are functions of a single
variable:

   f(x1, x2, ..., xn),  xi = xi(u),  df/du = (∂f/∂x1)(dx1/du) + (∂f/∂x2)(dx2/du) + ... + (∂f/∂xn)(dxn/du).    (7.8)

   In financial applications you often see functions of the form f(t, u(t), v(t)), in which
   case

   df/dt = ∂f/∂t + (∂f/∂u)(du/dt) + (∂f/∂v)(dv/dt).    (7.9)

   It may seem confusing having df/dt on the left hand side and ∂f/∂t on the right hand
   side, but this can be easily understood if we define a function s(t) = t and then
   (∂f/∂s)(∂s/∂t) = ∂f/∂s = ∂f/∂t.

3. Both functions are of several variables, i.e. f(u1(x1, ..., xm), ..., un(x1, ..., xm)); then

   ∂f/∂xi = (∂f/∂u1)(∂u1/∂xi) + ··· + (∂f/∂un)(∂un/∂xi),  i = 1, 2, ..., m.    (7.10)

7.3.3 Clairaut’s theorem (simplified)

If the two mixed 2nd order partial derivatives are continuous at (x0, y0) then fxy(x0, y0) =
fyx(x0, y0).
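Clairaut's theorem is easy to check on a smooth example with sympy (a sketch using an arbitrary test function):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.exp(x*y) * sp.sin(x + y**2)   # an arbitrary smooth test function
    fxy = sp.diff(f, x, y)               # differentiate in x, then y
    fyx = sp.diff(f, y, x)               # differentiate in y, then x
    print(sp.simplify(fxy - fyx))        # 0: the mixed partials agree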

7.4 Taylor series

We look at the Taylor series in two variables. The factorial scaling of the terms from the
single variable Taylor series is multiplied by the binomial coefficient, i.e.

binom(n, i) = n!/(i!(n − i)!).    (7.11)

The Taylor series in two variables is

f(x + Δx, y + Δy) = f(x, y) + fx Δx + fy Δy
  + (1/2)[fxx Δx^2 + binom(2,1) fxy ΔxΔy + fyy Δy^2]
  + (1/3!)[fxxx Δx^3 + binom(3,1) fxxy Δx^2 Δy + binom(3,2) fxyy ΔxΔy^2 + fyyy Δy^3] + ···
  = f(x, y) + fx Δx + fy Δy + (1/2)[fxx Δx^2 + 2fxy ΔxΔy + fyy Δy^2]
  + (1/6)[fxxx Δx^3 + 3fxxy Δx^2 Δy + 3fxyy ΔxΔy^2 + fyyy Δy^3] + ··· .    (7.12)

The binomial coefficient gives the number of ways that the partial derivative can be ob-
tained (assuming the conditions for applying Clairaut’s theorem are met). For exam-
ple there are three ways that you can obtain the mixed third order term fxxy , because
fxxy = fxyx = fyxx .

7.4.1 Taylor series and differentials

As Δx, Δy → 0 we obtain the differential of the function, i.e.

df = (∂f/∂x) dx + (∂f/∂y) dy.    (7.13)

Only the first two terms remain, as Δx^n, Δy^n, where n ≥ 2, approach zero more quickly
than Δx and Δy. This leads to Euler's method for the approximation of a function, i.e.

Δf = (∂f/∂x) Δx + (∂f/∂y) Δy.    (7.14)

However, for a function f(t, Wt), where Wt is a Wiener process, this changes slightly.
Starting out with the Taylor series we have

f(t + Δt, Wt + ΔWt) = f(t, Wt) + ft Δt + fw ΔWt
  + (1/2)[ftt Δt^2 + 2ftw ΔtΔWt + fww ΔWt^2] + ··· ,    (7.15)

where fw, ftw and fww are the first and second order derivatives with respect to Wt.
Taking the limit as Δt, ΔWt → 0, we obtain

df = (∂f/∂t) dt + (∂f/∂Wt) dWt + (1/2)(∂^2 f/∂Wt^2) dWt^2.    (7.16)

Notice the extra term (1/2)(∂^2 f/∂Wt^2) dWt^2, which appears because, although Δt^2 and
ΔtΔWt approach zero more quickly than Δt, ΔWt^2 approaches zero at the same rate as Δt.
In fact from Ito, at the limit Δt → 0 we have dWt^2 = dt. This gives Ito's lemma which, in
simplistic terms, states that the differential of any twice continuously differentiable function
can be written

df = (∂f/∂t) dt + (∂f/∂Wt) dWt + (1/2)(∂^2 f/∂Wt^2) dt.    (7.17)
For a function of two Wiener processes Xt and Yt, second order terms must be included
for both processes, plus a mixed second order term, i.e.

df = (∂f/∂Xt) dXt + (∂f/∂Yt) dYt
  + (1/2)[(∂^2 f/∂Xt^2) dXt^2 + 2(∂^2 f/∂Xt∂Yt) dXt dYt + (∂^2 f/∂Yt^2) dYt^2].    (7.18)

As before dXt^2 = dt and dYt^2 = dt, and from Ito, dXt dYt = ρ dt where ρ is the correlation
between the processes, so we have

df = (∂f/∂Xt) dXt + (∂f/∂Yt) dYt
  + (1/2)[(∂^2 f/∂Xt^2) + 2ρ(∂^2 f/∂Xt∂Yt) + (∂^2 f/∂Yt^2)] dt.    (7.19)

7.5 Integration in two variables

A definite integral over two variables is commonly written

∫_X ∫_Y f(x, y) dy dx,    (7.20)

where X defines the range of integration over x and Y defines the range of integration
over y. Notice that the integral is written in a nested way, i.e. the first integral over X
corresponds to the final dx term.

7.5.1 Fubini’s theorem

One of the most useful theorems when we are integrating over two variables is Fubini's
theorem. Simply, this states that

∫_X (∫_Y f(x, y) dy) dx = ∫_Y (∫_X f(x, y) dx) dy,    (7.21)

under certain conditions, namely that

∫_X ∫_Y |f(x, y)| dy dx < ∞.    (7.22)

7.5.2 Differentiation of integrals: Leibniz’s integral rule

There are several different situations that we may come across when we are differentiating
an integral. The most basic is where the variable only appears in the range of a definite
integral, i.e.

(d/dx) ∫_c^x f(z) dz = f(x).    (7.23)

The sketch proof of this is as follows: say f(x) = dg(x)/dx, then

(d/dx) ∫_c^x f(z) dz = (d/dx)[g(x) − g(c)] = dg(x)/dx = f(x).    (7.24)

The second situation we examine is the differentiation of the integral of a function of two
variables, i.e.

u(x) = ∫_a^b f(x, t) dt,    (7.25)

du/dx = lim_{Δx→0} [u(x + Δx) − u(x)]/Δx
      = lim_{Δx→0} [∫_a^b f(x + Δx, t) dt − ∫_a^b f(x, t) dt]/Δx
      = lim_{Δx→0} ∫_a^b [f(x + Δx, t) − f(x, t)]/Δx dt.    (7.26)

Provided the limit can be passed across the integral (dominated convergence theorem)
then

du/dx = ∫_a^b lim_{Δx→0} [f(x + Δx, t) − f(x, t)]/Δx dt = ∫_a^b ∂f(x, t)/∂x dt.    (7.27)
The final situation we look at is when the parameter we are differentiating over appears
both in the limits and in the integrand, e.g.

g(t, a(t), b(t)) = ∫_{a(t)}^{b(t)} f(z, t) dz    (7.28)

differentiated over t. To do this we use Leibniz's integral rule, which makes use of the
second type of chain rule described above, i.e.

dg/dt = ∂g/∂t + (∂g/∂a)(∂a/∂t) + (∂g/∂b)(∂b/∂t).    (7.29)

Making use of the rules, described above, for differentiation across the integral and differ-
entiation of an integral with limits which are a function of the variable, we obtain

dg/dt = ∫_{a(t)}^{b(t)} ∂f(z, t)/∂t dz + f(b(t), t) ∂b/∂t − f(a(t), t) ∂a/∂t.    (7.30)

7.5.3 Change of variables in a double integral

We previously looked at change of variables as a method for solving integrals in a single
variable. For a function f(x, y) we define x and y as functions of two new variables u and
v, i.e. x = g(u, v), y = h(u, v), and then we have

∫_Y ∫_X f(x, y) dx dy = ∫_V ∫_U f(g(u, v), h(u, v)) |J| du dv.    (7.31)

Here

J = det [∂x/∂u  ∂x/∂v; ∂y/∂u  ∂y/∂v] = (∂x/∂u)(∂y/∂v) − (∂y/∂u)(∂x/∂v)

and is named the Jacobian determinant.

Example: Solve

∫_0^∞ ∫_0^∞ e^{−(x^2+y^2)} dx dy.    (7.32)

Define x = r cos θ and y = r sin θ. The new domains for r and θ are therefore (0, ∞) and
(0, π/2). Then we can also calculate the Jacobian determinant using

∂x/∂r = cos θ,  ∂y/∂r = sin θ,  ∂x/∂θ = −r sin θ,  ∂y/∂θ = r cos θ,    (7.33)

so we obtain

J = (∂x/∂r)(∂y/∂θ) − (∂y/∂r)(∂x/∂θ)
  = cos θ · r cos θ − sin θ · (−r sin θ) = r cos^2 θ + r sin^2 θ = r.    (7.34)
Substituting for x and y in the original integral thus gives

∫_0^∞ ∫_0^∞ e^{−(x^2+y^2)} dx dy = ∫_0^{π/2} ∫_0^∞ e^{−r^2 (cos^2 θ + sin^2 θ)} r dr dθ
  = ∫_0^{π/2} ∫_0^∞ r e^{−r^2} dr dθ
  = ∫_0^{π/2} [−e^{−r^2}/2]_0^∞ dθ
  = ∫_0^{π/2} (1/2) dθ = π/4.    (7.35)
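As a numerical check (a sketch, not from the original notes), scipy's dblquad evaluates the same double integral directly in Cartesian coordinates:

    import numpy as np
    from scipy.integrate import dblquad

    # integrate exp(-(x^2 + y^2)) over the first quadrant
    val, _ = dblquad(lambda y, x: np.exp(-(x**2 + y**2)), 0, np.inf, 0, np.inf)
    print(val, np.pi / 4)  # both approximately 0.785398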

Chapter 8

Complex numbers

8.1 Introduction

The imaginary operator i is defined as

i = √(−1).    (8.1)

We define a complex number z as


z = u + iv, (8.2)

where the real part is defined as u and the imaginary part as v.


This can be visualised by plotting it on the complex plane, see Figure 8.1.

Figure 8.1: Representation of a complex number z = u + iv on the complex plane.

Adding or subtracting two complex numbers z = u + iv and y = s + it is done using

y ± z = s ± u + i(t ± v).    (8.3)

Multiplying two together is done with

yz = (s + it)(u + iv) = su − tv + i(ut + sv),    (8.4)

to give a real part of su − tv and an imaginary part of ut + sv. The conjugate of a
complex number z = u + iv is defined as z̄ = u − iv (sometimes z* is used). An
illustration of this on the complex plane is shown in Figure 8.2.

Figure 8.2: Representation of the conjugate of a complex number, z̄ = u − iv, on the
complex plane.

The product of a complex number with its conjugate is

z z̄ = u^2 + v^2.    (8.5)

The quotient of two complex numbers is

y/z = (s + it)/(u + iv) = (u − iv)(s + it)/(u^2 + v^2)
    = (us + vt)/(u^2 + v^2) + i(ut − vs)/(u^2 + v^2).    (8.6)

8.2 Polar notation

Up to now we have been using Cartesian notation. The calculation of products and
quotients is easier if we introduce the polar notation. Instead of defining the real and
imaginary parts, we have a magnitude and an angle in radians (argument) relative to the
real axis.
z = u + iv = reiθ . (8.7)

Again, we can visualise this on the complex plane.

Figure 8.3: Representation of a complex number z = u + iv = reiθ on the complex plane.

It is clear from Figure 8.3 that we can use simple trigonometry to obtain the relationship
between the parameters of the two forms, i.e.

r^2 = u^2 + v^2 = z z̄,    (8.8)
θ = tan^{−1}(v/u),    (8.9)
u = r cos θ,    (8.10)
v = r sin θ,    (8.11)
z = r cos θ + i r sin θ.    (8.12)

The conjugate in polar form is expressed as

z̄ = u − iv = re^{−iθ}    (8.13)

and this can be easily observed looking at the complex plane in Figure 8.4.

Figure 8.4: The conjugate of a complex number, z̄ = u − iv = re^{−iθ}, on the complex
plane.

It can be observed from the complex plane that re^{iθ} = re^{iθ+i2kπ}, where k is any integer.
Furthermore −re^{iθ} = re^{iθ+i(2k+1)π}, where k is any integer.

8.2.1 Euler’s formula and Euler’s identity

Setting r = 1 in Eq. (8.12) gives Euler’s formula

eiθ = cos θ + i sin θ. (8.14)

Furthermore setting θ = π and rearranging very slightly gives Euler’s identity

eiπ + 1 = 0. (8.15)

Combining Euler's formula in Eq. (8.14) for θ and −θ, we can see how the complex
exponential representations of trigonometric functions are arrived at, for example:

cos θ = (e^{iθ} + e^{−iθ})/2,    (8.16)
sin θ = (e^{iθ} − e^{−iθ})/(2i).    (8.17)

These can then be used to show how the results for the differentiation of trigonometric
functions are obtained:

d cos θ/dθ = d[(e^{iθ} + e^{−iθ})/2]/dθ = (ie^{iθ} − ie^{−iθ})/2 = −(e^{iθ} − e^{−iθ})/(2i) = −sin θ,    (8.18)
d sin θ/dθ = d[(e^{iθ} − e^{−iθ})/(2i)]/dθ = (ie^{iθ} + ie^{−iθ})/(2i) = (e^{iθ} + e^{−iθ})/2 = cos θ.    (8.19)

We can also see how some of the common identities listed in Section 1.3 are obtained.
Example 1:

sin(θ + α) = [e^{i(θ+α)} − e^{−i(θ+α)}]/(2i).    (8.20)

sin θ cos α = (e^{iθ} − e^{−iθ})(e^{iα} + e^{−iα})/(4i)
            = [e^{i(θ+α)} − e^{−i(θ+α)} + e^{i(θ−α)} − e^{−i(θ−α)}]/(4i).    (8.21)

cos θ sin α = (e^{iθ} + e^{−iθ})(e^{iα} − e^{−iα})/(4i)
            = [e^{i(θ+α)} − e^{−i(θ+α)} − e^{i(θ−α)} + e^{−i(θ−α)}]/(4i).    (8.22)

Adding Eqs. (8.21) and (8.22) gives

⇒ sin(θ + α) = sin θ cos α + cos θ sin α.    (8.23)

Example 2:

cos^2 θ = [(e^{iθ} + e^{−iθ})/2]^2
        = (e^{i2θ} + 2e^{iθ}e^{−iθ} + e^{−i2θ})/4
        = (1/2)(cos 2θ + 1).    (8.24)

Similarly we can show that sin^2 θ = (1/2)(1 − cos 2θ). Adding the two identities together
gives us sin^2 θ + cos^2 θ = 1, which we used when carrying out integration by substitution
in the example in Section 5.1.2.

Using polar notation also makes it very easy to find the powers of complex numbers, i.e.

z^b = (re^{iθ})^b = r^b e^{ibθ},    (8.25)
z^{1/b} = (re^{iθ})^{1/b} = r^{1/b} e^{iθ/b},    (8.26)
z^{−b} = (re^{iθ})^{−b} = (1/r^b) e^{−ibθ}.    (8.27)

However, unlike for real numbers, these roots are multivalued functions, as re^{iθ} = re^{iθ+i2kπ}
for integer k. So, for example, w = re^{iπ/4} = re^{iπ/4+i2π}, but taking the square root gives
different results depending on the value of k, i.e.

(re^{iπ/4})^{1/2} = r^{1/2} e^{iπ/8},    (8.28)
(re^{iπ/4+i2π})^{1/2} = r^{1/2} e^{iπ/8} e^{iπ} = r^{1/2} e^{iπ/8} · (−1) = −r^{1/2} e^{iπ/8}.    (8.29)

The principal value is generally chosen as k = 0.

8.3 Logarithms of complex and negative numbers

In many applications we only deal with logarithms of positive numbers. However, using
the polar representation for complex numbers above we can also produce logarithms of
complex and negative numbers. For a real number

y = log x ⇔ x = ey . (8.30)

We have seen that all complex and negative numbers can be written as a scalar multiplied
by an exponential of a complex number and we can use this to find the logarithm of
negative and complex numbers.

Negative numbers. This is the simplest example and makes use of Euler's identity,
defined above, which we can rearrange as e^{iπ} = −1. For a negative number −A, where A
is positive and thus we can obtain a = log A:

−A = −1 × A = e^{iπ} e^a = e^{iπ+a} = e^{i(2k+1)π+a},    (8.31)

where k is any integer. Therefore

log(−A) = i(2k + 1)π + a,    (8.32)

although we usually take the principal value of k = 0.

Complex numbers. We can write the polar representation of a complex number z as

z = re^{iθ} = e^{ln r} e^{iθ} = e^{ln r + iθ + ik2π},    (8.33)

where k is any integer. Therefore

log z = ln r + iθ + ik2π,    (8.34)

although, as before, we usually take the principal value of k = 0.
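Python's cmath module returns exactly these principal values (k = 0), shown here as an illustrative sketch:

    import cmath, math

    print(cmath.log(-1))      # pi*1j, matching log(-A) = i*pi + log(A) with A = 1, k = 0
    print(cmath.log(1 + 1j))  # ln(r) + i*theta with r = sqrt(2), theta = pi/4
    print(math.log(math.sqrt(2)), math.pi / 4)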

8.4 Complex differentiation

Complex analysis is a huge subject and so this section is to give you a little familiarity with
the subject rather than provide a comprehensive guide. For more information Kreyszig’s
book provides a more detailed introduction.

For a function of a complex variable f (z) where z = x + iy:

f (z) = u(x, y) + iv(x, y), (8.35)

i.e. f (z) is the sum of a pair of functions, u(x, y) and v(x, y), which are functions of the
real and imaginary parts of z. Therefore, differentiating a complex function is linked to
the work we did on partial derivatives in the previous lecture.

8.4.1 Cauchy-Riemann equations

In order to perform differentiation meaningfully on functions in the complex plane, they
must be analytic. One of the conditions of analyticity is that the derivative at a point
z0 = x0 + iy0, with Δz = Δx + iΔy, as defined by

f′(z0) = lim_{Δz→0} [u(x0 + Δx, y0 + Δy) + iv(x0 + Δx, y0 + Δy) − [u(x0, y0) + iv(x0, y0)]] / (Δx + iΔy),    (8.36)

must be the same for every direction from which you approach z0.

If you let Δy → 0 first, you are left with

f′(z0) = lim_{Δx→0} [u(x0 + Δx, y0) + iv(x0 + Δx, y0) − [u(x0, y0) + iv(x0, y0)]] / Δx
       = ∂u/∂x + i ∂v/∂x.    (8.37)

Conversely, if you let Δx → 0 first, you are left with

f′(z0) = lim_{Δy→0} [u(x0, y0 + Δy) + iv(x0, y0 + Δy) − [u(x0, y0) + iv(x0, y0)]] / (iΔy)
       = −i ∂u/∂y + ∂v/∂y.    (8.38)

As we can equate the real and imaginary parts of these two expressions, we have as a
condition of analyticity that

∂u/∂x = ∂v/∂y  and  ∂v/∂x = −∂u/∂y,    (8.39)

which are known as the Cauchy-Riemann equations.


Example 3: f (z) = z 2 = x2 − y 2 + i2xy then u(x, y) = x2 − y 2 and v(x, y) = 2xy. Then
ux = 2x, vy = 2x and uy = −2y and vx = 2y. Thus the function is analytic.
In polar form, define z = reiθ = r(cos θ + i sin θ) as before and so x = r cos θ, y = r sin θ.
Using chain rule:

∂u ∂u ∂x ∂u ∂y
= +
∂r ∂x ∂r ∂y ∂r
∂u ∂u
= cos θ + sin θ
∂x ∂y
∂v ∂v
= cos θ − sin θ
∂y ∂x
 
1 ∂v ∂v
= r cos θ − r sin θ
r ∂y ∂x
 
1 ∂v ∂y ∂v ∂x 1 ∂v
= + = (8.40)
r ∂y ∂rθ ∂x ∂θ r ∂θ

where the third line uses the substitution of the Cauchy-Riemann equations, a similar
method can be used to show that

∂v 1 ∂u
=− (8.41)
∂r r ∂θ

8.5 Complex integration

Similarly to the section on differentiation, this is not intended to provide you with a
comprehensive guide, but rather is an introduction to some of the concepts and notation
you may see. Complex integration is along a path through the complex plane and is written
∫_C f(z) dz for an open path and ∮_C f(z) dz for a closed path. In general the value of the
integral depends on the path through the complex plane; however, there are exceptions to
this.
If the function is analytic within a simply connected domain D containing points a and b,
which are joined by a path C lying completely within D, and we have f(z) = F′(z), then

∫_C f(z) dz = ∫_a^b f(z) dz = F(b) − F(a)    (8.42)

and the integral gives the same value for any path in D between a and b.
This implies for a simple closed path within a simply connected domain that, as the first
part of the path runs from an arbitrary point a to b and the rest of the curve runs from b
back to a,

∮_C f(z) dz = ∫_a^b f(z) dz + ∫_b^a f(z) dz = ∫_a^b f(z) dz − ∫_a^b f(z) dz = 0,    (8.43)

which is Cauchy's integral theorem.


However if the the domain D where the function is analytic is no longer simply connected,
the integral takes a value which is not necessarily zero. One technique for solving these
is to define t such that the integral path z(t) = x(t) + iy(t), i.e. we no longer have to
integrate over a path involving x and y but only t.
Example: integrate 1/z over the unit circle, which we call C.

I
dz
(8.44)
C z

dz
define the path z = cos t + i sin t = eit over the range t ∈ [0, 2π]. Then dt = ieit and we
substitute dz, z and the limits to give

2π 2π
ieit dt
Z Z
=i dt = 2πi. (8.45)
0 eit 0
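The same closed contour integral can be approximated numerically by discretising the parametrisation (a sketch for illustration):

    import numpy as np

    N = 2000
    t = np.linspace(0, 2*np.pi, N, endpoint=False)  # parameter values around the circle
    z = np.exp(1j * t)                              # unit circle path z(t) = e^{it}
    dz_dt = 1j * np.exp(1j * t)                     # dz/dt
    integral = np.sum(dz_dt / z) * (2*np.pi / N)    # Riemann sum for the contour integral
    print(integral, 2j * np.pi)                     # both 6.2832j (here dz/z is exactly i dt)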

Chapter 9

Second order ordinary differen-


tial equations

9.1 Homogeneous ordinary differential equations


We first look at second order ODEs of the form

af″ + bf′ + cf = 0,    (9.1)

where f″ = d^2 f/dx^2 and f′ = df/dx as before.
We can solve this using an ansatz (guess) of f = e^{λx}; then f′ = λe^{λx} and f″ = λ^2 e^{λx}.
Then Eq. (9.1) can be rewritten as

e^{λx}(aλ^2 + bλ + c) = 0.    (9.2)

This means that we must have aλ2 + bλ + c = 0 which we can solve using Eq.(1.9) above.
There are three possibilities, real roots, complex roots or equal roots which we illustrate
with examples.

Example 1 (Real roots - simplest example): Solve

f 00 + 5f 0 + 4f = 0 (9.3)

with initial conditions f(0) = 5, f′(0) = 1 (note that now that we have two possible solutions
we need two initial conditions). Using the formula for the roots of a quadratic equation
in Eq. (1.9) we have

λ = (−5 ± √(25 − 16))/2.    (9.4)
Therefore, the two possible roots are λ = −4 and λ = −1, which give us two possible solutions
that we call f1 and f2. Therefore, by linearity of differentiation the general solution is

f(x) = c1 e^{−x} + c2 e^{−4x}.    (9.5)

Using the initial conditions gives us c1 + c2 = 5 for f (0) and f 0 (0) = −c1 − 4c2 = 1 which
gives c2 = −2 and c1 = 7 for a solution of

f (x) = 7e−x − 2e−4x . (9.6)
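sympy's dsolve reproduces this solution, including the initial conditions (a sketch, assuming sympy is available):

    import sympy as sp

    x = sp.symbols('x')
    f = sp.Function('f')
    ode = sp.Eq(f(x).diff(x, 2) + 5*f(x).diff(x) + 4*f(x), 0)
    ics = {f(0): 5, f(x).diff(x).subs(x, 0): 1}
    print(sp.dsolve(ode, f(x), ics=ics))  # f(x) = 7*exp(-x) - 2*exp(-4*x)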

Example 2 (Complex roots): Solve

f″ + 2f′ + 4f = 0    (9.7)

with initial conditions f(0) = 1, f′(0) = 1. Solving for λ using Eq. (1.9) gives λ = −1 ± √3 i,
so we have a general solution of

f(x) = e^{−x}(c1 e^{i√3 x} + c2 e^{−i√3 x}).    (9.8)

If we redefine our constants as c1 = ca/2 + cb/2i and c2 = ca/2 − cb/2i, then using the
exponential formulae for sin and cos we can rewrite this as

f(x) = e^{−x}(ca cos(√3 x) + cb sin(√3 x)).    (9.9)

The initial condition f(0) = 1 gives

f(0) = e^0 (ca cos 0 + cb sin 0) = ca    (9.10)

and so ca = 1. For the second initial condition f′(0) = 1:

f′(0) = [−e^{−x}(ca cos(√3 x) + cb sin(√3 x)) + e^{−x}(−√3 ca sin(√3 x) + √3 cb cos(√3 x))]|_{x=0}
      = −ca + √3 cb = 1,    (9.11)


which gives cb = 2/√3 for a solution of

f(x) = e^{−x}(cos(√3 x) + (2/√3) sin(√3 x)).    (9.12)

Example 3 (Single roots): When b^2 = 4ac we only obtain one solution to Eq. (1.9) above.
In this case, to obtain our two solutions, we define one as f1(x) = c1 e^{λx}, where λ is the
only calculated root, i.e. λ = −b/2a, and we define another solution as f2(x) = v(x)e^{−bx/2a},
where v(x) is to be determined. By the chain and product rules we get the first and second
derivatives of f2(x):

f2′(x) = v′(x)e^{−bx/2a} − (b/2a)v(x)e^{−bx/2a},    (9.13)
f2″(x) = v″(x)e^{−bx/2a} − (b/2a)v′(x)e^{−bx/2a} − (b/2a)v′(x)e^{−bx/2a} + (b/2a)^2 v(x)e^{−bx/2a}
       = v″(x)e^{−bx/2a} − (b/a)v′(x)e^{−bx/2a} + (b/2a)^2 v(x)e^{−bx/2a}.    (9.14)

Putting this into our original ODE, factoring out the exponential term and rearranging
gives

a(v″(x) − (b/a)v′(x) + (b/2a)^2 v(x)) + b(v′(x) − (b/2a)v(x)) + cv(x) = 0,    (9.15)
av″(x) − (1/4a)(b^2 − 4ac)v(x) = 0.    (9.16)

By definition of the problem (b^2 − 4ac) = 0 and a ≠ 0, so v″(x) = 0; thus

v′(x) = ∫ v″(x) dx = c2,    (9.17)
v(x) = ∫ v′(x) dx = c2 x + c3.    (9.18)

Then our general solution is

f(x) = f1(x) + f2(x) = c1 e^{−bx/2a} + v(x)e^{−bx/2a}
     = c1 e^{−bx/2a} + e^{−bx/2a}(c2 x + c3) = c1 e^{−bx/2a} + c2 xe^{−bx/2a},    (9.19)

where c1, c2 and c3 are all arbitrary constants (c3 has been absorbed into c1).

Example 4: Solve

f″ + 4f′ + 4f = 0    (9.20)

with initial conditions f(0) = 1, f′(0) = 3. This gives a single root of λ = −2 and so a
general solution of

f(x) = c1 e^{−2x} + c2 xe^{−2x}.    (9.21)

For the first boundary condition f (0) = 1 we can set x = 0 and obtain c1 = 1 by inspection.
For the second boundary condition of f 0 (0) = 3 we differentiate our general solution for

f′(x) = −2c1 e^{−2x} + c2 e^{−2x} − 2c2 xe^{−2x}.    (9.22)

Substituting x = 0 and f′(0) = 3 into Eq. (9.22) gives

f′(0) = −2c1 e^0 + c2 e^0 − 2c2 · 0 · e^0 = c2 − 2c1 = 3,    (9.23)

so c2 = 5 and our solution is

f (x) = e−2x + 5xe−2x . (9.24)

9.2 Cauchy-Euler equations

These are equations of the form

ax2 f 00 + bxf 0 + cf = 0 (9.25)

We use the substitution t = log x and the first and second order chain rules:

df/dx = (df/dt)(dt/dx) = (1/x)(df/dt),    (9.26)
d^2 f/dx^2 = (df/dt)(d^2 t/dx^2) + (d^2 f/dt^2)(dt/dx)^2
           = −(1/x^2)(df/dt) + (1/x^2)(d^2 f/dt^2) = (1/x^2)(d^2 f/dt^2 − df/dt).    (9.27)

Using Newtonian (dot) notation for the derivative with respect to t, i.e. df/dt = ḟ and
d^2 f/dt^2 = f̈, we can rewrite Eq. (9.25) as

a(f̈ − ḟ) + bḟ + cf = af̈ + (b − a)ḟ + cf = 0.    (9.28)

Defining b′ = b − a, we can then solve in the same way as for the homogeneous equations
in Section 9.1 above to give f as a function of t, and then use the substitution t = log x to
obtain the required answer.
Example 5 (Real roots): Solve the following for boundary conditions f(1) = 0 and f′(1) = 1:

2x^2 f″ + 3xf′ − 15f = 0.    (9.29)

We can transform this using t = log x to

2f̈ + (3 − 2)ḟ − 15f = 2f̈ + ḟ − 15f = 0.    (9.30)

The roots are −3 and 5/2, giving a general solution for f(t) of

f(t) = c1 e^{−3t} + c2 e^{(5/2)t}.    (9.31)

We can then substitute t = log x for a general solution in terms of x:

f(x) = c1 e^{−3 log x} + c2 e^{(5/2) log x} = c1 x^{−3} + c2 x^{5/2}.    (9.32)

Applying the boundary condition f(1) = 0 gives c1 + c2 = 0. And for f′(1) = 1 we have

f′(1) = [−3c1 x^{−4} + (5/2)c2 x^{3/2}]|_{x=1}
      = −3c1 + (5/2)c2
      = (3 + 5/2)c2 = 1.    (9.33)

Thus c2 = 2/11 and c1 = −2/11, giving the solution

f(x) = −(2/11)x^{−3} + (2/11)x^{5/2}.    (9.34)

Example 6 (Complex roots): Solve the following for boundary conditions f(1) = 1 and
f′(1) = 2:

x^2 f″ + 3xf′ + 4f = 0.    (9.35)

This is transformed using t = log x to

f̈ + (3 − 1)ḟ + 4f = f̈ + 2ḟ + 4f = 0.    (9.36)

The roots are −1 ± i√3, giving the general solution in terms of t as

f(t) = e^{−t}(c1 cos(√3 t) + c2 sin(√3 t)).    (9.37)

Again, using the substitution t = log x, we obtain the general solution

f(x) = c1 x^{−1} cos(√3 log x) + c2 x^{−1} sin(√3 log x).    (9.38)

As log(1) = 0, the first boundary condition f(1) = 1 gives us

f(1) = [c1 x^{−1} cos(√3 log x) + c2 x^{−1} sin(√3 log x)]|_{x=1}
     = c1 cos(√3 · 0) + c2 sin(√3 · 0) = c1,    (9.39)

so c1 = 1. For the second boundary condition f′(1) = 2 we have

f′(1) = {−(c1/x^2)[cos(√3 log x) + √3 sin(√3 log x)] + (c2/x^2)[√3 cos(√3 log x) − sin(√3 log x)]}|_{x=1}
      = −c1 + √3 c2,    (9.40)

and thus, substituting in c1 = 1, we have √3 c2 − 1 = 2 and so c2 = √3, giving our solution

f(x) = x^{−1} cos(√3 log x) + √3 x^{−1} sin(√3 log x).    (9.41)

Example 7 (Repeated roots): Solve the following for boundary conditions f(1) = 1 and
f′(1) = 5:

x^2 f″ − 7xf′ + 16f = 0.    (9.42)

This is transformed using t = log x to

f̈ − (7 + 1)ḟ + 16f = f̈ − 8ḟ + 16f = 0.    (9.43)

The repeated root is at 4, giving the general solution in terms of t of

f(t) = c1 e^{4t} + c2 te^{4t}.    (9.44)

Using the substitution t = log x we obtain

f(x) = c1 e^{4 log x} + c2 e^{4 log x} log x = c1 x^4 + c2 x^4 log x.    (9.45)

The first boundary condition gives c1 = 1, as log(1) = 0. For the second boundary condition
we have

f′(1) = [4c1 x^3 + 4c2 x^3 log x + c2 x^4/x]|_{x=1} = 4c1 + c2 = 5,    (9.46)

giving c2 = 1, and therefore our solution is

f(x) = x^4 + x^4 log x.    (9.47)

9.3 Non-homogeneous ordinary differential equations

The final set of functions we look at for second order ODEs are functions of the form

af″ + bf′ + cf = p(x).    (9.48)
We solve these using the linearity property of differentiation, i.e. we can split the solution
into two parts, f = fp + fg, and then

afp″ + bfp′ + cfp + afg″ + bfg′ + cfg = p(x) + 0.    (9.49)

We can treat these two problems as separate and solve afg″ + bfg′ + cfg = 0 using the
techniques for finding general solutions in Section 9.1. We also need to find a solution for
afp″ + bfp′ + cfp = p(x), and this depends on the form of p(x). The classes of solutions
for different classes of p(x) are shown in Table 9.1. The use of these solutions is also

Term in p(x)          Choice for fp
ce^{κx}               c1 e^{κx}
cx^n (n = 0, 1, 2, ...)  cn x^n + c_{n−1} x^{n−1} + ... + c0
c sin λx              c1 sin λx + c2 cos λx
c cos λx              c1 sin λx + c2 cos λx
ce^{αx} sin λx        e^{αx}(c1 sin λx + c2 cos λx)
ce^{αx} cos λx        e^{αx}(c1 sin λx + c2 cos λx)

Table 9.1: Selection of solutions for non-homogeneous ordinary differential equations

governed by the following rules:

1. If p(x) is in one of the forms in the left hand column then pick the corresponding
function on the right hand side as your ansatz and equate the terms.

2. If your ansatz is a solution of the general equation then multiply by x for a single
root or x2 for a double root.

3. If p(x) is a combination of the terms in the left hand column of the table then your
ansatz for fp is a linear combination of the corresponding terms in the right hand
column

Example 8: Solve the following for boundary conditions f(0) = 0, f′(0) = 1.5:

f″(x) + f(x) = 0.001x^2.    (9.50)

Splitting this into fg(x) and fp(x) we obtain

fp″(x) + fp(x) + fg″(x) + fg(x) = 0.001x^2 + 0.    (9.51)

Solving this for the general solution first,

fg″(x) + fg(x) = 0.    (9.52)

The roots are ±i, which leads to a general solution of

fg(x) = k1 cos x + k2 sin x,    (9.53)

where we have used constants k1 and k2 to avoid confusion with the constants c1 and c2
used in Table 9.1. From the table we use fp(x) = c2 x^2 + c1 x + c0. We can substitute fp
and fp″(x) = 2c2 back into the original problem in Eq. (9.50) for

2c2 + c2 x^2 + c1 x + c0 = 0.001x^2.    (9.54)

Equating terms we have c2 = 0.001, c1 = 0, c0 = −2c2 = −0.002, for fp = 0.001x^2 − 0.002.


Then the general solution in full is

f (x) = k1 cos x + k2 sin x + 0.001x2 − 0.002. (9.55)

Applying the first boundary condition f (0) = 0 gives k1 = 0.002. Applying the second
boundary condition f 0 (0) = 1.5 gives

f 0 (0) = −k1 sin x + k2 cos x + 0.002x|x=0 = k2 = 1.5. (9.56)

The full solution is thus

f (x) = 0.002 cos x + 1.5 sin x + 0.001x2 − 0.002. (9.57)

Example 9: Solve the following for boundary conditions f(0) = 0, f′(0) = 0:

f″ − 4f′ − 12f = e^{6x}.    (9.58)

Solving this for our general solution gives real roots of −2 and 6, for a general solution
of fg = k1 e^{−2x} + k2 e^{6x}. As the ansatz for fp is c1 e^{6x}, it is the same as one of our general
solutions, so we use the 2nd rule above and multiply the solution by x to give fp = c1 xe^{6x}.
To find c1 we differentiate and put back into the original ODE: f′ = c1 e^{6x} + 6xc1 e^{6x},
f″ = 6c1 e^{6x} + 6c1 e^{6x} + 36xc1 e^{6x} = 12c1 e^{6x} + 36xc1 e^{6x}, so

12c1 e^{6x} + 36xc1 e^{6x} − 4(c1 e^{6x} + 6xc1 e^{6x}) − 12c1 xe^{6x} = 8c1 e^{6x} = e^{6x},    (9.59)

so c1 = 1/8 for a solution of

f = k1 e^{−2x} + k2 e^{6x} + (1/8)xe^{6x}.    (9.60)

The first boundary condition gives k1 + k2 = 0; the second boundary condition gives

f′(0) = [−2k1 e^{−2x} + 6k2 e^{6x} + (1/8)e^{6x} + (6/8)xe^{6x}]|_{x=0} = 8k2 + 1/8 = 0,    (9.61)

so k2 = −1/64 and k1 = 1/64.
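The full non-homogeneous solution can again be confirmed with sympy (an illustrative sketch):

    import sympy as sp

    x = sp.symbols('x')
    f = sp.Function('f')
    ode = sp.Eq(f(x).diff(x, 2) - 4*f(x).diff(x) - 12*f(x), sp.exp(6*x))
    ics = {f(0): 0, f(x).diff(x).subs(x, 0): 0}
    sol = sp.dsolve(ode, f(x), ics=ics)
    print(sp.expand(sol.rhs))  # exp(-2*x)/64 - exp(6*x)/64 + x*exp(6*x)/8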

Chapter 10

Second order partial differential


equations

We are going to look at second order PDEs; as motivating examples we will concentrate
on the wave equation and the heat equation. In both cases, finding a general solution is
quite a simple procedure; the trickier part is to apply the boundary conditions.

10.1 A brief introduction to Fourier transforms

In order to solve 2nd order partial differential equations we need to be aware of Fourier
transforms. Fourier analysis is a huge subject, so we simply present here the basic concepts.
A good reference for further information is Kreyszig.
The basic idea is that a periodic function with a period of 2π can be represented as an
infinite sum of trigonometric functions, i.e.

f(y) = a0 + a1 cos y + b1 sin y + a2 cos 2y + b2 sin 2y + a3 cos 3y + b3 sin 3y + ...    (10.1)
     = a0 + Σ_{n=1}^∞ (an cos ny + bn sin ny).    (10.2)

It can be shown (e.g. Kreyszig) that the Fourier coefficients can be calculated using the
formulae

a0 = (1/2π) ∫_{−π}^{π} f(y) dy,    (10.3)
an = (1/π) ∫_{−π}^{π} f(y) cos ny dy,    (10.4)
bn = (1/π) ∫_{−π}^{π} f(y) sin ny dy.    (10.5)

We can generalise these equations to functions which are periodic over a period of 2l by
using the change of variables y = (π/l)x → dy = (π/l)dx to give

f(x) = a0 + a1 cos(πx/l) + b1 sin(πx/l) + a2 cos(2πx/l) + b2 sin(2πx/l) + ...    (10.6)
     = a0 + Σ_{n=1}^∞ (an cos(nπx/l) + bn sin(nπx/l)).    (10.7)

It can be shown (e.g. Kreyszig) that the Fourier coefficients can be calculated using the
formulae

a0 = (1/2l) ∫_{−l}^{l} f(x) dx,    (10.8)
an = (1/l) ∫_{−l}^{l} f(x) cos(nπx/l) dx,    (10.9)
bn = (1/l) ∫_{−l}^{l} f(x) sin(nπx/l) dx.    (10.10)

Even functions have only cosine components and odd functions have only sine components,
which gives us

a0 = (1/l) ∫_0^l f(x) dx,    (10.11)
an = (2/l) ∫_0^l f(x) cos(nπx/l) dx,    (10.12)
bn = 0    (10.13)

for even functions and

a0 = 0,    (10.14)
an = 0,    (10.15)
bn = (2/l) ∫_0^l f(x) sin(nπx/l) dx    (10.16)

for odd functions. We can also let l → ∞, which changes our summation into an
integral, and we obtain

f(x) = ∫_0^∞ [A(ξ) cos ξx + B(ξ) sin ξx] dξ    (10.17)

with

A(ξ) = (1/π) ∫_{−∞}^{∞} f(x) cos ξx dx,    (10.18)
B(ξ) = (1/π) ∫_{−∞}^{∞} f(x) sin ξx dx.    (10.19)

For completeness we also show how the Fourier transform can be expressed in exponential
form:

f(x) = (1/2) ∫_0^∞ [A(ξ)e^{iξx} + A(ξ)e^{−iξx} − iB(ξ)e^{iξx} + iB(ξ)e^{−iξx}] dξ    (10.20)
     = (1/2) ∫_0^∞ [(A(ξ) − iB(ξ))e^{iξx} + (A(ξ) + iB(ξ))e^{−iξx}] dξ    (10.21)
     = (1/2) ∫_0^∞ [A(ξ) − iB(ξ)]e^{iξx} dξ + (1/2) ∫_{−∞}^0 [A(−ξ) + iB(−ξ)]e^{iξx} dξ.    (10.22)

Combining the expressions for A(·) and B(·) we have

[A(ξ) − iB(ξ)]/2 = (1/2π) ∫_{−∞}^{∞} f(x) cos ξx dx − i(1/2π) ∫_{−∞}^{∞} f(x) sin ξx dx    (10.23)
                 = (1/2π) ∫_{−∞}^{∞} f(x)[cos ξx − i sin ξx] dx = (1/2π) ∫_{−∞}^{∞} f(x)e^{−iξx} dx    (10.24)

and

[A(−ξ) + iB(−ξ)]/2 = (1/2π) ∫_{−∞}^{∞} f(x) cos(−ξx) dx + i(1/2π) ∫_{−∞}^{∞} f(x) sin(−ξx) dx    (10.25)
                   = (1/2π) ∫_{−∞}^{∞} f(x)[cos ξx − i sin ξx] dx = (1/2π) ∫_{−∞}^{∞} f(x)e^{−iξx} dx.    (10.26)

Therefore we can express a function f(x) as

f(x) = ∫_{−∞}^{∞} C(ξ)e^{iξx} dξ,    (10.27)

where

C(ξ) = (1/2π) ∫_{−∞}^{∞} f(x)e^{−iξx} dx.    (10.28)

Notice that the integration bounds on the equation for f(x) are now between ±∞.

10.2 Dirac delta function

An important distribution is the Dirac delta function. Despite the name it is not a function
in the ordinary sense, being known as a distribution or generalised function. It has the
properties

δ(x − a) = ∞ for x = a,  δ(x − a) = 0 for x ≠ a,    (10.29)

and

∫_{−∞}^{∞} δ(x − a) dx = 1.    (10.30)

It also has the following properties that can be significant in some of the areas of quanti-
tative finance that you will come across. Firstly, integrating a function multiplied by the
delta function obtains the value of the function at a single point, i.e.

∫_{−∞}^{∞} δ(x − a)g(x) dx = g(a).    (10.31)

Secondly, we can see that the Fourier transform of δ(x) is therefore equal to 1:

∫_{−∞}^{∞} δ(x)e^{ixξ} dx = 1.    (10.32)

The Dirac delta function is important in mathematical finance as it is used as the limit
of a transition density when the time parameter goes to zero.
A "transition density" is the probability density of the difference between a process at
different times, i.e. the probability that a process will change (or transition) from value
X_{t1} = x to X_{t2} = y between times t1 and t2. In the case of a Wiener process this
transition density is a normal distribution with variance t2 − t1 = τ, i.e.

f(y − x) = (1/√(2πτ)) e^{−(y−x)^2/(2τ)}.    (10.33)

We can see that as the times t2 and t1 get closer and closer together, τ → 0; this means
that the transition density gets narrower until eventually it is a single infinite value at
x = y and zero elsewhere. As it is a probability density function it still retains the
property that integrating over all possible values gives a result of 1.

10.3 Wave equation

The wave equation in 1 dimension is

∂^2 f/∂t^2 = c^2 ∂^2 f/∂x^2.    (10.34)

For our example the equation represents the movement of a string tied at both ends, 0
and l, in the x dimension. The boundary conditions imposed are therefore f(0, t) = 0 and
f(l, t) = 0; we also have an initial displacement function f(x, 0) = g(x).
The first step in solving the wave equation is to separate it into two functions, one only a
function of x and one only a function of t, i.e. f(x, t) = u(x)v(t). We can then insert this
into the PDE in Eq. (10.34) and rearrange as

∂^2 [u(x)v(t)]/∂t^2 = c^2 ∂^2 [u(x)v(t)]/∂x^2    (10.35)
⇒ u(x) d^2 v(t)/dt^2 = c^2 v(t) d^2 u(x)/dx^2    (10.36)
⇒ (d^2 v(t)/dt^2)(1/v(t)) = c^2 (d^2 u(x)/dx^2)(1/u(x)).    (10.37)

As the left hand side of the equation only depends on t and the right hand side of the
equation only depends on x, we can say that both sides are equal to a constant, i.e.

(1/c^2)(d^2 v(t)/dt^2)(1/v(t)) = (d^2 u(x)/dx^2)(1/u(x)) = −p^2    (10.38)

(the reason for the negative sign on p^2 will become apparent). This gives two second order
ordinary differential equations,

d^2 u(x)/dx^2 + p^2 u(x) = 0 and    (10.39)
d^2 v(t)/dt^2 + p^2 c^2 v(t) = 0,    (10.40)

which we can solve using some of the same techniques that we discussed in the previous
lectures. These are homogeneous 2nd order ODEs; however, unlike our previous examples
where we were given the coefficients, in this case we must infer a value (or a range of values)
from the initial conditions. The first, most general question is whether we have a positive
or negative value of p^2, as this will determine whether our roots are real or imaginary. So
in the case of real roots (p^2 < 0), we have the general solution for u(x) of

u(x) = c1 e^{px} + c2 e^{−px}.    (10.41)

We can first separate the initial conditions into f(0, t) = u(0)v(t) = 0 and f(l, t) =
u(l)v(t) = 0, so unless v(t) = 0 ∀t (trivial solution) this gives us u(0) = 0 and
u(l) = 0, and the only way that we could obtain this for Eq. (10.41) is if c1 = c2 = 0 (again
the trivial solution). Therefore we must have imaginary roots (p^2 > 0), giving a general
solution of

u(x) = c1 cos(px) + c2 sin(px).    (10.42)

Solving this for the initial conditions gives c1 = 0, due to u(0) = 0, and so u(l) = c2 sin(pl) =
0, implying that if we wish to avoid the trivial solution of c1 = c2 = 0 then sin(pl) = 0,
i.e. pl = kπ where k is any integer. Therefore we have

u(x) = c2 sin(kπx/l).    (10.43)

We will deal with the value of c2 when we have the full general solution for f(x, t). We can
then solve the other ODE in t. Using the fact that p^2 > 0 and c^2 > 0 by definition
of the problem, we have roots of ±ipc = ±ickπ/l and a general solution of

f(x, t) = u(x)v(t) = [ca cos(ckπt/l) + cb sin(ckπt/l)] sin(kπx/l),    (10.44)

where the constant c2 has been absorbed into ca and cb. As the above Eq. (10.44) holds
for every value of k then, by the linearity property of differentiation, it is also true that
the summation over all values of k can be used, i.e.

f(x, t) = Σ_{k=1}^∞ [cak cos(ckπt/l) + cbk sin(ckπt/l)] sin(kπx/l).    (10.45)

In order to find the values of cak (notice that we have a different value for each value of k)
we use the initial condition f(x, 0) = g(x), for

f(x, 0) = Σ_{k=1}^∞ cak sin(kπx/l) = g(x).    (10.46)

The summation is also a Fourier series. This means that we can find cak using

cak = (2/l) ∫_0^l g(x) sin(kπx/l) dx.    (10.47)

To find cbk we need to consider the first derivative of the initial condition, ∂f(x,t)/∂t|_{t=0}:

∂f(x, t)/∂t|_{t=0} = Σ_{k=1}^∞ [−cak (ckπ/l) sin(ckπt/l) + cbk (ckπ/l) cos(ckπt/l)] sin(kπx/l)|_{t=0}    (10.48)
                  = Σ_{k=1}^∞ cbk (ckπ/l) sin(kπx/l).    (10.49)

Very often ∂f(x,t)/∂t|_{t=0} ≡ 0 (i.e. the string is stationary at t = 0) and then cbk ≡ 0.
However, in the case that the string is moving at t = 0, then using the notation
∂f(x,t)/∂t|_{t=0} = h(x), Eq. (10.49) is a Fourier series and we can find cbk using

cbk = (2/(ckπ)) ∫_0^l h(x) sin(kπx/l) dx.    (10.50)

10.4 Heat equation

The heat equation in one dimension is

∂f/∂t = d^2 ∂^2 f/∂x^2.    (10.51)

We again assume that f(x, t) = u(x)v(t) and separate the variables to obtain two ODEs,
as before:

d^2 u(x)/dx^2 + p^2 u(x) = 0 and    (10.52)
dv(t)/dt + p^2 d^2 v(t) = 0.    (10.53)

Depending on the boundary conditions, there are many different possible solutions to these
equations. However, two of the most common boundary conditions are f(0, t) = u(0)v(t) =
0 and f(l, t) = u(l)v(t) = 0, leading to the boundary conditions u(0) = u(l) = 0, as
before. By similar reasoning to that which we used for the wave equation, we can say that
p^2 must be positive. Therefore the general solution for u(x) is

u(x) = c1 cos(px) + c2 sin(px).    (10.54)

The general solution for v(t) is

v(t) = c3 e^{−p^2 d^2 t}.    (10.55)

This gives a general solution for f(x, t) of

f(x, t) = (c1 cos(px) + c2 sin(px))e^{−p^2 d^2 t},    (10.56)

where c3 has been absorbed into the constants c1 and c2. The boundary conditions for
x = 0, l give c1 = 0; we ignore the possibility of setting c2 = 0 as this gives the trivial
solution of f(x, t) ≡ 0. Therefore we have pk = kπ/l where k is a positive
integer, so we define κk = pk^2 d^2, giving a general solution

f(x, t) = Σ_{k=1}^∞ ck e^{−κk t} sin(kπx/l).    (10.57)

Say we have the initial time condition f(x, 0) = g(x); we have

f(x, 0) = Σ_{k=1}^∞ ck sin(kπx/l),    (10.58)

which is a Fourier series with ck the Fourier coefficients. We can then obtain ck using the
integral

ck = (2/l) ∫_0^l g(x) sin(kπx/l) dx.    (10.59)

Example:
Solve the heat equation when the boundary conditions are f(0, t) = f(l, t) = 0 and
f(x, 0) = g(x) = 1:

ck = (2/l) ∫_0^l sin(kπx/l) dx    (10.60)
   = −(2/l)(l/kπ)[cos(kπx/l)]_0^l    (10.61)
   = (2/kπ)(1 − cos(kπ)).    (10.62)

For even k, cos(kπ) = 1 ⇒ ck = 0. For odd k, cos(kπ) = −1 ⇒ ck = 4/kπ.
Our solution is therefore

f(x, t) = Σ_{n=0}^∞ (4/((2n+1)π)) e^{−κ_{2n+1} t} sin((2n+1)πx/l),    (10.63)

where κ_{2n+1} = [(2n+1)πd/l]^2.
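A partial sum of this series makes the behaviour easy to see numerically (a sketch with assumed values l = 1 and d = 1):

    import numpy as np

    l, d, terms = 1.0, 1.0, 200          # assumed bar length, diffusivity, series length
    x = np.linspace(0.0, l, 101)

    def f(x, t):
        # Partial sum of Eq. (10.63) for the initial condition g(x) = 1
        total = np.zeros_like(x)
        for n in range(terms):
            k = 2*n + 1
            kappa = (k * np.pi * d / l)**2
            total += 4/(k*np.pi) * np.exp(-kappa * t) * np.sin(k*np.pi*x/l)
        return total

    print(f(x, 0.0)[50])    # close to 1 in the interior at t = 0
    print(f(x, 0.1).max())  # the temperature profile decays towards 0 as t grows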
For solutions on an infinitely long bar, solutions can be found using a similar technique
to the one above but replacing the Fourier series with the Fourier transform (see Kreyszig,
for example); however, there are also simpler methods taking advantage of the properties
of the Fourier transform. See Dr Guido Germano's notes for stochastic processes week 1,
for example.

Chapter 11

Linear algebra, vectors and ma-


trices

We introduce some of the key concepts of vectors and matrices and look at ways to solve
systems of linear equations.

11.1 Basics

Vectors and matrices are often written using bold text, with upper case used for matrices
and lower case for vectors, e.g.

A = [a11 a12; a21 a22],  B = [b11 b12 b13; b21 b22 b23],  C = [c11 c12; c21 c22],  x = [x1; x2],    (11.1)

where rows are separated by semicolons. Notice the row, column convention for addressing
the elements and that we drop the column number for vectors.

Addition and subtraction are done on an element wise basis, i.e.

A ± C = [a11 ± c11, a12 ± c12; a21 ± c21, a22 ± c22]    (11.2)

and follow the usual rules of addition:

A + C = C + A    commutativity    (11.3)
(A + C) + D = A + (C + D)    associativity    (11.4)
A + 0 = A    (11.5)
A + (−A) = 0    (11.6)

where D is a 2 × 2 matrix, 0 is a 2 × 2 matrix with all zero entries and −A = [−a11, −a12; −a21, −a22].
Multiplying by a scalar c is done by multiplying every element of a matrix by c, i.e.

cA = [ca11, ca12; ca21, ca22]    (11.7)

and follows familiar rules

c(A + C) = cC + cA (11.8)
(c + k)A = cA + kA (11.9)
c(kA) = (ck)A (11.10)
1A = A (11.11)

where k is another scalar.


Matrix multiplication does not follow the usual multiplication rules and is done as:

AB = [a11 a12; a21 a22] × [b11 b12 b13; b21 b22 b23]
   = [a11 b11 + a12 b21, a11 b12 + a12 b22, a11 b13 + a12 b23;
      a21 b11 + a22 b21, a21 b12 + a22 b22, a21 b13 + a22 b23].    (11.12)

Notice that multiplying an m × n matrix with an n × p matrix we obtain an m × p matrix.


The general element-wise rule for the multiplication of two matrices is as follows: if U is
an m × n matrix and V is an n × p matrix and W = UV, then W is an m × p matrix with
elements

wij = Σ_{k=1}^n uik vkj.    (11.13)

It is important to understand that, unlike for the multiplication of scalars, UV ≠ VU, so
matrix multiplication is not commutative. It does, however, conform to the other following
rules:

(cA)B = A(cB) (11.14)


A(BC) = (AB)C associativity (11.15)
A(B + C) = AB + AC distributivity (11.16)
(B + C)A = BA + CA distributivity (11.17)

The transpose of a matrix, written A^T or A′, is a mirror of its elements along the diagonal,
i.e. using the definitions of A and B above

A^T = [a11 a21; a12 a22],  B^T = [b11 b21; b12 b22; b13 b23].    (11.18)

Combining the transpose with multiplication we have (AB)T = BT AT; notice the change
in order. An important matrix when we look at multiplication is the identity matrix,
commonly written I, or sometimes In to indicate that it is an n × n identity matrix; it
has the property IA = AI = A. For example
 
I2 = [1 0; 0 1]   I4 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1] (11.19)
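These properties are easy to verify numerically. A short NumPy check, with A and B chosen
arbitrarily by us:

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[5.0, 6.0, 7.0], [8.0, 9.0, 0.0]])
    I2 = np.eye(2)

    print(np.allclose((A @ B).T, B.T @ A.T))               # True: (AB)^T = B^T A^T
    print(np.allclose(I2 @ A, A), np.allclose(A @ I2, A))  # True True: IA = AI = A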

11.2 Solving systems of linear equations: Gaussian elimination

An important application of matrices is to solve systems of linear equations. Start with
the example of m linear equations in n variables x1, ..., xn:

a11 x1 + a12 x2 + ... + a1n xn = b1


a21 x1 + a22 x2 + ... + a2n xn = b2
...
am1 x1 + am2 x2 + ... + amn xn = bm (11.20)

where the aij and bi are scalar coefficients. Creating matrices from the coefficients as follows
     
A = [a11 a12 ... a1n; a21 a22 ... a2n; ...; am1 am2 ... amn]   x = [x1; x2; ...; xn]   b = [b1; b2; ...; bm] (11.21)

we can now write the system of linear equations as a single matrix equation

Ax = b. (11.22)

We can solve this equation by a method called Gaussian elimination.


Example 1:

89
Solve the following system of equations

x1 + x2 + x3 = −2
2x1 + 4x2 − 3x3 = 0
2x1 + 2x2 − x3 = 2 (11.23)

First create the matrices A and b


   
A = [1 1 1; 2 4 −3; 2 2 −1]   b = [−2; 0; 2]. (11.24)

The next step is to create the “augmented” matrix Ã, where b is added on the right-hand
side of A:

Ã = [1 1 1 −2; 2 4 −3 0; 2 2 −1 2] (11.25)

The next step is to linearly combine the rows so that you end up with an upper triangular
matrix (i.e. put the system of equations into triangular form). We first subtract 2 × row 1
from row 3 to obtain [2 2 −1 2] − 2 × [1 1 1 −2] = [2 − 2  2 − 2  −1 − 2  2 + 4] = [0 0 −3 6],
so we now have

Ã = [1 1 1 −2; 2 4 −3 0; 0 0 −3 6] (11.26)

The next step is to subtract 2 × row 1 from row 2 to obtain [2 4 −3 0] − 2 × [1 1 1 −2] =
[2 − 2  4 − 2  −3 − 2  0 + 4] = [0 2 −5 4], so we now have

Ã = [1 1 1 −2; 0 2 −5 4; 0 0 −3 6] (11.27)

We can then extract 3 new equations from our augmented matrix and thus obtain a
solution.

−3x3 = 6 → x3 = −2 (11.28)
2x2 − 5x3 = 4 → 2x2 = 4 + 5(−2) = −6 → x2 = −3 (11.29)
x1 + x2 + x3 = −2 → x1 = −2 + 3 + 2 = 3 (11.30)
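The row operations and back substitution above can be automated. Below is a minimal Python
sketch of Gaussian elimination without row pivoting (enough for this example; a robust
implementation would swap rows to avoid dividing by small or zero pivots). The function
name gauss_solve is ours:

    import numpy as np

    def gauss_solve(A, b):
        # Solve Ax = b by Gaussian elimination and back substitution (no pivoting).
        Ab = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])  # augmented matrix
        n = len(b)
        for col in range(n - 1):              # forward elimination
            for row in range(col + 1, n):
                factor = Ab[row, col] / Ab[col, col]
                Ab[row] -= factor * Ab[col]
        x = np.zeros(n)
        for row in range(n - 1, -1, -1):      # back substitution
            x[row] = (Ab[row, -1] - Ab[row, row + 1:n] @ x[row + 1:]) / Ab[row, row]
        return x

    A = np.array([[1, 1, 1], [2, 4, -3], [2, 2, -1]])
    b = np.array([-2, 0, 2])
    print(gauss_solve(A, b))      # [ 3. -3. -2.]
    print(np.linalg.solve(A, b))  # same result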

The system of equations in this problem has a single unique solution. If row reduction of
the augmented matrix instead produced something like

[a11 a12 a13 b1; 0 a22 a23 b2; 0 0 0 0] (11.31)

then there would be infinitely many possible solutions. However, if it produced something
like

[a11 a12 a13 b1; 0 a22 a23 b2; 0 0 0 b3] (11.32)

where b3 ≠ 0, then there would be no possible solutions.

11.3 Matrix rank

A subject closely related to the number of possible solutions is the rank of a matrix. If
we consider a matrix

A = [a11 a12 ... a1n; a21 a22 ... a2n; ...; an1 an2 ... ann] (11.33)
 
then it has rank r, where r ≤ n, if exactly r of its row vectors aj = [aj1 aj2 ... ajn] are
linearly independent of each other. For a1, for example, linear independence means that
there is no set of constants cj such that a1 = c2 a2 + c3 a3 + ... + cn an. For example, for
the following matrix

A = [1 0 4 2; 2 3 0 5; 5 3 12 11] (11.34)

   
The first two row vectors a1 = [1 0 4 2] and a2 = [2 3 0 5] are linearly independent of
each other, but a3 = 3a1 + a2. Therefore the matrix is of rank 2.
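This can be checked numerically; numpy.linalg.matrix_rank computes the rank (via the
singular value decomposition, a topic beyond this primer):

    import numpy as np

    A = np.array([[1, 0, 4, 2],
                  [2, 3, 0, 5],
                  [5, 3, 12, 11]])
    print(np.linalg.matrix_rank(A))            # 2
    print(np.allclose(A[2], 3 * A[0] + A[1]))  # True: a3 = 3*a1 + a2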
It can be shown (see e.g. Kreyszig) that the column rank is the same as the row rank;
therefore, for an n × m matrix with m ≠ n, the maximum possible rank is the smaller
dimension, i.e. min(n, m). The rank has some important implications for the solution of
systems of linear equations. If we have our matrix equation Ax = b as before, then the
following rules apply.

1. Solutions exist if and only if A and Ã have the same rank.

2. If the rank of A is the same as the number of unknowns then there is a single unique
solution.

3. If the rank is less than the number of unknowns then there are infinitely many
solutions.

11.4 Implication of rank for homogeneous systems of equations
A homogeneous system of equations,

a11 x1 + a12 x2 + ... + a1n xn = 0


a21 x1 + a22 x2 + ... + a2n xn = 0
...
am1 x1 + am2 x2 + ... + amn xn = 0 (11.35)

by inspection has the trivial solution xi = 0 ∀ i ∈ {1, 2, ..., n}. If the rank of A is
equal to n then the system of equations has only one solution, and therefore there are
no non-trivial solutions available. Therefore, in order for non-trivial solutions to be
available we must have r < n.

11.5 Matrix inversion and determinants


For an n × n matrix A, its inverse is written A−1 and has the property

AA−1 = A−1 A = In (11.36)

Most of you will have seen the simple equation for the inverse of a 2 × 2 matrix
A−1 = (1/detA) [a22 −a12; −a21 a11] (11.37)


where detA is the determinant of A, written |a11 a12; a21 a22|, which is calculated as
a11 a22 − a12 a21. We can also calculate the inverse of a matrix with n > 2, but the
calculations become much more computationally heavy and complicated.
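Eq. (11.37) is short enough to implement directly; a sketch with the function name inv2x2
chosen by us, checked against numpy.linalg.inv:

    import numpy as np

    def inv2x2(A):
        # Inverse of a 2 x 2 matrix via Eq. (11.37).
        det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
        if det == 0:
            raise ValueError("matrix is singular")
        return np.array([[ A[1, 1], -A[0, 1]],
                         [-A[1, 0],  A[0, 0]]]) / det

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(inv2x2(A))
    print(np.allclose(inv2x2(A) @ A, np.eye(2)))  # True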

11.5.1 Inverse of a product of matrices

Say A and B are square n × n matrices of full rank (we will explain the reason for this
condition later); then

(AB)−1 = B−1 A−1 (11.38)

A sketch proof of this is

ABB−1 A−1 = AIn A−1 = AA−1 = In (11.39)

11.5.2 Gauss–Jordan elimination

We look at a general definition of the inverse of an n × n matrix in the next section.
However, there are techniques which are computationally easier to use. One of the most
well-known is Gauss–Jordan elimination, which uses a similar technique to Gaussian
elimination. Rather than solving a system of linear equations (i.e. Ax = b with unknown
x), we wish to find the solution to the equation AX = I, where I is the identity matrix;
we can see from Eq. (11.36) that the solution to this will give X = A−1. We illustrate
this with the following example (from Kreyszig).
 
Example. Find the inverse of the 3 × 3 matrix A = [−1 1 2; 3 −1 1; −1 3 4].
First create the augmented matrix Ã made up of [A I]:

Ã = [−1 1 2 1 0 0; 3 −1 1 0 1 0; −1 3 4 0 0 1] (11.40)

We first create the upper triangular matrix using Gaussian elimination as before, i.e.
first add 3 × row 1 to row 2 and subtract 1 × row 1 from row 3 to remove the first entries
in those rows, to give

Ã = [−1 1 2 1 0 0; 0 2 7 3 1 0; 0 2 2 −1 0 1]. (11.41)

Then subtract (new) row 2 from (new) row 3 to remove the second entry in the third row, to
give

Ã = [−1 1 2 1 0 0; 0 2 7 3 1 0; 0 0 −5 −4 −1 1]. (11.42)

If this were the usual technique of Gaussian elimination we would stop here; however, we
now carry on to turn the leftmost 3 × 3 matrix in Ã into the identity matrix. First
multiply row 1 by −1, row 2 by 0.5 and row 3 by −0.2 in order to obtain 1's on the
diagonal:

Ã = [1 −1 −2 −1 0 0; 0 1 3.5 1.5 0.5 0; 0 0 1 0.8 0.2 −0.2]. (11.43)

Next remove the entries in the third column of rows 1 and 2 by subtracting 3.5 × row 3
from row 2 and adding 2 × row 3 to row 1, to give

Ã = [1 −1 0 0.6 0.4 −0.4; 0 1 0 −1.3 −0.2 0.7; 0 0 1 0.8 0.2 −0.2]. (11.44)

The final step is to add row 2 to row 1 in order to remove the second entry in row 1 and
finally recover the identity matrix:

Ã = [1 0 0 −0.7 0.2 0.3; 0 1 0 −1.3 −0.2 0.7; 0 0 1 0.8 0.2 −0.2]. (11.45)

The rightmost 3 × 3 matrix in Ã is now the inverse of A, i.e.

A−1 = [−0.7 0.2 0.3; −1.3 −0.2 0.7; 0.8 0.2 −0.2]. (11.46)

This result can be easily confirmed by calculating AA−1 using a high-level programming
language such as MATLAB.
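The same check takes a couple of lines in Python; we also compare with numpy.linalg.inv,
which uses a factorisation-based routine rather than Gauss–Jordan by hand but must return
the same matrix:

    import numpy as np

    A = np.array([[-1.0,  1.0, 2.0],
                  [ 3.0, -1.0, 1.0],
                  [-1.0,  3.0, 4.0]])
    A_inv = np.array([[-0.7,  0.2,  0.3],
                      [-1.3, -0.2,  0.7],
                      [ 0.8,  0.2, -0.2]])

    print(np.allclose(A @ A_inv, np.eye(3)))     # True
    print(np.allclose(np.linalg.inv(A), A_inv))  # True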

11.5.3 General definition of a matrix inverse

The Gauss–Jordan elimination method is a straightforward algorithm to calculate the
inverse of a matrix. However, there are some important properties of matrix inversion, in
particular determinants, that can be better understood if we look at the general definition
of a matrix inverse. In order to do this we need to know about the determinant and
cofactors of a matrix.

We use a 3 × 3 matrix as an example.


 
B = [b11 b12 b13; b21 b22 b23; b31 b32 b33] (11.47)

We first calculate the “minors” of a row or column in a matrix. The minor of a matrix
element is the determinant of the matrix with the corresponding row and column removed.
Using b23 as an example the minor is

M23 = |b11 b12; b31 b32| = b11 b32 − b12 b31 (11.48)

i.e. the determinant of the matrix which is equal to B with the second row and third column
removed. The cofactor of a matrix element bij is equal to Cij = (−1)^(i+j) Mij. Selecting
any row i of the matrix we calculate the determinant as

D = bi1 Ci1 + bi2 Ci2 + bi3 Ci3 (11.49)

with the same result obtained regardless of the row selected. Similarly we can select any
column j and calculate the determinant as

D = b1j C1j + b2j C2j + b3j C3j . (11.50)

This not only gives the same result regardless of the column selected but also gives the
same result as the row calculation.
The inverse of the 3 × 3 matrix B is therefore
 
B−1 = (1/detB) [C11 C21 C31; C12 C22 C32; C13 C23 C33] (11.51)

Notice the swapping of the indices of the cofactors in B−1 relative to the elements of B.

In general the inverse of an n × n matrix A is

A−1 = (1/detA) [C11 C21 C31 ... Cn1; C12 C22 C32 ... Cn2; ...; C1n C2n C3n ... Cnn] (11.52)

where Cij are the cofactors of A and detA is the determinant of A (which is also calculated
using the cofactors). The determinant is calculated as

D = Σ_{i=1}^n aij Cij or (11.53)

D = Σ_{i=1}^n aji Cji (11.54)

where j can be selected as any integer between 1 and n. There are several rules for the
determinants of a matrix (see e.g. Kreyszig), but one of the most important ones for our
purposes is that the determinant does not change if you add a (positive or negative)
multiple of one row to another row. Therefore, for an n × n matrix with rank r < n, there
is at least one row that can be reduced to zero by adding multiples of one or more of the
other rows. We can choose any row to calculate our determinant, and it is clear that if we
selected a row where all entries aij are zero, the determinant would also be zero, and
therefore the inverse, as defined in Eq. (11.52), would be undefined.

This leads to a very important relationship between the rank and the determinant and
inverse of a matrix.

If the rank r of an n × n matrix is less than n, the determinant is zero and the
inverse is undefined.
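Eq. (11.53) translates into a recursive function, expanding along the first row. The cost
grows like n!, so this is for understanding only; in practice determinants are computed via
elimination. A sketch (the function name det_cofactor is ours), applied to a 4 × 4 matrix
built from the rank-deficient rows of the Section 11.3 example:

    import numpy as np

    def det_cofactor(A):
        # Determinant by cofactor expansion along the first row, Eq. (11.53).
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        D = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1, column j
            # with 0-based indices, (-1)**j equals (-1)**(i+j) for the first row (i = 0)
            D += (-1) ** j * A[0, j] * det_cofactor(minor)
        return D

    B = np.array([[1.0, 0.0,  4.0,  2.0],
                  [2.0, 3.0,  0.0,  5.0],
                  [5.0, 3.0, 12.0, 11.0],   # row 3 = 3*row 1 + row 2, so B is rank deficient
                  [0.0, 0.0,  0.0,  1.0]])
    print(det_cofactor(B))   # 0.0
    print(np.linalg.det(B))  # also ~0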

11.6 Eigenvectors and eigenvalues

An important calculation for many applications is to find the eigenvectors and eigenvalues
of a matrix. The eigenvalue problem for a given n × n matrix A is concerned with the
equation

Ax = λx (11.55)

where λ is an unknown scalar and x is a vector of size n. That is, the problem is to find λ
and the corresponding vector x (with x ≠ 0, as x = 0 is the trivial solution) such that
multiplying A by x gives the same result as scaling x by λ. To solve the problem we first
find the eigenvalues; to do this, note that

Ax − λx = (A − λI)x = 0 (11.56)

which can be written as the homogeneous system of equations

(a11 − λ)x1 + a12 x2 + ... + a1n xn = 0


a21 x1 + (a22 − λ)x2 + ... + a2n xn = 0
...
an1 x1 + an2 x2 + ... + (ann − λ)xn = 0. (11.57)

As explained in Section 11.4, in order to have a solution to this other than the trivial
one x = 0, the rank of A − λI must be less than n. As explained in Section 11.5.3, a rank
of less than n directly implies that the determinant is equal to 0, so to find the
eigenvalues we solve the following equation

det(A − λI) = 0. (11.58)

Example: For the 2 × 2 matrix [2 1; 1 2] we have the “characteristic matrix”
[2 − λ 1; 1 2 − λ] and thus a “characteristic polynomial” of

|2 − λ 1; 1 2 − λ| = (2 − λ)^2 − 1 = λ^2 − 4λ + 3 = (λ − 3)(λ − 1) (11.59)

so the eigenvalues are λ1 = 3 and λ2 = 1.


We find the eigenvectors by inserting the eigenvalues into (A − λI)x = 0 and finding the x
which corresponds to each eigenvalue.
Example: For our eigenvalue λ1 = 3 from our matrix [2 1; 1 2] we obtain

[2 − λ 1; 1 2 − λ] [x1; x2] = [−x1 + x2; x1 − x2] = [0; 0] (11.60)

This does not give a unique solution for x as (by definition) (A − λI) is of rank less than
2. However, by selecting a value x1 = 1 we can find an eigenvector [1; 1]. Similarly for
λ2 = 1 we obtain

[x1 + x2; x1 + x2] = [0; 0] (11.61)

and setting x1 = 1 we obtain [1; −1]. Clearly, from the rules of multiplying a matrix by a
scalar, if x is an eigenvector then cx is also an eigenvector, where c is any scalar
constant. Therefore we can arbitrarily scale our eigenvector; it is common to normalise it
so that √(Σ_{i=1}^n xi^2) = 1. So in the case of the two eigenvectors we calculated above
we divide by √(x1^2 + x2^2) = √2 for our eigenvectors to be

[1/√2; 1/√2] for λ1 = 3   and   [1/√2; −1/√2] for λ2 = 1. (11.62)
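numpy.linalg.eig performs both steps at once, returning the eigenvalues and the normalised
eigenvectors as the columns of its second output (possibly in a different order, or with
the signs flipped, since any scalar multiple of an eigenvector is again an eigenvector):

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    eigvals, eigvecs = np.linalg.eig(A)
    print(eigvals)   # [3. 1.]
    print(eigvecs)   # columns are [1, 1]/sqrt(2) and [1, -1]/sqrt(2), up to sign
    # check the defining property Ax = lambda * x for each pair:
    for lam, x in zip(eigvals, eigvecs.T):
        print(np.allclose(A @ x, lam * x))  # True, True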

11.6.1 Complex solutions

It is also possible to have complex solutions to the eigenvalue problem. For example, the
matrix [0 1; −1 0] yields the characteristic polynomial

|0 − λ 1; −1 0 − λ| = λ^2 + 1 (11.63)

which has the solutions λ1 = i and λ2 = −i. Feeding them back into the equation
(A − λI)x = 0 gives

[−i 1; −1 −i] [x1; x2] = [−i x1 + x2; −x1 − i x2] = [0; 0] (11.64)

for λ1 = i, giving a (normalised) eigenvector of [1/√2; i/√2]. Similarly we obtain

[i 1; −1 i] [x1; x2] = [i x1 + x2; −x1 + i x2] = [0; 0] (11.65)

for λ2 = −i, giving a (normalised) eigenvector of [1/√2; −i/√2].

11.6.2 Multiple eigenvectors

So far we have only looked at the situation where there is a single linearly independent
eigenvector for each eigenvalue. However, many different linearly independent eigenvectors
may exist for a single eigenvalue. An extreme example of this is the identity matrix
I = [1 0; 0 1]. Finding the characteristic polynomial yields

|1 − λ 0; 0 1 − λ| = (1 − λ)^2 (11.66)

which has a single repeated root λ = 1. Putting this back into (A − λI)x = 0 yields

[1 − λ 0; 0 1 − λ] [x1; x2] = [0 × x1 + 0 × x2; 0 × x1 + 0 × x2] = [0; 0] (11.67)

which is clearly true for an arbitrary choice of x1 and x2 . In fact this matches what we
already know about the identity matrix which is that

Ix = 1 × x (11.68)

for any vector x.

11.7 Diagonalisation of a matrix and finding matrix powers

Diagonalisation is a powerful technique that allows us to perform various matrix
calculations more efficiently; it makes use of the matrix eigenvectors and eigenvalues. If
an n × n matrix has n linearly independent eigenvectors then we can express it as

A = PDP−1 (11.69)

where D is a diagonal matrix containing the eigenvalues and P contains the eigenvectors as
column vectors. Note that you must have n linearly independent eigenvectors to diagonalise
a matrix. However, P is not necessarily unique when an eigenvalue has more than one
linearly independent eigenvector; in that case the eigenvalue is repeated on the diagonal
of D, once for each of its linearly independent eigenvectors. In order to convince yourself
of this, think about “diagonalising” the identity matrix using XIX−1, where X can be any
invertible matrix.
Not every matrix can be diagonalised; however, in general the following rules apply.

1. If you have n distinct eigenvalues the matrix can be diagonalised.

2. If you have fewer than n distinct eigenvalues then you need n linearly independent
eigenvectors in total in order to be able to diagonalise the matrix.

The diagonalisation of a matrix can be useful for finding the powers of a matrix in a
computationally efficient way. We first recognise that if we have a diagonal matrix D
which we wish to raise to the power p, then we do not need to go through the usual matrix
multiplication procedure; we simply need to calculate

D^p = [d11^p 0 0 ... 0; 0 d22^p 0 ... 0; 0 0 d33^p ... 0; ...; 0 0 0 ... dnn^p]. (11.70)

Furthermore, if A = PDP−1 we can write A^p as

A^p = (PDP−1)^p = PDP−1 PDP−1 PDP−1 ... PDP−1 PDP−1. (11.71)

We know by definition that P−1P = I and so obtain

A^p = (PDP−1)^p = PDIDID...DIDP−1 = PDDD...DDP−1 = PD^p P−1. (11.72)
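Eq. (11.72) in code: diagonalise once, raise the eigenvalues to the power p element-wise,
and transform back. A sketch reusing the matrix from Section 11.6 (the function name
matrix_power_diag is ours):

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    eigvals, P = np.linalg.eig(A)  # columns of P are the eigenvectors

    def matrix_power_diag(P, eigvals, p):
        # A^p = P D^p P^-1, Eq. (11.72); D^p needs only element-wise powers.
        Dp = np.diag(eigvals ** p)
        return P @ Dp @ np.linalg.inv(P)

    print(matrix_power_diag(P, eigvals, 5))
    print(np.linalg.matrix_power(A, 5))  # same result: [[122, 121], [121, 122]]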

11.8 Positive-definite matrices

A common type of matrix is known as a positive-definite (or positive-semidefinite) matrix.
The definition of a positive-definite matrix is that, for any non-zero real column vector
x, the matrix A is positive-definite if

xT Ax > 0. (11.73)

For a positive-semidefinite matrix, > is replaced by ≥. The Hermitian of a matrix is the
conjugate transpose of a complex matrix, i.e. if C = [1 1 + 2i; 2 − 3i 4] then
CH = [1 2 + 3i; 1 − 2i 4]; if a matrix is purely real then the Hermitian is the same as the
transpose. A Hermitian matrix is one for which CH = C.
An alternative definition of a positive-definite matrix is a Hermitian matrix whose
eigenvalues are all positive (in the case of a positive-semidefinite matrix, the
eigenvalues are all non-negative).
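For a real symmetric (or complex Hermitian) matrix the eigenvalue criterion gives a simple
numerical test; numpy.linalg.eigvalsh is the appropriate routine as it returns the (real)
eigenvalues of a Hermitian matrix. A sketch, with the helper name is_positive_definite
chosen by us:

    import numpy as np

    def is_positive_definite(A):
        # Test a Hermitian matrix via its eigenvalues: all must be strictly positive.
        return bool(np.all(np.linalg.eigvalsh(A) > 0))

    A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3 and 1, both positive
    B = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1
    print(is_positive_definite(A))  # True
    print(is_positive_definite(B))  # False
    x = np.array([1.0, -1.0])
    print(x @ B @ x)                # -2.0 < 0, so Eq. (11.73) fails for B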

11.9 Right and left pseudo-inverses

Recall from before that the definition of a true inverse is

AA−1 = A−1 A = In (11.74)

However, it is possible to create so-called pseudo-inverses, which give an identity matrix
when multiplied from the left or right only.
Let A be an m × n matrix, where m < n, which is of rank m. Then we can use the
property of matrix ranks that

Rank(AAT ) = Rank(AT A) = Rank(A). (11.75)

We can also see, by the properties of matrix multiplication, that AAT is an m × m
matrix, which is of rank m according to Eq. (11.75) and thus of full rank and therefore
invertible. Therefore we can say that

AAT (AAT )−1 = Im (11.76)

and so we call AT (AAT )−1 the right-hand pseudo-inverse of A, as it gives the identity
matrix when it multiplies A from the right-hand side.
Similarly, let A be an m × n matrix, where m > n, which is of rank n. Using the
property of matrix ranks in Eq. (11.75), we can see that AT A is an n × n matrix which is
of rank n and thus of full rank and therefore invertible. Therefore we can say that

(AT A)−1 AT A = In (11.77)

and so we call (AT A)−1 AT the left-hand pseudo-inverse of A, as it gives the identity
matrix when it multiplies A from the left-hand side.
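Both pseudo-inverses are a few lines in NumPy. A sketch for full-rank examples of our own
choosing (for rank-deficient matrices one would use numpy.linalg.pinv, which is based on
the singular value decomposition instead):

    import numpy as np

    A = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])            # m = 2 < n = 3, rank 2

    right_pinv = A.T @ np.linalg.inv(A @ A.T)  # n x m right-hand pseudo-inverse
    print(np.allclose(A @ right_pinv, np.eye(2)))  # True: A times it gives I_m

    B = A.T                                    # m = 3 > n = 2, rank 2
    left_pinv = np.linalg.inv(B.T @ B) @ B.T   # n x m left-hand pseudo-inverse
    print(np.allclose(left_pinv @ B, np.eye(2)))   # True: it times B gives I_n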
