
Finance

Jonathan Block

April 1, 2008


Information for the class

Office: DRL3E2-A

Telephone: 215-898-8468

Office Hours: Tuesday 1:30-2:30, Thursday 1:30-2:30.

Email: blockj@math.upenn.edu

References:

1. Financial Calculus: An Introduction to Derivative Pricing, by Martin Baxter and Andrew Rennie.

2. The Mathematics of Financial Derivatives: A Student Introduction, by Wilmott, Howison and Dewynne.

3. A Random Walk Down Wall Street, by Malkiel.

4. Options, Futures and Other Derivatives, by Hull.

5. Black-Scholes and Beyond: Option Pricing Models, by Chriss.

6. Dynamic Asset Pricing Theory, by Duffie.

I prefer to use my own lecture notes, which cover exactly the topics that I want. I like each of the books above very much; a few remarks on each follow.

1. Baxter and Rennie: Does a great job of explaining things, especially in discrete time.

2. Hull: More a book in straight finance, which is what it is intended to be. Not much math, but it explains the financial aspects very well. Go here for details about financial matters.

3. Duffie: This is a full-fledged introduction to continuous-time finance for those with a background in measure-theoretic probability theory. Too advanced for this course, but you might want to see how our course compares to a PhD-level course in this material.

4. Wilmott, Howison and Dewynne: Immediately translates the issues into PDE. It is really a book on PDE and doesn't touch much on the probabilistic underpinnings of the subject.

Class grade will be based on homework and a take-home final. I will try to give homework every week and you will have a week to hand it in.


Syllabus

1. Probability theory. The following material will not be covered in class. I am assuming familiarity with this material (from Stat 430). I will hand out notes regarding this material for those of you who are rusty, or for those of you who have not taken a probability course but think that you can become comfortable with this material.

(a) Probability spaces and random variables.

(b) Basic probability distributions.

(c) Expectation and variance, moments.

(d) Bivariate distributions.

(e) Conditional probability.

2. Derivatives.

(a) What is a derivative security?

(b) Types of derivatives.

(c) The basic problem: How much should I pay for an option? Fair price.

(d) Expectation pricing. Einstein and Bachelier: what they knew about derivative pricing, and what they didn't know.

(e) Arbitrage and no-arbitrage. The simple case of futures. Arbitrage arguments.

(f) The arbitrage theorem.

(g) Arbitrage pricing and hedging.

3. Discrete time stochastic processes and pricing models.

(a) Binomial methods without much math. Arbitrage and reassigning probabilities.

(b) A first look at martingales.

(c) Stochastic processes, discrete in time.

(d) Conditional expectations.

(e) Random walks.

(f) Change of probabilities.

(g) Martingales.

(h) Martingale representation theorem.

(i) Pricing a derivative and hedging portfolios.

(j) Martingale approach to dynamic asset allocation.

4. Continuous time processes. Their connection to PDE.

(a) Wiener processes.

(b) Stochastic integration.

(c) Stochastic differential equations and Ito's lemma.

(d) Black-Scholes model.

(e) Derivation of the Black-Scholes partial differential equation.

(f) Solving the Black-Scholes equation. Comparison with the martingale method.

(g) Optimal portfolio selection.

5. Finer structure of financial time series.


Chapter 1

Basic Probability

The basic concept in probability theory is that of a random variable. A random variable is a function of the basic outcomes in a probability space. To define a probability space one needs three ingredients:

1. A sample space, that is, a set S of "outcomes" for some experiment. This is the set of all "basic" things that can happen. This set can be a discrete set (such as the set of 5-card poker hands, or the possible outcomes of rolling two dice) or it can be a continuous set (such as an interval of the real number line for measuring temperature, etc.).

2. A sigma-algebra (or σ-algebra) of subsets of S. This means a set Ω of subsets of S (so Ω itself is a subset of the power set of S, the set of all subsets of S) that contains the empty set, contains S itself, and is closed under countable intersections and countable unions. That is, when E_j ∈ Ω for j = 1, 2, 3, . . . is a sequence of subsets in Ω, then

   ∪_{j=1}^{∞} E_j ∈ Ω   and   ∩_{j=1}^{∞} E_j ∈ Ω

   In addition, if E ∈ Ω then its complement is too, that is, we assume

   E^c ∈ Ω


The above definition is not central to our approach to probability, but it is an essential part of the measure-theoretic foundations of probability theory. So we include the definition for completeness and in case you come across the term in the future.

When the basic set S is finite (or countably infinite), then Ω is often taken to be all subsets of S. As a technical point (that you should probably ignore), when S is a continuous subset of the real line this is not possible, and one usually restricts attention to the set of subsets that can be obtained by starting with all open intervals and taking countable intersections and unions of them – the so-called Borel sets (or, more generally, the Lebesgue-measurable subsets of S). Ω is the collection of sets to which we will assign probabilities, and you should think of them as the possible events. In fact, we will most often call them events.

3. A function P from Ω to the real numbers that assigns probabilities to events. The function P must have the properties that:

   (a) P(S) = 1, P(∅) = 0.

   (b) 0 ≤ P(A) ≤ 1 for every A ∈ Ω.

   (c) If A_i, i = 1, 2, 3, . . . is a countable (finite or infinite) collection of disjoint sets (i.e., A_i ∩ A_j = ∅ for all i different from j), then

       P(∪_i A_i) = Σ_i P(A_i).

These axioms imply that if A^c is the complement of A, then P(A^c) = 1 − P(A), and the principle of inclusion and exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B), even if A and B are not disjoint.

General probability spaces are a bit abstract and can be hard to deal with. One of the purposes of random variables is to bring things back to something we are more familiar with. As mentioned above, a random variable on a probability space (S, Ω, P) is a function X that to every outcome s ∈ S gives a real number X(s) ∈ R. For a subset A ⊂ R let's define X^{−1}(A) to be the following subset of S:

X^{−1}(A) = {s ∈ S | X(s) ∈ A}

It may take a while to get used to what X^{−1}(A) means, but do not think of X^{−1} as a function. In order for X^{−1} to be a function we would need X to be one-to-one, which we are not assuming. X^{−1}(A) is just a subset of S as defined above. The point of this definition is that if A ⊂ R then X^{−1}(A) ⊂ S; that is, X^{−1}(A) is the event that X of the outcome will be in A. If for an interval A = (a, b) ⊂ R we have X^{−1}(A) ∈ Ω (this is a technical assumption on X, called measurability, and should really be part of the definition of a random variable), then we can define

P_X(A) = P(X^{−1}(A)).

Proposition 1.0.1. P_X is a probability function on R.

So we have in some sense transferred the probability function to the real line. P_X is called the distribution of the random variable X.

In practice, it is the distribution (or its density, introduced below) that we deal with, and mention of (S, Ω, P) is omitted; they are lurking somewhere in the background.

1.0.1 Discrete Random Variables

These take on only isolated (discrete) values, such as when you are counting something. Usually, but not always, the values of a discrete random variable X are (a subset of) the integers, and we can assign a probability to any subset of the sample space as soon as we know the probability of any set containing one element, i.e., P({X = k}) for all k. It will be useful to write p(k) = P({X = k}), and to set p(k) = 0 for all numbers not in the sample space. p(k) is called the probability density function (or pdf for short) of X. We repeat: for discrete random variables, the value p(k) represents the probability that the event {X = k} occurs. So any function from the integers to the (real) interval [0, 1] that has the property that

Σ_{k=−∞}^{∞} p(k) = 1

defines a discrete probability distribution.

1.0.2 Finite Discrete Random Variables

1. Uniform distribution – This is often encountered, e.g., in coin flips, rolls of a single die, and other games of chance: S is the set of whole numbers from a to b (this is a set with b − a + 1 elements!), Ω is the set of all subsets of S, and P is defined by giving its values on all sets consisting of one element each (since then the rule for disjoint unions takes over to calculate the probability on other sets). "Uniform" means that the same value is assigned to each one-element set. Since P(S) = 1, the value that must be assigned to each one-element set is 1/(b − a + 1).

   For example, the possible outcomes of rolling one die are {1}, {2}, {3}, {4}, {5} and {6}. Each of these outcomes has the same probability, namely 1/6. We can express this by making a table, or by specifying a function f(k) = 1/6 for all k = 1, 2, 3, 4, 5, 6 and f(k) = 0 otherwise. Using the disjoint-union rule, we find for example that P({1, 2, 5}) = 1/2, P({2, 3}) = 1/3, etc.
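The die example can be checked directly. The notes contain no code, so the following is just an illustrative sketch: assign probability 1/6 to each singleton and recover the probability of any event by the disjoint-union rule.

```python
# A minimal sketch of the die example (not from the notes): assign
# probability 1/6 to each one-element set and use the disjoint-union
# rule to compute the probability of any event.
from fractions import Fraction

S = range(1, 7)                        # sample space of one die
p = {k: Fraction(1, 6) for k in S}     # uniform probability function

def P(event):
    """Probability of an event (a subset of S) via the disjoint-union rule."""
    return sum(p.get(k, Fraction(0)) for k in event)

print(P({1, 2, 5}))   # prints 1/2
print(P({2, 3}))      # prints 1/3
```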

2. Binomial distribution – Flip n fair coins; how many come up heads? I.e., what is the probability that k of them come up heads? Or sample a population that favors the Democrat over the Republican 60 percent to 40 percent: what is the probability that in a sample of size 100, more than 45 will favor the Republican?

   The sample space S is {0, 1, 2, . . . , n}, since these are the possible outcomes (number of heads, or number of people favoring the Republican [n = 100 in this case]). As before, the sigma-algebra Ω is the set of all subsets of S. The function P is more interesting this time:

   P({k}) = C(n, k) / 2^n

   where C(n, k) is the binomial coefficient

   C(n, k) = n! / (k!(n − k)!)

   which equals the number of subsets of an n-element set that have exactly k elements.
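As a quick sanity check (a sketch, not course code; n = 4 is an arbitrary choice), the fair-coin binomial probabilities can be tabulated exactly and verified to sum to 1.

```python
# Hedged sketch: the fair-coin binomial probabilities P({k}) = C(n, k)/2^n,
# built with math.comb, with a check that they sum to 1 as any probability
# function must. n = 4 is an arbitrary illustration.
from math import comb
from fractions import Fraction

n = 4
pmf = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}

print(pmf[2])              # prints 3/8: six of the 16 outcomes have 2 heads
print(sum(pmf.values()))   # prints 1
```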

1.0.3 Infinite Discrete Random Variables

3. Poisson distribution (with parameter α) – This arises as the number of (random) events of some kind (such as people lining up at a bank, or Geiger-counter clicks, or telephone calls arriving) per unit time. The sample space S is the set of all nonnegative integers S = {0, 1, 2, 3, . . .}, and again Ω is the set of all subsets of S. The probability function on Ω is derived from:

   P({k}) = e^{−α} α^k / k!

   Note that this is an honest probability function, since we will have

   P(S) = P(∪_{k=0}^{∞} {k}) = Σ_{k=0}^{∞} e^{−α} α^k / k! = e^{−α} Σ_{k=0}^{∞} α^k / k! = e^{−α} e^{α} = 1
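The telescoping above can be confirmed numerically. This is a sketch rather than course code; the value α = 3.7 is arbitrary, and 100 terms are far more than needed for the tail to be negligible.

```python
# Numerical sanity check (an illustrative sketch): the Poisson weights
# e^{-alpha} alpha^k / k! over k = 0, 1, 2, ... sum to 1, mirroring the
# Taylor series of e^alpha. alpha = 3.7 is an arbitrary choice.
from math import exp, factorial

alpha = 3.7
total = sum(exp(-alpha) * alpha**k / factorial(k) for k in range(100))
print(abs(total - 1.0) < 1e-12)   # prints True
```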

1.0.4 Continuous Random Variables

A continuous random variable can take on any real value, such as when you are measuring something. Usually the values are (a subset of) the reals, and for technical reasons we can only assign a probability to certain subsets of the sample space (but there are a lot of them). These subsets, either the collection of Borel sets (sets that can be obtained by taking countable unions and intersections of intervals) or the Lebesgue-measurable sets (the Borel sets plus some other exotic sets), comprise the set Ω. As soon as we know the probability of any interval, i.e., P([a, b]) for all a and b, we can calculate the probability of any Borel set. Recall that in the discrete case, the probabilities were determined once we knew the probabilities of individual outcomes; that is, P was determined by p(k). In the continuous case, the probability of a single outcome is always 0: P({x}) = 0. However, it is enough to know the probabilities of "very small" intervals of the form [x, x + dx], and we can calculate continuous probabilities as integrals of "probability density functions", so-called pdf's. We think of dx as a very small number and then take the limit as it goes to 0. Thus, in the continuous case, the pdf is

p(x) = lim_{dx→0} (1/dx) P([x, x + dx])

so that

P([a, b]) = ∫_a^b p(x) dx      (1.0.1)

This is really the defining equation of the continuous pdf, and I cannot stress too strongly how much you need to use this. Any function p(x) that takes on only nonnegative values (they don't have to be between 0 and 1, though) and whose integral over the whole sample space (we can use the whole real line if we assign the value p(x) = 0 for points x outside the sample space) is equal to 1 can be a pdf. In this case, we have (for small dx) that p(x) dx represents (approximately) the probability of the set (interval) [x, x + dx] (with error that goes to zero faster than dx does). More generally, the probability of the set (interval) [a, b] is:

P([a, b]) = ∫_a^b p(x) dx

So any nonnegative function on the real numbers that has the property that

∫_{−∞}^{∞} p(x) dx = 1

defines a continuous probability distribution.

Uniform distribution

As with the discrete uniform distribution, the variable takes on values in some interval [a, b] and all values are equally likely. In other words, all small intervals [x, x + dx] are equally likely as long as dx is fixed and only x varies. That means that p(x) should be a constant for x between a and b, and zero outside the interval [a, b]. What constant? Well, to have the integral of p(x) come out to be 1, we need the constant to be 1/(b − a). It is easy to calculate that if a < r < s < b, then

P([r, s]) = (s − r)/(b − a)

The Normal Distribution (or Gaussian distribution)

This is the most important probability distribution, because the distribution of the average of the results of repeated experiments always approaches a normal distribution (this is the "central limit theorem"). The sample space for the normal distribution is always the entire real line. But to begin, we need to calculate an integral:

I = ∫_{−∞}^{∞} e^{−x²} dx

We will use a trick that goes back (at least) to Liouville. Note that

I² = (∫_{−∞}^{∞} e^{−x²} dx)(∫_{−∞}^{∞} e^{−x²} dx)     (1.0.2)
   = (∫_{−∞}^{∞} e^{−x²} dx)(∫_{−∞}^{∞} e^{−y²} dy)     (1.0.3)
   = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−x²−y²} dx dy              (1.0.4)

because we can certainly change the name of the variable in the second integral, and then we can convert the product of single integrals into a double integral. Now (the critical step), we'll evaluate the integral in polar coordinates (!!) – note that over the whole plane, r goes from 0 to infinity as θ goes from 0 to 2π, and dx dy becomes r dr dθ:

I² = ∫_0^{2π} ∫_0^{∞} e^{−r²} r dr dθ = ∫_0^{2π} [−(1/2) e^{−r²}]_0^{∞} dθ = ∫_0^{2π} (1/2) dθ = π

Therefore, I = √π. We need to arrange things so that the integral is 1, and for reasons that will become apparent later, we arrange this as follows: define

N(x) = (1/√(2π)) e^{−x²/2}

Then N(x) defines a probability distribution, called the standard normal distribution. More generally, we define the normal distribution with parameters µ and σ to be

p(x) = (1/σ) N((x − µ)/σ) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}
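Both computations above can be confirmed numerically. The following is just a sketch using an ad hoc midpoint rule (not course code); truncating the integrals to [−10, 10] is harmless since the tails are of order e^{−100}.

```python
# Numerical sanity check (an illustrative sketch): the Gaussian integral
# equals sqrt(pi), and the standard normal density N(x) integrates to 1.
from math import exp, pi, sqrt

def midpoint(f, a, b, n=200_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

I = midpoint(lambda x: exp(-x * x), -10.0, 10.0)
N_total = midpoint(lambda x: exp(-x * x / 2) / sqrt(2 * pi), -10.0, 10.0)

print(abs(I - sqrt(pi)) < 1e-6)     # prints True
print(abs(N_total - 1.0) < 1e-6)    # prints True
```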

Exponential distribution (with parameter α)

This arises when measuring waiting times until an event, or time-to-failure in reliability studies. For this distribution, the sample space is the positive part of the real line, [0, ∞) (or we can just let p(x) = 0 for x < 0). The probability density function is given by p(x) = αe^{−αx}. It is easy to check that the integral of p(x) from 0 to infinity is equal to 1, so p(x) defines a bona fide probability density function.
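The "easy to check" normalization can also be verified numerically; this is a sketch in which α = 2 and the truncation point 20 are arbitrary choices (e^{−40} is negligible).

```python
# Sketch: the exponential density p(x) = alpha * e^{-alpha x} integrates
# to 1 over [0, infinity). Midpoint rule on [0, 20] with alpha = 2; both
# values are arbitrary illustration choices.
from math import exp

alpha = 2.0
n, T = 200_000, 20.0
h = T / n
total = h * sum(alpha * exp(-alpha * (i + 0.5) * h) for i in range(n))
print(abs(total - 1.0) < 1e-6)   # prints True
```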


1.1 Expectation of a Random Variable

The expectation of a random variable is essentially the average value it is expected to take on. Therefore, it is calculated as the weighted average of the possible outcomes of the random variable, where the weights are just the probabilities of the outcomes. As a trivial example, consider the (discrete) random variable X whose sample space is the set {1, 2, 3} with probability function given by p(1) = 0.3, p(2) = 0.1 and p(3) = 0.6. If we repeated this experiment 100 times, we would expect to get about 30 occurrences of X = 1, 10 of X = 2 and 60 of X = 3. The average X would then be ((30)(1) + (10)(2) + (60)(3))/100 = 2.3. In other words, (1)(0.3) + (2)(0.1) + (3)(0.6). This reasoning leads to the defining formula:

E(X) = Σ_{x∈S} x p(x)

for any discrete random variable. The notation E(X) for the expectation of X is standard; also in use is the notation ⟨X⟩.

For continuous random variables, the situation is similar, except the sum is replaced by an integral (think of summing up the average values of x by dividing the sample space into small intervals [x, x + dx] and calculating the probability p(x) dx that X falls into the interval). By reasoning similar to the previous paragraph, the expectation should be

E(X) = lim_{dx→0} Σ_{x∈S} x p(x) dx = ∫_S x p(x) dx.

This is the formula for the expectation of a continuous random variable.

Example 1.1.1.

1. Uniform discrete: We'll need to use the formula for Σ k that you learned in freshman calculus when you evaluated Riemann sums:

   E(X) = Σ_{k=a}^{b} k · 1/(b − a + 1)                                (1.1.1)
        = (1/(b − a + 1)) (Σ_{k=0}^{b} k − Σ_{k=0}^{a−1} k)            (1.1.2)
        = (1/(b − a + 1)) (b(b + 1)/2 − (a − 1)a/2)                    (1.1.3)
        = (1/2) (1/(b − a + 1)) (b² + b − a² + a)                      (1.1.4)
        = (1/2) (1/(b − a + 1)) (b − a + 1)(b + a)                     (1.1.5)
        = (b + a)/2                                                    (1.1.6)

   which is what we should have expected from the uniform distribution.
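The derivation can be checked on a concrete case. This is just a sketch (the endpoints a = 3, b = 11 are arbitrary), computing the sum in (1.1.1) exactly and comparing with (b + a)/2.

```python
# A quick exact check of E(X) = (b + a)/2 for the discrete uniform
# distribution on {a, ..., b}; a = 3 and b = 11 are arbitrary choices.
from fractions import Fraction

a, b = 3, 11
E = sum(Fraction(k, b - a + 1) for k in range(a, b + 1))
print(E)   # prints 7, which equals (b + a)/2
```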

2. Uniform continuous: (We expect to get (b + a)/2 again, right?) This is easier:

   E(X) = ∫_a^b x · 1/(b − a) dx                                       (1.1.7)
        = (1/(b − a)) · (b² − a²)/2 = (b + a)/2                        (1.1.8)

3. Poisson distribution with parameter α: Before we do this, recall the Taylor series formula for the exponential function:

   e^x = Σ_{k=0}^{∞} x^k / k!

   Note that we can take the derivative of both sides to get the formula:

   e^x = Σ_{k=0}^{∞} k x^{k−1} / k!

   If we multiply both sides of this formula by x we get

   x e^x = Σ_{k=0}^{∞} k x^k / k!


We will use this formula with x replaced by α.

If X is a discrete random variable with a Poisson distribution, then its expectation is:

E(X) = Σ_{k=0}^{∞} k e^{−α} α^k / k! = e^{−α} (Σ_{k=0}^{∞} k α^k / k!) = e^{−α} (α e^{α}) = α.
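A numerical check of E(X) = α (a sketch; α = 2.5 is an arbitrary choice, and 100 terms are far more than enough for the tail to vanish):

```python
# Sketch: check E(X) = alpha for the Poisson distribution numerically.
# alpha = 2.5 is an arbitrary illustration choice.
from math import exp, factorial

alpha = 2.5
E = sum(k * exp(-alpha) * alpha**k / factorial(k) for k in range(100))
print(abs(E - alpha) < 1e-12)   # prints True
```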

4. Exponential distribution with parameter α: This is a little like the Poisson calculation (with improper integrals instead of series), and we will have to integrate by parts (we'll use u = x so du = dx, and dv = e^{−αx} dx so that v will be −(1/α) e^{−αx}):

   E(X) = ∫_0^{∞} x α e^{−αx} dx = α [−(1/α) x e^{−αx} − (1/α²) e^{−αx}]_0^{∞} = 1/α.

   Note the difference between the expectations of the Poisson and exponential distributions!!

5. By the symmetry of the respective distributions around their "centers", it is pretty easy to conclude that the expectation of the binomial distribution (with parameter n) is n/2, and the expectation of the normal distribution (with parameters µ and σ) is µ.

6. Geometric random variable (with parameter r): It is defined on the sample space {0, 1, 2, 3, . . .} of non-negative integers and, given a fixed value of r between 0 and 1, has probability function given by

   p(k) = (1 − r) r^k

   The geometric series formula

   Σ_{k=0}^{∞} a r^k = a/(1 − r)

   shows that p(k) is a probability function. We can compute the expectation of a random variable X having a geometric distribution with parameter r as follows (the trick is reminiscent of what we used on the Poisson distribution):

   E(X) = Σ_{k=0}^{∞} k (1 − r) r^k = (1 − r) r Σ_{k=0}^{∞} k r^{k−1}
        = (1 − r) r (d/dr) Σ_{k=0}^{∞} r^k = (1 − r) r (d/dr) 1/(1 − r) = r/(1 − r)
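The differentiation trick can be checked against a long partial sum. This is a sketch with r = 1/3 (an arbitrary choice), for which r/(1 − r) = 1/2; the truncated tail is of order (1/3)^100.

```python
# Exact partial-sum check of E(X) = r/(1 - r) for the geometric
# distribution; r = 1/3 is an arbitrary illustration choice.
from fractions import Fraction

r = Fraction(1, 3)
E_partial = sum(k * (1 - r) * r**k for k in range(100))
print(abs(float(E_partial) - float(r / (1 - r))) < 1e-12)   # prints True
```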

7. Another distribution, defined on the positive integers {1, 2, 3, . . .} by the function p(k) = 6/(π² k²). This is a probability function based on the famous formula due to Euler:

   Σ_{k=1}^{∞} 1/k² = π²/6

   But an interesting thing about this distribution concerns its expectation: because the harmonic series diverges to infinity, we have that

   E(X) = (6/π²) Σ_{k=1}^{∞} k/k² = (6/π²) Σ_{k=1}^{∞} 1/k = ∞

   So the expectation of this random variable is infinite. (Can you interpret this in terms of an experiment whose outcome is a random variable with this distribution?)

If X is a random variable (discrete or continuous), it is possible to define related random variables by taking various functions of X, say X² or sin(X) or whatever.

If Y = f(X) for some function f, then the probability of Y being in some set A is defined to be the probability of X being in the set f^{−1}(A).

As an example, consider an exponentially distributed random variable X with parameter α = 1. Let Y = X². Since X can only be positive, the probability that Y is in the interval [a, b] is the same as the probability that X is in the interval [√a, √b].

We can calculate the probability density function p(y) of Y by recalling that the probability that Y is in the interval [0, y] (actually (−∞, y]) is the integral of p(y) from 0 to y. In other words, p(y) is the derivative of the function h(y), where h(y) = the probability that Y is in the interval [0, y]. But h(y) is the same as the probability that X is in the interval [0, √y]. We calculate:

p(y) = (d/dy) ∫_0^{√y} e^{−x} dx = (d/dy) [−e^{−x}]_0^{√y} = (d/dy) (1 − e^{−√y}) = (1/(2√y)) e^{−√y}

There are two ways to calculate the expectation of Y. The first is obvious: we can integrate y p(y). The other is to make the change of variables y = x² in this integral, which will yield (check this!) that the expectation of Y = X² is

E(Y) = E(X²) = ∫_0^{∞} x² e^{−x} dx

Theorem 1.1.2. (Law of the unconscious statistician) If f(x) is any function, then the expectation of the function f(X) of the random variable X is

E(f(X)) = ∫ f(x) p(x) dx   or   Σ_x f(x) p(x)

where p(x) is the probability density function of X if X is continuous, or the probability function of X if X is discrete.

1.2 Variance and higher moments

We are now led more or less naturally to a discussion of the "moments" of a random variable.

The r-th moment of X is defined to be the expected value of X^r. In particular, the first moment of X is its expectation. If s > r, then having a finite s-th moment is a more restrictive condition than having an r-th one (this is a convergence issue as x approaches infinity, since x^s > x^r for large values of x).

A more useful set of moments is called the set of central moments. These are defined to be the r-th moments of the variable X − E(X). In particular, the second moment of X − E(X) is called the variance of X and is denoted σ²(X). (Its square root is the standard deviation.) It is a crude measure of the extent to which the distribution of X is spread out from its expectation value. It is a useful exercise to work out that

σ²(X) = E((X − E(X))²) = E(X²) − (E(X))².
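The identity is easy to confirm on a concrete case; the following sketch reuses the three-point distribution from the expectation section (p(1) = 0.3, p(2) = 0.1, p(3) = 0.6).

```python
# Sketch: verify sigma^2(X) = E((X - E(X))^2) = E(X^2) - (E(X))^2
# on a small discrete distribution.
p = {1: 0.3, 2: 0.1, 3: 0.6}

E = sum(x * q for x, q in p.items())                        # 2.3
var_central = sum((x - E) ** 2 * q for x, q in p.items())   # E((X - E X)^2)
var_moments = sum(x * x * q for x, q in p.items()) - E**2   # E(X^2) - (E X)^2

print(abs(var_central - var_moments) < 1e-9)   # prints True
```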

As an example, we compute the variance of the uniform and exponential distributions:

1. Uniform discrete: If X has a discrete uniform distribution on the interval [a, b], then recall that E(X) = (b + a)/2. We calculate Var(X) as follows:

   Var(X) = E(X²) − ((b + a)/2)² = Σ_{k=a}^{b} k²/(b − a + 1) − ((b + a)/2)² = (1/12)(b − a + 2)(b − a).


2. Uniform continuous: If X has a continuous uniform distribution on the interval [a, b], then its variance is calculated as follows:

   σ²(X) = ∫_a^b x²/(b − a) dx − ((b + a)/2)² = (1/12)(b − a)².

3. Exponential with parameter α: If X is exponentially distributed with parameter α, recall that E(X) = 1/α. Thus:

   Var(X) = ∫_0^{∞} x² α e^{−αx} dx − 1/α² = 1/α²

   The variance decreases as α increases; this agrees with intuition gained from the graphs of exponential distributions shown last time.

1.3 Bivariate distributions

It is often the case that a random experiment yields several measurements (we can think of the collection of measurements as a random vector) – one simple example would be the numbers on the top of two thrown dice. When there are several numbers like this, it is common to consider each as a random variable in its own right, and to form the joint probability density (or joint probability) function p(x, y, z, . . .). For example, in the case of two dice, X and Y are discrete random variables each with sample space S = {1, 2, 3, 4, 5, 6}, and p(x, y) = 1/36 for each (x, y) in the cartesian product S × S. More generally, we can consider discrete probability functions p(x, y) on sets of the form S × T, where X ranges over S and Y over T. Any function p(x, y) such that p(x, y) is between 0 and 1 for all (x, y) and such that

Σ_{x∈S} Σ_{y∈T} p(x, y) = 1

defines a (joint) probability function on S × T.

For continuous variables, the idea is the same. p(x, y) is the joint pdf of X and Y if p(x, y) is non-negative and satisfies

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1


We will use the following example of a joint pdf throughout the rest of this section:

X and Y will be random variables that can take on values (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2). The joint pdf of X and Y will be given by p(x, y) = 1/(2x) if (x, y) is in the triangle and 0 otherwise. To see that this is a probability density function, we need to integrate p(x, y) over the triangle and get 1:

∫_0^2 ∫_0^x (1/(2x)) dy dx = ∫_0^2 [y/(2x)]_0^x dx = ∫_0^2 (1/2) dx = 1.

For discrete variables, we let p(i, j) be the probability that X = i and Y = j, i.e., P(X = i ∩ Y = j). This gives a function p, the joint probability function of X and Y, that is defined on (some subset of) the set of pairs of integers and is such that 0 ≤ p(i, j) ≤ 1 for all i and j and

Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} p(i, j) = 1

When we find it convenient to do so, we will set p(i, j) = 0 for all i and j outside the domain we are considering.

For continuous variables, we define the joint probability density function p(x, y) on (some subset of) the plane of pairs of real numbers. We interpret the function as follows: p(x, y) dx dy is (approximately) the probability that X is between x and x + dx and Y is between y and y + dy (with error that goes to zero faster than dx and dy as they both go to zero). Thus, p(x, y) must be a non-negative valued function with the property that

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1

As with discrete variables, if our random variables always lie in some subset of the plane, we will define p(x, y) to be 0 for all (x, y) outside that subset.

We take one simple example of each kind of random variable. For the discrete random variable, we consider the roll of a pair of dice. We assume that we can tell the dice apart, so there are thirty-six possible outcomes and each is equally likely. Thus our joint probability function will be

p(i, j) = 1/36 if 1 ≤ i ≤ 6, 1 ≤ j ≤ 6

and p(i, j) = 0 otherwise.

For our continuous example, we take the example mentioned at the end of the last lecture:

p(x, y) = 1/(2x)

for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise. We checked last time that this is a probability density function (its integral is 1).

1.4 Marginal distributions

Often when confronted with the joint probability of two random variables, we wish to restrict our attention to the value of just one or the other. We can calculate the probability distribution of each variable separately in a straightforward way, if we simply remember how to interpret probability functions. These separated probability distributions are called the marginal distributions of the respective individual random variables.

Given the joint probability function p(i, j) of the discrete variables X and Y, we will show how to calculate the marginal distributions p_X(i) of X and p_Y(j) of Y. To calculate p_X(i), we recall that p_X(i) is the probability that X = i. It is certainly equal to the probability that X = i and Y = 0, or X = i and Y = 1, or . . . . In other words, the event X = i is the union of the events X = i and Y = j as j runs over all possible values. Since these events are disjoint, the probability of their union is the sum of the probabilities of the events (namely, the sum of p(i, j)). Thus:

p_X(i) = Σ_{j=−∞}^{∞} p(i, j)

Likewise,

p_Y(j) = Σ_{i=−∞}^{∞} p(i, j)

Make sure you understand the reasoning behind these two formulas!

An example of the use of this formula is provided by the roll of two dice discussed above. Each of the 36 possible rolls has probability 1/36 of occurring, so we have probability function p(i, j) as indicated in the following table:

i\j       1     2     3     4     5     6   | p_X(i)
1       1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
2       1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
3       1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
4       1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
5       1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
6       1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
p_Y(j)   1/6   1/6   1/6   1/6   1/6   1/6  |

The marginal probability distributions are given in the last column and last row of the table. They are the probabilities for the outcomes of the first (resp. second) of the dice, and are obtained either by common sense or by adding across the rows (resp. down the columns).
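The row and column sums in the table can be reproduced mechanically; the following is a sketch (not course code) that builds the joint function and marginalizes it both ways.

```python
# Sketch: recover the marginal distributions of the two-dice table by
# summing the joint probability function across rows and down columns.
from fractions import Fraction

p = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

pX = {i: sum(p[(i, j)] for j in range(1, 7)) for i in range(1, 7)}
pY = {j: sum(p[(i, j)] for i in range(1, 7)) for j in range(1, 7)}

print(pX[1], pY[6])   # prints 1/6 1/6
```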

For continuous random variables, the situation is similar. Given the joint probability density function p(x, y) of a bivariate distribution of the two random variables X and Y (where p(x, y) is positive on the actual sample space subset of the plane, and zero outside it), we wish to calculate the marginal probability density functions of X and Y. To do this, recall that p_X(x) dx is (approximately) the probability that X is between x and x + dx. So to calculate this probability, we should sum all the probabilities that both X is in [x, x + dx] and Y is in [y, y + dy] over all possible values of Y. In the limit as dy approaches zero, this becomes an integral:

p_X(x) dx = (∫_{−∞}^{∞} p(x, y) dy) dx

In other words,

p_X(x) = ∫_{−∞}^{∞} p(x, y) dy

Similarly,

p_Y(y) = ∫_{−∞}^{∞} p(x, y) dx

Again, you should make sure you understand the intuition and the reasoning behind these important formulas.

We return to our example:

p(x, y) = 1/(2x)

for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise, and compute its marginal density functions. The easy one is p_X(x), so we do that one first. Note that for a given value of x between 0 and 2, y ranges from 0 to x inside the triangle:

p_X(x) = ∫_{−∞}^{∞} p(x, y) dy = ∫_0^x (1/(2x)) dy = [y/(2x)]_{y=0}^{y=x} = 1/2

if 0 ≤ x ≤ 2, and p_X(x) = 0 otherwise. This indicates that the values of X are uniformly distributed over the interval from 0 to 2 (this agrees with the intuition that the random points occur with greater density toward the left side of the triangle but there is more area on the right side to balance this out).

To calculate p_Y(y), we begin with the observation that for each value of y between 0 and 2, x ranges from y to 2 inside the triangle:

p_Y(y) = ∫_{−∞}^{∞} p(x, y) dx = ∫_y^2 (1/(2x)) dx = [(1/2) ln(x)]_{x=y}^{x=2} = (1/2)(ln(2) − ln(y))

if 0 ≤ y ≤ 2 and p_Y(y) = 0 otherwise. Note that p_Y(y) approaches infinity as y approaches 0 from above, and p_Y(y) approaches 0 as y approaches 2. You should check that this function is actually a probability density function on the interval [0, 2], i.e., that its integral is 1.
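The suggested check can be done numerically. This is a sketch with an ad hoc midpoint rule; the integrand blows up at 0, but the singularity is integrable and the midpoint rule, which never evaluates at 0, handles it adequately.

```python
# Numerical check (an illustrative sketch) that the marginal density
# p_Y(y) = (ln 2 - ln y)/2 integrates to 1 over [0, 2], despite the
# integrable singularity at y = 0.
from math import log

n = 200_000
h = 2.0 / n
total = h * sum((log(2.0) - log((i + 0.5) * h)) / 2.0 for i in range(n))
print(abs(total - 1.0) < 1e-4)   # prints True
```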

1.5 Functions of two random variables

Frequently, it is necessary to calculate the probability (density) function of

a function of two random variables, given the joint probability (density)

function. By far, the most common such function is the sum of two random

variables, but the idea of the calculation applies in principle to any function

of two (or more!) random variables.

The principle we will follow for discrete random variables is as follows: to calculate the probability function for F(X, Y), we consider the events {F(X, Y) = f} for each value of f that can result from evaluating F at points of the sample space of (X, Y). Since there are only countably many points in the sample space, the random variable F that results is discrete. Then the probability function p_F(f) is

$$ p_F(f) = \sum_{\{(x,y)\in S \,\mid\, F(x,y)=f\}} p(x, y) $$

This seems like a pretty weak principle, but it is surprisingly useful when

combined with a little insight (and cleverness).

As an example, we calculate the distribution of the sum of the two dice.

Since the outcome of each of the dice is a number between 1 and 6, the

outcome of the sum must be a number between 2 and 12. So for each f

between 2 and 12:

$$ p_F(f) = \sum_{x=-\infty}^{\infty} p(x, f - x) = \sum_{x=\max(1,\, f-6)}^{\min(6,\, f-1)} \frac{1}{36} = \frac{1}{36} \left( 6 - |f - 7| \right) $$

A table of the probabilities of various sums is as follows:

f        2     3     4     5     6     7     8     9     10    11    12
p_F(f)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The "tent-shaped" distribution that results is typical of the sum of (independent) uniformly distributed random variables.
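Because the sample space has only 36 points, the whole table can be reproduced by brute-force enumeration. The snippet below (an illustration I have added, using only the standard library) enumerates the outcomes and checks them against the closed form derived above:

```python
from collections import Counter
from fractions import Fraction

# enumerate the 36 equally likely outcomes of two fair dice
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
p_F = {f: Fraction(c, 36) for f, c in counts.items()}

# compare with the closed form derived above: p_F(f) = (6 - |f - 7|)/36
for f in range(2, 13):
    assert p_F[f] == Fraction(6 - abs(f - 7), 36)
print(p_F[7])  # 1/6
```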

For continuous distributions, our principle will be a little more complicated, but more powerful as well. To enunciate it, we recall that to calculate the probability of the event F < f, we integrate the pdf of F from −∞ to f:

$$ P(F < f) = \int_{-\infty}^{f} p_F(g)\, dg $$

Conversely, to recover the pdf of F, we can differentiate the resulting function:

$$ p_F(f) = \frac{d}{df} \int_{-\infty}^{f} p_F(g)\, dg $$

(this is simply the first fundamental theorem of calculus). Our principle for calculating the pdf of a function of two random variables F(X, Y) will be to calculate the probabilities of the events {F(X, Y) < f} (by integrating the joint pdf over the region of the plane defined by this inequality), and then to differentiate with respect to f to get the pdf.

We apply this principle to calculate the pdf of the sum of the random variables X and Y in our example:

$$ p(x, y) = \frac{1}{2x} $$

for (x, y) in the triangle T with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise. Let Z = X + Y. To calculate the pdf p_Z(z), we first note that

for any ﬁxed number z, the region of the plane where Z < z is the half

plane below and to the left of the line y = z −x. To calculate the probability

P(Z < z), we must integrate the joint pdf p(x, y) over this region. Of course,

for z ≤ 0, we get zero since the half plane z < 0 has no points in common

with the triangle where the pdf is supported. Likewise, since both X and

Y are always between 0 and 2 the biggest the sum can be is 4. Therefore

P(Z < z) = 1 for all z ≥ 4.

For z between 0 and 4, we need to integrate 1/(2x) over the intersection of

the half-plane x +y < z and the triangle T. The shape of this intersection is

diﬀerent, depending upon whether z is greater than or less than 2: If 0 ≤ z ≤

2, the intersection is a triangle with vertices at the points (0, 0), (z/2, z/2)

and (z, 0). In this case, it is easier to integrate ﬁrst with respect to x and

then with respect to y, and we can calculate:

$$ P(Z < z) = \int_0^{z/2} \int_y^{z-y} \frac{1}{2x}\, dx\, dy = \frac{1}{2} \int_0^{z/2} \left( \ln(x) \Big|_{x=y}^{x=z-y} \right) dy $$
$$ = \frac{1}{2} \int_0^{z/2} \left( \ln(z - y) - \ln(y) \right) dy = \frac{1}{2} \left( z - (z - y)\ln(z - y) - y\ln(y) \right) \Big|_{y=0}^{y=z/2} = \frac{1}{2}\, z \ln(2) $$

And since the (cumulative) probability that Z < z is (1/2) z ln(2) for 0 < z < 2, the pdf over this range is p_Z(z) = ln(2)/2.

The calculation of the pdf for 2 ≤ z ≤ 4 is somewhat trickier because the intersection of the half-plane x + y < z and the triangle T is more complicated. The intersection in this case is a quadrilateral with vertices at the points (0, 0), (z/2, z/2), (2, z − 2) and (2, 0). We could calculate P(Z < z) by integrating p(x, y) over this quadrilateral. But we will be a little more clever: Note that the quadrilateral is the "difference" of two sets. It consists of points inside the triangle with vertices (0, 0), (z/2, z/2), (z, 0) that are to the left of the line x = 2. In other words, it is points inside this large triangle (and note that we already have computed the integral of 1/(2x) over this large triangle to be (1/2) z ln(2)) that are NOT inside the triangle with vertices (2, 0), (2, z − 2) and (z, 0). Thus, for 2 ≤ z ≤ 4, we can calculate P(Z < z) as

$$ P(Z < z) = \frac{1}{2}\, z \ln(2) - \int_2^z \int_0^{z-x} \frac{1}{2x}\, dy\, dx = \frac{1}{2}\, z \ln(2) - \int_2^z \left( \frac{y}{2x} \Big|_{y=0}^{y=z-x} \right) dx $$
$$ = \frac{1}{2}\, z \ln(2) - \int_2^z \frac{z - x}{2x}\, dx = \frac{1}{2}\, z \ln(2) - \frac{1}{2} \left( z\ln(x) - x \right) \Big|_{x=2}^{x=z} $$
$$ = \frac{1}{2}\, z \ln(2) - \frac{1}{2} \left( z\ln(z) - z - z\ln(2) + 2 \right) = z\ln(2) - \frac{1}{2}\, z\ln(z) + \frac{1}{2}\, z - 1 $$

To get the pdf for 2 < z < 4, we need only differentiate this quantity, to get

$$ p_Z(z) = \frac{d}{dz} \left( z\ln(2) - \frac{1}{2}\, z\ln(z) + \frac{1}{2}\, z - 1 \right) = \ln(2) - \frac{1}{2} \ln(z) $$

Now we have the pdf of Z = X + Y for all values of z. It is p_Z(z) = ln(2)/2 for 0 < z < 2, it is p_Z(z) = ln(2) − (1/2) ln(z) for 2 < z < 4, and it is 0 otherwise. It would be good practice to check that the integral of p_Z(z) is 1.
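That practice check is easy to carry out numerically. The sketch below (my own addition, standard library only) assembles the piecewise pdf just derived and confirms with a midpoint sum that it integrates to 1 over [0, 4]:

```python
import math

# the pdf of Z = X + Y assembled above
def p_Z(z):
    if 0 < z < 2:
        return math.log(2) / 2
    if 2 <= z < 4:
        return math.log(2) - 0.5 * math.log(z)
    return 0.0

# midpoint-rule check that p_Z integrates to 1 over [0, 4]
n = 100_000
h = 4 / n
total = sum(p_Z((i + 0.5) * h) * h for i in range(n))
print(round(total, 4))  # should be very close to 1
```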

The following fact is extremely useful.

Proposition 1.5.1. If X and Y are two random variables, then E(X + Y) = E(X) + E(Y).
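Note that the proposition does not require independence. A quick Monte Carlo illustration (my own addition, not from the notes): for the triangle example X and Y are dependent, yet the sample means still satisfy linearity. One can check from the marginals that E(X) = 1 and E(Y) = 1/2. Sampling uses the fact, derived above, that the marginal of X is uniform on [0, 2], while given X = x, the conditional density of Y is uniform on [0, x].

```python
import random

random.seed(0)

# sample the density 1/(2x) on the triangle: X uniform on [0, 2]
# (its marginal, computed earlier), then Y | X = x uniform on [0, x]
N = 200_000
xs = [random.uniform(0, 2) for _ in range(N)]
ys = [random.uniform(0, x) for x in xs]

E_X = sum(xs) / N
E_Y = sum(ys) / N
E_sum = sum(x + y for x, y in zip(xs, ys)) / N
print(round(E_X, 2), round(E_Y, 2), round(E_sum, 2))
```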

1.6 Conditional Probability

In our study of stochastic processes, we will often be presented with situations where we have some knowledge that will affect the probability of whether some event will occur. For example, in the roll of two dice, suppose we already know that the sum will be greater than 7. This changes the probabilities from those that we computed above. The event F ≥ 8 has probability (5 + 4 + 3 + 2 + 1)/36 = 15/36. So we are restricted to less than half of the original sample space. We might wish to calculate the probability of getting a 9 under these conditions. The quantity we wish to calculate is denoted P(F = 9 | F ≥ 8), read "the probability that F = 9 given that F ≥ 8".

In general, to calculate P(A | B) for two events A and B (it is not necessary that A is a subset of B), we need only realize that we must compute the fraction of the time that, when the event B is true, it is also the case that A is true. In symbols, we have

$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$

For our dice example (noting that the event F = 9 is a subset of the event F ≥ 8), we get

$$ P(F = 9 \mid F \geq 8) = \frac{P(F = 9)}{P(F \geq 8)} = \frac{4/36}{15/36} = \frac{4}{15}. $$
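Counting directly over the restricted sample space gives the same answer. The snippet below (an added illustration, standard library only) conditions by simply filtering the 36 outcomes:

```python
from fractions import Fraction

# all 36 equally likely rolls of two dice
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
B = [o for o in outcomes if sum(o) >= 8]       # the conditioning event F >= 8
A_and_B = [o for o in B if sum(o) == 9]        # F = 9 inside that event

p_cond = Fraction(len(A_and_B), len(B))
print(p_cond)  # 4/15
```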

As another example (with continuous probability this time), we calculate for our 1/(2x)-on-the-triangle example the conditional probabilities P(X > 1 | Y > 1) as well as P(Y > 1 | X > 1) (just to show that the probabilities of A given B and B given A are usually different).

First P(X > 1 | Y > 1). This one is easy! Note that in the triangle with vertices (0, 0), (2, 0) and (2, 2) it is true that Y > 1 implies that X > 1. Therefore the events (Y > 1) ∩ (X > 1) and Y > 1 are the same, so the fraction we need to compute will have the same numerator and denominator. Thus P(X > 1 | Y > 1) = 1.

For P(Y > 1 | X > 1) we actually need to compute something. But note that Y > 1 is a subset of the event X > 1 in the triangle, so we get:

$$ P(Y > 1 \mid X > 1) = \frac{P(Y > 1)}{P(X > 1)} = \frac{\int_1^2 \int_1^x \frac{1}{2x}\, dy\, dx}{\int_1^2 \int_0^x \frac{1}{2x}\, dy\, dx} = \frac{\frac{1}{2}\left( 1 - \ln(2) \right)}{\frac{1}{2}} = 1 - \ln(2) $$
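A Monte Carlo check (my own addition) agrees: sampling the triangle density as before (X uniform on [0, 2], then Y uniform on [0, x]) and taking the conditional frequency recovers 1 − ln(2) ≈ 0.307.

```python
import math, random

random.seed(1)

# sample the triangle density: X uniform on [0, 2], then Y | X = x
# uniform on [0, x]
N = 200_000
hits_B = hits_AB = 0
for _ in range(N):
    x = random.uniform(0, 2)
    y = random.uniform(0, x)
    if x > 1:
        hits_B += 1
        if y > 1:
            hits_AB += 1

estimate = hits_AB / hits_B
print(round(estimate, 3), round(1 - math.log(2), 3))
```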

Two events A and B are called independent if the probability of A given B is the same as the probability of A (with no knowledge of B) and vice versa. The assumption of independence of certain events is essential to many probabilistic arguments. Independence of two events is expressed by the equations

$$ P(A \mid B) = P(A), \qquad P(B \mid A) = P(B) $$

and especially

$$ P(A \cap B) = P(A) \cdot P(B) $$

Two random variables X and Y are independent if the probability that a < X < b remains unaffected by knowledge of the value of Y and vice versa. This reduces to the fact that the joint probability (or probability density) function of X and Y "splits" as a product

$$ p(x, y) = p_X(x)\, p_Y(y) $$

of the marginal probabilities (or probability densities). This formula is a straightforward consequence of the definition of independence and is left as an exercise.

Proposition 1.6.1. 1. If X is a random variable, then σ²(aX) = a²σ²(X) for any real number a.

2. If X and Y are two independent random variables, then σ²(X + Y) = σ²(X) + σ²(Y).
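Part 2 of the proposition is easy to see numerically. The sketch below (an added illustration, standard library only) draws two independent uniform samples, each with variance 1/12, and compares the variance of the sum with the sum of the variances:

```python
import random

random.seed(2)

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# X and Y independent, each uniform on [0, 1], so sigma^2 = 1/12 each
N = 100_000
X = [random.random() for _ in range(N)]
Y = [random.random() for _ in range(N)]

lhs = var([x + y for x, y in zip(X, Y)])   # sigma^2(X + Y)
rhs = var(X) + var(Y)                      # sigma^2(X) + sigma^2(Y)
print(round(lhs, 3), round(rhs, 3))
```

Both numbers come out near 1/6, as the proposition predicts; the additivity fails in general for dependent variables.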

1.7 Law of large numbers

The point of probability theory is to organize the random world. For many

centuries, people have noticed that, though certain measurements or pro-

cesses seem random, one can give a quantitative account of this randomness.

Probability gives a mathematical framework for understanding statements

like the chance of hitting 13 on a roulette wheel is 1/38. This belief about

the way the random world works is embodied in the empirical law of aver-

ages. This law is not a mathematical theorem. It states that if a random

experiment is performed repeatedly under identical and independent condi-

tions, then the proportion of trials that result in a given outcome converges

to a limit as the number of trials goes to inﬁnity. We now state for future

reference the strong law of large numbers. This theorem is the main rigorous

result giving credence to the empirical law of averages.

Theorem 1.7.1. (The strong law of large numbers) Let X_1, X_2, . . . be a sequence of independent random variables, all with the same distribution, whose expectation is µ. Then

$$ P\left( \lim_{n \to \infty} \frac{\sum_{i=1}^{n} X_i}{n} = \mu \right) = 1 $$
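The theorem is easy to see in action. The simulation below (my own addition, standard library only) rolls a fair die, whose expectation is 3.5, and prints the running average at a few sample sizes; the averages settle toward 3.5 as n grows.

```python
import random

random.seed(3)

# averages of n fair-die rolls; by the strong law they approach E(X) = 3.5
for n in (100, 10_000, 1_000_000):
    total = sum(random.randint(1, 6) for _ in range(n))
    print(n, round(total / n, 3))
```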


Exercises

1. Make your own example of a probability space that is ﬁnite and discrete.

Calculate the expectation of the underlying random variable X.

2. Make your own example of a probability space that is inﬁnite and

discrete. Calculate the expectation of the underlying random variable

X.

3. Make your own example of a continuous random variable. Calculate

its expectation.

4. Prove that the normal distribution function

$$ \phi(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$

is really a probability density function on the real line (i.e., that it is positive and that its integral from −∞ to ∞ is 1). Calculate the expectation of this random variable.

5. What is the relationship among the following partial derivatives?

$$ \frac{\partial \phi}{\partial \sigma}, \qquad \frac{\partial^2 \phi}{\partial x^2}, \qquad \text{and} \qquad \frac{\partial \phi}{\partial t}, $$

where t = σ² (for the last one, rewrite φ as a function of x, µ and t).

6. Consider the experiment of picking a point at random from a uniform

distribution on the disk of radius R centered at the origin in the plane

(“uniform” here means if two regions of the disk have the same area,

then the random point is equally likely to be in either one). Calculate

the probability density function and the expectation of the random

variable D, deﬁned to be the distance of the random point from the

origin.

7. Prove the principle of inclusion and exclusion, p(A ∪ B) = p(A) + p(B) − p(A ∩ B), and p(A^c) = 1 − p(A).

8. If S is a finite set with n elements, how many elements are in T(S), the power set of S? (Hint: try it for the first few values of n, and don't forget to count the empty set.)


9. Show that for a function f : S → R and a collection A_1, A_2, A_3, . . . of subsets of the real numbers,

$$ f^{-1}\left( \bigcup_{j=1}^{\infty} A_j \right) = \bigcup_{j=1}^{\infty} f^{-1}(A_j) $$

Use this to prove that if f is a random variable on a probability space (S, Ω, P), then its distribution P_f as defined above is also a probability.

10. For each of the examples of random variables you gave in problems 1,

2 and 3 calculate the expectation and the variance, if they exist.

11. Calculate the variance of the binomial, Poisson and normal distributions. The answer for the normal distribution is σ².

12. Let X be any random variable with finite second moment. Consider the function f(a) defined as follows: f(a) = E((X − a)²). Show that the minimum value of f occurs when a = E(X).

13. Fill in the details of the calculation of the variance of the uniform and exponential distributions. Also, prove that

$$ \sigma^2(X) = E\left( (X - E(X))^2 \right) = E(X^2) - (E(X))^2. $$

14. Let X be a random variable with the standard normal distribution. Find the mean and variance of each of the following: |X|, X² and e^{tX}.

15. Let X be the sine of an angle in radians chosen uniformly from the

interval (−π/2, π/2). Find the pdf of X, its mean and its variance.

16. Calculate the variance of the binomial, Poisson and normal distributions. The answer for the normal distribution is σ².

17. Let p(x, y) be the uniform joint probability density on the unit disk, i.e., p(x, y) = 1/π if x² + y² < 1 and p(x, y) = 0 otherwise. Calculate the pdf of X + Y.

18. Suppose X and Y are independent random variables, each distributed

according to the exponential distribution with parameter α. Find the

joint pdf of X and Y (easy). Find the pdf of X + Y . Also ﬁnd the

mean and variance of X +Y .

Chapter 2

Aspects of ﬁnance and

derivatives

The point of this book is to develop some mathematical techniques that can

be used in ﬁnance to model movements of assets and then to use this to solve

some interesting problems that occur in concrete practice.

Two of the problems that we will encounter are asset allocation and derivative valuation. The first, asset allocation, asks how we should allocate our resources in various assets to achieve our goals. These goals are usually phrased

in terms of how much risk we are willing to acquire, how much income we

want to derive from the assets and what level of return we are looking for.

The second, derivative valuation, asks to calculate an intrinsic value of a

derivative security (for example, an option on a stock). Though it might

seem more esoteric, the second question is actually more basic than the ﬁrst

(at least in our approach to the problem). Thus we will attack derivative

valuation ﬁrst and then apply it to asset allocation.

2.1 What is a derivative security?

What is a derivative security? By this we mean any ﬁnancial instrument

whose value depends on (i.e. is derived from ) an underlying security. The

underlying in theory could be practically anything: a stock, a stock index,

a bond or much more exotic things. Our most fundamental example is an

option on a stock. An options contract gives the right but not the obligation to the purchaser of the contract to buy (or sell) a specified asset (in our case a stock)


at a specific price at a specific future time. For example, I could buy a (call) option to buy 100 shares of IBM at $90 a share on January 19. Currently, this contract is selling for $19½. The seller of this option earns the $19½ (the price (premium) of the options contract) and incurs the obligation to sell me 100 shares of IBM stock on January 19 for $90 if I should so choose. Of course, when January 19 rolls around, I would exercise my option only if the price of IBM is more than $90.

Another well known and simpler example is a forward. A forward contract

is an agreement/contract between two parties that gives the right and the

obligation to one of the parties to purchase (or sell) a speciﬁed commodity

(or ﬁnancial instrument) at a speciﬁed price at a speciﬁc future time. No

money changes hands at the outset of the agreement.

For example, I could buy a June forward contract for 50,000 gallons of frozen

orange juice concentrate for $.25/gallon. This would obligate me to take

delivery of the frozen orange juice concentrate in June at that price. No

money changes hands at the establishment of the contract.

A similar notion is that of a futures contract. The difference between a forward and a futures contract is that futures are generally traded on exchanges and require other technical aspects, such as margin accounts.

2.2 Securities

The two most common types of securities are an ownership stake or a debt

stake in some entity, for example a company or a government. You become

a part owner in a company by buying shares of the company’s stock. A debt

security is usually acquired by buying a company’s or a government’s bonds.

A debt security like a bond is a loan to a company in exchange for interest

income and the promise to repay the loan at a future maturity date.

Securities can either be sold in a market (an exchange) or over the counter.

When sold in an exchange the price of the security is subject to market forces.

When sold over the counter, the price is arrived at through negotiation by

the parties involved.

Securities sold on an exchange are generally handled by market makers

or specialists. These are traders whose job it is to always make sure that

the security can always be bought or sold at some price. Thus they provide

liquidity for the security involved. Market makers must quote bid prices (the amount they are willing to buy the security for) and ask prices (the amount they are willing to sell the security for) of the securities. Of course the ask price is usually greater than the bid, and this spread is one of the main ways market makers make their money. The spot price is the amount that the security is currently selling for; we will usually assume that the bid and ask prices are both equal to the spot price. This is one of many idealizations that we will make as we analyze derivatives.

2.3 Types of derivatives

A European call option (on a stock) is a contract that gives the buyer the

right but not the obligation to buy a speciﬁed stock for a speciﬁed price K,

the strike price, at a speciﬁed date, T, the expiration date from the seller of

the option. If the owner of the option can exercise the option any time up

to T, then it is called an American call.

A put is just the opposite. It gives the owner of the contract the right to

sell. It also comes in the European and American varieties.

Some terminology: If one owns an asset, one is said to have a long position in that asset. Thus if I own a share of IBM, I am long one share of IBM. The opposite position is called short. Thus if I owe a share of IBM, I am short one share of IBM. Short selling is a basic financial maneuver. To short sell, one borrows the stock (from your broker, for example) and sells it in the market. In the future, one must return the stock (together with any dividend income that the stock earned). Short selling allows negative amounts of a stock in a portfolio and is essential to our ability to value derivatives.

One can have long or short positions in any asset, in particular in options. Being long an option means owning it. Being short an option means that you are the writer of the option.

These basic call and put options are sold on exchanges and their prices

are determined by market forces.

There are also many types of exotic options.

Asian options: these are options allowing one to buy a share of stock for the

average value of the stock over the lifetime of the option. Diﬀerent versions

of these are based on calculating the average in different ways. An interesting feature of these is that their payoff depends not only on the value of the stock at expiration, but on the entire history of the stock's movements. We call this feature path dependence.

Barrier options: These are options that either go into effect or out of effect if the stock value passes some threshold. For example, a down and out option is an option that expires worthless if the barrier X is reached from above before expiration. There are also down and in, up and in, and up and out options, and you are encouraged to think about what these mean. These are path dependent.

Lookback options: Options whose payoff depends on the maximum or minimum value of the stock over the lifetime of the option.

2.4 The basic problem: How much should I

pay for an option? Fair value

A basic notion of modern ﬁnance is that of risk. Contrary to common par-

lance, risk is not only a bad thing. Risk is what one gets rewarded for in

ﬁnancial investments. Quantifying risk and the level of reward that goes

along with a given level of risk is a major activity in both theoretical and

practical circles in ﬁnance. It is essential that every individual or corporation

be able to select and manipulate the level of risk for a given transaction. This

redistribution of risk towards those who are willing and able to assume it is

what modern ﬁnancial markets are about.

Options can be used to reduce or increase risk. Markets for options

and other derivatives are important in the sense that agents who anticipate

future revenues or payments can ensure a proﬁt above a certain level or insure

themselves against a loss above a certain level.

It is pretty clear that a prerequisite for eﬃcient management of risk is

that these derivative products are correctly valued, and priced. Thus our

basic question: How much should I pay for an option?

We make a formal distinction between value and price. The price is what

is actually charged while the value is what the thing is actually worth.

When a developer wants to figure out how much to charge for a new house, say, he calculates his costs: land, material and labor, insurance, cost of financing, etc. This is the house's value. The developer, wanting to make a profit, adds to this a markup. The total we call the price. Our problem is to determine the value of an option.

In 1997, the Nobel Prize in economics was awarded to Robert Merton and Myron Scholes, who along with Fischer Black, wrote down a solution to this problem. Black died in his mid-fifties in 1995 and was therefore not awarded the Nobel prize. Black and Scholes wrote down their formula, the Black-Scholes Formula, in 1973. They had a difficult time getting their paper published. However, within a year of its publication, Hewlett Packard had a calculator with this formula hardwired into it. Today this formula is an indispensable tool for thousands of traders and investors to value stock options in markets throughout the world.

Their method is quite general and can be used and modified to handle many kinds of exotic options as well as to value insurance contracts, and even has applications outside of finance.

2.5 Expectation pricing. Einstein and Bache-

lier, what they knew about derivative pric-

ing. And what they didn’t know

As opposed to the case of pricing a house, an option seems more like pricing

bets in a casino. That is, the amount the casino charges for a given bet

is based on models for the payoﬀs of the bets. Such models are of course

statistical/probabilistic.

As an example, suppose we play a (rather silly) game of tossing a die and the casino pays the face value on the upside of the die in dollars. How much should the casino charge to play this game? Well, the first thing is to figure out the fair value. The casino of course is figuring that people will play this game many, many times. Assuming that the die is fair (i.e. P(X = i) = 1/6, for i = 1, 2, . . . , 6) and that each roll of the die doesn't affect (that is, is independent of) any other roll, the law of large numbers says that if there are n plays of the game, then with probability close to 1 the average of the face values on the die will be

$$ E(X) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = \frac{21}{6} $$

Thus after n plays of the game, the casino feels reasonably confident that they will be paying off something close to $21/6 · n. So $21/6 is the fair value per game. The casino, desiring to make a profit, sets the actual cost to play at $4: not too much higher than the fair value, or no one would want to play; they would lose their money too fast.
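The casino's reasoning can be simulated directly. The sketch below (my own illustration, standard library only) plays the game many times at a charge of $4 and shows the average payout hovering near the fair value of 21/6 = 3.5, leaving the casino with the markup as profit:

```python
import random

random.seed(4)

fair_value = 21 / 6    # E(X) = 3.5, the fair value per game
charge = 4             # what the casino actually asks

n = 100_000
payouts = sum(random.randint(1, 6) for _ in range(n))
profit = charge * n - payouts
print(round(payouts / n, 2), profit > 0)
```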

This is an example of expectation pricing. That is, the payoff of some experiment is random. One models this payoff as a random variable and uses the strong law of large numbers to argue that the fair value is the expectation of the random variable. This works much of the time.

Some times it doesn’t. There are a number of reasons why the actual value

might not be the expectation price. In order for the law of large numbers

to kick in, the experiment has to be performed many times, in independent

trials. If you are oﬀered to play a game once, it might not be a very good

value. What’s the fair value to play Russian Roulette? As we will see below,

there are more subtle and interesting reason why the fair value is not enforced

in the market.

Thinking back to stocks and options, what would be the obvious way to use expectation pricing to value an option? The first order of business would

be to model the movement of the underlying stock. Unfortunately, the best

we can do here as well is to give a probabilistic model.

The French mathematician Louis Bachelier in his doctoral dissertation

around 1900 was one of the earliest to attempt to model stock movements.

He did so by what we now call a random walk (discrete version) or a Wiener process (continuous version). This actually predated the use of these models for Brownian motion (famously studied by Einstein in 1905). Bachelier's thesis advisor, Henri Poincaré, did not have such high regard for Bachelier's work.

2.5.1 Payoﬀs of derivatives

We will have a lot to say in the future about models for stock movements.

But for now, let’s see how might we use this to value an option. Let us

consider a European call option C to buy a share of XYZ stock with strike

price K at time T. Let S

t

denote the value of the stock at time t.The one

time we know the value of the option is at expiration. At expiration there

are two possibilities. If the value of the stock at expiration is less than the

strike price S

T

≤ K, we don’t exercise our option and expires worthless. If

S

T

> K then it makes sense to exercise the option, paying K for the stock,

and selling it on the market for S

T

, the whole transaction netting $S

T

−K.

Thus, the option is worth max(S

t

− K, 0). This is what we call the payoﬀ

of the derivative. It is what the derivative pays at expiration, which in this

case only depends on the value of the stock at expiration.

On your own, calculate the payoﬀ of all the derivatives deﬁned in section

2.3.
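As a concrete illustration of the payoff just described (my own addition; the strike and prices below are hypothetical sample numbers), the call and put payoffs can be written as one-line functions:

```python
# payoff functions at expiration, as derived above; S_T is the terminal
# stock price and K the strike (sample numbers are hypothetical)
def call_payoff(S_T, K):
    return max(S_T - K, 0.0)

def put_payoff(S_T, K):
    return max(K - S_T, 0.0)

print(call_payoff(110.0, 90.0))  # 20.0: exercise, buy at 90, sell at 110
print(call_payoff(80.0, 90.0))   # 0.0: let the option expire worthless
print(put_payoff(80.0, 90.0))    # 10.0: sell at 90 a stock worth 80
```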


Assuming a probabilistic model for S_t, we would arrive at a random variable S_T describing in a probabilistic way the value of XYZ at time T. Thus the value of the option would be the random variable max(S_T − K, 0), and expectation pricing would say that the fair price should be E(max(S_T − K, 0)). (I am ignoring the time value of money; that is, I am assuming interest rates are 0.)

Bachelier attempted to value options along these lines, unfortunately, it

turns out that expectation pricing (naively understood at least) does not

work. There is another wrinkle to option pricing. It’s called arbitrage.

2.6 The simple case of forwards. Arbitrage

arguments

I will illustrate the failure of expectation pricing with the example of a for-

ward contract. This time I will also keep track of the time value of money.

The time value of money is expressed by interest rates. If one has $B at time t, then at time T > t it will be worth $exp(r(T − t))B. Bonds are the market instruments that express the time value of money. A zero-coupon bond is a promise of a specific amount of money (the face or par value) at some specified time in the future, the expiration date. So if a bond promises to pay $B_T at time T, then its current value B_t can be expressed as B_T discounted by some interest rate r, that is, B_t = exp(−r(T − t))B_T. For the meanwhile, we will assume that the interest rate r is constant. This makes some sense. Most options are rather short lived, 3 months to a year or so. Over this period, interest rates do not vary so much.

Now suppose that we enter into a forward contract with another person to buy one share of XYZ stock at time T. We want to calculate what the forward price $F is. Thus at time T, I must buy a share of XYZ for $F from him and he must sell it to me for $F. So at time T, the contract is worth $(S_T − F). According to expectation pricing, and adjusting for the time value of money, this contract should at time t be worth $exp(−r(T − t))E(S_T − F). Since the contract costs nothing to set up, if this is not zero, it would lead to profits or losses over the long run, by the law of large numbers. So expectation pricing says F = E(S_T).

On the other hand, consider the following strategy for the party who must deliver the stock at time T. If the stock is now, at time t, worth $S_t, he borrows the money at the risk free interest rate r, buys the stock now, and holds the stock until time T. At time T he can just hand over the stock, collect $F and repay the loan, exp(r(T − t))S_t. After the transaction is over he has $(F − exp(r(T − t))S_t). At the beginning of the contract, it is worth nothing; no money changes hands. So he starts with a contract costing no money to set up, and in the end has $(F − exp(r(T − t))S_t). If this is positive, then we have an arbitrage opportunity, that is, a risk free profit. No matter how small the profit is on any single transaction, one can parlay this into unlimited profits by performing the transaction in large multiples. On the other hand, what will happen in practice if this is positive? Someone else will come along and offer the same contract but with a slightly smaller F, still making a positive (and thus a potentially unlimited) profit. Therefore, as long as $(F − exp(r(T − t))S_t) > 0, someone can come along and undercut him and still make unlimited money. So $(F − exp(r(T − t))S_t) ≤ 0.

If $(F − exp(r(T − t))S_t) < 0, you can work it from the other side. That is, the party to the contract who must buy the stock for F at time T can at time t short the stock (i.e. borrow it), sell it in the market for $S_t, and put the money in the risk free instrument. At time T, she buys the stock for $F, returns the stock, and now has $exp(r(T − t))S_t in the risk free instrument. So again, at time t the contract costs nothing to set up, and at time T she has $(exp(r(T − t))S_t − F), which is positive. Again, an arbitrage opportunity. A similar argument as in the paragraph above shows that $(exp(r(T − t))S_t − F) ≤ 0. So we arrive at F = exp(r(T − t))S_t. So we see something very interesting. The forward price has nothing to do with the expected value of the stock at time T. It only has to do with the value of the stock at the time of the beginning of the contract. This is an example of the basic arbitrage argument. In practice, the price arrived at by expectation pricing, F = E(S_T), would be much higher than the price arrived at by the arbitrage argument.
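The arbitrage-enforced forward price is simple enough to compute. The sketch below (my own addition; the spot price, rate and horizon are hypothetical sample numbers) also shows the cash-and-carry profit that would be locked in if a counterparty quoted a forward price above the fair one:

```python
import math

# arbitrage-enforced forward price F = exp(r(T - t)) S_t; the numbers
# below (spot 100, rate 5%, half a year) are hypothetical
def forward_price(S_t, r, tau):
    return math.exp(r * tau) * S_t

S_t, r, tau = 100.0, 0.05, 0.5
F_fair = forward_price(S_t, r, tau)

# if a counterparty quotes more, the cash-and-carry strategy (borrow S_t,
# buy the stock, deliver it at T) locks in the difference risk free
F_quoted = F_fair + 1.0
riskless_profit = F_quoted - math.exp(r * tau) * S_t
print(round(F_fair, 4), round(riskless_profit, 4))
```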

Let’s give another example of an arbitrage argument. This is also a basic

result expressing the relationship between the value of a call and of a put.

Proposition 2.6.1. (Put-Call parity) If S_t denotes the value of a stock XYZ at time t, and C and P denote the value of a call and put on XYZ with the same strike price K and expiration T, then at time t, P + S_t = C + K exp(−r(T − t)).

Proof: By now, you should have calculated that the payoff on a put is max(K − S_T, 0). Now form the portfolio P + S − C. That is, long a share of


stock, and a put, and short a call. Then at time T we calculate.

Case 1: S_T ≤ K. Then P + S − C = max(K − S_T, 0) + S_T − max(S_T − K, 0) = K − S_T + S_T − 0 = K.

Case 2: S_T > K. Then P + S − C = max(K − S_T, 0) + S_T − max(S_T − K, 0) = 0 + S_T − (S_T − K) = K.

So the payoﬀ at time T from this portfolio is K no matter what. That is it’s

risk free. Regardless of what S

T

is, the portfolio is worth K at time T. What

other ﬁnancial entity has this property? A zero-coupon bond with par value

K. Hence the value of this portfolio at time t is K exp(−r(T − t)). Giving

the theorem.

Q.E.D.
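The two cases of the proof can be verified mechanically: the payoff of the portfolio long one put, long one share, short one call is K at every terminal price. A quick sketch (strike and terminal prices are illustrative):

```python
def portfolio_payoff_at_T(S_T: float, K: float) -> float:
    """Payoff at expiry of: long one put, long one share, short one call."""
    put = max(K - S_T, 0.0)
    call = max(S_T - K, 0.0)
    return put + S_T - call

# The payoff equals K for every terminal stock price, as the two cases show.
payoffs = [portfolio_payoff_at_T(S_T, K=50.0) for S_T in (0.0, 25.0, 50.0, 75.0, 200.0)]
```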

2.7 Arbitrage and no-arbitrage

Already we have made arguments based on the fact that it is natural for competing players to undercut prices as long as they are guaranteed a positive risk-free profit, and thus able to generate large profits by carrying out multiple transactions. This is formalized by the no-arbitrage hypothesis. (We formalize this even more in the next section.) An arbitrage is a portfolio that guarantees an (instantaneous) risk-free profit. It is important to realize that a zero-coupon bond is not an arbitrage. It is risk free to be sure, but the growth of the risk-free bond is just an expression of the time value of money. Thus the word instantaneous in the definition is equivalent to saying that the portfolio is risk free and grows faster than the risk-free interest rate.

Definition 2.7.1. The no-arbitrage hypothesis is the assumption that arbitrage opportunities do not exist.

What is the sense of this? Of course arbitrage opportunities exist. There exist people called arbitrageurs whose job it is to find these opportunities. (And some of them make a lot of money doing it.)

Actually, a good discussion of this is in Malkiel’s book, along with the efficient market hypothesis. Malkiel and others would insist that in actual fact arbitrage opportunities don’t really exist. But even if you don’t buy this, it is a reasonable assumption (especially in view of the arguments we made above) as a starting point.


In fact, the existence of arbitrageurs makes the no-arbitrage assumption more reasonable. The existence of many people seeking out arbitrage opportunities means that when such an opportunity arises, it closes all the more quickly. On publicly traded stocks the existence of arbitrage is increasingly rare. So as a basic principle in valuing derivatives it is a good starting point. Also, if one sees derivatives mispriced, it might mean the existence of arbitrage opportunities. So you have a dichotomy: you can be right (about the no-arbitrage hypothesis), in which case your evaluation of derivative prices is better founded, or you can make money (having found an arbitrage opportunity).

2.8 The arbitrage theorem

In this section we establish the basic connection between finance and mathematics. Our first goal is to figure out what arbitrage means mathematically, and for this we model financial markets in the following way. Consider N possible securities, labeled {1, . . . , N}. These may be stocks, bonds, commodities, options, etc. Suppose that now, at time 0, these securities are selling at S_1, S_2, S_3, . . . , S_N units of account (say dollars if you need closure). Note that this notation differs from the rest of the notes in that the subscript does not denote time but indexes the different securities. Consider the column vector S whose components are the S_j’s. At time 1, the random nature of financial markets is represented by the possible states of the world, which are labeled 1, 2, 3, . . . , M. That is, from time 0 to time 1 the world has changed from its current state (reflected in the securities prices S) to one of M possible states. Each security, say i, has a payoff d_{ij} in state j. This payoff reflects any change in the security’s price, any dividends that are paid, or any other change to the overall worth of the security. We can organize these payoffs into an N × M matrix,

D = \begin{pmatrix}
d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\
d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
d_{N,1} & d_{N,2} & \cdots & d_{N,M}
\end{pmatrix}    (2.8.1)

Our portfolio is represented by an N × 1 column matrix

θ = \begin{pmatrix} θ_1 \\ θ_2 \\ \vdots \\ θ_N \end{pmatrix}    (2.8.2)

where θ_i represents the number of copies of security i in the portfolio. This can be a negative number, indicating short selling.

Now, the market value of the portfolio is θ · S = θ^T S. The payoff of the portfolio can be represented by the M × 1 matrix

D^T θ = \begin{pmatrix}
\sum_{j=1}^{N} d_{j,1} θ_j \\
\sum_{j=1}^{N} d_{j,2} θ_j \\
\vdots \\
\sum_{j=1}^{N} d_{j,M} θ_j
\end{pmatrix}    (2.8.3)

where the ith row of the matrix, which we write as (D^T θ)_i, represents the payoff of the portfolio in state i. The portfolio is riskless if the payoff of the portfolio is independent of the state of the world, i.e. (D^T θ)_i = (D^T θ)_j for any states of the world i and j.
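The payoff vector D^T θ and the riskless condition can be sketched directly (the 2 × 2 market below anticipates the example worked later in this section):

```python
from typing import List

def payoff(D: List[List[float]], theta: List[float]) -> List[float]:
    """Payoff vector D^T theta: entry i is the portfolio's payoff in state i,
    i.e. sum_j d_{j,i} * theta_j."""
    n_states = len(D[0])
    return [sum(D[j][i] * theta[j] for j in range(len(D))) for i in range(n_states)]

def is_riskless(D: List[List[float]], theta: List[float], tol: float = 1e-12) -> bool:
    """Riskless means the payoff is the same in every state of the world."""
    p = payoff(D, theta)
    return all(abs(x - p[0]) <= tol for x in p)

# Two securities, two states: security 1 pays 1 in both states,
# security 2 pays 1/2 in state 1 and 2 in state 2.
D = [[1.0, 1.0], [0.5, 2.0]]
```

Holding only security 1 is riskless; any position in security 2 is not.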

Let

R^n_+ = {(x_1, . . . , x_n) | x_i ≥ 0 for all i}

and

R^n_{++} = {(x_1, . . . , x_n) | x_i > 0 for all i}.

We write v ≥ 0 if v ∈ R^n_+ and v > 0 if v ∈ R^n_{++}.

Definition 2.8.1. An arbitrage is a portfolio θ such that either

1. θ · S ≤ 0 and D^T θ > 0, or

2. θ · S < 0 and D^T θ ≥ 0.

These conditions express that the portfolio goes from being worthless (respectively, less than worthless) to being worth something (resp. at least worth nothing), regardless of the state at time 1. In common parlance, an arbitrage is an (instantaneous) risk-free profit; in other words, a free lunch.

Definition 2.8.2. A state price vector is an M × 1 column vector ψ such that ψ > 0 and S = Dψ.


Theorem 2.8.1. (Arbitrage theorem) There are no arbitrage portfolios if and only if there is a state price vector.

Before we prove this theorem, we need an abstract theorem from linear algebra. Let’s recall that a subset C ⊂ R^n is called convex if the line segment between any two points of C is completely contained in C. That is, for any two points x, y ∈ C and any t ∈ [0, 1] one has

x + t(y − x) ∈ C.

Theorem 2.8.2. (The separating hyperplane theorem) Let C be a compact convex subset of R^n and V ⊂ R^n a vector subspace such that V ∩ C = ∅. Then there exists a point ξ_0 ∈ R^n such that ξ_0 · c > 0 for all c ∈ C and ξ_0 · v = 0 for all v ∈ V.

First we need a lemma.

Lemma 2.8.3. Let C be a closed and convex subset of R^n not containing 0. Then there exists ξ_0 ∈ C such that ξ_0 · c ≥ ||ξ_0||^2 for all c ∈ C.

Proof: Let r > 0 be such that B_r(0) ∩ C ≠ ∅. Here B_r(x) denotes all those points a distance r or less from x. Let ξ_0 ∈ C ∩ B_r(0) be an element which minimizes the distance from the origin for elements in C ∩ B_r(0). That is,

||ξ_0|| ≤ ||c||    (2.8.4)

for all c ∈ C ∩ B_r(0). Such a ξ_0 exists by the compactness of C ∩ B_r(0). Then it follows that (2.8.4) holds for all c ∈ C, since if c ∉ B_r(0) then ||ξ_0|| ≤ r ≤ ||c||.

Now since C is convex, it follows that for any c ∈ C we have ξ_0 + t(c − ξ_0) ∈ C, so that

||ξ_0||^2 ≤ ||ξ_0 + t(c − ξ_0)||^2 = ||ξ_0||^2 + 2t ξ_0 · (c − ξ_0) + t^2 ||c − ξ_0||^2.

Hence

2||ξ_0||^2 − t||c − ξ_0||^2 ≤ 2 ξ_0 · c

for all t ∈ (0, 1]. Letting t → 0 gives ξ_0 · c ≥ ||ξ_0||^2. QED.

Proof (of the separating hyperplane theorem): Let C be compact and convex and V ⊂ R^n as in the statement of the theorem. Then consider

C − V = {c − v | c ∈ C, v ∈ V}.

This set is closed and convex. (Here is where you need C to be compact.) Since C ∩ V = ∅, the set C − V does not contain 0. Hence by the lemma there exists ξ_0 ∈ C − V such that ξ_0 · x ≥ ||ξ_0||^2 for all x ∈ C − V. (Note that ξ_0 ≠ 0, since 0 ∉ C − V.) But this implies that for all c ∈ C and v ∈ V, ξ_0 · c − ξ_0 · v ≥ ||ξ_0||^2 > 0, or

ξ_0 · c > ξ_0 · v

for all c ∈ C and v ∈ V. Now note that since V is a linear subspace, for any v ∈ V we have av ∈ V for all a ∈ R. Hence ξ_0 · c > ξ_0 · (av) = a ξ_0 · v for all a. But this can’t happen unless ξ_0 · v = 0. So we have ξ_0 · v = 0 for all v ∈ V, and then also ξ_0 · c ≥ ||ξ_0||^2 > 0 for all c ∈ C. QED.

Proof (of the arbitrage theorem): Suppose S = Dψ with ψ > 0. Let θ be a portfolio with S · θ ≤ 0. Then

0 ≥ S · θ = Dψ · θ = ψ · D^T θ,

and since ψ > 0 this implies that we cannot have D^T θ > 0. The same argument handles the case S · θ < 0.

To prove the opposite implication, let

V = {(−S · θ, D^T θ) | θ ∈ R^N} ⊂ R^{M+1}.

Now consider C = R_+ × R^M_+. C is closed and convex, and by definition there exists an arbitrage exactly when there is a non-zero point of intersection of V and C. Now consider

C_1 = {(x_0, x_1, . . . , x_M) ∈ C | x_0 + · · · + x_M = 1}.

C_1 is compact and convex, and assuming there is no arbitrage, V ∩ C_1 = ∅. Hence by Theorem 2.8.2 there exists x ∈ R^{M+1} with

1. x · v = 0 for all v ∈ V, and

2. x · c > 0 for all c ∈ C_1.

Applying the second condition to the standard basis vectors, which all lie in C_1, shows that every component of x is positive. Writing x = (x_0, x_1, . . . , x_M), the condition x · v = 0 implies that for all θ ∈ R^N,

(−S · θ)x_0 + (x_1, . . . , x_M) · D^T θ = 0.

So (x_1, . . . , x_M) · D^T θ = x_0 S · θ, or θ · D(x_1, . . . , x_M)^T = x_0 S · θ. Hence letting

ψ = (x_1, . . . , x_M)^T / x_0

we have θ · Dψ = S · θ for all θ. Thus Dψ = S, and since every x_j > 0 we have ψ > 0, so ψ is a state price vector. Q.E.D.

Let’s consider the following example: a market consisting of two securities, 1 and 2, and two states, 1 and 2. Suppose that S_1 = 1 and S_2 = 1, and

D = \begin{pmatrix} 1 & 1 \\ 1/2 & 2 \end{pmatrix}    (2.8.5)

So security 1 pays off 1 regardless of state, and security 2 pays off 1/2 in state 1 and 2 in state 2. Hence if

θ = \begin{pmatrix} θ_1 \\ θ_2 \end{pmatrix}    (2.8.6)

is a portfolio, its payoff will be

D^T θ = \begin{pmatrix} θ_1 + θ_2/2 \\ θ_1 + 2θ_2 \end{pmatrix}    (2.8.7)

Is there a ψ with S = Dψ? One can find ψ easily by matrix inversion. We need

\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1/2 & 2 \end{pmatrix} \begin{pmatrix} ψ_1 \\ ψ_2 \end{pmatrix}    (2.8.8)

so that

\begin{pmatrix} ψ_1 \\ ψ_2 \end{pmatrix} = \begin{pmatrix} 4/3 & −2/3 \\ −1/3 & 2/3 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 1/3 \end{pmatrix}    (2.8.9)
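The inversion above can be reproduced exactly with rational arithmetic; this is a sketch for the 2 × 2 case only (the function name is illustrative):

```python
from fractions import Fraction as F

def state_prices_2x2(D, S):
    """Solve S = D psi for a 2x2 payoff matrix by explicit inversion."""
    (a, b), (c, d) = D
    det = a * d - b * c
    if det == 0:
        raise ValueError("payoff matrix is singular")
    psi1 = (d * S[0] - b * S[1]) / det
    psi2 = (-c * S[0] + a * S[1]) / det
    return psi1, psi2

# The example market: S_1 = S_2 = 1 and D = [[1, 1], [1/2, 2]].
psi = state_prices_2x2([[F(1), F(1)], [F(1, 2), F(2)]], [F(1), F(1)])
```

This recovers ψ = (2/3, 1/3), matching (2.8.9).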

Now we address two natural and important questions. What is the meaning of the state price vector? How do we use it to value derivatives?

We formalize the notion of a derivative.

Definition 2.8.3. A derivative (or a contingency claim) is a function on the set of states.

We can write this function as an M × 1 vector. So if h is a contingency claim, then h will pay off h_j if state j is realized. To see how this compares with what we have been talking about up until now, let us model a call on security 2 in the example above. Consider a call with strike price 3/2 on security 2 with expiration at time 1. Then h_j = max(d_{2,j} − 3/2, 0), or

h = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1/2 \end{pmatrix}    (2.8.10)

So our derivatives are really nothing new here, except that this notion allows the possibility of a payoff depending on a combination of securities, which is certainly a degree of generality that we might want.

To understand what the state price vector means, we introduce some important special contingency claims. Let e_j be the contingency claim that pays 1 if state j occurs and 0 if any other state occurs. So

e_j = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}  (1 in the jth row)    (2.8.11)

These are often called binary options. e_j is just a bet paying off 1 if state j occurs. How much is this bet worth? Suppose that we can find a portfolio θ_j such that D^T θ_j = e_j. Let ψ be a state price vector. Then the market value of the portfolio θ_j is

S · θ_j = Dψ · θ_j = ψ · D^T θ_j = ψ · e_j = ψ_j.

Hence we have shown that the market value of a portfolio whose payoff is e_j is ψ_j. So the jth component of the state price vector is the market value of a bet that state j will be realized. This is the meaning of ψ.

Now consider

B = \sum_{j=1}^{M} e_j = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}    (2.8.12)

This is a contingency claim that pays 1 regardless of state. That is, B is a riskless zero-coupon bond with par value 1 and expiration at time 1. Its market value will then be R = \sum_{j=1}^{M} ψ_j. So this R is the discount factor for riskless borrowing. (You might want to think of R as (1 + r)^{−1}, where r is the interest rate.) Now we can form a vector Q = ψ/R. Note that Q ∈ R^M_{++} and \sum_{j=1}^{M} q_j = 1, so Q can be interpreted as a probability function on the state space. It is important to remark that these probabilities are not the real-world probabilities that might be estimated for the states j. They are probabilities derived only from the no-arbitrage condition. They are therefore often called pseudo-probabilities.

Coming back to our problem of valuing derivatives, consider a derivative h. If θ is a portfolio whose payoff is h, i.e. D^T θ = h, we say that θ replicates h. Then the market value of θ is

S · θ = Dψ · θ = ψ · D^T θ = ψ · h = RQ · h = R \sum_{j=1}^{M} q_j h_j = R E_Q(h).

That is, the market value of θ (and therefore of the derivative) is the discounted expectation of h with respect to the pseudo-probability Q. Hence we have regained our idea that the value of a contingency claim should be an expectation, but with respect to probabilities derived only from the no-arbitrage hypothesis, not from any actual probabilities of the states occurring.

Let us go back to the example above and value the option on security 2. In this example R = ψ_1 + ψ_2 = 2/3 + 1/3 = 1, so the value of h is

R E_Q(h) = 1 · (0 · (2/3) + (1/2) · (1/3)) = 1/6.
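The pricing rule "value = ψ · h = R E_Q(h)" is just a dot product; a sketch in exact arithmetic reproduces the 1/6:

```python
from fractions import Fraction as F

def derivative_value(psi, h):
    """Market value of a replicable claim h: psi . h, which equals R * E_Q(h)."""
    return sum(p * x for p, x in zip(psi, h))

psi = [F(2, 3), F(1, 3)]   # state price vector of the example market
h = [F(0), F(1, 2)]        # call on security 2 with strike 3/2
value = derivative_value(psi, h)
R = sum(psi)               # discount factor; here R = 1
```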

Coming back to general issues, there is one further thing we need to discuss: when is there a portfolio θ whose payoff D^T θ is equal to a given contingency claim h? For this we make the following definition.

Definition 2.8.4. A market is complete if for each of the e_j there is a portfolio θ_j with D^T θ_j = e_j.

Now we see that if a market is complete, then for any contingency claim h, the portfolio θ = \sum_{j=1}^{M} h_j θ_j satisfies D^T θ = h. That is, a market is complete if and only if every claim can be replicated.

We state this and an additional fact as a theorem.

Theorem 2.8.4. 1. A market is complete if and only if every contingency claim can be replicated.

2. An arbitrage-free market is complete if and only if the state price vector is unique.

Proof: Having proved the first statement above, we prove the second. It is clear from the first statement that if a market is complete then D^T : R^N → R^M is onto, that is, has rank M. Thus if ψ_1 and ψ_2 are two state price vectors, then D(ψ_1 − ψ_2) · θ = (S − S) · θ = 0 for all θ. But then 0 = D(ψ_1 − ψ_2) · θ = (ψ_1 − ψ_2) · D^T θ. Since D^T is onto, this means that (ψ_1 − ψ_2) · v = 0 for all v ∈ R^M, which means that ψ_1 − ψ_2 = 0.

We prove the other direction by contradiction. So assume that the market is not complete, and ψ_1 is a state price vector. Then D^T is not onto, so we can find a non-zero vector τ orthogonal to the image, i.e. D^T θ · τ = 0 for all θ. Then θ · Dτ = 0 for all θ, so Dτ = 0, and ψ_2 = ψ_1 + ετ is another state price vector, where ε > 0 is chosen small enough that ψ_2 > 0. QED.


Exercises

1. Consider a variant of the game described above: tossing a die, the casino pays the face value on the upside of the die in dollars. Except this time, if the player doesn’t like her first toss, she is allowed one additional toss, and the casino then pays the face value on the second toss. What is the fair value of this game?

2. Suppose a stock is worth $S_− just before it pays a dividend $D. What is the value S_+ of the stock just after the dividend? Use an arbitrage argument.

3. Suppose that stock XYZ pays no dividends, and C is an American call on XYZ with strike price K and expiration T. Show, using an arbitrage argument, that it is never advantageous to exercise early; that is, for any intermediate time t < T, C_t > S_t − K. (Hint: Calculate the payoff of the option as seen from the same time T in two cases: 1) if you exercise early at time t < T; 2) if you hold it until expiration. To finance the first case, you borrow K at time t, buy the stock, and repay the loan at time T. Calculate this portfolio at time T. In the other case you know the payoff. Now compare, and set up an arbitrage if S_t − K ≥ C_t.)

4. Suppose that C is a call on a non-dividend-paying stock S, with strike price K and expiration T. Show that for any time t < T, C_t > S_t − exp(−r(T − t))K.

5. Suppose that C and D are two calls on the same stock with the same expiration T, but with strike prices K < L, respectively. Show by an arbitrage argument that C_t ≥ D_t for all t < T.

6. Suppose that C and D are two calls on the same stock with the same strike price K, but with expirations T_1 < T_2, respectively. Show by an arbitrage argument that D_t ≥ C_t for all t < T_1.

7. Using the arbitrage theorem, show that there can only be one bond price. That is, if there are 2 securities paying $1 regardless of state, then their market value is the same.

8. Consider the example of the market described in section (2.8). Let h be a put on security 2 with strike price 1/2. Calculate its market value.

9. Make up your own market, and show that there is a state price vector. Consider a specific derivative (of your choice) and calculate its market value.


Chapter 3

Discrete time

In this chapter, we proceed from the one-period model of the last chapter to the multi-period model.

3.1 A first look at martingales

In this section I would like to give a very intuitive sense of what a martingale is and why martingales come up in option pricing, and then give an application to calculating expected stopping times.

The concept of a martingale is of primary importance in the modern theory of probability and in theoretical finance. Let me begin by stating the slogan that a martingale is a mathematical model for a fair game. In fact, the term martingale derives from the well-known betting strategy (in roulette) of doubling one’s bet after losing. First let’s think about what we mean by a fair game.

Gerolamo Cardano, an Italian, was one of the first mathematicians to study the theory of gambling. In his treatise, The Book of Games of Chance, he wrote: ”The most fundamental principle of all in gambling is simply equal conditions, for example, of opponents, of bystanders, of money, of situation, of the dice box, and of the die itself. To the extent to which you depart from that equality, if it is in your opponent’s favor, you are a fool, and if in your own, you are unjust.” (From A. Hald, A History of Probability and Statistics and their Applications before 1750, John Wiley and Sons, New York.)

How do we mathematically model this notion of a fair game, that is, of equal conditions? Consider a number of gamblers, say N, playing a gambling game. We will assume that this game is played in discrete steps called hands. We also assume that the gamblers make equal initial wagers and the winner wins the whole pot. It is reasonable to model each gambler by his or her current fortune; that is, this is the only information we really need to keep track of. Noting the gamblers’ fortunes at a single time is not so useful, so we keep track of their fortunes at each moment of time. So let F_n denote the fortune of one of the gamblers after the nth hand. Because of the random nature of gambling we let F_n be a random variable for each n. A sequence of random variables F_n is what is called a stochastic process.

Now for the equal conditions. A reasonable way to insist on the equal conditions of the gamblers is to assume that each of the gamblers’ fortune processes is distributed (as random variables) in exactly the same way. That is, if p_n(x) denotes the probability density function of F_n, and G_n is any of the other gamblers’ fortunes, with pdf q_n(x), then we have p_n(x) = q_n(x) for all n and x.

What are the consequences of this condition? The amount that the gambler wins in hand n is the random variable ξ_n = F_n − F_{n−1}, and the equal distribution of the gamblers’ fortunes implies that the gamblers’ wins in the nth hand are also equally distributed. From the conditions on the gambling game described above, we must have E(ξ_n) = 0. (Think about this: the N gamblers’ wins in a hand are identically distributed and sum to zero, since the pot merely changes hands.) Hence we have that

E(F_n) = E(F_{n−1} + ξ_n) = E(F_{n−1}) + E(ξ_n) = E(F_{n−1}).

So the expected fortune of the gambler will not change over time. This is close to the martingale condition. The same analysis says that if the gambler has F_{n−1} after the (n−1)st hand, then he can expect to have F_{n−1} after the nth hand. That is, E(F_n | F_{n−1}) = F_{n−1}. (Here E(F_n | F_{n−1}) denotes a conditional expectation, which we explain below.) This is closer to the martingale condition, which says that E(F_n | F_{n−1}, F_{n−2}, . . . , F_0) = F_{n−1}.
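The fair-game identity E(F_n) = E(F_{n−1}) can be checked exactly on a toy case. The sketch below assumes a fair ±1 coin-flip game (a stand-in for the general game in the text) and propagates the full distribution of F_n hand by hand:

```python
def exact_mean_after(n_hands: int, start: int = 10) -> float:
    """Exact E(F_n) for a fair +/-1 game, computed by propagating the
    distribution of the fortune F_n one hand at a time."""
    dist = {start: 1.0}                      # P(F_0 = start) = 1
    for _ in range(n_hands):
        new = {}
        for f, p in dist.items():
            for step in (-1, 1):             # each hand: win or lose 1, prob 1/2
                new[f + step] = new.get(f + step, 0.0) + p / 2
        dist = new
    return sum(f * p for f, p in dist.items())

means = [exact_mean_after(n) for n in range(6)]   # constant at the start value
```

The mean never moves, even though the distribution spreads out with every hand.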

O Romeo, Romeo! wherefore art thou Romeo?

Deny thy father and refuse thy name;

Or, if thou wilt not, be but sworn my love,

And I’ll no longer be a Capulet.

-Juliet


Now let me describe a very pretty application of the notion of martingale to a simple problem. Everyone has come across the theorem in a basic probability course that a monkey typing randomly at a typewriter, with a probability of 1/26 of typing any given letter (we will always ignore spaces, capitals and punctuation) and typing one letter per second, will eventually type the complete works of Shakespeare with probability 1. But how long will it take? More particularly, let’s consider the question of how long it will take just to type ”O Romeo, Romeo”. Everyone recognizes this as Juliet’s line in Act 2, Scene 2 of Shakespeare’s Romeo and Juliet. (Benvolio also says it in Act 3, Scene 1.) So the question, more mathematically, is: what is the expected time for a monkey to type ”OROMEOROMEO”, ignoring spaces and commas? This seems perhaps pretty hard. So let’s consider a couple of similar but simpler questions, and also get our mathematical juices going.

1. Flipping a fair coin over and over, what is the expected number of flips before getting a Heads and then a Tails: HT?

2. Flipping a fair coin over and over, what is the expected number of flips until getting two Heads in a row: HH?

We can actually calculate these numbers by hand. Let T_1 denote the random variable giving the time until HT and T_2 the time until HH. Then we want to calculate E(T_1) and E(T_2). If T denotes one of these RV’s (random variables), then

E(T) = \sum_{n=1}^{\infty} n P(T = n).

We then have

P(T_1 = 1) = 0
P(T_1 = 2) = P({HT}) = 1/4
P(T_1 = 3) = P({HHT, THT}) = 2/8
P(T_1 = 4) = P({HHHT, THHT, TTHT}) = 3/16
P(T_1 = 5) = P({HHHHT, THHHT, TTHHT, TTTHT}) = 4/32
P(T_1 = 6) = P({HHHHHT, THHHHT, TTHHHT, TTTHHT, TTTTHT}) = 5/64    (3.1.1)

We can see the pattern P(T_1 = n) = (n − 1)/2^n, and so

E(T_1) = \sum_{n=2}^{\infty} n(n − 1)/2^n.

To sum this, let’s write down the ”generating function”

f(t) = \sum_{n=1}^{\infty} n(n − 1) t^n

so that E(T_1) = f(1/2). To be able to calculate with such generating functions is a useful skill. First note that the term n(n − 1) is what would occur in the general term after differentiating a power series twice. On the other hand, this term sits in front of t^n instead of t^{n−2}, so we need to multiply the second derivative by t^2. Thus

f(t) = t^2 \frac{d^2}{dt^2}\left( \sum_{n=0}^{\infty} t^n \right) = t^2 \frac{d^2}{dt^2}\left( \frac{1}{1 − t} \right),

so it is easy to see that

f(t) = \frac{2t^2}{(1 − t)^3}

and that E(T_1) = f(1/2) = 4.
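Both the pattern P(T_1 = n) = (n − 1)/2^n and the value E(T_1) = 4 can be checked by brute force, enumerating all coin sequences of a given length:

```python
from itertools import product

def first_occurrence_count(pattern: str, length: int) -> int:
    """Number of H/T sequences of the given length in which `pattern`
    first appears ending exactly at the last flip."""
    total = 0
    for seq in product("HT", repeat=length):
        s = "".join(seq)
        if s.endswith(pattern) and pattern not in s[:-1]:
            total += 1
    return total

# P(T_1 = n) = (n - 1)/2^n: the counts for HT at n = 2..6 should be 1..5.
counts_HT = [first_occurrence_count("HT", n) for n in range(2, 7)]

# Partial sum of E(T_1) = sum n (n - 1)/2^n, converging to f(1/2) = 4.
e1 = sum(n * (n - 1) / 2**n for n in range(2, 200))
```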

So far pretty easy, and no surprises. Let’s do E(T_2). Here

P(T_2 = 1) = 0
P(T_2 = 2) = P({HH}) = 1/4
P(T_2 = 3) = P({THH}) = 1/8
P(T_2 = 4) = P({TTHH, HTHH}) = 2/16
P(T_2 = 5) = P({TTTHH, HTTHH, THTHH}) = 3/32
P(T_2 = 6) = P({TTTTHH, HTTTHH, THTTHH, TTHTHH, HTHTHH}) = 5/64
P(T_2 = 7) = P({TTTTTHH, HTTTTHH, THTTTHH, TTHTTHH, TTTHTHH, HTHTTHH, HTTHTHH, THTHTHH}) = 8/128    (3.1.2)

Again the pattern is easy to see: P(T_2 = n) = f_{n−1}/2^n, where f_n denotes the nth Fibonacci number (prove it!). (Recall that the Fibonacci numbers are defined by the recursive relationship f_1 = 1, f_2 = 1 and f_n = f_{n−1} + f_{n−2}.) So

E(T_2) = \sum_{n=2}^{\infty} n f_{n−1}/2^n.

Writing g(t) = \sum_{n=1}^{\infty} n f_{n−1} t^n, we calculate in a similar (but more complicated) manner that

g(t) = \frac{2t^2 − t^3}{(1 − t − t^2)^2}

and that E(T_2) = g(1/2) = 6. A little surprising perhaps.

What accounts for the fact that HH takes longer? A simple answer is that if we are awaiting the target string (HT or HH), and we get the first letter at time n but then fail at time n + 1 to get the second letter, the earliest we can succeed after that is n + 2 in the case of HT but n + 3 in the case of HH.
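The Fibonacci series for E(T_2), and the closed form g(1/2), can be checked numerically:

```python
def expected_HH(terms: int = 120) -> float:
    """Partial sum of E(T_2) = sum_{n >= 2} n * f_{n-1} / 2^n,
    where f_n are the Fibonacci numbers (f_1 = f_2 = 1)."""
    total = 2 * 1 / 2**2        # n = 2 term uses f_1 = 1
    f_prev, f_curr = 1, 1       # (f_1, f_2)
    for n in range(3, terms):
        total += n * f_curr / 2**n            # f_{n-1} equals f_curr here
        f_prev, f_curr = f_curr, f_prev + f_curr
    return total

# The closed form g(t) = (2 t^2 - t^3) / (1 - t - t^2)^2 evaluated at t = 1/2.
t = 0.5
g_half = (2 * t**2 - t**3) / (1 - t - t**2) ** 2
```

Both routes give 6, double the HT answer.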

The same kind of analysis could in theory be done for ”OROMEOROMEO”

but it would probably be hopelessly complicated.

To approach ”OROMEOROMEO”, let’s turn it into a game. Let T denote the random variable giving the time until ”OROMEOROMEO” is typed. I walk into a casino, and sitting at a table, dressed in the casino’s standard uniform, is a monkey in front of a typewriter. People are standing around the table making bets on the letters of the alphabet. Then the monkey strikes a key at random (it is a fair monkey) and the people win or lose according to whether their bet was on the letter struck. If their letter came up, they are paid off at a rate of 26 to 1. The game then continues. For simplicity, assume that a round of monkey typing takes exactly one minute. Clearly the odds and the pay-off are set up to yield a ”fair game”: this is an example of a martingale. I know in my mathematical bones that I cannot expect to do better on average than to break even. That is:

Theorem 3.1.1. No matter what strategy I adopt, on average, I can never do better than break even.

Eventually this will be a theorem in martingale theory (and not a very hard one at that). Now a group of gamblers conspire together on the following betting scheme. On the first play, the first bettor walks up to the table and bets one Italian lira (monkey typing is illegal in the United States), ITL 1, that the first letter is an ”O”. If he wins, he gets ITL 26 and bets all ITL 26 on the second play that the second letter is an ”R”. If he wins, he bets all ITL 26^2 on the third play that the letter is an ”O”. Of course, if at some point he loses, all the money is confiscated and he walks away with a net loss of the initial ITL 1. Now, in addition to the first bettor starting the above betting sequence on the first play, the second bettor begins the same betting sequence starting with the second play; that is, regardless of what has happened on the first play, she also bets ITL 1 that the second play will be an ”O”, etc. One after another, the bettors (there are a lot of them) initiate a new betting sequence at each play. If at some point one of the bettors gets through ”OROMEOROMEO”, the game ends, and the conspirators walk away with their winnings and losses, which they share. Then they calculate their total wins and losses.

Again, because the game is fair, the conspirators’ expected winnings are 0. On the other hand, let’s calculate the amount they’ve won once ”OROMEOROMEO” has been achieved. Because it got all the way through, the betting sequence beginning at the first letter of ”OROMEOROMEO” has yielded ITL 26^{11}. In addition to this, the sequence starting at the final ”O” of the first ”ROMEO” and continuing through the end has gone undefeated and yields ITL 26^6. And finally, the final ”O” earns an additional ITL 26. On the other hand, every play of monkey typing initiated a betting sequence at a cost of ITL 1, and apart from the three enumerated above, each was eventually wiped out. Counting an ITL 1 outlay for each of the T plays, their net earnings are

26^{11} + 26^6 + 26 − T.

Since the expected value of this is 0, the expected value is E(T) = 26^{11} + 26^6 + 26. A fair amount of money, even in lire.

Try this for the ”HH” problem above to see more clearly how this works.
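That exercise can be automated. The betting argument suggests a general recipe, implied rather than stated in the text: the surviving betting sequences correspond exactly to the lengths k at which the last k letters of the pattern equal the first k, so the expected time is the sum of alphabet_size^k over those overlaps. A sketch:

```python
def expected_typing_time(pattern: str, alphabet_size: int) -> int:
    """Expected number of uniform random keystrokes until `pattern` appears,
    via the martingale betting argument: sum alphabet_size**k over every
    k such that the pattern's length-k suffix equals its length-k prefix."""
    total = 0
    for k in range(1, len(pattern) + 1):
        if pattern[-k:] == pattern[:k]:
            total += alphabet_size**k
    return total

e_hh = expected_typing_time("HH", 2)               # overlaps at k = 1, 2
e_ht = expected_typing_time("HT", 2)               # only the full overlap
e_romeo = expected_typing_time("OROMEOROMEO", 26)  # overlaps at k = 1, 6, 11
```

This reproduces E(T_2) = 6 and E(T_1) = 4 from the coin problems, and 26^11 + 26^6 + 26 for the monkey.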

3.2 Introducing stochastic processes

A stochastic process which is discrete in both space and time is a sequence of random variables defined on the same sample space (that is, they are jointly distributed), {X_0, X_1, X_2, . . .}, such that the totality of values of all the X_n’s combined forms a discrete set. The different values that the X_n’s can take are called the states of the stochastic process. The subscript n in X_n is to be thought of as time.

We are interested in understanding the evolution of such a process. One way to describe such a thing is to consider how X_n depends on X_0, X_1, . . . , X_{n−1}.

Example 3.2.1. Let W_n denote the position of a random walker. That is, at time 0, W_0 = 0, say, and at each moment in time the random walker has a probability p of taking a step to the left and a probability 1 − p of taking a step to the right. Thus, if at time n, W_n = k, then at time n + 1 there is probability p of being at k − 1 and probability 1 − p of being at k + 1. We express this mathematically by saying

P(W_{n+1} = j | W_n = k) = \begin{cases} p & \text{if } j = k − 1 \\ 1 − p & \text{if } j = k + 1 \\ 0 & \text{otherwise} \end{cases}    (3.2.1)

In general, for a stochastic process X_n we might specify

P(X_{n+1} = j | X_n = k_n, X_{n−1} = k_{n−1}, . . . , X_0 = k_0).

If, as in the case of a random walk,

P(X_{n+1} = j | X_n = k_n, X_{n−1} = k_{n−1}, . . . , X_0 = k_0) = P(X_{n+1} = j | X_n = k_n),

the process is called a Markov process. We write

p_{jk} = P(X_{n+1} = k | X_n = j).

The numbers p_{jk} form a matrix (an infinite-by-infinite matrix if the random variables X_n take infinitely many values), called the matrix of transition probabilities. They satisfy (merely by virtue of being probabilities)

1. p_{jk} ≥ 0, and

2. \sum_k p_{jk} = 1 for each j.

The matrix p_{jk} as we’ve defined it tells us how the probabilities change from the nth to the (n + 1)st step. So p_{jk} a priori depends on n. If it doesn’t, then the process is called a stationary Markov process. Let’s reiterate: p_{jk} represents the probability of going from state j to state k in one time step. Thus, if X_n is a stationary Markov process, once we know the p_{jk}, the only information left to know is the initial probability distribution of X_0, that is, p_0(k) = P(X_0 = k). The basic question is: given p_0 and p_{jk}, what is the probability distribution of X_n? Towards this end we introduce the ”higher” transition probabilities

p^{(n)}_{jk} = P(X_n = k | X_0 = j),

which by stationarity also equals P(X_{m+n} = k | X_m = j), the probability of going from j to k in n steps.

Proposition 3.2.1.

p

(n)

jk

=

h

p

(n−1)

jh

p

hk

Proof:

p

(n)

jk

= P(X

n

= k[X

0

= j)

Now a process that goes from state j to state k in n steps, must land some-

where on the n −1st step. Suppose it happens to land on the state h. Then

the probability

P(X

n

= k[X

n−1

= h) = P(X

1

= k[X

0

= h) = p

hk

by stationarity. Now the probability of going from state j to state h in n−1

steps and then going from state h to state k in one step is

P(X

n−1

= h[X

0

= j)P(X

n

= k[X

n−1

= h) = p

(n−1)

jh

p

hk

Now the proposition follows by summing up over all the states one could

have landed upon on the n −1st step. Q.E.D.

It follows by repeated application of the preceding proposition that
\[ p^{(n+m)}_{jk} = \sum_h p^{(n)}_{jh} p^{(m)}_{hk}. \]
This is called the Chapman-Kolmogorov equation. Expressed in terms of matrix multiplication, it just says that if $P = (p_{jk})$ denotes the matrix of transition probabilities from one step to the next, then $P^n$, the matrix raised to the $n$th power, represents the higher transition probabilities, and of course $P^{n+m} = P^n P^m$.

Now we can answer the basic question posed above. If $P(X_0 = j) = p_0(j)$ is the initial probability distribution of a stationary Markov process with transition matrix $P = (p_{jk})$, then
\[ P(X_n = k) = \sum_j p_0(j)\, p^{(n)}_{jk}. \]
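In matrix form this is easy to compute; a minimal sketch with a hypothetical two-state chain (the transition matrix and initial distribution are illustrative, not from the text):

```python
import numpy as np

# Hypothetical two-state stationary Markov chain: rows of the transition
# matrix P = (p_jk) are probability distributions, so each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
p0 = np.array([1.0, 0.0])  # initial distribution p_0

def distribution(p0, P, n):
    # P(X_n = k) = sum_j p_0(j) p^(n)_jk, and p^(n) is the matrix power P^n.
    return p0 @ np.linalg.matrix_power(P, n)

dist3 = distribution(p0, P, 3)
```

Chapman-Kolmogorov is then the statement that `matrix_power(P, n + m)` equals `matrix_power(P, n) @ matrix_power(P, m)`.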

3.2. INTRODUCING STOCHASTIC PROCESSES 59

3.2.1 Generating functions

Before getting down to some examples, I want to introduce a technique for doing calculations with random variables. Let $X$ be a discrete random variable whose states are integers and whose distribution is $p(k) = P(X = k)$. We introduce its generating function
\[ F_X(t) = \sum_k p(k) t^k. \]
At this point we do not really pay attention to the convergence of $F_X$; we will do formal manipulations with $F_X$, and therefore $F_X$ is called a formal power series (or more properly a formal Laurent series since it can have negative powers of $t$). Here are some of the basic facts about the generating function of $X$:

1. $F_X(1) = 1$, since this is just the sum over all the states of the probabilities, which must be 1.

2. $E(X) = F_X'(1)$. Since $F_X'(t) = \sum_k k\,p(k) t^{k-1}$; now evaluate at 1.

3. $\mathrm{Var}(X) = F_X''(1) + F_X'(1) - (F_X'(1))^2$. This is an easy calculation as well.

4. If $X$ and $Y$ are independent random variables, then $F_{X+Y}(t) = F_X(t) F_Y(t)$.

Before going on, let’s use these facts to do some computations.

Example 3.2.2. Let $X$ be a random variable representing a Bernoulli trial, that is, $P(X = -1) = p$, $P(X = 1) = 1 - p$, and $P(X = k) = 0$ otherwise. Thus $F_X = p t^{-1} + (1-p) t$. Now a random walk can be described by
\[ W_n = X_1 + \cdots + X_n \]
where each $X_k$ is an independent copy of $X$: $X_k$ represents taking a step to the left or right at the $k$th step. Now according to fact 4 above, since the $X_k$ are mutually independent we have
\[ F_{W_n} = F_{X_1} F_{X_2} \cdots F_{X_n} = (F_X)^n = (p t^{-1} + (1-p) t)^n = \sum_{j=0}^n \binom{n}{j} (p t^{-1})^j ((1-p) t)^{n-j} = \sum_{j=0}^n \binom{n}{j} p^j (1-p)^{n-j} t^{n-2j}. \]
Hence the distribution for the random walk $W_n$ can be read off as the coefficients of the generating function. So $P(W_n = n - 2j) = \binom{n}{j} p^j (1-p)^{n-j}$ for $j = 0, 1, 2, \ldots, n$ and is zero for all other values. (Note: This is our third time calculating this distribution.) Also, $E(W_n) = F_{W_n}'(1)$. $F_{W_n}'(t) = \frac{d}{dt}(p t^{-1} + (1-p)t)^n = n(p t^{-1} + (1-p)t)^{n-1}((1-p) - p t^{-2})$, and evaluating at 1 gives $n(p + (1-p))((1-p) - p) = n((1-p) - p)$. Similarly,
\[ F_{W_n}'' = n(n-1)(p t^{-1} + (1-p)t)^{n-2}((1-p) - p t^{-2})^2 + n(p t^{-1} + (1-p)t)^{n-1}(2p t^{-3}), \]
and evaluating at 1 gives $n(n-1)((1-p)-p)^2 + 2np$. So
\begin{align*}
\mathrm{Var}(W_n) &= F_{W_n}''(1) + F_{W_n}'(1) - F_{W_n}'(1)^2 \\
&= n(n-1)((1-p)-p)^2 + 2np + n((1-p)-p) - n^2((1-p)-p)^2 \\
&= -n((1-p)-p)^2 + 2np + n((1-p)-p) \\
&= n((1-p)-p)\big(1 - ((1-p)-p)\big) + 2np = 2pn((1-p)-p) + 2np \\
&= 2np(1-p) - 2np^2 + 2np = 2np(1-p) + 2np(1-p) = 4np(1-p).
\end{align*}
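These closed forms can be checked numerically by reading the distribution of $W_n$ off the coefficients of the generating function; a minimal sketch with illustrative values $n = 6$, $p = 1/3$:

```python
from fractions import Fraction
from math import comb

# Distribution of the random walk W_n from the generating function
# (p t^-1 + (1-p) t)^n: the coefficient of t^(n-2j) is C(n,j) p^j (1-p)^(n-j).
n, p = 6, Fraction(1, 3)

pmf = {n - 2 * j: comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)}

mean = sum(k * pr for k, pr in pmf.items())
var = sum(k * k * pr for k, pr in pmf.items()) - mean**2
```

With exact rational arithmetic the identities $E(W_n) = n((1-p)-p)$ and $\mathrm{Var}(W_n) = 4np(1-p)$ hold on the nose, not just to rounding error.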

Consider the following experiment. Roll two dice and read off the sum on them. Then flip a coin that number of times and count the number of heads. This describes a random variable $X$. What is its distribution? We can describe $X$ mathematically as follows. Let $N$ denote the random variable of the sum on two dice. So according to a calculation in a previous set of notes, $P(N = n) = \frac{1}{36}(6 - |7-n|)$ for $n = 2, 3, \ldots, 12$ and is 0 otherwise. Also, let $Y$ count the number of heads in one toss, so that $P(Y = k) = \frac{1}{2}$ if $k = 0$ or $1$ and is 0 otherwise. Hence our random variable $X$ can be written as
\[ X = \sum_{j=1}^N Y_j \tag{3.2.2} \]
where the $Y_j$ are independent copies of $Y$; note that the upper limit of the sum is the random variable $N$. This is an example of a composite random variable, that is, in general a composite random variable is one that can be expressed by (3.2.2) where $N$ and $Y$ are general random variables. The generating function of such a composite is easy to calculate.

Proposition 3.2.2. Let $X$ be the composite of two random variables $N$ and $Y$. Then $F_X(t) = F_N(F_Y(t)) = F_N \circ F_Y$.

Proof:
\[ P(X = k) = \sum_{n=0}^\infty P(N = n)\, P\Big(\sum_{j=1}^n Y_j = k\Big). \]
Now $F_{\sum_{j=1}^n Y_j} = (F_Y(t))^n$ since the $Y_j$'s are mutually independent. Therefore
\[ F_X = \sum_{n=0}^\infty P(N = n)(F_Y(t))^n = F_N(F_Y(t)). \]
Q.E.D.

So in our example above of rolling dice and then flipping coins, $F_N = \sum_{n=2}^{12} \frac{1}{36}(6 - |7-n|) t^n$ and $F_Y(t) = 1/2 + (1/2)t$. Therefore $F_X(t) = \sum_{n=2}^{12} \frac{1}{36}(6 - |7-n|)(1/2 + (1/2)t)^n$, from which one can read off the distribution of $X$.
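The substitution can be carried out by machine; a minimal sketch with exact arithmetic (the small polynomial helpers are ad hoc, not from any library; polynomials are dicts mapping exponent to coefficient):

```python
from fractions import Fraction

# Distribution of the composite X (flip a coin N times, N the sum of two dice)
# via F_X(t) = F_N(F_Y(t)).
half = Fraction(1, 2)
pN = {n: Fraction(6 - abs(7 - n), 36) for n in range(2, 13)}  # P(N = n)

def poly_mul(a, b):
    out = {}
    for i, ca in a.items():
        for j, cb in b.items():
            out[i + j] = out.get(i + j, 0) + ca * cb
    return out

def poly_pow(a, n):
    out = {0: Fraction(1)}
    for _ in range(n):
        out = poly_mul(out, a)
    return out

FY = {0: half, 1: half}      # F_Y(t) = 1/2 + 1/2 t
pX = {}
for n, prob in pN.items():   # F_X(t) = sum_n P(N = n) F_Y(t)^n
    for k, c in poly_pow(FY, n).items():
        pX[k] = pX.get(k, 0) + prob * c
```

The coefficient of $t^k$ in the result is $P(X = k)$, and the mean comes out to $E(N)E(Y) = 7 \cdot \frac{1}{2}$, as the chain rule applied to $F_N \circ F_Y$ predicts.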

The above sort of compound random variables appear naturally in the context of branching processes. Suppose we try to model a population (of amoebae, of humans, of subatomic particles) in the following way. We denote the number of members in the $n$th generation by $X_n$. We start with a single individual in the zeroth generation, so $X_0 = 1$. This individual splits into (or becomes) $X_1$ individuals (offspring) in the first generation. The number of offspring $X_1$ is a random variable with a certain distribution $P(X_1 = k) = p(k)$: $p(0)$ is the probability that $X_0$ dies, $p(1)$ is the probability that $X_0$ has one descendant, $p(2)$ that it has two descendants, etc. Now in the first generation we have $X_1$ individuals, each of which will produce offspring according to the same probability distribution as its parent. Thus in the second generation,
\[ X_2 = \sum_{j=1}^{X_1} X_{1,j} \]
where the $X_{1,j}$ are independent copies of $X_1$. That is, $X_2$ is the compound random variable of $X_1$ and itself. Continuing on, we ask that in the $n$th generation each individual produce offspring according to the same distribution as its ancestor $X_1$. Thus
\[ X_n = \sum_{j=1}^{X_{n-1}} X_{1,j}. \]

Using generating functions we can calculate the distribution of $X_n$. So we start with $F_{X_1}(t)$. Then by the formula for the generating function of a compound random variable,
\[ F_{X_2}(t) = F_{X_1}(F_{X_1}(t)) \]
and more generally,
\[ F_{X_n} = F_{X_1} \circ \cdots (n \text{ times}) \cdots \circ F_{X_1}. \]
Without generating functions, this expression would be quite difficult.
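For instance, iterating the composition at $t = 0$ gives the probability that the population is extinct by generation $n$, since $P(X_n = 0) = F_{X_n}(0)$. A minimal sketch with an illustrative offspring distribution (not one from the text): $p(0) = 1/4$, $p(1) = 1/4$, $p(2) = 1/2$.

```python
from fractions import Fraction

# Branching process via iterated composition of the offspring generating
# function F(t) = 1/4 + t/4 + t^2/2 (an assumed, illustrative distribution).
def F(t):
    return Fraction(1, 4) + t / 4 + t**2 / 2

def extinction_by(n):
    # P(X_n = 0) = (F o ... o F)(0), the n-fold composition evaluated at 0
    t = Fraction(0)
    for _ in range(n):
        t = F(t)
    return t
```

The values $F_{X_n}(0)$ increase with $n$, and their limit is the smallest fixed point of $F$ in $[0,1]$, the extinction probability.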

3.3 Conditional expectation

We now introduce a very subtle concept of utmost importance to us: conditional expectation. We will begin by defining it in its simplest incarnation: expectation of a random variable conditioned on an event. Given an event $A$, define its indicator to be the random variable
\[ \mathbf{1}_A = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases} \]
Define
\[ E(X \mid A) = \frac{E(X \mathbf{1}_A)}{P(A)}. \]
In the discrete case, this equals $\sum_x x\, P(X = x \mid A)$, and in the continuous case $\int x\, p(x \mid A)\, dx$.

Some basic properties of conditional expectation are

Proposition 3.3.1. (Properties of conditional expectation)

1. $E(X + Y \mid A) = E(X \mid A) + E(Y \mid A)$.

2. If $X$ is a constant $c$ on $A$, then $E(XY \mid A) = cE(Y \mid A)$.

3. If $B = \bigcup_{i=1}^n A_i$, then
\[ E(X \mid B) = \sum_{i=1}^n E(X \mid A_i)\, \frac{P(A_i)}{P(B)}. \]

(Jensen's inequality) A function $f$ is convex if for all $t \in [0,1]$ and $x < y$, $f$ satisfies the inequality
\[ f((1-t)x + ty) \leq (1-t)f(x) + tf(y). \]
Then for $f$ convex, $E(f(X)) \geq f(E(X))$ and $E(f(X) \mid A) \geq f(E(X \mid A))$.

If $X$ and $Y$ are two random variables defined on the same sample space, then we can talk of the particular kind of event $\{Y = j\}$. Hence it makes sense to talk of $E(X \mid Y = j)$. There is nothing really new here. But a more useful way to think of the conditional expectation of $X$ conditioned on $Y$ is as a random variable. As $j$ runs over the values of the variable $Y$, $E(X \mid Y = j)$ takes on different values. These are the values of a new random variable. Thus we make the following

Definition 3.3.1. Let $X$ and $Y$ be two random variables defined on the same sample space. Define a random variable $E(X \mid Y)$, called the expectation of $X$ conditioned on $Y$. The values of this random variable are $E(X \mid Y = j)$ as $j$ runs over the values of $Y$. Finally,
\[ P[E(X \mid Y) = e] = \sum_{\{j \mid E(X \mid Y = j) = e\}} P[Y = j]. \]
More generally, if $Y_0, Y_1, \ldots, Y_n$ are a collection of random variables, we may define $E(X \mid Y_0 = y_0, Y_1 = y_1, \ldots, Y_n = y_n)$ and $E(X \mid Y_0, Y_1, \ldots, Y_n)$.

Some basic properties of the random variable $E(X \mid Y)$ are listed in the following proposition.

Proposition 3.3.2. Let $X, Y$ be two random variables defined on the same sample space. Then

1. (Tower property) $E(E(X \mid Y)) = E(X)$.

2. If $X, Y$ are independent, then $E(X \mid Y) = E(X)$.

3. Letting $Z = f(Y)$, then $E(ZX \mid Y) = ZE(X \mid Y)$.

Proof:

1.
\[ P(E(X \mid Y) = e) = \sum_{\{j \mid E(X \mid Y = j) = e\}} P(Y = j). \]
Then
\begin{align*}
E(E(X \mid Y)) &= \sum_e e\, P(E(X \mid Y) = e) \\
&= \sum_j E(X \mid Y = j)\, P(Y = j) \\
&= \sum_j \Big( \sum_k k\, P(X = k \mid Y = j) \Big) P(Y = j) \\
&= \sum_j \sum_k k\, \big( P((X = k) \cap (Y = j)) / P(Y = j) \big) P(Y = j) \\
&= \sum_j \sum_k k\, P((X = k) \cap (Y = j)) = \sum_k \sum_j k\, P((X = k) \cap (Y = j)) \\
&= \sum_k k\, P\Big((X = k) \cap \bigcup_j \{Y = j\}\Big) = \sum_k k\, P(X = k) = E(X). \tag{3.3.1}
\end{align*}

2. If $X, Y$ are independent, then $E(X \mid Y) = E(X)$. So
\[ E(X \mid Y = j) = \sum_k k\, P(X = k \mid Y = j) = \sum_k k\, P(X = k) = E(X). \tag{3.3.2} \]
This is true for all $j$, so the conditional expectation is a constant.

3. We have
\begin{align*}
E(ZX \mid Y = j) &= \sum_{k,l} kl\, P((Z = k) \cap (X = l) \mid Y = j) \\
&= \sum_{k,l} kl\, P((f(Y) = k) \cap (X = l) \mid Y = j) \\
&= f(j) \sum_l l\, P(X = l \mid Y = j) = Z\, E(X \mid Y = j). \tag{3.3.3}
\end{align*}
Q.E.D.
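The tower property can be checked by brute force on a finite sample space; a minimal sketch with two fair dice and an illustrative choice of $X$ and $Y$:

```python
from fractions import Fraction
from itertools import product

# Brute-force check of E(E(X|Y)) = E(X): two fair dice, X = first die,
# Y = sum of the two dice (an illustrative choice).
space = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def X(w): return w[0]
def Y(w): return w[0] + w[1]

def E_X_given_Y(j):
    # average of X over the outcomes in the event {Y = j}
    ws = [w for w in space if Y(w) == j]
    return Fraction(sum(X(w) for w in ws), len(ws))

EX = sum(X(w) * P for w in space)
EEXY = sum(E_X_given_Y(j) * sum(P for w in space if Y(w) == j)
           for j in range(2, 13))
```

Both quantities come out to $7/2$, the mean of one die.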

We also have an analogous statement for conditioning on a set of variables.

Proposition 3.3.3. Let $X, Y_0, \ldots, Y_n$ be random variables defined on the same sample space. Then

1. (Tower property) $E(E(X \mid Y_0, \ldots, Y_n)) = E(X)$.

2. If $X$ is independent of $Y_0, \ldots, Y_n$, then $E(X \mid Y_0, \ldots, Y_n) = E(X)$.

3. Letting $Z = f(Y_0, \ldots, Y_n)$, then $E(ZX \mid Y_0, \ldots, Y_n) = ZE(X \mid Y_0, \ldots, Y_n)$.

Proof: See exercises.

3.4 Discrete time finance

Now let us return to the market. If $S_n$ denotes the value of a security at time $n$, is $S_n$ a martingale? There are two reasons why not. The first reason is the time value of money. If $S_n$ were a martingale, no one would buy it. You might as well just put your money under your mattress. What about the discounted security process? This would turn a bank account into a martingale. But any other security would not be. If the discounted stock value were a martingale, you would expect to make as much money as in your bank account, and people would just stick their money in the bank. The second reason why the discounted stock price process is not a martingale is the non-existence of arbitrage opportunities.

Let's go back to our model of the market from the last chapter. Thus we have a vector of securities $S$ and a pay-off matrix $D = (d_{ij})$, where security $i$ pays off $d_{ij}$ in state $j$. In fact, let's consider a very simple market consisting of two securities, a bond $S_1$ and a stock $S_2$. Then the value of the stock at time 1 is $d_{2j}$ if state $j$ occurs. Under the no-arbitrage condition, there is a state price vector $\psi$ so that $S = D\psi$. Setting $R = \sum_j \psi_j$ and $q_j = \psi_j / R$ as in the last chapter, we have $\sum_j q_j = 1$ and the $q_j$'s determine a probability $Q$ on the set of states. Now the value of the stock at time 1 is a random variable, which we will call $d_2$, with respect to the probabilities $Q$; thus $P(d_2 = d_{2j}) = q_j$.

.

The expected value with respect to Q of the stock at time 1 is

E

Q

(d

2

) =

j

d

2,j

q

j

= (D(ψ/R))

2

= (S/R)

2

= S

2

/R

In other words, E(Rd

2

) = S

2

. That is, the expected discounted stock price

at time 1 is the same as the value at time 0, with respect to the probabilities

Q which were arrived at by the no-arbitrage assumption. This is the ﬁrst

step to ﬁnding Q such that the discounted stock prices are martingales. This

is the basic phenomena that we will take advantage of in the rest of this

chapter to value derivatives.

We’ve seen that the no-arbitrage hypothesis led to the existence of a state

price vector, or to what is more or less the same thing, to a probability on the

set of states such that the expected value of the discounted price of the stock

at time 1 is the same as the stock price at time 0. The arbitrage theorem

of last chapter only gives us an existence theorem. It does not tell us how

to ﬁnd Q in actual practice. For this we will simplify our market drastically

and specify a more deﬁnite model for stock movements.

Our market will consist of a single stock $S$ and a bond $B$. Let us change notation a little. Now, at time 0, the stock is worth $S_0$ and the bond is worth $B_0$. At time 1, there will be only two states of nature, $u$ (up) or $d$ (down). That is, at time $\delta t$, the stock will either go up to some value $S_1^u$ or down to $S_1^d$. The bond will go to $e^{r\delta t} B_0$:
\[ S_0 \nearrow S_1^u \quad \text{or} \quad S_0 \searrow S_1^d, \qquad B_0 \to e^{r\delta t} B_0. \tag{3.4.1} \]

According to our picture up there, we need to find a probability $Q$ such that $E_Q(e^{-r\delta t} S_1) = S_0$. Let $q$ be the probability that the stock goes up. Then we want
\[ S_0 = E_Q(e^{-r\delta t} S_1) = q e^{-r\delta t} S_1^u + (1-q) e^{-r\delta t} S_1^d. \]
Thus we need
\[ q(e^{-r\delta t} S_1^u - e^{-r\delta t} S_1^d) + e^{-r\delta t} S_1^d = S_0. \]
So
\[ q = \frac{e^{r\delta t} S_0 - S_1^d}{S_1^u - S_1^d}, \qquad 1 - q = \frac{S_1^u - e^{r\delta t} S_0}{S_1^u - S_1^d}. \tag{3.4.2} \]

If $h$ is a contingent claim on $S$, then according to our formula of last chapter, the value of the derivative is
\begin{align*}
V(h) &= R\, E_Q(h(S_1)) \\
&= e^{-r\delta t}\big(q\, h(S_1^u) + (1-q)\, h(S_1^d)\big) \\
&= e^{-r\delta t}\left[ h(S_1^u)\left(\frac{e^{r\delta t} S_0 - S_1^d}{S_1^u - S_1^d}\right) + h(S_1^d)\left(\frac{S_1^u - e^{r\delta t} S_0}{S_1^u - S_1^d}\right) \right].
\end{align*}

Let's rederive this based on an ab initio arbitrage argument for $h$. Let's try to construct a replicating portfolio for $h$, that is, a portfolio consisting of the stock and bond so that no matter whether the stock goes up or down, the portfolio is worth the same as the derivative. Thus, suppose at time 0 we set up a portfolio
\[ \theta = \begin{pmatrix} \phi \\ \psi \end{pmatrix} \tag{3.4.3} \]
of $\phi$ units of stock $S_0$ and $\psi$ units of bond $B_0$. At time $\delta t$, the value of this portfolio is worth either $\phi S_1^u + \psi e^{r\delta t} B_0$ if the state is $u$, or $\phi S_1^d + \psi e^{r\delta t} B_0$ if the state is $d$. We want this value to be the same as $h(S_1)$. So we solve for $\phi$ and $\psi$:
\begin{align*}
\phi S_1^u + \psi e^{r\delta t} B_0 &= h(S_1^u) \\
\phi S_1^d + \psi e^{r\delta t} B_0 &= h(S_1^d) \tag{3.4.4}
\end{align*}

Subtracting,
\[ \phi(S_1^u - S_1^d) = h(S_1^u) - h(S_1^d). \]
So we find
\[ \phi = \frac{h(S_1^u) - h(S_1^d)}{S_1^u - S_1^d}, \]
while
\begin{align*}
\psi &= \frac{h(S_1^u) - \phi S_1^u}{e^{r\delta t} B_0}
= \frac{h(S_1^u) - \frac{h(S_1^u) - h(S_1^d)}{S_1^u - S_1^d}\, S_1^u}{e^{r\delta t} B_0} \\
&= \frac{h(S_1^u) S_1^u - h(S_1^u) S_1^d - h(S_1^u) S_1^u + h(S_1^d) S_1^u}{e^{r\delta t} B_0 (S_1^u - S_1^d)} \\
&= \frac{h(S_1^d) S_1^u - h(S_1^u) S_1^d}{e^{r\delta t} B_0 (S_1^u - S_1^d)}.
\end{align*}

Thus, the value of the portfolio at time 0 is
\begin{align*}
\phi S_0 + \psi B_0 &= \frac{h(S_1^u) - h(S_1^d)}{S_1^u - S_1^d}\, S_0 + \frac{h(S_1^d) S_1^u - h(S_1^u) S_1^d}{e^{r\delta t} B_0 (S_1^u - S_1^d)}\, B_0 \\
&= \left( \frac{e^{r\delta t}\big(h(S_1^u) - h(S_1^d)\big)}{e^{r\delta t}(S_1^u - S_1^d)} \right) S_0 + \frac{h(S_1^d) S_1^u - h(S_1^u) S_1^d}{e^{r\delta t}(S_1^u - S_1^d)} \\
&= e^{-r\delta t}\left[ h(S_1^u)\left( \frac{e^{r\delta t} S_0 - S_1^d}{S_1^u - S_1^d} \right) + h(S_1^d)\left( \frac{S_1^u - e^{r\delta t} S_0}{S_1^u - S_1^d} \right) \right]. \tag{3.4.5}
\end{align*}

Since the value of this portfolio and the derivative are the same at time 1, by the no-arbitrage hypothesis, they must be equal at time 0. So we've arrived at the value of the derivative by a somewhat different argument.
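A quick numerical sanity check that the two arguments agree; the numbers and the call-style claim are illustrative, not from the text:

```python
from math import exp

# One-period model: risk-neutral pricing vs. the replicating portfolio.
S0, Su, Sd = 100.0, 120.0, 90.0    # S_0, S_1^u, S_1^d (assumed values)
B0, r, dt = 1.0, 0.05, 1.0
K = 100.0

def h(s):                          # contingent claim (here a call struck at K)
    return max(s - K, 0.0)

q = (exp(r * dt) * S0 - Sd) / (Su - Sd)                 # formula (3.4.2)
V_risk_neutral = exp(-r * dt) * (q * h(Su) + (1 - q) * h(Sd))

phi = (h(Su) - h(Sd)) / (Su - Sd)                       # stock holding
psi = (h(Sd) * Su - h(Su) * Sd) / (exp(r * dt) * B0 * (Su - Sd))  # bond holding
V_replication = phi * S0 + psi * B0
```

By construction the portfolio $(\phi, \psi)$ matches the claim in both states at time $\delta t$, so its time-0 cost equals the risk-neutral value.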

Now we move on to a multi-period model. We are interested in defining a reasonably flexible and concrete model for the movement of stocks. We now want to divide the time period from now, time 0, to some horizon $T$, which will often be the expiration time for a derivative, into $N$ periods, each of length $\delta t$, so that $N\delta t = T$.

One Step, Redone with $\delta t$.
\[ S_0 \nearrow S^u_{\delta t} \ (\text{probability } q), \qquad S_0 \searrow S^d_{\delta t} \ (\text{probability } 1-q) \tag{3.4.6} \]
\[ q = \frac{S_0 e^{r\delta t} - S^d_{\delta t}}{S^u_{\delta t} - S^d_{\delta t}}, \qquad 1 - q = \frac{S^u_{\delta t} - S_0 e^{r\delta t}}{S^u_{\delta t} - S^d_{\delta t}}. \]
Then for a contingent claim $h$ we have $V(h) = e^{-r\delta t} E_Q(h)$.

Two Steps

(3.4.7) [Two-step tree: from $S_0$ the stock goes up with probability $q_0$ to the node $u$, or down with probability $1-q_0$ to the node $d$; from $u$ it goes to $uu$ with probability $q^u_1$ or to $ud$ with probability $1-q^u_1$; from $d$ it goes to $du$ with probability $q^d_1$ or to $dd$ with probability $1-q^d_1$.]

\[ q^d_1 = \frac{e^{r\delta t} S^d_{1\delta t} - S^{dd}_{2\delta t}}{S^{du}_{2\delta t} - S^{dd}_{2\delta t}}, \qquad 1 - q^d_1 = \frac{S^{du}_{2\delta t} - e^{r\delta t} S^d_{1\delta t}}{S^{du}_{2\delta t} - S^{dd}_{2\delta t}}, \]
\[ q^u_1 = \frac{e^{r\delta t} S^u_{1\delta t} - S^{ud}_{2\delta t}}{S^{uu}_{2\delta t} - S^{ud}_{2\delta t}}, \qquad 1 - q^u_1 = \frac{S^{uu}_{2\delta t} - e^{r\delta t} S^u_{1\delta t}}{S^{uu}_{2\delta t} - S^{ud}_{2\delta t}}, \]
\[ q_0 = \frac{e^{r\delta t} S_0 - S^d_{1\delta t}}{S^u_{1\delta t} - S^d_{1\delta t}}, \qquad 1 - q_0 = \frac{S^u_{1\delta t} - e^{r\delta t} S_0}{S^u_{1\delta t} - S^d_{1\delta t}}. \]

Suppose $h$ is a contingent claim, and that we know $h(S_{2\delta t})$. Then we have
\begin{align*}
h(S^u_{1\delta t}) &= e^{-r\delta t}\big(q^u_1 h(S^{uu}_{2\delta t}) + (1 - q^u_1) h(S^{ud}_{2\delta t})\big), \\
h(S^d_{1\delta t}) &= e^{-r\delta t}\big(q^d_1 h(S^{du}_{2\delta t}) + (1 - q^d_1) h(S^{dd}_{2\delta t})\big).
\end{align*}
Finally,
\begin{align*}
h(S_0) &= e^{-r\delta t}\big(q_0 h(S^u_{1\delta t}) + (1 - q_0) h(S^d_{1\delta t})\big) \\
&= e^{-r\delta t}\Big(q_0 \big(e^{-r\delta t}(q^u_1 h(S^{uu}_{2\delta t}) + (1 - q^u_1) h(S^{ud}_{2\delta t}))\big) + (1 - q_0)\big(e^{-r\delta t}(q^d_1 h(S^{du}_{2\delta t}) + (1 - q^d_1) h(S^{dd}_{2\delta t}))\big)\Big) \\
&= e^{-2r\delta t}\big(q_0 q^u_1 h(S^{uu}_{2\delta t}) + q_0(1 - q^u_1) h(S^{ud}_{2\delta t}) + (1 - q_0) q^d_1 h(S^{du}_{2\delta t}) + (1 - q_0)(1 - q^d_1) h(S^{dd}_{2\delta t})\big).
\end{align*}

Let's try to organize this into an $N$-step tree. First, let's notice what the coefficients of $h(S^{\bullet\bullet}_{2\delta t})$ are: $q_0 q^u_1$ represents going up twice, $q_0(1 - q^u_1)$ represents going up, then down, etc. That is, these coefficients are the probability of ending up at a particular node of the tree. Thus, if we define a sample space
\[ \Psi = \{uu, ud, du, dd\}, \]
\[ P(uu) = q_0 q^u_1, \quad P(ud) = q_0(1 - q^u_1), \quad P(du) = (1 - q_0) q^d_1, \quad P(dd) = (1 - q_0)(1 - q^d_1). \]
We check that this is a probability on the final nodes of the two-step tree:
\begin{align*}
P(uu) + P(ud) + P(du) + P(dd) &= q_0 q^u_1 + q_0(1 - q^u_1) + (1 - q_0) q^d_1 + (1 - q_0)(1 - q^d_1) \\
&= q_0(q^u_1 + 1 - q^u_1) + (1 - q_0)(q^d_1 + 1 - q^d_1) = q_0 + 1 - q_0 = 1.
\end{align*}
Now $h(S_2)$ is a random variable depending on $w_0 w_1 \in \Psi$. Hence we have
\[ V(h) = e^{-2r\delta t} E_Q(h(S_2)). \]

So what's the general case? For $N$ steps, the sample space is $\Psi = \{w_0 w_1 \ldots w_{N-1} \mid w_i = u \text{ or } d\}$,
\[ P(w_0 w_1 \ldots w_{N-1}) = \prod_{j=0}^{N-1} P(w_j) \]
where
\[ P(w_0) = \begin{cases} q_0 & \text{if } w_0 = u \\ 1 - q_0 & \text{if } w_0 = d \end{cases} \]
and more generally (the superscript records the path so far, as in the two-step case)
\[ P(w_j) = \begin{cases} q^{w_0 \cdots w_{j-1}}_j & \text{if } w_j = u \\ 1 - q^{w_0 \cdots w_{j-1}}_j & \text{if } w_j = d \end{cases} \]
\[ V(h) = e^{-rN\delta t} E_Q(h(S_N)). \]
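The $N$-step recipe can be checked against backward induction; a minimal sketch that assumes, for this sketch only, constant up/down factors so the tree recombines and $q$ is the same at every node (the text's formulas allow node-dependent moves):

```python
from math import exp, comb

# N-step binomial pricing with constant factors u, d (assumed values).
S0, u, d, r, dt, N = 100.0, 1.1, 0.9, 0.03, 0.25, 4
K = 100.0

def h(s):
    return max(s - K, 0.0)

q = (exp(r * dt) - d) / (u - d)   # same q at every node for constant u, d

# Backward induction: node value = e^{-r dt} (q * up-value + (1-q) * down-value)
values = [h(S0 * u**j * d**(N - j)) for j in range(N + 1)]
for level in range(N, 0, -1):
    values = [exp(-r * dt) * (q * values[j + 1] + (1 - q) * values[j])
              for j in range(level)]
V_induction = values[0]

# Direct formula V(h) = e^{-r N dt} E_Q(h(S_N)) over the terminal distribution
V_direct = exp(-r * N * dt) * sum(
    comb(N, j) * q**j * (1 - q)**(N - j) * h(S0 * u**j * d**(N - j))
    for j in range(N + 1))
```

The two computations agree exactly, which is the content of collapsing the nested one-step expectations into one expectation over $\Psi$.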

3.5 Basic martingale theory

We will now begin our formal study of martingales.

Definition 3.5.1. Let $M_n$ be a stochastic process. We say $M_n$ is a martingale if

1. $E(|M_n|) < \infty$, and

2. $E(M_{n+1} \mid M_n = m_n, M_{n-1} = m_{n-1}, \ldots, M_0 = m_0) = m_n$.

Noting that the $E(M_{n+1} \mid M_n = m_n, M_{n-1} = m_{n-1}, \ldots, M_0 = m_0)$ are the values of a random variable which we called $E(M_{n+1} \mid M_n, M_{n-1}, \ldots, M_0)$, we may write condition 2 above as $E(M_{n+1} \mid M_n, M_{n-1}, \ldots, M_0) = M_n$.

We also define a stochastic process $M_n$ to be a supermartingale if

1. $E(|M_n|) < \infty$, and

2. $E(M_{n+1} \mid M_n = m_n, M_{n-1} = m_{n-1}, \ldots, M_0 = m_0) \leq m_n$ (or equivalently $E(M_{n+1} \mid M_n, M_{n-1}, \ldots, M_0) \leq M_n$).

We say $M_n$ is a submartingale if

1. $E(|M_n|) < \infty$, and

2. $E(M_{n+1} \mid M_n = m_n, M_{n-1} = m_{n-1}, \ldots, M_0 = m_0) \geq m_n$ (or equivalently $E(M_{n+1} \mid M_n, M_{n-1}, \ldots, M_0) \geq M_n$).

The following theorem is one expression of the fair game aspect of martingales.

Theorem 3.5.1. If $Y_n$ is a

1. supermartingale, then $E(Y_n) \leq E(Y_m)$ for $n \geq m$;

2. submartingale, then $E(Y_n) \geq E(Y_m)$ for $n \geq m$;

3. martingale, then $E(Y_n) = E(Y_m)$ for all $n, m$.

Proof:

1. For $n \geq m$, the tower property of conditional expectation implies
\[ E(Y_n) = E(E(Y_n \mid Y_m, Y_{m-1}, \ldots, Y_0)) \leq E(Y_m). \]

2. is similar.

3. is implied by 1. and 2. together.

Q.E.D.

It will often be useful to consider several stochastic processes at once, or to express a stochastic process in terms of other ones. But we will usually assume that there is some basic or driving stochastic process and that others are expressed in terms of this one and defined on the same sample space. Thus we make the following more general definition of a martingale.

Definition 3.5.2. (Generalization) Let $X_n$ be a stochastic process. A stochastic process $M_0, M_1, \ldots$ is a martingale relative to $X_0, \ldots, X_n$ if

1. $E(|M_n|) < \infty$, and

2. $E(M_{n+1} - M_n \mid X_0 = x_0, \ldots, X_n = x_n) = 0$.

Examples

1. Let $X_n = a + \xi_1 + \cdots + \xi_n$ where the $\xi_i$ are independent random variables with $E(|\xi_i|) < \infty$ and $E(\xi_i) = 0$. Then $X_n$ is a martingale. For
\begin{align*}
E(X_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n) &= E(X_n + \xi_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n) \\
&= E(X_n \mid X_0 = x_0, \ldots, X_n = x_n) + E(\xi_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n).
\end{align*}
Now by property 3 of conditional expectation for the first summand and property 2 for the second, this is
\[ = X_n + E(\xi_{n+1}) = X_n, \]
and thus a martingale. A special case of this is when the $\xi_i$ are IID with $P(\xi_i = +1) = 1/2$ and $P(\xi_i = -1) = 1/2$: the symmetric random walk.

2. There is a multiplicative analogue of the previous example. Let $X_n = a \xi_1 \xi_2 \cdots \xi_n$ where the $\xi_i$ are independent random variables with $E(|\xi_i|) < \infty$ and $E(\xi_i) = 1$. Then $X_n$ is a martingale. We leave the verification of this to you.

3. Let $S_n = S_0 + \xi_1 + \cdots + \xi_n$, with $S_0$ constant, the $\xi_i$ independent random variables, $E(\xi_i) = 0$, $\sigma^2(\xi_i) = \sigma^2_i$. Set $v_n = \sum_{i=1}^n \sigma^2_i$. Then $M_n = S_n^2 - v_n$ is a martingale relative to $S_0, \ldots, S_n$. For
\begin{align*}
M_{n+1} - M_n &= (S_{n+1}^2 - v_{n+1}) - (S_n^2 - v_n) \\
&= (S_n + \xi_{n+1})^2 - S_n^2 - (v_{n+1} - v_n) \\
&= 2 S_n \xi_{n+1} + \xi_{n+1}^2 - \sigma^2_{n+1}, \tag{3.5.1}
\end{align*}
and thus
\begin{align*}
E(M_{n+1} - M_n \mid S_0, S_1, \ldots, S_n) &= E(2 S_n \xi_{n+1} + \xi_{n+1}^2 - \sigma^2_{n+1} \mid S_0, \ldots, S_n) \\
&= 2 S_n E(\xi_{n+1}) + \sigma^2_{n+1} - \sigma^2_{n+1} = 0.
\end{align*}

4. There is a very general way to get martingales, usually ascribed to Lévy. This will be very important to us. Let $X_n$ be a driving stochastic process and let $Y$ be a random variable defined on the same sample space. Then we may define $Y_n = E(Y \mid X_n, \ldots, X_0)$. Then $Y_n$ is a martingale. For by the tower property of conditional expectation, $E(Y_{n+1} \mid X_n, \ldots, X_0)$ is by definition
\[ E(E(Y \mid X_{n+1}, \ldots, X_0) \mid X_n, \ldots, X_0) = E(Y \mid X_n, \ldots, X_0) = Y_n. \]
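Example 3 can be verified exhaustively on a small tree; a minimal sketch with an illustrative mean-zero step distribution (not one from the text):

```python
from fractions import Fraction
from itertools import product

# Exhaustive check of example 3: S_n = S_0 + xi_1 + ... + xi_n with mean-zero
# iid steps, v_n = n * sigma^2, M_n = S_n^2 - v_n.  Assumed step distribution:
# P(xi = 2) = 1/3, P(xi = -1) = 2/3, so E(xi) = 0.
steps = [(2, Fraction(1, 3)), (-1, Fraction(2, 3))]
S0 = 5
var_step = sum(p * x * x for x, p in steps)   # sigma^2 = E(xi^2) since E(xi) = 0

n = 3
ok = True
for path in product([x for x, _ in steps], repeat=n):
    s_n = S0 + sum(path)
    M_n = s_n * s_n - n * var_step
    # E(M_{n+1} | xi_1, ..., xi_n): average over the single next step
    cond = sum(p * ((s_n + x)**2 - (n + 1) * var_step) for x, p in steps)
    if cond != M_n:
        ok = False
```

Every path satisfies $E(M_{n+1} \mid \xi_1, \ldots, \xi_n) = M_n$ exactly, in line with the computation ending in (3.5.1).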

We now introduce a very useful notation. Let $\mathcal{F}(X_0, \ldots, X_n)$ be the collection of random variables that can be written as $g(X_0, \ldots, X_n)$ for some function $g$ of $n+1$ variables. More compactly, given a stochastic process $X_k$, we let $\mathcal{F}_n = \mathcal{F}(X_0, \ldots, X_n)$. $\mathcal{F}_n$ represents all the information that is knowable about the stochastic process $X_k$ at time $n$, and we call it the filtration induced by $X_n$. We write $E(Y \mid \mathcal{F}_n)$ for $E(Y \mid X_0, \ldots, X_n)$. We restate Proposition (3.3.3) in this notation.

Proposition 3.5.1. Let $X_0, \ldots, X_n, \ldots$ be a stochastic process with filtration $\mathcal{F}_n$, and $X$ a random variable defined on the same sample space. Then

1. $E(E(X \mid \mathcal{F}_n) \mid \mathcal{F}_m) = E(X \mid \mathcal{F}_m)$ if $m \leq n$.

2. If $X, X_0, \ldots, X_n$ are independent, then $E(X \mid \mathcal{F}_n) = E(X)$.

3. Letting $X \in \mathcal{F}_n$, so $X = f(X_0, \ldots, X_n)$, then $E(XY \mid \mathcal{F}_n) = X E(Y \mid \mathcal{F}_n)$.

Definition 3.5.3. Let $X_n$ be a stochastic process and $\mathcal{F}_n$ its filtration. We say a process $\phi_n$ is

1. adapted to $X_n$ (or to $\mathcal{F}_n$) if $\phi_n \in \mathcal{F}_n$ for all $n$;

2. previsible relative to $X_n$ (or $\mathcal{F}_n$) if $\phi_n \in \mathcal{F}_{n-1}$ for all $n$.

We motivate the definitions above and the following construction by thinking of a gambling game. Let $X_n$ be a stochastic process representing the price of a stock at the $n$th tick. Suppose we are playing a game and that our winnings per unit bet in the $n$th game are $X_n - X_{n-1}$. Let $\phi_n$ denote the amount of our bet on the $n$th hand. It is a completely natural assumption that $\phi_n$ should only depend on the history of the game up to time $n-1$. This is the previsible hypothesis. Given a bet of $\phi_n$ on the $n$th hand, our winnings playing this game up to time $n$ will be
\[ \phi_1(X_1 - X_0) + \phi_2(X_2 - X_1) + \cdots + \phi_n(X_n - X_{n-1}) = \sum_{j=1}^n \phi_j(X_j - X_{j-1}) = \sum_{j=1}^n \phi_j \Delta X_j. \]
This is called the martingale transform of $\phi$ with respect to $X_n$. We denote it more compactly by $(\phi \Delta X)_n$. The name comes from the second theorem below, which gives us yet another way of forming new martingales from old.

Theorem 3.5.2. Let $Y_n$ be a supermartingale relative to $X_0, X_1, \ldots$, and $\mathcal{F}_n$ the filtration associated to $X_n$. Let $\phi_n$ be a previsible process with $0 \leq \phi_n \leq c_n$. Then $G_n = G_0 + \sum_{m=1}^n \phi_m(Y_m - Y_{m-1})$ is an $\mathcal{F}_n$-supermartingale.

Proof: $G_{n+1} = G_n + \phi_{n+1}(Y_{n+1} - Y_n)$. Thus
\begin{align*}
E(G_{n+1} - G_n \mid X_0, \ldots, X_n) &= E(\phi_{n+1}(Y_{n+1} - Y_n) \mid X_0, \ldots, X_n) \\
&= \phi_{n+1} E(Y_{n+1} - Y_n \mid X_0, \ldots, X_n) \\
&\leq \phi_{n+1} \cdot 0 \leq 0. \tag{3.5.2}
\end{align*}
Q.E.D.

Theorem 3.5.3. If $\phi_{n+1} \in \mathcal{F}_n$, $|\phi_n| \leq c_n$, and $Y_n$ is a martingale relative to $X_i$, then $G_n = G_0 + \sum_{m=1}^n \phi_m(Y_m - Y_{m-1})$ is a martingale.
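Theorem 3.5.3 can be checked by exhaustive enumeration; a minimal sketch with the symmetric random walk and an illustrative previsible betting rule:

```python
from fractions import Fraction
from itertools import product

# Check that a martingale transform of a martingale is a martingale:
# Y_n is the symmetric +-1 random walk, and the bet phi may only depend on
# the steps already seen (previsible by construction).
half = Fraction(1, 2)

def phi(history):
    # illustrative rule: bet 1 plus the number of past losses
    return 1 + sum(1 for x in history if x == -1)

def G(path):
    # martingale transform (phi Delta Y)_n accumulated along one path
    g = 0
    for m in range(len(path)):
        g += phi(path[:m]) * path[m]
    return g

n = 3
ok = all(
    sum(half * G(past + (x,)) for x in (1, -1)) == G(past)
    for past in product((1, -1), repeat=n)
)
```

The check `ok` confirms $E(G_{n+1} \mid \text{first } n \text{ steps}) = G_n$ on every path; the argument is exactly the proof above, since $\phi_{n+1}$ factors out of the conditional expectation.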

Definition 3.5.4. A random variable $T$ is called a stopping time relative to $X_i$ if the event $\{T = n\}$ is determined by $X_0, \ldots, X_n$.

Example: In this example, we define a simple betting strategy of betting 1 until the stopping time tells us to stop. Thus, set
\[ \phi_n(\omega) = \begin{cases} 1 & \text{if } T(\omega) \geq n, \\ 0 & \text{if } T(\omega) < n. \end{cases} \tag{3.5.3} \]
Let's check that $\phi_n \in \mathcal{F}(X_0, \ldots, X_{n-1})$, i.e. that $\phi_n$ is previsible. By definition, $\{T(\omega) = n\}$ is determined by $X_0, \ldots, X_n$. Now $\{\phi_n(\omega) = 0\} = \{T \leq n-1\} = \bigcup_{k=0}^{n-1} \{T = k\}$ is determined by $X_0, \ldots, X_{n-1}$.

Let $T \wedge n$ be the minimum of $T$ and $n$. Thus
\[ (T \wedge n)(\omega) = \begin{cases} T(\omega) & \text{if } T(\omega) < n, \\ n & \text{if } T(\omega) \geq n. \end{cases} \tag{3.5.4} \]

Theorem 3.5.2. 1. Let $Y_n$ be a supermartingale relative to $X_i$ and $T$ a stopping time. Then $Y_{T \wedge n}$ is a supermartingale.

2. Let $Y_n$ be a submartingale relative to $X_i$ and $T$ a stopping time. Then $Y_{T \wedge n}$ is a submartingale.

3. Let $Y_n$ be a martingale relative to $X_i$ and $T$ a stopping time. Then $Y_{T \wedge n}$ is a martingale.

Proof: Let $\phi_n$ be the previsible process defined above, i.e.,
\[ \phi_n = \begin{cases} 1 & \text{if } n \leq T, \\ 0 & \text{if } n > T. \end{cases} \tag{3.5.5} \]
Then
\begin{align*}
G_n &= Y_0 + \sum_{m=1}^n \phi_m(Y_m - Y_{m-1}) \\
&= Y_0 + \phi_1(Y_1 - Y_0) + \phi_2(Y_2 - Y_1) + \cdots + \phi_n(Y_n - Y_{n-1}) \\
&= \begin{cases} Y_n & \text{if } n \leq T \\ Y_T & \text{if } n > T \end{cases} \\
&= Y_{T \wedge n}. \tag{3.5.6}
\end{align*}
So it follows by the above. Q.E.D.

Example: The following betting strategy of doubling after each loss is known to most children and is in fact THE classic martingale, the object that gives martingales their name. Let $P(\xi_i = \pm 1) = \frac{1}{2}$, with the $\xi_i$ independent, identically distributed random variables, and $X_n = \sum_{i=1}^n \xi_i$, which is a martingale. Now $\phi_n$ is the previsible process defined in the following way: $\phi_1 = 1$. To define $\phi_n$ we set inductively $\phi_n = 2\phi_{n-1}$ if $\xi_{n-1} = -1$ and $\phi_n = 1$ if $\xi_{n-1} = 1$. Then $\phi_n$ is previsible. Now define $T = \min\{n \mid \xi_n = 1\}$, and consider the process $G_n = (\phi \Delta X)_n$. Then $G_n$ is a martingale by a previous theorem. On the other hand, $E(G_T) = 1$; that is, when you finally win according to this betting strategy, you always win 1. This seems like a sure-fire way to win. But it requires an arbitrarily large pot to make this work. Under more realistic hypotheses, we find
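Both claims, that $G_T = 1$ whenever the win arrives and that the stopped process still has mean zero, can be seen by enumerating all paths up to a horizon; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# The doubling strategy: bet 1, double the bet after every loss, stop at the
# first win (time T).  Enumerate all coin-flip paths of length N.
half = Fraction(1, 2)
N = 6

def winnings(path):
    # returns (winnings when play stops or time runs out, whether T occurred)
    bet, g = 1, 0
    for x in path:
        g += bet * x
        if x == 1:
            return g, True     # first win: stop betting
        bet *= 2               # double after a loss
    return g, False            # never won within N steps

EG = Fraction(0)
stopped_values = []
for path in product((1, -1), repeat=N):
    g, stopped = winnings(path)
    EG += half**N * g
    if stopped:
        stopped_values.append(g)
```

Every stopped path yields exactly 1, but the single all-loss path carries the enormous loss $-(2^N - 1)$, and the expectation $E(G_{T \wedge N})$ is exactly 0, as the stopped-martingale theorem demands.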

Theorem 3.5.4. (Doob's optional stopping theorem) Let $M_n$ be a martingale relative to $X_n$, and $T$ a stopping time relative to $X_n$ with $P(T < \infty) = 1$. If there is a $K$ such that $|M_{T \wedge n}| \leq K$ for all $n$, then $E(M_T) = E(M_0)$.

Proof: We know that $E(M_{T \wedge n}) = E(M_0)$, since by the previous theorem $M_{T \wedge n}$ is a martingale. Then
\[ |E(M_0) - E(M_T)| = |E(M_{T \wedge n}) - E(M_T)| \leq 2K\, P(T > n) \to 0 \]
as $n \to \infty$. Q.E.D.

An alternative version is

Theorem 3.5.5. If $M_n$ is a martingale relative to $X_n$, $T$ a stopping time with $|M_{n+1} - M_n| \leq K$ and $E(T) < \infty$, then $E(M_T) = E(M_0)$.

We now discuss one last theorem before getting back to the finance. This is the martingale representation theorem. According to what we proved above, if $X_n$ is a martingale and $\phi_n$ is previsible then the martingale transform $(\phi \Delta X)_n$ is also a martingale. The martingale representation theorem is a converse of this. It gives conditions under which a martingale can be expressed as a martingale transform of a given martingale. For this, we need a condition on our driving martingale. Thus we define

Definition 3.5.5. A generalized random walk or binomial process is a stochastic process $X_n$ in which at each time $n$, the value of $X_{n+1} - X_n$ can only be one of two possible values.

This is like an ordinary random walk in that at every step the random walker can take only 2 possible steps, but the steps she can take depend on the path up to time $n$. Thus
\[ X_{n+1} = X_n + \xi_{n+1}(X_1, X_2, \ldots, X_n). \]
Hence assume that
\[ P(\xi_{n+1}(X_1, \ldots, X_n) = a) = p \quad \text{and} \quad P(\xi_{n+1}(X_1, \ldots, X_n) = b) = 1 - p, \]
where I am being a little sloppy in that $a$, $b$, and $p$ are all dependent on $X_1, \ldots, X_n$. Then
\[ E(X_{n+1} \mid \mathcal{F}_n) = E(X_n + \xi_{n+1}(X_1, \ldots, X_n) \mid \mathcal{F}_n) = X_n + E(\xi_{n+1}(X_1, \ldots, X_n) \mid \mathcal{F}_n) = X_n + pa + (1-p)b. \tag{3.5.7} \]
So in order that $X_n$ be a martingale we need
\[ pa + (1-p)b = 0. \tag{3.5.8} \]

Theorem 3.5.6. Let $X_0, X_1, \ldots$ be a generalized random walk with associated filtration $\mathcal{F}_n$. Let $M_n$ be a martingale relative to $\mathcal{F}_n$. Then there exists a previsible process $\phi_n$ such that
\[ M_n = \sum_{i=1}^n \phi_i(X_i - X_{i-1}) = (\phi \Delta X)_n. \]

3.5. BASIC MARTINGALE THEORY 77

Proof. Since M_n is adapted to X_n, there exist functions f_n such that
M_n = f_n(X_0, . . . , X_n). By induction, we need to show there exists φ_{n+1} ∈ T_n
with M_{n+1} = φ_{n+1}(X_{n+1} − X_n) + M_n. Now

    0 = E(M_{n+1} − M_n | T_n)
      = E(f_{n+1}(X_0, . . . , X_{n+1}) − f_n(X_0, . . . , X_n) | T_n)
      = E(f_{n+1}(X_0, . . . , X_{n+1}) | T_n) − f_n(X_0, . . . , X_n).        (3.5.9)

Now by assumption, X_{n+1} can take on only 2 possible values: X_n + a, with
probability p, and X_n + b, with probability 1 − p. So

    E(f_{n+1}(X_0, . . . , X_{n+1}) | T_n)
      = f_{n+1}(X_0, . . . , X_n, X_n + a) p + f_{n+1}(X_0, . . . , X_n, X_n + b) (1 − p).

So continuing the earlier calculation,

    0 = E(f_{n+1}(X_0, . . . , X_{n+1}) | T_n) − f_n(X_0, . . . , X_n)
      = (f_{n+1}(X_0, . . . , X_n, X_n + a) − f_n(X_0, . . . , X_n)) p
        + (f_{n+1}(X_0, . . . , X_n, X_n + b) − f_n(X_0, . . . , X_n)) (1 − p)
      = [(f_{n+1}(X_0, . . . , X_n, X_n + a) − f_n(X_0, . . . , X_n))/a] ap
        + [(f_{n+1}(X_0, . . . , X_n, X_n + b) − f_n(X_0, . . . , X_n))/b] b(1 − p)
      = [(f_{n+1}(X_0, . . . , X_n, X_n + a) − f_n(X_0, . . . , X_n))/a] ap
        − [(f_{n+1}(X_0, . . . , X_n, X_n + b) − f_n(X_0, . . . , X_n))/b] ap,        (3.5.10)

since b(1 − p) = −ap by (3.5.8). Or equivalently (assuming neither a nor p
is zero),

    (f_{n+1}(X_0, . . . , X_n, X_n + a) − f_n(X_0, . . . , X_n))/a
      = (f_{n+1}(X_0, . . . , X_n, X_n + b) − f_n(X_0, . . . , X_n))/b.

Set φ_{n+1} to be equal to this common value. Clearly, φ_{n+1} only depends on
X_0, . . . , X_n and is therefore previsible. Now we have two cases to consider:
X_{n+1} = X_n + a or X_{n+1} = X_n + b. Both calculations are the same. So if
X_{n+1} = X_n + a, then

    φ_{n+1}(X_{n+1} − X_n) + M_n = φ_{n+1} a + M_n = M_{n+1}.

Q.E.D.
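The recipe in the proof can be tested numerically. A minimal sketch, using the simple symmetric walk (a = 1, b = −1, p = 1/2) and the standard martingale M_n = X_n² − n as the worked example:

```python
import random

# Simple symmetric walk: a = 1, b = -1, p = 1/2, so pa + (1-p)b = 0.
# M_n = X_n^2 - n is a martingale; the proof's recipe gives
# phi_{n+1} = (f_{n+1}(X_n + a) - f_n(X_n))/a, the common difference quotient.
def phi(n, x):
    f_next_up = (x + 1) ** 2 - (n + 1)   # f_{n+1} evaluated at X_n + a
    f_now = x ** 2 - n                   # f_n evaluated at X_n
    return (f_next_up - f_now) / 1.0     # equals 2x here

rng = random.Random(1)
x, m, hedge_sum = 0, 0.0, 0.0
for n in range(1000):
    xi = 1 if rng.random() < 0.5 else -1
    hedge_sum += phi(n, x) * xi          # phi_{n+1} * (X_{n+1} - X_n)
    x += xi
    m = x * x - (n + 1)                  # M_{n+1}
    # Representation M_n = M_0 + sum_i phi_i (X_i - X_{i-1}), with M_0 = 0:
    assert abs(m - hedge_sum) < 1e-9
```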


3.6 Pricing a derivative

We now discuss the problem we were originally out to solve: to value a
derivative. So suppose we have a stock whose price process we denote by S_n,
where each tick of the clock represents some time interval δt. We will assume,
to be concrete, that our riskless bond is given by the standard formula
B_n = e^{nrδt} B_0. We use the following notation. If V_n represents some price
process, then Ṽ_n = e^{−nrδt} V_n is the discounted price process. Suppose h
denotes the payoff function from a derivative on S_n.

There are three steps to pricing h.

Step 1: Find Q such that S̃_n = e^{−rnδt} S_n is a Q-martingale. Let T_n denote
the filtration corresponding to S̃.

Step 2: Set Ṽ_n = E_Q(e^{−rNδt} h(S_N) | T_n); then Ṽ_n is a martingale as well. By
the martingale representation theorem there exists a previsible φ_i such that

    Ṽ_n = V_0 + Σ_{i=1}^n φ_i (S̃_i − S̃_{i−1}).

Step 3: Construct a self-financing hedging portfolio. Just as in the one
period case, we will form a portfolio Π consisting of the stock and bond.
Thus Π_n = φ_n S_n + ψ_n B_n, where B_n = e^{rnδt} B_0. So Π̃_n = φ_n S̃_n + ψ_n B_0. We
require two conditions on Π.

1. φ_n and ψ_n should be previsible. From a financial point of view this is a
completely reasonable assumption, since decisions about what to hold
during the nth tick must be made prior to the nth tick.

2. The portfolio Π should be self-financing. This too is financially justified.
In order for the portfolio to be viewed as equivalent to the claim h, it
must act like the claim. The claim needs no infusion of money and
provides no income until expiration. And this is also what we require
of the portfolio.

The self-financing condition is derived from the following considerations. After
tick n − 1 the portfolio is worth Π_{n−1} = φ_{n−1} S_{n−1} + ψ_{n−1} B_{n−1}. Between
tick n − 1 and n we readjust the portfolio in anticipation of the nth tick. But
we only want to use the resources that are available to us via the portfolio
itself. Also, between the (n − 1)st and nth tick, the stock is still worth S_{n−1}.
Therefore we require

    (SFC1)    φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1} S_{n−1} + ψ_{n−1} B_{n−1}.

Another equivalent formulation of the self-financing condition is arrived at
by

    (∆Π)_n = Π_n − Π_{n−1} = φ_n S_n + ψ_n B_n − (φ_{n−1} S_{n−1} + ψ_{n−1} B_{n−1})
           = φ_n S_n + ψ_n B_n − (φ_n S_{n−1} + ψ_n B_{n−1})
           = φ_n (S_n − S_{n−1}) + ψ_n (B_n − B_{n−1})
           = φ_n (∆S)_n + ψ_n (∆B)_n,        (3.6.1)

where the second equality uses (SFC1). So we have

    (SFC2)    (∆Π)_n = φ_n (∆S)_n + ψ_n (∆B)_n.

Now we have Ṽ_n = E_Q(e^{−rNδt} h(S_N) | T_n), and we can write

    Ṽ_n = V_0 + Σ_{i=1}^n φ_i ∆S̃_i

for some previsible φ_i. Now we want Π̃_n = φ_n S̃_n + ψ_n B_0 to equal Ṽ_n. So set

    ψ_n = (Ṽ_n − φ_n S̃_n)/B_0,

or equivalently

    ψ_n = (V_n − φ_n S_n)/B_n.

Clearly, Π̃_n = φ_n S̃_n + ψ_n B_0 = Ṽ_n.

We now need to check two things:

1. previsibility of φ_n and ψ_n;

2. self-financing of the portfolio.

1. Previsibility: φ_n is previsible by the martingale representation theorem.
As for ψ_n, we have

    ψ_n = (Ṽ_n − φ_n S̃_n)/B_0
        = (V_0 + φ_n (S̃_n − S̃_{n−1}) + Σ_{j=1}^{n−1} φ_j (S̃_j − S̃_{j−1}) − φ_n S̃_n)/B_0
        = (V_0 − φ_n S̃_{n−1} + Σ_{j=1}^{n−1} φ_j (S̃_j − S̃_{j−1}))/B_0,

which is manifestly independent of S̃_n; hence ψ_n is previsible.

2. Self-financing: We need to verify

    φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1} S_{n−1} + ψ_{n−1} B_{n−1},

or, what is the same thing,

    (SFC3)    (φ_n − φ_{n−1}) S̃_{n−1} + (ψ_n − ψ_{n−1}) B_0 = 0.

We calculate the right term:

    (ψ_n − ψ_{n−1}) B_0 = ((Ṽ_n − φ_n S̃_n)/B_0 − (Ṽ_{n−1} − φ_{n−1} S̃_{n−1})/B_0) B_0
                        = Ṽ_n − Ṽ_{n−1} + φ_{n−1} S̃_{n−1} − φ_n S̃_n
                        = φ_n (S̃_n − S̃_{n−1}) + φ_{n−1} S̃_{n−1} − φ_n S̃_n
                        = −(φ_n − φ_{n−1}) S̃_{n−1},

which clearly implies (SFC3).

So finally we have constructed a self-financing hedging strategy for the derivative
h. By the no-arbitrage hypothesis, we must have that the value of h
at time n (including time 0, i.e. now) is

    V(h)_n = Π_n = V_n = e^{rnδt} Ṽ_n = e^{rnδt} E_Q(e^{−rNδt} h(S_N) | T_n)
           = e^{−r(N−n)δt} E_Q(h(S_N) | T_n).
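The three steps can be sketched in code. A minimal illustration on a small binomial tree; the call payoff and all parameter values here are illustrative assumptions, not from the text. It prices by the discounted risk-neutral expectation and then checks that the hedge φ_n = (V^u − V^d)/(S^u − S^d), ψ_n = (V_n − φ_n S_n)/B_n replicates the payoff along every path:

```python
import math
from itertools import product

# Illustrative parameters (B_0 = 1, so B_n = e^{rn dt}):
S0, u, d, r, dt, N, K = 100.0, 1.1, 0.9, 0.05, 1.0, 3, 100.0
q = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up-probability

def price(n, s):
    # V_n = e^{-r(N-n)dt} E_Q(h(S_N) | F_n), computed by one-step recursion
    if n == N:
        return max(s - K, 0.0)           # call payoff h(S_N)
    return math.exp(-r * dt) * (q * price(n + 1, s * u)
                                + (1 - q) * price(n + 1, s * d))

for path in product([u, d], repeat=N):
    s, val = S0, price(0, S0)
    for n, move in enumerate(path):
        vu, vd = price(n + 1, s * u), price(n + 1, s * d)
        phi = (vu - vd) / (s * u - s * d)                # shares of stock
        psi = (val - phi * s) / math.exp(r * n * dt)     # units of bond
        s *= move
        val = phi * s + psi * math.exp(r * (n + 1) * dt) # roll portfolio forward
    # Self-financing replication: terminal portfolio value equals the payoff.
    assert abs(val - max(s - K, 0.0)) < 1e-9
```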

3.7 Dynamic portfolio selection

In this section we will use our martingale theory and derivative pricing theory
to solve a problem in portfolio selection. We will carry this out in a rather
simplified situation, but the techniques are valid in much more generality.

Thus suppose that we want to invest in a stock S_n and the bond B_n.
In our first go round, we just want to find a portfolio Π_n = (φ_n, ψ_n) which
maximizes something. To maximize the expected value of Π_N, where N represents
the time horizon, will just result in investing everything in the stock,
because it has a higher expected return. This does not take into account
our aversion to risk. It is better to maximize E(U(Π_N)), where U is a
(differentiable) utility function that encodes our risk aversion. U is required
to satisfy two conditions:

1. U should be increasing, which just represents that more money is better
than less, and means U′(x) > 0.

2. U′ should be decreasing, that is, U′′ < 0, which represents that the
utility of the absolute gain is decreasing: the utility of going
from a billion dollars to a billion and one dollars is less than the utility
of going from one hundred dollars to one hundred and one dollars.

Some common utility functions are

    U_0(x) = ln(x)
    U_p(x) = x^p/p    for p < 1, p ≠ 0        (3.7.1)

We will carry out the problem for U(x) = U_0(x) = ln(x) and leave the
case of U_p(x) as a problem.

So our problem is now to maximize E_P(ln(Π_N)), but we have constraints.
Here P represents the real world probabilities, since we want to maximize
in the real world, not the risk neutral one. First, we want the value of
our portfolio at time 0 to be the amount that we are willing to invest in it. So
we want Π_0 = W_0 for some initial wealth W_0. Second, if we want to let
this portfolio go, with no other additions or withdrawals, then we want it
to be self-financing. Thus Π_n should satisfy SFC. And finally, for Π_n to be
realistically tradeable, it also must be previsible.

Earlier in this chapter we saw that there was an equivalence between two sets of
objects:

1. self-financing previsible portfolio processes Π_n, and

2. derivatives h on the stock S.

If in addition we want the value of the portfolio to satisfy Π_0 = W_0, then our
problem

    max E_P(ln(Π_N)) subject to Π_0 = W_0

can be reformulated as

    max E_P(ln(h)) subject to: the value of h at time 0 is W_0.

And now we may use our derivative pricing formula to look at this condition,
since the value of a derivative h at time 0 is

    E_Q(exp(−rT) h),

where Q is the risk neutral probabilities. Thus the above problem becomes

    max E_P(ln(h)) subject to E_Q(exp(−rT) h) = W_0.

Now we are in a position where we are really using two probabilities at the
same time. We need the following theorem, which allows us to compare them.

Theorem 3.7.1. (Radon-Nikodym) Given two probabilities P and Q, assume
that if P(E) = 0 then Q(E) = 0 for any event E. Then there exists a random
variable ξ such that

    E_Q(X) = E_P(ξX)

for any random variable X.

Proof. In our discrete setting this theorem is easy. For each outcome ω, set
ξ(ω) = Q(ω)/P(ω) if P(ω) ≠ 0, and 0 otherwise. Then

    E_Q(X) = Σ_ω X(ω) Q(ω) = Σ_ω X(ω) P(ω) (Q(ω)/P(ω)) = E_P(ξX).
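The proof's construction is easy to check on a toy sample space (the distributions below are made up for illustration):

```python
# Discrete Radon-Nikodym sketch: xi(w) = Q(w)/P(w) turns Q-expectations
# into P-expectations, exactly as in the proof above.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.25, "b": 0.25, "c": 0.5}
X = {"a": 1.0, "b": -2.0, "c": 4.0}

xi = {w: Q[w] / P[w] for w in P}
eq = sum(X[w] * Q[w] for w in Q)            # E_Q(X)
ep = sum(xi[w] * X[w] * P[w] for w in P)    # E_P(xi X)
assert abs(eq - ep) < 1e-12
```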

So now our problem is

    max E_P(ln(h)) subject to E_P(ξ exp(−rT) h) = W_0.

We can now approach this extremal problem by the method of Lagrange
multipliers. According to this method, to find an extremum of E_P(ln(h))
subject to E_P(ξ exp(−rT) h) = W_0, we introduce the multiplier λ, and we solve

    (d/dε)|_{ε=0} [ E_P(ln(h + εv)) − λ E_P(ξ exp(−rT)(h + εv)) ] = 0

for all random variables v. Now the left hand side is

    E_P( (d/dε)|_{ε=0} ln(h + εv) ) − λ E_P( (d/dε)|_{ε=0} ξ exp(−rT)(h + εv) )
      = E_P( (1/h) v − λ ξ exp(−rT) v )
      = E_P( [ 1/h − λ ξ exp(−rT) ] v ),        (3.7.2)

and we want this to be 0 for all random variables v. In order for this expectation
to be 0 for all v, it must be the case that

    1/h − λ ξ exp(−rT) = 0

almost surely. Hence h = exp(rT)/(λξ). To calculate λ we put h back into
the constraint:

    W_0 = E_P(ξ exp(−rT) h)
        = E_P(ξ exp(−rT) exp(rT)/(λξ))
        = E_P(1/λ) = 1/λ.        (3.7.3)

So λ = 1/W_0, and thus finally we have

    h = W_0 exp(rT)/ξ.
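The Lagrange solution can be sanity-checked numerically on a tiny sample space: among all claims h meeting the budget constraint, E_P(ln h) should be maximized at h = W_0 exp(rT)/ξ. The two-outcome P, the density ξ, and the numbers below are illustrative:

```python
import math

P = [0.6, 0.4]
xi = [0.8, 1.3]            # E_P(xi) = 0.48 + 0.52 = 1, as a density must satisfy
rT, W0 = 0.05, 10.0
disc = math.exp(-rT)

h_star = [W0 * math.exp(rT) / x for x in xi]          # the claimed optimum
obj_star = sum(p * math.log(h) for p, h in zip(P, h_star))

best = -1e9
for i in range(1, 40_000):
    h1 = i * 0.001
    # choose h2 so the budget constraint E_P(xi e^{-rT} h) = W0 holds exactly
    h2 = (W0 - P[0] * xi[0] * disc * h1) / (P[1] * xi[1] * disc)
    if h2 <= 0:
        continue
    best = max(best, P[0] * math.log(h1) + P[1] * math.log(h2))

assert best <= obj_star + 1e-6     # nothing on the constraint set beats h*
assert best >= obj_star - 1e-3     # and the grid gets arbitrarily close to it
```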

To go further we need to write down a specific model for the stock. Thus
let X_1, X_2, . . . be IID with law P(X_j = ±1) = 1/2. Let S_0 be the initial
value of the stock, and inductively

    S_n = S_{n−1}(1 + µ δt + σ √δt X_n).

In other words, S_n is just a binomial stock model with

    u = 1 + µ δt + σ √δt    and    d = 1 + µ δt − σ √δt.

The sample space for this process can be taken to be sequences x_1 x_2 x_3 · · · x_N,
where each x_i is u or d, representing either an up move or a down move.

Then by our formula for the risk neutral probability,

    q = (exp(rδt) − d)/(u − d)
      = (rδt − (µδt − σ√δt)) / ((µδt + σ√δt) − (µδt − σ√δt))

up to first order in δt. So

    q = (1/2) [1 + ((r − µ)/σ) √δt]

and

    1 − q = (1/2) [1 − ((r − µ)/σ) √δt].
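A quick numeric check of these first-order formulas (the values of µ, σ, r, δt are illustrative):

```python
import math

mu, sigma, r, dt = 0.08, 0.2, 0.03, 1e-4
u = 1 + mu * dt + sigma * math.sqrt(dt)
d = 1 + mu * dt - sigma * math.sqrt(dt)

q_exact = (math.exp(r * dt) - d) / (u - d)
q_approx = 0.5 * (1 + ((r - mu) / sigma) * math.sqrt(dt))

# The two expressions agree up to first order in dt ...
assert abs(q_exact - q_approx) < 10 * dt
# ... and the exact q makes the discounted stock a one-step martingale:
assert abs(math.exp(-r * dt) * (q_exact * u + (1 - q_exact) * d) - 1.0) < 1e-12
```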

This then determines the random variable ξ in the Radon-Nikodym theorem
3.7.1:

    ξ(x_1 x_2 x_3 · · · x_N) = ξ_1(x_1) ξ_2(x_2) · · · ξ_N(x_N),

where x_i denotes either u or d, and

    ξ_n(u) = q/p = 1 + ((r − µ)/σ) √δt        (3.7.4)

and

    ξ_n(d) = (1 − q)/(1 − p) = 1 − ((r − µ)/σ) √δt.

We now know which derivative has as its replicating portfolio the portfolio
that solves our optimization problem. We now calculate what the replicating
portfolio actually is. For this we use our theory from earlier in this chapter
again. This tells us that we take

    V̄_n = E_Q(exp(−rT) h | T_n).

Then by the martingale representation theorem there is a previsible process
φ_n such that

    V̄_n = V̄_0 + Σ_{k=1}^n φ_k (S̄_k − S̄_{k−1}).

Moreover, it tells us how to pick φ_n:

    φ_n = (V̄_n − V̄_{n−1}) / (S̄_n − S̄_{n−1}).

Now let's recall that, by the fact that S̄_n is a martingale with respect to Q,

    S̄_{n−1} = E_Q(S̄_n | T_{n−1}) = q S̄^u_n + (1 − q) S̄^d_n,

where S̄^u_n denotes the value of S̄_n if S takes an uptick at time n. Similarly,
we have

    V̄_{n−1} = E_Q(V̄_n | T_{n−1}) = q V̄^u_n + (1 − q) V̄^d_n.

From this, we see that if S takes an uptick at time n, then

    φ_n = (V̄_n − V̄_{n−1}) / (S̄_n − S̄_{n−1})
        = (V̄^u_n − (q V̄^u_n + (1 − q) V̄^d_n)) / (S̄^u_n − (q S̄^u_n + (1 − q) S̄^d_n))
        = (V̄^u_n − V̄^d_n) / (S̄^u_n − S̄^d_n).

A similar check will give the same exact answer if S took a downtick at time
n. (After all, φ_n must be previsible and therefore independent of whether S
takes an uptick or downtick at time n.) Finally, let's remark that φ_n can be
expressed in terms of the undiscounted prices:

    φ_n = (V̄^u_n − V̄^d_n) / (S̄^u_n − S̄^d_n) = (V^u_n − V^d_n) / (S^u_n − S^d_n),

since both the numerator and the denominator are expressions at the same
time n, and therefore the discounting factors in the numerator and the
denominator cancel.

Now we use the following refinement of the Radon-Nikodym theorem.

Theorem 3.7.2. Given two probabilities P and Q, assume that if P(E) = 0
then Q(E) = 0 for any event E. Let ξ be the random variable such that

    E_Q(X) = E_P(ξX)

for any random variable X. Then for any event A, we have

    E_Q(X | A) = E_P(ξX | A) / E_P(ξ | A).

I will ask you to prove this.

From this theorem it follows that

    V̄_n = E_Q(h̄ | T_n) = E_P(ξ h̄ | T_n) / E_P(ξ | T_n).

Therefore we see that

    V̄_n = V̄_{n−1} / ξ_n(x_n),

recalling that x_n denotes the nth tick and ξ_n is given by (3.7.4).

Now we are ready to finish our calculation. Since V̄_n = V̄_{n−1}/ξ_n(x_n), we get

    V̄^u_n = V̄_{n−1} / ξ_n(u) = V̄_{n−1} / [1 + ((r − µ)/σ) √δt],
    V̄^d_n = V̄_{n−1} / ξ_n(d) = V̄_{n−1} / [1 − ((r − µ)/σ) √δt].

We also have

    S^u_n = S_{n−1}(1 + µδt + σ√δt),

so

    S̄^u_n = S̄_{n−1}(1 + (µ − r)δt + σ√δt)

up to first order in δt. Similarly,

    S̄^d_n = S̄_{n−1}(1 + (µ − r)δt − σ√δt)

to first order in δt. So we have

    φ_n = (V̄^u_n − V̄^d_n) / (S̄^u_n − S̄^d_n)
        = V̄_{n−1} [ 1/(1 + ((r−µ)/σ)√δt) − 1/(1 − ((r−µ)/σ)√δt) ] / [ S̄_{n−1} (2σ√δt) ]
        = (V̄_{n−1}/S̄_{n−1}) [ ((1 − ((r−µ)/σ)√δt) − (1 + ((r−µ)/σ)√δt)) / ((1 − ((r−µ)/σ)² δt)(2σ√δt)) ]
        = (V̄_{n−1}/S̄_{n−1}) [ −2((r−µ)/σ)√δt / (2σ√δt) ]        (3.7.5)

up to terms of order δt. So we have

    φ_n = ((µ−r)/σ²) V̄_{n−1}/S̄_{n−1} = ((µ−r)/σ²) V_{n−1}/S_{n−1};        (3.7.6)

the last equality holds because the discounting factors in the quotient cancel.
V_{n−1} represents the value of the portfolio at time n − 1, and φ_n represents
the number of shares of stock in this portfolio. Hence the amount of money
invested in the stock is φ_n S_{n−1} = ((µ−r)/σ²) V_{n−1}. Thus formula (3.7.6) expresses
the fact that at every tick, you should have the proportion (µ−r)/σ² of your
portfolio invested in the stock.

Over the period from 1926 to 1985, the S&P 500 had an excess
return µ − r over short term treasury bills of about .058, with a standard
deviation σ of about .216. Doing the calculation with the values above yields
.058/(.216)² = 1.24, that is, about 124%. So with log utility, the optimal portfolio
is one where one invests all of one's wealth in the S&P 500, and borrows an
additional 24% of current wealth and invests it also in the S&P.
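A sketch of this conclusion in simulation (parameter values are illustrative, not the S&P figures): among constant-proportion strategies on the binomial stock of this section, the proportion f* = (µ − r)/σ² should give the largest average log-wealth.

```python
import math, random

# Real-world dynamics: p = 1/2 up/down ticks, as in the stock model above.
mu, sigma, r, dt, N = 0.06, 0.2, 0.01, 0.01, 100   # illustrative values

def mean_log_wealth(f, n_paths, rng):
    total = 0.0
    for _ in range(n_paths):
        w = 1.0
        for _ in range(N):
            x = 1 if rng.random() < 0.5 else -1
            stock_ret = mu * dt + sigma * math.sqrt(dt) * x
            w *= 1 + r * dt + f * (stock_ret - r * dt)  # fraction f in stock
        total += math.log(w)
    return total / n_paths

f_star = (mu - r) / sigma ** 2                 # = 1.25 here
rng = random.Random(2)
best = mean_log_wealth(f_star, 5000, rng)
others = [mean_log_wealth(f, 5000, rng) for f in (0.0, 2.5)]
# The Merton proportion should beat both all-bond and over-leveraged rules:
assert all(o <= best + 0.005 for o in others)
```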


3.8 Exercises

1. Let S_0, S^u_1, S^d_1 be as in class, for the one-step binomial model. For
the pseudo-probabilities q of going up and 1 − q of going down that we
calculated in class, what is E_Q(S_1)?

2. We arrived at the following formula for discounting. If B is a bond

with par value 1 at time T, then at time t < T the bond is worth

exp(−r(T − t)) assuming constant interest rates. Now assume that

interest rates are not constant but are described by a known function

of time r(t). Recalling the derivation of the formula above for constant

interest rate, calculate the value of the same bond under the interest

rate function r(t). Your answer should have an integral in it.

3. A fair die is rolled twice. Let X denote the sum of the values. What is
E(X | A), where A is the event that the first roll was a 2?

4. Toss a nickel, a dime and a quarter. Let B_i denote the event that i of
the coins landed heads up, i = 0, 1, 2, 3. Let X denote the sum of the
amounts shown; that is, a coin says how much it is worth only on the
tails side, so X is the sum of the values of the coins that landed tails
up. What is E(X | B_i)?

5. Let X be a random variable with pdf p(x) = α exp(−αx) (a continuous
distribution). Calculate E(X | {X ≥ t}).

In the next three problems calculate E(X | Y) for the two random variables
X and Y. That is, describe E(X | Y) as a function of the values
of Y. Also, construct the pdf for this random variable.

6. A die is rolled twice. X is the sum of the values, and Y is the value on
the first roll.

7. From a bag containing four balls numbered 1, 2, 3 and 4 you draw two
balls. If at least one of the balls is numbered 2 you win $1; otherwise
you lose $1. Let X denote the amount won or lost. Let Y denote the
number on the first ball chosen.

8. A bag contains three coins, but one of them is fake and has two heads.

Two coins are drawn and tossed. Let X be the number of heads tossed

and Y denote the number of honest coins drawn.

9. Let W_n denote the symmetric random walk starting at 0, that is, p = 1/2
and W_0 = 0. Let X denote the number of steps to the right the random
walker takes in the first 6 steps. Calculate

(a) E(X | W_1 = −1).

(b) E(X | W_1 = 1).

(c) E(X | W_1 = 1, W_2 = −1).

(d) E(X | W_1 = 1, W_2 = −1, W_3 = −1).

10. We've seen that if ξ_n are independent with expectations all 0, then

    M_n = Σ_{j=1}^n ξ_j

is a martingale. It is not necessary for the ξ_j's to be independent.
In this exercise you are asked to prove that, though they may not be
independent, they are uncorrelated. Thus, let M_n be a martingale and
let ξ_j = M_j − M_{j−1} be its increments. Show

(a) E(ξ_j) = 0 for all j.

(b) E(ξ_j ξ_k) = 0 for all j ≠ k. (This means that ξ_j and ξ_k have 0
covariance, since their expectations are 0.) (Hint: You will have
to use several of the properties of conditional expectations.)

In the following three problems, you are asked to derive what is called
the gambler's ruin using techniques from martingale theory.

Suppose a gambler repeatedly plays a game of tossing a coin, with probability
p of getting heads, in which case he wins $1, and 1 − p of getting
tails, in which case he loses $1. Suppose the gambler begins with $a
and keeps gambling until he either goes bankrupt or gets $b, a < b.
Mathematically, let ξ_j, j = 1, 2, 3, . . ., be IID random variables with
distribution P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p, where 0 < p < 1.
Let S_n denote the gambler's fortune after the nth toss, that is,

    S_n = a + ξ_1 + · · · + ξ_n.

Define the stopping time T = min{n | S_n = 0 or b}; that is, T is the
time when the gambler goes bankrupt or achieves a fortune of b. Define

    A_n = S_n − n(2p − 1)    and    M_n = ((1 − p)/p)^{S_n}.

11. Show A_n and M_n are martingales.

12. Calculate P(S_T = 0), the probability that the gambler will eventually
go bankrupt, in the following way. Using Doob's optional stopping
theorem, calculate E(M_T). Now use the definition of E(M_T) in terms
of P(S_T = j) to express P(S_T = 0) in terms of E(M_T), and solve for
P(S_T = 0).

13. Calculate E(S_T).

14. Calculate E(T), the expected time till bust or fortune, in the following
way: calculate E(A_T) by the optional stopping theorem. Express
E(A_T) in terms of E(S_T) and E(T). Solve for E(T).
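The optional stopping identity E(M_T) = M_0 that exercises 11 and 12 rely on can be checked by Monte Carlo without giving away the closed form (the values of p, a, b below are illustrative, not part of the exercises):

```python
import random

p, a, b = 0.6, 3, 8
ratio = (1 - p) / p          # M_n = ratio ** S_n

def run_once(rng):
    # play until ruin (S = 0) or fortune (S = b); return S_T
    s = a
    while 0 < s < b:
        s += 1 if rng.random() < p else -1
    return s

rng = random.Random(3)
n = 100_000
m_t = sum(ratio ** run_once(rng) for _ in range(n)) / n
# Optional stopping predicts E(M_T) = M_0 = ratio ** a.
assert abs(m_t - ratio ** a) < 0.01
```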

In the following problems, you are asked to find an optimal betting
strategy using martingale methods.

Suppose a gambler is playing a game repeatedly where a winning $1 bet
pays off $1, and the probability p of winning each round satisfies p > 1/2,
so the odds are in the gambler's favor. (By counting cards, blackjack can be made
such a game.) What betting strategy should the gambler adopt? It is
not hard to show that if the gambler wants to maximize his expected
fortune after n trials, then he should bet his entire fortune each round.
This seems counterintuitive, and in fact doesn't take much into account
the gambler's strong interest in not going bankrupt. It turns out to make
more sense to maximize not one's expected fortune, but the expected
value of the logarithm of one's fortune.

As above, let ξ_j be IID with P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p,
and assume this time that p > 1/2. Let F_n denote the fortune of the
gambler at time n. Assume the gambler starts with $a and adopts a
betting strategy φ_j (a previsible process); that is, in the jth round he
bets $φ_j. Then we can express his fortune as

    F_n = a + Σ_{j=1}^n φ_j ξ_j.

We assume that 0 < φ_n < F_{n−1}; that is, the gambler must bet at least
something to stay in the game, and he doesn't ever want to bet his
entire fortune. Let e = p ln(p) + (1 − p) ln(1 − p) + ln 2.

15. Show L_n = ln(F_n) − ne is a supermartingale, and thus E(ln(F_n/F_0)) ≤ ne.

16. Show that for some strategy, L_n is a martingale. What is the strategy?
(Hint: You should bet a fixed percentage of your current fortune.)

17. Carry out a similar analysis to the above using the utility function
U_p(x) = x^p/p, p < 1, not equal to 0. Reduce to terms of order δt. You
will have to use Taylor's theorem. Your answer should not have a δt in
it.
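The constant e above can be checked numerically: it should equal the largest achievable one-round value of E(ln(F_n/F_{n−1})) over betting fractions (p = 0.55 is an illustrative value):

```python
import math

p = 0.55
e = p * math.log(p) + (1 - p) * math.log(1 - p) + math.log(2)

# One round: betting fraction f of current fortune gives growth
# g(f) = p ln(1+f) + (1-p) ln(1-f).  Grid-search the maximum over f in (0,1).
g_max = max(p * math.log(1 + f) + (1 - p) * math.log(1 - f)
            for f in (i / 10_000 for i in range(9_999)))
# The maximal growth rate equals e, the constant used in exercise 15.
assert abs(g_max - e) < 1e-6
```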


Chapter 4

Stochastic processes and

ﬁnance in continuous time

In this chapter we begin our study of continuous time finance. We could
develop it similarly to the model of chapter 3, using continuous time martingales,
the martingale representation theorem, change of measure, etc. But the
probabilistic infrastructure needed to handle this is more formidable. An approach
often taken is to model things by partial differential equations, which
is the approach we will take in this chapter. We will, however, discuss continuous
time stochastic processes and their relationship to partial differential
equations in more detail than most books.

We will warm up by progressing slowly from stochastic processes discrete
in both time and space to processes continuous in both time and space.

4.1 Stochastic processes continuous in time, discrete in space

We segue from our completely discrete processes by now allowing the time
parameter to be continuous but still demanding space to be discrete. Thus
the second kind of stochastic process we will consider is a process X_t, where
the time t is a real-valued deterministic variable and, at each time t, the
sample space of the random variable X_t is a (finite or infinite) discrete set S.

In such a case, a state X of the entire process is described as a sequence of
jump times (τ_1, τ_2, . . .) and a sequence of values of X (elements of the common
discrete state space S) j_0, j_1, . . .. These describe a "path" of the system through
S given by X_t = j_0 for 0 < t < τ_1, X_t = j_1 for τ_1 < t < τ_2, etc. Because
X moves by a sequence of jumps, these processes are often called "jump"
processes.

Thus, to describe the stochastic process we need, for each j ∈ S, a continuous
random variable F_j that describes how long it will be before the process
(starting at X = j) will jump, and a set of discrete random variables Q_j that
describe the probabilities of jumping from j to other states when the jump
occurs. We will let Q_jk be the probability that the state starting at j will
jump to k (when it jumps), so

    Σ_k Q_jk = 1    and    Q_jj = 0.

We also define the function F_j(t) to be the probability that the process
starting at X = j has already jumped at time t, i.e.,

    F_j(t) = P(τ_1 < t | X_0 = j).

All this is defined so that

1.

    P(τ_1 < t, X_{τ_1} = k | X_0 = j) = F_j(t) Q_jk        (4.1.1)

(this is an independence assumption: where we jump to is independent
of when we jump).

Instead of writing P(. . . | X_0 = j), we will begin to write P_j(. . .).

We will make the usual stationarity and Markov assumptions, as follows.
We assume first that

2.

    P_j(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l)
      = P(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l | X_0 = j)
      = F_j(s) Q_jk F_k(t) Q_kl.

It will be convenient to define the quantity P_jk(t) to be the probability
that X_t = k, given that X_0 = j, i.e.,

    P_jk(t) = P_j(X_t = k).

3. The Markov assumption is

    P(X_t = k | X_{s_1} = j_1, . . . , X_{s_n} = j_n, X_s = j) = P_jk(t − s).

The three assumptions 1, 2, 3 will enable us to derive a (possibly infinite)
system of differential equations for the probability functions P_jk(t), for j and
k ∈ S. First, the stationarity assumption tells us that

    P_j(τ_1 > t + s | τ_1 > s) = P_j(τ_1 > t).

This means that the probability of having to wait at least one minute more
before the process jumps is independent of how long we have waited already;
i.e., we can take any time to be t = 0 and the rules governing the process
will not change. (From a group-theoretic point of view this means that the
process is invariant under time translation.) Using the formula for conditional
probability (P(A|B) = P(A ∩ B)/P(B)) and the fact that the event τ_1 > t + s
is a subset of the event τ_1 > s, we translate this equation into

    (1 − F_j(t + s)) / (1 − F_j(s)) = 1 − F_j(t).

This implies that the function 1 − F_j(t) is a multiplicative function, i.e., it
must be an exponential function. It is decreasing, since F_j(t) represents the
probability that something happens before time t, and F_j(0) = 0, so we must
have that 1 − F_j(t) = e^{−q_j t} for some positive number q_j. In other words,
F_j is exponentially distributed, with probability density function

    f_j(t) = q_j e^{−q_j t}    for t ≥ 0.

To check this, note that

    P_j(τ_1 > t) = 1 − F_j(t) = ∫_t^∞ q_j e^{−q_j s} ds = e^{−q_j t},

so the two ways of computing F_j(t) agree.
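The memoryless property just derived can be checked by sampling (the rate q_j and the times t, s are illustrative):

```python
import random

q_j, t, s = 2.0, 0.4, 0.3
rng = random.Random(4)
samples = [rng.expovariate(q_j) for _ in range(200_000)]

# Empirical P(tau > t) versus empirical P(tau > t + s | tau > s):
p_t = sum(1 for x in samples if x > t) / len(samples)
cond = (sum(1 for x in samples if x > t + s)
        / sum(1 for x in samples if x > s))
assert abs(cond - p_t) < 0.01   # both estimate e^{-q_j t}
```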

The next consequence of our stationarity and Markov assumptions is the
Chapman-Kolmogorov equation in this setting:

    P_jk(t + s) = Σ_{l∈S} P_jl(t) P_lk(s).

It says that to calculate the probability of going from state j to state k
in time t + s, one can sum up all the ways of going from j to k via any l at
the intermediate time t. This is an obvious consistency equation, but it will
prove very useful later on.

Now we can begin the calculations that will produce our system of differential
equations for P_jk(t). The first step is to express P_jk(t) based on the
time of the first jump from j. The result we intend to prove is the following:

    P_jk(t) = δ_jk e^{−q_j t} + ∫_0^t q_j e^{−q_j s} ( Σ_{l≠j} Q_jl P_lk(t − s) ) ds.

As intimidating as this formula looks, it can be understood in a fairly intuitive
way. We take it a term at a time.

δ_jk e^{−q_j t}: This term is the probability that the first jump hasn't happened
yet, i.e., that the state has remained j from the start. That is why the
Kronecker delta symbol is there (recall that δ_jk = 1 if j = k and δ_jk = 0 if j is
different from k). This term is zero unless j = k, in which case it gives the
probability that τ_1 > t, i.e., that no jump has occurred yet.

The integral term can best be understood as a "double sum", and then
by using the Chapman-Kolmogorov equation. For any s between 0 and
t, the definition of f_j(t) tells us that the probability that the first jump
occurs between time s and time s + ds is (approximately) f_j(s) ds = q_j e^{−q_j s} ds
(where the error in the approximation goes to zero faster than ds as ds
goes to zero). Therefore, the probability that, for l different from j, the first
jump occurs between time s and time s + ds and the jump goes from j to l
is (approximately) q_j e^{−q_j s} Q_jl ds. Based on this, we see that the probability
that the first jump occurs between time s and time s + ds, and the jump goes
from j to l, and in the intervening time (between s and t) the process goes
from l to k, is (approximately)

    q_j e^{−q_j s} Q_jl P_lk(t − s) ds.

To account for all of the ways of going from j to k in time t, we need to add
up these probabilities over all possible times s of the first jump (this gives an
integral with respect to ds in the limit as ds goes to 0), and over all possible
results l different from j of the first jump. This reasoning yields the second
term of the formula.

We can make a change of variables in the integral term of the formula: let
s be replaced by t − s. Note that this will transform ds into −ds, but it will
also reverse the limits of integration. The result of this change of variables
is (after taking out factors from the sum and integral that depend neither
upon l nor upon s):

    P_jk(t) = δ_jk e^{−q_j t} + q_j e^{−q_j t} ∫_0^t e^{q_j s} ( Σ_{l≠j} Q_jl P_lk(s) ) ds
            = e^{−q_j t} [ δ_jk + q_j ∫_0^t e^{q_j s} ( Σ_{l≠j} Q_jl P_lk(s) ) ds ].

Now P_jk(t) is a continuous function of t, because it is defined as the product
of a continuous function with a constant plus a constant times the integral
of a bounded function of t (the integrand is bounded on any bounded t
interval). But once P_jk is continuous, we can differentiate both sides of the
last equation with respect to t and show that P_jk(t) is also differentiable.
We take this derivative (noting that when we differentiate the
first factor we just get a constant times what we started with), using the
fundamental theorem of calculus, to arrive at:

    P′_jk(t) = −q_j P_jk(t) + q_j Σ_{l≠j} Q_jl P_lk(t).

A special case of this differential equation occurs when t = 0. Note that if we
know that X_0 = j, then it must be true that P_jk(0) = δ_jk. This observation
enables us to write:

    P′_jk(0) = −q_j P_jk(0) + q_j Σ_{l≠j} Q_jl P_lk(0)
             = −q_j δ_jk + q_j Σ_{l≠j} Q_jl δ_lk
             = −q_j δ_jk + q_j Q_jk.

One last bit of notation! Let q_jk = P′_jk(0). So q_jj = −q_j, and q_jk = q_j Q_jk if
j ≠ k. (Note that this implies that

    Σ_{k≠j} q_jk = q_j = −q_jj,

so the sum of the q_jk as k ranges over all of S is 0.) The constants q_jk are
called the infinitesimal parameters of the stochastic process.

Given this notation, we can rewrite our equation for P′_jk(t) given above as
follows:

    P′_jk(t) = Σ_{l∈S} q_jl P_lk(t)

for all t > 0. This system of differential equations is called the backward
equation of the process.

Another system of differential equations we can derive is the forward equation
of the process. To do this, we differentiate both sides of the Chapman-Kolmogorov
equation with respect to s and get

    P′_jk(t + s) = Σ_{l∈S} P_jl(t) P′_lk(s).

If we set s = 0 in this equation and use the definition of q_jk, we get the forward
equation:

    P′_jk(t) = Σ_{l∈S} q_lk P_jl(t).
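For a finite state space both systems can be verified numerically: the matrix P(t) = exp(tQ), with Q the matrix of infinitesimal parameters, satisfies both the backward equation P′ = QP and the forward equation P′ = PQ. A self-contained sketch with an illustrative 3-state generator (pure stdlib, series expansion for the matrix exponential):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_exp(Q, t, terms=60):
    # exp(tQ) via the power series sum_n (tQ)^n / n!
    P = [[float(i == j) for j in range(3)] for i in range(3)]   # identity
    term = [row[:] for row in P]
    for n in range(1, terms):
        term = [[x * t / n for x in row] for row in mat_mul(term, Q)]
        P = [[P[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return P

Q = [[-2.0, 1.5, 0.5],     # rows sum to 0: q_jj = -q_j, q_jk = q_j Q_jk
     [1.0, -3.0, 2.0],
     [0.5, 0.5, -1.0]]

t, h = 0.7, 1e-6
P, Ph = mat_exp(Q, t), mat_exp(Q, t + h)
deriv = [[(Ph[i][j] - P[i][j]) / h for j in range(3)] for i in range(3)]
back, fwd = mat_mul(Q, P), mat_mul(P, Q)   # backward: QP, forward: PQ
for i in range(3):
    for j in range(3):
        assert abs(deriv[i][j] - back[i][j]) < 1e-4
        assert abs(deriv[i][j] - fwd[i][j]) < 1e-4
```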

4.2 Examples of stochastic processes continuous in time, discrete in space

The simplest useful example of a stochastic process that is continuous in time
but discrete in space is the Poisson process. The Poisson process is a special
case of a set of processes called birth processes. In these processes, the sample
space for each random variable X_t is the set of non-negative integers, and for
each non-negative integer j, we choose a positive real parameter λ_j which will
be equal to both q_{j,j+1} and −q_{jj}. Thus, in such a process, the value of X can
either jump up by 1 or stay the same. The time until the process jumps up
by one, starting from X = j, is exponentially distributed with parameter λ_j.
So if we think of X as representing a population, it is one that experiences
only births (one at a time) at random times, but no deaths.

The specialization in the Poisson process is that λ

j

is chosen to have the

same value, λ, for all j. Since we move one step to the right (on the j-axis)

4.2. EXAMPLES 99

according to the exponential distribution, we can conclude without much
thought that

    P_{jk}(t) = 0 \quad \text{for } k < j \text{ and } t > 0

and that

    P_{jj}(t) = e^{-\lambda t}

for all $t > 0$. To learn any more, we must consider the system of differential
equations (the "forward equations") derived in the preceding lecture.

Recall that the forward equation

    P'_{jk}(t) = \sum_{l \in S} q_{lk} P_{jl}(t)

gave the rate of change in the probability of being at $k$ (given that the process
started at $j$) in terms of the probabilities of jumping to $k$ from any $l$ just at
time $t$ (the terms of the sum for $l \ne k$) and (minus) the probability of already
being at $k$ and jumping away just at time $t$ (recall that $q_{kk}$ is negative).

For the Poisson process, the forward equations simplify considerably, because
$q_{lk}$ is nonzero only if $l = k$ or if $k = l + 1$. And for these values we have:

    q_{jj} = -q_j = -\lambda \quad \text{and} \quad q_{j,j+1} = \lambda.

Therefore the forward equation reduces to:

    P'_{jk}(t) = \sum_{l=0}^{\infty} q_{lk} P_{jl}(t)
               = q_{k-1,k} P_{j,k-1}(t) + q_{kk} P_{jk}(t)
               = \lambda P_{j,k-1}(t) - \lambda P_{jk}(t).

Since we already know $P_{jk}(t)$ for $k \le j$, we will use the preceding differential
equation inductively to calculate $P_{jk}(t)$ for $k > j$, one $k$ at a time.

To do this, though, we need to review a little about linear first-order
differential equations. In particular, we will consider equations of the form

    f' + \lambda f = g

where $f$ and $g$ are functions of $t$ and $\lambda$ is a constant. To solve such equations,
we make the left side have the form of the derivative of a product by
multiplying through by $e^{\lambda t}$:

    e^{\lambda t} f' + \lambda e^{\lambda t} f = e^{\lambda t} g.

We recognize the left side as the derivative of the product of $e^{\lambda t}$ and $f$:

    (e^{\lambda t} f)' = e^{\lambda t} g.

We can integrate both sides of this (with $s$ substituted for $t$) from 0 to $t$ to
recover $f(t)$:

    \int_0^t \frac{d}{ds}\left(e^{\lambda s} f(s)\right) ds = \int_0^t e^{\lambda s} g(s)\, ds.

Of course, the integral of the derivative may be evaluated using the (second)
fundamental theorem of calculus:

    e^{\lambda t} f(t) - f(0) = \int_0^t e^{\lambda s} g(s)\, ds.

Now we can solve algebraically for $f(t)$:

    f(t) = e^{-\lambda t} f(0) + e^{-\lambda t} \int_0^t e^{\lambda s} g(s)\, ds.
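As a quick numerical sanity check of this solution formula (our addition, not part of the original notes; the function name is ours), here is a minimal Python sketch that evaluates the formula with the integral done by the trapezoidal rule and compares it against the known closed form for $f' + f = 1$, $f(0) = 0$, namely $f(t) = 1 - e^{-t}$:

```python
import math

def solve_linear_ode(lam, f0, g, t, n=20000):
    # f(t) = e^{-lam t} f(0) + e^{-lam t} * integral_0^t e^{lam s} g(s) ds,
    # with the integral evaluated by the trapezoidal rule on n subintervals.
    h = t / n
    integral = sum(
        0.5 * h * (math.exp(lam * (i * h)) * g(i * h)
                   + math.exp(lam * ((i + 1) * h)) * g((i + 1) * h))
        for i in range(n)
    )
    return math.exp(-lam * t) * f0 + math.exp(-lam * t) * integral

# Check against f' + f = 1 with f(0) = 0, whose solution is 1 - e^{-t}.
approx = solve_linear_ode(lam=1.0, f0=0.0, g=lambda s: 1.0, t=2.0)
exact = 1.0 - math.exp(-2.0)
assert abs(approx - exact) < 1e-6
```

The same formula, with $g(s) = \lambda P_{j,k-1}(s)$, is what drives the induction below.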

Using this formula for the solution of the differential equation, we substitute
$P_{jk}(t)$ for $f(t)$, and $\lambda P_{j,k-1}(t)$ for $g(t)$. We get that

    P_{jk}(t) = e^{-\lambda t} P_{jk}(0) + \lambda \int_0^t e^{-\lambda(t-s)} P_{j,k-1}(s)\, ds.

Since we are assuming that the process starts at $j$, i.e., $X_0 = j$, we know that
$P_{jk}(0) = 0$ for all $k \ne j$ and $P_{jj}(0) = 1$. In other words,

    P_{jk}(0) = \delta_{jk}

(this is the Kronecker delta).

Now we are ready to compute. Note that we have to compute $P_{jk}(t)$ for
all possible nonnegative integer values of $j$ and $k$. In other words, we have to
fill in all the entries in the table:

    P_{00}(t)   P_{01}(t)   P_{02}(t)   P_{03}(t)   P_{04}(t)   ...
    P_{10}(t)   P_{11}(t)   P_{12}(t)   P_{13}(t)   P_{14}(t)   ...
    P_{20}(t)   P_{21}(t)   P_{22}(t)   P_{23}(t)   P_{24}(t)   ...
    P_{30}(t)   P_{31}(t)   P_{32}(t)   P_{33}(t)   P_{34}(t)   ...
    ...

Of course, we already know part of the table:

    e^{-\lambda t}   P_{01}(t)    P_{02}(t)    P_{03}(t)    P_{04}(t)   ...
    0            e^{-\lambda t}   P_{12}(t)    P_{13}(t)    P_{14}(t)   ...
    0            0            e^{-\lambda t}   P_{23}(t)    P_{24}(t)   ...
    0            0            0            e^{-\lambda t}   P_{34}(t)   ...
    ...

So we begin the induction. We know that $P_{jj}(t) = e^{-\lambda t}$, so we can use the
differential equation

    P'_{j,j+1}(t) + \lambda P_{j,j+1}(t) = \lambda P_{jj}(t)

to get that

    P_{j,j+1}(t) = \lambda \int_0^t e^{-\lambda(t-s)} e^{-\lambda s}\, ds = \lambda t\, e^{-\lambda t}.

Note that this fills in the next diagonal of the table.

Then we can use the differential equation

    P'_{j,j+2}(t) + \lambda P_{j,j+2}(t) = \lambda P_{j,j+1}(t)

to get that

    P_{j,j+2}(t) = \lambda \int_0^t e^{-\lambda(t-s)} \lambda s\, e^{-\lambda s}\, ds = \frac{(\lambda t)^2}{2}\, e^{-\lambda t}.

Continuing by induction we get that

    P_{jk}(t) = \frac{(\lambda t)^{k-j}}{(k-j)!}\, e^{-\lambda t}

provided $0 \le j \le k$ and $t > 0$. This means that the random variable $X_t$
(the infinite discrete random variable that gives the value of $j$ at time $t$, i.e.,
tells how many jumps have happened up until time $t$) has a Poisson
distribution with parameter $\lambda t$. This is the important connection between the
exponential distribution and the Poisson distribution.
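This connection is easy to check by simulation (our addition; a minimal sketch using only Python's standard library, with our own function name). We build the birth process by summing independent $\text{Exp}(\lambda)$ waiting times and count jumps up to time $t$; the counts should behave like a Poisson($\lambda t$) variable, whose mean and variance both equal $\lambda t$:

```python
import random

random.seed(0)

lam, t, trials = 2.0, 3.0, 20000

def poisson_count(lam, t):
    """Count jumps of a rate-lam Poisson process up to time t by
    summing independent Exp(lam) waiting times."""
    elapsed, jumps = 0.0, 0
    while True:
        elapsed += random.expovariate(lam)
        if elapsed > t:
            return jumps
        jumps += 1

counts = [poisson_count(lam, t) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials

# A Poisson(lam * t) variable has mean and variance both lam * t = 6.
assert abs(mean - lam * t) < 0.15
assert abs(var - lam * t) < 0.4
```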

4.3 The Wiener process

4.3.1 Review of Taylor's theorem

Taylor series are familiar to every student of calculus as a means of writing
functions as power series. The real force of Taylor series is what is called
Taylor's theorem, which tells one how well the Taylor polynomials approximate
the function. We will be satisfied with the following version of Taylor's
theorem. First, we introduce some notation.

Definition 4.3.1. For two functions $f$ and $g$, we say that $f(h)$ is $o(g(h))$
(as $h \to 0$) (read "$f$ is little o of $g$") if

    \lim_{h \to 0} f(h)/g(h) = 0.

It of course means that $f(h)$ is smaller than $g(h)$ (as $h$ gets small), but in
a very precise manner.

As an example, note that another way to say that $f'(a) = b$ is to say that

    f(a + h) - (f(a) + bh) = o(h).

That is, the difference between the function $h \mapsto f(a+h)$ and the linear
function $h \mapsto f(a) + bh$ is small as $h \to 0$.

Taylor's theorem is just the higher order version of this statement.

Theorem 4.3.1. Let $f$ be a function defined on an interval $(\alpha, \beta)$ which is
$n$ times continuously differentiable. Then for any $a \in (\alpha, \beta)$,

    f(a+h) = f(a) + f'(a)h + \frac{f''(a)}{2!} h^2 + \frac{f^{(3)}(a)}{3!} h^3
             + \cdots + \frac{f^{(n)}(a)}{n!} h^n + o(h^n).

We will in fact need a multivariable version of this, but only of low order.

Theorem 4.3.2. Let $f(x, y)$ be a function of two variables defined on an
open set $O \subset \mathbb{R}^2$. Then for any point $(a, b) \in O$ one has

    f(a+h, b+k) = f(a,b) + \frac{\partial f}{\partial x}(a,b)\, h + \frac{\partial f}{\partial y}(a,b)\, k
        + \frac{1}{2}\left( \frac{\partial^2 f}{\partial x^2}(a,b)\, h^2
          + 2 \frac{\partial^2 f}{\partial x \partial y}(a,b)\, hk
          + \frac{\partial^2 f}{\partial y^2}(a,b)\, k^2 \right) + o(h^2, k^2, hk).

4.3.2 The Fokker-Planck equation

We will give a derivation of the Fokker-Planck equation, which governs the
evolution of the probability density function of a random-variable-valued
function $X_t$ that satisfies a "first-order stochastic differential equation". The
variable evolves according to what in the literature is called a Wiener process,
named after Norbert Wiener, who developed a great deal of the theory of
stochastic processes during the first half of the twentieth century.

We shall begin with a generalized version of random walk (an "unbalanced"
version where the probability of moving in one direction is not necessarily
the same as that of moving in the other). Then we shall let both the space
and time increments shrink in such a way that we can make sense of the limit
as both $\Delta x$ and $\Delta t$ approach 0.

Recall from the previous sections the one-dimensional random walk (on
the integers) beginning at $W_0 = 0$. We fix a parameter $p$, and let $p$ be the
probability of taking a step to the right, so that $1 - p$ is the probability of
taking a step to the left, at any (integer-valued) time $n$. In symbols, this is:

    p = P(W_{n+1} = x + 1 \mid W_n = x) \quad \text{and} \quad
    1 - p = P(W_{n+1} = x - 1 \mid W_n = x).

Thus

    W_n = \xi_1 + \cdots + \xi_n

where the $\xi_j$ are IID with $P(\xi_j = 1) = p$ and $P(\xi_j = -1) = 1 - p$. Since we
are given that $W_0 = 0$, we know that the probability function of $W_n$ will be
essentially binomially distributed (with parameters $n$ and $p$, except that at
time $n$, $x$ will go from $-n$ to $+n$, skipping every other integer as discussed
earlier). We will set $v_{x,n} = P(W_n = x)$. Then

    v_{x,n} = \binom{n}{\frac{1}{2}(n+x)}\, p^{\frac{1}{2}(n+x)} (1-p)^{\frac{1}{2}(n-x)}

for $x = -n, -(n-2), \ldots, (n-2), n$ and $v_{x,n} = 0$ otherwise.

For use later, we calculate the expected value and variance of $W_n$. First,

    E(W_n) = \sum_{x=-n}^{n} x\, v_{x,n} = \sum_{k=0}^{n} (2k-n)\, v_{2k-n,n}
           = \sum_{k=0}^{n} (2k-n) \binom{n}{k} p^k (1-p)^{n-k} = n(2p-1)

(it's an exercise to check the algebra, but the result is intuitive). Also,

    Var(W_n) = \sum_{k=0}^{n} (2k-n)^2\, v_{2k-n,n} - (E(W_n))^2
             = \sum_{k=0}^{n} (2k-n)^2 \binom{n}{k} p^k (1-p)^{n-k} - n^2(2p-1)^2
             = 4np(1-p)

(another exercise!).
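Instead of checking the algebra, one can also check the two formulas by simulation (our addition; the function name is ours):

```python
import random

random.seed(1)

n, p, trials = 50, 0.6, 20000

def walk(n, p):
    # W_n = xi_1 + ... + xi_n with P(xi = +1) = p, P(xi = -1) = 1 - p.
    return sum(1 if random.random() < p else -1 for _ in range(n))

samples = [walk(n, p) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((w - mean) ** 2 for w in samples) / trials

assert abs(mean - n * (2 * p - 1)) < 0.3      # E(W_n) = n(2p-1) = 10
assert abs(var - 4 * n * p * (1 - p)) < 3.0   # Var(W_n) = 4np(1-p) = 48
```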

Now let's form a new stochastic process $W^\Delta$, where we let the space steps
$\Delta x$ and time steps $\Delta t$ be different from 1 ($W^\Delta$ is shorthand for $W^{\Delta t, \Delta x}$).

First, the time steps. We will take time steps of length $\Delta t$. This will have
little effect on our formulas, except that everywhere we will have to note that
the number $n$ of time steps needed to get from time 0 to time $t$ is

    n = \frac{t}{\Delta t}.

Each step to the left or right has length $\Delta x$ instead of length 1. This has
the effect of scaling $W$ by the factor $\Delta x$. Multiplying a random variable
by a constant multiplies its expected value by the same constant, and its
variance by the square of the constant.

Therefore, the mean and variance of the total displacement in time $t$ are:

    E(W^\Delta_t) = \frac{t}{\Delta t}(2p-1)\Delta x \quad \text{and} \quad
    Var(W^\Delta_t) = 4p(1-p)\, t\, \frac{(\Delta x)^2}{\Delta t}.

We want both of these quantities to remain finite for all fixed $t$, since we
don't want our process to have a positive probability of zooming off to infinity
in finite time, and we also want $Var(W^\Delta_t)$ to be non-zero; otherwise the
process will be completely deterministic.

We need to examine what we have control over in order to achieve these
ends. First off, we don't expect to let $\Delta x$ and $\Delta t$ go to zero in any arbitrary
manner. In fact, it seems clear that in order for the variance to be finite and
non-zero, we should insist that $\Delta t$ be roughly the same order of magnitude
as $(\Delta x)^2$. This is because we don't seem to have control over anything else
in the expression for the variance. It is reasonable that $p$ and $1-p$ vary with
$\Delta x$, but we probably don't want their product to approach zero (that would
force our process to move always to the left or always to the right), and it
can't approach infinity (since the maximum value of $p(1-p)$ is 1/4). So we
are forced to make the assumption that $(\Delta x)^2 / \Delta t$ remain bounded as they
both go to zero. In fact, we will go one step further. We will insist that

    \frac{(\Delta x)^2}{\Delta t} = 2D    (4.3.1)

for some (necessarily positive) constant $D$.

Now, we are in a bind with the expected value. The expected value will
go to infinity as $\Delta x \to 0$, and the only parameter we have to control it is
$2p - 1$. We need $2p - 1$ to have the same order of magnitude as $\Delta x$ in order
to keep $E(W^\Delta_t)$ bounded. Since $p = 1 - p$ when $p = 1/2$, we will make the
assumption that

    p = \frac{1}{2} + \frac{c}{2D} \Delta x    (4.3.2)

for some real (positive, negative or zero) constant $c$, which entails

    1 - p = \frac{1}{2} - \frac{c}{2D} \Delta x.

The reason for the choice of the parameters will become apparent. $D$ is called
the diffusion coefficient, and $c$ is called the drift coefficient.

Note that as $\Delta x$ and $\Delta t$ approach zero, the values
$v_{n,k} = P(W^\Delta_{n\Delta t} = k\Delta x) = p(k\Delta x, n\Delta t)$ specify values of the function
$p$ at more and more points in the plane. In fact, the points at which $p$ is
defined become more and more dense in the plane as $\Delta x$ and $\Delta t$ approach
zero. This means that given any point in the plane, we can find a sequence of
points at which such a $p$ is defined that approach our given point as $\Delta x$ and
$\Delta t$ approach zero. We would like to be able to assert the existence of the
limiting function, and that it expresses how the probability of a continuous
random variable evolves with time. We will not quite do this, but, assuming
the existence of the limiting function, we will derive a partial differential
equation that it must satisfy. Then we shall use a probabilistic argument to
obtain a candidate limiting function. Finally, we will check that the candidate
limiting function is a solution of the partial differential equation. This isn't
quite a proof of the result we would like, but it touches on the important
steps in the argument.

We begin with the derivation of the partial differential equation. To do
this, we note that for the original random walk, it was true that $v_{x,n}$ satisfies
the following difference equation:

    v_{k,n+1} = p\, v_{k-1,n} + (1-p)\, v_{k+1,n}.

This is just the matrix of the (infinite discrete) Markov process that defines
the random walk. When we pass into the $\Delta x$, $\Delta t$ realm, and use $p(x, t)$
notation instead of $v_{k\Delta x, n\Delta t}$, this equation becomes:

    (*) \quad p(x, t + \Delta t) = p \cdot p(x - \Delta x, t) + (1 - p) \cdot p(x + \Delta x, t).

Next, we need Taylor's expansion for a function of two variables. As we do
the expansion, we must keep in mind that we are going to let $\Delta t \to 0$, and as
we do this, we must account for all terms of order up to and including that
of $\Delta t$. Thus, we must account for both $\Delta x$ and $(\Delta x)^2$ terms (but $(\Delta x)^3$
and $\Delta t\, \Delta x$ both go to zero faster than $\Delta t$).

We expand each of the three expressions in equation (*) around the point
$(x, t)$, keeping terms up to order $\Delta t$ and $(\Delta x)^2$. We get:

    p(x, t + \Delta t) = p(x, t) + \Delta t\, \frac{\partial p}{\partial t}(x, t) + \cdots

    p(x + \Delta x, t) = p(x, t) + \Delta x\, \frac{\partial p}{\partial x}(x, t)
        + \frac{(\Delta x)^2}{2}\, \frac{\partial^2 p}{\partial x^2}(x, t) + \cdots

    p(x - \Delta x, t) = p(x, t) - \Delta x\, \frac{\partial p}{\partial x}(x, t)
        + \frac{(\Delta x)^2}{2}\, \frac{\partial^2 p}{\partial x^2}(x, t) + \cdots

We substitute these three equations into (*), recalling that $p + (1 - p) = 1$,
and get

    \Delta t\, \frac{\partial p}{\partial t} = -(2p - 1)\Delta x\, \frac{\partial p}{\partial x}
        + \frac{(\Delta x)^2}{2}\, \frac{\partial^2 p}{\partial x^2}

(where all the partial derivatives are evaluated at $(x, t)$). Finally, we divide
through by $\Delta t$, and recall that $(\Delta x)^2/\Delta t = 2D$ and $2p - 1 = \Delta x\, c/D$, to
arrive at the equation:

    \frac{\partial p}{\partial t} = -2c\, \frac{\partial p}{\partial x} + D\, \frac{\partial^2 p}{\partial x^2}.

This last partial differential equation is called the Fokker-Planck equation.
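The scaling assumptions (4.3.1) and (4.3.2) can be checked by simulating the unbalanced walk directly (our addition; a minimal Python sketch with our own variable names). We assert against the mean and variance formulas derived just above:

```python
import random

random.seed(2)

# Unbalanced walk with steps of size dx every dt, where
# (dx)^2 / dt = 2D and p = 1/2 + (c / (2D)) dx.
D, c, t = 0.5, 0.3, 1.0
dt = 0.001
dx = (2 * D * dt) ** 0.5
p = 0.5 + (c / (2 * D)) * dx
steps = round(t / dt)

def displacement():
    return sum(dx if random.random() < p else -dx for _ in range(steps))

trials = 5000
samples = [displacement() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((w - mean) ** 2 for w in samples) / trials

mean_th = (t / dt) * (2 * p - 1) * dx          # = 2ct = 0.6 here
var_th = 4 * p * (1 - p) * t * dx * dx / dt    # close to 2Dt = 1.0 here

assert abs(mean - mean_th) < 0.08
assert abs(var - var_th) < 0.12
```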

To find a solution of the Fokker-Planck equation, we use a probabilistic
argument to find the limit of the random walk distribution as $\Delta x$ and $\Delta t$
approach 0. The key step in this argument uses the fact that the binomial
distribution with parameters $n$ and $p$ approaches the normal distribution
with expected value $np$ and variance $np(1 - p)$ as $n$ approaches infinity. In
terms of our assumptions about $\Delta x$, $\Delta t$, $p$ and $1 - p$, we see that at time $t$,
the binomial distribution has mean

    t(2p - 1)\, \frac{\Delta x}{\Delta t} = 2ct

and variance

    4p(1 - p)\, t\, \frac{(\Delta x)^2}{\Delta t}

which approaches $2Dt$ as $\Delta t \to 0$. Therefore, we expect $W_t$ in the limit to be
normally distributed with mean $2ct$ and variance $2Dt$. Thus, the probability
that $W_t < x$ is

    P(W_t < x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{x - 2ct}{\sqrt{2Dt}}} e^{-\frac{1}{2} y^2}\, dy.

Therefore the pdf of $W_t$ is the derivative of this with respect to $x$, which
yields the pdf

    p(x, t) = \frac{1}{2\sqrt{\pi D t}}\, e^{-\frac{(x - 2ct)^2}{4Dt}}.

A special case of this is when $c = 0$ and $D = \frac{1}{2}$. The corresponding process
is called the Wiener process (or Brownian motion). The following properties
characterize the Wiener process, which we record as a theorem.

Theorem 4.3.3.

1. $W_0 = 0$ with probability 1.

2. $W_t$ is a continuous function of $t$ with probability 1.

3. For $t > s$, $W_t - W_s$ is normally distributed with mean 0 and variance
$t - s$.

4. For $t \ge s \ge u \ge v$, $W_t - W_s$ and $W_u - W_v$ are independent.

5. The probability density for $W_t$ is

    p(x, t) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{x^2}{2t}}.

Proof: Let's discuss these points.

1. This is clear since we defined $W_t$ as a limit of random walks which all
started at 0.

3. This we derived from the Fokker-Planck equation.

4. Again, we defined $W_t$ as a limit of random walks $W^\Delta$. Now
$W^\Delta_t = \Delta x(\xi_1 + \cdots + \xi_n)$ where $n = t/\Delta t$. So if $t = n\Delta t$, $s = m\Delta t$,
$u = k\Delta t$ and $v = l\Delta t$ where $n \ge m \ge k \ge l$, then we have

    W_t - W_s = \Delta x(\xi_{m+1} + \xi_{m+2} + \cdots + \xi_n)

and

    W_u - W_v = \Delta x(\xi_{l+1} + \cdots + \xi_k)

which are independent.

2. This fact is beyond the scope of our discussion but is intuitive.

Q.E.D.

While, according to 2. above, the paths $W_t$ are continuous, they are very
rough, and we see from the next proposition that they are not differentiable,
with probability 1.

Proposition 4.3.4. For all $t > 0$ and all $n = 1, 2, \ldots$,

    \lim_{h \to 0} P(|W_{t+h} - W_t| > n|h|) = 1

so that $W_t$ is nowhere differentiable.

Proof: Assume without loss of generality that $h > 0$. Then

    P(|W_{t+h} - W_t| > nh) = P(|W_h| > nh)

by 3) of Theorem 4.3.3. Now this equals

    1 - P(|W_h| \le nh).

Now

    P(|W_h| \le nh) = \frac{1}{\sqrt{2\pi h}} \int_{-nh}^{nh} e^{-\frac{x^2}{2h}}\, dx
        \le \frac{1}{\sqrt{2\pi h}} \int_{-nh}^{nh} 1\, dx
        = \frac{1}{\sqrt{2\pi h}}\, 2hn = n\sqrt{\frac{2h}{\pi}}

which converges to 0 as $h \to 0$ for any $n$. Q.E.D.

Almost everything we use about the Wiener process will follow from
Theorem 4.3.3. We now do some sample computations to show how to
manipulate the Wiener process, some of which will be useful later.

1. Calculate the expected distance of the Wiener process from 0.

Solution: This is just $E(|W_t|)$, which is

    \int_{-\infty}^{\infty} |x|\, \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{x^2}{2t}}\, dx
        = 2 \int_0^{\infty} x\, \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{x^2}{2t}}\, dx.

Changing variables, $u = x^2/(2t)$, this integral becomes

    \frac{2t}{\sqrt{2\pi t}} \int_0^{\infty} e^{-u}\, du = \sqrt{2t/\pi}.

2. Calculate $E(W_t W_s)$.

Solution: Assume without loss of generality that $t > s$. Then

    E(W_t W_s) = E((W_t - W_s)W_s + W_s^2) = E((W_t - W_s)W_s) + E(W_s^2).

Now $W_t - W_s$ is independent of $W_s$ since $t > s$, and $W_s$ has mean 0;
therefore we get

    E(W_t - W_s)\, E(W_s) + \sigma^2(W_s) = s

since $W_t - W_s$ has mean 0.
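Both sample computations are easy to verify by Monte Carlo (our addition; we build $(W_s, W_t)$ from independent normal increments, as item 4 of Theorem 4.3.3 allows):

```python
import math
import random

random.seed(3)

t, s, trials = 2.0, 0.7, 200000

abs_sum = 0.0
prod_sum = 0.0
for _ in range(trials):
    # W_s ~ N(0, s); W_t = W_s + independent N(0, t - s) increment.
    ws = random.gauss(0.0, math.sqrt(s))
    wt = ws + random.gauss(0.0, math.sqrt(t - s))
    abs_sum += abs(wt)
    prod_sum += wt * ws

# E(|W_t|) = sqrt(2t/pi) and E(W_t W_s) = s for t > s.
assert abs(abs_sum / trials - math.sqrt(2 * t / math.pi)) < 0.02
assert abs(prod_sum / trials - s) < 0.03
```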

The Wiener Process Starting at y, $W^y$

The defining properties of $W^y_t = W_t + y$, the Wiener process starting at $y$,
are:

1. $W^y_0 = y$.

2. $W^y_t$ is continuous with probability 1.

3. $W^y_t$ is normally distributed with $\mu = y$, $\sigma^2 = t$.

4. $W^y_t - W^y_s$ is normally distributed with $\mu = 0$, $\sigma^2 = t - s$, for $t > s$.

5. $W^y_t - W^y_s$ and $W^y_u - W^y_v$ are independent for $t \ge s \ge u \ge v$.

6. The pdf is given by

    p(x, t \mid y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(x-y)^2}{2t}},

so that

    P(a \le W^y_t \le b) = \frac{1}{\sqrt{2\pi t}} \int_a^b e^{-\frac{(x-y)^2}{2t}}\, dx.

How do we use the Wiener process to model stock movements?

We would like to make sense of the idea that

    \frac{dS_t}{dt} = \mu S_t + \text{noise}(\sigma, S_t),

where, if the noise term were absent, $S_t$ would be a bank account earning
interest $\mu$. There are two obvious problems:

(1) The Wiener process is differentiable nowhere, so why should $S_t$ be
differentiable anywhere?

(2) "Noise" is very difficult to make rigorous.

To motivate our way out of these problems let us look at deterministic
motion. The most general first-order differential equation is

    \frac{df}{dt} = \phi(f(t), t).

By integrating both sides we arrive at the following integral equation

    f(t) - f(t_0) = \int_{t_0}^t \frac{df}{dt}\, dt = \int_{t_0}^t \phi(f(t), t)\, dt.

That is,

    f(t) = f(t_0) + \int_{t_0}^t \phi(f(t), t)\, dt.

Theoretically, integral equations are better behaved than differential
equations (they tend to smooth out bumps). So we try to make the corresponding
conversion from differential to integral equation in the stochastic world.

Our first goal is just to define a reasonable notion of stochastic integral.
We had the martingale transform: if $S_n$ was a stock and $\phi_n$ denoted the
amount of stock held at tick $n$, then the gain from tick 0 to tick $N$ was

    \sum_{i=1}^{N} \phi_i (S_i - S_{i-1}).

It is these sorts of sums that become integrals in continuous time.

Recall the Riemann integral $\int_a^b f(x)\, dx$: take a partition of $[a, b]$,
$T = \{a = t_0 < t_1 < \cdots < t_n = b\}$.

Theorem 4.3.5 (Riemann's Theorem). Let $f$ be a function such that the
probability of choosing a point of discontinuity is 0. Let
$\text{mesh}\, T = \max\{|t_i - t_{i-1}| : i = 1, \ldots, n\}$, let $t_{i-1} < \xi_i < t_i$, and let

    RS(f, T) = \sum_{i=1}^{n} f(\xi_i)(t_i - t_{i-1})

(RS stands for Riemann sum). Then

    \lim_{\text{mesh}\, T \to 0} RS(f, T)

exists, and is independent of the choices of $\xi_i$ within their respective intervals.
We call this limit the Riemann integral $\int_a^b f(x)\, dx$.

Note that the summation is beginning to look like a martingale transform.

The next level of integral technology is the Riemann-Stieltjes integral.
Suppose $\phi(x)$ is a nondecreasing function, $T$ a partition, and $\xi_i$ values
between partition points. Let

    RSS(f, T, \phi) = \sum_{i=1}^{n} f(\xi_i)(\phi(t_i) - \phi(t_{i-1})).

("RSS" stands for Riemann-Stieltjes sum.) If $\lim_{\text{mesh}\, T \to 0} RSS(f, T, \phi)$
exists, then by definition $\int_a^b f(t)\, d\phi(t)$ is this limit. Assuming $\phi$ is
differentiable, the corresponding Riemann's theorem holds, namely

    \lim_{\text{mesh}\, T \to 0} RSS(f, T, \phi)

exists (and is independent of the choices of $\xi_i$).

Proposition 4.3.6. If $\phi$ is differentiable,

    \int_a^b f(t)\, d\phi(t) = \int_a^b f(t)\, \phi'(t)\, dt.

Proof. Let $T = \{a = t_0 < t_1 < \cdots < t_n = b\}$. Then

    RSS(f, T, \phi) = \sum_{i=1}^{n} f(\xi_i)(\phi(t_i) - \phi(t_{i-1}))
        = \sum_{i=1}^{n} f(\xi_i)\, \frac{\phi(t_i) - \phi(t_{i-1})}{t_i - t_{i-1}}\, (t_i - t_{i-1})
        = \sum_{i=1}^{n} f(\xi_i)\left(\phi'(t_i) + o(1)\right)(t_i - t_{i-1})    (4.3.3)

and so

    \lim_{\text{mesh}\, T \to 0} \sum_{i=1}^{n} f(\xi_i)\, \phi'(t_i)(t_i - t_{i-1})
        = \int_a^b f(t)\, \phi'(t)\, dt.

Example. Let $\phi(x)$ be differentiable. Then

    \int_a^b \phi(t)\, d\phi(t) = \int_a^b \phi(t)\, \phi'(t)\, dt
        = \frac{1}{2}\phi(b)^2 - \frac{1}{2}\phi(a)^2.

A stochastic integral is formed much like the Riemann-Stieltjes integral,
with one twist. For $X_t$ and $\phi_t$ stochastic processes,

    \int_a^b \phi_t\, dX_t

is to be a random variable. In the Riemann and Riemann-Stieltjes cases, the
choices of the points $\xi_i$ in the intervals were irrelevant; in the stochastic
case the choice is very much relevant. As an example, take
$T = \{a = t_0 < t_1 < \cdots < t_n = b\}$. Consider

    \sum_{i=1}^{n} \phi_{\xi_i} (X_{t_i} - X_{t_{i-1}}).

Let's work out the above sum for $X_t = \phi_t = W_t$, where $W_t$ is the Wiener
process, using two different choices for the $\xi_i$'s.

First, let $\xi_i = t_{i-1}$, the left endpoint. Then

    E\left( \sum_{i=1}^{n} W_{t_{i-1}} (W_{t_i} - W_{t_{i-1}}) \right)
        = \sum_{i=1}^{n} E(W_{t_{i-1}}(W_{t_i} - W_{t_{i-1}}))
        = \sum_{i=1}^{n} E(W_{t_{i-1}})\, E(W_{t_i} - W_{t_{i-1}}) = 0,

using independence.

Second, let $\xi_i = t_i$, the right endpoint. Then

    E\left( \sum_{i=1}^{n} W_{t_i} (W_{t_i} - W_{t_{i-1}}) \right)
        = \sum_{i=1}^{n} E(W_{t_i}(W_{t_i} - W_{t_{i-1}}))
        = \sum_{i=1}^{n} E(W_{t_i}^2 - W_{t_i} W_{t_{i-1}})
        = \sum_{i=1}^{n} \left[ E(W_{t_i}^2) - E(W_{t_i} W_{t_{i-1}}) \right]
        = \sum_{i=1}^{n} (t_i - t_{i-1})
        = (t_1 - t_0) + (t_2 - t_1) + \cdots + (t_n - t_{n-1}) = t_n - t_0 = b - a.
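A simulation makes this endpoint-dependence concrete (our addition; a minimal Python sketch with our own names). Averaging over many Brownian paths, the left-endpoint sums should hover near 0 and the right-endpoint sums near $b - a$:

```python
import math
import random

random.seed(4)

a, b, n, trials = 0.0, 1.0, 200, 5000
dt = (b - a) / n

left_sum = 0.0
right_sum = 0.0
for _ in range(trials):
    # One Brownian path on the partition t_0 < ... < t_n.
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + random.gauss(0.0, math.sqrt(dt)))
    left_sum += sum(w[i] * (w[i + 1] - w[i]) for i in range(n))
    right_sum += sum(w[i + 1] * (w[i + 1] - w[i]) for i in range(n))

# E(left-endpoint sum) = 0; E(right-endpoint sum) = b - a.
assert abs(left_sum / trials) < 0.06
assert abs(right_sum / trials - (b - a)) < 0.06
```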

Which endpoint to choose? Following Itô, one uses the left-hand endpoint.
One defines the Itô sum

    IS(\phi_t, T, X_t) = \sum_{i=1}^{n} \phi_{t_{i-1}} (X_{t_i} - X_{t_{i-1}}).

There are two motivations for this choice.

1. The left-hand choice looks a little like a previsibility condition. For
example, if $X_t$ is a stock process and $\phi_t$ a portfolio process, then the
left-hand choice embodies the idea that our gains from time $t_{i-1}$ to
time $t_i$ should be based on a choice made at time $t_{i-1}$.

2. It turns out that if $X_t$ is a martingale, then the Itô sum will be a
martingale as well.

Following Stratonovich, one takes $\xi_i = \frac{1}{2}(t_{i-1} + t_i)$, the midpoint. As an
advantage, the resulting Stratonovich calculus looks like traditional calculus.
As a disadvantage, the derived stochastic process is not a martingale.

Let $X_t$ be a stochastic process, and let $\phi_t$ be another stochastic process
defined on the same sample space as $X_t$.

Definition 4.3.2. $\phi_t$ is adapted to $X_t$ if $\phi_t$ is independent of $X_s - X_t$ for
$s > t$.

Then we define the Itô integral

    \int_a^t \phi_s\, dX_s \equiv \lim_{\text{mesh}\, T \to 0} IS(\phi, T, X)
        = \lim_{\text{mesh}\, T \to 0} \sum_{i=1}^{n} \phi_{t_{i-1}} (X_{t_i} - X_{t_{i-1}}).

Let us call a random variable with finite first and second moments square
integrable. One knows that for $X$, $Y$ square integrable random variables
defined on the same sample space, $P(X = Y) = 1$ iff $E(X) = E(Y)$ and
$Var(X - Y) = 0$. Now $Var(X - Y) = E((X - Y)^2) - (E(X - Y))^2$, so

    P(X = Y) = 1 \quad \text{iff} \quad E((X - Y)^2) = 0.

This is a criterion for checking whether two square integrable random
variables are equal (almost surely).

Definition 4.3.3. A sequence $X_\alpha$ of square integrable random variables
converges in mean-square to $X$ if $E((X - X_\alpha)^2) \to 0$ as $\alpha \to 0$. In symbols,
$\lim X_\alpha = X$. This is equivalent to $E(X_\alpha) \to E(X)$ and $Var(X - X_\alpha) \to 0$.

This is the sense of the limit in the definition of the stochastic integral.

Here is a sample computation of a stochastic integral. Let's evaluate

    \int_a^b W_t\, dW_t.

We need the following facts: $E(W_t) = 0$, $E(W_t^2) = t$, $E(W_t^3) = 0$, and
$E(W_t^4) = 3t^2$. Writing $\Delta W_{t_i} = W_{t_i} - W_{t_{i-1}}$,

    IS(W_t, T, W_t) = \sum_{i=1}^{n} W_{t_{i-1}}\, \Delta W_{t_i}
        = \sum_{i=1}^{n} W_{t_{i-1}} (W_{t_i} - W_{t_{i-1}})
        = \frac{1}{2} \sum_{i=1}^{n} \left[ (W_{t_{i-1}} + \Delta W_{t_i})^2 - W_{t_{i-1}}^2 - (\Delta W_{t_i})^2 \right]
        = \frac{1}{2} \sum_{i=1}^{n} \left( W_{t_i}^2 - W_{t_{i-1}}^2 \right)
          - \frac{1}{2} \sum_{i=1}^{n} (\Delta W_{t_i})^2
        = \frac{1}{2} \left[ (W_{t_1}^2 - W_{t_0}^2) + (W_{t_2}^2 - W_{t_1}^2)
          + \cdots + (W_{t_n}^2 - W_{t_{n-1}}^2) \right]
          - \frac{1}{2} \sum_{i=1}^{n} (\Delta W_{t_i})^2
        = \frac{1}{2} (W_b^2 - W_a^2) - \frac{1}{2} \sum_{i=1}^{n} (\Delta W_{t_i})^2.

The first part is recognizable as what the Riemann-Stieltjes integral gives
formally. But what is $\sum_{i=1}^{n} (\Delta W_{t_i})^2$ as $\text{mesh}\, T \to 0$? The simple calculation

as meshT → 0? The simple calculation

E

_

n

i=1

(∆W

t

i

)

2

_

=

n

i=1

E((∆W

t

i

)

2

)

=

n

i=1

E((W

t

i

−W

t

i−1

)

2

) =

n

i=1

t

i

−t

i−1

= b −a

leads us to the

Claim:

lim

meshP→0

E(

n

i=1

(∆W

t

i

)

2

) = b −a

and therefore

_

b

a

W

t

dW

t

=

1

2

(W

b

2

−W

a

2

) −

1

2

(b −a) .

Proof.

    E\left( \left( \sum_{i=1}^{n} (\Delta W_{t_i})^2 - (b - a) \right)^2 \right)
      = E\left( \left( \sum_{i=1}^{n} (\Delta W_{t_i})^2 \right)^2
        - 2(b - a) \sum_{i=1}^{n} (\Delta W_{t_i})^2 + (b - a)^2 \right)
      = E\left( \sum_{i=1}^{n} (\Delta W_{t_i})^4
        + 2 \sum_{i<j} (\Delta W_{t_i})^2 (\Delta W_{t_j})^2
        - 2(b - a) \sum_{i=1}^{n} (\Delta W_{t_i})^2 + (b - a)^2 \right)
      = \sum_{i=1}^{n} 3(\Delta t_i)^2
        + 2 \sum_{i<j} E((\Delta W_{t_i})^2)\, E((\Delta W_{t_j})^2)
        - E\left( 2(b - a) \sum_{i=1}^{n} (\Delta W_{t_i})^2 \right) + (b - a)^2
      = 3 \sum_{i=1}^{n} (\Delta t_i)^2 + 2 \sum_{i<j} (\Delta t_i)(\Delta t_j)
        - 2(b - a) \sum_{i=1}^{n} \Delta t_i + (b - a)^2
      = 2 \sum_{i=1}^{n} (\Delta t_i)^2
        + \left[ \sum_{i=1}^{n} (\Delta t_i)^2 + 2 \sum_{i<j} (\Delta t_i)(\Delta t_j) \right]
        - 2(b - a) \sum_{i=1}^{n} \Delta t_i + (b - a)^2
      = 2 \sum_{i=1}^{n} (\Delta t_i)^2
        + \left( \sum_{i=1}^{n} \Delta t_i \right)^2
        - 2(b - a) \sum_{i=1}^{n} \Delta t_i + (b - a)^2
      = 2 \sum_{i=1}^{n} (\Delta t_i)^2
        + \left( \sum_{i=1}^{n} \Delta t_i - (b - a) \right)^2
      = 2 \sum_{i=1}^{n} (\Delta t_i)^2.

Now for each $i$, $\Delta t_i \le \text{mesh}\, T$, so $(\Delta t_i)^2 \le \Delta t_i\, \text{mesh}\, T$. (Note that only
one $\Delta t_i$ is being rounded up.) Summing $i = 1$ to $n$,

    0 \le \sum_{i=1}^{n} (\Delta t_i)^2
        \le \left( \sum_{i=1}^{n} \Delta t_i \right) \text{mesh}\, T
        = (b - a)\, \text{mesh}\, T \to 0

as $\text{mesh}\, T \to 0$. So

    \lim_{\text{mesh}\, T \to 0} \sum_{i=1}^{n} (\Delta t_i)^2 = 0.

Intuitively, $\Delta t_i \approx \text{mesh}\, T \approx \frac{1}{n}$, so
$\lim_{\text{mesh}\, T \to 0} \sum_{i=1}^{n} (\Delta t_i)^2 \approx \lim_{n \to \infty} n \left( \frac{1}{n} \right)^2 = 0$.
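The claim — quadratic variation concentrates at $b - a$, with variance $2\sum (\Delta t_i)^2$ shrinking with the mesh — can be seen numerically (our addition; a minimal sketch with our own function names):

```python
import math
import random

random.seed(5)

a, b = 0.0, 1.0

def quadratic_variation(n):
    """Sum of squared Brownian increments over a uniform n-step partition."""
    dt = (b - a) / n
    return sum(random.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n))

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

coarse = [quadratic_variation(10) for _ in range(2000)]
fine = [quadratic_variation(1000) for _ in range(2000)]

# Mean is exactly b - a for every mesh; variance is 2 * sum (dt_i)^2 = 2/n.
assert abs(sum(fine) / len(fine) - (b - a)) < 0.01
assert sample_var(fine) < sample_var(coarse)   # variance shrinks with the mesh
assert abs(sample_var(coarse) - 2 / 10) < 0.05
```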

The underlying reason for the difference between the stochastic integral and
the Stieltjes integral is that $\Delta W_{t_i}$ is, with high probability, of order $\sqrt{\Delta t_i}$.

Slogan of the Itô Calculus: $dW_t^2 = dt$.

In deriving the Wiener process, $\frac{(\Delta X)^2}{\Delta t} = 1$, so $(\Delta X)^2 = (\Delta W)^2 = \Delta t$.

$dW_t^2 = dt$ is only given precise sense through its integral form:

Theorem 4.3.7. If $\phi_s$ is adapted to $W_s$, then

    \int_a^t \phi_s\, (dW_s)^2 = \int_a^t \phi_s\, ds.

Moreover, for $n > 2$,

    \int_a^t \phi_s\, (dW_s)^n = 0.

Proving this will be in the exercises.

Properties of the Itô Integral. If $\phi_s$, $\psi_s$ are adapted to the given process
($X_s$ or $W_s$), then

(1) $\int_a^t dX_s = X_t - X_a$,

(2) $\int_a^t (\phi_s + \psi_s)\, dX_s = \int_a^t \phi_s\, dX_s + \int_a^t \psi_s\, dX_s$,

(3) $E\left( \int_a^t \phi_s\, dW_s \right) = 0$, and

(4) $E\left( \left( \int_a^t \phi_s\, dW_s \right)^2 \right) = E\left( \int_a^t \phi_s^2\, ds \right)$.

Proof.

1. For any partition $T = \{a = s_0 < s_1 < \cdots < s_n = t\}$,
$IS(1, T, X_s) = \sum_{i=1}^{n} 1 \cdot (X_{s_i} - X_{s_{i-1}})$, which telescopes to
$X_{s_n} - X_{s_0} = X_t - X_a$. So the limit as $\text{mesh}\, T \to 0$ is $X_t - X_a$.

2. It is obvious for any partition $T$ that
$IS(\phi_s + \psi_s, T, X_s) = IS(\phi_s, T, X_s) + IS(\psi_s, T, X_s)$. Now take the
limit as $\text{mesh}\, T \to 0$.

3. Given a partition $T$, $IS(\phi_s, T, W_s) = \sum_{i=1}^{n} \phi_{s_{i-1}} (W_{s_i} - W_{s_{i-1}})$.
Taking expected values gives

    E(IS(\phi_s, T, W_s)) = E\left( \sum_{i=1}^{n} \phi_{s_{i-1}} (W_{s_i} - W_{s_{i-1}}) \right)
        = \sum_{i=1}^{n} E(\phi_{s_{i-1}} (W_{s_i} - W_{s_{i-1}}))
        = \sum_{i=1}^{n} E(\phi_{s_{i-1}})\, E(W_{s_i} - W_{s_{i-1}}) = 0.

Taking the limit as $\text{mesh}\, T \to 0$ gives the result.

4. Given a partition $T$,

    E(IS(\phi_s, T, W_s)^2)
      = E\left( \left( \sum_{i=1}^{n} \phi_{s_{i-1}} (W_{s_i} - W_{s_{i-1}}) \right)^2 \right)
      = E\left( \sum_{i=1}^{n} \phi_{s_{i-1}}^2 (W_{s_i} - W_{s_{i-1}})^2
        + 2 \sum_{i<j} \phi_{s_{i-1}} \phi_{s_{j-1}} (W_{s_i} - W_{s_{i-1}})(W_{s_j} - W_{s_{j-1}}) \right)
      = E\left( \sum_{i=1}^{n} \phi_{s_{i-1}}^2 (\Delta W_{s_i})^2 \right)
        + 2 \sum_{i<j} E(\phi_{s_{i-1}} \phi_{s_{j-1}} (W_{s_i} - W_{s_{i-1}}))\, E(W_{s_j} - W_{s_{j-1}})
      = \sum_{i=1}^{n} E(\phi_{s_{i-1}}^2)\, \Delta s_i
        + 2 \sum_{i<j} E(\phi_{s_{i-1}} \phi_{s_{j-1}} (W_{s_i} - W_{s_{i-1}})) \cdot 0
      = \sum_{i=1}^{n} E(\phi_{s_{i-1}}^2)\, \Delta s_i = IS(E(\phi_s^2), T, s).

Now take the limit as $\text{mesh}\, T \to 0$.

4.3.3 Stochastic Differential Equations

A first-order stochastic differential equation is an expression of the form

    dX_t = a(X_t, t)\, dt + b(X_t, t)\, dW_t,

where $a(x, t)$, $b(x, t)$ are functions of two variables.

A stochastic process $X_t$ is a solution to this SDE if

    X_t - X_a = \int_a^t dX_s
        = \int_a^t a(X_s, s)\, ds + \int_a^t b(X_s, s)\, dW_s.

Lemma 4.3.8 (Itô's Lemma). Suppose we are given a twice differentiable
function $f(x)$, and a solution $X_t$ to the SDE $dX_t = a\, dt + b\, dW_t$. Then $f(X_t)$
is a stochastic process, and it satisfies the SDE

    df(X_t) = \left[ f'(X_t)\, a + \frac{1}{2} f''(X_t)\, b^2 \right] dt + f'(X_t)\, b\, dW_t.

More generally, if $f(x, t)$ is a function of two variables, twice differentiable in
$x$, once differentiable in $t$, one has

    df(X_t, t) = \left[ \frac{\partial f}{\partial t}(X_t, t)
        + \frac{\partial f}{\partial x}(X_t, t)\, a
        + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}(X_t, t)\, b^2 \right] dt
        + \frac{\partial f}{\partial x}(X_t, t)\, b\, dW_t.

Proof. This is just a calculation using two terms of the Taylor series for
$f$, and the rules $dW_t^2 = dt$ and $dW_t^n = 0$ for $n > 2$. (Note that
$dt\, dW_t = dW_t^3 = 0$ and $dt^2 = dW_t^4 = 0$.)

    df(X_t) = f(X_{t+dt}) - f(X_t) = f(X_t + dX_t) - f(X_t)
      = \left[ f(X_t) + f'(X_t)\, dX_t + \frac{1}{2} f''(X_t)(dX_t)^2 + o((dX_t)^2) \right] - f(X_t)
      = f'(X_t)\, dX_t + \frac{1}{2} f''(X_t)(dX_t)^2
      = f'(X_t)\left[ a(X_t, t)\, dt + b(X_t, t)\, dW_t \right]
        + \frac{1}{2} f''(X_t)\left[ a^2\, dt^2 + 2ab\, dt\, dW_t + b^2\, dW_t^2 \right]
      = \left[ f'(X_t)\, a(X_t, t) + \frac{1}{2} f''(X_t)\, (b(X_t, t))^2 \right] dt
        + f'(X_t)\, b(X_t, t)\, dW_t.

Example. To calculate $\int_a^t W_s\, dW_s$, show that it is equal to
$\int_a^t dX_s = X_t - X_a$ for some $X_t$. That is, find $f(x)$ such that
$W_t\, dW_t = d(f(W_t))$. Since $\int x\, dx = \frac{1}{2}x^2$, consider $f(x) = \frac{1}{2}x^2$. Then
$f'(x) = x$ and $f''(x) = 1$, so Itô's lemma, applied to $dW_t = 0\, dt + 1\, dW_t$,
gives

    d(f(W_t)) = f'(W_t)\, dW_t + \frac{1}{2} f''(W_t)\, dt = W_t\, dW_t + \frac{1}{2}\, dt.

Equivalently, $W_t\, dW_t = d(f(W_t)) - \frac{1}{2}\, dt$. Therefore,

    \int_a^t W_s\, dW_s = \int_a^t d(f(W_s)) - \int_a^t \frac{1}{2}\, ds
        = f(W_t) - f(W_a) - \frac{1}{2}(t - a)
        = \frac{1}{2} W_t^2 - \frac{1}{2} W_a^2 - \frac{1}{2}(t - a).

Example. Calculate $\int_a^t W_s^2\, dW_s$. Since $\int x^2\, dx = \frac{1}{3}x^3$, consider
$f(x) = \frac{1}{3}x^3$. Then $f'(x) = x^2$ and $f''(x) = 2x$. Itô's lemma implies

    d\left( \frac{1}{3} W_s^3 \right) = W_s^2\, dW_s + \frac{1}{2} \cdot 2W_s\, ds
        = W_s^2\, dW_s + W_s\, ds,

or $W_s^2\, dW_s = d\left( \frac{1}{3} W_s^3 \right) - W_s\, ds$, so

    \int_a^t W_s^2\, dW_s = \frac{1}{3} W_t^3 - \frac{1}{3} W_a^3 - \int_a^t W_s\, ds.

(The last integral does not really simplify.)

Example. Solve $dX_s = \frac{1}{2} X_s\, ds + X_s\, dW_s$. Consider $f'(X_s)\, X_s = 1$. With
ordinary variables, this is $f'(x)\, x = 1$, or $f'(x) = \frac{1}{x}$. So we are led to consider
$f(x) = \ln x$, with $f'(x) = \frac{1}{x}$ and $f''(x) = -\frac{1}{x^2}$.

Then, using $dX_s = \frac{1}{2} X_s\, ds + X_s\, dW_s$ and $(dX_s)^2 = X_s^2\, ds$,

    d(\ln X_s) = \frac{1}{X_s}\, dX_s + \frac{1}{2} \left( -\frac{1}{X_s^2} \right) (dX_s)^2
        = \frac{1}{2}\, ds + dW_s - \frac{1}{2}\, ds = dW_s.

Now integrate both sides, getting

    \int_a^t d(\ln X_s) = \int_a^t dW_s = W_t - W_a

or

    \ln\left( \frac{X_t}{X_a} \right) = W_t - W_a,

and so

    X_t = X_a\, e^{W_t - W_a}.
Asset Models

The basic model for a stock is the SDE
\[
dS_t = \mu S_t\, dt + \sigma S_t\, dW_t ,
\]
where $\mu$ is the short-term expected return, and $\sigma$ is the standard deviation of the short-term returns, called the volatility.

Note that if $\sigma = 0$, the model simplifies to $dS_t = \mu S_t\, dt$, implying $S_t = S_0\, e^{\mu t}$, which is just the value of an interest-bearing, risk-free security.

Compared with the generic SDE
\[
dX_t = a(X_t, t)\, dt + b(X_t, t)\, dW_t ,
\]
the stock model uses $a(x,t) = \mu x$ and $b(x,t) = \sigma x$. We have $(dS_t)^2 = \sigma^2 S_t^2\, dt$.

The model SDE is similar to the last example, and the same change of variables will work. Using $f(x) = \ln x$, $f'(x) = \frac{1}{x}$, and $f''(x) = -\frac{1}{x^2}$, Itô's lemma gives
\begin{align*}
d(\ln S_t) &= \frac{1}{S_t}\, dS_t + \frac{1}{2}\left(-\frac{1}{S_t^2}\right)(dS_t)^2 = \mu\, dt + \sigma\, dW_t - \frac{1}{2}\sigma^2\, dt \\
&= \left(\mu - \frac{1}{2}\sigma^2\right) dt + \sigma\, dW_t .
\end{align*}
Integrating both sides gives
\begin{align*}
\int_0^t d(\ln S_s) &= \int_0^t \left(\mu - \frac{1}{2}\sigma^2\right) ds + \sigma\, dW_s \\
\ln S_t - \ln S_0 &= \left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma W_t \\
\frac{S_t}{S_0} &= e^{(\mu - \frac{1}{2}\sigma^2)\, t + \sigma W_t} \\
S_t &= S_0\, e^{(\mu - \frac{1}{2}\sigma^2)\, t + \sigma W_t} .
\end{align*}

4.3. THE WIENER PROCESS 119

The expected return of $S_t$ is
\[
E(\ln(S_t/S_0)) = E\!\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right) = \left(\mu - \tfrac{1}{2}\sigma^2\right)t .
\]
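As a quick numerical sanity check of this expected log-return, one can sample $W_t \sim N(0,t)$ directly and average $\ln(S_t/S_0) = (\mu - \tfrac{1}{2}\sigma^2)t + \sigma W_t$. This is a stdlib-only Monte Carlo sketch (the function name and the parameter choices are my own, not from the notes):

```python
import math
import random

def mean_log_return(mu=0.1, sigma=0.25, t=1.0, n_paths=100000, seed=1):
    """Monte Carlo estimate of E[ln(S_t/S_0)] when
    S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t), with W_t ~ N(0, t)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma * sigma) * t
    total = 0.0
    for _ in range(n_paths):
        total += drift + sigma * rng.gauss(0.0, math.sqrt(t))
    return total / n_paths
```

The sample mean settles near $(\mu - \tfrac{1}{2}\sigma^2)t = 0.06875$ for these parameters, within the usual $O(1/\sqrt{n})$ Monte Carlo error.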

Vasicek's interest rate model is $dr_t = a(b - r_t)\, dt + \sigma\, dW_t$. It satisfies "mean reversion": if $r_t < b$, the drift term of $dr_t$ is positive, pulling $r_t$ upwards, while if $r_t > b$, the drift is negative, pushing $r_t$ downwards.

The Cox-Ingersoll-Ross model is $dr_t = a(b - r_t)\, dt + \sigma\sqrt{r_t}\, dW_t$. It has mean reversion, and a volatility that grows with $r_t$.

Ornstein-Uhlenbeck Process. An O-U process is a solution to
\[
dX_t = \mu X_t\, dt + \sigma\, dW_t
\]
with an initial condition $X_0 = x_0$.

This SDE is solved by using the integrating factor $e^{-\mu t}$. First,
\[
e^{-\mu t}\, dX_t = \mu e^{-\mu t} X_t\, dt + \sigma e^{-\mu t}\, dW_t
\]
holds just by multiplying the given SDE. Letting $f(x,t) = x e^{-\mu t}$, then
\[
\frac{\partial f}{\partial t} = -\mu x e^{-\mu t} , \qquad \frac{\partial f}{\partial x} = e^{-\mu t} , \qquad \frac{\partial^2 f}{\partial x^2} = 0 .
\]
Itô's lemma gives
\begin{align*}
df(X_t, t) &= \frac{\partial f}{\partial t}(X_t, t)\, dt + \frac{\partial f}{\partial x}(X_t, t)\, dX_t + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(X_t, t)\, (dX_t)^2 \\
d(e^{-\mu t} X_t) &= -\mu e^{-\mu t} X_t\, dt + e^{-\mu t}\, dX_t ,
\end{align*}
or
\[
e^{-\mu t}\, dX_t = d(e^{-\mu t} X_t) + \mu e^{-\mu t} X_t\, dt .
\]
So the SDE, with integrating factor, rewrites as
\begin{align*}
d(e^{-\mu t} X_t) + \mu e^{-\mu t} X_t\, dt &= \mu e^{-\mu t} X_t\, dt + \sigma e^{-\mu t}\, dW_t \\
d(e^{-\mu t} X_t) &= \sigma e^{-\mu t}\, dW_t .
\end{align*}
Now integrate both sides from $0$ to $t$:
\begin{align*}
\int_0^t d(e^{-\mu s} X_s) &= \int_0^t \sigma e^{-\mu s}\, dW_s \\
e^{-\mu t} X_t - X_0 &= \int_0^t \sigma e^{-\mu s}\, dW_s \\
X_t &= e^{\mu t} X_0 + e^{\mu t} \int_0^t \sigma e^{-\mu s}\, dW_s .
\end{align*}

Let's go back to the stock model
\[
dS_t = \mu S_t\, dt + \sigma S_t\, dW_t
\]
with solution
\[
S_t = S_0\, e^{(\mu - \frac{1}{2}\sigma^2)\, t + \sigma W_t} .
\]
We calculate the pdf of $S_t$ as follows. First,
\begin{align*}
P(S_t \le S) &= P\!\left(S_0\, e^{(\mu - \frac{1}{2}\sigma^2)\, t + \sigma W_t} \le S\right) \\
&= P\!\left(e^{(\mu - \frac{1}{2}\sigma^2)\, t + \sigma W_t} \le S/S_0\right) \\
&= P\!\left(\left(\mu - \tfrac{1}{2}\sigma^2\right) t + \sigma W_t \le \ln(S/S_0)\right) \\
&= \frac{1}{\sqrt{2\pi\sigma^2 t}} \int_{-\infty}^{\ln(S/S_0)} e^{-\frac{(x - (\mu - \frac{1}{2}\sigma^2)\, t)^2}{2\sigma^2 t}}\, dx .
\end{align*}

Second, the pdf is the derivative of the above with respect to $S$. The fundamental theorem of calculus gives us derivatives of the form $\frac{d}{du}\int_a^u \varphi(x)\, dx = \varphi(u)$. The chain rule gives us, where $u = g(S)$, a derivative
\[
\frac{d}{dS}\int_a^{g(S)} \varphi(x)\, dx = \left(\frac{d}{du}\int_a^u \varphi(x)\, dx\right)\frac{du}{dS} = \varphi(u)\, g'(S) = \varphi(g(S))\, g'(S) .
\]
In the case at hand, $g(S) = \ln(S/S_0) = \ln S - \ln S_0$, so $g'(S) = 1/S$, and the pdf is
\begin{align*}
\frac{d}{dS} P(S_t \le S) &= \frac{d}{dS}\, \frac{1}{\sqrt{2\pi\sigma^2 t}} \int_{-\infty}^{\ln(S/S_0)} e^{-\frac{(x - (\mu - \frac{1}{2}\sigma^2)\, t)^2}{2\sigma^2 t}}\, dx \\
&= \frac{1}{S\sqrt{2\pi\sigma^2 t}}\, e^{-\frac{(\ln(S/S_0) - (\mu - \frac{1}{2}\sigma^2)\, t)^2}{2\sigma^2 t}}
\end{align*}
for $S > 0$, and identically $0$ for $S \le 0$.
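The lognormal law just derived is easy to cross-check: the closed-form CDF $P(S_t \le S) = N\!\big((\ln(S/S_0) - (\mu - \tfrac{1}{2}\sigma^2)t)/(\sigma\sqrt{t})\big)$ should match the fraction of simulated terminal prices below $S$. The sketch below is my own (stdlib only; $N$ is implemented via `math.erf`):

```python
import math
import random

def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(S, S0=100.0, mu=0.1, sigma=0.25, t=1.0):
    """Closed-form P(S_t <= S) for S_t = S0 exp((mu - sigma^2/2) t + sigma W_t)."""
    z = (math.log(S / S0) - (mu - 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    return Phi(z)

def monte_carlo_cdf(S, S0=100.0, mu=0.1, sigma=0.25, t=1.0, n=100000, seed=11):
    """Fraction of simulated terminal prices S_t that land at or below S."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    hits = sum(1 for _ in range(n)
               if S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) <= S)
    return hits / n
```

(The default parameters match Exercise 16(c) later in the section: $\mu = .1$, $\sigma = .25$, $S_0 = 100$; then $P(S_1 \ge 110) = 1 - P(S_1 \le 110)$.)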

Pricing Derivatives in Continuous Time

Our situation concerns a stock with price stochastic variable $S_t$, and a derivative security, assumed to have expiration $T$, and further assumed that its (gross) payoff depends only on $T$ and $S_T$. Let $V(S_T, T)$ denote this payoff. More generally, let $V(S_t, t)$ denote the payoff for this derivative if it expires at time $t$ with stock price $S_t$. In particular, $V(S_t, t)$ depends only on $t$ and $S_t$. That is, we have a deterministic function $V(s,t)$, and are plugging in a stochastic process $S_t$ and getting a stochastic process $V(S_t, t)$.

To price $V$, construct a self-financing, hedging portfolio for $V$. That is, set up a portfolio consisting of $\varphi_t$ units of stock $S_t$ and $\psi_t$ units of bond $B_t$,
\[
\Pi_t = \varphi_t S_t + \psi_t B_t ,
\]
such that the terminal value of the portfolio equals the derivative payoff,
\[
\Pi_T = V_T ,
\]
subject to the self-financing condition
\[
\text{(SFC1)}\qquad \Pi_t = \Pi_0 + \int_0^t \varphi_s\, dS_s + \psi_s\, dB_s .
\]

The pricing of the portfolio is straightforward, and the no-arbitrage assumption implies $\Pi_t = V_t$ for all $t \le T$.

Recall, also, the self-financing condition in the discrete case:
\[
(\Delta \Pi)_n = \varphi_n\, \Delta S_n + \psi_n\, \Delta B_n .
\]
The continuous limit is
\[
\text{(SFC2)}\qquad d\Pi_t = \varphi_t\, dS_t + \psi_t\, dB_t .
\]
We now model the stock and bond. The stock is $dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$, while the bond is $dB_t = r B_t\, dt$.

Theorem 4.3.9 (Black-Scholes). Given a derivative as above, there exists a self-financing, hedging portfolio $\Pi_t = \varphi_t S_t + \psi_t B_t$ with $\Pi_t = V(S_t, t)$ if and only if
\[
\text{(BSE)}\qquad \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0 ,
\]
and furthermore, $\varphi_t = \dfrac{\partial V}{\partial S}$.

Proof. Let's calculate $dV(S_t, t) = d\Pi_t$ two ways. First, by Itô's lemma,
\[
dV(S_t, t) = \frac{\partial V}{\partial t}(S_t, t)\, dt + \frac{\partial V}{\partial S}(S_t, t)\, dS_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}(S_t, t)\, (dS_t)^2 ,
\]
or, suppressing the $S_t, t$ arguments,
\begin{align*}
dV_t &= \frac{\partial V}{\partial t}\, dt + \frac{\partial V}{\partial S}\, dS_t + \frac{1}{2}\frac{\partial^2 V}{\partial S^2}\, (dS_t)^2 \\
&= \left(\frac{\partial V}{\partial t} + \mu S_t \frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S_t^2\, \frac{\partial^2 V}{\partial S^2}\right) dt + \frac{\partial V}{\partial S}\, \sigma S_t\, dW_t .
\end{align*}
On the other hand, using our stock and bond model on a portfolio, the self-financing condition (SFC2) says that $V_t = \Pi_t = \varphi_t S_t + \psi_t B_t$, so
\[
dV_t = d\Pi_t = \varphi_t\, dS_t + \psi_t\, dB_t = (\varphi_t \mu S_t + \psi_t r B_t)\, dt + \varphi_t \sigma S_t\, dW_t .
\]
If these are to be equal, the coefficients of $dt$ and of $dW_t$ must be equal. Equating the coefficients of $dW_t$ gives
\[
\varphi_t \sigma S_t = \frac{\partial V}{\partial S}\, \sigma S_t , \quad\text{hence}\quad \varphi_t = \frac{\partial V}{\partial S} ,
\]
and equating the coefficients of $dt$ gives
\[
\varphi_t \mu S_t + \psi_t r B_t = \frac{\partial V}{\partial t} + \mu S_t \frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S_t^2\, \frac{\partial^2 V}{\partial S^2} ,
\]
hence
\[
\psi_t r B_t = \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S_t^2\, \frac{\partial^2 V}{\partial S^2} .
\]
Since $\psi_t B_t = \Pi_t - \varphi_t S_t = V_t - \frac{\partial V}{\partial S} S_t$, we have
\[
r\left(V_t - \frac{\partial V}{\partial S} S_t\right) = \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S_t^2\, \frac{\partial^2 V}{\partial S^2} ,
\]
or in other words (BSE), completing the proof.

or in other works (BSE), completing the proof.

In summary, to construct a self-ﬁnancing, replicating portfolio for a deriva-

tive, you need the value of the derivative V

t

= V (S

t

, t) to satisfy the Black-

Scholes equation and for the stock holding to be φ

t

=

∂V

∂S

.

The theory of partial diﬀerential equations tells us that in order to get

unique solutions, we need boundary and temporal conditions. Since the BSE

is ﬁrst order in t, we need one time condition. And since the BSE is second

order in S, we need two boundary conditions.

The time condition is simply the expiration payoﬀ V (S

T

, T) = h(S

T

).

(Usually, one has an “initial” condition, corresponding to time 0. Here, one

has a “ﬁnal” or “terminal” condition.) The boundary conditions are usually

based on S = 0 and S → ∞.

For example, consider a call. h(S) = max(S − K, 0) is, by deﬁnition,

the payoﬀ of a call at expiry, so V (S

T

, T) = max(S

T

− K, 0) is the ﬁnal

condition. If the stock price is zero, a call is worthless, hence V (0, t) = 0 is

one boundary condition. If the stock price grows arbitrarily large, the call is

essentially worth the same as the stock, that is V (S, t) ∼ S.

The One-Dimensional Heat Equation

Let the temperature of a homogeneous one-dimensional object at position $x$, time $t$ be given by $u(x,t)$. The heat equation is
\[
\frac{\partial u}{\partial t} = k\, \frac{\partial^2 u}{\partial x^2} ,
\]
where $k$ is a constant, the thermal diffusivity of the object's material. Taking $k = \frac{1}{2}$ gives the Fokker-Planck equation for a Wiener path, with solution
\[
u(x,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{x^2}{2t}} .
\]
If we let $k = 1$, the solution is
\[
u(x,t) = \frac{1}{\sqrt{4\pi t}}\, e^{-\frac{x^2}{4t}} .
\]

What is $\lim_{t\to 0} u(x,t)$, the initial temperature distribution? To answer this, we use the following notion.

Definition. A Dirac $\delta$-function is a one-parameter family $\delta_t$, indexed by $t > 0$, of piecewise smooth functions such that

(1) $\delta_t(x) \ge 0$ for all $t, x$,

(2) $\int_{-\infty}^{\infty} \delta_t(x)\, dx = 1$ for all $t$,

(3) $\lim_{t\to 0} \delta_t(x) = 0$ for all $x \ne 0$.

Example. Consider
\[
\delta_t(x) = \begin{cases} 0 , & \text{if } x < -t \text{ or } x > t , \\[2pt] \dfrac{1}{2t} , & \text{if } -t \le x \le t . \end{cases}
\]

Important Example. The solutions to the heat equation
\[
\delta_t(x) = u(x,t) = \frac{1}{\sqrt{4\pi t}}\, e^{-\frac{x^2}{4t}}
\]
are a Dirac $\delta$-function.

Proposition. If $f$ is any bounded smooth function and $\delta_t$ any Dirac $\delta$-function, then
\[
\lim_{t\to 0} \int_{-\infty}^{\infty} \delta_t(x) f(x)\, dx = f(0) ,
\]
and more generally,
\[
\lim_{t\to 0} \int_{-\infty}^{\infty} \delta_t(x - y) f(x)\, dx = f(y) .
\]
Intuitively, this can be seen as follows. Every Dirac $\delta$-function is something like the first example, and on a small interval, continuous functions are approximately constant, so
\[
\lim_{t\to 0} \int_{-\infty}^{\infty} \delta_t(x - y) f(x)\, dx \approx \lim_{t\to 0} \int_{y-t}^{y+t} \frac{1}{2t} f(x)\, dx \approx \lim_{t\to 0} 2t \cdot \frac{1}{2t} f(y) = f(y) .
\]
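The proposition can be checked numerically for the heat-kernel family: smearing a test function against $\delta_t(x - y)$ and shrinking $t$ should recover $f(y)$. This is my own stdlib-only sketch (the function names and the midpoint-rule quadrature are assumptions, not part of the notes):

```python
import math

def delta_t(x, t):
    """Heat-kernel delta family: exp(-x^2 / (4t)) / sqrt(4 pi t)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def smear(f, y, t, lo=-10.0, hi=10.0, n=100000):
    """Midpoint-rule approximation of the integral of delta_t(x - y) f(x) dx."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += delta_t(x - y, t) * f(x) * dx
    return total
```

Property (2) of the definition corresponds to `smear` of the constant function $1$ being $1$ for every $t$; the proposition corresponds to `smear(f, y, t)` tending to `f(y)` as $t \to 0$.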

Proceeding, the heat equation
\[
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} ,
\]
with initial condition $u(x,0) = \varphi(x)$ and boundary conditions $u(\infty, t) = u(-\infty, t) = 0$, will be solved using
\[
\delta_t(x) = \frac{1}{\sqrt{4\pi t}}\, e^{-\frac{x^2}{4t}} ,
\]
which is both a Dirac $\delta$-function and a solution to the heat equation for $t > 0$.

Let
\[
v(y,t) = \frac{1}{\sqrt{4\pi t}} \int_{-\infty}^{\infty} e^{-\frac{(x-y)^2}{4t}}\, \varphi(x)\, dx \quad\text{for } t > 0 .
\]
Claim. $\lim_{t\to 0} v(y,t) = \varphi(y)$. This is just an example of the previous proposition:
\[
\lim_{t\to 0} \frac{1}{\sqrt{4\pi t}} \int_{-\infty}^{\infty} e^{-\frac{(x-y)^2}{4t}}\, \varphi(x)\, dx = \lim_{t\to 0} \int_{-\infty}^{\infty} \delta_t(x-y)\, \varphi(x)\, dx = \varphi(y) .
\]
Claim. $v$ satisfies the heat equation (in the variables $y$ and $t$):
\begin{align*}
\frac{\partial v}{\partial t} &= \frac{\partial}{\partial t} \int_{-\infty}^{\infty} \frac{1}{\sqrt{4\pi t}}\, e^{-\frac{(x-y)^2}{4t}}\, \varphi(x)\, dx
= \int_{-\infty}^{\infty} \frac{\partial}{\partial t}\!\left(\frac{1}{\sqrt{4\pi t}}\, e^{-\frac{(x-y)^2}{4t}}\right) \varphi(x)\, dx \\
&= \int_{-\infty}^{\infty} \frac{\partial^2}{\partial y^2}\!\left(\frac{1}{\sqrt{4\pi t}}\, e^{-\frac{(x-y)^2}{4t}}\right) \varphi(x)\, dx
= \frac{\partial^2}{\partial y^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{4\pi t}}\, e^{-\frac{(x-y)^2}{4t}}\, \varphi(x)\, dx = \frac{\partial^2 v}{\partial y^2} ,
\end{align*}
where the middle step uses the fact that the heat kernel itself satisfies the heat equation for $t > 0$.

Back to the Black-Scholes equation
\[
\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0 .
\]
Note that $\mu$, the expected return of $S$, does not appear explicitly in the equation. This is called risk neutrality. The particulars of the derivative are encoded in the boundary conditions. For a call $C(S,t)$ one has
\[
\frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} + r S \frac{\partial C}{\partial S} - r C = 0 ,
\]
\[
C(S, T) = \max(S - K, 0) ,
\]
\[
C(0, t) = 0 \text{ for all } t , \qquad C(S, t) \sim S \text{ as } S \to \infty .
\]
And for a put $P(S,t)$ one has
\[
\frac{\partial P}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 P}{\partial S^2} + r S \frac{\partial P}{\partial S} - r P = 0 ,
\]
\[
P(S, T) = \max(K - S, 0) ,
\]
\[
P(0, t) = K e^{-r(T-t)} \text{ for all } t , \qquad P(S, t) \to 0 \text{ as } S \to \infty .
\]

Black-Scholes Formula

The solution to the call PDE above is
\[
C(S, t) = S\, N(d_+) - K e^{-r(T-t)}\, N(d_-) ,
\]
where
\[
d_+ = \frac{\ln\!\left(\frac{S}{K}\right) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}} ,
\qquad
d_- = \frac{\ln\!\left(\frac{S}{K}\right) + \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}} ,
\]
and
\[
N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{s^2}{2}}\, ds = \frac{1}{\sqrt{2\pi}} \int_{-x}^{\infty} e^{-\frac{s^2}{2}}\, ds
\]
is the cumulative normal distribution.

If you are given data, including the market price for derivatives, and numerically solve the Black-Scholes formula for $\sigma$, the resulting value is called the implied volatility.
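The formula and the implied-volatility inversion are both short to code. The sketch below is my own (stdlib only; $N$ via `math.erf`, and the inversion by bisection, which is safe because the call value is increasing in $\sigma$):

```python
import math

def N(x):
    """Cumulative normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T, t=0.0):
    """C(S,t) = S N(d+) - K e^{-r(T-t)} N(d-), with d+- as defined above."""
    tau = T - t
    d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)   # same as the d- formula above
    return S * N(d_plus) - K * math.exp(-r * tau) * N(d_minus)

def implied_vol(price, S, K, r, T, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection for the sigma that reproduces a quoted call price."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, T) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, an at-the-money one-year call with $S = K = 100$, $r = 5\%$, $\sigma = 20\%$ is worth about $10.45$, and feeding that price back through `implied_vol` recovers $\sigma = 0.2$.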

We derive the Black-Scholes formula from the Black-Scholes equation, by using a change of variables to obtain the heat equation, whose solutions we have identified, at least in a formal way.

The first difference is that the heat equation has an initial condition at $t = 0$, while the Black-Scholes equation has a final condition at $t = T$. This can be compensated for by letting $\tau = T - t$. Then in terms of $\tau$, the final condition at $t = T$ is an initial condition at $\tau = 0$.

The second difference is that the Black-Scholes equation has $S^2 \frac{\partial^2 V}{\partial S^2}$ while the heat equation has $\frac{\partial^2 u}{\partial x^2}$. If we let $S = e^x$, or equivalently, $x = \ln S$, then
\[
\frac{\partial}{\partial x} = \frac{\partial S}{\partial x}\frac{\partial}{\partial S} = e^x \frac{\partial}{\partial S} = S \frac{\partial}{\partial S} ,
\qquad
\frac{\partial^2}{\partial x^2} = S\frac{\partial}{\partial S}\!\left(S\frac{\partial}{\partial S}\right) = S^2 \frac{\partial^2}{\partial S^2} + S \frac{\partial}{\partial S} .
\]

The third difference is that the Black-Scholes equation has $\frac{\partial V}{\partial S}$ and $V$ terms, but the heat equation has no $\frac{\partial u}{\partial x}$ or $u$ terms. An integrating factor can eliminate such terms, as follows. Suppose we are given
\[
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + a \frac{\partial u}{\partial x} + b u .
\]
Let
\[
u(x,t) = e^{\alpha x + \beta t}\, v(x,t) ,
\]
with $\alpha, \beta$ to be chosen later. Then
\begin{align*}
\frac{\partial u}{\partial t} &= \beta e^{\alpha x + \beta t} v + e^{\alpha x + \beta t} \frac{\partial v}{\partial t} , \\
\frac{\partial u}{\partial x} &= \alpha e^{\alpha x + \beta t} v + e^{\alpha x + \beta t} \frac{\partial v}{\partial x} , \\
\frac{\partial^2 u}{\partial x^2} &= \alpha^2 e^{\alpha x + \beta t} v + 2\alpha e^{\alpha x + \beta t} \frac{\partial v}{\partial x} + e^{\alpha x + \beta t} \frac{\partial^2 v}{\partial x^2} .
\end{align*}
So
\[
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + a \frac{\partial u}{\partial x} + b u
\]
becomes
\[
e^{\alpha x + \beta t}\left(\beta v + \frac{\partial v}{\partial t}\right) = e^{\alpha x + \beta t}\left(\alpha^2 v + 2\alpha \frac{\partial v}{\partial x} + \frac{\partial^2 v}{\partial x^2}\right) + a\, e^{\alpha x + \beta t}\left(\alpha v + \frac{\partial v}{\partial x}\right) + b\, e^{\alpha x + \beta t} v ,
\]
or
\[
\beta v + \frac{\partial v}{\partial t} = \alpha^2 v + 2\alpha \frac{\partial v}{\partial x} + \frac{\partial^2 v}{\partial x^2} + a\left(\alpha v + \frac{\partial v}{\partial x}\right) + b v ,
\]
hence
\[
\frac{\partial v}{\partial t} = \frac{\partial^2 v}{\partial x^2} + (a + 2\alpha)\frac{\partial v}{\partial x} + (\alpha^2 + a\alpha + b - \beta) v .
\]
With $a$ and $b$ given, one can set $a + 2\alpha = 0$ and $\alpha^2 + a\alpha + b - \beta = 0$ and solve for $\alpha$ and $\beta$. One gets $\alpha = -\frac{1}{2}a$ and $\beta = b - \alpha^2 = b - \frac{1}{4}a^2$. In other words, if $v(x,t)$ is a solution to
\[
\frac{\partial v}{\partial t} = \frac{\partial^2 v}{\partial x^2} ,
\]
then $u(x,t) = e^{-\frac{1}{2}a x + (b - \frac{1}{4}a^2) t}\, v(x,t)$ satisfies
\[
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + a \frac{\partial u}{\partial x} + b u .
\]

With these three differences accounted for, we will now solve the Black-Scholes equation for a European call. We are considering
\[
\frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} + r S \frac{\partial C}{\partial S} - r C = 0
\]
with final condition
\[
C(S, T) = \max(S - K, 0) .
\]
First make the change of variables $S = K e^x$, or equivalently, $x = \ln(S/K) = \ln S - \ln K$, and also let $\tau = \frac{1}{2}\sigma^2 (T - t)$, so $C(S, t) = K u(x, \tau)$. Then

1. $\displaystyle \frac{\partial C}{\partial t} = \frac{\partial C}{\partial \tau}\frac{d\tau}{dt} = \left(K \frac{\partial u}{\partial \tau}\right)\left(-\frac{1}{2}\sigma^2\right) = -\frac{1}{2}\sigma^2 K \frac{\partial u}{\partial \tau}$,

2. $\displaystyle \frac{\partial C}{\partial S} = \frac{\partial C}{\partial x}\frac{dx}{dS} = \left(K \frac{\partial u}{\partial x}\right)\frac{1}{S} = \frac{K}{S}\frac{\partial u}{\partial x}$,

3. $\displaystyle \frac{\partial^2 C}{\partial S^2} = \frac{\partial}{\partial S}\left(\frac{K}{S}\frac{\partial u}{\partial x}\right) = -\frac{K}{S^2}\frac{\partial u}{\partial x} + \frac{K}{S}\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right)\frac{\partial x}{\partial S} = -\frac{K}{S^2}\frac{\partial u}{\partial x} + \frac{K}{S^2}\frac{\partial^2 u}{\partial x^2}$.

Substituting this into the Black-Scholes equation, we have
\[
-\frac{1}{2}\sigma^2 K \frac{\partial u}{\partial \tau} + \frac{1}{2}\sigma^2 S^2 \left(-\frac{K}{S^2}\frac{\partial u}{\partial x} + \frac{K}{S^2}\frac{\partial^2 u}{\partial x^2}\right) + r S\left(\frac{K}{S}\frac{\partial u}{\partial x}\right) - r K u = 0
\]
or
\[
\frac{1}{2}\sigma^2 \frac{\partial u}{\partial \tau} = \frac{1}{2}\sigma^2 \frac{\partial^2 u}{\partial x^2} - \frac{1}{2}\sigma^2 \frac{\partial u}{\partial x} + r \frac{\partial u}{\partial x} - r u
\]
or
\[
\frac{\partial u}{\partial \tau} = \frac{\partial^2 u}{\partial x^2} + \left(\frac{2r}{\sigma^2} - 1\right)\frac{\partial u}{\partial x} - \frac{2r}{\sigma^2}\, u .
\]

Set $k = \frac{2r}{\sigma^2}$. The equation becomes
\[
\frac{\partial u}{\partial \tau} = \frac{\partial^2 u}{\partial x^2} + (k-1)\frac{\partial u}{\partial x} - k u .
\]
And the final condition $C(S, T) = \max(S - K, 0)$ becomes the initial condition $K u(x, 0) = \max(K e^x - K, 0)$, which becomes $u(x, 0) = \max(e^x - 1, 0)$.

As we saw, this reduces to the heat equation if we let $u(x,\tau) = e^{\alpha x + \beta \tau}\, v(x,\tau)$, with $\alpha = -\frac{1}{2}(k-1)$ and $\beta = -k - \frac{1}{4}(k-1)^2 = -\frac{1}{4}(k+1)^2$. And the initial condition $u(x, 0) = \max(e^x - 1, 0)$ becomes
\[
e^{-\frac{1}{2}(k-1)x}\, v(x, 0) = \max(e^x - 1, 0)
\]
or equivalently,
\[
v(x, 0) = \max\!\left(e^x e^{\frac{1}{2}(k-1)x} - e^{\frac{1}{2}(k-1)x} ,\, 0\right) = \max\!\left(e^{\frac{1}{2}(k+1)x} - e^{\frac{1}{2}(k-1)x} ,\, 0\right) .
\]

Recall that the heat equation
\[
\frac{\partial v}{\partial \tau} = \frac{\partial^2 v}{\partial x^2} , \qquad v(x, 0) = \varphi(x) ,
\]
is solved by
\[
v(x,\tau) = \frac{1}{\sqrt{4\pi\tau}} \int_{-\infty}^{\infty} e^{-\frac{(y-x)^2}{4\tau}}\, \varphi(y)\, dy ,
\]
which in this case has $\varphi(y) = \max\!\left(e^{\frac{1}{2}(k+1)y} - e^{\frac{1}{2}(k-1)y} ,\, 0\right)$.

Now, $e^{\frac{1}{2}(k+1)y} > e^{\frac{1}{2}(k-1)y}$ if and only if $e^{\frac{1}{2}(k+1)y} / e^{\frac{1}{2}(k-1)y} > 1$, if and only if $e^y > 1$, if and only if $y > 0$. So in our situation, we can write
\[
\varphi(y) = \begin{cases} e^{\frac{1}{2}(k+1)y} - e^{\frac{1}{2}(k-1)y} & \text{if } y > 0 , \\ 0 & \text{if } y \le 0 . \end{cases}
\]
Now make the change of variables $s = (y - x)/\sqrt{2\tau}$. Then $ds = dy/\sqrt{2\tau}$, so $dy = \sqrt{2\tau}\, ds$. Then since $\varphi(\sqrt{2\tau}\, s + x) = 0$ when $\sqrt{2\tau}\, s + x < 0$, that is, when $s < \frac{-x}{\sqrt{2\tau}}$,
\begin{align*}
v(x,\tau) &= \frac{\sqrt{2\tau}}{\sqrt{4\pi\tau}} \int_{-\infty}^{\infty} e^{-\frac{(x - (\sqrt{2\tau}\, s + x))^2}{4\tau}}\, \varphi(\sqrt{2\tau}\, s + x)\, ds \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{2\tau s^2}{4\tau}}\, \varphi(\sqrt{2\tau}\, s + x)\, ds \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{s^2}{2}}\, \varphi(\sqrt{2\tau}\, s + x)\, ds \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\frac{x}{\sqrt{2\tau}}}^{\infty} e^{-\frac{s^2}{2}} \left(e^{\frac{1}{2}(k+1)(\sqrt{2\tau}\, s + x)} - e^{\frac{1}{2}(k-1)(\sqrt{2\tau}\, s + x)}\right) ds \\
&= I_+ - I_- ,
\end{align*}

where
\[
I_\pm = \frac{1}{\sqrt{2\pi}} \int_{-\frac{x}{\sqrt{2\tau}}}^{\infty} e^{-\frac{1}{2}s^2 + \frac{1}{2}(k \pm 1)(\sqrt{2\tau}\, s + x)}\, ds .
\]
We calculate $I_+$ by completing the square inside the exponent:
\begin{align*}
I_+ &= \frac{1}{\sqrt{2\pi}} \int_{-\frac{x}{\sqrt{2\tau}}}^{\infty} e^{-\frac{1}{2}s^2 + \frac{1}{2}(k+1)(\sqrt{2\tau}\, s + x)}\, ds \\
&= \frac{1}{\sqrt{2\pi}}\, e^{\frac{1}{2}(k+1)x} \int_{-\frac{x}{\sqrt{2\tau}}}^{\infty} e^{-\frac{1}{2}s^2 + \frac{1}{2}(k+1)\sqrt{2\tau}\, s}\, ds \\
&= \frac{e^{\frac{1}{2}(k+1)x}}{\sqrt{2\pi}} \int_{-\frac{x}{\sqrt{2\tau}}}^{\infty} e^{-\frac{1}{2}\left(s - \frac{1}{2}(k+1)\sqrt{2\tau}\right)^2 + \frac{1}{4}(k+1)^2 \tau}\, ds \\
&= \frac{e^{\frac{1}{2}(k+1)x + \frac{1}{4}(k+1)^2 \tau}}{\sqrt{2\pi}} \int_{-\frac{x}{\sqrt{2\tau}}}^{\infty} e^{-\frac{1}{2}\left(s - \frac{1}{2}(k+1)\sqrt{2\tau}\right)^2}\, ds \\
&= \frac{e^{\frac{1}{2}(k+1)x + \frac{1}{4}(k+1)^2 \tau}}{\sqrt{2\pi}} \int_{-\frac{x}{\sqrt{2\tau}} - \frac{1}{2}(k+1)\sqrt{2\tau}}^{\infty} e^{-\frac{1}{2}t^2}\, dt \\
&= \frac{e^{\frac{1}{2}(k+1)x + \frac{1}{4}(k+1)^2 \tau}}{\sqrt{2\pi}} \int_{-\infty}^{\frac{x}{\sqrt{2\tau}} + \frac{1}{2}(k+1)\sqrt{2\tau}} e^{-\frac{1}{2}t^2}\, dt \\
&= e^{\frac{1}{2}(k+1)x + \frac{1}{4}(k+1)^2 \tau}\, N(d_+) = e^{x - \alpha x - \beta\tau}\, N(d_+) ,
\end{align*}
using $\alpha = -\frac{1}{2}(k-1)$ and $\beta = -\frac{1}{4}(k+1)^2$, and where $d_+ = \frac{x}{\sqrt{2\tau}} + \frac{1}{2}(k+1)\sqrt{2\tau}$.

The second to last line was arrived at since
\[
\int_a^{\infty} e^{-t^2/2}\, dt = \int_{-\infty}^{-a} e^{-t^2/2}\, dt .
\]
Now $S = K e^x$ and $2\tau = \sigma^2(T - t)$, so
\[
d_+ = \frac{x}{\sqrt{2\tau}} + \frac{1}{2}(k+1)\sqrt{2\tau}
= \frac{\ln(S/K)}{\sigma\sqrt{T-t}} + \frac{1}{2}\left(\frac{2r}{\sigma^2} + 1\right)\sigma\sqrt{T-t}
= \frac{\ln\!\left(\frac{S}{K}\right) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}} .
\]

The integral $I_-$ is treated in the same manner and gives
\[
I_- = e^{\frac{1}{2}(k-1)x + \frac{1}{4}(k-1)^2 \tau}\, N(d_-) = e^{-\alpha x - \beta\tau - k\tau}\, N(d_-) ,
\]
where $d_- = \frac{x}{\sqrt{2\tau}} + \frac{1}{2}(k-1)\sqrt{2\tau} = \frac{\ln\left(\frac{S}{K}\right) + \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}$. So with $v(x,\tau) = I_+ - I_-$ established, substituting everything back in ($x = \ln(S/K)$, $\tau = \frac{\sigma^2}{2}(T-t)$ and $k = \frac{2r}{\sigma^2}$), we have
\begin{align*}
C(S, t) = K u(x,\tau) &= K e^{\alpha x + \beta\tau}\, v(x,\tau) \\
&= K e^{\alpha x + \beta\tau}\, (I_+ - I_-) \\
&= K e^{\alpha x + \beta\tau}\, e^{x - \alpha x - \beta\tau}\, N(d_+) - K e^{\alpha x + \beta\tau}\, e^{-\alpha x - \beta\tau - k\tau}\, N(d_-) \\
&= K e^x\, N(d_+) - K e^{-k\tau}\, N(d_-) \\
&= S\, N(d_+) - K e^{-r(T-t)}\, N(d_-) .
\end{align*}
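As a check on the derivation, the closed-form call value should make the Black-Scholes equation vanish at any interior point $(S, t)$. The sketch below (my own; stdlib only, with arbitrarily chosen parameters) plugs central finite differences of the formula into the PDE:

```python
import math

def N(x):
    """Cumulative normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call(S, t, K=100.0, r=0.05, sigma=0.2, T=1.0):
    """The closed-form call value C(S,t) = S N(d+) - K e^{-r(T-t)} N(d-)."""
    tau = T - t
    dp = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return S * N(dp) - K * math.exp(-r * tau) * N(dp - sigma * math.sqrt(tau))

def bse_residual(S=105.0, t=0.3, h=1e-3, sigma=0.2, r=0.05):
    """Central finite differences of C plugged into
    C_t + (1/2) sigma^2 S^2 C_SS + r S C_S - r C; should be ~0."""
    Ct = (call(S, t + h) - call(S, t - h)) / (2 * h)
    CS = (call(S + h, t) - call(S - h, t)) / (2 * h)
    CSS = (call(S + h, t) - 2 * call(S, t) + call(S - h, t)) / (h * h)
    return Ct + 0.5 * sigma**2 * S * S * CSS + r * S * CS - r * call(S, t)
```

The residual is zero up to finite-difference truncation error, which is all one can ask of a numerical check.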

4.4 Exercises

1. Draw some graphs of the distribution of a Poisson process $X_t$ for various values of $t$ (and $\lambda$). What happens as $t$ increases?

2. Since for a Poisson process $P_{jk}(t) = P_{0,k-j}(t)$ for all positive $t$, show that if $X_0 = j$, then the random variable $X_t - j$ has a Poisson distribution with parameter $\lambda t$. Then, use the Markov assumption to prove that $X_t - X_s$ has a Poisson distribution with parameter $\lambda(t-s)$.

3. Let $T_m$ be the continuous random variable that gives the time to the $m$th jump of a Poisson process with parameter $\lambda$. Find the pdf of $T_m$.

4. For a Poisson process with parameter $\lambda$, find $P(X_s = m \mid X_t = n)$, both for the case $0 \le n \le m$ and $0 \le t \le s$, and the case $0 \le m \le n$ and $0 \le s \le t$.

5. Verify that the expected value and variance of $W_{\Delta t}$ are as claimed in the text.

6. Memorize the properties of the Wiener process, then close your notes and books and write them down.

7. Verify that the function $p(x, t)$ obtained at the end of the section satisfies the Fokker-Planck equation.

8. Calculate $E(e^{\lambda W_t})$.

9. Suppose a particle is moving along the real line following a Wiener process. At time 10 you observe that the position of the particle is 3. What is the probability that at time 12 the particle will be found in the interval $[5, 7]$? (Write down the expression and then use a computer, or a table if you must, to estimate this value.)

10. On a computer (or a calculator), find the smallest value of $n = 1, 2, 3, \ldots$ such that $P(|W_1| \ge n)$ is calculated as $0$.

11. What is $E(W_s^2 W_t)$? You must break up the cases $s < t$ and $s > t$.

The following three problems provide justification for the Itô calculus rule $(dW_s)^2 = ds$. This problem is very similar to the calculation of the Itô integral $\int W_t\, dW_t$. Recall how, for a process $\varphi_t$ adapted to $X_t$, the Itô integral $\int_{t_0}^t \varphi_s\, dX_s$ is defined. First, for a partition $\mathcal{T} = \{t_0 < t_1 < t_2 < \cdots < t_n = t\}$ we define the Itô sum
\[
I(\varphi_s, \mathcal{T}, dX_s) = \sum_{j=1}^{n} \varphi_{t_{j-1}}\, \Delta X_{t_j} ,
\]
where $\Delta X_{t_j} = X_{t_j} - X_{t_{j-1}}$. Then one forms the limit as the mesh size of the partitions goes to zero,
\[
\int_{t_0}^{t} \varphi_s\, dX_s = \lim_{\text{mesh}\, \mathcal{T} \to 0} I(\varphi_s, \mathcal{T}, dX_s) .
\]

12. Let $W_t$ be the Wiener process.

(a) Show $E\!\left((\Delta W_{t_i})^2\right) = \Delta t_i$.

(b) Show $E\!\left(\left((\Delta W_{t_i})^2 - \Delta t_i\right)^2\right) = 2(\Delta t_i)^2$.

13. For a partition $\mathcal{T}$ as above, calculate the mean square difference, which is by definition
\[
E\!\left(\left(I(\varphi_s, \mathcal{T}, (dW_s)^2) - I(\varphi_s, \mathcal{T}, dt)\right)^2\right)
= E\!\left(\left(\sum_{j=1}^n \varphi_{t_{j-1}} (\Delta W_{t_j})^2 - \sum_{j=1}^n \varphi_{t_{j-1}}\, \Delta t_j\right)^{\!2}\,\right)
\]
and show that it equals
\[
2 \sum_{j=1}^n E\!\left(\varphi_{t_{j-1}}^2\right)(\Delta t_j)^2 .
\]

14. Show that for an adapted process $\varphi_s$ such that $E(\varphi_s^2) \le M$ for some $M > 0$ and for all $s$, the limit of this expression as the mesh of $\mathcal{T}$ goes to $0$ is $0$, and thus that for any adapted $\varphi_s$,
\[
\int_{t_0}^{t} \varphi_s\, (dW_s)^2 = \int_{t_0}^{t} \varphi_s\, ds .
\]
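The two moment identities in Exercise 12 are easy to observe numerically: for $\Delta W \sim N(0, \Delta t)$, the sample averages of $(\Delta W)^2$ and of $((\Delta W)^2 - \Delta t)^2$ should sit near $\Delta t$ and $2(\Delta t)^2$. This stdlib-only sketch is my own (it is an illustration, not a solution to the exercise):

```python
import math
import random
from statistics import mean

def dW_squared_moments(dt=0.01, n=200000, seed=5):
    """Sample E[(dW)^2] and E[((dW)^2 - dt)^2] for dW ~ N(0, dt)."""
    rng = random.Random(seed)
    sq = [rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n)]
    m1 = mean(sq)                       # should be close to dt
    m2 = mean((s - dt) ** 2 for s in sq)  # should be close to 2*dt^2
    return m1, m2
```

The second moment being $O((\Delta t)^2)$ rather than $O(\Delta t)$ is exactly why the mean square difference in Exercise 13 vanishes in the mesh limit.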

15. Calculate $\int_0^t W_s^n\, dW_s$.

16. (a) Solve the stock model SDE
\[
dS_t = \mu S_t\, dt + \sigma S_t\, dW_t .
\]
(b) Find the pdf of $S_t$.
(c) If $\mu = .1$, $\sigma = .25$ and $S_0 = 100$, what is $P(S_1 \ge 110)$?

17. What is $d(\sin(W_t))$?

18. Solve the Vasicek model for interest rates,
\[
dr_t = a(b - r_t)\, dt + \sigma\, dW_t ,
\]
subject to the initial condition $r_0 = c$. (Hint: You will need to use a multiplier similar to the one for the Ornstein-Uhlenbeck process.)

19. This problem is long and has many parts. If you cannot do one part, go on to the next.

In this problem, you will show how to synthesize any derivative (whose pay-off only depends on the final value of the stock) in terms of calls with different strikes. This allows us to express the value of any derivative in terms of the Black-Scholes formula.

(a) Consider the following functions: Fix a number $h > 0$. Define
\[
\delta_h(x) = \begin{cases} 0 & \text{if } |x| \ge h , \\ (x+h)/h^2 & \text{if } x \in [-h, 0] , \\ (h-x)/h^2 & \text{if } x \in [0, h] . \end{cases} \tag{4.4.1}
\]
Show that the collection of functions $\delta_h(x)$ for $h > 0$ forms a $\delta$-function. Recall that this implies that for any continuous function $f$,
\[
f(y) = \lim_{h\to 0} \int_{-\infty}^{\infty} f(x)\, \delta_h(x - y)\, dx .
\]

(b) Consider a non-dividend paying stock $S$. Suppose European calls on $S$ with expiration $T$ are available with arbitrary strike prices $K$. Let $C(S, t, K)$ denote the value at time $t$ of the call with strike $K$. Thus, for example, $C(S, T, K) = \max(S - K, 0)$. Show
\[
\delta_h(S - K) = \frac{1}{h^2}\left(C(S, T, K+h) - 2C(S, T, K) + C(S, T, K-h)\right) .
\]

(c) Let $h(S)$ denote the payoff function of some derivative. Then we can synthesize the derivative by
\[
h(S) = \lim_{h\to 0} \int_0^{\infty} h(K)\, \delta_h(S - K)\, dK .
\]
If $V(S, t)$ denotes the value at time $t$ of the derivative with payoff function $h$ if the stock price is $S$ (at time $t$), then why is
\[
V(S, t) = \lim_{h\to 0} \int_0^{\infty} h(K)\, \frac{1}{h^2}\left(C(S, t, K+h) - 2C(S, t, K) + C(S, t, K-h)\right) dK ?
\]

(d) Using Taylor's theorem, show that
\[
\lim_{h\to 0} \frac{1}{h^2}\left(C(S, t, K+h) - 2C(S, t, K) + C(S, t, K-h)\right) = \frac{\partial^2 C(S, t, K)}{\partial K^2} .
\]

(e) Show that $V(S, t)$ is equal to
\[
h(K)\, \frac{\partial C(S, t, K)}{\partial K}\bigg|_{K=0}^{\infty} - h'(K)\, C(S, t, K)\bigg|_{K=0}^{\infty} + \int_0^{\infty} h''(K)\, C(S, t, K)\, dK .
\]
(Hint: Integrate by parts.)


(a) Probability spaces and random variables.
(b) Basic probability distributions.
(c) Expectation and variance, moments.
(d) Bivariate distributions.
(e) Conditional probability.

2. Derivatives.
(a) What is a derivative security?
(b) Types of derivatives.
(c) The basic problem: How much should I pay for an option? Fair price.
(d) Expectation pricing. Einstein and Bachelier, what they knew about derivative pricing. And what they didn't know.
(e) Arbitrage and no arbitrage. The simple case of futures. Arbitrage arguments.
(f) The arbitrage theorem.
(g) Arbitrage pricing and hedging. Arbitrage and reassigning probabilities.

3. Discrete time stochastic processes and pricing models.
(a) Binomial methods without much math.
(b) A first look at martingales.
(c) Stochastic processes, discrete in time.
(d) Conditional expectations.
(e) Random walks.
(f) Change of probabilities.
(g) Martingales.
(h) Martingale representation theorem.
(i) Pricing a derivative and hedging portfolios.
(j) Martingale approach to dynamic asset allocation.

4. Continuous time processes. Their connection to PDE.
(a) Wiener processes.
(b) Stochastic integration.
(c) Stochastic differential equations and Itô's lemma.
(d) Black-Scholes model.
(e) Derivation of the Black-Scholes Partial Differential Equation.
(f) Solving the Black-Scholes equation. Comparison with martingale method.
(g) Optimal portfolio selection.

5. Finer structure of financial time series.

or the possible outcomes of rolling two dice) or it can be a continuous set (such as an interval of the real number line for measuring temperature. that is we assume Ec ∈ Ω 7 . . etc). . if E ∈ Ω then its complement is too. A random variable is a function of the basic outcomes in a probability space. A sigma-algebra (or σ-algebra) of subsets of S. To deﬁne a probability space one needs three ingredients: 1. This means a set Ω of subsets of S (so Ω itself is a subset of the power set P(S) of all subsets of S) that contains the empty set. . for j = 1. then ∞ Ej ∈ Ω j=1 and ∞ Ej ∈ Ω j=1 In addition. This is the set of all “basic” things that can happen. A sample space. 2. is a sequence of subsets in Ω. 3. and is closed under countable intersections and countable unions. that is a set S of “outcomes” for some experiment.Chapter 1 Basic Probability The basic concept in probability theory is that of a random variable. contains S itself. 2. This set can be a discrete set (such as the set of 5-card poker hands. That is when Ej ∈ Ω.

Ai ∩ Aj = ∅ for all i diﬀerent from j). but do not think of X −1 as a function.. The function P must have the properties that: (a) P (S) = 1. In order for X −1 to be a function we would need X to be . when S is a continuous subset of the real line. These axioms imply that if Ac is the complement of A. then P (∪Ai )) = P (Ai ). . . When the basic set S is ﬁnite (or countably inﬁnite).8 CHAPTER 1. this is not possible. and one usually restricts attention to the set of subsets that can be obtained by starting with all open intervals and taking intersections and unions (countable) of them – the so-called Borel sets [or more generally. . Ω. BASIC PROBABILITY The above deﬁnition is not central to our approach of probability but it is an essential part of the measure theoretic foundations of probability theory. 2. 3. P ) is a function that to every outcome s ∈ S gives a real number X(s) ∈ R. A function P from Ω to the real numbers that assigns probabilities to events. 3. Lebesgue measurable subsets of S). P (∅) = 0. then Ω is often taken to be all subsets of S. is to bring things back to something we are more familiar with. (b) 0 ≤ P (A) ≤ 1 for every A ∈ Ω (c) If Ai .e. As a technical point (that you should probably ignore). For a subset A ⊂ R let’s deﬁne X −1 (A) to be the following subset of S: X −1 (A) = {s ∈ S|X(s) ∈ A} It may take a while to get used to what X −1 (A) means. is a countable (ﬁnite or inﬁnite) collection of disjoint sets (i. then P (Ac ) = 1 − P (A). One of the purposes of random variables. So we include the deﬁnition for completeness and in case you come across the term in the future. In fact. we will most often call them events. i = 1. General probability spaces are a bit abstract and can be hard to deal with. As mentioned above. and the principle of inclusion and exclusion: P (A ∪ B) = P (A) + P (B) − P (A ∩ B). even if A and B are not disjoint. a random variable is a function X on a probability space (S. 
Ω is the collection of sets to which we will assign probabilities and you should think of them as the possible events.

If X is a random variable on a probability space (S, Ω, P) and A = (a, b) ⊂ R is an interval, then X⁻¹(A) is just a subset of S as defined above; that is, X⁻¹(A) is the event that the outcome of X will be in A. The point of this definition is that if A ⊂ R then X⁻¹(A) ⊂ S, so we can define

  PX(A) = P(X⁻¹(A)).

(We need X⁻¹(A) ∈ Ω; this is a technical assumption on X, called measurability, and should really be part of the definition of random variable. Measurability is lurking somewhere in the background, but it involves measure theory, which we are not assuming.)

Proposition. PX is a probability function on R.

PX is called the probability density of the random variable X. So we have in some sense transferred the probability function to the real line. In practice, it is the probability density that we deal with, and mention of (S, Ω, P) is omitted.

1.0.1 Discrete Random Variables

These take on only isolated (discrete) values, such as when you are counting something. Usually, but not always, the values of a discrete random variable X are (a subset of) the integers. We can assign a probability to any subset of the sample space as soon as we know the probability of any set containing one element, i.e., P({X = k}) for all k. It will be useful to write

  p(k) = P({X = k})

and to set p(k) = 0 for all numbers not in the sample space. p(k) is called the probability density function (or pdf for short) of X. We repeat: the value p(k) represents the probability that the event {X = k} occurs. So any function from the integers to the (real) interval [0, 1] that has the property that

  ∑_{k=−∞}^{∞} p(k) = 1

defines a discrete probability distribution.

1.0.2 Finite Discrete Random Variables

1. Uniform distribution – This is often encountered: coin flips, rolls of a single die, and other games of chance. S is the set of whole numbers from a to b (this is a set with b − a + 1 elements!), Ω is the set of all subsets of S, and P is defined by giving its values on all sets consisting of one element each (since then the rule for disjoint unions takes over to calculate the probability on other sets).

"Uniform" means that the same value is assigned to each one-element set. Since P(S) = 1, the value that must be assigned to each one-element set is 1/(b − a + 1). For example, the possible outcomes of rolling one die are {1}, {2}, {3}, {4}, {5} and {6}. Each of these outcomes has the same probability, namely 1/6. We can express this by making a table, or by specifying a function f(k) = 1/6 for all k = 1, 2, 3, 4, 5, 6 and f(k) = 0 otherwise. Using the disjoint union rule, we find for example that P({1, 3, 5}) = 1/2 and P({2, 3}) = 1/3.

2. Binomial distribution – Flip n fair coins; how many come up heads? I.e., what is the probability that k of them come up heads? Or do a sample in a population that favors the Democrat over the Republican 60 percent to 40 percent: what is the probability that in a sample of size 100, more than 45 will favor the Republican? The sample space S is 0, 1, 2, ..., n since these are the possible outcomes (number of heads, number of people favoring the Republican [n = 100 in this case]). As before, the sigma algebra Ω is the set of all subsets of S. The function P is more interesting this time:

  P({k}) = (n choose k) / 2^n

where

  (n choose k) = n! / (k!(n − k)!)

is the binomial coefficient, which equals the number of subsets of an n-element set that have exactly k elements.

1.0.3 Infinite Discrete Random Variables

3. Poisson distribution (with parameter α) – This arises as the number of (random) events of some kind (such as people lining up at a bank, or Geiger-counter clicks, or telephone calls arriving) per unit time. The sample space S is the set of all nonnegative integers S = 0, 1, 2, 3, ..., and again Ω is the set of all subsets of S.
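The binomial probabilities just described are easy to tabulate. The following Python sketch (an illustration, not part of the original notes) checks that P({k}) = (n choose k)/2^n really sums to 1 over the sample space:

```python
from math import comb

def binom_pmf(n, k):
    # P({k}) = (n choose k) / 2^n: probability of exactly k heads in n fair flips
    return comb(n, k) / 2 ** n

# the probabilities over the sample space {0, 1, ..., n} must sum to 1
total = sum(binom_pmf(10, k) for k in range(11))
```

The choice n = 10 is arbitrary; any n works, since the binomial coefficients in row n of Pascal's triangle sum to 2^n.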

The probability function on Ω is derived from

  P({k}) = e^(−α) α^k / k!

Note that this is an honest probability function, since we will have

  P(S) = P(∪_{k≥0} {k}) = ∑_{k=0}^{∞} e^(−α) α^k / k! = e^(−α) ∑_{k=0}^{∞} α^k / k! = e^(−α) e^α = 1

1.0.4 Continuous Random Variables

A continuous random variable can take on any real values, such as when you are measuring something. Usually the values are (a subset of) the reals, and for technical reasons we can only assign a probability to certain subsets of the sample space (but there are a lot of them). These subsets, either the collection of Borel sets (sets that can be obtained by taking countable unions and intersections of intervals) or the Lebesgue-measurable sets (Borel sets plus some other exotic sets), comprise the set Ω. Recall that in the discrete case, the probabilities were determined once we knew the probabilities of individual outcomes; that is, P was determined by p(k). However, in the continuous case, the probability of a single outcome is always 0: P({x}) = 0. Instead, we can calculate continuous probabilities as integrals of "probability density functions", so-called pdf's. As soon as we know the probability of any interval, P([a, b]) for all a and b, we can calculate the probability of any Borel set. Thus it is enough to know the probabilities of "very small" intervals of the form [x, x + dx]. We think of dx as a very small number and take the limit as it goes to 0: the pdf is

  p(x) = lim_{dx→0} P([x, x + dx]) / dx

so that

  P([a, b]) = ∫_a^b p(x) dx    (1.0.1)

This is really the defining equation of the continuous pdf and I cannot stress too strongly how much you need to use this. Any function p(x) that takes on only positive values (they don't have to be between 0 and 1, though), and whose integral over the whole sample space (we can use the whole real line if we assign the value p(x) = 0 for points x outside the sample space) is equal to 1, can be a pdf.
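Returning for a moment to the Poisson distribution above, the identity P(S) = 1 can be verified numerically by truncating the series; a quick sketch (not from the notes; the value α = 2.5 is an arbitrary choice):

```python
from math import exp, factorial

alpha = 2.5
# sum e^{-alpha} alpha^k / k! over k = 0, 1, 2, ...; the tail beyond
# k = 100 is negligibly small for an alpha of this size
total = sum(exp(-alpha) * alpha ** k / factorial(k) for k in range(100))
```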

In this case, we have (for small dx) that p(x)dx represents (approximately) the probability of the set (interval) [x, x + dx] (with error that goes to zero faster than dx does). More generally, we have the probability of the set (interval) [a, b]:

  P([a, b]) = ∫_a^b p(x) dx

So any nonnegative function on the real numbers that has the property that

  ∫_{−∞}^{∞} p(x) dx = 1

defines a continuous probability distribution.

Uniform distribution

As with the discrete uniform distribution, the variable takes on values in some interval [a, b] and all values are equally likely. In other words, all small intervals [x, x + dx] are equally likely as long as dx is fixed and only x varies. That means that p(x) should be a constant for x between a and b, and zero outside the interval [a, b]. What constant? Well, to have the integral of p(x) come out to be 1, we need the constant to be 1/(b − a). It is easy to calculate that if a < r < s < b, then

  P([r, s]) = (s − r)/(b − a)

The Normal Distribution (or Gaussian distribution)

This is the most important probability distribution, because the distribution of the average of the results of repeated experiments always approaches a normal distribution (this is the "central limit theorem"). The sample space for the normal distribution is always the entire real line. But to begin, we need to calculate an integral:

  I = ∫_{−∞}^{∞} e^(−x²) dx

We will use a trick that goes back (at least) to Liouville.

Note that

  I² = (∫_{−∞}^{∞} e^(−x²) dx) · (∫_{−∞}^{∞} e^(−x²) dx) = (∫_{−∞}^{∞} e^(−x²) dx) · (∫_{−∞}^{∞} e^(−y²) dy) = ∫∫ e^(−x²−y²) dx dy

because we can certainly change the name of the variable in the second integral, and then we can convert the product of single integrals into a double integral. Now (the critical step), we'll evaluate the integral in polar coordinates (!!) – note that over the whole plane, r goes from 0 to infinity as θ goes from 0 to 2π, and dx dy becomes r dr dθ:

  I² = ∫_0^{2π} ∫_0^{∞} e^(−r²) r dr dθ = ∫_0^{2π} [−(1/2) e^(−r²)]_0^{∞} dθ = ∫_0^{2π} (1/2) dθ = π

Therefore, I = √π. We need to arrange things so that the integral is 1, and for reasons that will become apparent later, we arrange this as follows: define

  N(x) = (1/√(2π)) e^(−x²/2)

Then N(x) defines a probability distribution, called the standard normal distribution. More generally, we define the normal distribution with parameters µ and σ to be

  p(x) = (1/σ) N((x − µ)/σ) = (1/(√(2π) σ)) e^(−(x−µ)²/(2σ²))

Exponential distribution (with parameter α)

This arises when measuring waiting times until an event, or time-to-failure in reliability studies. For this distribution, the sample space is the positive part of the real line [0, ∞) (or we can just let p(x) = 0 for x < 0). The probability density function is given by p(x) = α e^(−αx). It is easy to check that the integral of p(x) from 0 to infinity is equal to 1, so p(x) defines a bona fide probability density function.
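The polar-coordinates evaluation of I above can be sanity-checked numerically. This sketch (not part of the notes) uses a midpoint Riemann sum and recovers I ≈ √π:

```python
from math import exp, pi, sqrt

# midpoint Riemann sum for I = integral of e^{-x^2} over [-10, 10];
# the tails beyond +-10 contribute less than e^{-100}
n, a, b = 200000, -10.0, 10.0
h = (b - a) / n
I = h * sum(exp(-((a + (i + 0.5) * h) ** 2)) for i in range(n))
# I agrees with sqrt(pi) to many decimal places
```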

1.1 Expectation of a Random Variable

The expectation of a random variable is essentially the average value it is expected to take on. It is calculated as the weighted average of the possible outcomes of the random variable, where the weights are just the probabilities of the outcomes. As a trivial example, consider the (discrete) random variable X whose sample space is the set {1, 2, 3} with probability function given by p(1) = 0.3, p(2) = 0.1 and p(3) = 0.6. If we repeated this experiment 100 times, we would expect to get about 30 occurrences of X = 1, 10 of X = 2 and 60 of X = 3. The average X would then be ((30)(1) + (10)(2) + (60)(3))/100 = 2.3. In other words, (1)(0.3) + (2)(0.1) + (3)(0.6). This reasoning leads to the defining formula:

  E(X) = ∑_{x∈S} x p(x)

for any discrete random variable. The notation E(X) for the expectation of X is standard; also in use is the notation ⟨X⟩. For continuous random variables, the situation is similar, except that the sum is replaced by an integral (think of summing up the average values of x by dividing the sample space into small intervals [x, x + dx] and calculating the probability p(x)dx that X falls into the interval). By reasoning similar to the previous paragraph, the expectation should be

  E(X) = lim_{dx→0} ∑ x p(x) dx = ∫_S x p(x) dx.

This is the formula for the expectation of a continuous random variable.
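The trivial example above (E(X) = 2.3) in code, as a sketch:

```python
# the toy discrete random variable with sample space {1, 2, 3}
p = {1: 0.3, 2: 0.1, 3: 0.6}

# E(X) = sum of x * p(x) over the sample space
E = sum(x * px for x, px in p.items())  # approximately 2.3
```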

Examples:

1. Uniform discrete: We'll need to use the formula for ∑ k that you learned in freshman calculus when you evaluated Riemann sums:

  E(X) = ∑_{k=a}^{b} k · 1/(b − a + 1)
       = 1/(b − a + 1) · (∑_{k=0}^{b} k − ∑_{k=0}^{a−1} k)
       = 1/(b − a + 1) · (b(b + 1)/2 − (a − 1)a/2)
       = (1/2)(b² + b − a² + a)/(b − a + 1)
       = (1/2)(b − a + 1)(b + a)/(b − a + 1)
       = (b + a)/2

which is what we should have expected from the uniform distribution.

2. Uniform continuous: (We expect to get (b + a)/2 again, right?) This is easier:

  E(X) = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (b + a)/2

3. Poisson distribution with parameter α: Before we do this, recall the Taylor series formula for the exponential function:

  e^x = ∑_{k=0}^{∞} x^k / k!

Note that we can take the derivative of both sides to get the formula:

  e^x = ∑_{k=0}^{∞} k x^(k−1) / k!

If we multiply both sides of this formula by x we get

  x e^x = ∑_{k=0}^{∞} k x^k / k!

We will use this formula with x replaced by α. If X is a discrete random variable with a Poisson distribution, then its expectation is

  E(X) = ∑_{k=0}^{∞} k e^(−α) α^k / k! = e^(−α) ∑_{k=0}^{∞} k α^k / k! = e^(−α) (α e^α) = α.

4. By the symmetry of the respective distributions around their "centers", it is pretty easy to conclude that the expectation of the binomial distribution (with parameter n) is n/2, and the expectation of the normal distribution (with parameters µ and σ) is µ.

5. Exponential distribution with parameter α: This is a little like the Poisson calculation (with improper integrals instead of series), and we will have to integrate by parts (we'll use u = x so du = dx, and dv = e^(−αx) dx so that v will be −(1/α) e^(−αx)):

  E(X) = ∫_0^{∞} x α e^(−αx) dx = α [−(x/α) e^(−αx) − (1/α²) e^(−αx)]_0^{∞} = 1/α.

Note the difference between the expectations of the Poisson and exponential distributions!!

6. Geometric random variable (with parameter r): It is defined on the sample space {0, 1, 2, 3, ...} of non-negative integers and, given a fixed value of r between 0 and 1, has probability function given by

  p(k) = (1 − r) r^k

The geometric series formula

  ∑_{k=0}^{∞} a r^k = a/(1 − r)

shows that p(k) is a probability function. We can compute the expectation of a random variable X having a geometric distribution with parameter r as follows (the trick is reminiscent of what we used on the Poisson distribution):

  E(X) = ∑_{k=0}^{∞} k (1 − r) r^k = (1 − r) r ∑_{k=0}^{∞} k r^(k−1) = (1 − r) r (d/dr) ∑_{k=0}^{∞} r^k = (1 − r) r (d/dr) (1/(1 − r)) = r/(1 − r)
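The closed form r/(1 − r) for the geometric expectation can be checked against a truncated version of the defining series; a quick sketch (not in the notes; the value r = 0.3 is arbitrary):

```python
r = 0.3
# E(X) = sum over k of k (1 - r) r^k; the terms decay geometrically,
# so truncating at k = 500 loses essentially nothing
E_series = sum(k * (1 - r) * r ** k for k in range(500))
E_closed = r / (1 - r)
```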

7. Another distribution defined on the positive integers {1, 2, 3, ...} is given by the function p(k) = 6/(π² k²). This is a probability function based on the famous formula due to Euler:

  ∑_{k=1}^{∞} 1/k² = π²/6

But an interesting thing about this distribution concerns its expectation: because the harmonic series diverges to infinity, we have that

  E(X) = (6/π²) ∑_{k=1}^{∞} k/k² = (6/π²) ∑_{k=1}^{∞} 1/k = ∞

So the expectation of this random variable is infinite. (Can you interpret this in terms of an experiment whose outcome is a random variable with this distribution?)

If X is a random variable (discrete or continuous), it is possible to define related random variables by taking various functions of X, say X² or sin(X) or whatever. If Y = f(X) for some function f, then the probability of Y being in some set A is defined to be the probability of X being in the set f⁻¹(A). As an example, consider an exponentially distributed random variable X with parameter α = 1. Let Y = X². We can calculate the probability density function p(y) of Y by recalling that the probability that Y is in the interval [0, y] (i.e., P([0, y])) is the integral of p(y) from 0 to y. In other words, p(y) is the derivative of the function h(y), where h(y) = the probability that Y is in the interval [0, y]. But since X can only be positive, h(y) is the same as the probability that X is in the interval [0, √y] (actually (−∞, √y], but X is never negative). More generally, the probability that Y is in the interval [a, b] is the same as the probability that X is in the interval [√a, √b]. We calculate:

  p(y) = (d/dy) ∫_0^{√y} e^(−x) dx = (d/dy) [−e^(−x)]_0^{√y} = (d/dy) (1 − e^(−√y)) = (1/(2√y)) e^(−√y)

There are two ways to calculate the expectation of Y. The first is obvious: we can integrate y p(y). The other is to make the change of variables y = x² in this integral, which will yield (Check this!) that the expectation of Y = X² is

  E(Y) = E(X²) = ∫_0^{∞} x² e^(−x) dx
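The density just derived for Y = X² can be checked by differentiating the cumulative probability numerically; a sketch (assuming α = 1, as above):

```python
from math import exp, sqrt

def cdf(y):
    # P(Y <= y) = P(X <= sqrt(y)) = 1 - e^{-sqrt(y)} for exponential X, alpha = 1
    return 1 - exp(-sqrt(y))

def pdf(y):
    # the derived density e^{-sqrt(y)} / (2 sqrt(y))
    return exp(-sqrt(y)) / (2 * sqrt(y))

# a central difference of the cdf should agree with the derived pdf
y, h = 2.0, 1e-6
numeric = (cdf(y + h) - cdf(y - h)) / (2 * h)
```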

Theorem (Law of the unconscious statistician). If f(x) is any function, then the expectation of the function f(X) of the random variable X is

  E(f(X)) = ∫ f(x) p(x) dx   or   ∑_x f(x) p(x)

where p(x) is the probability density function of X if X is continuous, or the probability function of X if X is discrete.

1.2 Variance and higher moments

We are now led more or less naturally to a discussion of the "moments" of a random variable. The rth moment of X is defined to be the expected value of X^r. In particular, the first moment of X is its expectation. If s > r, then having a finite sth moment is a more restrictive condition than having an rth one (this is a convergence issue as x approaches infinity, since x^s > x^r for large values of x). A more useful set of moments is called the set of central moments. These are defined to be the rth moments of the variable X − E(X). In particular, the second moment of X − E(X) is called the variance of X and is denoted σ²(X). (Its square root is the standard deviation.) It is a crude measure of the extent to which the distribution of X is spread out from its expectation value. It is a useful exercise to work out that

  σ²(X) = E((X − E(X))²) = E(X²) − (E(X))²

As an example, we compute the variance of the uniform and exponential distributions:

1. Uniform discrete: If X has a discrete uniform distribution on the interval [a, b], then recall that E(X) = (b + a)/2. We calculate Var(X) as follows:

  Var(X) = E(X²) − ((b + a)/2)² = ∑_{k=a}^{b} k²/(b − a + 1) − ((b + a)/2)² = (1/12)(b − a + 2)(b − a).
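The uniform discrete variance formula above can be verified directly on a small example; a sketch (the interval [3, 9] is an arbitrary choice):

```python
a, b = 3, 9
n = b - a + 1
mean = sum(range(a, b + 1)) / n                    # (b + a)/2 = 6.0
# variance computed straight from the definition E((X - E(X))^2)
var_direct = sum((k - mean) ** 2 for k in range(a, b + 1)) / n
# the closed form (1/12)(b - a + 2)(b - a)
var_formula = (b - a) * (b - a + 2) / 12
```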

2. Uniform continuous: If X has a continuous uniform distribution on the interval [a, b], then its variance is calculated as follows:

  σ²(X) = ∫_a^b x²/(b − a) dx − ((b + a)/2)² = (b − a)²/12.

3. Exponential with parameter α: If X is exponentially distributed with parameter α, recall that E(X) = 1/α. Thus:

  Var(X) = ∫_0^{∞} x² α e^(−αx) dx − 1/α² = 1/α²

The variance decreases as α increases; this agrees with intuition gained from the graphs of exponential distributions shown last time.

1.3 Bivariate distributions

It is often the case that a random experiment yields several measurements (we can think of the collection of measurements as a random vector) – one simple example would be the numbers on the top of two thrown dice. When there are several numbers like this, it is common to consider each as a random variable in its own right, and to form the joint probability density (or joint probability) function p(x, y, z, ...). For example, in the case of two dice, X and Y are discrete random variables each with sample space S = {1, 2, 3, 4, 5, 6}, and p(x, y) = 1/36 for each (x, y) in the cartesian product S × S. More generally, we can consider discrete probability functions p(x, y) on sets of the form S × T, where X ranges over S and Y over T. Any function p(x, y) such that p(x, y) is between 0 and 1 for all (x, y) and such that

  ∑_{x∈S} ∑_{y∈T} p(x, y) = 1

defines a (joint) probability function on S × T. For continuous functions, the idea is the same: p(x, y) is the joint pdf of X and Y if p(x, y) is non-negative and satisfies

  ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1

We will use the following example of a joint pdf throughout the rest of this section: X and Y will be random variables that can take on values (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2). The joint pdf of X and Y will be given by p(x, y) = 1/(2x) if (x, y) is in the triangle and 0 otherwise. To see that this is a probability density function, we need to integrate p(x, y) over the triangle and get 1:

  ∫_0^2 ∫_0^x 1/(2x) dy dx = ∫_0^2 [y/(2x)]_{y=0}^{y=x} dx = ∫_0^2 1/2 dx = 1.

For discrete variables, we let p(i, j) be the probability that X = i and Y = j, i.e., P(X = i ∩ Y = j). This gives a function p, the joint probability function, of X and Y that is defined on (some subset of) the set of pairs of integers and such that 0 ≤ p(i, j) ≤ 1 for all i and j and

  ∑_{i=−∞}^{∞} ∑_{j=−∞}^{∞} p(i, j) = 1

When we find it convenient to do so, we will set p(i, j) = 0 for all i and j outside the domain we are considering. For continuous variables, we define the joint probability density function p(x, y) on (some subset of) the plane of pairs of real numbers. We interpret the function as follows: p(x, y) dx dy is (approximately) the probability that X is between x and x + dx and Y is between y and y + dy (with error that goes to zero faster than dx and dy as they both go to zero). Thus, p(x, y) must be a non-negative valued function with the property that

  ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1

As with discrete variables, if our random variables always lie in some subset of the plane, we will define p(x, y) to be 0 for all (x, y) outside that subset. We take one simple example of each kind of random variable. For the discrete random variable, we consider the roll of a pair of dice. We assume that we can tell the dice apart, so there are thirty-six possible outcomes and each is equally likely. Thus our joint probability function will be p(i, j) = 1/36 if 1 ≤ i ≤ 6 and 1 ≤ j ≤ 6, and p(i, j) = 0 otherwise.
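A crude numeric double integral (a sketch, not from the notes) confirms that the triangle density above integrates to 1; the grid is coarse, so the sum only comes close to 1:

```python
# midpoint-rule double integral of 1/(2x) over the triangle 0 < y < x < 2
n = 800
h = 2.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        if y < x:  # keep only grid cells whose midpoint lies inside the triangle
            total += 1.0 / (2.0 * x) * h * h
# total is within about a percent of the exact value 1
```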

For our continuous example, we take the example mentioned at the end of the last lecture: p(x, y) = 1/(2x) for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise. We checked last time that this is a probability density function (its integral is 1).

1.4 Marginal distributions

Often when confronted with the joint probability of two random variables, we wish to restrict our attention to the value of just one or the other. We can calculate the probability distribution of each variable separately in a straightforward way, if we simply remember how to interpret probability functions. These separated probability distributions are called the marginal distributions of the respective individual random variables. Given the joint probability function p(i, j) of the discrete variables X and Y, we will show how to calculate the marginal distributions pX(i) of X and pY(j) of Y. To calculate pX(i), we recall that pX(i) is the probability that X = i. It is certainly equal to the probability that X = i and Y = 0, or X = i and Y = 1, or ... In other words, the event X = i is the union of the events "X = i and Y = j" as j runs over all possible values. Since these events are disjoint, the probability of their union is the sum of the probabilities of the events (namely, the sum of p(i, j)). Thus:

  pX(i) = ∑_{j=−∞}^{∞} p(i, j)

Likewise,

  pY(j) = ∑_{i=−∞}^{∞} p(i, j)

Make sure you understand the reasoning behind these two formulas! An example of the use of this formula is provided by the roll of two dice discussed above. Each of the 36 possible rolls has probability 1/36 of occurring, so we have the probability function p(i, j) as indicated in the following table:

  i\j     1     2     3     4     5     6   | pX(i)
  1     1/36  1/36  1/36  1/36  1/36  1/36 |  1/6
  2     1/36  1/36  1/36  1/36  1/36  1/36 |  1/6
  3     1/36  1/36  1/36  1/36  1/36  1/36 |  1/6
  4     1/36  1/36  1/36  1/36  1/36  1/36 |  1/6
  5     1/36  1/36  1/36  1/36  1/36  1/36 |  1/6
  6     1/36  1/36  1/36  1/36  1/36  1/36 |  1/6
  pY(j)  1/6   1/6   1/6   1/6   1/6   1/6 |

The marginal probability distributions are given in the last column and last row of the table, and are obtained either by common sense or by adding across the rows (resp. down the columns). They are the probabilities for the outcomes of the first (resp. second) of the dice.

For continuous random variables, the situation is similar. Given the joint probability density function p(x, y) of a bivariate distribution of the two random variables X and Y (where p(x, y) is positive on the actual sample space subset of the plane, and zero outside it), we wish to calculate the marginal probability density functions of X and Y. To do this, recall that pX(x) dx is (approximately) the probability that X is between x and x + dx. So to calculate this probability, we should sum all the probabilities that both X is in [x, x + dx] and Y is in [y, y + dy] over all possible values of Y. In the limit as dy approaches zero, this becomes an integral:

  pX(x) dx = (∫_{−∞}^{∞} p(x, y) dy) dx

In other words,

  pX(x) = ∫_{−∞}^{∞} p(x, y) dy

Similarly,

  pY(y) = ∫_{−∞}^{∞} p(x, y) dx

Again, you should make sure you understand the intuition and the reasoning behind these important formulas.
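The row and column sums in the table can be carried out mechanically; a sketch using exact fractions (not part of the notes):

```python
from fractions import Fraction

# joint probability function for a pair of fair dice
p = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

# marginals: sum over the other variable
pX = {i: sum(p[i, j] for j in range(1, 7)) for i in range(1, 7)}
pY = {j: sum(p[i, j] for i in range(1, 7)) for j in range(1, 7)}
# every marginal probability comes out to 1/6
```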

We return to our example: p(x, y) = 1/(2x) for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise, and we compute its marginal density functions. The easy one is pX(x) so we do that one first. Note that for a given value of x between 0 and 2, y ranges from 0 to x inside the triangle:

  pX(x) = ∫_{−∞}^{∞} p(x, y) dy = ∫_0^x 1/(2x) dy = [y/(2x)]_{y=0}^{y=x} = 1/2

if 0 ≤ x ≤ 2, and pX(x) = 0 otherwise. This indicates that the values of X are uniformly distributed over the interval from 0 to 2 (this agrees with the intuition that the random points occur with greater density toward the left side of the triangle but there is more area on the right side to balance this out). To calculate pY(y), we begin with the observation that for each value of y between 0 and 2, x ranges from y to 2 inside the triangle:

  pY(y) = ∫_{−∞}^{∞} p(x, y) dx = ∫_y^2 1/(2x) dx = [(1/2) ln(x)]_{x=y}^{x=2} = (1/2)(ln(2) − ln(y))

if 0 ≤ y ≤ 2, and pY(y) = 0 otherwise. Note that pY(y) approaches infinity as y approaches 0 from above, and pY(y) approaches 0 as y approaches 2. You should check that this function is actually a probability density function on the interval [0, 2], i.e., that its integral is 1.

1.5 Functions of two random variables

Frequently, it is necessary to calculate the probability (density) function of a function of two random variables, given the joint probability (density) function. By far, the most common such function is the sum of two random variables, but the idea of the calculation applies in principle to any function of two (or more!) random variables. The principle we will follow for discrete random variables is as follows: to calculate the probability function for F(X, Y), we consider the events {F(X, Y) = f} for each value of f that can result from evaluating F at points of the sample space of (X, Y).

to recover the pdf of F . Y ) < f } (by integrating the . we recall that to calculate the probability of the event F < f .y)=f } p(x. Y ) will be to calculate the probabilities of the events {F (X. but it is surprisingly useful when combined with a little insight (and cleverness). As an example. Since the outcome of each of the dice is a number between 1 and 6. y) This seems like a pretty weak principle. we can diﬀerentiate the resulting function: d f pF (g) dg pF (f ) = df −∞ (this is simply the ﬁrst fundamental theorem of calculus). Since there are only countably many points in the sample space. we calculate the distribution of the sum of the two dice. To enunciate it.f −1) pF (f ) = x=−∞ p(x. but more powerful as well. the random variable F that results is discrete.f −6) 1 1 = (6 − |f − 7|) 36 36 A table of the probabilities of various sums is as follows: f 2 3 4 5 6 7 8 9 10 11 12 pF (f ) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36 The ”tent-shaped” distribution that results is typical of the sum of (independent) uniformly distributed random variables. the outcome of the sum must be a number between 2 and 12. Y ) = f } for each value of f that can result from evaluating F at points of the sample space of (X.24 CHAPTER 1. f − x) = max(1. Our principle for calculating the pdf of a function of two random variables F (X.y)∈S|F (x. Y ). our principle will be a little more complicated. For continuous distributions. Then the probability function pF (f ) is pF (f ) = {(x. we integrate the pdf of F from −∞ to f : f P (F < f ) = −∞ pF (g) dg Conversely. BASIC PROBABILITY {F (X. So for each f between 2 and 12: ∞ min(6.

We apply this principle to calculate the pdf of the sum of the random variables X and Y in our example: p(x, y) = 1/(2x) for (x, y) in the triangle T with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise. Let Z = X + Y. To calculate the probability P(Z < z), we must integrate the joint pdf p(x, y) over the region of the plane where Z < z; for any fixed number z, this region is the half plane below and to the left of the line y = z − x. So we need to integrate 1/(2x) over the intersection of this half-plane and the triangle T. Of course, for z ≤ 0, we get zero, since the half plane x + y < z has no points in common with the triangle where the pdf is supported. Likewise, since both X and Y are always between 0 and 2, the biggest the sum can be is 4; therefore P(Z < z) = 1 for all z ≥ 4. For z between 0 and 4, the shape of the intersection is different depending upon whether z is greater than or less than 2.

If 0 ≤ z ≤ 2, the intersection is a triangle with vertices at the points (0, 0), (z/2, z/2) and (z, 0). In this case, it is easier to integrate first with respect to x and then with respect to y, and we can calculate:

  P(Z < z) = ∫_0^{z/2} ∫_y^{z−y} 1/(2x) dx dy = ∫_0^{z/2} (1/2) [ln(x)]_{x=y}^{x=z−y} dy
           = (1/2) ∫_0^{z/2} (ln(z − y) − ln(y)) dy = (1/2) [z − (z − y) ln(z − y) − y ln(y)]_{y=0}^{y=z/2}
           = (1/2) z ln(2)

And since the (cumulative) probability that Z < z is (1/2) z ln(2) for 0 < z < 2, the pdf over this range is

  pZ(z) = ln(2)/2.

The calculation of the pdf for 2 ≤ z ≤ 4 is somewhat trickier because the intersection of the half-plane x + y < z and the triangle T is more complicated. The intersection in this case is a quadrilateral with vertices at the points (0, 0), (2, 0), (2, z − 2) and (z/2, z/2). We could calculate P(Z < z) by integrating p(x, y) over this quadrilateral. But we will be a little more clever: note that the quadrilateral is the "difference" of two sets.

It consists of the points inside the (large) triangle with vertices (0, 0), (z/2, z/2) and (z, 0) that are to the left of the line x = 2 (and note that we have already computed the integral of 1/(2x) over this large triangle to be (1/2) z ln(2)). In other words, it is the points inside this large triangle that are NOT inside the triangle with vertices (2, 0), (z, 0) and (2, z − 2). Thus, for 2 ≤ z ≤ 4, we can calculate P(Z < z) as

  P(Z < z) = (1/2) z ln(2) − ∫_2^z ∫_0^{z−x} 1/(2x) dy dx
           = (1/2) z ln(2) − ∫_2^z [y/(2x)]_{y=0}^{y=z−x} dx
           = (1/2) z ln(2) − ∫_2^z (z − x)/(2x) dx
           = (1/2) z ln(2) − (1/2) [z ln(x) − x]_{x=2}^{x=z}
           = (1/2) z ln(2) − (1/2) (z ln(z) − z − z ln(2) + 2)
           = z ln(2) − (1/2) z ln(z) + (1/2) z − 1

To get the pdf for 2 < z < 4, we need only differentiate this quantity, to get

  pZ(z) = (d/dz) (z ln(2) − (1/2) z ln(z) + (1/2) z − 1) = ln(2) − (1/2) ln(z)

Now we have the pdf of Z = X + Y for all values of z: it is pZ(z) = ln(2)/2 for 0 < z < 2, it is pZ(z) = ln(2) − (1/2) ln(z) for 2 < z < 4, and it is 0 otherwise. It would be good practice to check that the integral of pZ(z) is 1.

The following fact is extremely useful.

Proposition. If X and Y are two random variables, then E(X + Y) = E(X) + E(Y).

1.6 Conditional Probability

In our study of stochastic processes, we will often be presented with situations where we have some knowledge that will affect the probability of whether some event will occur. For example, in the roll of two dice, suppose we already know that the sum will be greater than 7. This changes the probabilities from those that we computed above.
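Before moving on, the suggested check that the density pZ derived above integrates to 1 can be done numerically; a sketch (not from the notes):

```python
from math import log

def pZ(z):
    # the piecewise density of Z = X + Y derived above
    if 0 < z < 2:
        return log(2) / 2
    if 2 <= z < 4:
        return log(2) - 0.5 * log(z)
    return 0.0

# midpoint Riemann sum over the support [0, 4]
n = 200000
h = 4.0 / n
total = h * sum(pZ((i + 0.5) * h) for i in range(n))
# total is 1 to high accuracy
```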

The event F ≥ 8 has probability (5 + 4 + 3 + 2 + 1)/36 = 15/36. So we are restricted to less than half of the original sample space. We might wish to calculate the probability of getting a 9 under these conditions. The quantity we wish to calculate is denoted P(F = 9|F ≥ 8), read "the probability that F = 9 given that F ≥ 8". In general, to calculate P(A|B) for two events A and B (it is not necessary that A is a subset of B), we need only realize that we need to compute the fraction of the time that B is true in which it is also the case that A is true. In symbols, we have

  P(A|B) = P(A ∩ B)/P(B)

For our dice example (noting that the event F = 9 is a subset of the event F ≥ 8), we get

  P(F = 9|F ≥ 8) = P(F = 9)/P(F ≥ 8) = (4/36)/(15/36) = 4/15.

As another example (with continuous probability this time), we calculate for our "1/(2x) on the triangle" example the conditional probabilities P(X > 1|Y > 1) as well as P(Y > 1|X > 1) (just to show that the probabilities of A given B and B given A are usually different). First P(X > 1|Y > 1). This one is easy! Note that in the triangle with vertices (0, 0), (2, 0) and (2, 2) it is true that Y > 1 implies that X > 1. In other words, the event Y > 1 is a subset of the event X > 1. Therefore the events (Y > 1) ∩ (X > 1) and Y > 1 are the same, so the fraction we need to compute will have the same numerator and denominator. Thus P(X > 1|Y > 1) = 1. For P(Y > 1|X > 1) we actually need to compute something:

  P(Y > 1|X > 1) = P(Y > 1)/P(X > 1) = (∫_1^2 ∫_1^x 1/(2x) dy dx)/(∫_1^2 ∫_0^x 1/(2x) dy dx) = ((1/2)(1 − ln(2)))/(1/2) = 1 − ln(2)

Two events A and B are called independent if the probability of A given B is the same as the probability of A (with no knowledge of B) and vice versa. The assumption of independence of certain events is essential to many probabilistic arguments. Independence of two events A and B is expressed by the equations

  P(A|B) = P(A),   P(B|A) = P(B)
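The dice computation above (P(F = 9|F ≥ 8) = 4/15) can be confirmed by counting outcomes; a sketch:

```python
from fractions import Fraction

rolls = [(i, j) for i in range(1, 7) for j in range(1, 7)]
B = [r for r in rolls if r[0] + r[1] >= 8]       # the event F >= 8 (15 rolls)
A_and_B = [r for r in B if r[0] + r[1] == 9]     # F = 9, a subset of F >= 8 (4 rolls)
cond = Fraction(len(A_and_B), len(B))            # P(F = 9 | F >= 8) = 4/15
```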

and especially

  P(A ∩ B) = P(A) · P(B)

Two random variables X and Y are independent if the probability that a < X < b remains unaffected by knowledge of the value of Y, and vice versa. This reduces to the fact that the joint probability (or probability density) function of X and Y "splits" as a product

  p(x, y) = pX(x) pY(y)

of the marginal probabilities (or probability densities).

Proposition. 1. If X and Y are two independent random variables, then σ²(X + Y) = σ²(X) + σ²(Y). 2. If X is a random variable, then σ²(aX) = a² σ²(X) for any real number a.

This formula is a straightforward consequence of the definition of independence and is left as an exercise.

1.7 Law of large numbers

The point of probability theory is to organize the random world. For many centuries, people have noticed that, though certain measurements or processes seem random, one can give a quantitative account of this randomness. Probability gives a mathematical framework for understanding statements like "the chance of hitting 13 on a roulette wheel is 1/38". This belief about the way the random world works is embodied in the empirical law of averages. This law is not a mathematical theorem. It states that if a random experiment is performed repeatedly under identical and independent conditions, then the proportion of trials that result in a given outcome converges to a limit as the number of trials goes to infinity. We now state for future reference the strong law of large numbers. This theorem is the main rigorous result giving credence to the empirical law of averages.

Theorem (The strong law of large numbers). Let X1, X2, ... be a sequence of independent random variables, all with the same distribution, whose expectation is µ. Then

  P( lim_{n→∞} (X1 + ... + Xn)/n = µ ) = 1
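The strong law can be watched in action by simulation; a sketch (the seed and sample size are arbitrary choices):

```python
import random

random.seed(0)
n = 200000
# average of n fair die rolls; each roll has expectation 3.5
avg = sum(random.randint(1, 6) for _ in range(n)) / n
# avg lands very close to 3.5, as the strong law predicts
```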

Exercises

1. Make your own example of a probability space that is finite and discrete. Calculate the expectation of the underlying random variable X.

2. Make your own example of a probability space that is infinite and discrete. Calculate the expectation of the underlying random variable X.

3. Make your own example of a continuous random variable. Calculate its expectation.

4. If S is a finite set with n elements, how many elements are in P(S), the power set of S? (Hint: try it for the first few values of n, and don't forget to count the empty set.)

5. Prove the principle of inclusion and exclusion: p(A ∪ B) = p(A) + p(B) − p(A ∩ B), and p(A^c) = 1 − p(A).

6. Prove that the normal distribution function

  φ(x, µ, σ) = (1/(√(2π) σ)) e^(−(x−µ)²/(2σ²))

is really a probability density function on the real line (i.e., that it is positive and that its integral from −∞ to ∞ is 1). Calculate the expectation of this random variable.

7. What is the relationship among the following partial derivatives? ∂φ/∂σ, ∂²φ/∂x², and ∂φ/∂t, where t = σ² (for the last one, rewrite φ as a function of x, µ and t).

8. Consider the experiment of picking a point at random from a uniform distribution on the disk of radius R centered at the origin in the plane ("uniform" here means if two regions of the disk have the same area, then the random point is equally likely to be in either one). Calculate the probability density function and the expectation of the random variable D, defined to be the distance of the random point from the origin.

9. Calculate the variance of the binomial, Poisson and normal distributions. The answer for the normal distribution is σ².

10. Fill in the details of the calculation of the variance of the uniform and exponential distributions.

11. Let X be a random variable with the standard normal distribution. Find the mean and variance of each of the following: |X|, X² and e^{tX}.

12. Prove that σ²(X) = E((X − E(X))²) = E(X²) − (E(X))².

13. Let X be any random variable with finite second moment. Consider the function f(a) defined as follows: f(a) = E((X − a)²). Show that the minimum value of f occurs when a = E(X).

14. Let p(x, y) be the uniform joint probability density on the unit disk, i.e., p(x, y) = 1/π if x² + y² < 1 and p(x, y) = 0 otherwise. Find the pdf of X.

15. Suppose X and Y are independent random variables, each distributed according to the exponential distribution with parameter α. Find the joint pdf of X and Y (easy). Find the pdf of X + Y. Also find the mean and variance of X + Y.

16. Let X be the sine of an angle in radians chosen uniformly from the interval (−π/2, π/2). Calculate the pdf of X, its mean and its variance.

17. Show that for a function f : S → R and A1, A2, A3, . . . , a collection of subsets of the real numbers,

f⁻¹(∪_{j=1}^∞ Aj) = ∪_{j=1}^∞ f⁻¹(Aj).

Use this to prove that if f is a random variable on a probability space (S, Ω, P), then its distribution Pf as defined above is also a probability.

18. For each of the examples of random variables you gave in problems 1, 2 and 3, calculate the expectation and the variance, if they exist.

Chapter 2

Aspects of finance and derivatives

The point of this book is to develop some mathematical techniques that can be used in finance to model movements of assets and then to use this to solve some interesting problems that occur in concrete practice. Two of the problems that we will encounter are asset allocation and derivative valuation. The first, asset allocation, asks how we should allocate our resources in various assets to achieve our goals. These goals are usually phrased in terms of how much risk we are willing to acquire, how much income we want to derive from the assets, and what level of return we are looking for. The second, derivative valuation, asks us to calculate an intrinsic value of a derivative security (for example, an option on a stock). Though it might seem more esoteric, the second question is actually more basic than the first (at least in our approach to the problem). Thus we will attack derivative valuation first and then apply it to asset allocation.

2.1 What is a derivative security?

What is a derivative security? By this we mean any financial instrument whose value depends on (i.e., is derived from) an underlying security. The underlying in theory could be practically anything: a stock, a stock index, a bond, or much more exotic things. Our most fundamental example is an option on a stock. An options contract gives the right but not the obligation to the purchaser of the contract to buy (or sell) a specified (in our case) stock at a specified price at a specified future time.

For example, I could buy a (call) option to buy 100 shares of IBM at $90 a share on January 19. Currently this contract is selling for $19 1/2. The seller of this option earns the $19 1/2 (the price, or premium, of the options contract) and incurs the obligation to sell me 100 shares of IBM stock on January 19 for $90 if I should so choose. Of course, when January 19 rolls around, I would exercise my option only if the price of IBM is more than $90.

A similar notion is that of a futures contract. Another well known and simpler example is a forward. A forward contract is an agreement/contract between two parties that gives the right and the obligation to one of the parties to purchase (or sell) a specified commodity (or financial instrument) at a specified price at a specific future time. No money changes hands at the outset of the agreement. For example, I could buy a June forward contract for 50,000 gallons of frozen orange juice concentrate for $.25/gallon. This would obligate me to take delivery of the frozen orange juice concentrate in June at that price. The difference between a forward and a futures contract is that futures are generally traded on exchanges and require other technical aspects, such as margin accounts.

2.2 Securities

The two most common types of securities are an ownership stake or a debt stake in some entity, for example a company or a government. You become a part owner in a company by buying shares of the company's stock. A debt security is usually acquired by buying a company's or a government's bonds. A debt security like a bond is a loan to a company in exchange for interest income and the promise to repay the loan at a future maturity date.

Securities can either be sold in a market (an exchange) or over the counter. When sold over the counter, the price is arrived at through negotiation by the parties involved. When sold in an exchange, the price of the security is subject to market forces. Securities sold on an exchange are generally handled by market makers or specialists. These are traders whose job it is to make sure that the security can always be bought or sold at some price; thus they provide liquidity for the security involved. Market makers must quote bid prices (the amount they are willing to buy the security for) and ask prices (the amount they are willing to sell the security for).

Of course the ask price is usually greater than the bid, and this spread is one of the main ways market makers make their money. In the future, we will usually assume that the bid and ask prices are the same. This is one of many idealizations that we will make as we analyze derivatives. The spot price is the amount that the security is currently selling for.

Some terminology: if one owns an asset, one is said to have a long position in that asset. Thus if I own a share of IBM, I am long one share of IBM. The opposite position is called short. Thus if I owe a share of IBM, I am short one share of IBM. Being long an option means owning it; being short an option means that you are the writer of the option. One can have long or short positions in any asset, in particular in options.

Short selling is a basic financial maneuver. To short sell, one borrows the stock (from your broker, for example) and sells it in the market. In the future, one must return the stock (together with any dividend income that the stock earned). Short selling allows negative amounts of a stock in a portfolio and is essential to our ability to value derivatives.

2.3 Types of derivatives

A European call option (on a stock) is a contract that gives the buyer the right but not the obligation to buy a specified stock for a specified price K, the strike price, at a specified date, T, the expiration date, from the seller of the option. If the owner of the option can exercise the option any time up to T, then it is called an American call. A put is just the opposite: it gives the owner of the contract the right to sell. It also comes in the European and American varieties. These basic call and put options are sold on exchanges and their prices are determined by market forces.

There are also many types of exotic options. Asian options: these are options allowing one to buy a share of stock for the average value of the stock over the lifetime of the option. Different versions of these are based on calculating the average in different ways. An interesting feature of these is that their payoff depends not only on the value of the stock at expiration, but on the entire history of the stock's movements. We call this feature path dependence. Barrier options: these are options that either go into effect or out of effect if the stock value passes some threshold.

For example, a down and out option is an option that expires worthless if the barrier X is reached from above before expiration. There are also down and in, up and out, and up and in varieties, and you are encouraged to think about what these mean. These are all path dependent. Lookback options: options whose payoff depends on the maximum or minimum value of the stock over the lifetime of the option.

2.4 The basic problem: How much should I pay for an option?

Fair value

Our problem is to determine the value of an option. Thus our basic question: how much should I pay for an option? We make a formal distinction between value and price: the price is what is actually charged, while the value is what the thing is actually worth.

When a developer wants to figure out how much to charge for a new house, say, he calculates his costs: land, material and labor, cost of financing, etc. This is the house's value. The developer, wanting to make a profit, adds to this a markup; the total we call the price.

A basic notion of modern finance is that of risk. Contrary to common parlance, risk is not only a bad thing; risk is what one gets rewarded for in financial investments. Quantifying risk, and the level of reward that goes along with a given level of risk, is a major activity in both theoretical and practical circles in finance. It is essential that every individual or corporation be able to select and manipulate the level of risk for a given transaction. This redistribution of risk towards those who are willing and able to assume it is what modern financial markets are about. Options can be used to reduce or increase risk. Markets for options and other derivatives are important in the sense that agents who anticipate future revenues or payments can ensure a profit above a certain level, or insure themselves against a loss above a certain level. It is pretty clear that a prerequisite for efficient management of risk is that these derivative products are correctly valued, and priced.

In 1997, the Nobel Prize in economics was awarded to Robert Merton and Myron Scholes, who, along with Fischer Black, wrote down a solution to this problem. (Black died in his mid-fifties in 1995 and was therefore not awarded the Nobel prize.) Black and Scholes wrote down their formula, the Black-Scholes formula, in 1973.

They had a difficult time getting their paper published. However, within a year of its publication, Hewlett Packard had a calculator with this formula hardwired into it. Today this formula is an indispensable tool for thousands of traders and investors who use it to value stock options in markets throughout the world. Their method is quite general, can be used and modified to handle many kinds of exotic options as well as to value insurance contracts, and even has applications outside of finance.

2.5 Expectation pricing. Einstein and Bachelier, what they knew about derivative pricing. And what they didn't know

As opposed to the case of pricing a house, pricing an option seems more like pricing bets in a casino. To value a bet, the first thing is to figure out its fair value. The amount the casino charges for a given bet is based on models for the payoffs of the bets, and such models are of course statistical/probabilistic: the payoff of the experiment is random. As an example, suppose we play a (rather silly) game of tossing a die, where the casino pays the face value on the upside of the die in dollars. How much should the casino charge to play this game? Assume that the die is fair, i.e., P(X = i) = 1/6 for i = 1, 2, . . . , 6, where X is the face value, and that each roll of the die doesn't affect (that is, is independent of) any other roll. The law of large numbers says that if there are n plays of the game, the average of the face values on the die will be, with probability close to 1, near

E(X) = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = 21/6.

Thus after n plays of the game, the casino feels reasonably confident that it will be paying off something close to $(21/6) · n. So $21/6 is the fair value per game. The casino, desiring to make a profit, sets the actual cost to play at $4: not too much higher than the fair value, or no one would want to play, and not lower, or they would lose their money too fast. The casino of course is figuring that people will play this game many, many times. This is an example of expectation pricing: one models the payoff of the bet as a random variable and uses the strong law of large numbers to argue that the fair value is the expectation of the random variable.

This works much of the time; sometimes it doesn't. There are a number of reasons why the actual value might not be the expectation price. In order for the law of large numbers to kick in, the experiment has to be performed many times, in independent trials. If you are offered the chance to play a game only once, the expectation might not be a very good measure of its value. (What's the fair value to play Russian roulette?) As we will see below, there are also more subtle and interesting reasons why the fair value is not enforced in the market.

Thinking back to stocks and options, what would be the obvious way to use expectation pricing to value an option? The first order of business would be to model the movement of the underlying stock. The French mathematician Louis Bachelier, in his doctoral dissertation around 1900, was one of the earliest to attempt to model stock movements. He did so by what we now call a random walk (discrete version) or a Wiener process (continuous version). This actually predated the use of these processes to model Brownian motion (for which Einstein got his Nobel Prize). Unfortunately, Bachelier's thesis advisor, Jacques Hadamard, did not have such high regard for Bachelier's work.

2.5.1 Payoffs of derivatives

We will have a lot to say in the future about models for stock movements; unfortunately, the best we can do here as well is to give a probabilistic model. But for now, let's see how we might use this to value an option. Let St denote the value of the stock at time t. Let us consider a European call option C to buy a share of XYZ stock with strike price K at time T. The one time we know the value of the option is at expiration. At expiration there are two possibilities. If the value of the stock at expiration is less than the strike price, ST ≤ K, we don't exercise our option and it expires worthless. If ST > K, then it makes sense to exercise the option, paying K for the stock and selling it on the market for ST, the whole transaction netting $(ST − K). Thus, at expiration the option is worth

max(ST − K, 0).

This is what we call the payoff of the derivative: it is what the derivative pays at expiration, which in this case only depends on the value of the stock at expiration. On your own, calculate the payoff of all the derivatives defined in section 2.3.
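For concreteness, here is one sketch of such payoff calculations in Python. The averaging, lookback, and barrier conventions below are just one choice among the several variants described above, and the price path is hypothetical:

```python
def call_payoff(path, K):
    """European call: max(S_T - K, 0), depends only on the final value."""
    return max(path[-1] - K, 0.0)

def put_payoff(path, K):
    """European put: max(K - S_T, 0)."""
    return max(K - path[-1], 0.0)

def asian_call_payoff(path, K):
    """One Asian variant: arithmetic average of the path in place of S_T."""
    return max(sum(path) / len(path) - K, 0.0)

def lookback_max(path):
    """One lookback variant: pays the maximum value over the option's life."""
    return max(path)

def down_and_out_call(path, K, X):
    """Barrier option: a call that expires worthless if the stock ever
    falls to the barrier X before expiration."""
    if min(path) <= X:
        return 0.0
    return call_payoff(path, K)

path = [100, 104, 98, 107, 103]  # hypothetical price history, S_T = 103
print(call_payoff(path, 100))        # 3
print(put_payoff(path, 100))         # 0.0
print(asian_call_payoff(path, 100))  # average is 102.4
print(down_and_out_call(path, 100, 99))  # barrier 99 was breached at 98
```

Note how the call and put look only at `path[-1]`, while the Asian, lookback, and barrier payoffs need the whole path: that is the path dependence discussed above.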

Assuming a probabilistic model for St, we would arrive at a random variable ST describing in a probabilistic way the value of XYZ at time T. Thus the value of the option at expiration would be the random variable max(ST − K, 0), and expectation pricing would say that the fair price should be E(max(ST − K, 0)). (I am assuming interest rates are 0 here.) Bachelier attempted to value options along these lines. Unfortunately, it turns out that expectation pricing (naively understood, at least) does not work: there is another wrinkle to option pricing, and it's called arbitrage.

2.6 The simple case of forwards. Arbitrage arguments

I will illustrate the failure of expectation pricing with the example of a forward contract. This time I will also keep track of the time value of money. The time value of money is expressed by interest rates, and bonds are the market instruments that express the time value of money. A zero-coupon bond is a promise of a specific amount of money (the face or par value) at some specified time in the future. So if a bond promises to pay $BT at time T, then its current value Bt can be expressed as BT discounted by some interest rate r, that is, Bt = exp(−r(T − t))BT. Equivalently, if one has $B at time t, then at time T > t it will be worth $exp(r(T − t))B. Most options are rather short lived, 3 months to a year or so, and over such periods the interest rates do not vary so much; so for the meanwhile we will assume that the interest rate r is constant.

Now suppose that we enter into a forward contract with another person to buy one share of XYZ stock at time T, the expiration date. That is, at time T, I must buy a share of XYZ for $F from him, and he must sell it to me for $F. No money changes hands at time t, the start of the contract. We want to calculate what the forward price $F should be. At time T, the contract is worth $(ST − F), so adjusting for the time value of money, the contract should at time t be worth $exp(−r(T − t))E(ST − F). Since the contract costs nothing to set up, this should be zero; if it were not zero, then by the law of large numbers it would lead to profits or losses over the long run. So expectation pricing says F = E(ST). We will now see that the market in fact enforces a different price.
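Jumping ahead, the arbitrage argument below forces the forward price to be F = exp(r(T − t))St rather than E(ST). A small numerical sketch (all numbers hypothetical):

```python
import math

def forward_price(spot, r, tau):
    """No-arbitrage forward price F = exp(r * tau) * S_t."""
    return math.exp(r * tau) * spot

S_t, r, tau = 100.0, 0.05, 0.5   # hypothetical: S_t = $100, r = 5%, T - t = 1/2 year
F = forward_price(S_t, r, tau)
print(round(F, 2))  # 102.53

# If someone quotes a smaller forward price K < F, the side that shorts the
# stock, lends the proceeds, and buys forward at K locks in a risk free
# profit of exp(r*tau)*S_t - K at time T.
K = 101.0
riskless_profit = math.exp(r * tau) * S_t - K
print(round(riskless_profit, 2))  # positive, so K = 101 admits an arbitrage
```

Notice that no probabilistic model of ST appears anywhere in this calculation; only the current price and the interest rate matter.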

Consider the following strategy for the party who must deliver the stock at time T. If the stock is now, at time t, worth $St, he borrows the money at the risk free interest rate r, buys the stock now, and holds the stock until time T. At time T he can just hand over the stock, collect $F, and repay the loan, which has grown to $exp(r(T − t))St. So he starts with a contract costing no money to set up, and in the end has $(F − exp(r(T − t))St), risk free. If this is positive, then we have an arbitrage opportunity: a risk free profit. No matter how small the profit is on any single transaction, one can parlay this into unlimited profits by performing the transaction in large multiples. In practice, what will happen if this is positive? Someone else will come along and offer the same contract but with a slightly smaller F, still making a positive risk free profit; and as long as $(F − exp(r(T − t))St) > 0, someone can come along and undercut him and still make unlimited money. So we arrive at

$(F − exp(r(T − t))St) ≤ 0.

If $(F − exp(r(T − t))St) < 0, you can work it from the other side. The party to the contract who must buy the stock for $F at time T can at time t short the stock (i.e., borrow it) and sell it in the market for $St, putting the money in the risk free instrument. At time T she buys the stock for $F, returns the stock, and now has $(exp(r(T − t))St − F), which is positive. Again, at time t the contract costs nothing to set up, so this is an arbitrage opportunity. A similar argument as in the paragraph above shows that $(exp(r(T − t))St − F) ≤ 0. Therefore

F = exp(r(T − t))St.

This is an example of the basic arbitrage argument. So we see something very interesting: the forward price has nothing to do with the expected value of the stock at time T. It only has to do with the value of the stock at the beginning of the contract. The price arrived at by expectation pricing, F = E(ST), would typically be much higher than the price arrived at by the arbitrage argument.

Let's give another example of an arbitrage argument. This is also a basic result expressing the relationship between the value of a call and of a put.

Proposition 2.6.1 (Put-Call parity). If St denotes the value of a stock XYZ at time t, and C and P denote the values of a call and a put on XYZ with the same strike price K and expiration T, then for any intermediate time t < T,

P + St = C + K exp(−r(T − t)).

Proof: By now, you should have calculated that the payoff on a put is max(K − ST, 0). Now form the portfolio P + S − C: that is, long a share of stock, long a put, and short a call.

Then at time T we calculate:

Case 1: ST ≤ K. Then P + S − C = max(K − ST, 0) + ST − max(ST − K, 0) = (K − ST) + ST − 0 = K.

Case 2: ST > K. Then P + S − C = max(K − ST, 0) + ST − max(ST − K, 0) = 0 + ST − (ST − K) = K.

So the payoff at time T from this portfolio is K no matter what: regardless of what ST is, the portfolio is worth K at time T. That is, it's risk free. What other financial entity has this property? A zero-coupon bond with par value K. Hence the value of this portfolio at time t is K exp(−r(T − t)), and we arrive at P + St = C + K exp(−r(T − t)). Q.E.D.

2.7 Arbitrage and no-arbitrage

Already we have made arguments based on the fact that it is natural for competing players to undercut prices as long as they are guaranteed a positive risk free profit, and are thus able to generate large profits by carrying out multiple transactions. This is formalized by the no-arbitrage hypothesis. (We formalize this even more in the next section.)

Definition 2.7.1. An arbitrage is a portfolio that guarantees an (instantaneous) risk free profit.

It is important to realize that a zero-coupon bond is not an arbitrage. It is risk free to be sure, but the growth of the risk free bond is just an expression of the time value of money. Thus the word "instantaneous" in the definition is equivalent to saying that the portfolio is risk free and grows faster than the risk-free interest rate.

The no-arbitrage hypothesis is the assumption that arbitrage opportunities do not exist. What is the sense of this? Of course arbitrage opportunities exist: there exist people called arbitrageurs whose job it is to find these opportunities. (And some of them make a lot of money doing it.) But Malkiel and others, invoking the efficient market hypothesis, would insist that in actual fact arbitrage opportunities don't really exist; a good discussion of this is in Malkiel's book. But even if you don't buy this, the no-arbitrage hypothesis seems to be a reasonable assumption (especially in view of the arguments we made above) as a starting point.
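As a concrete illustration of a no-arbitrage relation, the portfolio P + S − C from the proof of put-call parity can be checked case by case (a sketch with hypothetical numbers):

```python
import math

def parity_portfolio_at_T(S_T, K):
    """Payoff at expiration of the proof's portfolio P + S - C."""
    put = max(K - S_T, 0.0)
    call = max(S_T - K, 0.0)
    return put + S_T - call

K = 90.0
for S_T in (70.0, 90.0, 130.0):           # covers both cases of the proof
    print(parity_portfolio_at_T(S_T, K))  # 90.0 every time

# A certain payoff of K is a zero-coupon bond with par value K, so the
# portfolio's time-t value is K * exp(-r*(T - t)); e.g. r = 5%, T - t = 1/4:
print(round(K * math.exp(-0.05 * 0.25), 2))
```

Whatever the terminal stock price, the portfolio pays exactly K, which is why its present value must equal that of a zero-coupon bond.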

Also, if one sees derivatives mispriced, it might mean the existence of arbitrage opportunities. So you have a dichotomy: either you are right (about the no-arbitrage hypothesis), in which case your evaluation of derivative prices is better founded, or you can make money (having found an arbitrage opportunity). In fact, the existence of arbitrageurs makes the no-arbitrage assumption more reasonable: the existence of many people seeking out arbitrage opportunities means that when such an opportunity arises, it closes all the more quickly. On publicly traded stocks the existence of arbitrage is increasingly rare. So as a basic principle in valuing derivatives, no-arbitrage is a good starting point.

2.8 The arbitrage theorem

In this section we establish the basic connection between finance and mathematics. Our first goal is to figure out what arbitrage means mathematically, and for this we model financial markets in the following way. Consider N possible securities, labeled {1, 2, . . . , N}. These may be stocks, bonds, commodities, options, etc. Suppose that now, at time 0, these securities are selling at S1, S2, S3, . . . , SN units of account (say dollars, if you need closure). Let S denote the column vector whose components are the Sj. (Note that this notation is different from the notation in the rest of the notes, in that the subscript does not denote time but indexes the different securities.)

At time 1, the random nature of financial markets is represented by the possible states of the world, which are labeled 1, 2, 3, . . . , M. That is, from time 0 to time 1 the world has changed from its current state (reflected in the securities prices S) to one of M possible states. Each security, say i, has a payoff d_{i,j} in state j. This payoff reflects any change in the security's price, any dividends that are paid, and any other change to the overall worth of the security. We can organize these payoffs into an N × M matrix

D = ( d_{1,1}  d_{1,2}  . . .  d_{1,M} )
    ( d_{2,1}  d_{2,2}  . . .  d_{2,M} )
    (  . . .                   . . .   )      (2.8.1)
    ( d_{N,1}  d_{N,2}  . . .  d_{N,M} )

Our portfolio is represented by an N × 1 column matrix

θ = ( θ1 )
    ( θ2 )
    ( .. )      (2.8.2)
    ( θN )

where θi represents the number of copies of security i in the portfolio. This can be a negative number, indicating short selling. The market value of the portfolio is θ · S = θ^T S. The payoff of the portfolio can be represented by the M × 1 matrix

D^T θ = ( Σ_{j=1}^N d_{j,1} θj )
        ( Σ_{j=1}^N d_{j,2} θj )
        (        . . .         )      (2.8.3)
        ( Σ_{j=1}^N d_{j,M} θj )

whose ith row, which we write as (D^T θ)_i, represents the payoff of the portfolio in state i. The portfolio is riskless if the payoff of the portfolio is independent of the state of the world, i.e., (D^T θ)_i = (D^T θ)_j for any states of the world i and j.

Let R^n_+ = {(x1, . . . , xn) | xi ≥ 0 for all i} and R^n_{++} = {(x1, . . . , xn) | xi > 0 for all i}. We write v ≥ 0 if v ∈ R^n_+ and v > 0 if v ∈ R^n_{++}.

Definition 2.8.1. An arbitrage is a portfolio θ such that either

1. θ · S ≤ 0 and D^T θ > 0, or

2. θ · S < 0 and D^T θ ≥ 0.

These conditions express that the portfolio goes from being worthless (respectively, less than worthless) to being worth something (respectively, at least worth nothing). In common parlance, an arbitrage is an (instantaneous) risk free profit: a free lunch.

Definition 2.8.2. A state price vector is an M × 1 column vector ψ such that ψ > 0 and S = Dψ.

Theorem 2.8.1 (Arbitrage theorem). There are no arbitrage portfolios if and only if there is a state price vector.

Before we prove this theorem, we need an abstract theorem from linear algebra. Let's recall that a subset C ⊂ Rn is called convex if the line between any two points of C is completely contained in C. That is, for any two points x, y ∈ C and any t ∈ [0, 1] one has x + t(y − x) ∈ C.

Theorem 2.8.2 (The separating hyperplane theorem). Let C be a compact convex subset of Rn and V ⊂ Rn a vector subspace such that V ∩ C = ∅. Then there exists a point ξ0 ∈ Rn such that ξ0 · c > 0 for all c ∈ C and ξ0 · v = 0 for all v ∈ V.

First we need a lemma.

Lemma 2.8.3. Let C be a closed and convex subset of Rn not containing 0. Then there exists ξ0 ∈ C such that ξ0 · c ≥ ||ξ0||² for all c ∈ C.

Proof: Let r > 0 be such that Br(0) ∩ C ≠ ∅. Here Br(x) denotes all those points a distance r or less from x. Let ξ0 ∈ C ∩ Br(0) be an element which minimizes the distance from the origin among elements of C ∩ Br(0). That is,

||ξ0|| ≤ ||c||      (2.8.4)

for all c ∈ Br(0) ∩ C.

Such a ξ0 exists by the compactness of C ∩ Br(0). Then it follows that (2.8.4) holds for all c ∈ C, since if c ∉ C ∩ Br(0), then ||ξ0|| ≤ r ≤ ||c||.

Now since C is convex, it follows that for any c ∈ C we have ξ0 + t(c − ξ0) ∈ C, so that

||ξ0||² ≤ ||ξ0 + t(c − ξ0)||² = ||ξ0||² + 2t ξ0 · (c − ξ0) + t²||c − ξ0||².

Hence 2||ξ0||² − t||c − ξ0||² ≤ 2 ξ0 · c for all t ∈ (0, 1]. Letting t → 0, we get ξ0 · c ≥ ||ξ0||². QED.

Proof (of the separating hyperplane theorem): Let C be compact and convex, and V ⊂ Rn as in the statement of the theorem. Then consider

C − V = {c − v | c ∈ C, v ∈ V}.


This set is closed and convex. (Here is where you need C to be compact.) Since C ∩ V = ∅, the set C − V doesn't contain 0. Hence by the lemma there exists ξ0 ∈ C − V such that ξ0 · x ≥ ||ξ0||² for all x ∈ C − V. But this implies that for all c ∈ C and v ∈ V, ξ0 · c − ξ0 · v ≥ ||ξ0||² > 0, that is, ξ0 · c > ξ0 · v for all c ∈ C and v ∈ V. Now note that since V is a linear subspace, for any v ∈ V we have av ∈ V for all a ∈ R. Hence ξ0 · c > ξ0 · (av) = a(ξ0 · v) for all a. But this can't happen unless ξ0 · v = 0. So we have ξ0 · v = 0 for all v ∈ V, and also ξ0 · c > 0 for all c ∈ C. QED.

Proof (of the arbitrage theorem): Suppose S = Dψ with ψ > 0. Let θ be a portfolio with S · θ ≤ 0. Then

0 ≥ S · θ = Dψ · θ = ψ · D^T θ,

and since ψ > 0, this implies that we cannot have D^T θ > 0. The same argument handles the case S · θ < 0, so there are no arbitrage portfolios.

To prove the opposite implication, let

V = {(−S · θ, D^T θ) | θ ∈ R^N} ⊂ R^{M+1}.

Now consider C = R_+ × R^M_+. C is closed and convex, and by definition there exists an arbitrage exactly when there is a non-zero point of intersection of V and C. Now consider

C1 = {(x0, x1, . . . , xM) ∈ C | x0 + · · · + xM = 1}.

C1 is compact and convex, and if there is no arbitrage, V ∩ C1 = ∅. Hence there exists, according to (2.8.2), an x ∈ R^{M+1} with

1. x · v = 0 for all v ∈ V, and

2. x · c > 0 for all c ∈ C1.

Then writing x = (x0, x1, . . . , xM), we have that x · v = 0 implies that for all θ ∈ R^N, (−S · θ)x0 + D^T θ · (x1, . . . , xM)^T = 0. So D^T θ · (x1, . . . , xM)^T = x0 S · θ, or θ · D(x1, . . . , xM)^T = x0 S · θ. (Taking c = (1, 0, . . . , 0) ∈ C1 in condition 2 shows that x0 > 0.) Hence letting

ψ = (x1, . . . , xM)^T / x0,

we have θ · Dψ = S · θ for all θ.


Thus Dψ = S, and ψ is a state price vector. Q.E.D.

Let's consider the following example: a market consisting of two securities 1, 2, and two states 1, 2. Suppose that S1 = 1 and S2 = 1, and

D = (  1   1 )
    ( 1/2  2 )      (2.8.5)

So security 1 pays off 1 regardless of state, and security 2 pays off 1/2 in state 1 and 2 in state 2. Hence if

θ = ( θ1 )
    ( θ2 )      (2.8.6)

is a portfolio, its payoff will be

D^T θ = ( θ1 + θ2/2 )
        ( θ1 + 2θ2  )      (2.8.7)

Is there a ψ with S = Dψ? One can find this ψ easily by matrix inversion: solving

(  1   1 ) ( ψ1 )   ( 1 )
( 1/2  2 ) ( ψ2 ) = ( 1 )      (2.8.8)

gives

( ψ1 )   (  4/3  −2/3 ) ( 1 )   ( 2/3 )
( ψ2 ) = ( −1/3   2/3 ) ( 1 ) = ( 1/3 )      (2.8.9)

Now we address two natural and important questions. What is the meaning of the state price vector? How do we use it to value derivatives? We first formalize the notion of a derivative.

Definition 2.8.3. A derivative (or a contingency claim) is a function on the set of states.

We can write this function as an M × 1 vector. So if h is a contingency claim, then h will pay off hj if state j is realized. To see how this compares with what we have been talking about up until now, let us model a call on security 2 in the example above.

Consider a call on security 2, with strike price 3/2 and expiration at time 1. Then hj = max(d_{2,j} − 3/2, 0), or

h = ( h1 )   (  0  )
    ( h2 ) = ( 1/2 )      (2.8.10)

So our derivatives are really nothing new here; the definition just allows the possibility of a payoff depending on a combination of securities, which is certainly a degree of generality that we might want.

To understand what the state price vector means, we introduce some important special contingency claims. Let εj be the contingency claim that pays 1 if state j occurs and 0 if any other state occurs. That is,

εj = (0, . . . , 0, 1, 0, . . . , 0)^T      (2.8.11)

with the 1 in the jth row. These are often called binary options: εj is just a bet paying off 1 if state j occurs. How much is this bet worth? Suppose that we can find a portfolio θj such that D^T θj = εj. Then the market value of the portfolio θj is

S · θj = Dψ · θj = ψ · D^T θj = ψ · εj = ψj.

Hence we have shown that the market value of a portfolio whose payoff is εj is ψj. So the state price ψj is the market value of a bet that state j will be realized. This is the meaning of ψ.

Now consider

B = Σ_{j=1}^M εj = (1, 1, . . . , 1)^T      (2.8.12)

This is a contingency claim that pays 1 regardless of state. That is, B is a riskless zero-coupon bond with par value 1 and expiration at time 1. Its market value will then be R = Σ_{j=1}^M ψj. So this R is the discount factor for riskless borrowing. (You might want to think of R as (1 + r)^{−1}, where r is the interest rate.)

Now we can form the vector Q = (1/R)ψ. Note that Q ∈ R^M_{++} and Σ_{j=1}^M qj = 1, so Q can be interpreted as a probability function on the state space. Coming back to our problem of valuing derivatives, consider a derivative h, and suppose θ is a portfolio whose payoff is h, i.e., D^T θ = h; we say that θ replicates h. Then the market value of θ is

S · θ = Dψ · θ = ψ · D^T θ = ψ · h = RQ · h = R Σ_{j=1}^M qj hj = R E_Q(h).

That is, the market value of θ (and therefore of the derivative) is the discounted expectation of h with respect to the pseudo-probability Q. Thus we have regained our idea that the value of a contingency claim should be an expectation, but with respect to probabilities derived only from the no-arbitrage hypothesis, not from any actual probabilities of the states occurring. It is important to remark that these probabilities are not the real world probabilities that might be estimated for the states j; they are derived only from the no-arbitrage condition, and are therefore often called pseudo-probabilities.

Let us go ahead and value the option on security number 2 in the example above. R = 1 in this example, so the value of h is

R E_Q(h) = 1 · (0 · (2/3) + (1/2) · (1/3)) = 1/6.

Coming back to general issues, there is one further thing we need to discuss: when is there a portfolio θ whose payoff D^T θ is equal to a given contingency claim h? For this we make the following

Definition 2.8.4. A market is complete if for each j there is a portfolio θj with D^T θj = εj.

Now we see that if a market is complete, then for any contingency claim h, setting θ = Σ_{j=1}^M hj θj, we have D^T θ = h. Hence a market is complete if and only if every contingency claim can be replicated. We state this and an additional fact as a theorem.

Theorem 2.8.4. 1. A market is complete if and only if every contingency claim can be replicated. 2. An arbitrage free market is complete if and only if the state price vector is unique.

Theorem. An arbitrage-free market is complete if and only if the state price vector is unique.

Proof: Having proved the first statement above, we prove the second. It is clear from the first statement that if a market is complete then D^T : R^N -> R^M is onto, that is, has rank M. Thus if psi^1 and psi^2 are two state price vectors, then D(psi^1 - psi^2) . theta = (S - S) . theta = 0 for all theta. But then 0 = D(psi^1 - psi^2) . theta = (psi^1 - psi^2) . D^T theta. Since D^T is onto, this means that (psi^1 - psi^2) . v = 0 for all v in R^M, which means that psi^1 - psi^2 = 0.

We prove the other direction by contradiction. So assume that the market is not complete. Then D^T is not onto, so we can find a nonzero vector tau orthogonal to the image, that is, D^T theta . tau = 0 for all theta. Then theta . D tau = 0 for all theta, so D tau = 0, and if psi^1 is a state price vector, then psi^2 = psi^1 + tau is another state price vector (rescaling tau to be small enough that psi^1 + tau remains strictly positive). QED.
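The "incomplete implies non-unique" direction can be seen concretely. A sketch with a made-up one-security, three-state market (rank 1 < M = 3): two different strictly positive state price vectors price the security identically.

```python
from fractions import Fraction as F

# One security, three states: D has rank 1 < M = 3, so the market
# is incomplete and the state price vector cannot be unique.
D = [[F(1), F(1), F(1)]]      # a riskless bond paying 1 in every state
S = [F(1)]                    # its time-0 price

psi1 = [F(1, 3), F(1, 3), F(1, 3)]
tau = [F(1, 10), F(-1, 10), F(0)]    # D tau = 0: tau is orthogonal to the row of D
psi2 = [a + b for a, b in zip(psi1, tau)]

def prices(psi):
    # market prices implied by a candidate state price vector: S = D psi
    return [sum(d * p for d, p in zip(row, psi)) for row in D]

print(prices(psi1), prices(psi2))   # both equal S
```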

Exercises

1. Show, using an arbitrage argument, that there can only be one bond price. That is, if there are 2 securities paying $1 regardless of state, then their market value is the same.

2. Consider the example of the market described in section (2.7). Let h be a put on security 2 with strike price 1/2. Calculate its market value.

3. Consider the following game: tossing a die, the casino pays the face value on the upside of the die in dollars. What is the fair value of this game?

4. Consider a variant of the game described above. Except this time, if the player doesn't like her first toss, she is allowed one additional toss, and the casino then pays the face value on the second toss. What is the fair value of this game?

5. Suppose a stock is worth $S- just before it pays a dividend, $D. What is the value of the stock S+ just after the dividend? Use an arbitrage argument.

6. Suppose that C and D are two calls on the same stock with the same expiration T, but with strike prices K < L respectively. Show by an arbitrage argument that Ct >= Dt for all t < T.

7. Suppose that C and D are two calls on the same stock with the same strike price K, but with expirations T1 < T2 respectively. Show by an arbitrage argument that Dt >= Ct for all t < T1.

8. Suppose that C is a call on a non-dividend paying stock S, with strike price K and expiration T. Show that for any intermediate time t < T, Ct > St - exp(-r(T - t))K, and hence Ct > St - K. Conclude that if stock XYZ pays no dividends and C is an American call on XYZ with strike price K and expiration T, it is never advantageous to exercise early. (Hint: Calculate the payoff of the option as seen from the same time T in two cases: 1) if you exercise early at time t < T; 2) if you hold it until expiration. To finance the first case, you borrow K at time t to buy the stock, and repay the loan at time T. Calculate this portfolio at time T. In the other case you know the payoff. Now compare, and set up an arbitrage if St - K >= Ct.)

9. Make up your own market. Using the arbitrage theorem, show that there is a state price vector. Consider a specific derivative (of your choice) and calculate its market value.


Chapter 3

Discrete time

In this chapter, we proceed from the one period model of the last chapter to the multi-period model.

3.1 A first look at martingales

In this section I would like to give a very intuitive sense of what a martingale is, why martingales come up in option pricing, and then give an application to calculating expected stopping times. The concept of a martingale is one of primary importance in the modern theory of probability and in theoretical finance. Let me begin by stating the slogan that a martingale is a mathematical model for a fair game. In fact, the term martingale derives from the well known betting strategy (in roulette) of doubling one's bet after losing.

First let's think about what we mean by a fair game. Girolamo Cardano, an Italian, was one of the first mathematicians to study the theory of gambling. In his treatise The Book of Games of Chance he wrote: "The most fundamental principle of all in gambling is simply equal conditions, for example, of opponents, of bystanders, of money, of situation, of the dice box, and of the die itself. To the extent to which you depart from that equality, if it is in your opponent's favor, you are a fool, and if in your own, you are unjust." (From A. Hald, A History of Probability and Statistics and their Applications before 1750, John Wiley and Sons, New York.)

How do we mathematically model this notion of a fair game, that is, of equal conditions? Consider a number of gamblers, say N, playing a gambling

game. Noting the gamblers' fortunes at a single time is not so useful, so we keep track of their fortunes at each moment of time. It is reasonable to model each gambler by his current fortune; that is, this is the only information that we really need. So let Fn denote the fortune of one of the gamblers after the nth hand. We will assume that this game is played in discrete steps called hands, and that the gamblers make equal initial wagers and the winner wins the whole pot. Because of the random nature of gambling we let Fn be a random variable for each n. Thus Fn is what is called a stochastic process (a sequence of random variables), which we explain below.

Now for the equal conditions. A reasonable way to insist on the equal conditions of the gamblers is to assume that each of the gamblers' fortune processes is distributed (as random variables) in exactly the same way. That is, if pn(x) denotes the probability density function of Fn, and Gn is any of the other gamblers' fortunes, with pdf qn(x), then we have pn(x) = qn(x) for all n and x.

What are the consequences of this condition? The amount that the gambler wins in hand n is the random variable xi_n = Fn - F(n-1), and the equal distribution of the gamblers' fortunes implies that the gamblers' wins in the nth hand are also equally distributed. From the conditions on the gambling game described above, we must have E(xi_n) = 0. Hence we have that

    E(Fn) = E(F(n-1) + xi_n) = E(F(n-1)) + E(xi_n) = E(F(n-1))

So the expected fortunes of the gambler will not change over time. This is close to the Martingale condition. The same analysis says that if the gambler has F(n-1) after the (n-1)st hand, then he can expect to have F(n-1) after the nth hand. That is,

    E(Fn | F(n-1)) = F(n-1)

(Here E(Fn | F(n-1)) denotes a conditional expectation, which we explain below. Think about this.) This is closer to the Martingale condition, which says that

    E(Fn | F(n-1), ..., F0) = F(n-1)

    O Romeo, Romeo! wherefore art thou Romeo?
    Deny thy father and refuse thy name;
    Or, if thou wilt not, be but sworn my love,
    And I'll no longer be a Capulet.
        -Juliet

Now let me describe a very pretty application of the notion of martingale to a simple problem. Everyone recognizes the above as Juliet's line in Act 2, scene 2 of Shakespeare's Romeo and Juliet. (Benvolio also says this in Act 3, scene 1.) Everyone has come across the theorem in a basic probability course that a monkey typing randomly at a typewriter, with a probability of 1/26 of typing any given letter (we will always ignore spaces, capitals and punctuation) and typing one letter per second, will eventually type the complete works of Shakespeare with probability 1. But how long will it take, on average? More particularly, let's consider the question of how long it will take just to type "O Romeo, Romeo". So the question, more mathematically, is: what is the expected time for a monkey to type "OROMEOROMEO" (ignoring spaces and commas)?

This seems perhaps pretty hard. So let's consider a couple of similar but simpler questions, and also get our mathematical juices going.

1. Flipping a fair coin over and over, what is the expected number of flips it takes to get a Heads and then a Tails: HT?

2. Flipping a fair coin over and over, what is the expected number of flips it takes to get two Heads in a row: HH?

We can actually calculate these numbers by hand. Let T1 denote the random variable of the time until HT, and T2 the time until HH. Then we want to calculate E(T1) and E(T2). If T denotes one of these RV's (random variables), then

    E(T) = sum_{n=1}^infinity n P(T = n)

We then have

    P(T1 = 1) = 0
    P(T1 = 2) = P({HT}) = 1/4
    P(T1 = 3) = P({HHT, THT}) = 2/8
    P(T1 = 4) = P({HHHT, THHT, TTHT}) = 3/16
    P(T1 = 5) = P({HHHHT, THHHT, TTHHT, TTTHT}) = 4/32
    P(T1 = 6) = P({HHHHHT, THHHHT, TTHHHT, TTTHHT, TTTTHT}) = 5/64    (3.1.1)
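Before extracting the pattern, the two answers can be checked numerically. A small sketch (this state-tracking helper is my own, not from the text): follow the distribution of the length of the partial match after each flip, and use E(T) = sum_{m>=0} P(T > m).

```python
def expected_time(pattern, nmax=400):
    """Expected number of fair-coin flips until `pattern` first appears,
    via E(T) = sum_{m >= 0} P(T > m), truncated at nmax flips."""
    n = len(pattern)

    def step(state, c):
        # new length of the longest matched prefix after seeing character c
        s = pattern[:state] + c
        for k in range(min(len(s), n), 0, -1):
            if s.endswith(pattern[:k]):
                return k
        return 0

    probs = {0: 1.0}   # distribution of the matched-prefix length
    total = 0.0
    for _ in range(nmax):
        total += sum(probs.values())   # adds P(T > number of flips so far)
        nxt = {}
        for state, pr in probs.items():
            for c in "HT":
                s2 = step(state, c)
                if s2 < n:             # s2 == n means the pattern just completed
                    nxt[s2] = nxt.get(s2, 0.0) + pr / 2
        probs = nxt
    return total

print(expected_time("HT"), expected_time("HH"))
```

The truncation error is geometrically small, so this pins the answers down to many decimal places; it reports 4 for HT and 6 for HH, matching the derivation that follows.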

We can see the pattern: P(T1 = n) = (n-1)/2^n, and so

    E(T1) = sum_{n=2}^infinity n(n-1)/2^n

To sum this, let's write down the "generating function"

    f(t) = sum_{n=1}^infinity n(n-1) t^n

so that E(T1) = f(1/2). To be able to calculate with such generating functions is a useful skill. First note that the term n(n-1) is what would occur in the general term after differentiating a power series twice. On the other hand, this term sits in front of t^n instead of t^{n-2}, so we need to multiply a second derivative by t^2. At this point, we make some guesses and then adjustments to see that

    f(t) = t^2 (sum_{n=0}^infinity t^n)''

so it is easy to see that

    f(t) = 2t^2/(1 - t)^3

and that E(T1) = f(1/2) = 4.

So far pretty easy and no surprises. Let's do E(T2). Here

    P(T2 = 1) = 0
    P(T2 = 2) = P({HH}) = 1/4
    P(T2 = 3) = P({THH}) = 1/8
    P(T2 = 4) = P({T-THH, H-THH}) = 2/16
    P(T2 = 5) = P({TT-THH, HT-THH, TH-THH}) = 3/32
    P(T2 = 6) = P({TTT-THH, TTH-THH, THT-THH, HTT-THH, HTH-THH}) = 5/64
    P(T2 = 7) = P({TTTT-THH, TTTH-THH, TTHT-THH, THTT-THH, HTTT-THH, HTHT-THH, THTH-THH, HTTH-THH}) = 8/128    (3.1.2)

Again the pattern is easy to see: P(T2 = n) = f(n-1)/2^n, where fn denotes the nth Fibonacci number (Prove it!). (Recall that the Fibonacci numbers are

defined by the recursive relationship f1 = 1, f2 = 1, and fn = f(n-1) + f(n-2).) So

    E(T2) = sum_{n=2}^infinity n f(n-1)/2^n

Writing g(t) = sum_{n=1}^infinity n f(n-1) t^n, we calculate in a similar manner (but more complicated) that

    g(t) = (2t^2 - t^3)/(1 - t - t^2)^2

and that E(T2) = g(1/2) = 6. A little surprising perhaps. What is accounting for the fact that HH takes longer? A simple answer is that if we are awaiting the target string (HT or HH), and we succeed in getting the first letter at time n but then fail at time n+1 to get the second letter, then the earliest we can succeed after that is n+2 in the case of HT, but n+3 in the case of HH.

The same kind of analysis could in theory be done for "OROMEOROMEO", but it would probably be hopelessly complicated. To approach "OROMEOROMEO", let's turn it into a game. I walk into a casino, and sitting at a table, dressed in the casino's standard uniform, is a monkey in front of a typewriter. People are standing around the table making bets on the letters of the alphabet. Then the monkey strikes a key at random (it is a fair monkey) and the people win or lose according to whether their bet was on the letter struck. If their letter came up, they are paid off at a rate of 26 to 1. Clearly the odds and the pay-off are set up to yield a "fair game": this is an example of a martingale. For simplicity, assume that a round of monkey typing takes exactly one minute.

I know in my mathematical bones that I cannot expect to do better on average than to break even. Eventually this will be a theorem in martingale theory (and not a very hard one at that). That is:

Theorem 3.1.1. No matter what strategy I adopt, I can never, on average, do better than break even.

Let T denote the random variable giving the time until "OROMEOROMEO". Together a group of gamblers conspire on the following betting scheme. On the first play, the first bettor walks up to the table and bets one Italian Lira (monkey typing is illegal in the United States), ITL 1, that the first letter is an "O". If he wins, he gets ITL 26 and bets all ITL 26 on the second play that the second letter is an "R". If he

wins, he bets all ITL 26^2 on the third play that the letter is an "O", and so on. Of course, if at some point he loses, all the money is confiscated and he walks away with a net loss of the initial ITL 1. If at some point one of the bettors gets through "OROMEOROMEO", the game ends, and the conspirators walk away with their winnings and losses, which they share.

Now, in addition to starting the above betting sequence on the first play, the bettors (there are a lot of them) initiate a new betting sequence at each play, one after another. On the second play, regardless of what has happened on the first play, the second bettor begins the same betting sequence starting with the second play; that is, she also bets ITL 1 that the second play will be an "O", and so on, one bettor for each play of monkey typing.

Again, because the game is fair, the conspirators' expected winnings are going to be 0. On the other hand, let's calculate the amount they've won once "OROMEOROMEO" has been achieved. Because they got through, the betting sequence beginning at the first letter of "OROMEOROMEO" has yielded ITL 26^11. In addition to this, though, the sequence starting with the final "O" in the first "ROMEO" and continuing through the end has gone undefeated and yields ITL 26^6. And finally, the final "O" earns an additional ITL 26. Every other betting sequence has been defeated: at each play, except for those enumerated above, the betting sequence initiated at that play has been defeated, and each such defeated sequence counts as an ITL 1 loss. Therefore, their net earnings will be

    26^11 + 26^6 + 26 - T

Since the expected value of this is 0, it means that the expected value

    E(T) = 26^11 + 26^6 + 26

A fair amount of money, even in Lira. Try this for the "HH" problem above to see more clearly how this works.

3.2 Introducing stochastic processes

A stochastic process which is discrete in both space and time is a sequence of random variables, defined on the same sample space (that is, they are jointly distributed), {X0, X1, X2, ...}, such that the totality of values of all the Xn's combined forms a discrete set. The subscript n in Xn is to be thought of as time. The different values that the Xn's can take are called the states of the stochastic process.

We are interested in understanding the evolution of such a process. One way to describe such a thing is to consider how Xn depends on X0, ..., X(n-1). That is, for a stochastic process Xn we might specify

    P(X(n+1) = j | Xn = kn, X(n-1) = k(n-1), ..., X0 = k0)

If, as in the case of a random walk,

    P(X(n+1) = j | Xn = kn, X(n-1) = k(n-1), ..., X0 = k0) = P(X(n+1) = j | Xn = kn)

the process is called a Markov process.

Example 3.2.1. Let Wn denote the position of a random walker. At time 0, W0 = 0 say, and at each moment in time the random walker has a probability p of taking a step to the left and a probability 1 - p of taking a step to the right. Thus, if at time n, Wn = k, then at time n+1 there is probability p of being at k-1 and a probability 1-p of being at k+1. We express this mathematically by saying

    P(W(n+1) = j | Wn = k) = { p if j = k-1;  1-p if j = k+1;  0 otherwise }    (3.2.1)

In general, P(X(n+1) = k | Xn = j) a priori depends on n. If it doesn't, then the process is called a stationary Markov process. Thus, if Xn is a stationary Markov process, we write

    pjk = P(X(n+1) = k | Xn = j)

The numbers pjk form a matrix (an infinite by infinite matrix if the random variables Xn have infinitely many states), called the matrix of transition probabilities. They satisfy (merely by virtue of being probabilities)

1. pjk >= 0
2. sum_k pjk = 1 for each j.

Let's reiterate: pjk represents the probability of going from state j to state k in one time step. The matrix pjk as we've defined it tells us how the probabilities change from the nth to the (n+1)st step. Thus, once we know the pjk, the only information left to know is the initial probability distribution of X0, that is, p0(k) = P(X0 = k). The basic question is: Given p0 and pjk, what is the

probability distribution of Xn? Towards this end we introduce the "higher" transition probabilities

    p^(n)_jk = P(Xn = k | X0 = j)

which by stationarity also equals P(X(m+n) = k | Xm = j), the probability of going from j to k in n steps.

Proposition 3.2.1.    p^(n)_jk = sum_h p^(n-1)_jh p_hk

Proof: p^(n)_jk = P(Xn = k | X0 = j). Now a process that goes from state j to state k in n steps must land somewhere on the (n-1)st step. Suppose it happens to land on the state h. Then the probability P(Xn = k | X(n-1) = h) = P(X1 = k | X0 = h) = p_hk, by stationarity. So the probability of going from state j to state h in n-1 steps and then going from state h to state k in one step is

    P(X(n-1) = h | X0 = j) P(Xn = k | X(n-1) = h) = p^(n-1)_jh p_hk

Now the proposition follows by summing up over all the states one could have landed upon on the (n-1)st step. Q.E.D.

It follows by repeated application of the preceding proposition that

    p^(n+m)_jk = sum_h p^(n)_jh p^(m)_hk

This is called the Chapman-Kolmogorov equation. Expressed in terms of matrix multiplication, it just says that if P = (pjk) denotes the matrix of transition probabilities from one step to the next, then P^n, the matrix raised to the nth power, represents the higher transition probabilities, and of course P^(n+m) = P^n P^m. Now we can answer the basic question posed above: if P(X0 = j) = p0(j) is the initial probability distribution of a stationary Markov process with transition matrix P = (pjk), then

    P(Xn = k) = sum_j p0(j) p^(n)_jk
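Both statements are easy to test with a made-up chain (a sketch; the matrix below is invented for illustration): P^n gives the n-step transition probabilities, and the distribution of Xn is obtained by summing p0(j) against the rows of P^n.

```python
from fractions import Fraction as F

def matmul(A, B):
    # product of two square matrices of Fractions
    n = len(A)
    return [[sum(A[i][h] * B[h][j] for h in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, n):
    # P^n, the matrix of n-step transition probabilities
    R = [[F(1) if i == j else F(0) for j in range(len(P))]
         for i in range(len(P))]
    for _ in range(n):
        R = matmul(R, P)
    return R

# Made-up stationary two-state chain and initial distribution.
P = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]
p0 = [F(1), F(0)]

P5 = matpow(P, 5)
dist5 = [sum(p0[j] * P5[j][k] for j in range(2)) for k in range(2)]
print(P5, dist5)
```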

3.2.1 Generating functions

Before getting down to some examples, I want to introduce a technique for doing calculations with random variables. Let X be a discrete random variable whose states are any integer j and which has distribution p(k) = P(X = k). We introduce its generating function

    FX(t) = sum_k p(k) t^k

At this point we do not really pay attention to the convergence of FX; we will do formal manipulations with FX, and therefore FX is called a formal power series (or, more properly, a formal Laurent series, since it can have negative powers of t). Here are some of the basic facts about the generating function of X.

1. FX(1) = 1, since this is just the sum over all the states of the probabilities, which must be 1.

2. E(X) = FX'(1). Since FX'(t) = sum_k k p(k) t^(k-1), now evaluate at 1.

3. Var(X) = FX''(1) + FX'(1) - (FX'(1))^2. This is an easy calculation as well.

4. If X and Y are independent random variables, then F(X+Y)(t) = FX(t) FY(t).

Now let's use these facts to do some computations.

Example 3.2.2. Let X be a random variable representing a Bernoulli trial, that is, P(X = -1) = p, P(X = 1) = 1 - p, and P(X = k) = 0 otherwise. Thus FX = p t^(-1) + (1 - p) t. Now a random walk can be described by

    Wn = X1 + ... + Xn

where each Xk is an independent copy of X: Xk represents taking a step to the left or right at the kth step. Now, according to fact 4) above, since the Xk are mutually independent, we have

    F(Wn) = F(X1) F(X2) ... F(Xn) = (FX)^n = (p t^(-1) + (1 - p) t)^n = sum_{j=0}^n C(n,j) (p t^(-1))^j ((1 - p) t)^(n-j)

(here C(n,j) denotes the binomial coefficient n choose j)
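The expansion of (FX)^n can be automated (a small sketch of my own: a formal Laurent polynomial is represented as a dict from exponent of t to coefficient, and multiplication is convolution).

```python
from fractions import Fraction as F

def fmul(A, B):
    # multiply two formal Laurent polynomials {exponent: coefficient}
    C = {}
    for i, a in A.items():
        for j, b in B.items():
            C[i + j] = C.get(i + j, F(0)) + a * b
    return C

n, p = 5, F(1, 3)          # made-up example values
FX = {-1: p, 1: 1 - p}     # generating function of one Bernoulli step
FW = {0: F(1)}
for _ in range(n):
    FW = fmul(FW, FX)      # (FX)^n by repeated multiplication

# The coefficients are the distribution: P(Wn = n - 2j) = C(n,j) p^j (1-p)^(n-j)
mean = sum(k * c for k, c in FW.items())
var = sum(k * k * c for k, c in FW.items()) - mean**2
print(FW, mean, var)
```

The mean and variance read off from the coefficients agree with the closed forms n((1-p) - p) and 4np(1-p) computed next.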

    = sum_{j=0}^n C(n,j) p^j (1 - p)^(n-j) t^(n-2j)

Hence the distribution for the random walk Wn can be read off as the coefficients of the generating function: P(Wn = n - 2j) = C(n,j) p^j (1 - p)^(n-j) for j = 0, 1, ..., n, and is zero for all other values. (Note: this is our third time calculating this distribution.)

Similarly, E(Wn) = F(Wn)'(1). Now

    F(Wn)'(t) = d/dt (p t^(-1) + (1 - p) t)^n = n (p t^(-1) + (1 - p) t)^(n-1) ((1 - p) - p t^(-2))

and evaluating at 1 gives n(p + (1 - p))((1 - p) - p) = n((1 - p) - p). Also,

    F(Wn)''(t) = n(n-1)(p t^(-1) + (1 - p) t)^(n-2) ((1 - p) - p t^(-2))^2 + n (p t^(-1) + (1 - p) t)^(n-1) (2 p t^(-3))

and evaluating at 1 gives n(n-1)((1 - p) - p)^2 + 2np. So

    Var(Wn) = F(Wn)''(1) + F(Wn)'(1) - F(Wn)'(1)^2
            = n(n-1)((1-p) - p)^2 + 2np + n((1-p) - p) - n^2 ((1-p) - p)^2
            = -n((1-p) - p)^2 + 2np + n((1-p) - p)
            = n((1-p) - p)(1 - ((1-p) - p)) + 2np
            = 2pn((1-p) - p) + 2np
            = 2np(1-p) - 2np^2 + 2np
            = 2np(1-p) + 2np(1-p)
            = 4np(1-p)

Consider the following experiment. Roll two dice and read off the sum on them; then flip a coin that number of times and count the number of heads. This describes a random variable X. What is its distribution? We can describe X mathematically as follows. Let N denote the random variable of the sum on two dice. According to a calculation in a previous set of notes, P(N = n) = (1/36)(6 - |7 - n|) for n = 2, ..., 12, and is 0 otherwise. Also, let Y count the number of heads in one toss, so that P(Y = k) = 1/2 if k = 0 or 1, and is 0 otherwise. Hence our random variable X can be written as

    X = sum_{j=1}^N Yj    (3.2.2)

where the Yj are independent copies of Y, and note that the upper limit of the sum is the random variable N. This is an example of a composite random variable. In general, a composite random variable is one that can be expressed by (3.2.2), where N and Y are general random variables. The generating function of such a composite is easy to calculate.
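The distribution of X can also be computed directly by conditioning on N (a short enumeration sketch; by Wald's identity the mean should come out to E(N)E(Y) = 7 . (1/2) = 7/2).

```python
from fractions import Fraction as F
from math import comb

# P(N = n) for the sum of two dice: (1/36)(6 - |7 - n|), n = 2..12
pN = {n: F(6 - abs(7 - n), 36) for n in range(2, 13)}

# X = number of heads in N fair-coin flips: condition on N
pX = {}
for n, pn in pN.items():
    for k in range(n + 1):
        pX[k] = pX.get(k, F(0)) + pn * comb(n, k) * F(1, 2**n)

EX = sum(k * pk for k, pk in pX.items())
print(pX, EX)
```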

Proposition 3.2.2. Let X be the composite of two random variables N and Y. Then

    FX(t) = FN(FY(t)) = (FN o FY)(t)

Proof:

    P(X = k) = sum_{n=0}^infinity P(N = n) P(sum_{j=1}^n Yj = k)

Now, F(sum_{j=1}^n Yj) = (FY(t))^n, since the Yj's are mutually independent. Therefore

    FX = sum_n P(N = n) (FY(t))^n = FN(FY(t))

Q.E.D.

So in our example above of rolling dice and then flipping coins, FN(t) = sum_{n=2}^{12} (1/36)(6 - |7 - n|) t^n and FY(t) = 1/2 + (1/2) t. Therefore

    FX(t) = sum_{n=2}^{12} (1/36)(6 - |7 - n|) (1/2 + (1/2) t)^n

from which one can read off the distribution of X.

The above sort of compound random variables appear naturally in the context of branching processes. Suppose we try to model a population (of amoebae, of humans, of subatomic particles) in the following way. We start with a single individual in the zeroth generation; so X0 = 1. This individual splits into (or becomes) X1 individuals (offspring) in the first generation. The number of offspring X1 is a random variable with a certain distribution P(X1 = k) = p(k): p(0) is the probability that X0 dies, p(1) is the probability that X0 has one descendant, p(2) two descendants, etc. Now in the first generation we have X1 individuals, each of which will produce offspring according to the same probability distribution as its parent. Thus in the second generation,

    X2 = sum_{n=1}^{X1} X(1,n)

where each X(1,n) is an independent copy of X1. That is, X2 is the compound random variable of X1 and itself. Continuing on, we ask that in the nth generation each individual produce offspring according to the same distribution as its ancestor X1. We denote the number of members in the nth generation by Xn. Thus

    Xn = sum_{j=1}^{X(n-1)} X(1,j)
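Iterating the composition numerically is straightforward. A sketch with a made-up offspring distribution p(0) = 1/4, p(1) = 1/2, p(2) = 1/4, so that the offspring generating function is 1/4 + t/2 + t^2/4: since P(Xn = 0) = F(Xn)(0), composing the offspring generating function with itself gives the probability that the population has died out by generation n.

```python
from fractions import Fraction as F

def F1(t):
    # generating function of the offspring distribution
    # p(0) = 1/4, p(1) = 1/2, p(2) = 1/4 (made-up example)
    return F(1, 4) + F(1, 2) * t + F(1, 4) * t**2

def extinct_by(n):
    # P(Xn = 0) = (F1 o ... o F1)(0), the n-fold composition at 0
    t = F(0)
    for _ in range(n):
        t = F1(t)
    return t

probs = [extinct_by(n) for n in range(1, 6)]
print(probs)
```

The extinction probabilities increase with n, as they must, since extinction by generation n implies extinction by generation n+1.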

Using generating functions we can calculate the distribution of Xn. So we start with F(X1)(t). Then, by the formula for the generating function of a compound random variable,

    F(X2)(t) = F(X1)(F(X1)(t))

and more generally,

    F(Xn) = F(X1) o ... (n times) ... o F(X1)

Without generating functions, this expression would be quite difficult.

3.3 Conditional expectation

We now introduce a very subtle concept of utmost importance to us: conditional expectation. We will begin by defining it in its simplest incarnation: expectation of a random variable conditioned on an event. Given an event A, define its indicator to be the random variable

    1A(w) = { 1 if w in A;  0 if w not in A }

Define

    E(X|A) = E(X . 1A)/P(A)

In the discrete case, this equals sum_x x P(X = x | A), and in the continuous case the integral of x p(x|A) dx. Some basic properties of conditional expectation are

Proposition 3.3.1. (Properties of conditional expectation)

1. E(X + Y | A) = E(X|A) + E(Y|A).

2. If X is a constant c on A, then E(XY|A) = c E(Y|A).

3. If B = A1 u ... u An is a disjoint union, then

    E(X|B) = sum_{i=1}^n E(X|Ai) P(Ai)/P(B)

(Jensen's inequality) A function f is convex if for all t in [0, 1] and x < y, f satisfies the inequality

    f((1 - t)x + t y) <= (1 - t) f(x) + t f(y)

Then for f convex,

    E(f(X)) >= f(E(X))  and  E(f(X)|A) >= f(E(X|A))

If X and Y are two random variables defined on the same sample space, then we can talk of the particular kind of event {Y = j}. Hence it makes sense to talk of E(X | Y = j). As j runs over the values of the variable Y, E(X | Y = j) takes on different values. These are the values of a new random variable. There is nothing really new here, but a more useful way to think of the conditional expectation of X conditioned on Y is as a random variable. Thus we make the following

Definition 3.3.1. Let X, Y be two random variables defined on the same sample space. Define a random variable E(X|Y), called the expectation of X conditioned on Y. The values of this random variable are E(X | Y = j) as j runs over the values of Y. Finally,

    P[E(X|Y) = e] = sum_{j : E(X|Y=j) = e} P[Y = j]

More generally, if Y0, Y1, ..., Yn are a collection of random variables, we may define E(X | Y0 = y0, Y1 = y1, ..., Yn = yn) and E(X | Y0, Y1, ..., Yn) in the same way.

Some basic properties of the random variable E(X|Y) are listed in the following proposition.

Proposition 3.3.2. Let X and Y be two random variables defined on the same sample space. Then

1. (Tower property) E(E(X|Y)) = E(X).

2. If X, Y are independent, then E(X|Y) = E(X).

3. Letting Z = f(Y), then E(ZX|Y) = Z E(X|Y).

Proof: 1. We have

    P(E(X|Y) = e) = sum_{j : E(X|Y=j) = e} P(Y = j)

Then

    E(E(X|Y)) = sum_e e P(E(X|Y) = e)
              = sum_j E(X | Y = j) P(Y = j)
              = sum_j (sum_k k P(X = k | Y = j)) P(Y = j)
              = sum_j sum_k k (P((X = k) n (Y = j))/P(Y = j)) P(Y = j)
              = sum_k sum_j k P((X = k) n (Y = j))
              = sum_k k P((X = k) n (union_j {Y = j}))
              = sum_k k P(X = k) = E(X)    (3.3.1)

2. We have

    E(X | Y = j) = sum_k k P(X = k | Y = j) = sum_k k P(X = k) = E(X)    (3.3.2)

This is true for all j, so the conditional expectation is a constant.

3. We have

    E(ZX | Y = j) = sum_{k,l} k l P((Z = k) n (X = l) | Y = j)
                  = sum_{k,l} k l P((f(Y) = k) n (X = l) | Y = j)
                  = f(j) sum_l l P(X = l | Y = j)
                  = Z E(X | Y = j)    (3.3.3)

Q.E.D.

We also have an analogous statement for conditioning on a set of variables.

Proposition 3.3.3. Let X, Y0, ..., Yn be random variables defined on the same sample space. Then

1. (Tower property) E(E(X | Y0, ..., Yn)) = E(X).

2. If X is independent of Y0, ..., Yn, then E(X | Y0, ..., Yn) = E(X).

3. Letting Z = f(Y0, ..., Yn), then E(ZX | Y0, ..., Yn) = Z E(X | Y0, ..., Yn).

Proof: See exercises.
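The tower property is easy to confirm by direct enumeration on a small invented joint distribution (a sketch; the table below is made up for illustration).

```python
from fractions import Fraction as F

# Made-up joint distribution P(X = x, Y = y).
joint = {(0, 0): F(1, 8), (0, 1): F(1, 4),
         (1, 0): F(1, 4), (2, 1): F(3, 8)}

def pY(y):
    return sum(pr for (x, yy), pr in joint.items() if yy == y)

def E_X_given_Y(y):
    # E(X | Y = y) = sum_x x P(X = x | Y = y)
    return sum(x * pr for (x, yy), pr in joint.items() if yy == y) / pY(y)

EX = sum(x * pr for (x, y), pr in joint.items())
tower = sum(E_X_given_Y(y) * pY(y) for y in {y for (x, y) in joint})
print(EX, tower)   # the two agree: E(E(X|Y)) = E(X)
```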

3.4 Discrete time finance

Now let us return to the market. Let's go back to our model of the market from the last chapter. Thus we have a vector of securities S and a pay-off matrix D = (dij), where security i pays off dij in state j. Under the no-arbitrage condition, there is a state price vector psi so that S = D psi. Setting R = sum_j psi_j and q_j = psi_j/R as in the last chapter, we have sum_j q_j = 1, and the q_j's determine a probability Q on the set of states.

To fix ideas, let's consider a very simple market consisting of two securities, a bond S1 and a stock S2. Now the value of the stock at time 1 is a random variable, which we will call d2, with respect to the probabilities Q. That is, the value of the stock at time 1 is d2j if state j occurs, so P(d2 = d2j) = q_j. The expected value with respect to Q of the stock at time 1 is

    E_Q(d2) = sum_j d2j q_j = (D(psi/R))_2 = (S/R)_2 = S2/R

In other words, E_Q(R d2) = S2. Thus, with respect to the probabilities Q, which were arrived at by the no-arbitrage assumption, the expected discounted stock price at time 1 is the same as the stock price at time 0.

If Sn denotes the value of a security at time n, is Sn a Martingale? There are two reasons why not. The first reason is the time value of money: if Sn were a Martingale, you might as well just put your money under your mattress. What about the discounted security process? This would turn a bank account into a Martingale, but any other security would not be. The second reason why the discounted stock price process is not a Martingale is risk: if the discounted stock value were a Martingale, you would expect to make as much money as in your bank account, and no one would buy it; people would just stick their money in the bank.

Under the pseudo-probabilities Q, however, the expected discounted stock price at time 1 is the same as the value at time 0. This is the basic phenomenon that we will take advantage of in the rest of this chapter to value derivatives. The arbitrage theorem of last chapter only gives us an existence theorem; it does not tell us how to find Q in actual practice. The first step is to find a probability Q on the set of states such that the expected value of the discounted price of the stock at time 1 is the same as the stock price at time 0, or, what is more or less the same thing, such that the discounted stock prices are martingales. For this we will simplify our market drastically and specify a more definite model for stock movements.

Our market will consist of a single stock S and a bond B. Let us change notation a little: at time 0, the stock is worth S0 and the bond is worth B0. At time 1 . dt, there will be only two states of nature, u (up) or d (down): the stock will either go up to some value S1^u or down to S1^d, while the bond will go to e^(r dt) B0:

    S0 -> S1^u (with probability q) or S1^d (with probability 1 - q),    B0 -> e^(r dt) B0    (3.4.1)

According to our picture up there, we need to find a probability Q such that E_Q(e^(-r dt) S1) = S0. Let q be the probability that the stock goes up. Then we want

    S0 = E_Q(e^(-r dt) S1) = q e^(-r dt) S1^u + (1 - q) e^(-r dt) S1^d

Thus we need

    q (e^(-r dt) S1^u - e^(-r dt) S1^d) + e^(-r dt) S1^d = S0

So

    q = (e^(r dt) S0 - S1^d)/(S1^u - S1^d),    1 - q = (S1^u - e^(r dt) S0)/(S1^u - S1^d)    (3.4.2)

Then, if h is a contingency claim on S, according to our formula of last chapter, the value of the derivative is

    V(h) = R E_Q(h(S1)) = e^(-r dt) (q h(S1^u) + (1 - q) h(S1^d))
         = e^(-r dt) ( h(S1^u) (e^(r dt) S0 - S1^d)/(S1^u - S1^d) + h(S1^d) (S1^u - e^(r dt) S0)/(S1^u - S1^d) )

Let's rederive this based on an ab initio arbitrage argument for h. Let's try to construct a replicating portfolio for h, that is, a portfolio consisting of the stock and bond so that no matter whether the stock goes

up or down, the portfolio is worth the same as the derivative. Thus, suppose at time 0 we set up a portfolio

    theta = (phi, psi)    (3.4.3)

of phi units of stock S0 and psi units of bond B0. At time dt, the value of this portfolio is worth phi S1^u + psi e^(r dt) B0 if the state is u, or phi S1^d + psi e^(r dt) B0 if the state is d. We want this value to be the same as h(S1). So we solve for phi and psi:

    phi S1^u + psi e^(r dt) B0 = h(S1^u)
    phi S1^d + psi e^(r dt) B0 = h(S1^d)    (3.4.4)

Subtracting, phi (S1^u - S1^d) = h(S1^u) - h(S1^d). So we find

    phi = (h(S1^u) - h(S1^d))/(S1^u - S1^d)

while

    psi = (h(S1^u) - phi S1^u)/(e^(r dt) B0)
        = ( h(S1^u)(S1^u - S1^d) - (h(S1^u) - h(S1^d)) S1^u ) / ( e^(r dt) B0 (S1^u - S1^d) )
        = ( h(S1^d) S1^u - h(S1^u) S1^d ) / ( e^(r dt) B0 (S1^u - S1^d) )

Thus, the value of the portfolio at time 0 is

    phi S0 + psi B0 = ( (h(S1^u) - h(S1^d))/(S1^u - S1^d) ) S0 + ( (h(S1^d) S1^u - h(S1^u) S1^d)/(e^(r dt) B0 (S1^u - S1^d)) ) B0
                    = e^(-r dt) ( h(S1^u) (e^(r dt) S0 - S1^d)/(S1^u - S1^d) + h(S1^d) (S1^u - e^(r dt) S0)/(S1^u - S1^d) )    (3.4.5)
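The one-step formulas can be checked with made-up numbers (S0 = 100, S1^u = 120, S1^d = 90, B0 = 1, r = 0, dt = 1, and a call h(s) = max(s - 100, 0), all invented for illustration): the replicating portfolio matches the payoff in both states, and its time-0 cost equals the risk-neutral value.

```python
from math import exp

S0, Su, Sd, B0, r, dt = 100.0, 120.0, 90.0, 1.0, 0.0, 1.0
h = lambda s: max(s - 100.0, 0.0)   # a call struck at 100 (made-up example)

# phi = (h(S1^u) - h(S1^d)) / (S1^u - S1^d)
phi = (h(Su) - h(Sd)) / (Su - Sd)
# psi = (h(S1^d) S1^u - h(S1^u) S1^d) / (e^(r dt) B0 (S1^u - S1^d))
psi = (h(Sd) * Su - h(Su) * Sd) / (exp(r * dt) * B0 * (Su - Sd))

# The portfolio pays the claim in both states:
pay_u = phi * Su + psi * exp(r * dt) * B0
pay_d = phi * Sd + psi * exp(r * dt) * B0

# Its time-0 cost equals the risk-neutral value e^(-r dt) E_Q(h(S1)):
q = (exp(r * dt) * S0 - Sd) / (Su - Sd)
V = exp(-r * dt) * (q * h(Su) + (1 - q) * h(Sd))
print(phi * S0 + psi * B0, V)
```

Note that the hedge is long 2/3 of a share and short the bond (borrowed cash), the familiar shape of a covered call replication.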

Since the value of this portfolio and the value of the derivative are the same at time 1, by the no-arbitrage hypothesis they must be equal at time 0. So we've arrived at the value of the derivative by a somewhat different argument.

Now we move on to a multi-period model. We want to divide the time period from now, time 0, to some horizon T, which will often be the expiration time for a derivative, into N periods, each of length dt, so that N dt = T. We are interested in defining a reasonably flexible and concrete model for the movement of stocks.

One step (the derivation above, redone with dt):

    S0 -> S_dt^u (with probability q) or S_dt^d (with probability 1 - q)

    q = (S0 e^(r dt) - S_dt^d)/(S_dt^u - S_dt^d),    1 - q = (S_dt^u - S0 e^(r dt))/(S_dt^u - S_dt^d)    (3.4.6)

Then for a contingency claim h we have V(h) = e^(-r dt) E_Q(h).

Two steps:

              .uu
        .u <
       /      .ud
      .                    (3.4.7)
       \      .du
        .d <
              .dd

where the branch from the root to .u has probability q0 and to .d probability 1 - q0; from .u, the branches to .uu and .ud have probabilities q1^u and 1 - q1^u; and from .d, the branches to .du and .dd have probabilities q1^d and 1 - q1^d, with

    q1^u = (e^(r dt) S_1dt^u - S_2dt^ud)/(S_2dt^uu - S_2dt^ud),    1 - q1^u = (S_2dt^uu - e^(r dt) S_1dt^u)/(S_2dt^uu - S_2dt^ud)

    q1^d = (e^(r dt) S_1dt^d - S_2dt^dd)/(S_2dt^du - S_2dt^dd),    1 - q1^d = (S_2dt^du - e^(r dt) S_1dt^d)/(S_2dt^du - S_2dt^dd)

Likewise,

    q_0 = (e^{rδt}S_0 − S_{1δt}^d)/(S_{1δt}^u − S_{1δt}^d),   1 − q_0 = (S_{1δt}^u − e^{rδt}S_0)/(S_{1δt}^u − S_{1δt}^d).

Suppose h is a contingent claim, and that we know h(S_{2δt}). Then we have

    h(S_{1δt}^u) = e^{−rδt}(q_1^u h(S_{2δt}^{uu}) + (1 − q_1^u)h(S_{2δt}^{ud})),
    h(S_{1δt}^d) = e^{−rδt}(q_1^d h(S_{2δt}^{du}) + (1 − q_1^d)h(S_{2δt}^{dd})).

Thus,

    h(S_0) = e^{−rδt}(q_0 h(S_{1δt}^u) + (1 − q_0)h(S_{1δt}^d))
           = e^{−rδt}(q_0(e^{−rδt}(q_1^u h(S_{2δt}^{uu}) + (1 − q_1^u)h(S_{2δt}^{ud}))) + (1 − q_0)(e^{−rδt}(q_1^d h(S_{2δt}^{du}) + (1 − q_1^d)h(S_{2δt}^{dd}))))
           = e^{−2rδt}(q_0 q_1^u h(S_{2δt}^{uu}) + q_0(1 − q_1^u)h(S_{2δt}^{ud}) + (1 − q_0)q_1^d h(S_{2δt}^{du}) + (1 − q_0)(1 − q_1^d)h(S_{2δt}^{dd})).

First, let's notice what the coefficients of the h(S_{2δt}) terms are: q_0 q_1^u represents going up twice, q_0(1 − q_1^u) represents going up, then down, etc. That is, if we define a sample space Ψ = {uu, ud, du, dd} with

    P(uu) = q_0 q_1^u,  P(ud) = q_0(1 − q_1^u),  P(du) = (1 − q_0)q_1^d,  P(dd) = (1 − q_0)(1 − q_1^d),

these coefficients are the probability of ending up at a particular node of the tree. We check that this is a probability on the final nodes of the two-step tree:

    P(uu) + P(ud) + P(du) + P(dd) = q_0 q_1^u + q_0(1 − q_1^u) + (1 − q_0)q_1^d + (1 − q_0)(1 − q_1^d)
                                  = q_0(q_1^u + 1 − q_1^u) + (1 − q_0)(q_1^d + 1 − q_1^d) = q_0 + 1 − q_0 = 1.

Now h(S_2) is a random variable depending on w_0 w_1 ∈ Ψ. Hence we have V(h) = e^{−2rδt} E_Q(h(S_2)).

So what's the general case? For N steps, the sample space is Ψ = {w_0 w_1 ... w_{N−1} | w_i = u or d}, and

    P(w_0 w_1 ... w_{N−1}) = Π_{j=0}^{N−1} P(w_j)
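The two-step claim value can be computed both ways — by backward induction through the tree and directly as e^{−2rδt} E_Q(h(S_2)) over the four paths — and the two must agree. A minimal sketch (function names are mine; I take a recombining multiplicative tree S → Su or Sd as an illustrative special case):

```python
import math

def q_prob(s_now, s_up, s_down, r, dt):
    # risk-neutral up-probability: q = (e^{r dt} s_now - s_down)/(s_up - s_down)
    return (math.exp(r * dt) * s_now - s_down) / (s_up - s_down)

def two_step_price(S0, u, d, r, dt, h):
    """Value h(S_{2 dt}) by backward induction and by the path-probability formula."""
    Su, Sd = S0 * u, S0 * d
    Suu, Sud, Sdd = Su * u, Su * d, Sd * d        # recombining: ud and du meet at Sud
    q0 = q_prob(S0, Su, Sd, r, dt)
    q1u = q_prob(Su, Suu, Sud, r, dt)
    q1d = q_prob(Sd, Sud, Sdd, r, dt)
    disc = math.exp(-r * dt)
    # backward induction, one step at a time
    hu = disc * (q1u * h(Suu) + (1 - q1u) * h(Sud))
    hd = disc * (q1d * h(Sud) + (1 - q1d) * h(Sdd))
    back = disc * (q0 * hu + (1 - q0) * hd)
    # direct expectation over the four paths uu, ud, du, dd
    probs = {"uu": q0 * q1u, "ud": q0 * (1 - q1u),
             "du": (1 - q0) * q1d, "dd": (1 - q0) * (1 - q1d)}
    spots = {"uu": Suu, "ud": Sud, "du": Sud, "dd": Sdd}
    assert abs(sum(probs.values()) - 1.0) < 1e-12  # the q's form a probability
    direct = math.exp(-2 * r * dt) * sum(probs[w] * h(spots[w]) for w in probs)
    return back, direct
```

The built-in assertion mirrors the check in the text that P(uu) + P(ud) + P(du) + P(dd) = 1.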

where P(w_0) = q_0 if w_0 = u and 1 − q_0 if w_0 = d, and more generally

    P(w_j) = q_j^{w_0···w_{j−1}} if w_j = u,   1 − q_j^{w_0···w_{j−1}} if w_j = d.

Finally,

    V(h) = e^{−rNδt} E_Q(h(S_N)).

3.5 Basic martingale theory

We will now begin our formal study of martingales.

Definition 3.5.1. Let M_n be a stochastic process. We say M_n is a martingale if
1. E(|M_n|) < ∞, and
2. E(M_{n+1} | M_n = m_n, M_{n−1} = m_{n−1}, ..., M_0 = m_0) = m_n.

Noting that the values E(M_{n+1} | M_n = m_n, ..., M_0 = m_0) are the values of a random variable which we call E(M_{n+1} | M_n, M_{n−1}, ..., M_0), we may write condition 2 above as E(M_{n+1} | M_n, ..., M_0) = M_n.

We also define a stochastic process M_n to be a supermartingale if
1. E(|M_n|) < ∞, and
2. E(M_{n+1} | M_n = m_n, ..., M_0 = m_0) ≤ m_n (or equivalently E(M_{n+1} | M_n, ..., M_0) ≤ M_n),

and we say M_n is a submartingale if
1. E(|M_n|) < ∞, and
2. E(M_{n+1} | M_n = m_n, ..., M_0 = m_0) ≥ m_n (or equivalently E(M_{n+1} | M_n, ..., M_0) ≥ M_n).

The following theorem is one expression of the fair game aspect of martingales.

Theorem 3.5.1.
1. If Y_n is a martingale, then E(Y_n) = E(Y_m) for all n, m.
2. If Y_n is a supermartingale, then E(Y_n) ≤ E(Y_m) for n ≥ m.
3. If Y_n is a submartingale, then E(Y_n) ≥ E(Y_m) for n ≥ m.

Proof. In the supermartingale case, for n ≥ m, the tower property of conditional expectation (iterated from n down to m) implies

    E(Y_n) = E(E(Y_n | Y_m, Y_{m−1}, ..., Y_0)) ≤ E(Y_m).

The other two cases are similar. Q.E.D.

Examples.

1. Let X_n = a + ξ_1 + ··· + ξ_n, where the ξ_i are independent random variables with E(|ξ_i|) < ∞ and E(ξ_i) = 0. Then X_n is a martingale. For

    E(X_{n+1} | X_0 = x_0, ..., X_n = x_n) = E(X_n + ξ_{n+1} | X_0 = x_0, ..., X_n = x_n)
        = E(X_n | X_0 = x_0, ..., X_n = x_n) + E(ξ_{n+1} | X_0 = x_0, ..., X_n = x_n).

Now, by property 1 of conditional expectation for the first summand and property 2 for the second, this is = X_n + E(ξ_{n+1}) = X_n.

It will often be useful to consider several stochastic processes at once, or to express a stochastic process in terms of other ones. But we will usually assume that there is some basic or driving stochastic process, and that the others are expressed in terms of this one and defined on the same sample space. Thus we make the following more general definition of a martingale.

Definition 3.5.2. (Generalization) Let X_n be a stochastic process. A stochastic process M_0, M_1, ... is a martingale relative to X_0, ..., X_n if
1. E(|M_n|) < ∞, and
2. E(M_{n+1} − M_n | X_0 = x_0, ..., X_n = x_n) = 0.

Supermartingales and submartingales relative to X_n are defined analogously.

A special case of this is when the ξ_i are IID with P(ξ_i = +1) = 1/2 and P(ξ_i = −1) = 1/2: the symmetric random walk.

2. There is a multiplicative analogue of the previous example. Let X_n = a ξ_1 ξ_2 ··· ξ_n, where the ξ_i are independent random variables with E(|ξ_i|) < ∞ and E(ξ_i) = 1. Then X_n is a martingale. We leave the verification of this to you.

3. Let S_n = S_0 + ξ_1 + ··· + ξ_n, with S_0 constant and the ξ_i independent random variables with E(ξ_i) = 0 and σ²(ξ_i) = σ_i². Set v_n = Σ_{i=1}^n σ_i². Then M_n = S_n² − v_n is a martingale relative to S_0, S_1, ..., S_n. For

    M_{n+1} − M_n = (S_{n+1}² − v_{n+1}) − (S_n² − v_n) = (S_n + ξ_{n+1})² − S_n² − (v_{n+1} − v_n) = 2S_n ξ_{n+1} + ξ_{n+1}² − σ_{n+1}²   (3.5.1)

and thus

    E(M_{n+1} − M_n | S_0, ..., S_n) = E(2S_n ξ_{n+1} + ξ_{n+1}² − σ_{n+1}² | S_0, ..., S_n) = 2S_n E(ξ_{n+1}) + σ_{n+1}² − σ_{n+1}² = 0,

and thus M_n is a martingale.

4. There is a very general way to get martingales, usually ascribed to Levy. Let X_n be a driving stochastic process and let Y be a random variable defined on the same sample space. Then we may define Y_n = E(Y | X_n, ..., X_0). Then Y_n is a martingale. For, by the tower property of conditional expectation,

    E(Y_{n+1} | X_n, ..., X_0) = E(E(Y | X_{n+1}, ..., X_0) | X_n, ..., X_0) = E(Y | X_n, ..., X_0) = Y_n.

We now introduce a very useful notation. Given a stochastic process X_k, let F(X_0, ..., X_n) be the collection of random variables that can be written as g(X_0, ..., X_n) for some function g of n + 1 variables. More compactly, we let F_n = F(X_0, ..., X_n). F_n represents all the information that is knowable about the stochastic process X_k at time n, and we call it the filtration induced by X_n. This will be very important to us. We write E(Y | F_n) for E(Y | X_0, ..., X_n). We also have an analogous statement for conditioning on a set of variables. We restate the basic properties of conditional expectation in this notation.
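Example 3 can be verified exactly, not just on average: for ±σ_i steps one can enumerate every path prefix and confirm E(M_{n+1} | first n steps) = M_n at each node. A small sketch (function name mine; the ±σ_i walk is the special case used for concreteness):

```python
from itertools import product

def quadratic_martingale_gap(sigmas):
    """Largest violation of E(M_{k+1} | first k steps) = M_k for
    M_n = S_n^2 - v_n, where S_n = xi_1 + ... + xi_n, xi_i = +/- sigma_i
    each with probability 1/2, checked exactly over every path prefix."""
    worst = 0.0
    for k in range(len(sigmas)):              # condition on the first k steps
        vk1 = sum(s * s for s in sigmas[:k + 1])
        for signs in product((1, -1), repeat=k):
            Sk = sum(e * s for e, s in zip(signs, sigmas[:k]))
            Mk = Sk * Sk - sum(s * s for s in sigmas[:k])
            # one-step conditional expectation over xi_{k+1} = +/- sigma_{k+1}
            cond = 0.5 * ((Sk + sigmas[k]) ** 2 - vk1) \
                 + 0.5 * ((Sk - sigmas[k]) ** 2 - vk1)
            worst = max(worst, abs(cond - Mk))
    return worst
```

The gap is zero (up to floating point), matching the algebra 2S_n E(ξ_{n+1}) + σ_{n+1}² − σ_{n+1}² = 0 in the text.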

Proposition 3.5.2. Let X_n be a stochastic process with filtration F_n, and let X and Y be random variables defined on the same sample space. Then
1. E(E(X | F_n) | F_m) = E(X | F_m) if m ≤ n.
2. If X, X_0, ..., X_n are independent, then E(X | F_n) = E(X).
3. If X ∈ F_n, so X = f(X_0, ..., X_n), then E(XY | F_n) = X E(Y | F_n).

Definition 3.5.3. Let X_n be a stochastic process and F_n its filtration. We say a process φ_n is
1. adapted to X_n (or to F_n) if φ_n ∈ F_n for all n;
2. previsible relative to X_n (or F_n) if φ_n ∈ F_{n−1}.

We motivate the definitions above and the following construction by thinking of a gambling game. Suppose we are playing a game and our winnings per unit bet in the nth hand is X_n − X_{n−1}. (Think of S_n, a stochastic process representing the price of a stock at the nth tick.) Let φ_n denote the amount of our bet on the nth hand. It is a completely natural assumption that φ_n should only depend on the history of the game up to time n − 1. This is the previsible hypothesis. Given a bet of φ_n on the nth hand, our winnings playing this game up until time n will be

    φ_1(X_1 − X_0) + φ_2(X_2 − X_1) + ··· + φ_n(X_n − X_{n−1}) = Σ_{j=1}^n φ_j(X_j − X_{j−1}) = Σ_{j=1}^n φ_j ∆X_j.

This is called the martingale transform of φ with respect to X_n. We denote it more compactly by (Σ φ∆X)_n. The name comes from the second theorem below.

Theorem 3.5.2.
1. Let Y_n be a supermartingale relative to X_0, ..., X_n, and let φ_n be a previsible process with 0 ≤ φ_n ≤ c_n. Then G_n = G_0 + Σ_{m=1}^n φ_m(Y_m − Y_{m−1}) is an F_n-supermartingale.
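The martingale transform is just a running sum of bet-weighted increments; a minimal sketch (function name and calling convention are mine):

```python
def martingale_transform(phis, xs):
    """Return [(sum phi dX)_1, ..., (sum phi dX)_{len(xs)-1}].

    phis[j-1] is the bet phi_j on hand j; previsibility means the caller must
    choose phis[j-1] using only xs[:j], i.e. the history up to time j-1.
    """
    g, out = 0.0, []
    for j in range(1, len(xs)):
        g += phis[j - 1] * (xs[j] - xs[j - 1])   # phi_j * (X_j - X_{j-1})
        out.append(g)
    return out
```

For example, bets (1, 2, 3) against the path X = (0, 1, 0, 1) give cumulative winnings (1, −1, 2).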

Proof. G_{n+1} = G_n + φ_{n+1}(Y_{n+1} − Y_n). Thus, since φ_{n+1} ∈ F_n,

    E(G_{n+1} − G_n | X_0, ..., X_n) = E(φ_{n+1}(Y_{n+1} − Y_n) | X_0, ..., X_n) = φ_{n+1} E(Y_{n+1} − Y_n | X_0, ..., X_n) ≤ φ_{n+1} · 0 ≤ 0.

Q.E.D.

2. If Y_n is a martingale relative to X_i, φ_n is previsible and |φ_n| ≤ c_n, then G_n = G_0 + Σ_{m=1}^n φ_m(Y_m − Y_{m−1}) is a martingale.

Definition 3.5.4. A random variable T is called a stopping time relative to X_i if the event {T = n} is determined by X_0, ..., X_n.

Example: In this example, we define a simple betting strategy of betting 1 until the stopping time tells us to stop. Given a stopping time T, set

    φ_n(ω) = 1 if T(ω) ≥ n,  0 if T(ω) < n.   (3.5.2)

Let's check that φ_n ∈ F(X_0, ..., X_{n−1}), i.e. that φ_n is previsible. Now

    {φ_n(ω) = 0} = {T ≤ n − 1} = ∪_{k=0}^{n−1} {T = k}   (3.5.3)

is determined by X_0, ..., X_{n−1}, since by definition each event {T = k} is determined by X_0, ..., X_k. Thus φ_n is previsible.

Let T ∧ n be the minimum of T and n, so that

    (T ∧ n)(ω) = T(ω) if T(ω) < n,  n if T(ω) ≥ n.   (3.5.4)

Theorem 3.5.3.
1. Let Y_n be a supermartingale relative to X_i and T a stopping time. Then Y_{T∧n} is a supermartingale.
2. Let Y_n be a martingale relative to X_i and T a stopping time. Then Y_{T∧n} is a martingale.
3. Let Y_n be a submartingale relative to X_i and T a stopping time. Then Y_{T∧n} is a submartingale.

Proof: Let φ_n be the previsible process defined above.

Then

    G_n = Y_0 + Σ_{m=1}^n φ_m(Y_m − Y_{m−1}) = Y_0 + φ_1(Y_1 − Y_0) + φ_2(Y_2 − Y_1) + ··· + φ_n(Y_n − Y_{n−1})
        = Y_n if n ≤ T,  Y_T if n > T,
        = Y_{T∧n}.   (3.5.5)

So the theorem follows from Theorem 3.5.2 applied to this φ_n. Q.E.D.

We now discuss one last theorem before getting back to the finance.

Theorem 3.5.4. (Doob's optional stopping theorem) Let M_n be a martingale relative to X_n, and T a stopping time relative to X_n with P(T < ∞) = 1. If there is a K such that |M_{T∧n}| ≤ K for all n, then E(M_T) = E(M_0).

Proof. We know that E(M_{T∧n}) = E(M_0), since by the previous theorem M_{T∧n} is a martingale. Then

    |E(M_0) − E(M_T)| = |E(M_{T∧n}) − E(M_T)| ≤ 2K P(T > n) → 0 as n → ∞.   (3.5.6)

Q.E.D.

An alternative version, under more realistic hypotheses, is

Theorem 3.5.5. If M_n is a martingale relative to X_n, and T is a stopping time with |M_{n+1} − M_n| ≤ K and E(T) < ∞, then E(M_T) = E(M_0).

Example: The following betting strategy of doubling after each loss is known to most children and is in fact THE classic martingale, the object that gives martingales their name. Let the ξ_i be independent, identically distributed random variables with P(ξ_i = ±1) = 1/2, and let X_n = Σ_{i=1}^n ξ_i, which is a martingale. Now define the previsible process φ_n in the following way: φ_1 = 1, and inductively φ_n = 2φ_{n−1} if ξ_{n−1} = −1 and φ_n = 1 if ξ_{n−1} = 1. Define the stopping time T = min{n | ξ_n = 1}. Now consider the process G_n = (Σ φ∆X)_n, which is a martingale by a previous theorem. One computes directly that G_T = 1 always: when you finally win according to this betting strategy, you always win 1. This seems like a sure-fire way to win. But it requires an arbitrarily large pot to make this work.

On the other hand, if X_n is a martingale and φ_n is previsible, then the martingale transform
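The doubling strategy is easy to simulate, and the simulation makes both halves of the point at once: the net winnings G_T equal 1 on every play, while the stake required before the first win is unbounded. A minimal sketch (function name mine):

```python
import random

def doubling_until_win(rng, max_rounds=10_000):
    """Bet 1, double the stake after every loss, stop at the first win.
    Returns (net winnings, largest stake that was needed)."""
    stake, losses, largest = 1, 0, 1
    for _ in range(max_rounds):
        largest = max(largest, stake)
        if rng.random() < 0.5:        # heads: win this round
            return stake - losses, largest
        losses += stake               # tails: lose the stake
        stake *= 2                    # the classic "martingale" doubling
    raise RuntimeError("no win within max_rounds")

rng = random.Random(0)
plays = [doubling_until_win(rng) for _ in range(1000)]
assert all(net == 1 for net, _ in plays)   # G_T = 1 on every play
```

After k losses the gambler is down 1 + 2 + ··· + 2^{k−1} = 2^k − 1 and bets 2^k, so a win nets exactly 1; the largest stakes observed across plays show why an arbitrarily large pot is needed.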

(Σ φ∆X)_n is also a martingale. The martingale representation theorem is a converse of this: it gives conditions under which a martingale can be expressed as a martingale transform of a given martingale. For this, we need a condition on our driving martingale. Thus we define:

Definition 3.5.5. A generalized random walk or binomial process is a stochastic process X_n in which at each time n, the value of X_{n+1} − X_n can only be one of two possible values. This is like an ordinary random walk, in that at every step the random walker can take only 2 possible steps, but the steps she can take depend on the path up to time n. Thus

    X_{n+1} = X_n + ξ_{n+1}(X_1, ..., X_n).

Hence assume that P(ξ_{n+1}(X_1, ..., X_n) = a) = p and P(ξ_{n+1}(X_1, ..., X_n) = b) = 1 − p, where I am being a little sloppy in that a, b and p are all dependent on X_1, ..., X_n. Then

    E(X_{n+1} | F_n) = E(X_n + ξ_{n+1}(X_0, ..., X_n) | F_n) = X_n + E(ξ_{n+1}(X_0, ..., X_n) | F_n) = X_n + pa + (1 − p)b.   (3.5.7)

So in order that X_n be a martingale we need

    pa + (1 − p)b = 0.   (3.5.8)

Theorem 3.5.6. Let X_0, X_1, ... be a generalized random walk with associated filtration F_n, and let M_n be a martingale relative to F_n. Then there exists a previsible process φ_n such that

    M_n = M_0 + Σ_{i=1}^n φ_i(X_i − X_{i−1}) = M_0 + (Σ φ∆X)_n.

Proof. Since M_n is adapted to X_n, there exist functions f_n such that M_n = f_n(X_0, ..., X_n). By induction, it suffices to show there exists φ_{n+1} ∈ F_n with M_{n+1} = φ_{n+1}(X_{n+1} − X_n) + M_n. Now by assumption, X_{n+1} can take on only 2 possible values: X_n + a, with probability p, and X_n + b, with probability 1 − p. So

    0 = E(M_{n+1} − M_n | F_n) = E(f_{n+1}(X_0, ..., X_{n+1}) − f_n(X_0, ..., X_n) | F_n)
      = E(f_{n+1}(X_0, ..., X_{n+1}) | F_n) − f_n(X_0, ..., X_n)
      = (f_{n+1}(X_0, ..., X_n + a) − f_n(X_0, ..., X_n)) · p + (f_{n+1}(X_0, ..., X_n + b) − f_n(X_0, ..., X_n)) · (1 − p).   (3.5.9)

Assuming neither a, b, p nor 1 − p is zero, the first summand equals

    ((f_{n+1}(X_0, ..., X_n + a) − f_n(X_0, ..., X_n))/a) · ap,

and, since b(1 − p) = −ap by (3.5.8), the second equals

    −((f_{n+1}(X_0, ..., X_n + b) − f_n(X_0, ..., X_n))/b) · ap.

Dividing by ap gives

    (f_{n+1}(X_0, ..., X_n + a) − f_n(X_0, ..., X_n))/a = (f_{n+1}(X_0, ..., X_n + b) − f_n(X_0, ..., X_n))/b.   (3.5.10)

Set φ_{n+1} to be equal to this common value. Clearly, φ_{n+1} only depends on X_0, ..., X_n and is therefore previsible. Now we have two cases to consider: X_{n+1} = X_n + a or X_{n+1} = X_n + b. Both calculations are the same. So if X_{n+1} = X_n + a, then

    φ_{n+1}(X_{n+1} − X_n) + M_n = φ_{n+1} a + M_n = (f_{n+1}(X_0, ..., X_n + a) − f_n(X_0, ..., X_n)) + f_n(X_0, ..., X_n) = M_{n+1}.

Q.E.D.

3.6 Pricing a derivative

We now discuss the problem we were originally out to solve: to value a derivative. So suppose we have a stock whose price process we denote by S_n, where each tick of the clock represents some time interval δt. We will assume, to be concrete, that our riskless bond is given by the standard formula B_n = e^{nrδt} B_0. Suppose h denotes the payoff function from a derivative on S_n.

Just as in the one period case, we will form a portfolio Π consisting of the stock and bond: Π_n = φ_n S_n + ψ_n B_n. We require two conditions on Π:

1. φ_n and ψ_n should be previsible. From a financial point of view this is a completely reasonable assumption, since decisions about what to hold during the nth tick must be made prior to the nth tick.

2. The portfolio Π should be self-financing. This too is financially justified: the claim needs no infusion of money and provides no income until expiration, and this is also what we require of the portfolio. In order for the portfolio to be viewed as equivalent to the claim h, it must act like the claim.

The self-financing condition is derived from the following considerations. After tick n − 1, the portfolio is worth Π_{n−1} = φ_{n−1} S_{n−1} + ψ_{n−1} B_{n−1}. Between tick n − 1 and tick n we readjust the portfolio in anticipation of the nth tick. But we only want to use the resources that are available to us via the portfolio itself, and between the (n−1)st and nth tick the stock is still worth S_{n−1}. Therefore we require

    (SFC1)   φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1} S_{n−1} + ψ_{n−1} B_{n−1}.

We use the following notation: if V_n represents some price process, then Ṽ_n = e^{−nrδt} V_n is the discounted price process. Note that since B_n = e^{nrδt} B_0, the discounted bond price is the constant B_0, so Π̃_n = φ_n S̃_n + ψ_n B_0.

There are three steps to pricing h.

Step 1: Find Q such that S̃_n = e^{−rnδt} S_n is a Q-martingale. Let F_n denote the filtration corresponding to S̃.

Step 2: Set Ṽ_n = E_Q(e^{−rNδt} h(S_N) | F_n). By the tower property, Ṽ_n is a Q-martingale as well.

Step 3: Construct a self-financing hedging portfolio. By the martingale representation theorem, there exists a previsible φ_i such that

    Ṽ_n = V_0 + Σ_{i=1}^n φ_i(S̃_i − S̃_{i−1}).

As for ψn .6. So set ˜ ˜ ψn = (Vn − φn Sn )/B0 or equivalently ψn = (Vn − φn Sn )/Bn ˜ ˜ ˜ Clearly. 1. Now we want Πn = φn Sn + ψn B0 to equal Vn . we have ˜ ˜ Vn − φn Sn ψn = B0 n−1 ˜ ˜ ˜ ˜ ˜ φn (Sn − Sn−1 ) + j=1 φj (Sj − Sj−1 ) − φn Sn = B0 n−1 ˜ ˜ ˜ −φn Sn−1 + j=1 φj (Sj − Sj−1 ) = B0 .6. (3.3. 2. PRICING A DERIVATIVE 79 Another equivalent formulation of the self-ﬁnancing condition is arrived at by (∆Π)n = Πn − Πn−1 = φn Sn + ψn Bn − (φn−1 Sn−1 + ψn−1 Bn−1 ) = φn Sn + ψn Bn − (φn Sn−1 + ψn Bn−1 ) = φn (Sn − Sn−1 ) − ψn (Bn − Bn−1 ) = φn (∆S)n + ψn (∆B)n So we have (SCF 2) (∆Π)n = φn (∆S)n + ψn (∆B)n .1) ˜ Now we have Vn = EQ (e−rN δt h(SN )|Fn ) and we can write n ˜ Vn = V0 + i=1 ˜ φi ∆Si . Πn = φn Sn + ψn B0 = Vn . self-ﬁnancing of the portfolio. ˜ ˜ ˜ for some previsible φi . previsibility of φn and ψn . Previsibility: φn is previsible by the martingale representation theorem. We now need to check two things: 1.

Thus suppose that we want to invest in a stock. U is required to satisfy two conditions: .80 CHAPTER 3. This does not take into account our aversion to risk. Self-ﬁnancing: We need to verify φn Sn−1 + ψn Bn−1 = φn−1 Sn−1 + ψn−1 Bn−1 or what is the same thing (SF C3) ˜ (φn − φn−1 )Sn−1 + (ψn − ψn−1 )B0 = 0 We calculate the right term (ψn − ψn−1 )B0 =( ˜ ˜ ˜ ˜ Vn − φn Sn Vn−1 − φn−1 Sn−1 − )B0 B0 B0 ˜ ˜ ˜ ˜ = Vn − Vn−1 + φn−1 Sn−1 − φn Sn ˜ ˜ ˜ ˜ = φn (Sn − Sn−1 ) + φn−1 Sn−1 − φn Sn ˜ = −(φn − φn−1 )Sn−1 which clearly implies (SFC3). but the techniques are valid in much more generality. 2. ψn ) which maximizes something. It is better to maximize E(U (Πn )) where U is a (differentiable) utility function that encodes our risk aversion. because it has a higher expected return. In our ﬁrst go round. To maximize the expected value of ΠN where N represents the time horizon. we must have that the value of h ˜ at time n (including time 0 or now) is V (h)n = Πn = Vn = ernδt Vn = ernδt EQ (e−rN δt h(SN )) = e−r(N −n)δt EQ (h(SN )) 3.7 Dynamic portfolio selection In this section we will use our martingale theory and derivative pricing theory to solve a problem in portfolio selection. We will carry this out in a rather simpliﬁed situation. By the no-arbitrage hypothesis. we just want to ﬁnd a portfolio Πn = (φn . Sn and the bond Bn . So ﬁnally we have constructed a self-ﬁnancing hedging strategy for the derivative h. will just result in investing everything in the stock. DISCRETE TIME ˜ which is manifestly independent of Sn . hence ψn is previsible.

And ﬁnally. with no other additions or withdrawals. U should be decreasing. for Πn to be realistically tradeable. which just represents that more money is better than less. and means U (x) > 0 2. that is U < 0 which represents that the utility of the absolute gain is decreasing.7. not the risk neutral one. Self-ﬁnancing previsible portfolio processes Πn and 2.1) We will carry out the problem for U (x) = U0 (x) = ln(x) and leave the case of Up (x) as a problem. If in addition we want the value of the portfolio to satisfy Π0 = W0 then our problem max(EP (ln(ΠN ))) subject to Π0 = 0 can be reformulated as max(EP (ln(h))) subject to the value of h at time 0 is W0 . if we want to let this portfolio go. DYNAMIC PORTFOLIO SELECTION 81 1. In chapter 3 we saw that there was an equivalence between two sets of objects: 1. Some common utility functions are U0 (x) = ln(x) xp for 0 = p < 1 Up (x) = p (3. So our problem is to now maximize EP (ln(Πn )) but we have constraints. So we want Π0 = W0 for some initial wealth W0 . then we want it to be self ﬁnancing. Thus Πn should satisfy SFC.3. Here P represents the real world probabilities. want that the value of our portfolio at time 0 is the amount that we are willing to invest in it. since we want to maximize in the real world. Second. derivatives h on the stock S. U should be increasing. that is the utility of going from a billion dollars to a billion and one dollars is less than the utility of going from one hundred dollars to one hundred and one dollars. it also must satisfy that it is previsible.7. First.

According to this method. We need the following theorem which allows us to compare them. (Radon-Nikodym) Given two probabilities P and Q assume that if P (E) = 0 then Q(E) = 0 for any event E. For each outcome ω. to ﬁnd an extremum of the function max(EP (ln(h))) subject to EP (ξ exp(−rT )h)) = W0 we introduce the mutliplier λ. since the value of a derivative h at time 0 is EQ (exp(−rT )h) where Q is the risk neutral probabilities. In our discrete setting this theorem is easy. Then there exists a random variable ξ such that EQ (X) = EP (ξX) for any random variable X. DISCRETE TIME And now we may use our derivative pricing formula to look at this condition.7.82 CHAPTER 3. Theorem 3. set ξ(ω) = Q(ω)/P (ω).1. if P (ω) = 0 and 0 otherwise. Thus the above problem becomes max(EP (ln(h))) subject to EQ (exp(−rT )h)) = W0 Now we are in a position where we are really using two probabilities at the same time. And we solve d | =0 (EP (ln(h + v))) − λEP (ξ exp(−rT )(h + v))) = 0 d . Proof. Then EQ (X) = ω X(ω)Q(ω) Q(ω) = EP (ξX) P (ω) = ω X(ω)P (ω) So now our problem is max(EP (ln(h))) subject to EP (ξ exp(−rT )h)) = W0 We can now approach this extremal problem by the method of Lagrange multipliers.

. it must be the case that 1 − λξ exp(−rT ) = 0 h almost surely. . be IID with law P (Xj = ±1) = 1/2. . . Sn is just a binomial stock model with √ u = (1 + µ · δt + σ δt) and √ d = (1 + µ · δt − σ δt) (3. Thus let X1 .7. DYNAMIC PORTFOLIO SELECTION for all random variables v. To calculate λ we put h back into the constraint: W0 = EP (ξ exp(−rT )h) =EP (ξ exp(−rT ) · exp(rT )/(λξ)) =EP (1/λ) So λ = 1/W0 and thus ﬁnally we have h = W0 exp(rT )/ξ To go further we need to write down a speciﬁc model for the stock. X2 .7. In order for this expectation to be 0 for all v. Now the left hand side is EP ( d d | =0 ln(h + v)) − λEP ( | =0 ξ exp(−rT )(h + v)) = d d 1 EP ( v − λξ exp(−rT )v) = h 1 EP ([ − λξ exp(−rT )]v) = h 83 (3.3) The sample space for this process can be taken to be sequences x1 x2 x3 · · · xN where xi is u or d representing either an up move or a down move.7. Set S0 be the initial value of the stock and inductively √ Sn = Sn−1 (1 + µ · δt + σ δtXn ) In other words. Hence h = exp(rT )/(λξ).2) and we want this to be 0 for all random variables v.3. Then by our formula for the risk neutral probability √ exp rδt − d rδt − (µδt − σ δt) √ √ q= = u−d (µδt + σ δt) − (µδt − σ δt) .

φn = Vn − Vn−1 Sn − Sn−1 Now let’s recall that by the fact that Sn is a martingale with respect to Q u d Sn−1 = EQ (Sn |Fn−1 ) = q Sn + (1 − q)Sn .7. This tells us that we take Vn = EQ (exp(−rT )h|Fn ) Then by the martingale representation theorem there is a previsible process φn such that n Vn = V0 + k=1 φk (Sk − Sk−1 ) Moreover. 3.84 up to ﬁrst order in δt. For this we use our theory from chapter 3 again.1: ξ(x1 x2 x3 · · · xN ) = ξ1 (x1 )ξ2 (x2 ) · · · ξN (xN ) where xi denotes either u or d and ξn (u) = and ξn (d) = q = p 1+( r−µ √ ) δt σ r−µ √ ) δt σ (3. So q= and 1−q = 1 2 1−( 1 2 1+( CHAPTER 3.4) 1−q = 1−p 1−( We now know which derivative has as its replicating portfolio the portfolio that solves our optimization problem. DISCRETE TIME r−µ √ ) δt σ r−µ √ ) δt σ This then determines the random variable ξ in the Radon-Nikodym theorem. it tells us how pick φn . We now calculate what the replicating portfolio actually is.7.

φn must be previsible and therefore independent of whether S takes an uptick or downtick at time n. Theorem 3. From this theorem it follows that Vn = EQ (h|Fn ) = Therefore we see that Vn = Vn−1 ξn (xn ) EP (ξ h|Fn ) EP (ξ|Fn ) EP (ξX|A) EP (ξ|A) .7.) Finally let’s remark that φn can be expressed in terms of the undiscounted prices φn = u d Vn − Vn u d Sn − Sn = u d Vn − Vn u d Sn − Sn since both the numerator and the denominator are expressions at the same time n and therefore the discounting factor for the numerator and the denominator cancel. Now we use the following reﬁnement of the Radon-Nikodym theorem. we see that if S takes an uptick at time n that φn = Vn − Vn−1 Sn − Sn−1 = u u d Vn − (q Vn + (1 − q)Vn ) u u d Sn − (q Sn + (1 − q)Sn ) = u d Vn − Vn u d Sn − S n A similar check will give the same exact answer if S took a downtick at time n.3. (After all. Similarly. Then for any event A. we have d u Vn−1 = EQ (Vn |Fn−1 ) = q Vn + (1 − q)Vn From this. DYNAMIC PORTFOLIO SELECTION 85 u where Sn denotes the value of Sn if S takes an uptick at time n.7. Given two probabilities P and Q assume that if P (E) = 0 then Q(E) = 0 for any event E. we have EQ (X|A) = I will ask you to prove this. Let ξ be the random variable such that EQ (X) = EP (ξX) for any random variable X.2.

7. So we have φn = d u Vn − Vn u d S n − Sn Vn−1 = Vn−1 Sn−1 Vn−1 Sn−1 = = 1 − 1−( r−µ )√δt ( ) σ √ Sn−1 (2σ δt) √ √ 1 − ( r−µ ) δt − 1 + ( r−µ ) δt σ σ √ r−µ 2 1 − ( σ ) δt 2σ δt √ 2( r−µ ) δt σ √ − 2σ δt (1+( r−µ ) σ 1 √ δt) (3.5) up to terms of order δt.86 CHAPTER 3.7. DISCRETE TIME recalling that xn denotes the nth tick and ξn is given by (3.4).7. Vn = ξVn−1) so n (xn u Vn = Vn−1 Vn−1 = √ ξn (u) 1 + ( r−µ ) δt σ Vn−1 Vn−1 = √ ξn (d) 1 − ( r−µ ) δt σ d Vn = We also have so √ u Sn = Sn−1 (1 + µδt + σ δt) √ u Sn = Sn−1 (1 + (µ − r)δt + σ δt) √ d Sn = Sn−1 (1 + (µ − r)δt − σ δt) up to ﬁrst order in δt. So we have φn = Vn−1 ( µ−r ) σ2 Sn−1 Vn−1 ( µ−r ) σ2 = Sn−1 (3. Vn−1 represents the value of the portfolio at time n − 1. Similarly to ﬁrst order in δt. and φn represents .6) the last equality is because the quotient of the two discounting factors cancel. e Now we are ready to ﬁnish our calculation.

the optimal portfolio is one where one invests all ones wealth in the S&P 500.058/(. you should have the proportion µ−r of your portσ2 folio invested in the stock. . with log utility.3. the S&P 500 has had excess return µ − r over the short term treasury bills of about .7. Over the 50 year period from 1926 to 1985.216)2 = 1.6) expresses σ2 the fact that at every tick. Doing the calculation with the values above yields .216.058 with a standard deviation of σ of . DYNAMIC PORTFOLIO SELECTION 87 the number of shares of stock in this portfolio. Thus formula (3. Hence the amount of money invested in the stock is φn Sn−1 = ( µ−r )Vn−1 .23 = 123%. That is. and borrows an additional 23% of current wealth and invests it also in the S&P.7.

3 and 4 you draw two balls. Let Bi denote the event that i of the coins landed heads up. DISCRETE TIME 3. Recalling the derivation of the formula above for constant interest rate.88 CHAPTER 3. Let X be a random variable with pdf p(x) = α exp(−αx). In the next three problems calculate E(X|Y ) for the two random variables X and Y . A fair die is rolled twice. a dime and a quarter. 6. X is the sum of the values. for the one-step binomial model. If B is a bond with par value 1 at time T . Also. what is EQ (S1 ). Let X denote the sum of the amount shown.8 Exercises d u 1. That is. describe E(X|Y ) as a function of the values of Y . Now assume that interest rates are not constant but are described by a known function of time r(t). (a continuous distribution. 7. For the pseudo-probabilities q of going up and 1 − q of going down that we calculated in class. 2. Your answer should have an integral in it. 4. 5. We arrived at the following formula for discounting. From a bag containing four balls numbered 1. Let S0 .) Calculate E(X|{X ≥ t}). so X is the sum of the values of the coins that landed tales up. If at least one of the balls is numbered 2 you win $1 otherwise . S1 . Toss a nickel. calculate the value of the same bond under the interest rate function r(t). i = 0. construct the pdf for this random variable. then at time t < T the bond is worth exp(−r(T − t)) assuming constant interest rates. Let X denote the sum of the values. What is E(X|Bi ). A die is rolled twice. that is. 3. 3. 2. S1 be as in class. a coin says how much it is worth only on the tales side. What is E(X|A) where A is the event that the ﬁrst roll was a 2. and Y is the value on the ﬁrst roll. 1. 2.

W2 = −1) (d) E(X|W1 = 1. since their expectations are 0. EXERCISES 89 you lose a $1. you are asked to derive what is called the gambler’s ruin using techniques from Martingale theory. 9. Let Wn denote the symmetric random walk starting at 0. A bag contains three coins. Suppose a gambler plays a game repeatedly of tossing a coin with probability p of getting heads. let Mn be a Martingale and let ξj = Mj − Mj−1 be its increments. In this exercise you are asked to prove that. Let X denote the number of steps to the right the random walker takes in the ﬁrst 6 steps. (b) E(ξj ξk ) = 0 for all j = k. though they may not be independent. Suppose the gambler begins with $a . but one of them is fake and has two heads. Calculate (a) E(X|W1 = −1). Let X be the number of heads tossed and Y denote the number of honest coins drawn.) (Hint: You will have to use several of the properties of conditional expectations. in which case he wins $1 and 1 − p of getting tails in which case he loses $1. Show (a) E(ξj ) = 0 for all j. Thus. W2 = −1. W3 = −1) 10. Let Y denote the number on the ﬁrst ball chosen. (c) E(X|W1 = 1. (b) E(X|W1 = 1).) In the following three problems. (This means that ξj and ξk have 0 covariance. Two coins are drawn and tossed. that is p = 1/2 and W0 = 0. Let X denote the amount won or lost.3.8. they are uncorrelated. then n Mn = j=1 ξj is a Martingale. We’ve seen that if ξn are independent with expectations all 0. It is not necessary for the ξj ’s to be independent. 8.

12.) What betting strategy should the gambler adopt. T is the time when the gambler goes bankrupt or achieves a fortune of b. 3. In the following problems. As above. 13. Calculate P (ST = 0). in the following way. . Suppose a gambler is playing a game repeatedly where the winnings of a $1 bet pay-oﬀ $1 and the probability of winning each round is p > 1/2 is in the gambler’s favor. 14. 2. calculate E(MT ). It is not hard to show that if the gambler wants to maximize his expected fortune after n trials. Express E(AT ) in terms of E(ST ) and E(T ). Deﬁne An = Sn − n(2p − 1) and Mn = ( 1−p )Sn . Mathematically. Using Doob’s optional stopping theorem. (By counting cards. Let Sn denote the gamblers fortune after the nth toss. you are asked to ﬁnd an optimal betting strategy using Martingale methods. . j = 1. It turns out to make more sense to maximize not ones expected fortune. that is Sn = a + ξ1 + · · · + ξn Deﬁne the stopping time. Calculate E(ST ). let ξj . T = min{n|Sn = 0 or b} that is. the probability that the gambler will eventually go bankrupt. let ξj be IID with P (ξj = 1) = p and P (ξj = −1) = 1 − p and assume this time that p > 1/2. a < b. be IID random variables with distribution P (ξj = 1) = p and P (ξj = −1) = 1 − p. but the expected value of the logarithm his fortune. blackjack can be made such a game. Now use the deﬁnition of E(MT ) in terms of P (ST = j) to express P (ST = 0) in terms of E(MT ) and solve for P (ST = 0). where 0 < p < 1. DISCRETE TIME and keeps gambling until he either goes bankrupt or gets $b. then he should bet his entire fortune each round. .90 CHAPTER 3. Let Fn denote the fortune of the . This seems counter intuitive and in fact doesn’t take into much account the gambler’s strong interest not to go bankrupt. Show An and Mn are Martingales. p 11. Solve for E(T ). Calculate E(T ) the expected time till bust or fortune in the following way: calculate E(AT ) by the optional stopping theorem.

3. EXERCISES 91 gambler at time n. Show that for some strategy. Reduce to terms of order δt. What is the strategy? (Hint: You should bet a ﬁxed percentage of your current fortune. You will have to use Taylor’s theorem. Your answer should not have a δt in it. Carry out a similar analysis to the above using the utility function p Up (x) = xp . . in the jth round he bets $φj . Then we can express his fortune n Fn = a + j=1 φj ξj We assume that 0 < φn < Fn−1 . that is. that is the gambler must bet at least something to stay in the game. p < 1 not equal to 0. Ln is a Martingale.) 17. Assume the gambler starts with $a and adopts a betting strategy φj .8. 15. 16. and he doesn’t ever want to bet his entire fortune. Show Ln = ln(Fn ) − ne is a super Martingale and thus E(ln(Fn /F0 )) ≤ ne. Let e = p ln(p) + (1 − p) ln(1 − p) + ln 2. (a previsible process).


Chapter 4

Stochastic processes and finance in continuous time

In this chapter we begin our study of continuous time finance. We could develop it similarly to the model of chapter 3, using continuous time martingales, change of measure, the martingale representation theorem, etc. But the probabilistic infrastructure needed to handle this is more formidable. An approach often taken is to model things by partial differential equations, which is the approach we will take in this chapter. We will however discuss continuous time stochastic processes and their relationship to partial differential equations in more detail than most books. We will warm up by progressing slowly from stochastic processes discrete in both time and space to processes continuous in both time and space.

4.1 Stochastic processes continuous in time, discrete in space

We segue from our completely discrete processes by now allowing the time parameter to be continuous but still demanding space to be discrete. Thus the second kind of stochastic process we will consider is a process X_t where the time t is a real-valued deterministic variable and, at each time t, the sample space S of the random variable X_t is a (finite or infinite) discrete set. In such a case, a state X of the entire process is described as a sequence of jump times (τ_1, τ_2, . . .) and a sequence of values of X (elements of the common

discrete state space) j_0, j_1, . . . . These describe a "path" of the system through S given by X(t) = j_0 for 0 < t < τ_1, X_t = j_1 for τ_1 < t < τ_2, etc. Because X moves by a sequence of jumps, these processes are often called "jump" processes. Mathematically, to describe the stochastic process we need, for each j ∈ S, a continuous random variable F_j that describes how long it will be before the process (starting at X = j) will jump, and a set of discrete random variables Q_j that describe the probabilities of jumping from j to other states when the jump occurs. We will let Q_{jk} be the probability that the state starting at j will jump to k (when it jumps), so

Σ_k Q_{jk} = 1 and Q_{jj} = 0.

We also define the function F_j(t) to be the probability that the process starting at X = j has already jumped at time t, i.e.,

F_j(t) = P(τ_1 < t | X_0 = j).

It will be convenient to define the quantity P_{jk}(t) to be the probability that X_t = k, given that X_0 = j; instead of writing P(· | X_0 = j), we will begin to write P_j(·), so P_{jk}(t) = P_j(X_t = k). We will make the usual stationarity and Markov assumptions as follows. All this is defined so that

1. P(τ_1 < t, X_{τ_1} = k | X_0 = j) = F_j(t) Q_{jk}    (4.1.1)

(this is an independence assumption: where we jump to is independent of when we jump). We assume further that

2. P_j(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l) = P(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l | X_0 = j) = F_j(s) Q_{jk} F_k(t) Q_{kl}.

Stationarity means that

we can take any time to be t = 0 and the rules governing the process will not change. (From a group-theoretic point of view this means that the process is invariant under time translation.) The Markov assumption is

3. P(X_t = k | X_{s_1} = j_1, . . . , X_{s_n} = j_n, X_s = j) = P_{jk}(t − s).

The three assumptions 1, 2, 3 will enable us to derive a (possibly infinite) system of differential equations for the probability functions P_{jk}(t) for j and k ∈ S. First, the stationarity assumption tells us that

P_j(τ_1 > t + s | τ_1 > s) = P_j(τ_1 > t).

This means that the probability of having to wait at least one minute more before the process jumps is independent of how long we have waited already. Using the formula for conditional probability (P(A|B) = P(A ∩ B)/P(B)) and the fact that the event τ_1 > t + s is a subset of the event τ_1 > s, we translate this equation into

1 − F_j(t + s) = (1 − F_j(t))(1 − F_j(s)).

This implies that the function 1 − F_j(t) is a multiplicative function, so it must be an exponential function. It is decreasing, since F_j(t) represents the probability that something happens before time t, and F_j(0) = 0, so we must have that 1 − F_j(t) = e^{−q_j t} for some positive number q_j. In other words, F_j is exponentially distributed with probability density function

f_j(t) = q_j e^{−q_j t}.

To check this, note that for t ≥ 0

P_j(τ_1 > t) = 1 − F_j(t) = ∫_t^∞ q_j e^{−q_j s} ds = e^{−q_j t},

so the two ways of computing F_j(t) agree. The next consequence of our stationarity and Markov assumptions are the Chapman-Kolmogorov equations in this setting:

P_{jk}(t + s) = Σ_{l∈S} P_{jl}(t) P_{lk}(s).
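The memoryless property 1 − F_j(t + s) = (1 − F_j(t))(1 − F_j(s)) is easy to observe empirically for exponential holding times. A small sketch (the rate q = 1.5, the sample size, and the seed are arbitrary illustrative choices):

```python
import random, math

rng = random.Random(42)
q = 1.5                                   # illustrative jump rate q_j
samples = [rng.expovariate(q) for _ in range(200_000)]

s, t = 0.4, 0.7
survivors = [x for x in samples if x > s]
# conditional survival P(tau > t + s | tau > s) vs unconditional P(tau > t)
cond = sum(1 for x in survivors if x > s + t) / len(survivors)
uncond = math.exp(-q * t)                 # 1 - F_j(t) = e^{-q_j t}
```

The two proportions agree up to Monte Carlo error, exactly as the functional equation predicts.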

They say that to calculate the probability of going from state j to state k in time t + s, one can sum up, over all possible intermediate states l, the probability of going from j to l in time t and then from l to k in the remaining time s. This is an obvious consistency equation, but it will prove very useful later on. Now we can begin the calculations that will produce our system of differential equations for P_{jk}(t). The first step is to express P_{jk}(t) based on the time of the first jump from j. The result we intend to prove is the following:

P_{jk}(t) = δ_{jk} e^{−q_j t} + ∫_0^t q_j e^{−q_j s} Σ_{l≠j} Q_{jl} P_{lk}(t − s) ds

As intimidating as this formula looks, it can be understood in a fairly intuitive way. We take it a term at a time:

δ_{jk} e^{−q_j t}: This term is the probability that the first jump hasn't happened yet, i.e., that the state has remained j from the start. This term is zero unless j = k, in which case it gives the probability that τ_1 > t, that no jump has occurred yet. That is why the Kronecker delta symbol is there (recall that δ_{jk} = 1 if j = k and δ_{jk} = 0 if j is different from k).

The integral term can best be understood as a "double sum", over all possible times s of the first jump and over all possible results l different from j of the first jump. For any s between 0 and t, the definition of f_j(t) tells us that the probability that the first jump occurs between time s and time s + ds is (approximately) f_j(s) ds = q_j e^{−q_j s} ds (where the error in the approximation goes to zero faster than ds as ds goes to zero). Therefore, the probability that the first jump occurs between time s and time s + ds and the jump goes from j to l is (approximately) q_j e^{−q_j s} Q_{jl} ds. Based on this, the probability that the first jump occurs between time s and time s + ds, the jump goes from j to l, and in the intervening time (between s and t) the process goes from l to k is (approximately)

q_j e^{−q_j s} Q_{jl} P_{lk}(t − s) ds.

To account for all of the ways of going from j to k in time t, we need to add up these probabilities over all possible times s of the first jump (this gives an integral with respect to ds in the limit as ds goes to 0) and over all possible results l different from j of the first jump. This reasoning yields the second term of the formula.

We can make a change of variables in the integral term of the formula: let s be replaced by t − s. Note that this will transform ds into −ds, but it will

also reverse the limits of integration. The result of this change of variables is (after taking out factors from the sum and integral that depend neither upon l nor upon s):

P_{jk}(t) = δ_{jk} e^{−q_j t} + q_j e^{−q_j t} ∫_0^t e^{q_j s} Σ_{l≠j} Q_{jl} P_{lk}(s) ds
          = e^{−q_j t} ( δ_{jk} + q_j ∫_0^t e^{q_j s} Σ_{l≠j} Q_{jl} P_{lk}(s) ds )

Now P_{jk}(t) is a continuous function of t, because it is defined as the product of a continuous function with a constant plus a constant times the integral of a bounded function of t (the integrand is bounded on any bounded t interval). But once P_{jk} is continuous, we can differentiate both sides of the last equation with respect to t and show that P_{jk}(t) is also differentiable. We take this derivative (and take note of the fact that when we differentiate the first factor we just get a constant times what we started with) using the fundamental theorem of calculus to arrive at:

P′_{jk}(t) = −q_j P_{jk}(t) + q_j Σ_{l≠j} Q_{jl} P_{lk}(t).

Note that if we know that X_0 = j, then it must be true that P_{jk}(0) = δ_{jk}. A special case of this differential equation occurs when t = 0. This observation enables us to write:

P′_{jk}(0) = −q_j P_{jk}(0) + q_j Σ_{l≠j} Q_{jl} P_{lk}(0) = −q_j δ_{jk} + q_j Σ_{l≠j} Q_{jl} δ_{lk} = −q_j δ_{jk} + q_j Q_{jk}.

One last bit of notation! Let q_{jk} = P′_{jk}(0). So q_{jj} = −q_j and q_{jk} = q_j Q_{jk} if j ≠ k. (Note that this implies that Σ_{k≠j} q_{jk} = q_j = −q_{jj},
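The differential equation just derived can be checked on the smallest nontrivial example: two states 0 and 1 with jump rates q_0, q_1 and Q_{01} = Q_{10} = 1, for which the transition probabilities have a well-known closed form. This sketch (the rates are illustrative choices) compares a numerical derivative of P_{00} with the right-hand side of the equation:

```python
import math

q0, q1 = 2.0, 3.0        # illustrative jump rates out of states 0 and 1
lam = q0 + q1            # here Q_01 = Q_10 = 1: each state jumps to the other

def P00(t):
    # closed-form transition probability for the two-state jump process
    return (q1 + q0 * math.exp(-lam * t)) / lam

def P10(t):
    return (q1 - q1 * math.exp(-lam * t)) / lam

t, h = 0.8, 1e-6
deriv = (P00(t + h) - P00(t - h)) / (2 * h)   # numerical P00'(t)
rhs = -q0 * P00(t) + q0 * P10(t)              # -q_0 P_00 + q_0 Q_01 P_10
```

The two quantities agree to within the accuracy of the finite difference, and P_{00}(0) = 1 as required by P_{jk}(0) = δ_{jk}.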

so the sum of the q_{jk} as k ranges over all of S is 0, for all j.) Given this notation, we can rewrite our equation for P′_{jk}(t) given above as follows:

P′_{jk}(t) = Σ_{l∈S} q_{jl} P_{lk}(t)

for all t > 0. This system of differential equations is called the backward equation of the process. Another system of differential equations we can derive is the forward equation of the process. To do this, we differentiate both sides of the Chapman-Kolmogorov equation with respect to s and get

P′_{jk}(t + s) = Σ_{l∈S} P_{jl}(t) P′_{lk}(s).

If we set s = 0 in this equation and use the definition of q_{jk}, we get the forward equation:

P′_{jk}(t) = Σ_{l∈S} q_{lk} P_{jl}(t).

The constants q_{jk} are called the infinitesimal parameters of the stochastic process.

4.2 Examples of stochastic processes continuous in time, discrete in space

The simplest useful example of a stochastic process that is continuous in time but discrete in space is the Poisson process. In these processes, the sample space for each random variable X_t is the set of non-negative integers, and the value of X can either jump up by 1 or stay the same. So if we think of X as representing a population, it is one that experiences only births (one at a time) at random times, but no deaths. The Poisson process is a special case of a set of processes called birth processes: for each non-negative integer j, we choose a positive real parameter λ_j which will be equal to both q_{j,j+1} and q_j = −q_{jj}. The time until the process jumps up by one, starting from X = j, is exponentially distributed with parameter λ_j. The specialization in the Poisson process is that λ_j is chosen to have the same value, λ, for all j. Since we move one step to the right (on the j-axis)

at each jump, we can conclude without much thought that P_{jk}(t) = 0 for k < j and t > 0, and that P_{jj}(t) = e^{−λt} for all t > 0. To learn any more, though, we must consider the system of differential equations (the "forward equations") derived in the preceding lecture. Recall that the forward equation

P′_{jk}(t) = Σ_{l∈S} q_{lk} P_{jl}(t)

gave the rate of change in the probability of being at k (given that the process started at j) in terms of the probabilities of jumping to k from any l just at time t (the terms of the sum for l ≠ k), minus the probability of already being at k and jumping away just at time t (recall that q_{kk} is negative). For the Poisson process, the forward equations simplify considerably, because q_{lk} is nonzero only if l = k or if k = l + 1. In particular, q_{jj} = −q_j = −λ and q_{j,j+1} = λ. Therefore the forward equation reduces to:

P′_{jk}(t) = Σ_{l=0}^{∞} q_{lk} P_{jl}(t) = q_{k−1,k} P_{j,k−1}(t) + q_{kk} P_{jk}(t) = λ P_{j,k−1}(t) − λ P_{jk}(t).

Since we already know P_{jk}(t) for k ≤ j, we will use the preceding differential equation inductively to calculate P_{jk}(t) for k > j, one k at a time. To do this, we need to review a little about linear first-order differential equations. In particular, we will consider equations of the form

f′ + λf = g

where f and g are functions of t and λ is a constant. To solve such equations, we make the left side have the form of the derivative of a product by multiplying through by e^{λt}:

e^{λt} f′ + λ e^{λt} f = e^{λt} g

We recognize the left as the derivative of the product of e^{λt} and f:

(e^{λt} f)′ = e^{λt} g

We can integrate both sides of this (with s substituted for t) from 0 to t to recover f(t):

∫_0^t (d/ds)(e^{λs} f(s)) ds = ∫_0^t e^{λs} g(s) ds

Of course, the integral of the derivative may be evaluated using the (second) fundamental theorem of calculus:

e^{λt} f(t) − f(0) = ∫_0^t e^{λs} g(s) ds

Now we can solve algebraically for f(t):

f(t) = e^{−λt} f(0) + e^{−λt} ∫_0^t e^{λs} g(s) ds

Using this formula for the solution of the differential equation, we substitute P_{jk}(t) for f(t) and λ P_{j,k−1}(t) for g(t). We get that

P_{jk}(t) = e^{−λt} P_{jk}(0) + λ ∫_0^t e^{−λ(t−s)} P_{j,k−1}(s) ds

Since we are assuming that the process starts at j, i.e., X_0 = j, we know that P_{jk}(0) = 0 for all k ≠ j and P_{jj}(0) = 1. In other words, P_{jk}(0) = δ_{jk} (this is the Kronecker delta). Now we are ready to compute. Note that we have to compute P_{jk}(t) for all possible nonnegative integer values of j and k. In other words, we have to fill in all the entries in the table:

P_{00}(t)  P_{01}(t)  P_{02}(t)  P_{03}(t)  P_{04}(t)  · · ·
P_{10}(t)  P_{11}(t)  P_{12}(t)  P_{13}(t)  P_{14}(t)  · · ·
P_{20}(t)  P_{21}(t)  P_{22}(t)  P_{23}(t)  P_{24}(t)  · · ·
P_{30}(t)  P_{31}(t)  P_{32}(t)  P_{33}(t)  P_{34}(t)  · · ·
   .          .          .          .          .

Of course, we already know part of the table:

e^{−λt}  P_{01}(t)  P_{02}(t)  P_{03}(t)  · · ·
0        e^{−λt}    P_{12}(t)  P_{13}(t)  · · ·
0        0          e^{−λt}    P_{23}(t)  · · ·
0        0          0          e^{−λt}    · · ·
 .        .          .          .

So we begin the induction: We know that P_{jj}(t) = e^{−λt}, so we can use the differential equation

P′_{j,j+1}(t) = λ P_{jj}(t) − λ P_{j,j+1}(t)

to get that

P_{j,j+1}(t) = λ ∫_0^t e^{−λ(t−s)} e^{−λs} ds = λt e^{−λt}

Note that this fills in the next diagonal of the table. Then we can use the differential equation

P′_{j,j+2}(t) = λ P_{j,j+1}(t) − λ P_{j,j+2}(t)

to get that

P_{j,j+2}(t) = λ ∫_0^t e^{−λ(t−s)} λs e^{−λs} ds = ((λt)²/2) e^{−λt}

Continuing by induction, we get that

P_{jk}(t) = ((λt)^{k−j}/(k − j)!) e^{−λt}

provided 0 ≤ j ≤ k and t > 0. This means that the random variable X_t (the infinite discrete random variable that gives the value of j at time t, i.e., that tells how many jumps have happened up until time t) has a Poisson distribution with parameter λt. This is the important connection between the exponential distribution and the Poisson distribution.
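The connection can be seen directly in simulation: generate exponential holding times, count the jumps by time t, and compare with the Poisson pmf just derived. A sketch (rate, horizon, and seed are illustrative choices):

```python
import random, math

lam, t = 2.0, 1.5            # illustrative rate and time horizon
rng = random.Random(7)

def count_jumps():
    """Number of jumps by time t when holding times are Exp(lam)."""
    elapsed, n = 0.0, 0
    while True:
        elapsed += rng.expovariate(lam)
        if elapsed > t:
            return n
        n += 1

trials = 100_000
counts = [count_jumps() for _ in range(trials)]
# empirical P(X_t = k), to be compared with (lam t)^k e^{-lam t} / k!
dist = [sum(1 for c in counts if c == k) / trials for k in range(6)]
```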

4.3 The Wiener process

4.3.1 Review of Taylor's theorem

Taylor series are familiar to every student of calculus as a means of writing functions as power series. The real force of Taylor series is what is called Taylor's theorem, which tells one how well the Taylor polynomials approximate the function. We will be satisfied with the following version of Taylor's theorem. First, we introduce some notation.

Definition 4.3.1. For two functions f and g, we say that f(h) is o(g(h)) (as h → 0) (read "f is little o of g") if

lim_{h→0} f(h)/g(h) = 0.

It of course means that f(h) is smaller than g(h) (as h gets small), but in a very precise manner. As an example, note that another way to say that f′(a) = b is to say that

f(a + h) − (f(a) + bh) = o(h).

That is, the difference between the function h ↦ f(a + h) and the linear function h ↦ f(a) + bh is small as h → 0. Taylor's theorem is just the higher order version of this statement.

Theorem 4.3.1. Let f be a function defined on an interval (α, β) which is n times continuously differentiable. Then for any a ∈ (α, β),

f(a + h) = f(a) + f′(a)h + (f″(a)/2!)h² + (f‴(a)/3!)h³ + · · · + (f^{(n)}(a)/n!)h^n + o(h^n).

We will in fact need a multivariable version of this.

Theorem 4.3.2. Let f(x, y) be a function of two variables defined on an open set O ⊂ R². Then for any point (a, b) ∈ O one has

f(a + h, b + k) = f(a, b) + (∂f/∂x)(a, b)h + (∂f/∂y)(a, b)k + (1/2)( (∂²f/∂x²)(a, b)h² + 2(∂²f/∂x∂y)(a, b)hk + (∂²f/∂y²)(a, b)k² ) + o(h², k², hk).


4.3.2 The Fokker-Planck equation

We will give a derivation of the Fokker-Planck equation, which governs the evolution of the probability density function of a random variable-valued function X_t that satisfies a "first-order stochastic differential equation". The variable evolves according to what in the literature is called a Wiener process, named after Norbert Wiener, who developed a great deal of the theory of stochastic processes during the first half of this century. We shall begin with a generalized version of random walk (an "unbalanced" version where the probability of moving in one direction is not necessarily the same as that of moving in the other). Then we shall let both the space and time increments shrink in such a way that we can make sense of the limit as both ∆x and ∆t approach 0. Recall from the previous sections the one-dimensional random walk (on the integers) beginning at W_0 = 0. We fix a parameter p, and let p be the probability of taking a step to the right, so that 1 − p is the probability of taking a step to the left, at any (integer-valued) time n. In symbols, this is: p = P(W_{n+1} = x + 1 | W_n = x) and 1 − p = P(W_{n+1} = x − 1 | W_n = x). Thus W_n = ξ_1 + · · · + ξ_n where the ξ_j are IID with P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p. Since we are given that W_0 = 0, we know that the probability function of W_n will be essentially binomially distributed (with parameters n and p, except that at time n, x will go from −n to +n, skipping every other integer as discussed earlier). We will set v_{x,n} = P(W_n = x). Then

v_{x,n} = C(n, (n+x)/2) p^{(n+x)/2} (1 − p)^{(n−x)/2}

for x = −n, −(n − 2), . . . , (n − 2), n, and v_{x,n} = 0 otherwise (here C(n, k) denotes the binomial coefficient). For use later, we calculate the expected value and variance of W_n. First,

E(W_n) = Σ_{x=−n}^{n} x v_{x,n} = Σ_{k=0}^{n} (2k − n) v_{2k−n,n} = Σ_{k=0}^{n} (2k − n) C(n, k) p^k (1 − p)^{n−k} = n(2p − 1)


(it's an exercise to check the algebra, but the result is intuitive). Also,

Var(W_n) = Σ_{k=0}^{n} (2k − n)² v_{2k−n,n} − (E(W_n))² = Σ_{k=0}^{n} (2k − n)² C(n, k) p^k (1 − p)^{n−k} − n²(2p − 1)²

= 4np(1 − p) (another exercise!). Now let's form a new stochastic process W^∆, where we let the space steps ∆x and the time steps ∆t be different from 1 (W^∆ is a shorthand for W^{∆t,∆x}). First, the time steps: we will take time steps of length ∆t. This will have little effect on our formulas, except that everywhere we will have to note that the number n of time steps needed to get from time 0 to time t is

n = t/∆t

Each step to the left or right has length ∆x instead of length 1. This has the effect of scaling W by the factor ∆x. Multiplying a random variable by a constant multiplies its expected value by the same constant and its variance by the square of the constant. Therefore, the mean and variance of the total displacement in time t are:

E(W_t^∆) = (t/∆t)(2p − 1)∆x   and   Var(W_t^∆) = 4p(1 − p) t (∆x)²/∆t
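These formulas are easy to confirm by simulating the scaled walk W^∆ directly. The step sizes below are illustrative choices (they happen to correspond to (∆x)²/∆t = 1, i.e. D = 1/2 in the notation introduced shortly):

```python
import random

dt, dx, p, t = 0.01, 0.1, 0.55, 1.0   # illustrative step sizes and bias
n = round(t / dt)                     # number of steps to reach time t
rng = random.Random(1)

def walk():
    # one sample of the scaled random walk W_t^delta
    return sum(dx if rng.random() < p else -dx for _ in range(n))

trials = 20_000
xs = [walk() for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials

pred_mean = (t / dt) * (2 * p - 1) * dx        # = 1.0 for these parameters
pred_var = 4 * p * (1 - p) * t * dx * dx / dt  # = 0.99 for these parameters
```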

We want both of these quantities to remain finite for all fixed t, since we don't want our process to have a positive probability of zooming off to infinity in finite time, and we also want Var(W_t^∆) to be non-zero; otherwise the process will be completely deterministic. We need to examine what we have control over in order to achieve these ends. First off, we can't let ∆x and ∆t go to zero in an arbitrary manner. In fact, it seems clear that in order for the variance to be finite and non-zero, we should insist that ∆t be roughly the same order of magnitude as (∆x)². This is because we don't seem to have control over anything else in the expression for the variance: it is reasonable that p and 1 − p vary with ∆x, but we probably don't want their product to approach zero (that would force our process to move always to the left or always to the right), and it can't approach infinity (since the maximum value of p(1 − p) is 1/4). So we


are forced to make the assumption that (∆x)²/∆t remain bounded as they both go to zero. In fact, we will go one step further: we will insist that

(∆x)²/∆t = 2D    (4.3.1)

for some (necessarily positive) constant D. Now, we are in a bind with the expected value. The expected value will go to infinity as ∆x → 0, and the only parameter we have to control it is 2p − 1. We need 2p − 1 to have the same order of magnitude as ∆x in order to keep E(W_t^∆) bounded. Since p = 1 − p when p = 1/2, we will make the assumption that

p = 1/2 + (c/(2D))∆x    (4.3.2)

for some real (positive, negative or zero) constant c, which entails

1 − p = 1/2 − (c/(2D))∆x

The reason for the choice of the parameters will become apparent: D is called the diffusion coefficient, and c is called the drift coefficient. Note that as ∆x and ∆t approach zero, the values v_{k,n}^∆ = P(W_{n∆t}^∆ = k∆x) = p(k∆x, n∆t) specify values of the function p at more and more points in the plane. In fact, the points at which p is defined become more and more dense in the plane as ∆x and ∆t approach zero. This means that given any point in the plane, we can find a sequence of points at which such a p is defined that approach our given point as ∆x and ∆t approach zero. We would like to be able to assert the existence of the limiting function, and that it expresses how the probability of a continuous random variable evolves with time. We will not quite do this, but, assuming the existence of the limiting function, we will derive a partial differential equation that it must satisfy. Then we shall use a probabilistic argument to obtain a candidate limiting function. Finally, we will check that the candidate limiting function is a solution of the partial differential equation. This isn't quite a proof of the result we would like, but it touches on the important steps in the argument. We begin with the derivation of the partial differential equation. To do this, we note that for the original random walk, v_{x,n} satisfies the following difference equation:

v_{k,n+1} = p v_{k−1,n} + (1 − p) v_{k+1,n}

This is just the matrix equation of the (infinite discrete) Markov process that defines the random walk. In terms of our assumptions about ∆x, ∆t, p and 1 − p, and using p(x, t) notation instead of v_{k∆x,n∆t}, this equation becomes:

(∗) p(x, t + ∆t) = p p(x − ∆x, t) + (1 − p) p(x + ∆x, t)

Next, we need Taylor's expansion for a function of two variables. We expand each of the three expressions in equation (∗) around the point (x, t). As we do the expansion, we must keep in mind that we are going to let ∆t → 0, so we must account for all terms of order up to and including that of ∆t. When we pass into the ∆x, ∆t realm, we must account for both ∆x and (∆x)² terms (but (∆x)³ and ∆t∆x both go to zero faster than ∆t). We get:

p(x, t + ∆t) = p(x, t) + ∆t (∂p/∂t)(x, t) + · · ·
p(x − ∆x, t) = p(x, t) − ∆x (∂p/∂x)(x, t) + ((∆x)²/2)(∂²p/∂x²)(x, t) + · · ·
p(x + ∆x, t) = p(x, t) + ∆x (∂p/∂x)(x, t) + ((∆x)²/2)(∂²p/∂x²)(x, t) + · · ·

We substitute these three equations into (∗), recalling that p + (1 − p) = 1 and keeping terms up to order ∆t and (∆x)², and get

∆t (∂p/∂t) = −(2p − 1)∆x (∂p/∂x) + ((∆x)²/2)(∂²p/∂x²)

(where all the partial derivatives are evaluated at (x, t)). Finally, we divide through by ∆t, and recall that (∆x)²/∆t = 2D and 2p − 1 = (c/D)∆x, to arrive at the equation:

∂p/∂t = −2c (∂p/∂x) + D (∂²p/∂x²)

This last partial differential equation is called the Fokker-Planck equation. To find a solution of the Fokker-Planck equation, we use a probabilistic argument to find the limit of the random walk distribution as ∆x and ∆t approach 0. The key step in this argument uses the fact that the binomial distribution with parameters n and p approaches the normal distribution with expected value np and variance np(1 − p) as n approaches infinity. Thus, we see that at time t, the binomial distribution has mean (t/∆t)(2p − 1)∆x = 2ct

and variance 4p(1 − p) t (∆x)²/∆t, which approaches 2Dt as ∆t → 0. Therefore, we expect W_t in the limit to be normally distributed with mean 2ct and variance 2Dt. Thus, the probability that W_t < x is

P(W_t < x) = (1/√(2π)) ∫_{−∞}^{(x−2ct)/√(2Dt)} e^{−y²/2} dy

Therefore the pdf of W_t is the derivative of this with respect to x, which yields the pdf

p(x, t) = (1/(2√(πDt))) e^{−(x−2ct)²/(4Dt)}

This we derived from the Fokker-Planck equation. A special case of this is when c = 0 and D = 1/2. The corresponding process is called the Wiener process (or Brownian motion). The probability density for W_t is

p(x, t) = (1/√(2πt)) e^{−x²/(2t)}

The following properties characterize the Wiener process, which we record as a theorem.

Theorem 4.3.3.
1. W_0 = 0 with probability 1.
2. W_t is a continuous function of t with probability 1.
3. For t > s, W_t − W_s is normally distributed with mean 0 and variance t − s.
4. For t ≥ s ≥ u ≥ v, W_t − W_s and W_u − W_v are independent.

Proof: Let's discuss these points.
1. This is clear since we defined W_t as a limit of random walks which all started at 0.
3. Again, we defined W_t as a limit of random walks; this we derived from the Fokker-Planck equation.
4. Now W_t^∆ = ∆x(ξ_1 + · · · + ξ_n) where n = t/∆t. So if t = n∆t, s = m∆t, u = k∆t and v = l∆t, where n ≥ m ≥ k ≥ l, then we have

W_t − W_s = ∆x(ξ_{m+1} + ξ_{m+2} + · · · + ξ_n)

and

W_u − W_v = ∆x(ξ_{l+1} + · · · + ξ_k)

which are independent. Q.E.D.

2. With probability 1, the paths W_t are continuous. This fact is beyond the scope of our discussion, but is intuitive. While the paths are continuous, they are very rough, and we see according to the next proposition that they are not differentiable. Almost everything we use about the Wiener process will follow from the theorem above.

Proposition 4.3.4. For all t > 0 and all n = 1, 2, . . .,

lim_{h→0} P(|W_{t+h} − W_t| > n|h|) = 1

so that W_t is nowhere differentiable.

Proof: Assume without loss of generality that h > 0. Then P(|W_{t+h} − W_t| > nh) = P(|W_h| > nh) by 3) of the theorem. Now this gives 1 − P(|W_h| ≤ nh), and

P(|W_h| ≤ nh) = (1/√(2πh)) ∫_{−nh}^{nh} e^{−x²/(2h)} dx ≤ (1/√(2πh)) ∫_{−nh}^{nh} 1 dx = 2hn/√(2πh) = n√(2h/π)

which converges to 0 as h → 0 for any n. Q.E.D.

We now do some sample computations to show how to manipulate the Wiener process, some of which will be useful later.

1. Calculate the expected distance of the Wiener process from 0.

Solution: This is just E(|W_t|), which is

E(|W_t|) = ∫_{−∞}^{∞} |x| (1/√(2πt)) e^{−x²/(2t)} dx

= 2 ∫_0^∞ x (1/√(2πt)) e^{−x²/(2t)} dx.

Changing variables, u = x²/(2t), this integral becomes

(2t/√(2πt)) ∫_0^∞ e^{−u} du = √(2t/π).

2. Calculate E(W_t W_s).

Solution: Assume without loss of generality that t > s. Then

E(W_t W_s) = E((W_t − W_s)W_s + W_s²) = E((W_t − W_s)W_s) + E(W_s²).

Now W_t − W_s is independent of W_s since t > s, therefore we get

E(W_t W_s) = E(W_t − W_s)E(W_s) + σ²(W_s) = s

since W_t − W_s has mean 0 and W_s has variance s.

3. The Wiener process starting at y. The defining properties of W_t^y = W_t + y, the Wiener process starting at y, are:

1. W_0^y = y.
2. W_t^y is continuous with probability 1.
3. W_t^y is normally distributed with µ = y, σ² = t.
4. W_t^y − W_s^y is normally distributed with µ = 0, σ² = t − s, for t > s.
5. W_t^y − W_s^y and W_u^y − W_v^y are independent for t ≥ s ≥ u ≥ v.
6. The pdf is given by

p(x, t | y) = (1/√(2πt)) e^{−(x−y)²/(2t)}

so that

P(a ≤ W_t^y ≤ b) = (1/√(2πt)) ∫_a^b e^{−(x−y)²/(2t)} dx
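Both sample computations are easy to verify by Monte Carlo, sampling W_s and then W_t = W_s + (W_t − W_s) with an independent normal increment. A sketch (times, sample size, and seed are illustrative choices):

```python
import random, math

rng = random.Random(3)
s, t = 0.7, 1.6
trials = 200_000

abs_acc = cross_acc = 0.0
for _ in range(trials):
    ws = rng.gauss(0.0, math.sqrt(s))            # W_s ~ N(0, s)
    wt = ws + rng.gauss(0.0, math.sqrt(t - s))   # W_t = W_s + indep. increment
    abs_acc += abs(wt)
    cross_acc += ws * wt

mean_abs = abs_acc / trials      # estimate of E|W_t|, expected sqrt(2t/pi)
mean_cross = cross_acc / trials  # estimate of E(W_t W_s), expected s
```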

How do we use the Wiener process to model stock movements? We would like to make sense of the idea that

dS_t/dt = µS_t + noise(σ, S_t),

where, if the noise term were absent, S_t is a bank account earning interest µ. There are two obvious problems: (1) the Wiener process is differentiable nowhere, so why should S_t be differentiable anywhere? (2) "Noise" is very difficult to make rigorous. To motivate our way out of these problems, let us look at deterministic motion. The most general first order differential equation is

df/dt = φ(f(t), t).

By integrating both sides we arrive at the following integral equation:

f(t) − f(t_0) = ∫_{t_0}^t (df/dt) dt = ∫_{t_0}^t φ(f(t), t) dt,

that is,

f(t) = f(t_0) + ∫_{t_0}^t φ(f(t), t) dt.

Theoretically, integral equations are better behaved than differential equations (they tend to smooth out bumps). So we try to make the corresponding conversion from differential to integral equation in the stochastic world. Recall the Martingale transform from discrete time: if S_n was a stock and φ_n denoted the amount of stock at tick n, then the gain from tick 0 to tick N was

Σ_{i=1}^N φ_i (S_i − S_{i−1}).

It is these sorts of sums that become integrals in continuous time. Our first goal is just to define a reasonable notion of stochastic integral.

Recall the Riemann integral ∫_a^b f(x) dx. Take a partition of [a, b], P = {a = t_0 < t_1 < · · · < t_n = b}, let t_{i−1} < ξ_i < t_i, and let mesh P = max{|t_i − t_{i−1}| : i = 1, . . . , n}. Then let

RS(f, P) = Σ_{i=1}^n f(ξ_i)(t_i − t_{i−1})

(RS stands for Riemann sum).

Theorem 4.3.5. (Riemann's Theorem) Let f be a function such that the probability of choosing a point of discontinuity is 0. Then

lim_{mesh P→0} RS(f, P)

exists, and is independent of the choices of ξ_i. We call this limit the Riemann integral ∫_a^b f(x) dx.

Note that the summation is beginning to look like a martingale transform. The next level of integral technology is the Riemann-Stieltjes integral. Suppose φ(x) is a non-decreasing function. Let P = {a = t_0 < t_1 < · · · < t_n = b} be a partition, with ξ_i values between partition points. Let

RSS(f, P, φ) = Σ_{i=1}^n f(ξ_i)(φ(t_i) − φ(t_{i−1})).

("RSS" stands for Riemann-Stieltjes sum.) If lim_{mesh P→0} RSS(f, P, φ) exists, and is independent of the choices of ξ_i within their respective intervals, then by definition ∫_a^b f(t) dφ(t) is this limit. If φ is differentiable, the corresponding Riemann's theorem holds.

Proposition 4.3.6. Let φ(x) be differentiable. Then

∫_a^b f(t) dφ(t) = ∫_a^b f(t) φ′(t) dt.

Proof. Assuming φ is differentiable,

RSS(f, P, φ) = Σ_{i=1}^n f(ξ_i)(φ(t_i) − φ(t_{i−1}))
             = Σ_{i=1}^n f(ξ_i) ((φ(t_i) − φ(t_{i−1}))/(t_i − t_{i−1})) (t_i − t_{i−1})
             = Σ_{i=1}^n f(ξ_i)(φ′(t_i) + o(|t_i − t_{i−1}|))(t_i − t_{i−1})    (4.3.3)

and so lim_{mesh P→0} Σ_{i=1}^n f(ξ_i) φ′(t_i)(t_i − t_{i−1}) = ∫_a^b f(t)φ′(t) dt.

Example.

∫_a^b φ(t) dφ(t) = ∫_a^b φ(t)φ′(t) dt = φ(b)²/2 − φ(a)²/2.
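Proposition 4.3.6 and the example can be observed numerically with left-endpoint Riemann-Stieltjes sums. A minimal sketch (the choice φ(x) = x² on [1, 2] is an illustrative example, not from the text):

```python
def rs_sum(f, phi, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum on a uniform partition."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t0, t1 = a + i * h, a + (i + 1) * h
        total += f(t0) * (phi(t1) - phi(t0))
    return total

phi = lambda x: x * x                     # smooth and increasing on [1, 2]
a, b = 1.0, 2.0
approx = rs_sum(phi, phi, a, b, 100_000)
exact = (phi(b) ** 2 - phi(a) ** 2) / 2   # phi(b)^2/2 - phi(a)^2/2 = 7.5
```

For a differentiable integrator the choice of evaluation point washes out as the mesh shrinks; the next section shows this fails for the Wiener process.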

A stochastic integral is formed much like the Riemann-Stieltjes integral, with one twist. For X_t and φ_t stochastic processes,

∫_a^b φ_t dX_t

is to be a random variable. In the Riemann and Riemann-Stieltjes cases, the choices of the points ξ_i in the intervals were irrelevant; in the stochastic case it is very much relevant. As an example, consider

Σ_{i=1}^n φ_{ξ_i}(X_{t_i} − X_{t_{i−1}}).

Let's work out the above sum for X_t = φ_t = W_t, where W_t is the Wiener process, using two different choices for the ξ_i's. Take P = {a = t_0 < t_1 < · · · < t_n = b}. First, let ξ_i = t_{i−1}, the left-hand endpoint. Then, using independence,

E( Σ_{i=1}^n W_{t_{i−1}}(W_{t_i} − W_{t_{i−1}}) ) = Σ_{i=1}^n E(W_{t_{i−1}}(W_{t_i} − W_{t_{i−1}})) = Σ_{i=1}^n E(W_{t_{i−1}}) E(W_{t_i} − W_{t_{i−1}}) = 0.

Second, let ξ_i = t_i, the right-hand endpoint. Then

E( Σ_{i=1}^n W_{t_i}(W_{t_i} − W_{t_{i−1}}) ) = Σ_{i=1}^n E(W_{t_i}² − W_{t_i} W_{t_{i−1}}) = Σ_{i=1}^n ( E(W_{t_i}²) − E(W_{t_i} W_{t_{i−1}}) ) = Σ_{i=1}^n (t_i − t_{i−1}) = (t_1 − t_0) + (t_2 − t_1) + · · · + (t_n − t_{n−1}) = t_n − t_0 = b − a.
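The two expectations just computed, 0 for the left endpoint and b − a for the right, are visible in simulation. A sketch (the interval [0, 1], partition size, and seed are arbitrary illustrative choices):

```python
import random, math

rng = random.Random(9)
b, n, trials = 1.0, 100, 20_000   # sums over [0, b] with n uniform steps
h = b / n

left_acc = right_acc = 0.0
for _ in range(trials):
    w = 0.0                        # W_0 = 0
    left = right = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(h))
        left += w * dw             # left endpoint: the Ito choice
        right += (w + dw) * dw     # right endpoint
        w += dw
    left_acc += left
    right_acc += right

mean_left = left_acc / trials      # should be near 0
mean_right = right_acc / trials    # should be near b - 0 = 1
```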

Which endpoint to choose? Following Itô, one uses the left-hand endpoint. There are two motivations for this choice. First, if X_t is a stock-process and φ_t a portfolio process, then the left-hand choice embodies the idea that our gains from time t_{i−1} to time t_i should be based on a choice made at time t_{i−1}. The left-hand choice looks a little like a previsibility condition. Second, it turns out that if X_t is a Martingale, then the Itô sum will be a Martingale as well. (Following Stratonovich, one takes ξ_i = (t_{i−1} + t_i)/2, the midpoint. As an advantage, the resulting Stratonovich calculus looks like traditional calculus; as a disadvantage, the derived stochastic process is not a martingale.)

One defines the Itô sum

IS(φ_t, P, X_t) = Σ_{i=1}^n φ_{t_{i−1}}(X_{t_i} − X_{t_{i−1}}).

Let us call a random variable with finite first and second moments square integrable.

Definition 4.3.2. A sequence X^α of square integrable random variables converges in mean-square to X if E((X − X^α)²) → 0 as α → 0. In symbols, lim X^α = X. This is equivalent to E(X^α) → E(X) and Var(X − X^α) → 0.

One knows that for X, Y square integrable random variables defined on the same sample space, P(X = Y) = 1 iff E(X) = E(Y) and Var(X − Y) = 0. Now Var(X − Y) = E((X − Y)²) − (E(X − Y))², so P(X = Y) = 1 iff E((X − Y)²) = 0. This is a criterion for checking whether two square integrable random variables are equal (almost surely).

Definition 4.3.3. Let X_t be a stochastic process and let φ_t be another stochastic process defined on the same sample space as X_t. φ_t is adapted to X_t if φ_t is independent of X_s − X_t for s > t. Then we define the Itô integral

∫_a^t φ_s dX_s ≡ lim_{mesh P→0} IS(φ, P, X) = lim_{mesh P→0} Σ_{i=1}^n φ_{t_{i−1}}(X_{t_i} − X_{t_{i−1}}),

where the limit is taken in the sense of mean-square convergence. This is the sense of the limit in the definition of the stochastic integral. Here is a sample computation of a stochastic integral. Let's evaluate

∫_a^b W_t dW_t.

We need the following facts: E(W_t) = 0, E(W_t²) = t, E(W_t³) = 0, and E(W_t⁴) = 3t². Write ΔW_{t_i} = W_{t_i} − W_{t_{i−1}}. Then

    IS(W_t, P, W_t) = Σ_{i=1}^n W_{t_{i−1}} · ΔW_{t_i} = Σ_{i=1}^n W_{t_{i−1}} (W_{t_i} − W_{t_{i−1}})
      = Σ_{i=1}^n ½ ( (W_{t_{i−1}} + ΔW_{t_i})² − W_{t_{i−1}}² − (ΔW_{t_i})² )
      = ½ Σ_{i=1}^n ( W_{t_i}² − W_{t_{i−1}}² ) − ½ Σ_{i=1}^n (ΔW_{t_i})²
      = ½ ( (W_{t_1}² − W_{t_0}²) + (W_{t_2}² − W_{t_1}²) + · · · + (W_{t_n}² − W_{t_{n−1}}²) ) − ½ Σ_{i=1}^n (ΔW_{t_i})²
      = ½ (W_b² − W_a²) − ½ Σ_{i=1}^n (ΔW_{t_i})² .

The first part is recognizable as what the Riemann-Stieltjes integral gives formally. But what is Σ_{i=1}^n (ΔW_{t_i})² as mesh P → 0? The simple calculation

    E( Σ_{i=1}^n (ΔW_{t_i})² ) = Σ_{i=1}^n E((ΔW_{t_i})²) = Σ_{i=1}^n E((W_{t_i} − W_{t_{i−1}})²) = Σ_{i=1}^n (t_i − t_{i−1}) = b − a

leads us to the

Claim:

    lim_{mesh P → 0} Σ_{i=1}^n (ΔW_{t_i})² = b − a ,

and therefore

    ∫_a^b W_t dW_t = ½ (W_b² − W_a²) − ½ (b − a) .

Proof. Writing Δt_i = t_i − t_{i−1},

    E( ( Σ_{i=1}^n (ΔW_{t_i})² − (b − a) )² )
      = E( ( Σ_{i=1}^n (ΔW_{t_i})² )² ) − 2(b − a) E( Σ_{i=1}^n (ΔW_{t_i})² ) + (b − a)²
      = E( Σ_{i=1}^n (ΔW_{t_i})⁴ + 2 Σ_{i<j} (ΔW_{t_i})² (ΔW_{t_j})² ) − 2(b − a)² + (b − a)²
      = Σ_{i=1}^n 3(Δt_i)² + 2 Σ_{i<j} E((ΔW_{t_i})²) E((ΔW_{t_j})²) − (b − a)²
      = 3 Σ_{i=1}^n (Δt_i)² + 2 Σ_{i<j} (Δt_i)(Δt_j) − (b − a)²
      = 3 Σ_{i=1}^n (Δt_i)² + ( ( Σ_{i=1}^n Δt_i )² − Σ_{i=1}^n (Δt_i)² ) − (b − a)²
      = 2 Σ_{i=1}^n (Δt_i)² + (b − a)² − (b − a)²
      = 2 Σ_{i=1}^n (Δt_i)² .
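The claim that the quadratic variation Σ_i (ΔW_{t_i})² concentrates at b − a can be illustrated with a short simulation. This is a check of my own, not part of the text; the interval [0, 2] and all parameter values are arbitrary.

```python
import numpy as np

# Sketch (my own check, not from the text): the quadratic variation
# sum_i (Delta W_{t_i})^2 on [a, b] = [0, 2] has mean b - a = 2 for every
# partition, and its variance 2 sum_i (Delta t_i)^2 shrinks with the mesh.
rng = np.random.default_rng(1)
b_minus_a, paths = 2.0, 5000
for n in (10, 100, 1000):
    dt = b_minus_a / n
    dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
    qv = (dW ** 2).sum(axis=1)
    print(n, qv.mean(), qv.var())  # mean stays near 2; variance near 2*n*dt**2
```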

Now for each i, Δt_i ≤ mesh P, so (Δt_i)² ≤ Δt_i · mesh P. Summing over i = 1 to n,

    0 ≤ Σ_{i=1}^n (Δt_i)² ≤ Σ_{i=1}^n Δt_i · mesh P = (b − a) · mesh P → 0 as mesh P → 0 .

So

    lim_{mesh P → 0} Σ_{i=1}^n (Δt_i)² = 0 ,

which proves the claim. Intuitively, Δt_i ≈ mesh P ≈ 1/n, so lim_{mesh P → 0} Σ_{i=1}^n (Δt_i)² ≈ lim_{n→∞} n · (1/n)² → 0.

The underlying reason for the difference between the stochastic integral and the Stieltjes integral is that ΔW_{t_i} is, with high probability, of order √(Δt_i). Indeed, in deriving the Wiener process we had (ΔX)²/Δt = 1, so (ΔX)² = (ΔW)² = Δt.

Slogan of the Itô Calculus: dW_t² = dt. The slogan dW_t² = dt is only given precise sense through its integral form: if φ_s is adapted to W_s, then

    ∫_a^t φ_s (dW_s)² = ∫_a^t φ_s ds .

Moreover, for n > 2,

    ∫_a^t φ_s (dW_s)^n = 0 .

Proving this will be in the exercises.

Theorem 4.3.7. Properties of the Itô Integral. If φ_s, ψ_s are adapted to the given process (X_s or W_s), then

    (1) ∫_a^t dX_s = X_t − X_a ,
    (2) ∫_a^t (φ_s + ψ_s) dX_s = ∫_a^t φ_s dX_s + ∫_a^t ψ_s dX_s ,
    (3) E( ∫_a^t φ_s dW_s ) = 0 , and
    (4) E( ( ∫_a^t φ_s dW_s )² ) = E( ∫_a^t φ_s² ds ) .

Proof.

1. For any partition P = {a = s_0 < s_1 < · · · < s_n = t},

    IS(1, P, X_s) = Σ_{i=1}^n 1 · (X_{s_i} − X_{s_{i−1}}) ,

which telescopes to X_{s_n} − X_{s_0} = X_t − X_a. So the limit as mesh P → 0 is X_t − X_a.

2. It is obvious for any partition P that IS(φ_s + ψ_s, P, X_s) = IS(φ_s, P, X_s) + IS(ψ_s, P, X_s). Now take the limit as mesh P → 0.

3. Given a partition P, taking expected values gives

    E( IS(φ_s, P, W_s) ) = E( Σ_{i=1}^n φ_{s_{i−1}} (W_{s_i} − W_{s_{i−1}}) )
      = Σ_{i=1}^n E( φ_{s_{i−1}} (W_{s_i} − W_{s_{i−1}}) )
      = Σ_{i=1}^n E(φ_{s_{i−1}}) E(W_{s_i} − W_{s_{i−1}}) = 0 .

Now take the limit as mesh P → 0.

4. Given a partition P,

    E( IS(φ_s, P, W_s)² ) = E( ( Σ_{i=1}^n φ_{s_{i−1}} (W_{s_i} − W_{s_{i−1}}) )² )
      = E( Σ_{i=1}^n φ_{s_{i−1}}² (W_{s_i} − W_{s_{i−1}})² + 2 Σ_{i<j} φ_{s_{i−1}} φ_{s_{j−1}} (W_{s_i} − W_{s_{i−1}})(W_{s_j} − W_{s_{j−1}}) )
      = Σ_{i=1}^n E( φ_{s_{i−1}}² (ΔW_{s_i})² ) + 2 Σ_{i<j} E( φ_{s_{i−1}} φ_{s_{j−1}} (W_{s_i} − W_{s_{i−1}}) ) · E(W_{s_j} − W_{s_{j−1}})
      = Σ_{i=1}^n E(φ_{s_{i−1}}²) Δs_i + 2 Σ_{i<j} E( φ_{s_{i−1}} φ_{s_{j−1}} (W_{s_i} − W_{s_{i−1}}) ) · 0
      = Σ_{i=1}^n E(φ_{s_{i−1}}²) Δs_i = E( IS(φ_s², P, s) ) .

Taking the limit as mesh P → 0 gives the result.

4.3.3 Stochastic Differential Equations.

A first-order stochastic differential equation is an expression of the form

    dX_t = a(X_t, t) dt + b(X_t, t) dW_t ,

where a(x, t) and b(x, t) are functions of two variables. A stochastic process X_t is a solution to this SDE if

    X_t − X_a = ∫_a^t dX_s = ∫_a^t a(X_s, s) ds + ∫_a^t b(X_s, s) dW_s .

Lemma 4.3.8 (Itô's Lemma). Suppose we are given a twice differentiable function f(x) and a solution X_t to the SDE dX_t = a dt + b dW_t. Then f(X_t) is a stochastic process, and it satisfies the SDE

    df(X_t) = [ f′(X_t) a + ½ f″(X_t) b² ] dt + f′(X_t) b dW_t .

Proof. This is just a calculation using two terms of the Taylor series for f, together with the rules dW_t² = dt and dW_t^n = 0 for n > 2. (Note that dt dW_t = dW_t³ = 0 and dt² = dW_t⁴ = 0.)

    df(X_t) = f(X_{t+dt}) − f(X_t) = f(X_t + dX_t) − f(X_t)
      = [ f(X_t) + f′(X_t) dX_t + ½ f″(X_t)(dX_t)² + o((dX_t)³) ] − f(X_t)
      = f′(X_t) dX_t + ½ f″(X_t)(dX_t)²
      = f′(X_t)[ a(X_t, t) dt + b(X_t, t) dW_t ] + ½ f″(X_t)[ a² dt² + 2ab dt dW_t + b² dW_t² ]
      = [ f′(X_t) a(X_t, t) + ½ f″(X_t)(b(X_t, t))² ] dt + f′(X_t) b(X_t, t) dW_t .

More generally, if f(x, t) is a function of two variables, once differentiable in t and twice differentiable in x, one has

    df(X_t, t) = [ ∂f/∂t (X_t, t) + ∂f/∂x (X_t, t) a + ½ ∂²f/∂x² (X_t, t) b² ] dt + ∂f/∂x (X_t, t) b dW_t .

Example. To calculate ∫_a^t W_s dW_s, find f(x) such that W_t dW_t = d(f(W_t)). Since ∫ x dx = ½ x², consider f(x) = ½ x². Then f′(x) = x and f″(x) = 1, so Itô's lemma, applied to dW_t = 0 · dt + 1 · dW_t, gives

    d(f(W_t)) = f′(W_t) dW_t + ½ f″(W_t) dt = W_t dW_t + ½ · 1 dt = W_t dW_t + ½ dt .

Equivalently, W_t dW_t = d(f(W_t)) − ½ dt. Therefore

    ∫_a^t W_s dW_s = ∫_a^t d(f(W_s)) − ∫_a^t ½ ds = f(W_t) − f(W_a) − ½ (t − a) = ½ W_t² − ½ W_a² − ½ (t − a) .

Example. Calculate ∫_a^t W_s² dW_s. Since ∫ x² dx = ⅓ x³, consider f(x) = ⅓ x³. Then f′(x) = x² and f″(x) = 2x, so Itô's lemma implies

    d(⅓ W_s³) = W_s² dW_s + ½ · 2W_s ds = W_s² dW_s + W_s ds ,

or W_s² dW_s = d(⅓ W_s³) − W_s ds, so

    ∫_a^t W_s² dW_s = ⅓ W_t³ − ⅓ W_a³ − ∫_a^t W_s ds .
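The first example can be verified pathwise: on a fine partition, the left-endpoint Itô sums for ∫_0^1 W dW should be close to ½W_1² − ½ on every sample path. This is an illustration of my own, not from the text; the grid size and path count are arbitrary.

```python
import numpy as np

# Pathwise sketch (my illustration, not from the text): compare the
# left-endpoint Ito sums for  int_0^1 W dW  with the closed form
# W_1^2/2 - 1/2 from the first example above.
rng = np.random.default_rng(4)
t, n, paths = 1.0, 4000, 2000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.cumsum(dW, axis=1)
W_prev = np.hstack([np.zeros((paths, 1)), W[:, :-1]])

ito_sum = (W_prev * dW).sum(axis=1)          # Ito sums on a fine partition
closed_form = 0.5 * W[:, -1] ** 2 - 0.5 * t  # (1/2) W_t^2 - (1/2) t

print(np.abs(ito_sum - closed_form).max())   # small on every path
```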

(The last integral, ∫_a^t W_s ds, does not really simplify.)

Example. Solve dX_s = ½ X_s ds + X_s dW_s. With ordinary variables, one would seek f with f′(x) · x = 1, that is f′(x) = 1/x, leading to the change of variables f(x) = ln x, with f′(x) = 1/x and f″(x) = −1/x². Then, using dX_s = ½ X_s ds + X_s dW_s and (dX_s)² = X_s² ds,

    d(ln X_s) = (1/X_s) dX_s + ½ (−1/X_s²)(dX_s)²
      = ½ ds + dW_s − ½ ds = dW_s .

Now integrate both sides from a to t, getting

    ∫_a^t d(ln X_s) = ∫_a^t dW_s = W_t − W_a ,

or ln(X_t/X_a) = W_t − W_a, and so X_t = X_a e^{W_t − W_a}.

Asset Models

The basic model for a stock is the SDE

    dS_t = μ S_t dt + σ S_t dW_t ,

where μ is the short-term expected return and σ is the standard deviation of the short-term returns, called the volatility. Note that if σ = 0, the model simplifies to dS_t = μ S_t dt, implying S_t = S_a e^{μt}, which is just the value of an interest-bearing, risk-free security.

Compared with the generic SDE dX_t = a(X_t, t) dt + b(X_t, t) dW_t, the stock model uses a(x, t) = μx and b(x, t) = σx. The model SDE is similar to the last example, and the same change of variables will work. So we are led to consider f(x) = ln x, with f′(x) = 1/x and f″(x) = −1/x². We have (dS_t)² = σ² S_t² dt, and Itô's lemma gives

    d(ln S_t) = (1/S_t) dS_t + ½ (−1/S_t²)(dS_t)²
      = μ dt + σ dW_t − ½ σ² dt
      = (μ − ½ σ²) dt + σ dW_t .

Integrating both sides gives

    ∫_0^t d(ln S_s) = ∫_0^t (μ − ½ σ²) ds + ∫_0^t σ dW_s
    ln S_t − ln S_0 = (μ − ½ σ²) t + σ W_t
    S_t / S_0 = e^{(μ − ½ σ²) t + σ W_t}
    S_t = S_0 e^{(μ − ½ σ²) t + σ W_t} .
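The solved form lends itself to direct sampling: S_t depends on the path only through W_t ~ N(0, t). A quick sketch (my own, with arbitrary parameter values) checks that the sample mean of ln(S_t/S_0) is near (μ − ½σ²)t:

```python
import numpy as np

# Sketch: sample S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t) directly
# from W_t ~ N(0, t).  The parameter values are arbitrary choices.
rng = np.random.default_rng(2)
mu, sigma, S0, t, paths = 0.10, 0.25, 100.0, 1.0, 200000
W_t = rng.normal(0.0, np.sqrt(t), size=paths)
S_t = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W_t)

log_return = np.log(S_t / S0)
# sample mean of ln(S_t/S_0) is near (mu - sigma^2/2) t = 0.06875
print(log_return.mean())
```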

The expected return of S_t is

    E( ln(S_t/S_0) ) = E( (μ − ½ σ²) t + σ W_t ) = (μ − ½ σ²) t .

Vasicek's interest rate model is

    dr_t = a(b − r_t) dt + σ dW_t .

It satisfies "mean reversion", meaning that if r_t < b, then dr_t > 0, pulling r_t upwards, while if r_t > b, then dr_t < 0, pushing r_t downwards.

The Cox-Ingersoll-Ross model is

    dr_t = a(b − r_t) dt + σ √r_t dW_t .

It too has mean reversion, and a volatility that grows with r_t.

Ornstein-Uhlenbeck Process. An O-U process is a solution to

    dX_t = μ X_t dt + σ dW_t

with an initial condition X_0 = x_0. This SDE is solved by using the integrating factor e^{−μt}. First,

    e^{−μt} dX_t = μ e^{−μt} X_t dt + σ e^{−μt} dW_t

holds just by multiplying the given SDE. Letting f(x, t) = x e^{−μt}, we have

    ∂f/∂t = −μ e^{−μt} x ,  ∂f/∂x = e^{−μt} ,  ∂²f/∂x² = 0 .

Itô's lemma gives

    df(X_t, t) = ∂f/∂t (X_t, t) dt + ∂f/∂x (X_t, t) dX_t + ½ ∂²f/∂x² (X_t, t) (dX_t)² ,

so

    d(e^{−μt} X_t) = −μ e^{−μt} X_t dt + e^{−μt} dX_t ,

or e^{−μt} dX_t = d(e^{−μt} X_t) + μ e^{−μt} X_t dt. So the SDE rewrites as

    d(e^{−μt} X_t) + μ e^{−μt} X_t dt = μ e^{−μt} X_t dt + σ e^{−μt} dW_t
    d(e^{−μt} X_t) = σ e^{−μt} dW_t .

Now integrate both sides from 0 to t:

    ∫_0^t d(e^{−μs} X_s) = ∫_0^t σ e^{−μs} dW_s
    e^{−μt} X_t − X_0 = ∫_0^t σ e^{−μs} dW_s
    X_t = e^{μt} X_0 + e^{μt} ∫_0^t σ e^{−μs} dW_s .
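Mean reversion in the Vasicek model is easy to see in simulation. The Euler-Maruyama discretization below is an assumption of this note (the text does not discuss numerical schemes), and all parameter values are arbitrary; the sample mean should track E(r_t) = b + (r_0 − b)e^{−at}, which the solution method of the exercises yields.

```python
import numpy as np

# Euler-Maruyama sketch for the Vasicek model dr = a(b - r) dt + sigma dW.
# The discretization scheme and all parameter values are my own choices,
# not something specified in the text.  Mean reversion shows up as
# E(r_t) = b + (r_0 - b) e^{-a t}.
rng = np.random.default_rng(3)
a, b, sigma, r0 = 2.0, 0.05, 0.02, 0.10
T, n, paths = 1.0, 500, 20000
dt = T / n
r = np.full(paths, r0)
for _ in range(n):
    r = r + a * (b - r) * dt + sigma * np.sqrt(dt) * rng.normal(size=paths)

expected_mean = b + (r0 - b) * np.exp(-a * T)
print(r.mean(), expected_mean)
```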

Let's go back to the stock model

    dS_t = μ S_t dt + σ S_t dW_t  with solution  S_t = S_0 e^{(μ − ½ σ²) t + σ W_t} .

We calculate the pdf of S_t as follows. First,

    P(S_t ≤ S) = P( S_0 e^{(μ − ½ σ²) t + σ W_t} ≤ S )
      = P( e^{(μ − ½ σ²) t + σ W_t} ≤ S/S_0 )
      = P( (μ − ½ σ²) t + σ W_t ≤ ln(S/S_0) )
      = (1/√(2π σ² t)) ∫_{−∞}^{ln(S/S_0)} e^{ −(x − (μ − ½ σ²) t)² / (2σ² t) } dx .

Second, the pdf is the derivative of the above with respect to S. The fundamental theorem of calculus gives us derivatives of the form d/du ∫_a^u φ(x) dx = φ(u). More generally, the chain rule gives us, where u = g(S),

    d/dS ∫_a^{g(S)} φ(x) dx = ( d/du ∫_a^u φ(x) dx ) · du/dS = φ(u) · g′(S) = φ(g(S)) · g′(S) .

In the case at hand, g(S) = ln(S/S_0) = ln S − ln S_0, so g′(S) = 1/S. That is,

    d/dS P(S_t ≤ S) = d/dS (1/√(2π σ² t)) ∫_{−∞}^{ln(S/S_0)} e^{ −(x − (μ − ½ σ²) t)² / (2σ² t) } dx
      = (1/(S √(2π σ² t))) e^{ −(ln(S/S_0) − (μ − ½ σ²) t)² / (2σ² t) }

for S > 0, and identically 0 for S ≤ 0.

Pricing Derivatives in Continuous Time

Our situation concerns a stock with price stochastic variable S_t, and a derivative security, assumed to have expiration T, and further assumed that its (gross) payoff depends only on T and S_T. Let V(S_T, T) denote this payoff. More generally, let V(S_t, t) denote the payoff for this derivative if it expires at time t with stock price S_t. That is, we have a deterministic function V(s, t), and we are plugging in a stochastic process S_t and getting a stochastic process V(S_t, t). In particular, V(S_t, t) depends only on t and S_t.

To price V, construct a self-financing, hedging portfolio for V. That is, set up a portfolio consisting of φ_t units of stock S_t and ψ_t units of bond B_t,

    Π_t = φ_t S_t + ψ_t B_t ,
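The distribution function P(S_t ≤ S) above is just a normal probability in ln(S/S_0), so it can be cross-checked against a Monte Carlo estimate from the solved stock model. This sketch is mine, with arbitrary parameter values:

```python
import math
import numpy as np

# Sketch: the closed-form P(S_t <= S) above is a normal probability in
# ln(S/S_0); compare it with a Monte Carlo estimate from the solved stock
# model.  All parameter values are arbitrary choices of mine.
mu, sigma, S0, t, S = 0.10, 0.25, 100.0, 1.0, 110.0

z = (math.log(S / S0) - (mu - 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
closed_form = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)

rng = np.random.default_rng(5)
W_t = rng.normal(0.0, math.sqrt(t), size=400000)
mc = (S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W_t) <= S).mean()
print(closed_form, mc)
```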

such that the terminal value of the portfolio equals the derivative payoff, Π_T = V_T, subject to the self-financing condition

    (SFC1)  Π_t = Π_0 + ∫_0^t φ_s dS_s + ψ_s dB_s .

Recall the self-financing condition in the discrete case: (ΔΠ)_n = φ_n ΔS_n + ψ_n ΔB_n. The continuous limit is

    (SFC2)  dΠ_t = φ_t dS_t + ψ_t dB_t .

The pricing of the portfolio is then straightforward: the no-arbitrage assumption implies Π_t = V_t for all t ≤ T.

We now model the stock and bond. The stock is dS_t = μ S_t dt + σ S_t dW_t, while the bond is dB_t = r B_t dt.

Theorem 4.3.9 (Black-Scholes). Given a derivative as above, there exists a self-financing, hedging portfolio Π_t = φ_t S_t + ψ_t B_t with Π_t = V(S_t, t) if and only if

    (BSE)  ∂V/∂t + ½ σ² S² ∂²V/∂S² + r S ∂V/∂S − r V = 0 ,

and furthermore φ_t = ∂V/∂S.

Proof. Let's calculate dV(S_t, t) = dΠ_t two ways. First, by Itô's lemma, suppressing the S_t, t arguments,

    dV(S_t, t) = ∂V/∂t dt + ∂V/∂S dS_t + ½ ∂²V/∂S² (dS_t)²
      = ( ∂V/∂t + μ S_t ∂V/∂S + ½ σ² S_t² ∂²V/∂S² ) dt + σ S_t ∂V/∂S dW_t .

On the other hand, the self-financing condition (SFC2), with our stock and bond models, says

    dV_t = dΠ_t = φ_t dS_t + ψ_t dB_t = ( φ_t μ S_t + ψ_t r B_t ) dt + φ_t σ S_t dW_t .

Also, V_t = Π_t = φ_t S_t + ψ_t B_t.

If these are to be equal, the coefficients of dt and of dW_t must be equal. Equating the coefficients of dW_t gives

    σ S_t ∂V/∂S = φ_t σ S_t ,  hence  φ_t = ∂V/∂S ,

and equating the coefficients of dt gives

    φ_t μ S_t + ψ_t r B_t = ∂V/∂t + μ S_t ∂V/∂S + ½ σ² S_t² ∂²V/∂S² .

Since ψ_t B_t = Π_t − φ_t S_t = V_t − (∂V/∂S) S_t, we have ψ_t r B_t = r( V_t − (∂V/∂S) S_t ), so

    r( V_t − (∂V/∂S) S_t ) = ∂V/∂t + ½ σ² S_t² ∂²V/∂S² ,

or in other words (BSE), completing the proof.

In summary, to construct a self-financing, replicating portfolio for a derivative, you need the value of the derivative V_t = V(S_t, t) to satisfy the Black-Scholes equation and the stock holding to be φ_t = ∂V/∂S.

The theory of partial differential equations tells us that in order to get unique solutions, we need boundary and temporal conditions. Since the BSE is first order in t, we need one time condition. (Usually, corresponding to time 0, one has an "initial" condition. Here, one has a "final" or "terminal" condition.) The time condition is simply the expiration payoff: V(S_T, T) = h(S_T). For example, consider a call. By definition, h(S) = max(S − K, 0) is the payoff of a call at expiry, so V(S_T, T) = max(S_T − K, 0) is the final condition.

And since the BSE is second order in S, we need two boundary conditions, usually based on S = 0 and S → ∞. If the stock price is zero, a call is worthless, hence V(0, t) = 0 is one boundary condition. If the stock price grows arbitrarily large, the call is essentially worth the same as the stock, so V(S, t) ∼ S as S → ∞.

The One-Dimensional Heat Equation

Let the temperature of a homogeneous one-dimensional object at position x, time t, be given by u(x, t). The heat equation is

    ∂u/∂t = k ∂²u/∂x² ,

where k is a constant, the thermal conductivity of the object's material. If we let k = 1, one solution is

    u(x, t) = (1/√(4πt)) e^{−x²/4t} ;

taking k = ½ instead gives the Fokker-Planck equation for a Wiener path, with solution (1/√(2πt)) e^{−x²/2t}.

What is lim_{t→0} u(x, t), the initial temperature distribution? To answer this, we use the following notion.

Definition. A Dirac δ-function is a one-parameter family δ_t, indexed by t ≥ 0, of piecewise smooth functions such that

    (1) δ_t(x) ≥ 0 for all t, x ,
    (2) lim_{t→0} δ_t(x) = 0 for all x ≠ 0 ,
    (3) ∫_{−∞}^∞ δ_t(x) dx = 1 for all t .

Example. Consider

    δ_t(x) = 1/(2t) if −t ≤ x ≤ t ,  and  δ_t(x) = 0 if x < −t or x > t .

Important Example. The solutions to the heat equation

    δ_t(x) = u(x, t) = (1/√(4πt)) e^{−x²/4t}

are a Dirac δ-function.

Proposition. If f is any bounded smooth function, and δ_t any Dirac δ-function, then

    lim_{t→0} ∫_{−∞}^∞ δ_t(x) f(x) dx = f(0) ;

more generally,

    lim_{t→0} ∫_{−∞}^∞ δ_t(x − y) f(x) dx = f(y) .

Intuitively, every Dirac δ-function is something like the first example, and on a small interval continuous functions are approximately constant. For the first example this can be seen as follows:

    lim_{t→0} ∫_{−∞}^∞ δ_t(x − y) f(x) dx ≈ lim_{t→0} ∫_{y−t}^{y+t} (1/(2t)) f(x) dx ≈ lim_{t→0} (1/(2t)) · 2t · f(y) ≈ f(y) .
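The δ-function property of the heat kernels can be checked numerically: integrating them against a continuous function reproduces the function's value as t → 0. This is a sketch of my own; the grid and the test function cos(x) are arbitrary choices.

```python
import numpy as np

# Numerical sketch (mine): the heat kernels u(x, t) = exp(-x^2/4t)/sqrt(4 pi t)
# act like a Dirac delta-function, reproducing f(y) as t -> 0.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
f = np.cos(x)   # a bounded smooth test function
y = 0.5         # recover f at this point

for t in (1.0, 0.1, 0.001):
    delta_t = np.exp(-(x - y) ** 2 / (4.0 * t)) / np.sqrt(4.0 * np.pi * t)
    approx = (delta_t * f).sum() * dx   # Riemann sum of the integral
    print(t, approx)                    # tends to cos(0.5) as t -> 0
```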

Proceeding, the heat equation

    ∂u/∂t = ∂²u/∂x² ,

with initial condition u(x, 0) = φ(x) and boundary conditions u(∞, t) = u(−∞, t) = 0, will be solved using

    δ_t(x) = (1/√(4πt)) e^{−x²/4t} ,

which is both a Dirac δ-function and a solution to the heat equation for t > 0. Let

    v(y, t) = (1/√(4πt)) ∫_{−∞}^∞ e^{−(x−y)²/4t} φ(x) dx  for t > 0 .

Claim. lim_{t→0} v(y, t) = φ(y). This is just an example of the previous proposition:

    lim_{t→0} (1/√(4πt)) ∫_{−∞}^∞ e^{−(x−y)²/4t} φ(x) dx = lim_{t→0} ∫_{−∞}^∞ δ_t(x − y) φ(x) dx = φ(y) .

Claim. v satisfies the heat equation:

    ∂v/∂t = ∂/∂t ∫_{−∞}^∞ (1/√(4πt)) e^{−(x−y)²/4t} φ(x) dx
      = ∫_{−∞}^∞ ∂/∂t ( (1/√(4πt)) e^{−(x−y)²/4t} ) φ(x) dx
      = ∫_{−∞}^∞ ∂²/∂y² ( (1/√(4πt)) e^{−(x−y)²/4t} ) φ(x) dx
      = ∂²/∂y² ∫_{−∞}^∞ (1/√(4πt)) e^{−(x−y)²/4t} φ(x) dx = ∂²v/∂y² .

Back to the Black-Scholes equation

    ∂V/∂t + ½ σ² S² ∂²V/∂S² + r S ∂V/∂S − r V = 0 .

Note that μ, the expected return of S, does not appear explicitly in the equation. This is called risk neutrality. The particulars of the derivative are encoded in the boundary conditions. For a call C(S, t) one has

    ∂C/∂t + ½ σ² S² ∂²C/∂S² + r S ∂C/∂S − r C = 0 ,

    C(S, T) = max(S − K, 0) ,
    C(0, t) = 0 for all t ,
    C(S, t) ∼ S as S → ∞ .

And for a put P(S, t) one has

    ∂P/∂t + ½ σ² S² ∂²P/∂S² + r S ∂P/∂S − r P = 0 ,
    P(S, T) = max(K − S, 0) ,
    P(0, t) = K for all t ,
    P(S, t) → 0 as S → ∞ .

Black-Scholes Formula

The solution to the call PDE above is

    C(S, t) = S N(d₊) − K e^{−r(T−t)} N(d₋) ,

where

    d₊ = ( ln(S/K) + (r + ½ σ²)(T − t) ) / ( σ √(T − t) ) ,
    d₋ = ( ln(S/K) + (r − ½ σ²)(T − t) ) / ( σ √(T − t) ) ,

and

    N(x) = (1/√(2π)) ∫_{−∞}^x e^{−s²/2} ds = (1/√(2π)) ∫_{−x}^∞ e^{−s²/2} ds

is the cumulative normal distribution.

If you are given data, including the market price for derivatives, and numerically solve the Black-Scholes formula for σ, the resulting value is called the implied volatility.

We derive the Black-Scholes formula from the Black-Scholes equation by using a change of variables to obtain the heat equation, whose solutions we have identified, at least in a formal way. The first difference is that the heat equation has an initial condition at t = 0, while the Black-Scholes equation has a final condition at t = T. This can be compensated for by letting τ = T − t; then, in terms of τ, the final condition at t = T is an initial condition at τ = 0.

The second difference is that the Black-Scholes equation has S² ∂²V/∂S² while the heat equation has ∂²u/∂x². If we let S = e^x, or equivalently x = ln S, then

    ∂/∂x = (∂S/∂x) ∂/∂S = e^x ∂/∂S = S ∂/∂S ,
    ∂²/∂x² = S² ∂²/∂S² + · · ·
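The closed-form solution transcribes directly into a short routine. The code below is a sketch implementing the formula as stated above, with N written via the error function; the sample inputs are arbitrary choices of mine.

```python
import math

# Direct transcription of the Black-Scholes call formula above, with the
# cumulative normal N written via math.erf.  The sample inputs are arbitrary.
def bs_call(S, K, T_minus_t, r, sigma):
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * T_minus_t) / (sigma * math.sqrt(T_minus_t))
    d_minus = (math.log(S / K) + (r - 0.5 * sigma**2) * T_minus_t) / (sigma * math.sqrt(T_minus_t))
    return S * N(d_plus) - K * math.exp(-r * T_minus_t) * N(d_minus)

price = bs_call(S=100.0, K=100.0, T_minus_t=1.0, r=0.05, sigma=0.2)
print(round(price, 4))
```

Note that, as the text observes, μ never enters: only r, σ, and the contract terms appear.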

The third difference is that the Black-Scholes equation has ∂V/∂S and V terms, but the heat equation has no ∂u/∂x or u terms. An integrating factor can eliminate such terms, as follows. Suppose we are given

    ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u .

Let

    u(x, t) = e^{αx + βt} v(x, t) ,

with α, β to be chosen later. Then

    ∂u/∂t = β e^{αx+βt} v + e^{αx+βt} ∂v/∂t ,
    ∂u/∂x = α e^{αx+βt} v + e^{αx+βt} ∂v/∂x ,
    ∂²u/∂x² = α² e^{αx+βt} v + 2α e^{αx+βt} ∂v/∂x + e^{αx+βt} ∂²v/∂x² .

So

    ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u

becomes

    e^{αx+βt} ( β v + ∂v/∂t ) = e^{αx+βt} ( α² v + 2α ∂v/∂x + ∂²v/∂x² ) + a e^{αx+βt} ( α v + ∂v/∂x ) + b e^{αx+βt} v ,

or

    β v + ∂v/∂t = α² v + 2α ∂v/∂x + ∂²v/∂x² + a ( α v + ∂v/∂x ) + b v ;

hence

    ∂v/∂t = ∂²v/∂x² + (a + 2α) ∂v/∂x + (α² + aα + b − β) v .

With a and b given, one can set a + 2α = 0 and α² + aα + b − β = 0 and solve for α and β. One gets α = −½ a and β = b − α² = b − ¼ a². In other words, if v(x, t) is a solution to

    ∂v/∂t = ∂²v/∂x² ,

then

    u(x, t) = e^{ −½ a x + (b − ¼ a²) t } v(x, t)

satisfies

    ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u .

With these three differences accounted for, we will now solve the Black-Scholes equation for a European call. We are considering

    ∂C/∂t + ½ σ² S² ∂²C/∂S² + r S ∂C/∂S − r C = 0

with final condition C(S, T) = max(S − K, 0). First make the change of variables S = K e^x, or equivalently x = ln(S/K) = ln S − ln K, and also let τ = ½ σ² (T − t) and C(S, t) = K u(x, τ). Then

    1. ∂C/∂t = ∂C/∂τ · dτ/dt = ( K ∂u/∂τ ) · ( −½ σ² ) = −½ σ² K ∂u/∂τ .

    2. ∂C/∂S = ∂C/∂x · dx/dS = ( K ∂u/∂x ) · (1/S) = (K/S) ∂u/∂x .

    3. ∂²C/∂S² = ∂/∂S ( (K/S) ∂u/∂x ) = −(K/S²) ∂u/∂x + (K/S) ∂/∂S ( ∂u/∂x )
         = −(K/S²) ∂u/∂x + (K/S) ∂/∂x ( ∂u/∂x ) · dx/dS = −(K/S²) ∂u/∂x + (K/S²) ∂²u/∂x² .

Substituting this into the Black-Scholes equation, we have

    −½ σ² K ∂u/∂τ + ½ σ² S² ( −(K/S²) ∂u/∂x + (K/S²) ∂²u/∂x² ) + r S (K/S) ∂u/∂x − r K u = 0

or

    ½ σ² ∂u/∂τ = ½ σ² ∂²u/∂x² − ½ σ² ∂u/∂x + r ∂u/∂x − r u
    ∂u/∂τ = ∂²u/∂x² + ( 2r/σ² − 1 ) ∂u/∂x − ( 2r/σ² ) u .

Set k = 2r/σ². The equation becomes

    ∂u/∂τ = ∂²u/∂x² + (k − 1) ∂u/∂x − k u .

And the final condition C(S, T) = max(S − K, 0) becomes the initial condition K u(x, 0) = max(K e^x − K, 0), that is, u(x, 0) = max(e^x − 1, 0).

As we saw, this reduces to the heat equation if we let u(x, τ) = e^{αx + βτ} v(x, τ), with α = −½(k − 1) and β = −k − ¼(k − 1)² = −¼(k + 1)². The initial condition u(x, 0) = max(e^x − 1, 0) becomes

    v(x, 0) = e^{½(k−1)x} u(x, 0) = max( e^x · e^{½(k−1)x} − e^{½(k−1)x} , 0 ) = max( e^{½(k+1)x} − e^{½(k−1)x} , 0 ) .

Recall that the heat equation

    ∂v/∂τ = ∂²v/∂x² ,  v(x, 0) = φ(x) ,

is solved by

    v(x, τ) = (1/√(4πτ)) ∫_{−∞}^∞ e^{−(y−x)²/4τ} φ(y) dy ,

which in this case has φ(y) = max( e^{½(k+1)y} − e^{½(k−1)y} , 0 ). Now, e^{½(k+1)y} > e^{½(k−1)y} if and only if e^{½(k+1)y}/e^{½(k−1)y} > 1, if and only if e^y > 1, if and only if y > 0. So we can write

    φ(y) = e^{½(k+1)y} − e^{½(k−1)y} if y > 0 ,  and  φ(y) = 0 if y ≤ 0 .

Now make the change of variables s = (y − x)/√(2τ). Then ds = dy/√(2τ), so dy = √(2τ) ds. Then, since φ(√(2τ) s + x) = 0 when s < −x/√(2τ),

    v(x, τ) = ( √(2τ)/√(4πτ) ) ∫_{−∞}^∞ e^{ −(√(2τ) s)²/4τ } φ( √(2τ) s + x ) ds
      = (1/√(2π)) ∫_{−∞}^∞ e^{−s²/2} φ( √(2τ) s + x ) ds
      = (1/√(2π)) ∫_{−x/√(2τ)}^∞ e^{−s²/2} ( e^{½(k+1)(√(2τ) s + x)} − e^{½(k−1)(√(2τ) s + x)} ) ds
      = I₊ − I₋ .

where

    I± = (1/√(2π)) ∫_{−x/√(2τ)}^∞ e^{ −½ s² + ½(k±1)(√(2τ) s + x) } ds .

We calculate I₊ by completing the square inside the exponent:

    I₊ = (1/√(2π)) ∫_{−x/√(2τ)}^∞ e^{ −½ s² + ½(k+1)(√(2τ) s + x) } ds
      = e^{½(k+1)x} · (1/√(2π)) ∫_{−x/√(2τ)}^∞ e^{ −½ s² + ½(k+1)√(2τ) s } ds
      = ( e^{½(k+1)x} / √(2π) ) ∫_{−x/√(2τ)}^∞ e^{ −½( s − ½(k+1)√(2τ) )² + ¼(k+1)² τ } ds
      = ( e^{ ½(k+1)x + ¼(k+1)² τ } / √(2π) ) ∫_{−x/√(2τ)}^∞ e^{ −½( s − ½(k+1)√(2τ) )² } ds
      = ( e^{ ½(k+1)x + ¼(k+1)² τ } / √(2π) ) ∫_{ −x/√(2τ) − ½(k+1)√(2τ) }^∞ e^{−½ t²} dt
      = ( e^{ ½(k+1)x + ¼(k+1)² τ } / √(2π) ) ∫_{−∞}^{ x/√(2τ) + ½(k+1)√(2τ) } e^{−½ t²} dt
      = e^{ ½(k+1)x + ¼(k+1)² τ } · N(d₊) = e^{ x − αx − βτ } N(d₊) ,

using α = −½(k − 1) and β = −¼(k + 1)², and where

    d₊ = x/√(2τ) + ½(k + 1)√(2τ) .

The second-to-last line was arrived at since ∫_{−a}^∞ e^{−t²/2} dt = ∫_{−∞}^a e^{−t²/2} dt.

Now S = K e^x and 2τ = σ²(T − t), so

    d₊ = x/√(2τ) + ½(k + 1)√(2τ)
      = ln(S/K)/(σ√(T − t)) + ½ ( 2r/σ² + 1 ) σ√(T − t)
      = ( ln(S/K) + (r + ½ σ²)(T − t) ) / ( σ√(T − t) ) .

The integral I₋ is treated in the same manner and gives

    I₋ = e^{ ½(k−1)x + ¼(k−1)² τ } N(d₋) = e^{ −αx − βτ − kτ } N(d₋) ,

where

    d₋ = x/√(2τ) + ½(k − 1)√(2τ) = ( ln(S/K) + (r − ½ σ²)(T − t) ) / ( σ√(T − t) ) .

Since τ = ½ σ²(T − t) and k = 2r/σ², substituting everything back in, with x = ln(S/K):

    C(S, t) = K u(x, τ) = K e^{αx + βτ} v(x, τ)
      = K e^{αx + βτ} ( I₊ − I₋ )
      = K e^{αx + βτ} e^{x − αx − βτ} · N(d₊) − K e^{αx + βτ} e^{−αx − βτ − kτ} N(d₋)
      = K e^x N(d₊) − K e^{−kτ} N(d₋)
      = S N(d₊) − K e^{−r(T−t)} N(d₋) .

4.4 Exercises

1. Draw some graphs of the distribution of a Poisson process X_t for various values of t (and λ). What happens as t increases?

2. Memorize the properties of the Wiener process. Then close your notes and books and write them down.

3. Verify that the expected value and variance of W_t^Δ are as claimed in the text.

4. Since for a Poisson process P_{jk}(t) = P_{0,k−j}(t) for all positive t, show that if X_0 = j, then the random variable X_t − j has a Poisson distribution with parameter λt. Then use the Markov assumption to prove that X_t − X_s has a Poisson distribution with parameter λ(t − s).

5. For a Poisson process with parameter λ, find P(X_s = m | X_t = n), both for the case 0 ≤ n ≤ m and 0 ≤ t ≤ s, and the case 0 ≤ m ≤ n and 0 ≤ s ≤ t.

6. Let T_m be the continuous random variable that gives the time to the m-th jump of a Poisson process with parameter λ. Find the pdf of T_m.

7. Verify that the function p(x, t) obtained at the end of the section satisfies the Fokker-Planck equation.

8. Calculate E(e^{λ W_t}).

9. On a computer (or a calculator, or a table if you must), find the smallest value of n = 1, 2, 3, · · · such that P(|W_1| ≥ n) is calculated as 0. (Write down the expression and then use a computer to estimate this value.)

10. Suppose a particle is moving along the real line following a Wiener process. At time 10 you observe that the position of the particle is 3. What is the probability that at time 12 the particle will be found in the interval [5, 7]?

11. Let W_t be the Wiener process. What is E(W_s² W_t)? You must break up the cases s < t and s > t.

Recall how, for a process φ_t adapted to X_t, the Itô integral ∫_{t_0}^t φ_s dX_s is defined. First, for a partition P = {t_0 < t_1 < t_2 < · · · < t_n = t}, we define the Itô sum

    I(φ_s, P, dX_s) = Σ_{j=1}^n φ_{t_{j−1}} ΔX_{t_j} ,  where ΔX_{t_j} = X_{t_j} − X_{t_{j−1}} .

Then one forms the limit as the mesh size of the partitions goes to zero:

    ∫_{t_0}^t φ_s dX_s = lim_{mesh P → 0} I(φ_s, P, dX_s) .

The following three problems provide justification for the Itô calculus rule (dW_s)² = ds.

12. Show
(a) E((ΔW_{t_i})²) = Δt_i .
(b) E(((ΔW_{t_i})² − Δt_i)²) = 2(Δt_i)² .

13. This problem is very similar to the calculation of the Itô integral ∫ W_t dW_t. For a partition P as above, calculate the mean square difference, which is by definition

    E( ( I(φ_s, P, (dW_s)²) − I(φ_s, P, ds) )² )

    = E( ( Σ_{j=1}^n φ_{t_{j−1}} (ΔW_{t_j})² − Σ_{j=1}^n φ_{t_{j−1}} Δt_j )² ) ,

and show that it equals

    2 Σ_{j=1}^n E(φ_{t_{j−1}}²) (Δt_j)² .

14. Show that for an adapted process φ_s such that E(φ_s²) ≤ M for some M > 0 and for all s, the limit as mesh P → 0 of this expression is 0, and thus that for any adapted φ_s

    ∫_{t_0}^t φ_s (dW_s)² = ∫_{t_0}^t φ_s ds .

15. Calculate ∫_0^t W_s^n dW_s.

16. Solve the Vasicek model for interest rates

    dr_t = a(b − r_t) dt + σ dW_t

subject to the initial condition r_0 = c. (Hint: You will need to use a multiplier similar to the one for the Ornstein-Uhlenbeck process.)

17. (a) Solve the stock model SDE dS_t = μ S_t dt + σ S_t dW_t.
(b) Find the pdf of S_t.
(c) If μ = .1 and σ = .25 and S_0 = 100, what is P(S_1 ≥ 110)?

18. What is d(sin(W_t))?

19. In this problem, you will show how to synthesize any derivative (whose payoff only depends on the final value of the stock) in terms of calls with different strikes. This allows us to express the value of any derivative in terms of the Black-Scholes formula. This problem is long and has many parts; if you cannot do one part, go on to the next.

(a) Consider the following functions. Fix a number h > 0, and define

    δ_h(x) = 0 if |x| ≥ h ,
    δ_h(x) = (x + h)/h² if x ∈ [−h, 0] ,        (4.1)
    δ_h(x) = (h − x)/h² if x ∈ [0, h] .

Show that the collection of functions δ_h(x) for h > 0 forms a δ-function. Recall that this implies that for any continuous function f,

    f(y) = lim_{h→0} ∫_{−∞}^∞ f(x) δ_h(x − y) dx .

(b) Consider a non-dividend-paying stock S. Suppose European calls on S with expiration T are available with arbitrary strike prices K. Let C(S, t, T, K) denote the value at time t of the call with strike K. Thus, for example, C(S, T, T, K) = max(S − K, 0). Show

    δ_h(S − K) = (1/h²) ( C(S, T, T, K + h) − 2 C(S, T, T, K) + C(S, T, T, K − h) ) .

(c) Let h(S) denote the payoff function of some derivative. Then we can synthesize the derivative by

    h(S) = lim_{h→0} ∫_0^∞ h(K) δ_h(S − K) dK .

If V(S, t) denotes the value at time t of the derivative with payoff function h if the stock price is S (at time t), then why is

    V(S, t) = lim_{h→0} ∫_0^∞ h(K) (1/h²) ( C(S, t, T, K + h) − 2 C(S, t, T, K) + C(S, t, T, K − h) ) dK ?

(d) Using Taylor's theorem, show that

    lim_{h→0} (1/h²) ( C(S, t, T, K + h) − 2 C(S, t, T, K) + C(S, t, T, K − h) ) = ∂²C/∂K² (S, t, T, K) .

(e) Show that

    V(S, t) = h(K) ∂C/∂K (S, t, T, K) |_{K=0}^∞ − h′(K) C(S, t, T, K) |_{K=0}^∞ + ∫_0^∞ h″(K) C(S, t, T, K) dK .

(Hint: Integrate by parts.)
