
Financial Engineering & Risk Management

Review of Basic Probability

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
Discrete Random Variables
Definition. The cumulative distribution function (CDF), F (·), of a random
variable, X , is defined by
F(x) := P(X ≤ x).

Definition. A discrete random variable, X, has probability mass function (PMF),
p(·), if p(x) ≥ 0 and for all events A we have

P(X ∈ A) = Σ_{x ∈ A} p(x).

Definition. The expected value of a discrete random variable, X, is given by

E[X] := Σ_i xi p(xi).

Definition. The variance of any random variable, X, is defined as

Var(X) := E[(X − E[X])²]
        = E[X²] − E[X]².
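As a quick illustration (a sketch added here, not part of the original slides), the definitions above can be checked for a fair six-sided die:

```python
# Expectation and variance of a discrete random variable (a fair die),
# computed directly from the PMF definitions above.
outcomes = [1, 2, 3, 4, 5, 6]
pmf = {x: 1.0 / 6.0 for x in outcomes}            # p(x) >= 0 and sums to 1

mean = sum(x * pmf[x] for x in outcomes)          # E[X] = sum_i x_i p(x_i)
second_moment = sum(x**2 * pmf[x] for x in outcomes)
variance = second_moment - mean**2                # Var(X) = E[X^2] - E[X]^2

print(mean, variance)                             # 3.5 and roughly 2.9167
```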

The Binomial Distribution

We say X has a binomial distribution, or X ∼ Bin(n, p), if

P(X = r) = C(n, r) p^r (1 − p)^(n−r),   r = 0, 1, . . . , n,

where C(n, r) = n! / (r! (n − r)!) is the binomial coefficient.

For example, X might represent the number of heads in n independent coin


tosses, where p = P(head). The mean and variance of the binomial distribution
satisfy

E[X ] = np
Var(X ) = np(1 − p).

A Financial Application

Suppose a fund manager outperforms the market in a given year with


probability p and that she underperforms the market with probability 1 − p.
She has a track record of 10 years and has outperformed the market in 8 of
the 10 years.
Moreover, performance in any one year is independent of performance in
other years.
Question: How likely is a track record as good as this if the fund manager had no
skill so that p = 1/2?

Answer: Let X be the number of outperforming years. Since the fund manager
has no skill, X ∼ Bin(n = 10, p = 1/2) and

P(X ≥ 8) = Σ_{r=8}^{n} C(n, r) p^r (1 − p)^(n−r).

Question: Suppose there are M fund managers. How well should the best one do
over the 10-year period if none of them had any skill?
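The following Python sketch (added here, not part of the original slides) computes P(X ≥ 8) for a single skill-free manager and, for the follow-up question, the probability that the best of M hypothetical skill-free managers has a track record at least this good:

```python
from math import comb

n, p = 10, 0.5

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Probability a single skill-free manager outperforms in at least 8 of 10 years.
p_tail = sum(binom_pmf(r, n, p) for r in range(8, n + 1))
print(p_tail)                                  # 56/1024, roughly 0.0547

# With M independent skill-free managers, the best track record is at least
# 8 out of 10 with probability 1 - (1 - p_tail)^M, which grows quickly in M.
for M in (10, 50, 100):
    print(M, 1 - (1 - p_tail)**M)
```

Even with no skill at all, an 8-out-of-10 record is quite likely to appear by chance once many managers are considered.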
The Poisson Distribution

We say X has a Poisson(λ) distribution if

P(X = r) = λ^r e^{−λ} / r!,   r = 0, 1, 2, . . .

E[X] = λ and Var(X) = λ.

For example, the mean is calculated as

E[X] = Σ_{r=0}^{∞} r P(X = r) = Σ_{r=0}^{∞} r λ^r e^{−λ} / r! = Σ_{r=1}^{∞} r λ^r e^{−λ} / r!
     = λ Σ_{r=1}^{∞} λ^{r−1} e^{−λ} / (r − 1)!
     = λ Σ_{r=0}^{∞} λ^r e^{−λ} / r! = λ.

Bayes’ Theorem

Let A and B be two events for which P(B) ≠ 0. Then

P(A | B) = P(A ∩ B) / P(B)
         = P(B | A) P(A) / P(B)
         = P(B | A) P(A) / Σ_j P(B | Aj) P(Aj)

where the Aj's form a partition of the sample space.

An Example: Tossing Two Fair 6-Sided Dice

          Y1:   1    2    3    4    5    6
  Y2 = 6        7    8    9   10   11   12
  Y2 = 5        6    7    8    9   10   11
  Y2 = 4        5    6    7    8    9   10
  Y2 = 3        4    5    6    7    8    9
  Y2 = 2        3    4    5    6    7    8
  Y2 = 1        2    3    4    5    6    7

Table: X = Y1 + Y2

Let Y1 and Y2 be the outcomes of tossing two fair dice independently of


one another.
Let X := Y1 + Y2 . Question: What is P(Y1 ≥ 4 | X ≥ 8)?
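A brute-force enumeration (a sketch added here, not in the original slides) answers the question directly, since all 36 outcomes are equally likely:

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice and compute
# P(Y1 >= 4 | X >= 8) = P(Y1 >= 4, X >= 8) / P(X >= 8).
outcomes = [(y1, y2) for y1 in range(1, 7) for y2 in range(1, 7)]

x_ge_8 = [(y1, y2) for (y1, y2) in outcomes if y1 + y2 >= 8]
both = [(y1, y2) for (y1, y2) in x_ge_8 if y1 >= 4]

print(Fraction(len(both), len(x_ge_8)))           # 12/15 = 4/5
```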

Continuous Random Variables

Definition. A continuous random variable, X, has probability density function
(PDF), f(·), if f(x) ≥ 0 and for all events A

P(X ∈ A) = ∫_A f(y) dy.

The CDF and PDF are related by

F(x) = ∫_{−∞}^{x} f(y) dy.

It is often convenient to observe that, for small ε > 0,

P(X ∈ (x − ε/2, x + ε/2)) ≈ ε f(x).

The Normal Distribution

We say X has a Normal distribution, or X ∼ N(µ, σ²), if

f(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²)).

The mean and variance of the normal distribution satisfy

E[X] = µ
Var(X) = σ².

The Log-Normal Distribution

We say X has a log-normal distribution, or X ∼ LN(µ, σ²), if

log(X) ∼ N(µ, σ²).

The mean and variance of the log-normal distribution satisfy

E[X] = exp(µ + σ²/2)
Var(X) = exp(2µ + σ²)(exp(σ²) − 1).

The log-normal distribution plays a very important role in financial applications.
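A quick Monte Carlo check of the moment formulas (a sketch added here, using the illustrative and assumed parameters µ = 0.05 and σ = 0.2):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.05, 0.2                       # illustrative (assumed) parameters

# X = exp(Z) with Z ~ N(mu, sigma^2), i.e. X ~ LN(mu, sigma^2).
x = np.exp(rng.normal(mu, sigma, size=1_000_000))

print(x.mean(), np.exp(mu + sigma**2 / 2))                          # E[X]
print(x.var(), np.exp(2*mu + sigma**2) * (np.exp(sigma**2) - 1))    # Var(X)
```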

Financial Engineering & Risk Management
Review of Conditional Expectations and Variances

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
Conditional Expectations and Variances

Let X and Y be two random variables.

The conditional expectation identity says

E[X ] = E [E[X |Y ]]

and the conditional variance identity says

Var(X ) = Var(E[X |Y ]) + E[Var(X |Y )].

Note that E[X |Y ] and Var(X |Y ) are both functions of Y and are therefore
random variables themselves.

A Random Sum of Random Variables

Let W = X1 + X2 + . . . + XN where the Xi's are IID with mean µx and variance
σx², and where N is also a random variable, independent of the Xi's.

Question: What is E[W ]?


Answer: The conditional expectation identity implies

E[W] = E[ E[ Σ_{i=1}^{N} Xi | N ] ]
     = E[N µx] = µx E[N].

Question: What is Var(W )?


Answer: The conditional variance identity implies

Var(W) = Var(E[W |N]) + E[Var(W |N)]
       = Var(µx N) + E[N σx²]
       = µx² Var(N) + σx² E[N].
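Both identities can be checked by simulation. The sketch below (added here, not in the original slides) assumes, purely for illustration, that N ∼ Poisson(20) and that the Xi's are N(1, 0.5²):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, mu_x, sigma_x = 20.0, 1.0, 0.5          # illustrative (assumed) parameters

def sample_w():
    """One draw of W = X_1 + ... + X_N with N ~ Poisson(lam)."""
    n = rng.poisson(lam)
    return rng.normal(mu_x, sigma_x, size=n).sum()

w = np.array([sample_w() for _ in range(200_000)])

# For N ~ Poisson(lam) we have E[N] = Var(N) = lam.
print(w.mean(), mu_x * lam)                               # E[W] = mu_x E[N]
print(w.var(), mu_x**2 * lam + sigma_x**2 * lam)          # Var(W)
```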

An Example: Chickens and Eggs

A hen lays N eggs where N ∼ Poisson(λ). Each egg hatches and yields a chicken
with probability p, independently of the other eggs and N . Let K be the number
of chickens.

Question: What is E[K |N ]?


Answer: We can use indicator functions to answer this question.
In particular, can write K = Σ_{i=1}^{N} 1_{Hi} where Hi is the event that the i-th egg
hatches. Therefore

1_{Hi} = 1 if the i-th egg hatches, and 0 otherwise.

Also clear that E[1_{Hi}] = 1 × p + 0 × (1 − p) = p so that

E[K |N] = E[ Σ_{i=1}^{N} 1_{Hi} | N ] = Σ_{i=1}^{N} E[1_{Hi}] = Np.

Conditional expectation formula then gives E[K ] = E[E[K |N ]] = E[Np] = λp.
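A short simulation (added here, with the illustrative and assumed values λ = 6 and p = 0.3) is consistent with E[K] = λp:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, p = 6.0, 0.3                            # illustrative (assumed) parameters

n_eggs = rng.poisson(lam, size=500_000)      # N ~ Poisson(lam)
chicks = rng.binomial(n_eggs, p)             # K | N ~ Bin(N, p)

print(chicks.mean(), lam * p)                # sample mean of K vs. lambda * p
```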


Financial Engineering & Risk Management
Review of Multivariate Distributions

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
Multivariate Distributions I
Let X = (X1, . . . , Xn)^⊤ be an n-dimensional vector of random variables.

Definition. For all x = (x1 , . . . , xn ) ∈ Rn , the joint cumulative distribution


function (CDF) of X satisfies

FX (x) = FX (x1 , . . . , xn ) = P(X1 ≤ x1 , . . . , Xn ≤ xn ).

Definition. For a fixed i, the marginal CDF of Xi satisfies

FXi (xi ) = FX (∞, . . . , ∞, xi , ∞, . . . ∞).

It is straightforward to generalize the previous definition to joint marginal


distributions. For example, the joint marginal distribution of Xi and Xj satisfies

Fij (xi , xj ) = FX (∞, . . . , ∞, xi , ∞, . . . , ∞, xj , ∞, . . . ∞).

We also say that X has joint PDF fX(·, . . . , ·) if

FX(x1, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} fX(u1, . . . , un) du1 . . . dun.

Multivariate Distributions II
Definition. If X1 = (X1, . . . , Xk)^⊤ and X2 = (Xk+1, . . . , Xn)^⊤ is a partition of
X then the conditional CDF of X2 given X1 satisfies

FX2|X1(x2 | x1) = P(X2 ≤ x2 | X1 = x1).

If X has a PDF, fX(·), then the conditional PDF of X2 given X1 satisfies

fX2|X1(x2 | x1) = fX(x) / fX1(x1) = fX1|X2(x1 | x2) fX2(x2) / fX1(x1)          (1)

and the conditional CDF is then given by

FX2|X1(x2 | x1) = ∫_{−∞}^{xk+1} · · · ∫_{−∞}^{xn} [ fX(x1, . . . , xk, uk+1, . . . , un) / fX1(x1) ] duk+1 . . . dun

where fX1(·) is the joint marginal PDF of X1 which is given by

fX1(x1, . . . , xk) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX(x1, . . . , xk, uk+1, . . . , un) duk+1 . . . dun.

Independence

Definition. We say the collection X is independent if the joint CDF can be


factored into the product of the marginal CDFs so that

FX (x1 , . . . , xn ) = FX1 (x1 ) . . . FXn (xn ).

If X has a PDF, fX (·) then independence implies that the PDF also factorizes
into the product of marginal PDFs so that

fX (x) = fX1 (x1 ) . . . fXn (xn ).

Can also see from (1) that if X1 and X2 are independent then

fX2|X1(x2 | x1) = fX(x) / fX1(x1) = fX1(x1) fX2(x2) / fX1(x1) = fX2(x2)

– so having information about X1 tells you nothing about X2 .

Implications of Independence
Let X and Y be independent random variables. Then for any events, A and B,

P (X ∈ A, Y ∈ B) = P (X ∈ A) P (Y ∈ B) (2)

More generally, for any functions f(·) and g(·), independence of X and Y implies

E[f (X )g(Y )] = E[f (X )]E[g(Y )]. (3)

In fact, (2) follows from (3) since


 
P(X ∈ A, Y ∈ B) = E[1_{X∈A} 1_{Y∈B}]
                = E[1_{X∈A}] E[1_{Y∈B}]        by (3)
                = P(X ∈ A) P(Y ∈ B).

Implications of Independence

More generally, if X1 , . . . Xn are independent random variables then

E [f1 (X1 )f2 (X2 ) · · · fn (Xn )] = E[f1 (X1 )]E[f2 (X2 )] · · · E[fn (Xn )].

Random variables can also be conditionally independent. For example, we say X


and Y are conditionally independent given Z if

E[f (X )g(Y ) | Z ] = E[f (X ) | Z ] E[g(Y ) | Z ].

– used in the (in)famous Gaussian copula model for pricing CDOs!

In particular, let Di be the event that the i th bond in a portfolio defaults.


Not reasonable to assume that the Di ’s are independent. Why?
But maybe they are conditionally independent given Z so that

P(D1 , . . . , Dn | Z ) = P(D1 | Z ) · · · P(Dn | Z )

– often easy to compute this.


The Mean Vector and Covariance Matrix

The mean vector of X is given by

E[X] := (E[X1], . . . , E[Xn])^⊤

and the covariance matrix of X satisfies

Σ := Cov(X) := E[(X − E[X])(X − E[X])^⊤]

so that the (i, j)-th element of Σ is simply the covariance of Xi and Xj.

The covariance matrix is symmetric and its diagonal elements satisfy Σi,i ≥ 0.

It is also positive semi-definite so that x^⊤ Σ x ≥ 0 for all x ∈ Rn.

The correlation matrix, ρ(X), has (i, j)th element ρij := Corr(Xi , Xj )
- it is also symmetric, positive semi-definite and has 1’s along the diagonal.

Variances and Covariances

For any matrix A ∈ R^{k×n} and vector a ∈ R^k we have

E[AX + a] = A E[X] + a                              (4)
Cov(AX + a) = A Cov(X) A^⊤.                         (5)

Note that (5) implies

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

If X and Y are independent, then Cov(X, Y) = 0


– but converse not true in general.
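The classic counterexample is Y = X² with X symmetric about zero: the covariance vanishes even though Y is a function of X. A small sketch (added here, not in the original slides):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2                                     # Y is completely determined by X

print(np.cov(x, y)[0, 1])                    # close to 0: Cov(X, Y) = E[X^3] = 0

# But X and Y are not independent: E[X^2 Y] = E[X^4] = 3 while E[X^2] E[Y] = 1,
# so the product rule (3) fails.
print((x**2 * y).mean(), (x**2).mean() * y.mean())
```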

Financial Engineering & Risk Management
The Multivariate Normal Distribution

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
The Multivariate Normal Distribution I

If the n-dimensional vector X is multivariate normal with mean vector µ and


covariance matrix Σ then we write

X ∼ MNn (µ, Σ).

The PDF of X is given by

fX(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x − µ)^⊤ Σ^{−1} (x − µ))

where | · | denotes the determinant.

Standard multivariate normal has µ = 0 and Σ = In , the n × n identity matrix


- in this case the Xi ’s are independent.
The moment generating function (MGF) of X satisfies

φX(s) = E[e^{s^⊤ X}] = e^{s^⊤ µ + (1/2) s^⊤ Σ s}.

The Multivariate Normal Distribution II

Recall our partition of X into X1 = (X1, . . . , Xk)^⊤ and X2 = (Xk+1, . . . , Xn)^⊤.


Can extend this notation naturally so that

µ = ( µ1 )        Σ = ( Σ11  Σ12 )
    ( µ2 )            ( Σ21  Σ22 )

are the mean vector and covariance matrix of (X1, X2).

Then have following results on marginal and conditional distributions of X:

Marginal Distribution
The marginal distribution of a multivariate normal random vector is itself normal.
In particular, Xi ∼ MN(µi , Σii ), for i = 1, 2.

The Bivariate Normal PDF

[Figure: plot of a bivariate normal PDF, not reproduced here.]
The Multivariate Normal Distribution III

Conditional Distribution
Assuming Σ is positive definite, the conditional distribution of a multivariate
normal distribution is also a multivariate normal distribution. In particular,

X2 | X1 = x1 ∼ MN(µ2.1, Σ2.1)

where µ2.1 = µ2 + Σ21 Σ11^{−1} (x1 − µ1) and Σ2.1 = Σ22 − Σ21 Σ11^{−1} Σ12.
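The conditional mean and covariance are easy to compute numerically. The sketch below (added here, not in the original slides) uses an assumed 3-dimensional example with X1 = (X_1) and X2 = (X_2, X_3):

```python
import numpy as np

# Illustrative (assumed) mean vector and covariance matrix.
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

k = 1                                        # X1 = (X_1), X2 = (X_2, X_3)
mu1, mu2 = mu[:k], mu[k:]
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

x1 = np.array([0.8])                         # observed value of X_1

mu_2_1 = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
Sigma_2_1 = S22 - S21 @ np.linalg.solve(S11, S12)

print(mu_2_1)                                # conditional mean of X2 given X1 = x1
print(Sigma_2_1)                             # conditional covariance (independent of x1)
```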

Linear Combinations
A linear combination, AX + a, of a multivariate normal random vector, X, is
normally distributed with mean vector, AE [X] + a, and covariance matrix,
A Cov(X) A> .

Financial Engineering & Risk Management
Introduction to Martingales

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
Martingales

Definition. A random process, {Xn : 0 ≤ n ≤ ∞}, is a martingale with respect


to the information filtration, Fn , and probability distribution, P, if
1. EP [|Xn |] < ∞ for all n ≥ 0
2. EP [Xn+m |Fn ] = Xn for all n, m ≥ 0.

Martingales are used to model fair games and have a rich history in the modeling
of gambling problems.

We define a submartingale by replacing condition #2 with

EP [Xn+m |Fn ] ≥ Xn for all n, m ≥ 0.

And we define a supermartingale by replacing condition #2 with

EP [Xn+m |Fn ] ≤ Xn for all n, m ≥ 0.

A martingale is both a submartingale and a supermartingale.


Constructing a Martingale from a Random Walk
Let Sn := Σ_{i=1}^{n} Xi be a random walk where the Xi's are IID with mean µ.

Let Mn := Sn − nµ. Then Mn is a martingale because:

En[Mn+m] = En[ Σ_{i=1}^{n+m} Xi − (n + m)µ ]
         = Σ_{i=1}^{n} Xi + En[ Σ_{i=n+1}^{n+m} Xi ] − (n + m)µ
         = Σ_{i=1}^{n} Xi + mµ − (n + m)µ = Sn − nµ = Mn.

A Martingale Betting Strategy

Let X1 , X2 , . . . be IID random variables with


P(Xi = 1) = P(Xi = −1) = 1/2.

Can imagine Xi representing the result of a coin-flipping game:
Win $1 if coin comes up heads
Lose $1 if coin comes up tails
Consider now a doubling strategy where the initial bet is $1 and we keep
doubling the bet until we eventually win, at which point we stop.

First note that the size of the bet on the n-th play is 2^{n−1}


– assuming we’re still playing at time n.

Let Wn denote total winnings after n coin tosses assuming W0 = 0.

Then Wn is a martingale!

A Martingale Betting Strategy

To see this, first note that Wn ∈ {1, −2^n + 1} for all n. Why?

1. Suppose we win for the first time on the n-th bet. Then

   Wn = −(1 + 2 + · · · + 2^{n−2}) + 2^{n−1}
      = −(2^{n−1} − 1) + 2^{n−1}
      = 1.

2. If we have not yet won after n bets then

   Wn = −(1 + 2 + · · · + 2^{n−1})
      = −2^n + 1.

To show Wn is a martingale only need to show E[Wn+1 | Wn ] = Wn


– then follows by iterated expectations that E[Wn+m | Wn ] = Wn .

A Martingale Betting Strategy
There are two cases to consider:
1: Wn = 1: then P(Wn+1 = 1|Wn = 1) = 1 so

E[Wn+1 | Wn = 1] = 1 = Wn (6)

2: Wn = −2^n + 1: bet 2^n on the (n + 1)-th toss so Wn+1 ∈ {1, −2^{n+1} + 1}.

Clear that

P(Wn+1 = 1 | Wn = −2^n + 1) = 1/2
P(Wn+1 = −2^{n+1} + 1 | Wn = −2^n + 1) = 1/2

so that

E[Wn+1 | Wn = −2^n + 1] = (1/2)(1) + (1/2)(−2^{n+1} + 1)
                        = −2^n + 1 = Wn.                      (7)

From (6) and (7) we see that E[Wn+1 | Wn ] = Wn .

Polya’s Urn
Consider an urn which contains red balls and green balls.
Initially there is just one green ball and one red ball in the urn.

At each time step a ball is chosen randomly from the urn:


1. If ball is red, then it’s returned to the urn with an additional red ball.
2. If ball is green, then it’s returned to the urn with an additional green ball.
Let Xn denote the number of red balls in the urn after n draws. Then

P(Xn+1 = k + 1 | Xn = k) = k / (n + 2)
P(Xn+1 = k | Xn = k) = (n + 2 − k) / (n + 2).
Show that Mn := Xn /(n + 2) is a martingale.

(These martingale examples taken from “Introduction to Stochastic Processes”


(Chapman & Hall) by Gregory F. Lawler.)
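A quick simulation sketch (added here, not part of the original slides) is consistent with the martingale property: if Mn is a martingale then E[Mn] = M0 = 1/2 for every n.

```python
import random

random.seed(4)

def polya_m(n_draws):
    """Return M_n = X_n / (n + 2) after n_draws draws from the urn."""
    red, total = 1, 2                        # start with 1 red, 1 green
    for _ in range(n_draws):
        if random.random() < red / total:    # P(draw red) = X_n / (n + 2)
            red += 1
        total += 1
    return red / total

n_paths, n_draws = 50_000, 50
print(sum(polya_m(n_draws) for _ in range(n_paths)) / n_paths)   # close to 0.5
```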

Financial Engineering & Risk Management
Introduction to Brownian Motion

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
Brownian Motion

Definition. We say that a random process, {Xt : t ≥ 0}, is a Brownian motion


with parameters (µ, σ) if
1. For 0 < t1 < t2 < . . . < tn−1 < tn

(Xt2 − Xt1 ), (Xt3 − Xt2 ), . . . , (Xtn − Xtn−1 )

are mutually independent.


2. For s > 0, Xt+s − Xt ∼ N(µs, σ²s), and
3. Xt is a continuous function of t.

We say that Xt is a B(µ, σ) Brownian motion with drift µ and volatility σ.

Property #1 is often called the independent increments property.

Remark. Bachelier (1900) and Einstein (1905) were the first to explore Brownian
motion from a mathematical viewpoint whereas Wiener (1920’s) was the first to
show that it actually exists as a well-defined mathematical entity.
Standard Brownian Motion

When µ = 0 and σ = 1 we have a standard Brownian motion (SBM).


We will use Wt to denote a SBM and we always assume that W0 = 0.
Note that if Xt ∼ B(µ, σ) and X0 = x then we can write

Xt = x + µt + σWt (8)

where Wt is an SBM. Therefore see that Xt ∼ N(x + µt, σ²t).

Sample Paths of Brownian Motion

[Figure: simulated sample paths of Brownian motion, not reproduced here.]
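Since the figure is not reproduced here, the sketch below (added, with the assumed parameters µ = 0.1 and σ = 0.3) shows how such sample paths can be simulated on a discrete grid using Xt = x + µt + σWt from (8):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, x0 = 0.1, 0.3, 0.0                # illustrative (assumed) parameters
T, n_steps, n_paths = 1.0, 252, 5
dt = T / n_steps

# Brownian increments are independent N(0, dt); cumulative sums give W_t.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
t = np.linspace(dt, T, n_steps)

X = x0 + mu * t + sigma * W                  # B(mu, sigma) paths, one per row
print(X[:, -1])                              # terminal values, ~ N(x0 + mu*T, sigma^2 * T)
```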
Information Filtrations

For any random process we will use Ft to denote the information available
at time t
- the set {Ft }t≥0 is then the information filtration

- so E[· | Ft] denotes an expectation conditional on the information available at time t.

Will usually write E[· | Ft ] as Et [ · ].

Important Fact: The independent increments property of Brownian motion


implies that any function of Wt+s − Wt is independent of Ft and that

(Wt+s − Wt ) ∼ N(0, s).

A Brownian Motion Calculation

Question: What is E0 [Wt+s Ws ]?

Answer: We can use a version of the conditional expectation identity to obtain

E0[Wt+s Ws] = E0[(Wt+s − Ws + Ws) Ws]
            = E0[(Wt+s − Ws) Ws] + E0[Ws²].                    (9)

Now we know (why?) that E0[Ws²] = s.


To calculate the first term on the r.h.s. of (9) a version of the conditional expectation
identity implies

E0[(Wt+s − Ws) Ws] = E0[ Es[(Wt+s − Ws) Ws] ]
                   = E0[ Ws Es[Wt+s − Ws] ]
                   = E0[Ws · 0]
                   = 0.

Therefore obtain E0 [Wt+s Ws ] = s.
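A Monte Carlo check of this calculation (a sketch added here, with the illustrative values s = 1 and t = 2):

```python
import numpy as np

rng = np.random.default_rng(6)
s, t, n = 1.0, 2.0, 1_000_000                # illustrative (assumed) values

w_s = rng.normal(0.0, np.sqrt(s), size=n)          # W_s ~ N(0, s)
w_ts = w_s + rng.normal(0.0, np.sqrt(t), size=n)   # add an independent N(0, t) increment

print(np.mean(w_ts * w_s))                   # close to s = 1.0
```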


Financial Engineering & Risk Management
Geometric Brownian Motion

M. Haugh G. Iyengar
Department of Industrial Engineering and Operations Research
Columbia University
Geometric Brownian Motion

Definition. We say that a random process, Xt, is a geometric Brownian motion
(GBM) if for all t ≥ 0

Xt = e^{(µ − σ²/2)t + σWt}

where Wt is a standard Brownian motion.

We call µ the drift, σ the volatility and write Xt ∼ GBM(µ, σ).

Note that

Xt+s = X0 e^{(µ − σ²/2)(t+s) + σWt+s}
     = X0 e^{(µ − σ²/2)t + σWt} · e^{(µ − σ²/2)s + σ(Wt+s − Wt)}
     = Xt e^{(µ − σ²/2)s + σ(Wt+s − Wt)}                        (10)

– a representation that is very useful for simulating security prices.
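For example, a price path can be simulated by applying (10) repeatedly over small time steps. The sketch below (added here, not part of the original slides) uses the assumed parameters X0 = 100, µ = 0.08 and σ = 0.2:

```python
import numpy as np

rng = np.random.default_rng(7)
x0, mu, sigma = 100.0, 0.08, 0.2             # illustrative (assumed) parameters
T, n_steps = 1.0, 252
dt = T / n_steps

# X_{t+dt} = X_t * exp((mu - sigma^2/2) dt + sigma * (W_{t+dt} - W_t)), as in (10).
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * dW
path = x0 * np.exp(np.cumsum(log_returns))

print(path[-1])                              # simulated terminal price X_T
print(x0 * np.exp(mu * T))                   # E[X_T] = X_0 * e^{mu * T} for comparison
```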

Geometric Brownian Motion

Question: Suppose Xt ∼ GBM(µ, σ). What is Et [Xt+s ]?

Answer: From (10) we have


Et[Xt+s] = Et[ Xt e^{(µ − σ²/2)s + σ(Wt+s − Wt)} ]
         = Xt e^{(µ − σ²/2)s} Et[ e^{σ(Wt+s − Wt)} ]
         = Xt e^{(µ − σ²/2)s} e^{σ²s/2}      (using the MGF of a N(0, s) random variable)
         = e^{µs} Xt

– so the expected growth rate of Xt is µ.

Sample Paths of Geometric Brownian Motion

[Figure: simulated sample paths of geometric Brownian motion, not reproduced here.]
Geometric Brownian Motion

The following properties of GBM follow immediately from the definition of BM:

1. Fix 0 < t1 < t2 < . . . < tn. Then the ratios Xt2/Xt1, Xt3/Xt2, . . . , Xtn/Xtn−1
   are mutually independent.

2. Paths of Xt are continuous as a function of t, i.e., they do not jump.


   
3. For s > 0, log(Xt+s / Xt) ∼ N( (µ − σ²/2)s, σ²s ).

Modeling Stock Prices as GBM

Suppose Xt ∼ GBM(µ, σ). Then clear that:


1. If Xt > 0, then Xt+s is always positive for any s > 0
   - so the limited-liability property of stock prices is not violated.

2. The distribution of Xt+s/Xt only depends on s and not on Xt.

These properties suggest that GBM might be a reasonable model for stock prices.

Indeed it is the underlying model for the famous Black-Scholes option pricing formula.
