1.2.1 Introduction
Probabilistic models often involve several random variables of interest.
For example, in a medical diagnosis context, the results of several tests
may be significant, or in a networking context, the workloads of several
gateways may be of interest. All of these random variables are associated
with the same experiment, sample space, and probability law, and their
values may relate in interesting ways. Mathematically, a random variable
is a mapping.
Definition 1. A random variable X is a mapping (function) from a
sample space S to the reals ℝ. For any j ∈ ℝ, the preimage
A := X^{-1}(j) = {w : X(w) = j} ⊆ S is an event, and we understand
P{X = j} = P(A) = ∑_{w ∈ A} P(w).
For a finite sample space S with equally likely outcomes, then obviously
P{X = j} = P(A) = |A| / |S|.
A discrete random variable X is one having a finite (or countably
infinite) range Range(X), described by the probability point or mass
distribution (pmf), determined by P{X = j} = p_j. We must have
p_j ≥ 0, and ∑_j p_j = 1.
14 CHAPTER 1. BACKGROUND
A continuous random variable X is one having an uncountable range
Range(X), described by the probability density function (pdf) f(x), which
satisfies
f(t) ≥ 0, and ∫_{t ∈ Range(X)} f(t) dt = 1.
Generating functions are important in handling stochastic processes in-
volving integer-valued random variables.
Multiple random variables. We consider probabilities that involve si-
multaneously the numerical values of several random variables, and we
investigate their mutual couplings. In this section, we will extend the
concepts of pmf and expectation developed so far to multiple random
variables.
Consider two discrete random variables X, Y : S → ℝ associated with
the same experiment. The joint pmf of X and Y is defined by
p_{X,Y}(x, y) = P(X = x, Y = y)
for all pairs of numerical values (x, y) that X and Y can take. We will use
the abbreviated notation P(X = x, Y = y) instead of the more precise
notations P({X = x} ∩ {Y = y}) or P({X = x} and {Y = y}). That is,
P(X = x, Y = y) = P({X = x} ∩ {Y = y}) = P({X = x} and {Y = y}).
For the pair of random variables X, Y, we say:
Definition 2. X and Y are independent if for all x, y ∈ ℝ we have
P(X = x, Y = y) = P{X = x} · P{Y = y}, i.e., p_{X,Y}(x, y) = p_X(x) · p_Y(y),
or, in terms of conditional probability,
P({X = x} | {Y = y}) = P({X = x}).
This can be extended to the so-called mutual independence of a finite
number n of random variables.
Definition 3. The expectation operator defines the expected value of a
random variable X as
E(X) = ∑_{x ∈ Range(X)} P{X = x} · x.
If X is a function from a sample space S to the naturals ℕ, then
E(X) = ∑_{i=0}^{∞} P{X > i}. (Why?)
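The tail-sum identity can be seen by writing ∑_{i≥0} P{X > i} = ∑_{i≥0} ∑_{j>i} p_j and counting how many times each p_j appears (exactly j times). A minimal numerical check in Python; the pmf below is an assumed toy example, not from the text:

```python
# Check E(X) = sum_{i>=0} P(X > i) for a natural-valued X
# with a small, arbitrary pmf on the values 0..4.
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15}

# Direct definition of the mean.
mean_direct = sum(j * p for j, p in pmf.items())

# Tail sum: add up P(X > i) for i = 0, 1, 2, ...
mean_tail = sum(
    sum(p for j, p in pmf.items() if j > i)
    for i in range(max(pmf))
)

assert abs(mean_direct - mean_tail) < 1e-12
```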
Functions of Multiple Random Variables. When there are multi-
ple random variables of interest, it is possible to generate new random
variables by considering functions involving several of these random vari-
ables. In particular, a function Z = g(X, Y) of the random variables X
and Y defines another random variable. Its pmf can be calculated from
the joint pmf p_{X,Y} according to
p_Z(z) = ∑_{(x,y) : g(x,y)=z} p_{X,Y}(x, y).
Furthermore, the expected value rule for functions naturally extends and
takes the form
E[g(X, Y)] = ∑_{(x,y)} g(x, y) · p_{X,Y}(x, y).
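Both formulas can be sketched directly in code. The joint pmf below (two independent fair 0/1 variables) and the choice g(x, y) = x + y are assumed toy data:

```python
# pmf of Z = g(X, Y) from a joint pmf, plus the expected value rule.
p_xy = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # toy joint pmf

def g(x, y):          # the function defining Z
    return x + y

# p_Z(z) = sum of p_{X,Y}(x, y) over all (x, y) with g(x, y) = z
p_z = {}
for (x, y), p in p_xy.items():
    z = g(x, y)
    p_z[z] = p_z.get(z, 0.0) + p

# expected value rule: E[g(X, Y)] = sum g(x, y) p_{X,Y}(x, y)
e_g = sum(g(x, y) * p for (x, y), p in p_xy.items())
```

Here p_z comes out as {0: 0.25, 1: 0.5, 2: 0.25} and e_g = 1.0, as expected for the sum of two fair bits.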
Theorem 4. We have two important results on expectation.
• Linearity: E(X + Y) = E(X) + E(Y) for any pair of random variables X, Y.
• Independence: E(X · Y) = E(X) · E(Y) for any pair of independent random variables X, Y.
Mean, variance and moments of the probability distribution
P{X = j} = p_j. The mean is
m = E(X) = ∑_{j=0}^{∞} j p_j = P'(1) = ∑_{j=0}^{∞} q_j = Q(1). (Why!?)
Recall that the variance of the probability distribution p_j is
σ² = E(X(X-1)) + E(X) - [E(X)]²,
so we need to know
E(X(X-1)) = ∑_{j=0}^{∞} j(j-1) p_j = P''(1) = 2Q'(1).
Therefore, σ² = ?
Exercise: Find the formula for the r-th factorial moment
μ_{[r]} = E(X(X-1)(X-2)···(X-r+1)).
1.2.2 Elementary results of Generating Functions
Suppose we have a sequence of real numbers a_0, a_1, a_2, .... Introducing
the dummy variable x, we may define a function
A(x) = a_0 + a_1 x + a_2 x² + ··· = ∑_{j=0}^{∞} a_j x^j.   (1.2.1)
If the series converges in some real interval -x_0 < x < x_0, the func-
tion A(x) is called the generating function of the sequence {a_j}.
Fact 1.1. If the sequence {a_j} is bounded by some constant K, then
A(x) converges at least for |x| < 1. [Prove it!]
Fact 1.2. In case the sequence {a_j} represents probabilities, we intro-
duce the restriction
a_j ≥ 0, ∑_{j=0}^{∞} a_j = 1.
The corresponding function A(x) is then called a probability-generating
function. We consider the (point) probability distribution and the tail
probability of a random variable X, given by
P{X = j} = p_j,  P{X > j} = q_j;
then the usual distribution function is P{X ≤ j} = 1 - q_j. The probability-
generating function now is
P(x) = ∑_{j=0}^{∞} p_j x^j = E(x^X), where E is the expectation operator.
Also we can define a generating function for the tail probabilities:
Q(x) = ∑_{j=0}^{∞} q_j x^j.
Q(x) is not a probability-generating function, however.
Fact 1.3.
a/ P(1) = ∑_{j=0}^{∞} p_j 1^j = 1, and
|P(x)| ≤ ∑_{j=0}^{∞} |p_j x^j| ≤ ∑_{j=0}^{∞} p_j ≤ 1 if |x| ≤ 1.
So P(x) is absolutely convergent at least for |x| ≤ 1.
b/ Q(x) is absolutely convergent at least for |x| < 1.
c/ Connection between P(x) and Q(x): (check this!)
(1 - x)Q(x) = 1 - P(x),  or equivalently  P(x) + Q(x) = 1 + xQ(x).
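The identity in c/ follows from q_j - q_{j-1} = -p_j; it can also be checked numerically. The three-point pmf below is an assumed toy example:

```python
# Numerical check of (1 - x) Q(x) = 1 - P(x) for a small pmf.
pmf = [0.2, 0.5, 0.3]                               # p_0, p_1, p_2
tails = [sum(pmf[j + 1:]) for j in range(len(pmf))]  # q_j = P(X > j)

def poly(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j."""
    return sum(c * x**j for j, c in enumerate(coeffs))

x = 0.37  # any point with |x| < 1
P, Q = poly(pmf, x), poly(tails, x)
assert abs((1 - x) * Q - (1 - P)) < 1e-12
```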
Finding a generating function from a recurrence: multiply both
sides by x^n and sum over n. For example, for the Fibonacci sequence,
f_n = f_{n-1} + f_{n-2}  ⟹  F(x) = x + xF(x) + x²F(x).
Finding a recurrence from a generating function: whenever you
know F(x), we find its power series; the coefficients of x^n in that series
are the Fibonacci numbers. How? Just remember how to find a partial
fraction expansion of F(x), and in particular the basic expansion
1/(1 - αx) = 1 + αx + α²x² + ···
In general, if G(x) is the generating function of a sequence (g_n), then
G^{(n)}(0) = n! g_n.
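Coefficient extraction from a rational generating function can also be sketched by power-series long division; applying it to F(x) = x/(1 - x - x²), obtained from the Fibonacci recurrence above, recovers the Fibonacci numbers:

```python
# Power-series coefficients of a rational generating function N(x)/D(x)
# by long division (synthetic division), with D(0) != 0.
def series_coeffs(num, den, n_terms):
    """num, den: polynomial coefficient lists [c0, c1, ...]."""
    coeffs = []
    num = num + [0] * n_terms          # working copy with room for residuals
    for n in range(n_terms):
        c = num[n] / den[0]
        coeffs.append(c)
        for k, d in enumerate(den):    # subtract c * x^n * D(x)
            if n + k < len(num):
                num[n + k] -= c * d
    return coeffs

# F(x) = x / (1 - x - x^2): numerator [0, 1], denominator [1, -1, -1]
fib = series_coeffs([0, 1], [1, -1, -1], 10)
# fib -> [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```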
1.2.3 Convolutions
Now we consider two nonnegative independent integer-valued random
variables X and Y, having the probability distributions
P{X = j} = a_j,  P{Y = k} = b_k.   (1.2.2)
The joint probability of the event (X = j, Y = k) is obviously a_j b_k. We
form a new random variable S = X + Y; then the event S = r comprises
the mutually exclusive events
(X = 0, Y = r), (X = 1, Y = r-1), ..., (X = r, Y = 0).
Fact 1.4. The probability distribution of the sum S then is
P{S = r} = c_r = a_0 b_r + a_1 b_{r-1} + ··· + a_r b_0.
Proof.
p_S(r) = P(X + Y = r) = ∑_{(x,y): x+y=r} P(X = x and Y = y) = ∑_x p_X(x) p_Y(r - x).
This method of compounding two sequences of numbers (not necessarily
probabilities) is called convolution. The notation
{c_j} = {a_j} ∗ {b_j}
will be used.
Fact 1.5. Define the generating functions of the sequences {a_j}, {b_j} and
{c_j} by
A(x) = ∑_{j=0}^{∞} a_j x^j,  B(x) = ∑_{j=0}^{∞} b_j x^j,  C(x) = ∑_{j=0}^{∞} c_j x^j;
it follows that C(x) = A(x)B(x). [check this!]
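A minimal sketch of convolution and of the check C(x) = A(x)B(x), using assumed coin-flip pmfs:

```python
# Convolution of two pmfs {a_j}, {b_j}, and the identity C(x) = A(x) B(x).
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

a = [0.5, 0.5]          # one fair coin: values 0, 1
b = [0.25, 0.5, 0.25]   # sum of two fair coins: values 0, 1, 2
c = convolve(a, b)      # pmf of the three-coin sum -> [0.125, 0.375, 0.375, 0.125]

def poly(p, x):
    return sum(pj * x**j for j, pj in enumerate(p))

x = 0.3
assert abs(poly(c, x) - poly(a, x) * poly(b, x)) < 1e-12
```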
In practical applications, the sum of several independent integer-
valued random variables X_i can be defined:
S_n = X_1 + X_2 + ··· + X_n,  n ∈ ℤ⁺.
If the X_i have a common probability distribution given by p_j, with
probability-generating function P(x), then the probability-generating
function of S_n is P(x)^n. Clearly, the distribution of S_n is the n-fold
convolution
{p_j} ∗ {p_j} ∗ ··· ∗ {p_j} (n factors) = {p_j}^{∗n}.
1.2.4 Compound distributions
In our discussion so far of sums of random variables, we have always
assumed that the number of variables in the sum is known and fixed, i.e.,
it is nonrandom. We now generalize the previous concept of convolution
to the case where the number N of random variables X_k contributing to
the sum is itself a random variable! In particular, we consider the sum
S_N = X_1 + X_2 + ··· + X_N, where
P{X_k = j} = f_j,  P{N = n} = g_n,  P{S_N = l} = h_l.   (1.2.3)
The probability-generating functions of X, N and S are
F(x) = ∑_j f_j x^j,  G(x) = ∑_n g_n x^n,  H(x) = ∑_l h_l x^l.   (1.2.4)
Compute H(x) in terms of F(x) and G(x): prove that
H(x) = G(F(x)).
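The identity H(x) = G(F(x)) can be checked numerically: conditioned on N = n, the sum S_N has PGF F(x)^n, so H(x) = ∑_n g_n F(x)^n. The pmfs below are assumed toy data:

```python
# Check H(x) = G(F(x)) for a compound sum S_N = X_1 + ... + X_N.
g = [0.25, 0.5, 0.25]   # pmf of N (values 0, 1, 2); toy data
f = [0.2, 0.5, 0.3]     # pmf of each X_k (values 0, 1, 2); toy data

def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# Direct pmf of S_N: h = sum_n g_n * (n-fold convolution of f)
h = [0.0] * 5                      # S_N ranges over 0..4 here
power = [1.0]                      # 0-fold convolution: point mass at 0
for n, gn in enumerate(g):
    for l, p in enumerate(power):
        h[l] += gn * p
    power = convolve(power, f)

def poly(p, x):
    return sum(pj * x**j for j, pj in enumerate(p))

x = 0.6
assert abs(poly(h, x) - poly(g, poly(f, x))) < 1e-12   # H(x) = G(F(x))
```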
Example 1.1. A remote village has three gas stations, and each one
of them is open on any given day with probability 1/2, independently of
the others. The amount of gas available in each gas station is unknown
and is uniformly distributed between 0 and 1000 gallons. We wish to
characterize the distribution of the total amount of gas available at the
gas stations that are open.
The number N of open gas stations is a binomial random variable
with p = 1/2, and the corresponding transform (here a moment-generating
transform, since the amounts of gas are continuous) is
G_N(x) = (1 - p + p e^x)³ = (1/8)(1 + e^x)³.
The transform F_X(x) associated with the amount of gas available in an
open gas station is
F_X(x) = (e^{1000x} - 1) / (1000x).
The transform H_S(x) associated with the total amount S of gas avail-
able at the gas stations of the village that are open is the same as
G_N(x), except that each occurrence of e^x is replaced with F_X(x), i.e.,
H_S(x) = G_N(F_X(x)) = (1/8)(1 + F_X(x))³.
24 CHAPTER 2. MARKOV CHAINS & MODELING
(In each time step n to n + 1, the process can stay at the same state e_i
(at both n and n + 1) or move to another state e_j (at n + 1) according to
the memoryless rule, which says that the future behavior of the system
depends only on the present and not on its past history.)
Definition 5 (One-step transition probability).
Denote the absolute probability of outcome j at the n-th trial by
p_j(n) = P(X_n = j).   (2.0.1)
The one-step transition probability, denoted
p_ij(n + 1) = P(X_{n+1} = j | X_n = i),
is defined as the conditional probability that the process is in state j at time
n + 1 given that the process was in state i at the previous time n, for all
i, j ∈ Q.
2.1 Homogeneous Markov chains
If the state transition probabilities p_ij(n + 1) in a Markov chain M are
independent of the time n, they are said to be stationary, time-homogeneous,
or just homogeneous. The state transition probabilities of a homogeneous
chain can then be written without mentioning the time point n:
p_ij = P(X_{n+1} = j | X_n = i).   (2.1.1)
Unless stated otherwise, we assume and will work with homogeneous
Markov chains M. The one-step transition probabilities given by (2.1.1)
of these Markov chains must satisfy
∑_{j=1}^{s} p_ij = 1 for each i = 1, 2, ..., s, and p_ij ≥ 0.
Transition Probability Matrix. In practice, we are likely given the ini-
tial distribution (the probability distribution of the starting position of the
concerned object at time point 0) and the transition probabilities, and
we want to determine the probability distribution of the position X_n for
any time point n > 0. The Markov property, quantitatively described
through transition probabilities, is represented in the state transition
matrix P = [p_ij]:
P =
[ p_11  p_12  p_13  ...  p_1s ]
[ p_21  p_22  p_23  ...  p_2s ]
[ p_31  p_32  p_33  ...  p_3s ]
[  ...   ...   ...  ...   ... ]
[ p_s1  p_s2  p_s3  ...  p_ss ]   (2.1.2)
Briefly, we have
Definition 6. A (homogeneous) Markov chain M is a triple (Q, p(0), P)
in which:
• Q is a finite set of states (identified with an alphabet Σ),
• p(0) are the initial probabilities (at the initial time point n = 0),
• P = [p_ij] is the matrix of state transition probabilities, in which
p_ij = P(X_{n+1} = j | X_n = i),
and such that the memoryless property is satisfied, i.e.,
P(X_{n+1} = j | X_n = i, ..., X_0 = a) = P(X_{n+1} = j | X_n = i), for all n.
In practice, the initial probabilities p(0) are obtained at the current
time (the beginning of a study), and the transition probability matrix P is
found from empirical observations in the past. In most cases, the major
concern is using P and p(0) to predict the future.
Example 2.1. The Coopmart chain (denoted C) in SG currently con-
trols 60% of the daily processed-food market; their rivals Maximart and
other brands (denoted M) take the other share. Data from the previous
years (2006 and 2007) show that 88% of C's customers remained loyal
to C, while 12% switched to rival brands. In addition, 85% of M's cus-
tomers remained loyal to M, while the other 15% switched to C. Assuming
that these trends continue, determine C's share of the market (a) in 5
years and (b) over the long run.
Proposed solution. Suppose that the brand attraction is time-homoge-
neous. For a sample of large enough size n, we denote the customers'
choice in year n by a random variable X_n. The market share
probability of the whole population can then be approximated by using
the sample statistics, e.g.,
P(X_n = C) = |{x : X_n(x) = C}| / n, and P(X_n = M) = 1 - P(X_n = C).
Set n = 0 for the current time; the initial probabilities then are
p(0) = [0.6, 0.4] = [P(X_0 = C), P(X_0 = M)].
Obviously we want to know the market share probabilities
p(n) = [P(X_n = C), P(X_n = M)] at any year n > 0. We now introduce a
transition probability matrix whose rows and columns are labeled C and M:
P =
      C     M
C [ 0.88  0.12 ]     [ 1-a   a  ]
M [ 0.15  0.85 ]  =  [  b   1-b ],   where 1-a = 0.88, a = 0.12, b = 0.15, 1-b = 0.85,   (2.1.3)
and a = p_CM = P[X_{n+1} = M | X_n = C], b = p_MC = P[X_{n+1} = C | X_n = M].
The n-step transition probabilities satisfy the Chapman-Kolmogorov
equations
p^{(n)}_{ij} = ∑_{h=1}^{s} p^{(n-k)}_{ih} p^{(k)}_{hj},  0 < k < n.
This results in the matrix notation
P^{(n)} = P^{(n-k)} P^{(k)}.
Since P^{(1)} = P, we get P^{(2)} = P², and in general P^{(n)} = P^n.
Let p^{(n)} denote the vector form of the probability mass distribution (pmf,
or absolute probability distribution) associated with X_n of a Markov process,
that is,
p^{(n)} = [p_1(n), p_2(n), p_3(n), ..., p_s(n)],
where each p_i(n) is defined as in (2.0.1).
Proposition 7. The absolute probability distribution p^{(n)} at any stage n
of a Markov chain is given in matrix form by
p^{(n)} = p^{(0)} P^n, where p^{(0)} = p is the initial probability (row) vector.   (2.1.5)
Proof. We employ two facts:
* P^{(n)} = P^n, and
* the absolute probability distribution p^{(n+1)} at any stage n+1 (asso-
ciated with X_{n+1}) can be found from the 1-step transition matrix P = [p_ij]
and the distribution
p^{(n)} = [p_1(n), p_2(n), p_3(n), ..., p_s(n)]
at any stage n (associated with X_n):
p_j(n + 1) = ∑_{i=1}^{s} p_i(n) p_ij, or in matrix notation p^{(n+1)} = p^{(n)} P.
Then just do the induction: p^{(n+1)} = p^{(n)} P = (p^{(n-1)} P) P = ··· =
p^{(0)} P^{n+1}.
Example 2.2 (The Coopmart chain, cont.). (a/) C's share of the
market in 5 years can be computed by
p^{(5)} = [p_C(5), p_M(5)] = p^{(0)} P⁵.
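The computation can be sketched in pure Python; the row-vector update p ← pP below follows the component formula p_j(n+1) = ∑_i p_i(n) p_ij:

```python
# Market shares after 5 years: p(5) = p(0) P^5 (row-vector convention).
P = [[0.88, 0.12],
     [0.15, 0.85]]
p = [0.6, 0.4]   # initial distribution p(0)

for _ in range(5):   # apply p <- p P five times
    p = [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]

# p[0] is C's market share after 5 years, about 0.565
```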
2.2 Classification of States
A) Accessible states.
State j is said to be accessible from state i if for some N ≥ 0, p^{(N)}_{ij} > 0,
and we write i → j. Two states i and j accessible to each other are said
to communicate, and we write i ↔ j. If all states communicate with
each other, then we say that the Markov chain is irreducible. Formally,
irreducibility means
∀ i, j ∈ Q : ∃ N ≥ 0 [p^{(N)}_{ij} > 0].
B) Recurrent (persistent) states and transient states.
Let A(i) be the set of states that are accessible from i. We say that
i is recurrent if, from any state accessible from i, there is always some
probability of returning to i and, given enough time, this return is certain
to happen. By repeating this argument, if a recurrent state is visited once,
it will be revisited an infinite number of times.
A state is called transient if it is not recurrent. In particular, there are
then states j ∈ A(i) such that i is not accessible from j. After each visit to
state i, there is positive probability that the process enters such a j. Given
enough time, this will happen, and state i cannot be visited after that.
Thus, a transient state will only be visited a finite number of times.
We now formalize the concepts of recurrent (persistent) and transient
states.
Let the first return time T_j denote the first time, i.e., the number of steps,
at which the chain is first at state j after leaving j at time 0 (if j is never
reached, then set T_j = ∞). It is a discrete r.v., taking values in {1, 2, 3, ...}.
For any two states i ≠ j and n > 0, let f^n_{i,j} be the conditional probability
that the chain is first at state j after n steps, given that it was at state i
at time 0:
f^n_{i,j} := P[T_j = n | X_0 = i] = P[X_n = j; X_k ≠ j, k = 1, 2, ..., n-1 | X_0 = i],
and f^0_{i,j} = 0 since T_j ≥ 1. Then clearly
f^1_{i,j} = P[X_1 = j | X_0 = i] = p_{i,j},
and, by the addition rule and the Markov property (a first-step analysis):
f^n_{i,j} = ∑_{k≠j} p_{i,k} f^{n-1}_{k,j},  for n = 2, 3, ....
Note that this formula is still valid when j = i:
f^n_{j,j} = ∑_{k≠j} p_{j,k} f^{n-1}_{k,j}.
The probability of visiting j in finite time (a finite number of steps), starting
from i, is given by
f_{i,j} = P[T_j < ∞ | X_0 = i] = ∑_{n=0}^{∞} f^n_{i,j}.
Definition 8.
• State j is recurrent if
f_{j,j} = P[T_j < ∞ | X_0 = j] = 1,
i.e., starting from the state, the process is guaranteed to return to
the state again and again, in fact, infinitely many times.
• A recurrent state j is said to be positive recurrent if, starting from
the state, the expected number of transitions until the chain returns
to the state is finite:
E[T_j | X_0 = j] = ∑_{n=0}^{∞} n f^n_{j,j} < ∞.
• State j is said to be null recurrent if
E[T_j | X_0 = j] = ∞.
• State j is said to be transient (or nonrecurrent) if
f_{j,j} = P[T_j < ∞ | X_0 = j] < 1.
In this case there is positive probability 1 - f_{j,j} of never returning
to state j.
Practical Problem 2. Try to prove the following key facts about MCs.
1. Show that in a finite-state Markov chain, not all states can be tran-
sient; in other words, at least one of the states must be recurrent.
2. Show that if P is a Markov matrix, then P^n is also a Markov matrix
for any positive integer n.
3. Verify the transitivity property of Markov chains, that is, if i → j
and j → k, then i → k. (Hint: use the Chapman-Kolmogorov
equations.)
Practice. Consider a Markov chain with state space {0, 1} and transition
probability matrix
P =
[  1    0  ]
[ 1/2  1/2 ]
i/ Show that state 0 is recurrent.
ii/ Show that state 1 is transient.
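The recursion for f^n_{i,j} gives a direct numerical sketch of both parts, truncating the infinite sum f_{j,j} = ∑_n f^n_{j,j} at a finite horizon:

```python
# First-return probabilities for the two-state chain P = [[1, 0], [1/2, 1/2]],
# using f^n_{i,j} = sum_{k != j} p_{i,k} f^{n-1}_{k,j}.
P = [[1.0, 0.0],
     [0.5, 0.5]]

def return_prob(P, j, n_max=200):
    """Approximate f_{j,j} = sum_n f^n_{j,j}, truncated at n_max."""
    s = len(P)
    f = [P[i][j] for i in range(s)]   # f^1_{i,j} = p_{i,j}
    total = f[j]
    for _ in range(2, n_max + 1):
        f = [sum(P[i][k] * f[k] for k in range(s) if k != j)
             for i in range(s)]
        total += f[j]
    return total

# State 0: f_{0,0} = 1, so it is recurrent.
# State 1: f_{1,1} = 1/2 < 1, so it is transient.
```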
2.3 Markov Chain Decomposition
Fact 2.1. In any Markov chain, the following are correct.
• It can be decomposed into one or more recurrent classes (equiva-
lence classes), plus possibly some transient states. Each class
contains those states that communicate with each other.
• A recurrent state is accessible from all states in its class, but is not
accessible from recurrent states in other classes.
• A transient state is not accessible from any recurrent state. But
at least one, possibly more, recurrent states are accessible from a
given transient state.
• For the purpose of understanding the long-term behavior of a Markov
chain, it is important to analyze chains that consist of a single recur-
rent class. Such a Markov chain is called an irreducible chain.
• For the purpose of understanding short-term behavior, it is also
important to analyze the mechanism by which any particular class of
recurrent states is entered starting from a given transient state.
C) Periodic states.
In a finite Markov chain M = (Q, p(0), P) (i.e., one having a finite number
of states), a periodic state i is a state to which an agent can return only at
positive integer time points t_0, 2t_0, 3t_0, ... (multiples of an integer period
t_0 > 1); t_0 is named the period of i, being the greatest common divisor
of the integers {t > 0 : p^{(t)}_{i,i} > 0}.
A Markov chain is aperiodic if there is no such periodic state; in other
words, if the period of each state i ∈ Q is 1.
For example, we can check that if a MC has the transition matrix
P =
[  0    0   0.6  0.4 ]
[  0    0   0.3  0.7 ]
[ 0.5  0.5   0    0  ]
[ 0.2  0.8   0    0  ]
then it is periodic. Indeed, if the Markovian random variable (agent)
starts at time 0 in state E_1, then at time 1 it must be in state E_3 or E_4,
and at time 2 it must be in state E_1 or E_2. Therefore, in general it can
visit E_1 only at times 2, 4, 6, .... Summarizing, we have
Definition 9. A finite Markov chain M = (Q, p(0), P) is
1. irreducible iff it has only one single recurrent class, i.e., any state
is accessible from all other states;
2. aperiodic iff the period of each state i ∈ Q is 1, i.e., it has no
periodic state;
3. ergodic if it is positive recurrent and aperiodic.
It can be shown that recurrence, transience, and periodicity are all
class properties; that is, if state i is recurrent (positive recurrent, null
recurrent, transient, periodic), then all other states in the same class as
state i inherit the same property.
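The gcd definition of the period can be sketched numerically for the four-state example above, checking the diagonal entries of P^t up to a finite horizon:

```python
# Periods of states, computed as gcd{t > 0 : p^(t)_{i,i} > 0} (t <= t_max).
from math import gcd

P = [[0.0, 0.0, 0.6, 0.4],
     [0.0, 0.0, 0.3, 0.7],
     [0.5, 0.5, 0.0, 0.0],
     [0.2, 0.8, 0.0, 0.0]]

def period(P, i, t_max=50):
    s = len(P)
    M = P          # M will hold P^t
    d = 0
    for t in range(1, t_max + 1):
        if t > 1:
            M = [[sum(M[a][k] * P[k][b] for k in range(s))
                  for b in range(s)] for a in range(s)]
        if M[i][i] > 1e-12:   # return to i possible in t steps
            d = gcd(d, t)
    return d

periods = [period(P, i) for i in range(4)]
# every state has period 2, so the chain is periodic
```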
D) Absorbing states and absorption probabilities.
• State j is said to be an absorbing state if p_jj = 1; that is, once state
j is reached, it is never left.
• If there is a unique absorbing state k, its steady-state probability is
1 (because all other states are transient and have zero steady-state
probability), and it will be reached with probability 1, starting from
any initial state.
• If there are multiple absorbing states, the probability that one
of them will eventually be reached is still 1, but the identity of
the absorbing state to be entered is random and the associated
probabilities may depend on the starting state.
Can we determine precisely the absorption probabilities for all the ab-
sorbing states of a MC in the generic case?
Consider a Markov chain X(n) = {X_n, n ≥ 0} with finite state space
E = {1, 2, ..., N} and transition probability matrix P.
Theorem 10. Let A = {1, ..., m} be the set of absorbing states and
B = {m+1, ..., N} be the set of nonabsorbing states.
• Then the transition probability matrix P can be expressed as
P =
[ I  O ]
[ R  Q ]
where I is the m×m identity matrix, O is an m×(N-m) zero matrix,
the elements of R are the one-step transition probabilities from
nonabsorbing to absorbing states, and the elements of Q are the
one-step transition probabilities among the nonabsorbing states.
• Let U = [u_{k,j}] be the (N-m)×m matrix whose elements are the
absorption probabilities for the various absorbing states,
u_{k,j} = P[X_n = j (∈ A) eventually | X_0 = k (∈ B)].
We have
U = (I - Q)^{-1} R,
where (I - Q)^{-1} is called the fundamental matrix of the Markov chain X(n).
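A minimal pure-Python sketch of U = (I - Q)^{-1} R, illustrated on the classic gambler's-ruin chain mentioned later in the text (total capital 4, p = 1/2; states 0 and 4 absorbing, states 1, 2, 3 transient):

```python
# Absorption probabilities via the fundamental matrix (I - Q)^{-1}.
def mat_inv(M):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(M)
    A = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

Q = [[0.0, 0.5, 0.0],       # transitions among transient states 1, 2, 3
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],            # transient -> absorbing states {0, 4}
     [0.0, 0.0],
     [0.0, 0.5]]

I = [[float(i == j) for j in range(3)] for i in range(3)]
IQ = [[I[i][j] - Q[i][j] for j in range(3)] for i in range(3)]
U = mat_mul(mat_inv(IQ), R)
# Starting with 1 dollar out of 4, ruin probability U[0][0] = 3/4.
```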
2.4 Limiting probabilities & Stationary distributions
From now on we assume that all MCs are finite, aperiodic and irre-
ducible. The irreducibility assumption implies that any state can even-
tually be reached from any other state. Both the irreducibility and aperiod-
icity assumptions hold for essentially all practical applications of MCs
(in bioinformatics, ...) except for the case of MCs with absorbing states.
Definition 11. The vector p* = (p*_1, p*_2, ..., p*_s) is called the stationary
distribution of a Markov chain {X_n, n ≥ 0} with state transition matrix
P if
p* P = p*.
This equation indicates that a stationary distribution p* is a left eigen-
vector of P with eigenvalue 1. In general, we wish to know the limiting
probabilities lim_{n→∞} p^{(n)} obtained from an initial distribution p^{(0)}.
We need some general results to determine the stationary distribution
p_i := lim_{t→∞} p_i(t),
where the p_i form a unique solution to the conditions:
∑_{i=1}^{s} p_i = 1, where each p_i ≥ 0;
p_j = ∑_{i=1}^{s} p_i p_{i,j}.
See the proof in Theorem 19. We discuss here two particular cases,
s = 2 and s > 2.
A) Markov chains that have two states.
At first we investigate the case of Markov chains that have two states, say
Q = {e_1, e_2}. Let a = p_{e_1 e_2} and b = p_{e_2 e_1} be the state transition
probabilities between the distinct states in a two-state Markov chain; its
state transition matrix is
P =
[ p_11  p_12 ]     [ 1-a   a  ]
[ p_21  p_22 ]  =  [  b   1-b ],  where 0 < a < 1, 0 < b < 1.   (2.4.1)
Proposition 14.
a) The n-step transition probability matrix is given by
P^{(n)} = P^n = 1/(a+b) ( [ b  a ]  +  (1-a-b)^n [  a  -a ] )
                         ( [ b  a ]              [ -b   b ] )
b) Find the limit matrix when n → ∞.
To prove this basic Proposition 14 (computing the transition probability
matrix of two-state Markov chains), we use a fundamental result of Linear
Algebra that is recalled in Section 2.6.
Proof. The eigenvalues of the state transition matrix P, found by solving
the equation
c(λ) = |λI - P| = 0,
are λ_1 = 1 and λ_2 = 1 - a - b. The spectral decomposition of a square
matrix says P can be decomposed into two constituent matrices E_1, E_2
(since only two eigenvalues were found):
E_1 = 1/(λ_1 - λ_2) [P - λ_2 I],  E_2 = 1/(λ_2 - λ_1) [P - λ_1 I].
That means E_1, E_2 are mutually annihilating idempotent matrices, i.e.,
E_1 E_2 = 0 = E_2 E_1, and
P = λ_1 E_1 + λ_2 E_2;  E_1² = E_1,  E_2² = E_2.
Hence,
P^n = λ_1^n E_1 + λ_2^n E_2 = E_1 + (1 - a - b)^n E_2,
or
P^{(n)} = P^n = 1/(a+b) ( [ b  a ]  +  (1-a-b)^n [  a  -a ] ).
                         ( [ b  a ]              [ -b   b ] )
b) The limit matrix when n → ∞ is
lim_{n→∞} P^n = 1/(a+b) [ b  a ]
                        [ b  a ].
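The closed form of Proposition 14 can be checked against direct matrix multiplication; a and b below are arbitrary test values:

```python
# Check P^n = (1/(a+b)) * ([b a; b a] + (1-a-b)^n [a -a; -b b]).
a, b, n = 0.3, 0.2, 7   # assumed test values, 0 < a, b < 1

P = [[1 - a, a],
     [b, 1 - b]]

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

Pn = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(n):
    Pn = mat_mul(Pn, P)

lam = (1 - a - b) ** n
closed = [[(b + a * lam) / (a + b), (a - a * lam) / (a + b)],
          [(b - b * lam) / (a + b), (a + b * lam) / (a + b)]]

assert all(abs(Pn[i][j] - closed[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```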
B) Markov chains that have more than two states.
For s > 2 it is cumbersome to compute the constituent matrices E_i of P;
we can instead employ the so-called regularity property.
Definition 15. A Markov chain is regular if there exists m ∈ ℕ such
that
P^{(m)} = P^m > 0
(i.e., every matrix entry is positive).
In summary, for a DTMC M that has more than two states, we have 4
cases:
Fact 2.2.
1. M is irreducible and positive recurrent, but has periodic states. The
component p*_i of the stationary distribution vector must then be un-
derstood as the long-run proportion of time that the process is in
state i.
2. M has several closed, positive recurrent classes. In this case, the
transition matrix of the DTMC takes a block form.
In contrast to the irreducible ergodic DTMC, where the limiting
distribution is independent of the initial state, a DTMC with sev-
eral closed, positive recurrent classes has a limiting distribution
that depends on the initial state.
3. M has both recurrent and transient classes. In this situation, we
often seek the probabilities that the chain is eventually absorbed
by the different recurrent classes. See the well-known gambler's ruin
problem.
4. M is an irreducible DTMC with null recurrent or transient states.
This case is only possible when the state space is infinite, since any
finite-state, irreducible DTMC must be positive recurrent. In this
case, neither the limiting distribution nor the stationary distribu-
tion exists.
A well-known example of this case is the random walk model.
Practice. Consider a Markov chain with transition probability matrix
P =
[ 0  0.5  0.5 ]
[ 1   0    0  ]
[ 1   0    0  ]
Show that state 0 is periodic with period 2.
Practical Problem 4 (The Gambler's Ruin problem). Let two gam-
blers, A and B, initially have k dollars and m dollars, respectively. Sup-
pose that at each round of their game, A wins one dollar from B with
probability p and loses one dollar to B with probability q = 1 - p. As-
sume that A and B play until one of them has no money left. Let X_n be
A's capital after round n, where n = 0, 1, 2, ... and X_0 = k.
(a) Show that X(n) = {X_n, n ≥ 0} is a Markov chain with absorbing
states.
(b) Find its transition probability matrix P. Realize P when p = q =
1/2 and the total capital is N = k + m = 4.
(c*) What is the probability of A losing all his money?
2.5 Theory of stochastic matrix for MC
A stochastic matrix is a matrix for which each row sum equals one.
If the column sums also equal one, the matrix is called doubly stochastic.
Hence the transition probability matrix P = [p_ij] is a stochastic matrix.
Proposition 16. Every stochastic matrix K has
• 1 as an eigenvalue (possibly with multiplicity greater than one), and
• no eigenvalue exceeding 1 in absolute value; that is, all eigen-
values λ_i satisfy |λ_i| ≤ 1.
Proof.
The spectral radius ρ(K) of any square matrix K is defined as
ρ(K) = max_i { |λ_i| : λ_i an eigenvalue of K }.
When K is stochastic, ρ(K) = 1. Note that if P is a transition matrix
for a finite-state Markov chain (then P is stochastic), the multiplicity
of the eigenvalue ρ(P) = 1 is equal to the number of recurrent classes
associated with P.
Fact 2.3. If K is a stochastic matrix then K^m is a stochastic matrix.
Proof. Let e = [1, 1, ..., 1]^t be the all-one vector, then use the fact that
Ke = e. Prove that K^m e = e.
Let A = [a_ij] > 0 denote that every element a_ij of A satisfies the
condition a_ij > 0.
Definition 17.
• A stochastic matrix P = [p_ij] is ergodic if L = lim_{m→∞} P^m
exists, that is, each p^{(m)}_{ij} has a limit as m → ∞.
• A stochastic matrix P is regular if there exists a natural number m
such that P^m > 0. In our context, a Markov chain with transition
probability matrix P is called regular if there exists an m > 0 such
that P^m > 0, i.e., there is a finite positive integer m such that after
m time-steps, every state has a nonzero chance of being occupied,
no matter what the initial state.
Example 2.3. Is the matrix
P =
[ 0.88  0.12 ]
[ 0.15  0.85 ]
regular? Ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.
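A quick numerical sketch: iterating the matrix shows both rows converging to the stationary vector (b, a)/(a + b) = (5/9, 4/9) predicted by the two-state analysis with a = 0.12, b = 0.15:

```python
# Approximate L = lim P^m by repeated multiplication.
P = [[0.88, 0.12],
     [0.15, 0.85]]

M = P
for _ in range(200):   # iterate M <- M P until numerically stationary
    M = [[sum(M[i][k] * P[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]

# Both rows of M approach [5/9, 4/9], approximately [0.5556, 0.4444].
```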
The limit matrix L = lim_{m→∞} P^m practically shows the long-term be-
haviour (distribution, properties) of the process. How do we know the exis-
tence of L (i.e., the ergodicity of the transition matrix P = [p_ij])?
Theorem 18. A stochastic matrix P = [p_ij] is ergodic if and only if
* the only eigenvalue λ of modulus (magnitude) 1 is λ = 1 itself, and
* if λ = 1 has multiplicity k, there exist k independent eigenvectors
associated with this eigenvalue.
For a regular homogeneous Markov chain we have the following theorem.
Theorem 19 (Regularity of stochastic matrices). If a stochastic matrix
P = [p_ij] is regular, then
1. 1 is an eigenvalue of multiplicity one, and all other eigenvalues λ_i
satisfy |λ_i| < 1;
2. P is ergodic, that is, L = lim_{m→∞} P^m exists. Furthermore, L's rows
are identical and equal to the stationary distribution p*.
Proof. If (1) is proved, then by Theorem 18, P = [p_ij] is ergodic. Hence,
when P = [p_ij] is regular, the limit matrix L = lim_{m→∞} P^m does exist.
By the Spectral Decomposition (2.6.1),
P = E_1 + λ_2 E_2 + ··· + λ_k E_k, where all |λ_i| < 1, i = 2, ..., k.
Then, by (2.6.2),
L = lim_{m→∞} P^m = lim_{m→∞} (E_1 + λ_2^m E_2 + ··· + λ_k^m E_k) = E_1.
Let p* be the (normalized) left eigenvector satisfying p* P = p*, i.e.,
p*(P - 1·I) = 0; then each row of L = E_1 equals p*, i.e., L = [p*; p*; ...; p*].
Corollary 20. A few important remarks: (a) for a regular MC, the
long-term behavior does not depend on the initial state distribution prob-
abilities p(0); (b) in general, the limiting distributions are influenced by
the initial distribution p(0) whenever the stochastic matrix P = [p_ij] is
ergodic but not regular. (See more at Problem D.)
Example 2.4. Consider a Markov chain with two states and transition
probability matrix
P =
[ 3/4  1/4 ]
[ 1/2  1/2 ]
(a) Find the stationary distribution p* of the chain. (b) Find lim_{n→∞} P^n
by first evaluating P^n. (c) Find lim_{n→∞} P^n.
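For part (a), the two-state formula p* = (b, a)/(a + b) from the earlier discussion applies with a = 1/4 and b = 1/2; a minimal check that the result is indeed stationary:

```python
# Stationary distribution of P = [[3/4, 1/4], [1/2, 1/2]].
a, b = 1 / 4, 1 / 2
p_star = [b / (a + b), a / (a + b)]   # -> [2/3, 1/3]

# verify p* P = p*
P = [[3 / 4, 1 / 4], [1 / 2, 1 / 2]]
pP = [sum(p_star[i] * P[i][j] for i in range(2)) for j in range(2)]
assert all(abs(pP[j] - p_star[j]) < 1e-12 for j in range(2))
```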
2.6 Spectral Theorem for Diagonalizable Matrices
Consider a square matrix P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k}
consisting of its eigenvalues. Then:
• If {(λ_1, x_1), (λ_2, x_2), ..., (λ_k, x_k)} are eigenpairs for P, then S =
{x_1, ..., x_k} is a linearly independent set. If B_i is a basis for the
null space N(P - λ_i I), then B = B_1 ∪ B_2 ∪ ··· ∪ B_k is a linearly
independent set.
• P is diagonalizable if and only if P possesses a complete set of
eigenvectors (i.e., a set of s linearly independent eigenvectors). More-
over, H^{-1} P H = D = diag(λ_1, λ_2, ..., λ_s) if and only if the columns
of H constitute a complete set of eigenvectors and the λ_j's are the
associated eigenvalues; i.e., each (λ_j, H[:, j]) is an eigenpair for P.
Spectral Theorem for Diagonalizable Matrices. A square matrix
P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of its eigen-
values is diagonalizable if and only if there exist constituent matrices
{E_1, E_2, ..., E_k} (called the spectral set) such that
P = λ_1 E_1 + λ_2 E_2 + ··· + λ_k E_k,   (2.6.1)
where the E_i's have the following properties:
• E_i E_j = 0 whenever i ≠ j, and E_i² = E_i for all i = 1..k;
• E_1 + E_2 + ··· + E_k = I.
In practice we employ (2.6.1) in two ways:
Way 1: if we know the decomposition (2.6.1) explicitly, then we can
compute the powers
P^m = λ_1^m E_1 + λ_2^m E_2 + ··· + λ_k^m E_k, for any integer m > 0.   (2.6.2)
Way 2: if we know P is diagonalizable, then we find the constituent
matrices E_i by:
* finding the nonsingular matrix H = (x_1 | x_2 | ··· | x_k), where each x_i
is a basis (right) eigenvector from the null subspace
N(P - λ_i I) = {v : (P - λ_i I)v = 0 ⟺ Pv = λ_i v};
** then P = H D H^{-1} = (x_1 | x_2 | ··· | x_k) · D · H^{-1}, where
D = diag(λ_1, ..., λ_k) is the diagonal matrix, and
H^{-1} = K^t =
[ y_1^t ]
[ y_2^t ]
[  ...  ]
[ y_k^t ]  (i.e., K = (y_1 | y_2 | ··· | y_k)).
Here each y_i is a basis left eigenvector, from the null subspace
N(P^t - λ_i I) = {v : v^t P = λ_i v^t}.
The constituent matrices are E_i = x_i y_i^t.
Example 2.5. Diagonalize the following matrix and provide its spectral
decomposition.

    P = [  1  -4  -4 ]
        [  8 -11  -8 ]
        [ -8   8   5 ]
The characteristic equation is

    p(λ) = det(P − λI) = −λ^3 − 5λ^2 − 3λ + 9 = −(λ − 1)(λ + 3)^2 = 0.

So λ = 1 is a simple eigenvalue, and λ = −3 is repeated twice (its
algebraic multiplicity is 2). Any set of vectors x satisfying

    x ∈ N(P − λI)  ⟺  (P − λI) x = 0

can be taken as a basis of the eigenspace (or null space) N(P − λI).
Bases for the eigenspaces are:

    N(P − I) = span{ [1, 2, −2]^t };  and
    N(P + 3I) = span{ [1, 1, 0]^t, [1, 0, 1]^t }.

It is easy to check that these three eigenvectors x_i form a linearly
independent set, so P is diagonalizable. The nonsingular matrix (also
called the similarity transformation matrix)

    H = (x_1 | x_2 | x_3) = [  1  1  1 ]
                            [  2  1  0 ]
                            [ -2  0  1 ]

will diagonalize P, and since P = H D H^{-1} we have

    H^{-1} P H = D = diag(λ_1, λ_2, λ_2) = diag(1, −3, −3) = [ 1   0   0 ]
                                                             [ 0  -3   0 ]
                                                             [ 0   0  -3 ]

Here,

    H^{-1} = [  1  -1  -1 ]
             [ -2   3   2 ]
             [  2  -2  -1 ]

implies that
y_1^t = [1, −1, −1],  y_2^t = [−2, 3, 2],  y_3^t = [2, −2, −1]. Therefore, the
constituent matrices are

    E_1 = x_1 y_1^t = [  1  -1  -1 ]
                      [  2  -2  -2 ]
                      [ -2   2   2 ]

    E_2 = x_2 y_2^t = [ -2   3   2 ]
                      [ -2   3   2 ]
                      [  0   0   0 ]

    E_3 = x_3 y_3^t = [  2  -2  -1 ]
                      [  0   0   0 ]
                      [  2  -2  -1 ]

Obviously,

    P = λ_1 E_1 + λ_2 E_2 + λ_3 E_3 = E_1 − 3 E_2 − 3 E_3 = [  1  -4  -4 ]
                                                            [  8 -11  -8 ]
                                                            [ -8   8   5 ]
2.7 Markov Chains with Absorbing States

2.7.1 Theory

Two questions:

/ if there are at least two absorbing states, what is the probability
that a specific absorbing state is the one eventually entered?
/ what is the mean time until an absorbing state is eventually
entered?
Question 1. The probability that a specific absorbing state is the one
eventually entered.

Theorem 21. Consider a Markov chain X(n) = {X_n, n ≥ 0} with finite
state space E = {1, 2, ..., N} and transition probability matrix P. Let
A = {1, ..., m} be the set of absorbing states and B = {m + 1, ..., N}
be the set of nonabsorbing states.
Then the transition probability matrix P can be expressed as

    P = [ I  O ]
        [ R  T ]

where

- I is the m × m identity matrix,
- O is an m × (N − m) zero matrix,
- the elements of R are the one-step transition probabilities from
  nonabsorbing to absorbing states, and
- the elements of T are the one-step transition probabilities among
  the nonabsorbing states.

Let U = [u_{k,j}] be the (N − m) × m matrix whose elements are the
absorption probabilities for the various absorbing states,

    u_{k,j} = lim_{n→∞} P[X_n = j (∈ A) | X_0 = k (∈ B)].

We have

    U = (I − T)^{-1} R = Φ R,  where Φ = (I − T)^{-1}

is called the fundamental matrix of the Markov chain X(n).
Hint: Form the power

    P^n = [ I    O   ]
          [ U_n  T^n ]

where U_n = f(R, T) is a matrix expression in R and T. Then to prove that

    lim_{n→∞} T^n = O
we could equivalently check that absorption of X(n) in one or another
of the absorbing states is certain. Formally, you could prove

Lemma 22.

    lim_{n→∞} P[X_n ∈ B] = 0,  or  lim_{n→∞} P[X_n ∈ A] = 1.

Question 2. The mean time until absorption. Starting from a
nonabsorbing state k ∈ B, the expected number of steps until absorption is

    Σ_{i=m+1}^{N} Φ[k, i],

where Φ[k, i] is the (k, i)-th element of the fundamental matrix Φ.
Proof. Let W = [n_{j,k}], where n_{j,k} is the number of times the state
k (∈ B) is occupied until absorption takes place when X(n) starts in
state j (∈ B). Then the time to absorption from j is

    T_j = Σ_{k=m+1}^{N} n_{j,k};

then calculate E(n_{j,k}).
Example 2.6. Consider a simple random walk X(n) with absorbing
barriers at state 0 and state N = 3 = m_A + m_B, as in the Gambler's Ruin
problem, where m_A = 2 USD is A's capital and m_B = 1 USD is B's capital
at round 0. Can you write out

a/ the transition probability matrix P, knowing that p = P[A wins] in
each round, where 0 < p < 1;
b/ the probabilities of absorption into states 0 and 3;
c/ the expected time (or number of steps) to absorption when X_0 = 1
and when X_0 = 2.

Hint: the fundamental matrix of the Markov chain X(n) is

    Φ = (I − T)^{-1} = 1/(1 − pq) [ 1  p ]
                                  [ q  1 ]

where q = 1 − p. The matrix of absorption probabilities from the various
nonabsorbing states is

    U = [ u_{1,0}  u_{1,3} ] = Φ R = 1/(1 − pq) [ q    p^2 ]
        [ u_{2,0}  u_{2,3} ]                    [ q^2  p   ]
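A numerical sketch of the hint, assuming numpy; the value p = 0.5 is purely illustrative:

```python
import numpy as np

p = 0.5            # illustrative value of P[A wins a round]
q = 1 - p

# One-step blocks for the nonabsorbing states {1, 2} (absorbing: 0 and 3).
T = np.array([[0, p],
              [q, 0]])              # transitions among nonabsorbing states
R = np.array([[q, 0],
              [0, p]])              # columns: absorption into state 0, state 3

Phi = np.linalg.inv(np.eye(2) - T)  # fundamental matrix (I - T)^{-1}
U = Phi @ R                         # absorption probabilities u_{k,j}

# Closed form from the hint: Phi = 1/(1 - pq) [[1, p], [q, 1]].
assert np.allclose(Phi, np.array([[1, p], [q, 1]]) / (1 - p*q))
assert np.allclose(U.sum(axis=1), 1)   # absorption is certain
print(U)
```

With p = 0.5 this gives U = [[2/3, 1/3], [1/3, 2/3]]: starting with one dollar, a fair gambler is ruined with probability 2/3.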
Example 2.7 (Alignment of two DNA words).
2.8 Chapter Review and Discussion

Application in Large Deviation theory. We are interested in a
practical situation in the insurance industry, first treated in 1932
by F. Esscher (see Notices of the AMS, Feb 2008).

Problem: too many claims could be made against the insurance company;
we worry about the total claim amount exceeding the reserve fund
set aside for paying these claims.

Our aim: to compute the probability of this event.

Modeling. Each individual claim is a random variable, we assume
some distribution for it, and the total claim is then the sum S of a large
number of (independent or not) random variables. The probability that
this sum exceeds a certain reserve amount is the tail probability of the
sum S.

Large Deviation theory as initiated by Esscher requires the calculation of
moment generating functions. If your random variables are independent,
then the moment generating function of the sum is the product of the
individual ones; but if they are not (as in a Markov chain), then there is
no longer just one moment generating function!

Research project: study Large Deviation theory to solve this problem.
Let Z_1, Z_2, ... be independent identically distributed random variables,
and define

    X_n = Σ_{i=1}^{n} Z_i,  n = 1, 2, ...,  and X_0 = 0.
The collection of r.v.s {X_n, n ≥ 0} is a random process, and it is called
the simple random walk in one dimension.

(a) Describe the simple random walk X(n).
(b) Construct a typical sample sequence (or realization) of X(n).
(c) Find the probability that X(n) = 2 after four steps.
(d) Verify the result of part (c) by enumerating all possible sample
sequences that lead to the value X(n) = 2 after four steps.
(e) Find the mean and variance of the simple random walk X(n). Find
the autocorrelation function R_X(n, m) of the simple random walk X(n).
(f) Show that the simple random walk X(n) is a Markov chain.
(g) Find its one-step transition probabilities.
(h) Derive the first-order probability distribution of the random walk
X(n).
Solution.
(a) Describe the simple random walk. X(n) is a discrete-parameter (or
discrete-time), discrete-state random process. The state space is
E = {..., −2, −1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.
(b) Typical sample sequence. A sample sequence x(n) of a simple random
walk X(n) can be produced by tossing a coin every second and letting
x(n) increase by unity if a head H appears and decrease by unity if a
tail T appears. Thus, for instance, we have a small realization of X(n)
in Table 3.1.
The sample sequence x(n) obtained above is plotted in the (n, x(n))-plane.
The simple random walk X(n) specified in this problem is said to be
unrestricted because there are no bounds on the possible values of X.
The simple random walk process is often used in Game Theory or
Biomatics.
    n            | 0 | 1 | 2 |  3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
    Coin tossing |   | H | T |  T | H | H | H | T | H | H | T
    x_n          | 0 | 1 | 0 | -1 | 0 | 1 | 2 | 1 | 2 | 3 | 2

Table 3.1: Simple random walk from coin tossing
Remark 3.1. We define the ladder points to be the points in the walk
lower than any previously reached point. An excursion in a walk is the
part of the walk from a ladder point to the highest point attained before
the next ladder point.
BLAST theory focuses on the maximum heights achieved by these
excursions.
(c) The probability that X(n) = 2 after four steps.
We compute the first-order probability distribution of the random walk
X(n):

    p_n(k) = P(X_n = k),  with boundary conditions p_0(0) = 1, and
    p_n(k) = 0 if n < |k|.

Thus n ≥ |k|. We find that

    p_n(k) = C(n, (n + k)/2) p^{(n+k)/2} q^{(n−k)/2},  where q = 1 − p;   (3.2.1)

here C(n, r) denotes the binomial coefficient. This follows by letting A, B
be the r.v.s counting the numbers of +1 and −1 steps, so that

    A + B = n,  A − B = X_n.
When X(n) = k, we see that

    A = (n + k)/2,

which is a binomial r.v. with parameters (n, p).
Conclude: the probability distribution of X(n) is given by (3.2.1), in which
n ≥ |k|, and n, k must be both even or both odd.
Set k = 2 and n = 4 in (3.2.1) to get the desired probability
p_4(2) = C(4, 3) p^3 q = 4p^3 q that X(4) = 2.
(d) Verify the result of part (c) by enumerating all possible sample
sequences that lead to the value X(n) = 2 after four steps. DIY!
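The enumeration in part (d) is small enough to brute-force; a sketch (the choice p = 0.5 is illustrative):

```python
from itertools import product
from math import comb

p = 0.5
q = 1 - p

# Formula (3.2.1): p_n(k) = C(n, (n+k)/2) p^{(n+k)/2} q^{(n-k)/2}.
n, k = 4, 2
formula = comb(n, (n + k)//2) * p**((n + k)//2) * q**((n - k)//2)

# Brute-force enumeration of all 2^4 step sequences of +1/-1.
total = 0.0
for steps in product([+1, -1], repeat=n):
    if sum(steps) == k:
        ups = steps.count(+1)
        total += p**ups * q**(n - ups)

print(formula, total)   # both 0.25 when p = 0.5
```

Exactly C(4, 3) = 4 of the 16 sequences end at 2 (three up-steps, one down-step), in agreement with the formula.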
(e) The mean and variance of the simple random walk X(n). Use the
facts

    P(Z_n = +1) = p  and  P(Z_n = −1) = 1 − p.
4.1 Introduction

In Stochastic Processes, we are interested in a few distinct properties:

(a) the dependencies in the sequence of values generated by the
process. For example, how do future prices of a stock depend on
past values?

(b) long-term averages, involving the entire sequence of generated
values. For example, what is the fraction of time that a machine
is idle?

(c) the likelihood or frequency of certain boundary events. For
example, what is the probability that within a given hour all
circuits of some telephone system become simultaneously busy?
In this chapter, we will discuss the first major category of stochastic
processes, Arrival-Type Processes. We are interested in occurrences that
have the character of an arrival, such as
- message receptions at a receiver,
- job completions in a manufacturing cell,
- customer purchases at a store, etc.
We will focus on models in which the interarrival times (the times
between successive arrivals) are independent random variables.
First, we consider the case where arrivals occur in discrete time
and the interarrival times are geometrically distributed: this is the
Bernoulli process.
Then we consider the case where arrivals occur in continuous time
and the interarrival times are exponentially distributed: this is the
Poisson process.
4.2 The Bernoulli process

4.2.1 Basic facts

The Bernoulli process can be visualized as a sequence of independent
coin tosses, where the probability of heads in each toss is a fixed number
p in the range 0 < p < 1. In general, the Bernoulli process consists of
a sequence of Bernoulli trials, where each trial produces
- a 1 (a success) with probability p, and
- a 0 (a failure) with probability 1 − p, independently of what happens
in other trials.
There are many realizations of the Bernoulli process. Coin tossing is just
a paradigm involving a sequence of independent binary outcomes. The
sequence Z_1, Z_2, ... of independent identically distributed r.v.s in
Chapter 3 is another paradigm for the same phenomenon.
In practice, a Bernoulli process is often used to model systems involving
arrivals of customers or jobs at service centers. Here, time is discretized
into periods, and a success at the k-th trial is associated with the arrival
of at least one customer at the service center during the k-th period. In
fact, we will often use the term arrival in place of success when this is
justified by the context.
Given an arrival process, one is often interested in random variables such
as the number of arrivals within a certain time period, or the time until
the first arrival. For the case of a Bernoulli process, some answers are
already available from earlier chapters. Here is a summary of the main
facts.
The Bernoulli distribution B(p) describes a random variable that can
take only two possible values, i.e. X ∈ {0, 1}. The distribution is
described by a probability function

    p(1) = P(X = 1) = p,  p(0) = P(X = 0) = 1 − p,  for some p ∈ [0, 1].

It is easy to check that E(X) = p, Var(X) = p(1 − p).
4.2.2 Random Variables Associated with the Bernoulli Process

Binomial distribution B(n, p). This distribution describes a random
variable X that is the number of successes in n independent Bernoulli
trials with probability of success p.

4.3 The Poisson Process

4.3.1 Poisson distribution

The Poisson distribution with parameter λ is described by the probability
function

    p(x) = e^{−λ} λ^x / x!,   x = 0, 1, 2, ...   (4.3.1)

where
- x = designated number of successes,
- e ≈ 2.71 is the natural base, and
- λ > 0 is a constant, the average number of successes per unit of time
  period.

The Poisson distribution's mean and variance are

    μ = λ;  σ^2 = λ.
Example 4.1 (Poisson distribution usage). We often model the number
of defects or non-conformities that occur in a unit of product (unit area,
volume, and most frequently unit of time), say, a semiconductor device,
by a Poisson distribution. The number of wire-bonding defects per unit,
X, is Poisson distributed with parameter λ = 4. Compute the probability
that a randomly selected semiconductor device will contain two or fewer
wire-bonding defects.
This probability is

    P(X ≤ 2) = p(0) + p(1) + p(2) = Σ_{x=0}^{2} e^{−4} 4^x / x! = 0.2381.
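The figure 0.2381 in Example 4.1 can be reproduced in a couple of lines:

```python
from math import exp, factorial

lam = 4.0
# P(X <= 2) for X ~ Poisson(4), as in Example 4.1.
prob = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))
print(round(prob, 4))   # 0.2381
```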
4.3.2 Poisson process

The Poisson process can be viewed as a continuous-time analog of the
Bernoulli process and applies to situations where there is no natural way
of dividing time into discrete periods. We consider an arrival process
that evolves in continuous time, in the sense that any real number t is a
possible arrival time.

Definition 24. A counting process X(t) is said to be a Poisson (counting)
process with positive rate (or intensity) λ if
- X(0) = 0, and X(t) has independent increments;
- the number of events in any interval of length t is Poisson distributed
  with mean λt; that is, for all s, t > 0,

    P[X(t + s) − X(s) = n] = e^{−λt} (λt)^n / n!,   n = 0, 1, 2, ...   (4.3.2)
4.4 Course Review and Discussion

Practical Problem 6.

1. Prove that a Poisson process X(t) with positive rate λ has stationary
increments, and

    E[X(t)] = λt,  Var[X(t)] = λt.

2. Practice. Patients arrive at the doctor's office according to a Poisson
process with rate λ = 1/10 per minute. The doctor will not see a
patient until at least three patients are in the waiting room.

a/ Find the expected waiting time until the first patient is admitted
to see the doctor.
b/ What is the probability that nobody is admitted to see the doctor
in the first hour?
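One way to sanity-check answers to Practical Problem 6.2, assuming the standard facts that the third arrival time of a Poisson process is Erlang(3, λ) with mean 3/λ, and that N(t) ~ Poisson(λt):

```python
from math import exp, factorial

lam = 1/10   # arrival rate: one patient per 10 minutes on average

# a/ The doctor admits once 3 patients have arrived; the third arrival
#    time is Erlang(3, lam) with mean 3/lam minutes.
mean_wait = 3 / lam

# b/ Nobody is admitted in the first hour iff fewer than 3 patients
#    arrive in 60 minutes, and N(60) ~ Poisson(lam * 60) = Poisson(6).
mu = lam * 60
p_nobody = sum(exp(-mu) * mu**k / factorial(k) for k in range(3))

print(mean_wait)             # about 30 minutes
print(round(p_nobody, 4))    # 0.062
```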
Theorem 25. If every eigenvalue of a matrix P yields linearly
independent left eigenvectors in number equal to its multiplicity, then

1. there exists a nonsingular matrix M whose rows are left eigenvectors
of P, such that
2. D = M P M^{-1} is a diagonal matrix whose diagonal elements are the
eigenvalues of P, repeated according to multiplicity.
Practical Problem 7 (MC for Business Intelligence). Consider a case
study of the mobile phone industry in VN. According to a recent survey,
there are four big mobile producers/sellers N, S, M and L, and their
market switching behavior in 2007 is given by the stochastic matrix
(rows and columns ordered N, M, L, S):

    P = N [ 1    0   0    0   ]
        M [ 0.4  0   0.6  0   ]
        L [ 0.2  0   0.1  0.7 ]
        S [ 0    0   0    1   ]

- Is P regular? ergodic?
- Find the long-term distribution matrix L = lim_{m→∞} P^m.
- What is your conclusion?

(Remark that the states N and S are called absorbing states.)
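A numerical sketch of the long-term matrix in Practical Problem 7, assuming numpy; the power 200 is an arbitrary large exponent standing in for the limit:

```python
import numpy as np

# Transition matrix of Practical Problem 7 (rows/columns ordered N, M, L, S).
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.2, 0.0, 0.1, 0.7],
              [0.0, 0.0, 0.0, 1.0]])

# P is not regular: every power keeps zero entries, since N and S are absorbing.
L_inf = np.linalg.matrix_power(P, 200)   # numerical stand-in for lim P^m
print(np.round(L_inf, 4))
```

The rows of the limit give the absorption probabilities: starting from M the chain ends in N with probability 8/15 and in S with probability 7/15; starting from L, with probabilities 2/9 and 7/9.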
Chapter 5
Probability Modeling and
Mathematical Finance
Probability modeling in finance provides instruments to rationalize the
unknown by embedding it into a coherent framework. Three key
components should be distinguished: randomness, uncertainty and chaos.
Kolmogorov defined randomness in terms of non-uniqueness and
non-regularity (as a die with six faces or the expansion of π). Kalman
defined chaos as randomness without probability.
A few areas that employ much probability modeling include: weather
forecasting, biology and financial forecasting. In general, in order to model
uncertainty we seek to distinguish the known from the unknown and find
some mechanisms (theories, intuition, common sense...) to reconcile our
knowledge with our lack of it.
5.1 Martingales

5.1.1 History

Girolamo Cardano, in his book The Book on Games of Chance (1565),
proposed the notion of a fair game. He stated: "The most fundamental
principle of all in gambling is simply equal conditions, ...". This is the
essence of the martingale; however, it was not until 1900, in Bachelier's
thesis, that a mathematical model of a fair game, or martingale, was
proposed.
Nowadays, we understand the concept of a fair game, or martingale, in
money terms: the expected profit at a given time, given the total past
capital, is null with probability one.
Throughout this chapter we assume that (Ω, F, P) is a fixed probability
space, where
- Ω is a sample space representing the set of all possible outcomes,
- F is a σ-algebra of subsets of Ω representing the events to which
  we can assign probabilities, and
- P is a probability measure on (Ω, F).
The expectation with respect to P will be denoted by E[·].
5.1.2 Conditional expectation

Let X and Z be two r.v.s on the same (Ω, F, P)-space. Suppose X has
range {x_1, x_2, ..., x_m} and Z has range {z_1, z_2, ..., z_n}. We know that

    P[X = x_i | Z = z_j] := P[X = x_i, Z = z_j] / P[Z = z_j]

and also

    E[X | Z = z_j] = Σ_i x_i P[X = x_i | Z = z_j].

Definition 26. The random variable Y = E[X|Z], the conditional
expectation of X given Z, is defined as follows:

(a) if Z(ω) = z_j, then Y(ω) := E[X | Z = z_j] =: y_j (say).

Justification. In this way we partition the sample space Ω into the
Z-atoms {Z = z_j}, on each of which Z is constant. The σ-algebra
G = σ(Z) generated by Z consists of the sets {Z ∈ B}, B ∈ B, the Borel
sets. Therefore G = σ(Z) consists precisely of the 2^n possible unions of
the n Z-atoms.
Note from (a) that Y is constant on Z-atoms, so better we say:

(b) Y is G-measurable.
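A tiny discrete illustration of Definition 26: on a uniform six-point sample space (the numbers below are made up), Y = E[X|Z] is obtained by averaging X over each Z-atom.

```python
from collections import defaultdict

# A toy finite sample space with uniform probability; each outcome w
# is recorded as the pair (X(w), Z(w)).
outcomes = [(1, 0), (3, 0), (2, 1), (6, 1), (4, 1), (5, 2)]

# Partition the outcomes into Z-atoms {Z = z_j} and average X on each atom;
# under the uniform measure this is exactly E[X | Z = z_j].
atoms = defaultdict(list)
for x, z in outcomes:
    atoms[z].append(x)
Y = {z: sum(xs) / len(xs) for z, xs in atoms.items()}
print(Y)   # {0: 2.0, 1: 4.0, 2: 5.0}

# Property c) of the theorem below, with G the whole space: E[Y] = E[X].
EX = sum(x for x, _ in outcomes) / len(outcomes)
EY = sum(Y[z] for _, z in outcomes) / len(outcomes)
assert EX == EY == 3.5
```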
Theorem 27 (Kolmogorov 1933). Let (Ω, F, P) be a probability space
and X a random variable with E[|X|] < ∞. Let G be a sub-σ-algebra of
F. Then there exists a random variable Y such that
a) Y is G-measurable,
b) E[|Y|] < ∞,
c) for every G ∈ G we have

    ∫_G Y dP = ∫_G X dP.

Moreover, if Y_1 is another random variable with these properties, then
Y_1 = Y almost surely (a.s.), that is, P[Y_1 = Y] = 1.
A random variable Y with properties a)-c) is called a version of the
conditional expectation E[X|G] of X given G, and we write Y = E[X|G]
a.s.
Proof. Since G is generated by Z, any G ∈ G is a union of the n
Z-atoms, so we first prove that

    ∫_{Z=z_j} Y dP = y_j P[Z = z_j] = ... = ∫_{Z=z_j} X dP.

Writing G_j = {Z = z_j}, this equation means E[Y · I_{G_j}] = E[X · I_{G_j}]...

Note 5.1. We often write
- E[X|Z] for E[X|G] = E[X|σ(Z)]; and
- E[X|Z_1, Z_2, ...] for E[X|σ(Z_1, Z_2, ...)].

Fact 5.2. If U is a non-negative bounded r.v., then

    E[U|G] ≥ 0, a.s.

5.1.3 Key properties of Conditional expectation

See textbook.
5.1.4 Filtration

A filtration is a family {F_t, t = 0, 1, ..., T} of sub-σ-algebras indexed by
t = 0, 1, ..., T such that

    F_0 ⊆ F_1 ⊆ F_2 ⊆ ... ⊆ F_T;

that is, the family is increasing with time. Intuitively, for each t =
0, 1, ..., T, the σ-algebra F_t tells us which events may be observed by
time t.
If the sample space Ω is a finite set, often the σ-algebra F_0 is trivial,
consisting simply of the empty set ∅ and the whole sample space Ω. We
also often write just {F_t} instead of the lengthy {F_t, t = 0, 1, ..., T},
and can assume that F_T = F (since we shall be considering only random
variables that are F_T-measurable).

Definition 28. We call the quadruple (Ω, F, {F_t}, P) a filtered
probability space.
In contrast to an (arithmetic) random walk X_t = Σ_{i=0}^{t} Z_i with
i.i.d. absolute increments Z_t = X_t − X_{t−1}, a geometric random walk
{X_t; t ≥ 0} is assumed to have i.i.d. relative increments

    R_t = X_t / X_{t−1}  for t = 1, 2, ...

For a specific case, the geometric binomial random walk is

    X_t = R_t X_{t−1} = X_0 Π_{k=1}^{t} R_k,

where X_0, R_1, R_2, ... are mutually independent, each R_k is Bernoulli-type,
and for u > 1 (up), d < 1 (down):

    P(R_k = u) = p,  P(R_k = d) = 1 − p,  0 < p < 1.
We obtain the expectation E(R_k) = (u − d)p + d for any k = 1, 2, ...
If E(R_k) = 1, then the process is on average stable, which is the case for

    p = (1 − d) / (u − d).

Example 5.3. Let the stock price S_t be defined in terms of a Bernoulli
event. That means the stock prices can only either increase/grow or
decrease/fall from period to period, following a stable geometric binomial
random walk.
Hence S_t changes at rates u > 1 (up) and d < 1 (down) with probabilities

    S_t = { u S_{t−1}  with probability p = (1 − d)/(u − d)
          { d S_{t−1}  with probability 1 − p = (u − 1)/(u − d)     (5.1.1)
Proposition 29. The stock price process S_t is a martingale.

Proof. We first have to show that it satisfies condition 1., namely
E[|S_t|] < ∞. Indeed, due to the independence assumption,

    E[S_t] = E[S_0] (E[R_1])^t = E[S_0] [(u − d)p + d]^t < ∞.

Next, it is a constant-mean process with

    E[S_{t+1} | S_t, S_{t−1}, ..., S_0] = E[S_{t+1} | S_t] = S_t

since

    E[S_{t+1} | S_t] = u S_t p + d S_t (1 − p) = S_t [u p + d(1 − p)] = S_t.
Example 5.4 (Product of non-negative independent r.v.s of mean 1).
Let X_1, X_2, ... be a sequence of independent non-negative r.v.s with
E[X_n] = 1 for all n.
Define M_0 = 1, F_0 = {∅, Ω} and

    M_n := X_1 X_2 X_3 ... X_n,  F_n := σ(X_1, X_2, X_3, ..., X_n).

The process M is a martingale. (Why?)
5.1.7 Stopping time

Definition 30. A (discrete) stopping time is a function
τ : Ω → {0, 1, ..., T} ∪ {∞} such that

    {τ = t} ∈ F_t  for t = 0, 1, ..., T.     (*)

Obviously for such a stopping time we see:

    {τ = ∞} = Ω \ (∪_{t=0}^{T} {τ = t}) ∈ F_T.

For convenience we define F_∞ = F_T, and then (*) also holds with
t = ∞.

Justification. Intuitively, τ is a time when you can decide to stop
playing our game. Whether or not you stop immediately after the n-th
game depends only on the history up to (and including) time n:

    {τ = n} = {ω : τ(ω) = n} ∈ F_n.

Fact 5.4. With any (discrete) stopping time τ, there is a σ-algebra
defined by

    F_τ = {A ∈ F_∞ : A ∩ {τ = t} ∈ F_t for t = 0, 1, ..., T}.

Lemma 31. If σ and τ are two stopping times, then

    σ ∧ τ = min(σ, τ)  and  σ ∨ τ = max(σ, τ)

are both also stopping times.
5.2 Stochastic Calculus

Our basic assumption is: we do not know and cannot predict tomorrow's
values of asset prices. The past history of the asset value is there as a
financial time series for us to examine as much as we want, but we cannot
use it to forecast the next move that the asset will make. This does
not mean that it tells us nothing.
We know what their mean and variance are and, generally, what the
likely distribution of future asset prices is. These quantities must be
determined by a statistical analysis of historical data.

5.2.1 A Simple Model for Asset Prices

Now suppose that at time t the asset price is S. Let us consider a small
subsequent time interval dt, during which S changes to S + dS.
Drift μ is a measure of the average rate of growth of the asset price.
Volatility σ measures the standard deviation of the returns. It is
represented by a random sample drawn from a normal distribution with
mean zero and adds a term σ dX to the corresponding return on the
asset, dS/S:

    dS/S = μ dt + σ dX.

5.2.2 Stochastic differential equation

Stochastic Integral. In order to introduce a stochastic process as a
solution of a stochastic differential equation, we introduce the concept of
the Itô integral: a stochastic integral with respect to a Wiener process.
Formally the construction of the Itô integral is similar to the Stieltjes
integral. However, instead of integrating with respect to a deterministic
function (Stieltjes integral), the Itô integral integrates with respect to a
random function, more precisely, the path of a Wiener process. Since
the integrand itself can be random, i.e. it can be a path of a stochastic
process, one has to analyze the mutual dependencies of the integrand and
the Wiener process.
Chapter 6

Part III: Practical Applications of SP

First we discuss Statistical Parameter Estimation and its role in
industry, engineering and services, through specific models.

6.1 Statistical Parameter Estimation

Practical Problem 8 (The Brand switching model of Problem 5 of
Section ??).
Our question now is: how to find the transition matrix P from sample
surveys? We define [due to Whitaker (1978)]
- brand loyalty as the proportion of consumers who repurchase a
  brand on the next occasion without persuasion,
- and purchasing pressure as the proportion of consumers who are
  persuaded to purchase a brand on the next occasion.
Denote by w_i and d_i, respectively, the values of brand loyalty and
purchasing pressure for brand i, where both w_i and d_i are between 0 and 1,
and Σ_i d_i = 1. To illustrate, consider the following three-brand case:

                               Brand 1   Brand 2   Brand 3
    Brand loyalty w_i            0.3       0.6       0.9
    Purchasing pressure d_i      0.3       0.5       0.2

Could we give a formula to compute the brand switching probabilities
p_{i,j} (i.e. transition probabilities) in terms of w_i and d_i?
Practical Problem 9 (A model of social mobility).
A problem in the study of social structure is about the transitions
between the social status of successive generations in a family. Sociologists
often assume that the social class of a son depends only on his parents'
social class, but not on his grandparents'.
Glass (1954, UK) identified three social classes:
- upper class (executive, managerial, high administrative, professional),
- middle class (high-grade supervisor, non-manual, skilled manual), and
- lower class (semi-skilled or unskilled).
Each family in the society occupies one of the three social classes, and its
occupation evolves across different generations. Glass analyzed a random
sample of 3500 male residents in the UK and estimated that the transitions
between the social classes of successive generations in a family were as
follows:

                          Following generation
    Current generation    Upper class   Middle class   Lower class
    Upper class               0.45          0.48          0.07
    Middle class              0.05          0.70          0.25
    Lower class               0.01          0.50          0.49

- Model this social mobility by a DTMC; prove that it is irreducible
  and ergodic.
- What is the distribution of a family's occupation in the long run?
- How many generations are necessary for a lower-class family to
  become an upper-class family?
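The long-run distribution asked for in Practical Problem 9 can be approximated numerically (a sketch assuming numpy; 200 is an arbitrary large power standing in for the limit):

```python
import numpy as np

# Glass's social mobility matrix (rows: current class; columns: next
# generation), ordered upper, middle, lower.
P = np.array([[0.45, 0.48, 0.07],
              [0.05, 0.70, 0.25],
              [0.01, 0.50, 0.49]])

# All entries are positive, so the chain is irreducible and ergodic, and
# every row of P^m converges to the stationary distribution pi.
pi = np.linalg.matrix_power(P, 200)[0]
print(np.round(pi, 4))

assert np.allclose(pi @ P, pi)       # pi P = pi (stationarity)
assert abs(pi.sum() - 1.0) < 1e-9
```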
6.2 Inventory Control in Logistics

Current need for Logistics Optimization. Inventory management
is one of the oldest and most studied areas of Operations Research; in
particular, it is useful for the logistics industry in developed nations with
fewer natural resources. Logistics plays a key role in import and export
activities, as witnessed by the cargo stagnancy in the ports of HCMC and
Southern Vietnam recently.

Steps of the investigation

A. Pre-graduate research period
a/ Learn the basic concepts of Economic Order Quantity (EOQ) and
Economic Production Quantity (EPQ) for the problem. Take
the course Stochastic Processes at the AM section, HCMUT.
b/ Algebraically model the simple case: predict uncertain
demand in a single period. At the same time, visit a manufacturing
or sales firm in HCMC, for an industrial internship, to see
how inventory matters in those sectors, in 2-4 weeks.
c/ Design an application package that allows one to maneuver
inventory events. Study a computing system such as Singular,
Matlab, R or OpenModelica.

B. Graduate research period
d/ Investigate more complex cases in Inventory Control.
e/ Learn appropriate algorithms and implement them.
f/ Implement a pilot application package with basic functionality
allowing prediction of small cases.
6.3 Epidemic processes