1.2.1 Introduction
Probabilistic models often involve several random variables of interest.
For example, in a medical diagnosis context, the results of several tests
may be significant, or in a networking context, the workloads of several
gateways may be of interest. All of these random variables are associated
with the same experiment, sample space, and probability law, and their
values may relate in interesting ways. Mathematically, a random variable
is a mapping.
Definition 1. A random variable X is a mapping (function) from a
sample space S to the reals ℝ. For any j ∈ ℝ, the preimage
A := X^{-1}(j) = {w : X(w) = j} ⊆ S is an event, and we understand
P{X = j} = P(A) = ∑_{w ∈ A} P(w).
For a finite sample space S with equally likely outcomes, then obviously
P{X = j} = P(A) = |A| / |S|.
A discrete random variable X is one having a finite (or countably
infinite) range Range(X), described by the probability point or mass
distribution (pmf), determined by P{X = j} = p_j. We must have
p_j ≥ 0, and ∑_j p_j = 1.
14 CHAPTER 1. BACKGROUND
A continuous random variable X is one having an uncountable range
Range(X), described by the probability density function (pdf) f(x), which
satisfies
f(t) ≥ 0, and ∫_{t ∈ Range(X)} f(t) dt = 1.
Generating functions are important in handling stochastic processes in-
volving integer-valued random variables.
Multiple random variables. We consider probabilities that involve si-
multaneously the numerical values of several random variables, and we
investigate their mutual couplings. In this section, we will extend the
concepts of pmf and expectation developed so far to multiple random
variables.
Consider two discrete random variables X, Y : S → ℝ associated with
the same experiment. The joint pmf of X and Y is defined by
p_{X,Y}(x, y) = P(X = x, Y = y)
for all pairs of numerical values (x, y) that X and Y can take. We will use
the abbreviated notation P(X = x, Y = y) instead of the more precise
notations P({X = x} ∩ {Y = y}) or P({X = x} and {Y = y}). That is,
P(X = x, Y = y) = P({X = x} ∩ {Y = y}) = P({X = x} and {Y = y}).
For the pair of random variables X, Y, we say:
Definition 2. X and Y are independent if for all x, y ∈ ℝ we have
P(X = x, Y = y) = P{X = x} · P{Y = y}, i.e., p_{X,Y}(x, y) = p_X(x) · p_Y(y),
or, in terms of conditional probability,
P({X = x} | {Y = y}) = P({X = x}).
This can be extended to the so-called mutual independence of a finite
number n of random variables.
Definition 3. The expectation operator defines the expected value of a
random variable X as
E(X) = ∑_{x ∈ Range(X)} P{X = x} · x.
If X is a function from a sample space S to the naturals ℕ, then
E(X) = ∑_{i=0}^{∞} P{X > i}. (Why?)
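The tail-sum identity can be seen by writing ∑_{i≥0} P{X > i} = ∑_{i≥0} ∑_{j>i} p_j and counting how many times each p_j appears (exactly j times). A minimal numerical check in Python; the pmf below is an assumed toy example, not from the text:

```python
# Check E(X) = sum_{i>=0} P(X > i) for a natural-valued X
# with a small, arbitrary pmf on the values 0..4.
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15}

# Direct definition of the mean.
mean_direct = sum(j * p for j, p in pmf.items())

# Tail sum: add up P(X > i) for i = 0, 1, 2, ...
mean_tail = sum(
    sum(p for j, p in pmf.items() if j > i)
    for i in range(max(pmf))
)

assert abs(mean_direct - mean_tail) < 1e-12
```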
Functions of Multiple Random Variables. When there are multi-
ple random variables of interest, it is possible to generate new random
variables by considering functions involving several of these random vari-
ables. In particular, a function Z = g(X, Y) of the random variables X
and Y defines another random variable. Its pmf can be calculated from
the joint pmf p_{X,Y} according to
p_Z(z) = ∑_{(x,y) : g(x,y)=z} p_{X,Y}(x, y).
Furthermore, the expected value rule for functions naturally extends and
takes the form
E[g(X, Y)] = ∑_{(x,y)} g(x, y) · p_{X,Y}(x, y).
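Both formulas can be sketched directly in code. The joint pmf below (two independent fair 0/1 variables) and the choice g(x, y) = x + y are assumed toy data:

```python
# pmf of Z = g(X, Y) from a joint pmf, plus the expected value rule.
p_xy = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # toy joint pmf

def g(x, y):          # the function defining Z
    return x + y

# p_Z(z) = sum of p_{X,Y}(x, y) over all (x, y) with g(x, y) = z
p_z = {}
for (x, y), p in p_xy.items():
    z = g(x, y)
    p_z[z] = p_z.get(z, 0.0) + p

# expected value rule: E[g(X, Y)] = sum g(x, y) p_{X,Y}(x, y)
e_g = sum(g(x, y) * p for (x, y), p in p_xy.items())
```

Here p_z comes out as {0: 0.25, 1: 0.5, 2: 0.25} and e_g = 1.0, as expected for the sum of two fair bits.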
Theorem 4. We have two important results on expectation.
• Linearity: E(X + Y) = E(X) + E(Y) for any pair of random variables X, Y.
• Independence: E(X · Y) = E(X) · E(Y) for any pair of independent random variables X, Y.
Mean, variance and moments of the probability distribution
P{X = j} = p_j. The mean is
m = E(X) = ∑_{j=0}^{∞} j p_j = P'(1) = ∑_{j=0}^{∞} q_j = Q(1). (Why!?)
Recall that the variance of the probability distribution p_j is
σ² = E(X(X-1)) + E(X) - [E(X)]²,
so we need to know
E(X(X-1)) = ∑_{j=0}^{∞} j(j-1) p_j = P''(1) = 2Q'(1).
Therefore, σ² = ?
Exercise: Find the formula for the r-th factorial moment
μ_{[r]} = E(X(X-1)(X-2)···(X-r+1)).
1.2.2 Elementary results of Generating Functions
Suppose we have a sequence of real numbers a_0, a_1, a_2, .... Introducing
the dummy variable x, we may define a function
A(x) = a_0 + a_1 x + a_2 x² + ··· = ∑_{j=0}^{∞} a_j x^j.   (1.2.1)
If the series converges in some real interval -x_0 < x < x_0, the func-
tion A(x) is called the generating function of the sequence {a_j}.
Fact 1.1. If the sequence {a_j} is bounded by some constant K, then
A(x) converges at least for |x| < 1. [Prove it!]
Fact 1.2. In case the sequence {a_j} represents probabilities, we intro-
duce the restriction
a_j ≥ 0, ∑_{j=0}^{∞} a_j = 1.
The corresponding function A(x) is then called a probability-generating
function. We consider the (point) probability distribution and the tail
probability of a random variable X, given by
P{X = j} = p_j,  P{X > j} = q_j;
then the usual distribution function is P{X ≤ j} = 1 - q_j. The probability-
generating function now is
P(x) = ∑_{j=0}^{∞} p_j x^j = E(x^X), where E is the expectation operator.
Also we can define a generating function for the tail probabilities:
Q(x) = ∑_{j=0}^{∞} q_j x^j.
Q(x) is not a probability-generating function, however.
Fact 1.3.
a/ P(1) = ∑_{j=0}^{∞} p_j 1^j = 1, and
|P(x)| ≤ ∑_{j=0}^{∞} |p_j x^j| ≤ ∑_{j=0}^{∞} p_j ≤ 1 if |x| ≤ 1.
So P(x) is absolutely convergent at least for |x| ≤ 1.
b/ Q(x) is absolutely convergent at least for |x| < 1.
c/ Connection between P(x) and Q(x): (check this!)
(1 - x)Q(x) = 1 - P(x),  or equivalently  P(x) + Q(x) = 1 + xQ(x).
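The identity in c/ follows from q_j - q_{j-1} = -p_j; it can also be checked numerically. The three-point pmf below is an assumed toy example:

```python
# Numerical check of (1 - x) Q(x) = 1 - P(x) for a small pmf.
pmf = [0.2, 0.5, 0.3]                               # p_0, p_1, p_2
tails = [sum(pmf[j + 1:]) for j in range(len(pmf))]  # q_j = P(X > j)

def poly(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j."""
    return sum(c * x**j for j, c in enumerate(coeffs))

x = 0.37  # any point with |x| < 1
P, Q = poly(pmf, x), poly(tails, x)
assert abs((1 - x) * Q - (1 - P)) < 1e-12
```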
Finding a generating function from a recurrence: multiply both
sides by x^n and sum over n. For example, for the Fibonacci sequence,
f_n = f_{n-1} + f_{n-2}  ⟹  F(x) = x + xF(x) + x²F(x).
Finding a recurrence from a generating function: whenever you
know F(x), we find its power series; the coefficients of x^n in that series
are the Fibonacci numbers. How? Just remember how to find a partial
fraction expansion of F(x), and in particular the basic expansion
1/(1 - αx) = 1 + αx + α²x² + ···
In general, if G(x) is the generating function of a sequence (g_n), then
G^{(n)}(0) = n! g_n.
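Coefficient extraction from a rational generating function can also be sketched by power-series long division; applying it to F(x) = x/(1 - x - x²), obtained from the Fibonacci recurrence above, recovers the Fibonacci numbers:

```python
# Power-series coefficients of a rational generating function N(x)/D(x)
# by long division (synthetic division), with D(0) != 0.
def series_coeffs(num, den, n_terms):
    """num, den: polynomial coefficient lists [c0, c1, ...]."""
    coeffs = []
    num = num + [0] * n_terms          # working copy with room for residuals
    for n in range(n_terms):
        c = num[n] / den[0]
        coeffs.append(c)
        for k, d in enumerate(den):    # subtract c * x^n * D(x)
            if n + k < len(num):
                num[n + k] -= c * d
    return coeffs

# F(x) = x / (1 - x - x^2): numerator [0, 1], denominator [1, -1, -1]
fib = series_coeffs([0, 1], [1, -1, -1], 10)
# fib -> [0.0, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
```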
1.2.3 Convolutions
Now we consider two nonnegative independent integer-valued random
variables X and Y, having the probability distributions
P{X = j} = a_j,  P{Y = k} = b_k.   (1.2.2)
The joint probability of the event (X = j, Y = k) is obviously a_j b_k. We
form a new random variable S = X + Y; then the event S = r comprises
the mutually exclusive events
(X = 0, Y = r), (X = 1, Y = r-1), ..., (X = r, Y = 0).
Fact 1.4. The probability distribution of the sum S then is
P{S = r} = c_r = a_0 b_r + a_1 b_{r-1} + ··· + a_r b_0.
Proof.
p_S(r) = P(X + Y = r) = ∑_{(x,y): x+y=r} P(X = x and Y = y) = ∑_x p_X(x) p_Y(r - x).
This method of compounding two sequences of numbers (not necessarily
probabilities) is called convolution. The notation
{c_j} = {a_j} ∗ {b_j}
will be used.
Fact 1.5. Define the generating functions of the sequences {a_j}, {b_j} and
{c_j} by
A(x) = ∑_{j=0}^{∞} a_j x^j,  B(x) = ∑_{j=0}^{∞} b_j x^j,  C(x) = ∑_{j=0}^{∞} c_j x^j;
it follows that C(x) = A(x)B(x). [check this!]
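A minimal sketch of convolution and of the check C(x) = A(x)B(x), using assumed coin-flip pmfs:

```python
# Convolution of two pmfs {a_j}, {b_j}, and the identity C(x) = A(x) B(x).
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

a = [0.5, 0.5]          # one fair coin: values 0, 1
b = [0.25, 0.5, 0.25]   # sum of two fair coins: values 0, 1, 2
c = convolve(a, b)      # pmf of the three-coin sum -> [0.125, 0.375, 0.375, 0.125]

def poly(p, x):
    return sum(pj * x**j for j, pj in enumerate(p))

x = 0.3
assert abs(poly(c, x) - poly(a, x) * poly(b, x)) < 1e-12
```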
In practical applications, the sum of several independent integer-
valued random variables X_i can be defined:
S_n = X_1 + X_2 + ··· + X_n,  n ∈ ℤ⁺.
If the X_i have a common probability distribution given by p_j, with
probability-generating function P(x), then the probability-generating
function of S_n is P(x)^n. Clearly, the distribution of S_n is the n-fold
convolution
{p_j} ∗ {p_j} ∗ ··· ∗ {p_j} (n factors) = {p_j}^{∗n}.
1.2.4 Compound distributions
In our discussion so far of sums of random variables, we have always
assumed that the number of variables in the sum is known and fixed, i.e.,
it is nonrandom. We now generalize the previous concept of convolution
to the case where the number N of random variables X_k contributing to
the sum is itself a random variable! In particular, we consider the sum
S_N = X_1 + X_2 + ··· + X_N, where
P{X_k = j} = f_j,  P{N = n} = g_n,  P{S_N = l} = h_l.   (1.2.3)
The probability-generating functions of X, N and S are
F(x) = ∑_j f_j x^j,  G(x) = ∑_n g_n x^n,  H(x) = ∑_l h_l x^l.   (1.2.4)
Compute H(x) in terms of F(x) and G(x): prove that
H(x) = G(F(x)).
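The identity H(x) = G(F(x)) can be checked numerically: conditioned on N = n, the sum S_N has PGF F(x)^n, so H(x) = ∑_n g_n F(x)^n. The pmfs below are assumed toy data:

```python
# Check H(x) = G(F(x)) for a compound sum S_N = X_1 + ... + X_N.
g = [0.25, 0.5, 0.25]   # pmf of N (values 0, 1, 2); toy data
f = [0.2, 0.5, 0.3]     # pmf of each X_k (values 0, 1, 2); toy data

def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# Direct pmf of S_N: h = sum_n g_n * (n-fold convolution of f)
h = [0.0] * 5                      # S_N ranges over 0..4 here
power = [1.0]                      # 0-fold convolution: point mass at 0
for n, gn in enumerate(g):
    for l, p in enumerate(power):
        h[l] += gn * p
    power = convolve(power, f)

def poly(p, x):
    return sum(pj * x**j for j, pj in enumerate(p))

x = 0.6
assert abs(poly(h, x) - poly(g, poly(f, x))) < 1e-12   # H(x) = G(F(x))
```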
Example 1.1. A remote village has three gas stations, and each one
of them is open on any given day with probability 1/2, independently of
the others. The amount of gas available in each gas station is unknown
and is uniformly distributed between 0 and 1000 gallons. We wish to
characterize the distribution of the total amount of gas available at the
gas stations that are open.
The number N of open gas stations is a binomial random variable
with p = 1/2, and the corresponding transform (here a moment-generating
transform, since the amounts of gas are continuous) is
G_N(x) = (1 - p + p e^x)³ = (1/8)(1 + e^x)³.
The transform F_X(x) associated with the amount of gas available in an
open gas station is
F_X(x) = (e^{1000x} - 1) / (1000x).
The transform H_S(x) associated with the total amount S of gas avail-
able at the gas stations of the village that are open is the same as
G_N(x), except that each occurrence of e^x is replaced with F_X(x), i.e.,
H_S(x) = G_N(F_X(x)) = (1/8)(1 + F_X(x))³.
24 CHAPTER 2. MARKOV CHAINS & MODELING
(In each time step n to n + 1, the process can stay at the same state e_i
(at both n and n + 1) or move to another state e_j (at n + 1) according to
the memoryless rule, which says that the future behavior of the system
depends only on the present and not on its past history.)
Definition 5 (One-step transition probability).
Denote the absolute probability of outcome j at the n-th trial by
p_j(n) = P(X_n = j).   (2.0.1)
The one-step transition probability, denoted
p_ij(n + 1) = P(X_{n+1} = j | X_n = i),
is defined as the conditional probability that the process is in state j at time
n + 1 given that the process was in state i at the previous time n, for all
i, j ∈ Q.
2.1 Homogeneous Markov chains
If the state transition probabilities p_ij(n + 1) in a Markov chain M are
independent of the time n, they are said to be stationary, time-homogeneous,
or just homogeneous. The state transition probabilities of a homogeneous
chain can then be written without mentioning the time point n:
p_ij = P(X_{n+1} = j | X_n = i).   (2.1.1)
Unless stated otherwise, we assume and will work with homogeneous
Markov chains M. The one-step transition probabilities given by (2.1.1)
of these Markov chains must satisfy
∑_{j=1}^{s} p_ij = 1 for each i = 1, 2, ..., s, and p_ij ≥ 0.
Transition Probability Matrix. In practice, we are likely given the ini-
tial distribution (the probability distribution of the starting position of the
concerned object at time point 0) and the transition probabilities, and
we want to determine the probability distribution of the position X_n for
any time point n > 0. The Markov property, quantitatively described
through transition probabilities, is represented in the state transition
matrix P = [p_ij]:
P =
[ p_11  p_12  p_13  ...  p_1s ]
[ p_21  p_22  p_23  ...  p_2s ]
[ p_31  p_32  p_33  ...  p_3s ]
[  ...   ...   ...  ...   ... ]
[ p_s1  p_s2  p_s3  ...  p_ss ]   (2.1.2)
Briefly, we have
Definition 6. A (homogeneous) Markov chain M is a triple (Q, p(0), P)
in which:
• Q is a finite set of states (identified with an alphabet Σ),
• p(0) are the initial probabilities (at the initial time point n = 0),
• P = [p_ij] is the matrix of state transition probabilities, in which
p_ij = P(X_{n+1} = j | X_n = i),
and such that the memoryless property is satisfied, i.e.,
P(X_{n+1} = j | X_n = i, ..., X_0 = a) = P(X_{n+1} = j | X_n = i), for all n.
In practice, the initial probabilities p(0) are obtained at the current
time (the beginning of a study), and the transition probability matrix P is
found from empirical observations in the past. In most cases, the major
concern is using P and p(0) to predict the future.
Example 2.1. The Coopmart chain (denoted C) in SG currently con-
trols 60% of the daily processed-food market; their rivals Maximart and
other brands (denoted M) take the other share. Data from the previous
years (2006 and 2007) show that 88% of C's customers remained loyal
to C, while 12% switched to rival brands. In addition, 85% of M's cus-
tomers remained loyal to M, while the other 15% switched to C. Assuming
that these trends continue, determine C's share of the market (a) in 5
years and (b) over the long run.
Proposed solution. Suppose that the brand attraction is time-homoge-
neous. For a sample of large enough size n, we denote the customers'
choice in year n by a random variable X_n. The market share
probability of the whole population can then be approximated by using
the sample statistics, e.g.,
P(X_n = C) = |{x : X_n(x) = C}| / n, and P(X_n = M) = 1 - P(X_n = C).
Set n = 0 for the current time; the initial probabilities then are
p(0) = [0.6, 0.4] = [P(X_0 = C), P(X_0 = M)].
Obviously we want to know the market share probabilities
p(n) = [P(X_n = C), P(X_n = M)] at any year n > 0. We now introduce a
transition probability matrix whose rows and columns are labeled C and M:
P =
      C     M
C [ 0.88  0.12 ]     [ 1-a   a  ]
M [ 0.15  0.85 ]  =  [  b   1-b ],   where 1-a = 0.88, a = 0.12, b = 0.15, 1-b = 0.85,   (2.1.3)
and a = p_CM = P[X_{n+1} = M | X_n = C], b = p_MC = P[X_{n+1} = C | X_n = M].
The n-step transition probabilities satisfy the Chapman-Kolmogorov
equations
p^{(n)}_{ij} = ∑_{h=1}^{s} p^{(n-k)}_{ih} p^{(k)}_{hj},  0 < k < n.
This results in the matrix notation
P^{(n)} = P^{(n-k)} P^{(k)}.
Since P^{(1)} = P, we get P^{(2)} = P², and in general P^{(n)} = P^n.
Let p^{(n)} denote the vector form of the probability mass distribution (pmf,
or absolute probability distribution) associated with X_n of a Markov process,
that is,
p^{(n)} = [p_1(n), p_2(n), p_3(n), ..., p_s(n)],
where each p_i(n) is defined as in (2.0.1).
Proposition 7. The absolute probability distribution p^{(n)} at any stage n
of a Markov chain is given in matrix form by
p^{(n)} = p^{(0)} P^n, where p^{(0)} = p is the initial probability (row) vector.   (2.1.5)
Proof. We employ two facts:
* P^{(n)} = P^n, and
* the absolute probability distribution p^{(n+1)} at any stage n+1 (asso-
ciated with X_{n+1}) can be found from the 1-step transition matrix P = [p_ij]
and the distribution
p^{(n)} = [p_1(n), p_2(n), p_3(n), ..., p_s(n)]
at any stage n (associated with X_n):
p_j(n + 1) = ∑_{i=1}^{s} p_i(n) p_ij, or in matrix notation p^{(n+1)} = p^{(n)} P.
Then just do the induction: p^{(n+1)} = p^{(n)} P = (p^{(n-1)} P) P = ··· =
p^{(0)} P^{n+1}.
Example 2.2 (The Coopmart chain, cont.). (a/) C's share of the
market in 5 years can be computed by
p^{(5)} = [p_C(5), p_M(5)] = p^{(0)} P⁵.
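The computation can be sketched in pure Python; the row-vector update p ← pP below follows the component formula p_j(n+1) = ∑_i p_i(n) p_ij:

```python
# Market shares after 5 years: p(5) = p(0) P^5 (row-vector convention).
P = [[0.88, 0.12],
     [0.15, 0.85]]
p = [0.6, 0.4]   # initial distribution p(0)

for _ in range(5):   # apply p <- p P five times
    p = [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]

# p[0] is C's market share after 5 years, about 0.565
```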
2.2 Classification of States
A) Accessible states.
State j is said to be accessible from state i if for some N ≥ 0, p^{(N)}_{ij} > 0,
and we write i → j. Two states i and j accessible to each other are said
to communicate, and we write i ↔ j. If all states communicate with
each other, then we say that the Markov chain is irreducible. Formally,
irreducibility means
∀ i, j ∈ Q : ∃ N ≥ 0 [p^{(N)}_{ij} > 0].
B) Recurrent (persistent) states and transient states.
Let A(i) be the set of states that are accessible from i. We say that
i is recurrent if, from any state accessible from i, there is always some
probability of returning to i and, given enough time, this return is certain
to happen. By repeating this argument, if a recurrent state is visited once,
it will be revisited an infinite number of times.
A state is called transient if it is not recurrent. In particular, there are
then states j ∈ A(i) such that i is not accessible from j. After each visit to
state i, there is positive probability that the process enters such a j. Given
enough time, this will happen, and state i cannot be visited after that.
Thus, a transient state will only be visited a finite number of times.
We now formalize the concepts of recurrent (persistent) and transient
states.
Let the first return time T_j denote the first time, i.e., the number of steps,
at which the chain is first at state j after leaving j at time 0 (if j is never
reached, then set T_j = ∞). It is a discrete r.v., taking values in {1, 2, 3, ...}.
For any two states i ≠ j and n > 0, let f^n_{i,j} be the conditional probability
that the chain is first at state j after n steps, given that it was at state i
at time 0:
f^n_{i,j} := P[T_j = n | X_0 = i] = P[X_n = j; X_k ≠ j, k = 1, 2, ..., n-1 | X_0 = i],
and f^0_{i,j} = 0 since T_j ≥ 1. Then clearly
f^1_{i,j} = P[X_1 = j | X_0 = i] = p_{i,j},
and, by the addition rule and the Markov property (a first-step analysis):
f^n_{i,j} = ∑_{k≠j} p_{i,k} f^{n-1}_{k,j},  for n = 2, 3, ....
Note that this formula is still valid when j = i:
f^n_{j,j} = ∑_{k≠j} p_{j,k} f^{n-1}_{k,j}.
The probability of visiting j in finite time (a finite number of steps), starting
from i, is given by
f_{i,j} = P[T_j < ∞ | X_0 = i] = ∑_{n=0}^{∞} f^n_{i,j}.
Definition 8.
• State j is recurrent if
f_{j,j} = P[T_j < ∞ | X_0 = j] = 1,
i.e., starting from the state, the process is guaranteed to return to
the state again and again, in fact, infinitely many times.
• A recurrent state j is said to be positive recurrent if, starting from
the state, the expected number of transitions until the chain returns
to the state is finite:
E[T_j | X_0 = j] = ∑_{n=0}^{∞} n f^n_{j,j} < ∞.
• State j is said to be null recurrent if
E[T_j | X_0 = j] = ∞.
• State j is said to be transient (or nonrecurrent) if
f_{j,j} = P[T_j < ∞ | X_0 = j] < 1.
In this case there is positive probability 1 - f_{j,j} of never returning
to state j.
Practical Problem 2. Try to prove the following key facts about MCs.
1. Show that in a finite-state Markov chain, not all states can be tran-
sient; in other words, at least one of the states must be recurrent.
2. Show that if P is a Markov matrix, then P^n is also a Markov matrix
for any positive integer n.
3. Verify the transitivity property of Markov chains, that is, if i → j
and j → k, then i → k. (Hint: use the Chapman-Kolmogorov
equations.)
Practice. Consider a Markov chain with state space {0, 1} and transition
probability matrix
P =
[  1    0  ]
[ 1/2  1/2 ]
i/ Show that state 0 is recurrent.
ii/ Show that state 1 is transient.
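The recursion for f^n_{i,j} gives a direct numerical sketch of both parts, truncating the infinite sum f_{j,j} = ∑_n f^n_{j,j} at a finite horizon:

```python
# First-return probabilities for the two-state chain P = [[1, 0], [1/2, 1/2]],
# using f^n_{i,j} = sum_{k != j} p_{i,k} f^{n-1}_{k,j}.
P = [[1.0, 0.0],
     [0.5, 0.5]]

def return_prob(P, j, n_max=200):
    """Approximate f_{j,j} = sum_n f^n_{j,j}, truncated at n_max."""
    s = len(P)
    f = [P[i][j] for i in range(s)]   # f^1_{i,j} = p_{i,j}
    total = f[j]
    for _ in range(2, n_max + 1):
        f = [sum(P[i][k] * f[k] for k in range(s) if k != j)
             for i in range(s)]
        total += f[j]
    return total

# State 0: f_{0,0} = 1, so it is recurrent.
# State 1: f_{1,1} = 1/2 < 1, so it is transient.
```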
2.3 Markov Chain Decomposition
Fact 2.1. In any Markov chain, the following are correct.
• It can be decomposed into one or more recurrent classes (equiva-
lence classes), plus possibly some transient states. Each class
contains those states that communicate with each other.
• A recurrent state is accessible from all states in its class, but is not
accessible from recurrent states in other classes.
• A transient state is not accessible from any recurrent state. But
at least one, possibly more, recurrent states are accessible from a
given transient state.
• For the purpose of understanding the long-term behavior of a Markov
chain, it is important to analyze chains that consist of a single recur-
rent class. Such a Markov chain is called an irreducible chain.
• For the purpose of understanding short-term behavior, it is also
important to analyze the mechanism by which any particular class of
recurrent states is entered starting from a given transient state.
C) Periodic states.
In a finite Markov chain M = (Q, p(0), P) (i.e., one having a finite number
of states), a periodic state i is a state to which an agent can return only at
positive integer time points t_0, 2t_0, 3t_0, ... (multiples of an integer period
t_0 > 1); t_0 is named the period of i, being the greatest common divisor
of the integers {t > 0 : p^{(t)}_{i,i} > 0}.
A Markov chain is aperiodic if there is no such periodic state; in other
words, if the period of each state i ∈ Q is 1.
For example, we can check that if a MC has the transition matrix
P =
[  0    0   0.6  0.4 ]
[  0    0   0.3  0.7 ]
[ 0.5  0.5   0    0  ]
[ 0.2  0.8   0    0  ]
then it is periodic. Indeed, if the Markovian random variable (agent)
starts at time 0 in state E_1, then at time 1 it must be in state E_3 or E_4,
and at time 2 it must be in state E_1 or E_2. Therefore, in general it can
visit E_1 only at times 2, 4, 6, .... Summarizing, we have
Definition 9. A finite Markov chain M = (Q, p(0), P) is
1. irreducible iff it has only one single recurrent class, i.e., any state
is accessible from all other states;
2. aperiodic iff the period of each state i ∈ Q is 1, i.e., it has no
periodic state;
3. ergodic if it is positive recurrent and aperiodic.
It can be shown that recurrence, transience, and periodicity are all
class properties; that is, if state i is recurrent (positive recurrent, null
recurrent, transient, periodic), then all other states in the same class as
state i inherit the same property.
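The gcd definition of the period can be sketched numerically for the four-state example above, checking the diagonal entries of P^t up to a finite horizon:

```python
# Periods of states, computed as gcd{t > 0 : p^(t)_{i,i} > 0} (t <= t_max).
from math import gcd

P = [[0.0, 0.0, 0.6, 0.4],
     [0.0, 0.0, 0.3, 0.7],
     [0.5, 0.5, 0.0, 0.0],
     [0.2, 0.8, 0.0, 0.0]]

def period(P, i, t_max=50):
    s = len(P)
    M = P          # M will hold P^t
    d = 0
    for t in range(1, t_max + 1):
        if t > 1:
            M = [[sum(M[a][k] * P[k][b] for k in range(s))
                  for b in range(s)] for a in range(s)]
        if M[i][i] > 1e-12:   # return to i possible in t steps
            d = gcd(d, t)
    return d

periods = [period(P, i) for i in range(4)]
# every state has period 2, so the chain is periodic
```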
D) Absorbing states and absorption probabilities.
• State j is said to be an absorbing state if p_jj = 1; that is, once state
j is reached, it is never left.
• If there is a unique absorbing state k, its steady-state probability is
1 (because all other states are transient and have zero steady-state
probability), and it will be reached with probability 1, starting from
any initial state.
• If there are multiple absorbing states, the probability that one
of them will eventually be reached is still 1, but the identity of
the absorbing state to be entered is random and the associated
probabilities may depend on the starting state.
Can we determine precisely the absorption probabilities for all the ab-
sorbing states of a MC in the generic case?
Consider a Markov chain X(n) = {X_n, n ≥ 0} with finite state space
E = {1, 2, ..., N} and transition probability matrix P.
Theorem 10. Let A = {1, ..., m} be the set of absorbing states and
B = {m+1, ..., N} be the set of nonabsorbing states.
• Then the transition probability matrix P can be expressed as
P =
[ I  O ]
[ R  Q ]
where I is the m×m identity matrix, O is an m×(N-m) zero matrix,
the elements of R are the one-step transition probabilities from
nonabsorbing to absorbing states, and the elements of Q are the
one-step transition probabilities among the nonabsorbing states.
• Let U = [u_{k,j}] be the (N-m)×m matrix whose elements are the
absorption probabilities for the various absorbing states,
u_{k,j} = P[X_n = j (∈ A) eventually | X_0 = k (∈ B)].
We have
U = (I - Q)^{-1} R,
where (I - Q)^{-1} is called the fundamental matrix of the Markov chain X(n).
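A minimal pure-Python sketch of U = (I - Q)^{-1} R, illustrated on the classic gambler's-ruin chain mentioned later in the text (total capital 4, p = 1/2; states 0 and 4 absorbing, states 1, 2, 3 transient):

```python
# Absorption probabilities via the fundamental matrix (I - Q)^{-1}.
def mat_inv(M):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(M)
    A = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

Q = [[0.0, 0.5, 0.0],       # transitions among transient states 1, 2, 3
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],            # transient -> absorbing states {0, 4}
     [0.0, 0.0],
     [0.0, 0.5]]

I = [[float(i == j) for j in range(3)] for i in range(3)]
IQ = [[I[i][j] - Q[i][j] for j in range(3)] for i in range(3)]
U = mat_mul(mat_inv(IQ), R)
# Starting with 1 dollar out of 4, ruin probability U[0][0] = 3/4.
```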
2.4 Limiting probabilities & Stationary distributions
From now on we assume that all MCs are finite, aperiodic and irre-
ducible. The irreducibility assumption implies that any state can even-
tually be reached from any other state. Both the irreducibility and aperiod-
icity assumptions hold for essentially all practical applications of MCs
(in bioinformatics, ...) except for the case of MCs with absorbing states.
Definition 11. The vector p* = (p*_1, p*_2, ..., p*_s) is called the stationary
distribution of a Markov chain {X_n, n ≥ 0} with state transition matrix
P if
p* P = p*.
This equation indicates that a stationary distribution p* is a left eigen-
vector of P with eigenvalue 1. In general, we wish to know the limiting
probabilities lim_{n→∞} p^{(n)} obtained from an initial distribution p^{(0)}.
We need some general results to determine the stationary distribution
p_i := lim_{t→∞} p_i(t),
where the p_i form a unique solution to the conditions:
∑_{i=1}^{s} p_i = 1, where each p_i ≥ 0;
p_j = ∑_{i=1}^{s} p_i p_{i,j}.
See the proof in Theorem 19. We discuss here two particular cases,
s = 2 and s > 2.
A) Markov chains that have two states.
At first we investigate the case of Markov chains that have two states, say
Q = {e_1, e_2}. Let a = p_{e_1 e_2} and b = p_{e_2 e_1} be the state transition
probabilities between the distinct states in a two-state Markov chain; its
state transition matrix is
P =
[ p_11  p_12 ]     [ 1-a   a  ]
[ p_21  p_22 ]  =  [  b   1-b ],  where 0 < a < 1, 0 < b < 1.   (2.4.1)
Proposition 14.
a) The n-step transition probability matrix is given by
P^{(n)} = P^n = 1/(a+b) ( [ b  a ]  +  (1-a-b)^n [  a  -a ] )
                         ( [ b  a ]              [ -b   b ] )
b) Find the limit matrix when n → ∞.
To prove this basic Proposition 14 (computing the transition probability
matrix of two-state Markov chains), we use a fundamental result of Linear
Algebra that is recalled in Section 2.6.
Proof. The eigenvalues of the state transition matrix P, found by solving
the equation
c(λ) = |λI - P| = 0,
are λ_1 = 1 and λ_2 = 1 - a - b. The spectral decomposition of a square
matrix says P can be decomposed into two constituent matrices E_1, E_2
(since only two eigenvalues were found):
E_1 = 1/(λ_1 - λ_2) [P - λ_2 I],  E_2 = 1/(λ_2 - λ_1) [P - λ_1 I].
That means E_1, E_2 are mutually annihilating idempotent matrices, i.e.,
E_1 E_2 = 0 = E_2 E_1, and
P = λ_1 E_1 + λ_2 E_2;  E_1² = E_1,  E_2² = E_2.
Hence,
P^n = λ_1^n E_1 + λ_2^n E_2 = E_1 + (1 - a - b)^n E_2,
or
P^{(n)} = P^n = 1/(a+b) ( [ b  a ]  +  (1-a-b)^n [  a  -a ] ).
                         ( [ b  a ]              [ -b   b ] )
b) The limit matrix when n → ∞ is
lim_{n→∞} P^n = 1/(a+b) [ b  a ]
                        [ b  a ].
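The closed form of Proposition 14 can be checked against direct matrix multiplication; a and b below are arbitrary test values:

```python
# Check P^n = (1/(a+b)) * ([b a; b a] + (1-a-b)^n [a -a; -b b]).
a, b, n = 0.3, 0.2, 7   # assumed test values, 0 < a, b < 1

P = [[1 - a, a],
     [b, 1 - b]]

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

Pn = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(n):
    Pn = mat_mul(Pn, P)

lam = (1 - a - b) ** n
closed = [[(b + a * lam) / (a + b), (a - a * lam) / (a + b)],
          [(b - b * lam) / (a + b), (a + b * lam) / (a + b)]]

assert all(abs(Pn[i][j] - closed[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```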
B) Markov chains that have more than two states.
For s > 2 it is cumbersome to compute the constituent matrices E_i of P;
we can instead employ the so-called regularity property.
Definition 15. A Markov chain is regular if there exists m ∈ ℕ such
that
P^{(m)} = P^m > 0
(i.e., every matrix entry is positive).
In summary, for a DTMC M that has more than two states, we have 4
cases:
Fact 2.2.
1. M is irreducible and positive recurrent, but has periodic states. The
component p*_i of the stationary distribution vector must then be un-
derstood as the long-run proportion of time that the process is in
state i.
2. M has several closed, positive recurrent classes. In this case, the
transition matrix of the DTMC takes a block form.
In contrast to the irreducible ergodic DTMC, where the limiting
distribution is independent of the initial state, a DTMC with sev-
eral closed, positive recurrent classes has a limiting distribution
that depends on the initial state.
3. M has both recurrent and transient classes. In this situation, we
often seek the probabilities that the chain is eventually absorbed
by the different recurrent classes. See the well-known gambler's ruin
problem.
4. M is an irreducible DTMC with null recurrent or transient states.
This case is only possible when the state space is infinite, since any
finite-state, irreducible DTMC must be positive recurrent. In this
case, neither the limiting distribution nor the stationary distribu-
tion exists.
A well-known example of this case is the random walk model.
Practice. Consider a Markov chain with transition probability matrix
P =
[ 0  0.5  0.5 ]
[ 1   0    0  ]
[ 1   0    0  ]
Show that state 0 is periodic with period 2.
Practical Problem 4 (The Gambler's Ruin problem). Let two gam-
blers, A and B, initially have k dollars and m dollars, respectively. Sup-
pose that at each round of their game, A wins one dollar from B with
probability p and loses one dollar to B with probability q = 1 - p. As-
sume that A and B play until one of them has no money left. Let X_n be
A's capital after round n, where n = 0, 1, 2, ... and X_0 = k.
(a) Show that X(n) = {X_n, n ≥ 0} is a Markov chain with absorbing
states.
(b) Find its transition probability matrix P. Realize P when p = q =
1/2 and the total capital is N = k + m = 4.
(c*) What is the probability of A losing all his money?
2.5 Theory of stochastic matrix for MC
A stochastic matrix is a matrix for which each row sum equals one.
If the column sums also equal one, the matrix is called doubly stochastic.
Hence the transition probability matrix P = [p_ij] is a stochastic matrix.
Proposition 16. Every stochastic matrix K has
• 1 as an eigenvalue (possibly with multiplicity greater than one), and
• no eigenvalue exceeding 1 in absolute value; that is, all eigen-
values λ_i satisfy |λ_i| ≤ 1.
Proof.
The spectral radius ρ(K) of any square matrix K is defined as
ρ(K) = max_i { |λ_i| : λ_i an eigenvalue of K }.
When K is stochastic, ρ(K) = 1. Note that if P is a transition matrix
for a finite-state Markov chain (then P is stochastic), the multiplicity
of the eigenvalue ρ(P) = 1 is equal to the number of recurrent classes
associated with P.
Fact 2.3. If K is a stochastic matrix then K^m is a stochastic matrix.
Proof. Let e = [1, 1, ..., 1]^t be the all-one vector, then use the fact that
Ke = e. Prove that K^m e = e.
Let A = [a_ij] > 0 denote that every element a_ij of A satisfies the
condition a_ij > 0.
Definition 17.
• A stochastic matrix P = [p_ij] is ergodic if L = lim_{m→∞} P^m
exists, that is, each p^{(m)}_{ij} has a limit as m → ∞.
• A stochastic matrix P is regular if there exists a natural number m
such that P^m > 0. In our context, a Markov chain with transition
probability matrix P is called regular if there exists an m > 0 such
that P^m > 0, i.e., there is a finite positive integer m such that after
m time-steps, every state has a nonzero chance of being occupied,
no matter what the initial state.
Example 2.3. Is the matrix
P =
[ 0.88  0.12 ]
[ 0.15  0.85 ]
regular? Ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.
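A quick numerical sketch: iterating the matrix shows both rows converging to the stationary vector (b, a)/(a + b) = (5/9, 4/9) predicted by the two-state analysis with a = 0.12, b = 0.15:

```python
# Approximate L = lim P^m by repeated multiplication.
P = [[0.88, 0.12],
     [0.15, 0.85]]

M = P
for _ in range(200):   # iterate M <- M P until numerically stationary
    M = [[sum(M[i][k] * P[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]

# Both rows of M approach [5/9, 4/9], approximately [0.5556, 0.4444].
```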
The limit matrix L = lim_{m→∞} P^m practically shows the long-term be-
haviour (distribution, properties) of the process. How do we know the exis-
tence of L (i.e., the ergodicity of the transition matrix P = [p_ij])?
Theorem 18. A stochastic matrix P = [p_ij] is ergodic if and only if
* the only eigenvalue λ of modulus (magnitude) 1 is λ = 1 itself, and
* if λ = 1 has multiplicity k, there exist k independent eigenvectors
associated with this eigenvalue.
For a regular homogeneous Markov chain we have the following theorem.
Theorem 19 (Regularity of stochastic matrices). If a stochastic matrix
P = [p_ij] is regular, then
1. 1 is an eigenvalue of multiplicity one, and all other eigenvalues λ_i
satisfy |λ_i| < 1;
2. P is ergodic, that is, L = lim_{m→∞} P^m exists. Furthermore, L's rows
are identical and equal to the stationary distribution p*.
Proof. If (1) is proved, then by Theorem 18, P = [p_ij] is ergodic. Hence,
when P = [p_ij] is regular, the limit matrix L = lim_{m→∞} P^m does exist.
By the Spectral Decomposition (2.6.1),
P = E_1 + λ_2 E_2 + ··· + λ_k E_k, where all |λ_i| < 1, i = 2, ..., k.
Then, by (2.6.2),
L = lim_{m→∞} P^m = lim_{m→∞} (E_1 + λ_2^m E_2 + ··· + λ_k^m E_k) = E_1.
Let p* be the (normalized) left eigenvector satisfying p* P = p*, i.e.,
p*(P - 1·I) = 0; then each row of L = E_1 equals p*, i.e., L = [p*; p*; ...; p*].
Corollary 20. A few important remarks: (a) for a regular MC, the
long-term behavior does not depend on the initial state distribution prob-
abilities p(0); (b) in general, the limiting distributions are influenced by
the initial distribution p(0) whenever the stochastic matrix P = [p_ij] is
ergodic but not regular. (See more at Problem D.)
Example 2.4. Consider a Markov chain with two states and transition
probability matrix
P =
[ 3/4  1/4 ]
[ 1/2  1/2 ]
(a) Find the stationary distribution p* of the chain. (b) Find lim_{n→∞} P^n
by first evaluating P^n. (c) Find lim_{n→∞} P^n.
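For part (a), the two-state formula p* = (b, a)/(a + b) from the earlier discussion applies with a = 1/4 and b = 1/2; a minimal check that the result is indeed stationary:

```python
# Stationary distribution of P = [[3/4, 1/4], [1/2, 1/2]].
a, b = 1 / 4, 1 / 2
p_star = [b / (a + b), a / (a + b)]   # -> [2/3, 1/3]

# verify p* P = p*
P = [[3 / 4, 1 / 4], [1 / 2, 1 / 2]]
pP = [sum(p_star[i] * P[i][j] for i in range(2)) for j in range(2)]
assert all(abs(pP[j] - p_star[j]) < 1e-12 for j in range(2))
```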
2.6 Spectral Theorem for Diagonalizable Matrices
Consider a square matrix P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k}
consisting of its eigenvalues. Then:
• If {(λ_1, x_1), (λ_2, x_2), ..., (λ_k, x_k)} are eigenpairs for P, then S =
{x_1, ..., x_k} is a linearly independent set. If B_i is a basis for the
null space N(P - λ_i I), then B = B_1 ∪ B_2 ∪ ··· ∪ B_k is a linearly
independent set.
• P is diagonalizable if and only if P possesses a complete set of
eigenvectors (i.e., a set of s linearly independent eigenvectors). More-
over, H^{-1} P H = D = diag(λ_1, λ_2, ..., λ_s) if and only if the columns
of H constitute a complete set of eigenvectors and the λ_j's are the
associated eigenvalues; i.e., each (λ_j, H[:, j]) is an eigenpair for P.
Spectral Theorem for Diagonalizable Matrices. A square matrix
P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of its eigen-
values is diagonalizable if and only if there exist constituent matrices
{E_1, E_2, ..., E_k} (called the spectral set) such that
P = λ_1 E_1 + λ_2 E_2 + ··· + λ_k E_k,   (2.6.1)
where the E_i's have the following properties:
• E_i E_j = 0 whenever i ≠ j, and E_i² = E_i for all i = 1..k;
• E_1 + E_2 + ··· + E_k = I.
In practice we employ (2.6.1) in two ways:
Way 1: if we know the decomposition (2.6.1) explicitly, then we can
compute the powers
P^m = λ_1^m E_1 + λ_2^m E_2 + ··· + λ_k^m E_k, for any integer m > 0.   (2.6.2)
Way 2: if we know P is diagonalizable, then we find the constituent
matrices E_i by:
* finding the nonsingular matrix H = (x_1 | x_2 | ··· | x_k), where each x_i
is a basis (right) eigenvector from the null subspace
N(P - λ_i I) = {v : (P - λ_i I)v = 0 ⟺ Pv = λ_i v};
** then P = H D H^{-1} = (x_1 | x_2 | ··· | x_k) · D · H^{-1}, where
D = diag(λ_1, ..., λ_k) is the diagonal matrix, and
H^{-1} = K^t =
[ y_1^t ]
[ y_2^t ]
[  ...  ]
[ y_k^t ]  (i.e., K = (y_1 | y_2 | ··· | y_k)).
Here each y_i is a basis left eigenvector, from the null subspace
N(P^t - λ_i I) = {v : v^t P = λ_i v^t}.
The constituent matrices are E_i = x_i y_i^t.
Example 2.5. Diagonalize the following matrix and provide its spectral
decomposition.

    P = [  1  -4  -4 ]
        [  8 -11  -8 ]
        [ -8   8   5 ]
The characteristic equation is

    p(λ) = det(P − λI) = −λ^3 − 5λ^2 − 3λ + 9 = −(λ − 1)(λ + 3)^2 = 0.

So λ = 1 is a simple eigenvalue, and λ = −3 is repeated twice (its
algebraic multiplicity is 2). Any set of vectors x satisfying

    x ∈ N(P − λI)  ⟺  (P − λI) x = 0

can be taken as a basis of the eigenspace (or null space) N(P − λI).
Bases for the eigenspaces are:

    N(P − I) = span{ [1, 2, −2]^t };  and
    N(P + 3I) = span{ [1, 1, 0]^t, [1, 0, 1]^t }.

It is easy to check that these three eigenvectors x_i form a linearly
independent set, so P is diagonalizable. The nonsingular matrix (also
called the similarity transformation matrix)

    H = (x_1 | x_2 | x_3) = [  1  1  1 ]
                            [  2  1  0 ]
                            [ -2  0  1 ]

will diagonalize P, and since P = H D H^{-1} we have

    H^{-1} P H = D = diag(λ_1, λ_2, λ_2) = diag(1, −3, −3) = [ 1   0   0 ]
                                                             [ 0  -3   0 ]
                                                             [ 0   0  -3 ]

Here,

    H^{-1} = [  1  -1  -1 ]
             [ -2   3   2 ]
             [  2  -2  -1 ]

implies that
y_1^t = [1, −1, −1],  y_2^t = [−2, 3, 2],  y_3^t = [2, −2, −1]. Therefore, the
constituent matrices are

    E_1 = x_1 y_1^t = [  1  -1  -1 ]
                      [  2  -2  -2 ]
                      [ -2   2   2 ]

    E_2 = x_2 y_2^t = [ -2   3   2 ]
                      [ -2   3   2 ]
                      [  0   0   0 ]

    E_3 = x_3 y_3^t = [  2  -2  -1 ]
                      [  0   0   0 ]
                      [  2  -2  -1 ]

Obviously,

    P = λ_1 E_1 + λ_2 E_2 + λ_3 E_3 = E_1 − 3 E_2 − 3 E_3 = [  1  -4  -4 ]
                                                            [  8 -11  -8 ]
                                                            [ -8   8   5 ]
2.7 Markov Chains with Absorbing States

2.7.1 Theory

Two questions:

/ if there are at least two absorbing states, what is the probability
that a specific absorbing state is the one eventually entered?
/ what is the mean time until an absorbing state is eventually
entered?
Question 1. The probability that a specific absorbing state is the one
eventually entered.

Theorem 21. Consider a Markov chain X(n) = {X_n, n ≥ 0} with finite
state space E = {1, 2, ..., N} and transition probability matrix P. Let
A = {1, ..., m} be the set of absorbing states and B = {m + 1, ..., N}
be the set of nonabsorbing states.
Then the transition probability matrix P can be expressed as

    P = [ I  O ]
        [ R  T ]

where

- I is the m × m identity matrix,
- O is an m × (N − m) zero matrix,
- the elements of R are the one-step transition probabilities from
  nonabsorbing to absorbing states, and
- the elements of T are the one-step transition probabilities among
  the nonabsorbing states.

Let U = [u_{k,j}] be the (N − m) × m matrix whose elements are the
absorption probabilities for the various absorbing states,

    u_{k,j} = lim_{n→∞} P[X_n = j (∈ A) | X_0 = k (∈ B)].

We have

    U = (I − T)^{-1} R = Φ R,  where Φ = (I − T)^{-1}

is called the fundamental matrix of the Markov chain X(n).
Hint: Form the power

    P^n = [ I    O   ]
          [ U_n  T^n ]

where U_n = f(R, T) is a matrix expression in R and T. Then to prove that

    lim_{n→∞} T^n = O
we could equivalently check that absorption of X(n) in one or another
of the absorbing states is certain. Formally, you could prove

Lemma 22.

    lim_{n→∞} P[X_n ∈ B] = 0,  or  lim_{n→∞} P[X_n ∈ A] = 1.

Question 2. The mean time until absorption. Starting from a
nonabsorbing state k ∈ B, the expected number of steps until absorption is

    Σ_{i=m+1}^{N} Φ[k, i],

where Φ[k, i] is the (k, i)-th element of the fundamental matrix Φ.
Proof. Let W = [n_{j,k}], where n_{j,k} is the number of times the state
k (∈ B) is occupied until absorption takes place when X(n) starts in
state j (∈ B). Then the time to absorption from j is

    T_j = Σ_{k=m+1}^{N} n_{j,k};

then calculate E(n_{j,k}).
Example 2.6. Consider a simple random walk X(n) with absorbing
barriers at state 0 and state N = 3 = m_A + m_B, as in the Gambler's Ruin
problem, where m_A = 2 USD is A's capital and m_B = 1 USD is B's capital
at round 0. Can you write out

a/ the transition probability matrix P, knowing that p = P[A wins] in
each round, where 0 < p < 1;
b/ the probabilities of absorption into states 0 and 3;
c/ the expected time (or number of steps) to absorption when X_0 = 1
and when X_0 = 2.

Hint: the fundamental matrix of the Markov chain X(n) is

    Φ = (I − T)^{-1} = 1/(1 − pq) [ 1  p ]
                                  [ q  1 ]

where q = 1 − p. The matrix of absorption probabilities from the various
nonabsorbing states is

    U = [ u_{1,0}  u_{1,3} ] = Φ R = 1/(1 − pq) [ q    p^2 ]
        [ u_{2,0}  u_{2,3} ]                    [ q^2  p   ]
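A numerical sketch of the hint, assuming numpy; the value p = 0.5 is purely illustrative:

```python
import numpy as np

p = 0.5            # illustrative value of P[A wins a round]
q = 1 - p

# One-step blocks for the nonabsorbing states {1, 2} (absorbing: 0 and 3).
T = np.array([[0, p],
              [q, 0]])              # transitions among nonabsorbing states
R = np.array([[q, 0],
              [0, p]])              # columns: absorption into state 0, state 3

Phi = np.linalg.inv(np.eye(2) - T)  # fundamental matrix (I - T)^{-1}
U = Phi @ R                         # absorption probabilities u_{k,j}

# Closed form from the hint: Phi = 1/(1 - pq) [[1, p], [q, 1]].
assert np.allclose(Phi, np.array([[1, p], [q, 1]]) / (1 - p*q))
assert np.allclose(U.sum(axis=1), 1)   # absorption is certain
print(U)
```

With p = 0.5 this gives U = [[2/3, 1/3], [1/3, 2/3]]: starting with one dollar, a fair gambler is ruined with probability 2/3.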
Example 2.7 (Alignment of two DNA words).
2.8 Chapter Review and Discussion

Application in Large Deviation theory. We are interested in a
practical situation in the insurance industry, first treated in 1932
by F. Esscher (see Notices of the AMS, Feb 2008).

Problem: too many claims could be made against the insurance company;
we worry about the total claim amount exceeding the reserve fund
set aside for paying these claims.

Our aim: to compute the probability of this event.

Modeling. Each individual claim is a random variable, we assume
some distribution for it, and the total claim is then the sum S of a large
number of (independent or not) random variables. The probability that
this sum exceeds a certain reserve amount is the tail probability of the
sum S.

Large Deviation theory as initiated by Esscher requires the calculation of
moment generating functions. If your random variables are independent,
then the moment generating function of the sum is the product of the
individual ones; but if they are not (as in a Markov chain), then there is
no longer just one moment generating function!

Research project: study Large Deviation theory to solve this problem.
Let Z_1, Z_2, ... be independent identically distributed random variables,
and define

    X_n = Σ_{i=1}^{n} Z_i,  n = 1, 2, ...,  and X_0 = 0.
The collection of r.v.s {X_n, n ≥ 0} is a random process, and it is called
the simple random walk in one dimension.

(a) Describe the simple random walk X(n).
(b) Construct a typical sample sequence (or realization) of X(n).
(c) Find the probability that X(n) = 2 after four steps.
(d) Verify the result of part (c) by enumerating all possible sample
sequences that lead to the value X(n) = 2 after four steps.
(e) Find the mean and variance of the simple random walk X(n). Find
the autocorrelation function R_X(n, m) of the simple random walk X(n).
(f) Show that the simple random walk X(n) is a Markov chain.
(g) Find its one-step transition probabilities.
(h) Derive the first-order probability distribution of the random walk
X(n).
Solution.
(a) Describe the simple random walk. X(n) is a discrete-parameter (or
discrete-time), discrete-state random process. The state space is
E = {..., −2, −1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.
(b) Typical sample sequence. A sample sequence x(n) of a simple random
walk X(n) can be produced by tossing a coin every second and letting
x(n) increase by unity if a head H appears and decrease by unity if a
tail T appears. Thus, for instance, we have a small realization of X(n)
in Table 3.1.
The sample sequence x(n) obtained above is plotted in the (n, x(n))-plane.
The simple random walk X(n) specified in this problem is said to be
unrestricted because there are no bounds on the possible values of X.
The simple random walk process is often used in Game Theory or
Biomatics.
    n            | 0 | 1 | 2 |  3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
    Coin tossing |   | H | T |  T | H | H | H | T | H | H | T
    x_n          | 0 | 1 | 0 | -1 | 0 | 1 | 2 | 1 | 2 | 3 | 2

Table 3.1: Simple random walk from coin tossing
Remark 3.1. We define the ladder points to be the points in the walk
lower than any previously reached point. An excursion in a walk is the
part of the walk from a ladder point to the highest point attained before
the next ladder point.
BLAST theory focuses on the maximum heights achieved by these
excursions.
(c) The probability that X(n) = 2 after four steps.
We compute the first-order probability distribution of the random walk
X(n):

    p_n(k) = P(X_n = k),  with boundary conditions p_0(0) = 1, and
    p_n(k) = 0 if n < |k|.

Thus n ≥ |k|. We find that

    p_n(k) = C(n, (n + k)/2) p^{(n+k)/2} q^{(n−k)/2},  where q = 1 − p;   (3.2.1)

here C(n, r) denotes the binomial coefficient. This follows by letting A, B
be the r.v.s counting the numbers of +1 and −1 steps, so that

    A + B = n,  A − B = X_n.
When X(n) = k, we see that

    A = (n + k)/2,

which is a binomial r.v. with parameters (n, p).
Conclude: the probability distribution of X(n) is given by (3.2.1), in which
n ≥ |k|, and n, k must be both even or both odd.
Set k = 2 and n = 4 in (3.2.1) to get the desired probability
p_4(2) = C(4, 3) p^3 q = 4p^3 q that X(4) = 2.
(d) Verify the result of part (c) by enumerating all possible sample
sequences that lead to the value X(n) = 2 after four steps. DIY!
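The enumeration in part (d) is small enough to brute-force; a sketch (the choice p = 0.5 is illustrative):

```python
from itertools import product
from math import comb

p = 0.5
q = 1 - p

# Formula (3.2.1): p_n(k) = C(n, (n+k)/2) p^{(n+k)/2} q^{(n-k)/2}.
n, k = 4, 2
formula = comb(n, (n + k)//2) * p**((n + k)//2) * q**((n - k)//2)

# Brute-force enumeration of all 2^4 step sequences of +1/-1.
total = 0.0
for steps in product([+1, -1], repeat=n):
    if sum(steps) == k:
        ups = steps.count(+1)
        total += p**ups * q**(n - ups)

print(formula, total)   # both 0.25 when p = 0.5
```

Exactly C(4, 3) = 4 of the 16 sequences end at 2 (three up-steps, one down-step), in agreement with the formula.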
(e) The mean and variance of the simple random walk X(n). Use the
facts

    P(Z_n = +1) = p  and  P(Z_n = −1) = 1 − p.
4.1 Introduction

In Stochastic Processes, we are interested in a few distinct properties:

(a) the dependencies in the sequence of values generated by the
process. For example, how do future prices of a stock depend on
past values?

(b) long-term averages, involving the entire sequence of generated
values. For example, what is the fraction of time that a machine
is idle?

(c) the likelihood or frequency of certain boundary events. For
example, what is the probability that within a given hour all
circuits of some telephone system become simultaneously busy?
In this chapter, we will discuss the first major category of stochastic
processes, Arrival-Type Processes. We are interested in occurrences that
have the character of an arrival, such as
- message receptions at a receiver,
- job completions in a manufacturing cell,
- customer purchases at a store, etc.
We will focus on models in which the interarrival times (the times
between successive arrivals) are independent random variables.
First, we consider the case where arrivals occur in discrete time
and the interarrival times are geometrically distributed: this is the
Bernoulli process.
Then we consider the case where arrivals occur in continuous time
and the interarrival times are exponentially distributed: this is the
Poisson process.
4.2 The Bernoulli process

4.2.1 Basic facts

The Bernoulli process can be visualized as a sequence of independent
coin tosses, where the probability of heads in each toss is a fixed number
p in the range 0 < p < 1. In general, the Bernoulli process consists of
a sequence of Bernoulli trials, where each trial produces
- a 1 (a success) with probability p, and
- a 0 (a failure) with probability 1 − p, independently of what happens
in other trials.
There are many realizations of the Bernoulli process. Coin tossing is just
a paradigm involving a sequence of independent binary outcomes. The
sequence Z_1, Z_2, ... of independent identically distributed r.v.s in
Chapter 3 is another paradigm for the same phenomenon.
In practice, a Bernoulli process is often used to model systems involving
arrivals of customers or jobs at service centers. Here, time is discretized
into periods, and a success at the k-th trial is associated with the arrival
of at least one customer at the service center during the k-th period. In
fact, we will often use the term arrival in place of success when this is
justified by the context.
Given an arrival process, one is often interested in random variables such
as the number of arrivals within a certain time period, or the time until
the first arrival. For the case of a Bernoulli process, some answers are
already available from earlier chapters. Here is a summary of the main
facts.
The Bernoulli distribution B(p) describes a random variable that can
take only two possible values, i.e. X ∈ {0, 1}. The distribution is
described by a probability function

    p(1) = P(X = 1) = p,  p(0) = P(X = 0) = 1 − p,  for some p ∈ [0, 1].

It is easy to check that E(X) = p, Var(X) = p(1 − p).
4.2.2 Random Variables Associated with the Bernoulli Process

Binomial distribution B(n, p). This distribution describes a random
variable X that is the number of successes in n independent Bernoulli
trials with probability of success p.

4.3 The Poisson Process

4.3.1 Poisson distribution

The Poisson distribution with parameter λ is described by the probability
function

    p(x) = e^{−λ} λ^x / x!,   x = 0, 1, 2, ...   (4.3.1)

where
- x = designated number of successes,
- e ≈ 2.71 is the natural base, and
- λ > 0 is a constant, the average number of successes per unit of time
  period.

The Poisson distribution's mean and variance are

    μ = λ;  σ^2 = λ.
Example 4.1 (Poisson distribution usage). We often model the number
of defects or non-conformities that occur in a unit of product (unit area,
volume, and most frequently unit of time), say, a semiconductor device,
by a Poisson distribution. The number of wire-bonding defects per unit,
X, is Poisson distributed with parameter λ = 4. Compute the probability
that a randomly selected semiconductor device will contain two or fewer
wire-bonding defects.
This probability is

    P(X ≤ 2) = p(0) + p(1) + p(2) = Σ_{x=0}^{2} e^{−4} 4^x / x! = 0.2381.
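The figure 0.2381 in Example 4.1 can be reproduced in a couple of lines:

```python
from math import exp, factorial

lam = 4.0
# P(X <= 2) for X ~ Poisson(4), as in Example 4.1.
prob = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))
print(round(prob, 4))   # 0.2381
```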
4.3.2 Poisson process

The Poisson process can be viewed as a continuous-time analog of the
Bernoulli process and applies to situations where there is no natural way
of dividing time into discrete periods. We consider an arrival process
that evolves in continuous time, in the sense that any real number t is a
possible arrival time.

Definition 24. A counting process X(t) is said to be a Poisson (counting)
process with positive rate (or intensity) λ if
- X(0) = 0, and X(t) has independent increments;
- the number of events in any interval of length t is Poisson distributed
  with mean λt; that is, for all s, t > 0,

    P[X(t + s) − X(s) = n] = e^{−λt} (λt)^n / n!,   n = 0, 1, 2, ...   (4.3.2)
4.4 Course Review and Discussion

Practical Problem 6.

1. Prove that a Poisson process X(t) with positive rate λ has stationary
increments, and

    E[X(t)] = λt,  Var[X(t)] = λt.

2. Practice. Patients arrive at the doctor's office according to a Poisson
process with rate λ = 1/10 per minute. The doctor will not see a
patient until at least three patients are in the waiting room.

a/ Find the expected waiting time until the first patient is admitted
to see the doctor.
b/ What is the probability that nobody is admitted to see the doctor
in the first hour?
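One way to sanity-check answers to Practical Problem 6.2, assuming the standard facts that the third arrival time of a Poisson process is Erlang(3, λ) with mean 3/λ, and that N(t) ~ Poisson(λt):

```python
from math import exp, factorial

lam = 1/10   # arrival rate: one patient per 10 minutes on average

# a/ The doctor admits once 3 patients have arrived; the third arrival
#    time is Erlang(3, lam) with mean 3/lam minutes.
mean_wait = 3 / lam

# b/ Nobody is admitted in the first hour iff fewer than 3 patients
#    arrive in 60 minutes, and N(60) ~ Poisson(lam * 60) = Poisson(6).
mu = lam * 60
p_nobody = sum(exp(-mu) * mu**k / factorial(k) for k in range(3))

print(mean_wait)             # about 30 minutes
print(round(p_nobody, 4))    # 0.062
```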
Theorem 25. If every eigenvalue of a matrix P yields linearly
independent left eigenvectors in number equal to its multiplicity, then

1. there exists a nonsingular matrix M whose rows are left eigenvectors
of P, such that
2. D = M P M^{-1} is a diagonal matrix whose diagonal elements are the
eigenvalues of P, repeated according to multiplicity.
Practical Problem 7 (MC for Business Intelligence). Consider a case
study of the mobile phone industry in VN. According to a recent survey,
there are four big mobile producers/sellers N, S, M and L, and their
market switching behavior in 2007 is given by the stochastic matrix
(rows and columns ordered N, M, L, S):

    P = N [ 1    0   0    0   ]
        M [ 0.4  0   0.6  0   ]
        L [ 0.2  0   0.1  0.7 ]
        S [ 0    0   0    1   ]

- Is P regular? ergodic?
- Find the long-term distribution matrix L = lim_{m→∞} P^m.
- What is your conclusion?

(Remark that the states N and S are called absorbing states.)
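A numerical sketch of the long-term matrix in Practical Problem 7, assuming numpy; the power 200 is an arbitrary large exponent standing in for the limit:

```python
import numpy as np

# Transition matrix of Practical Problem 7 (rows/columns ordered N, M, L, S).
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.2, 0.0, 0.1, 0.7],
              [0.0, 0.0, 0.0, 1.0]])

# P is not regular: every power keeps zero entries, since N and S are absorbing.
L_inf = np.linalg.matrix_power(P, 200)   # numerical stand-in for lim P^m
print(np.round(L_inf, 4))
```

The rows of the limit give the absorption probabilities: starting from M the chain ends in N with probability 8/15 and in S with probability 7/15; starting from L, with probabilities 2/9 and 7/9.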
Chapter 5
Probability Modeling and
Mathematical Finance
Probability modeling in finance provides instruments to rationalize the
unknown by embedding it into a coherent framework. Three key
components should be distinguished: randomness, uncertainty and chaos.
Kolmogorov defined randomness in terms of non-uniqueness and
non-regularity (as a die with six faces or the expansion of π). Kalman
defined chaos as randomness without probability.
A few areas that employ much probability modeling include: weather
forecasting, biology and financial forecasting. In general, in order to model
uncertainty we seek to distinguish the known from the unknown and find
some mechanisms (theories, intuition, common sense...) to reconcile our
knowledge with our lack of it.
5.1 Martingales

5.1.1 History

Girolamo Cardano, in his book The Book on Games of Chance (1565),
proposed the notion of a fair game. He stated: "The most fundamental
principle of all in gambling is simply equal conditions, ...". This is the
essence of the martingale; however, it was not until 1900, in Bachelier's
thesis, that a mathematical model of a fair game, or martingale, was
proposed.
Nowadays, we understand the concept of a fair game, or martingale, in
money terms: the expected profit at a given time, given the total past
capital, is null with probability one.
Throughout this chapter we assume that (Ω, F, P) is a fixed probability
space, where
- Ω is a sample space representing the set of all possible outcomes,
- F is a σ-algebra of subsets of Ω representing the events to which
  we can assign probabilities, and
- P is a probability measure on (Ω, F).
The expectation with respect to P will be denoted by E[·].
5.1.2 Conditional expectation

Let X and Z be two r.v.s on the same (Ω, F, P)-space. Suppose X has
range {x_1, x_2, ..., x_m} and Z has range {z_1, z_2, ..., z_n}. We know that

    P[X = x_i | Z = z_j] := P[X = x_i, Z = z_j] / P[Z = z_j]

and also

    E[X | Z = z_j] = Σ_i x_i P[X = x_i | Z = z_j].

Definition 26. The random variable Y = E[X|Z], the conditional
expectation of X given Z, is defined as follows:

(a) if Z(ω) = z_j, then Y(ω) := E[X | Z = z_j] =: y_j (say).

Justification. In this way we partition the sample space Ω into the
Z-atoms {Z = z_j}, on each of which Z is constant. The σ-algebra
G = σ(Z) generated by Z consists of the sets {Z ∈ B}, B ∈ B, the Borel
sets. Therefore G = σ(Z) consists precisely of the 2^n possible unions of
the n Z-atoms.
Note from (a) that Y is constant on Z-atoms, so better we say:

(b) Y is G-measurable.
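A tiny discrete illustration of Definition 26: on a uniform six-point sample space (the numbers below are made up), Y = E[X|Z] is obtained by averaging X over each Z-atom.

```python
from collections import defaultdict

# A toy finite sample space with uniform probability; each outcome w
# is recorded as the pair (X(w), Z(w)).
outcomes = [(1, 0), (3, 0), (2, 1), (6, 1), (4, 1), (5, 2)]

# Partition the outcomes into Z-atoms {Z = z_j} and average X on each atom;
# under the uniform measure this is exactly E[X | Z = z_j].
atoms = defaultdict(list)
for x, z in outcomes:
    atoms[z].append(x)
Y = {z: sum(xs) / len(xs) for z, xs in atoms.items()}
print(Y)   # {0: 2.0, 1: 4.0, 2: 5.0}

# Property c) of the theorem below, with G the whole space: E[Y] = E[X].
EX = sum(x for x, _ in outcomes) / len(outcomes)
EY = sum(Y[z] for _, z in outcomes) / len(outcomes)
assert EX == EY == 3.5
```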
Theorem 27 (Kolmogorov 1933). Let (Ω, F, P) be a probability space
and X a random variable with E[|X|] < ∞. Let G be a sub-σ-algebra of
F. Then there exists a random variable Y such that
a) Y is G-measurable,
b) E[|Y|] < ∞,
c) for every G ∈ G we have

    ∫_G Y dP = ∫_G X dP.

Moreover, if Y_1 is another random variable with these properties, then
Y_1 = Y almost surely (a.s.), that is, P[Y_1 = Y] = 1.
A random variable Y with properties a)-c) is called a version of the
conditional expectation E[X|G] of X given G, and we write Y = E[X|G]
a.s.
Proof. Since G is generated by Z, any G ∈ G is a union of the n
Z-atoms, so we first prove that

    ∫_{Z=z_j} Y dP = y_j P[Z = z_j] = ... = ∫_{Z=z_j} X dP.

Writing G_j = {Z = z_j}, this equation means E[Y · I_{G_j}] = E[X · I_{G_j}]...

Note 5.1. We often write
- E[X|Z] for E[X|G] = E[X|σ(Z)]; and
- E[X|Z_1, Z_2, ...] for E[X|σ(Z_1, Z_2, ...)].

Fact 5.2. If U is a non-negative bounded r.v., then

    E[U|G] ≥ 0, a.s.

5.1.3 Key properties of Conditional expectation

See textbook.
5.1.4 Filtration

A filtration is a family {F_t, t = 0, 1, ..., T} of sub-σ-algebras indexed by
t = 0, 1, ..., T such that

    F_0 ⊆ F_1 ⊆ F_2 ⊆ ... ⊆ F_T;

that is, the family is increasing with time. Intuitively, for each t =
0, 1, ..., T, the σ-algebra F_t tells us which events may be observed by
time t.
If the sample space Ω is a finite set, often the σ-algebra F_0 is trivial,
consisting simply of the empty set ∅ and the whole sample space Ω. We
also often write just {F_t} instead of the lengthy {F_t, t = 0, 1, ..., T},
and can assume that F_T = F (since we shall be considering only random
variables that are F_T-measurable).

Definition 28. We call the quadruple (Ω, F, {F_t}, P) a filtered
probability space.
In contrast to an (arithmetic) random walk X_t = Σ_{i=0}^{t} Z_i with
i.i.d. absolute increments Z_t = X_t − X_{t−1}, a geometric random walk
{X_t; t ≥ 0} is assumed to have i.i.d. relative increments

    R_t = X_t / X_{t−1}  for t = 1, 2, ...

For a specific case, the geometric binomial random walk is

    X_t = R_t X_{t−1} = X_0 Π_{k=1}^{t} R_k,

where X_0, R_1, R_2, ... are mutually independent, each R_k is Bernoulli-type,
and for u > 1 (up), d < 1 (down):

    P(R_k = u) = p,  P(R_k = d) = 1 − p,  0 < p < 1.
We obtain the expectation E(R_k) = (u − d)p + d for any k = 1, 2, ...
If E(R_k) = 1, then the process is on average stable, which is the case for

    p = (1 − d) / (u − d).

Example 5.3. Let the stock price S_t be defined in terms of a Bernoulli
event. That means the stock prices can only either increase/grow or
decrease/fall from period to period, following a stable geometric binomial
random walk.
Hence S_t changes at rates u > 1 (up) and d < 1 (down) with probabilities

    S_t = { u S_{t−1}  with probability p = (1 − d)/(u − d)
          { d S_{t−1}  with probability 1 − p = (u − 1)/(u − d)     (5.1.1)
Proposition 29. The stock price process S_t is a martingale.

Proof. We first have to show that it satisfies condition 1., namely
E[|S_t|] < ∞. Indeed, due to the independence assumption,

    E[S_t] = E[S_0] (E[R_1])^t = E[S_0] [(u − d)p + d]^t < ∞.

Next, it is a constant-mean process with

    E[S_{t+1} | S_t, S_{t−1}, ..., S_0] = E[S_{t+1} | S_t] = S_t

since

    E[S_{t+1} | S_t] = u S_t p + d S_t (1 − p) = S_t [u p + d(1 − p)] = S_t.
Example 5.4 (Product of non-negative independent r.v.s of mean 1).
Let X_1, X_2, ... be a sequence of independent non-negative r.v.s with
E[X_n] = 1 for all n.
Define M_0 = 1, F_0 = {∅, Ω} and

    M_n := X_1 X_2 X_3 ... X_n,  F_n := σ(X_1, X_2, X_3, ..., X_n).

The process M is a martingale. (Why?)
5.1.7 Stopping time

Definition 30. A (discrete) stopping time is a function
τ : Ω → {0, 1, ..., T} ∪ {∞} such that

    {τ = t} ∈ F_t  for t = 0, 1, ..., T.     (*)

Obviously for such a stopping time we see:

    {τ = ∞} = Ω \ (∪_{t=0}^{T} {τ = t}) ∈ F_T.

For convenience we define F_∞ = F_T, and then (*) also holds with
t = ∞.

Justification. Intuitively, τ is a time when you can decide to stop
playing our game. Whether or not you stop immediately after the n-th
game depends only on the history up to (and including) time n:

    {τ = n} = {ω : τ(ω) = n} ∈ F_n.

Fact 5.4. With any (discrete) stopping time τ, there is a σ-algebra
defined by

    F_τ = {A ∈ F_∞ : A ∩ {τ = t} ∈ F_t for t = 0, 1, ..., T}.

Lemma 31. If σ and τ are two stopping times, then

    σ ∧ τ = min(σ, τ)  and  σ ∨ τ = max(σ, τ)

are both also stopping times.
5.2 Stochastic Calculus

Our basic assumption is: we do not know and cannot predict tomorrow's
values of asset prices. The past history of the asset value is there as a
financial time series for us to examine as much as we want, but we cannot
use it to forecast the next move that the asset will make. This does
not mean that it tells us nothing.
We know what their mean and variance are and, generally, what the
likely distribution of future asset prices is. These quantities must be
determined by a statistical analysis of historical data.

5.2.1 A Simple Model for Asset Prices

Now suppose that at time t the asset price is S. Let us consider a small
subsequent time interval dt, during which S changes to S + dS.
Drift μ is a measure of the average rate of growth of the asset price.
Volatility σ measures the standard deviation of the returns. It is
represented by a random sample drawn from a normal distribution with
mean zero and adds a term σ dX to the corresponding return on the
asset, dS/S:

    dS/S = μ dt + σ dX.

5.2.2 Stochastic differential equation

Stochastic Integral. In order to introduce a stochastic process as a
solution of a stochastic differential equation, we introduce the concept of
the Itô integral: a stochastic integral with respect to a Wiener process.
Formally the construction of the Itô integral is similar to the Stieltjes
integral. However, instead of integrating with respect to a deterministic
function (Stieltjes integral), the Itô integral integrates with respect to a
random function, more precisely, the path of a Wiener process. Since
the integrand itself can be random, i.e. it can be a path of a stochastic
process, one has to analyze the mutual dependencies of the integrand and
the Wiener process.
Chapter 6

Part III: Practical Applications of SP

First we discuss Statistical Parameter Estimation and its role in
industry, engineering and services, through specific models.

6.1 Statistical Parameter Estimation

Practical Problem 8 (The Brand switching model of Problem 5 of
Section ??).
Our question now is: how to find the transition matrix P from sample
surveys? We define [due to Whitaker (1978)]
- brand loyalty as the proportion of consumers who repurchase a
  brand on the next occasion without persuasion,
- and purchasing pressure as the proportion of consumers who are
  persuaded to purchase a brand on the next occasion.
Denote by w_i and d_i, respectively, the values of brand loyalty and
purchasing pressure for brand i, where both w_i and d_i are between 0 and 1,
and Σ_i d_i = 1. To illustrate, consider the following three-brand case:

                               Brand 1   Brand 2   Brand 3
    Brand loyalty w_i            0.3       0.6       0.9
    Purchasing pressure d_i      0.3       0.5       0.2

Could we give a formula to compute the brand switching probabilities
p_{i,j} (i.e. transition probabilities) in terms of w_i and d_i?
Practical Problem 9 (A model of social mobility).
A problem in the study of social structure is about the transitions
between the social status of successive generations in a family. Sociologists
often assume that the social class of a son depends only on his parents'
social class, but not on his grandparents'.
Glass (1954, UK) identified three social classes:
- upper class (executive, managerial, high administrative, professional),
- middle class (high-grade supervisor, non-manual, skilled manual), and
- lower class (semi-skilled or unskilled).
Each family in the society occupies one of the three social classes, and its
occupation evolves across different generations. Glass analyzed a random
sample of 3500 male residents in the UK and estimated that the transitions
between the social classes of successive generations in a family were as
follows:

                          Following generation
    Current generation    Upper class   Middle class   Lower class
    Upper class               0.45          0.48          0.07
    Middle class              0.05          0.70          0.25
    Lower class               0.01          0.50          0.49

- Model this social mobility by a DTMC; prove that it is irreducible
  and ergodic.
- What is the distribution of a family's occupation in the long run?
- How many generations are necessary for a lower-class family to
  become an upper-class family?
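The long-run distribution asked for in Practical Problem 9 can be approximated numerically (a sketch assuming numpy; 200 is an arbitrary large power standing in for the limit):

```python
import numpy as np

# Glass's social mobility matrix (rows: current class; columns: next
# generation), ordered upper, middle, lower.
P = np.array([[0.45, 0.48, 0.07],
              [0.05, 0.70, 0.25],
              [0.01, 0.50, 0.49]])

# All entries are positive, so the chain is irreducible and ergodic, and
# every row of P^m converges to the stationary distribution pi.
pi = np.linalg.matrix_power(P, 200)[0]
print(np.round(pi, 4))

assert np.allclose(pi @ P, pi)       # pi P = pi (stationarity)
assert abs(pi.sum() - 1.0) < 1e-9
```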
6.2 Inventory Control in Logistics

Current need for Logistics Optimization. Inventory management
is one of the oldest and most studied areas of Operations Research; in
particular, it is useful for the logistics industry in developed nations with
fewer natural resources. Logistics plays a key role in import and export
activities, as witnessed by the cargo stagnancy in the ports of HCMC and
Southern Vietnam recently.

Steps of the investigation

A. Pre-graduate research period
a/ Learn the basic concepts of Economic Order Quantity (EOQ) and
Economic Production Quantity (EPQ) for the problem. Take
the course Stochastic Processes at the AM section, HCMUT.
b/ Algebraically model the simple case: predict uncertain
demand in a single period. At the same time, visit a manufacturing
or sales firm in HCMC, for an industrial internship, to see
how inventory matters in those sectors, in 2-4 weeks.
c/ Design an application package that allows one to maneuver
inventory events. Study a computing system such as Singular,
Matlab, R or OpenModelica.

B. Graduate research period
d/ Investigate more complex cases in Inventory Control.
e/ Learn appropriate algorithms and implement them.
f/ Implement a pilot application package with basic functionality
allowing prediction of small cases.
6.3 Epidemic processes