
Stochastic processes and Markov chains (part I)

Wessel van Wieringen
w.n.van.wieringen@vu.nl

Department of Epidemiology and Biostatistics, VUmc
& Department of Mathematics, VU University
Amsterdam, The Netherlands
Stochastic processes

Example 1
• The intensity of the sun.
• Measured every day by the KNMI.
• Stochastic variable Xt represents the sun’s intensity at
  day t, 0 ≤ t ≤ T. Hence, Xt assumes values in R+
  (positive values only).
Stochastic processes

Example 2
• DNA sequence of 11 bases long.
• At each base position there is an A, C, G or T.
• Stochastic variable Xi is the base at position i, i = 1,…,11.
• In case the sequence has been observed, say:
  (x1, x2, …, x11) = ACCCGATAGCT,
  then A is the realization of X1, C that of X2, et cetera.
Stochastic processes
[Figure: at each position t, t+1, t+2, t+3 the base takes one of the values A, C, G, T; the realized sequence shown is A, A, T, C.]
Stochastic processes

Example 3
• A patient’s heart pulse during surgery.
• Measured continuously during interval [0, T].
• Stochastic variable Xt represents the occurrence of
  a heartbeat at time t, 0 ≤ t ≤ T. Hence, Xt assumes
  only the values 0 (no heartbeat) and 1 (heartbeat).
Stochastic processes

Example 4
• Brain activity of a human under experimental conditions.
• Measured continuously during interval [0, T].
• Stochastic variable Xt represents the magnetic field at
  time t, 0 ≤ t ≤ T. Hence, Xt assumes values on R.
Stochastic processes
Differences between examples

                  Xt discrete    Xt continuous
Time discrete     Example 2      Example 1
Time continuous   Example 3      Example 4
Stochastic processes

The state space S is the collection of all possible values
that the random variables of the stochastic process may
assume.

If S = {E1, E2, …, Es} is discrete, then Xt is a discrete
stochastic variable.
→ examples 2 and 3.

If S = [0, ∞) is continuous, then Xt is a continuous
stochastic variable.
→ examples 1 and 4.
Stochastic processes
Stochastic process:
• Discrete time: t = 0, 1, 2, 3, ….
• S = {-3, -2, -1, 0, 1, 2, …}
• P(one step up) = ½ = P(one step down)
• Absorbing state at -3

[Figure: a realization of this random walk over time steps 1 to 18, starting in state 0.]
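A minimal R sketch of this random walk (one realization over 18 time steps; the start in state 0 matches the figure):

> # simulate a random walk on {-3, -2, -1, 0, 1, ...} with absorbing state -3
> set.seed(1)
> x <- 0                                       # start in state 0 (as in the figure)
> for (t in 1:18) {
+   if (x[t] == -3) x[t+1] <- -3               # once absorbed, stay in -3
+   else x[t+1] <- x[t] + sample(c(-1, 1), 1)  # one step up or down, prob. 1/2 each
+ }
> x                                            # the realized path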
Stochastic processes
The first passage time of a certain state Ei in S is the
time t at which Xt = Ei for the first time since the start
of the process.

[Figure: the random walk realization. What is the first passage time of 2? And of -1?]
Stochastic processes
An absorbing state is a state Ei for which the
following holds: if Xt = Ei, then Xs = Ei for all s ≥ t.
The process will never leave state Ei.

[Figure: the random walk realization. The absorbing state is -3.]
Stochastic processes
The time of absorption of an absorbing state is the
first passage time of that state.

[Figure: the random walk realization. The time of absorption is 17.]
Stochastic processes
The recurrence time is the first time t at which the
process has returned to its initial state.

[Figure: the random walk realization. The recurrence time is 14.]
Stochastic processes

A stochastic process is described by a collection of
time points, the state space and the simultaneous
distribution of the variables Xt, i.e., the distributions of
all Xt and their dependency.

There are two important types of processes:
• Poisson process: all variables are identically and
  independently distributed. Examples: queues for
  counters, call centers, servers, et cetera.
• Markov process: the variables are dependent in a
  simple manner.
Markov processes
A 1st order Markov process in discrete time is a sto-
chastic process {Xt}t=1,2,… for which the following holds:
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt).
In other words, only the present determines the future;
the past is irrelevant.

[Figure: the random walk realization. From every state, with probability 0.5, one step up or down (except for state -3).]
Markov processes
The 1st order Markov property, i.e.
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt),
does not imply independence between Xt-1 and Xt+1.
In fact, often P(Xt+1=xt+1 | Xt-1=xt-1) is unequal to zero.

For instance,
P(Xt+1=2 | Xt-1=0) = P(Xt+1=2 | Xt=1) * P(Xt=1 | Xt-1=0) = ¼.

[Figure: the random walk realization.]
Markov processes
An m-order Markov process in discrete time is a stochastic
process {Xt}t=1,2,… for which the following holds:
P(Xt+1=xt+1 | Xt=xt, …, X1=x1)
= P(Xt+1=xt+1 | Xt=xt, …, Xt-m+1=xt-m+1).

Loosely, the future depends on the most recent past.

Suppose m=9: the distribution of the (t+1)-th base
depends on the 9 preceding ones.

… CTTCCTCAAGATGCGTCCAATCCCTCAAAC …
Markov processes
A Markov process is called a Markov chain if the state
space is discrete, i.e., is finite or countable.

In this lecture series we consider Markov chains in
discrete time.

Recall the DNA example.
Markov processes
The transition probabilities are the
P(Xt+1=xt+1 | Xt=xt),
but also the
P(Xt+1=xt+1 | Xs=xs) for s < t,
where xt+1, xt in {E1, …, Es} = S.

For example, in the random walk:
P(Xt+1=3 | Xt=2) = 0.5
P(Xt+1=0 | Xt=1) = 0.5
P(Xt+1=5 | Xt=7) = 0

[Figure: the random walk realization.]
Markov processes
A Markov process is called time homogeneous if the
transition probabilities are independent of t:
P(Xt+1=x1 | Xt=x2) = P(Xs+1=x1 | Xs=x2).

For example:
P(X2=-1 | X1=2) = P(X6=-1 | X5=2)
P(X584=5 | X583=4) = P(X213=5 | X212=4)

[Figure: the random walk realization.]
Markov processes
Example of a time inhomogeneous process

[Figure: bathtub-shaped hazard curve h(t), with phases: infant mortality failures, random failures, wearout failures.]
Markov processes
Consider a DNA sequence of 11 bases. Then S = {a, c,
g, t}, Xi is the base at position i, and {Xi}i=1,…,11 is a
Markov chain if the base at position i only depends on
the base at position i-1, and not on those before i-1.
If this is plausible, a Markov chain is an acceptable
model for base ordering in DNA sequences.

[Figure: state diagram with states A, C, G and T.]
Markov processes
In the remainder, only time homogeneous Markov processes.
We then denote the transition probabilities of a finite time
homogeneous Markov chain in discrete time {Xt}t=1,2,… with
S = {E1, …, Es} as:
P(Xt+1=Ej | Xt=Ei) = pij (does not depend on t).

Putting the pij in a matrix yields the transition matrix
P = (pij)i,j=1,…,s. The rows of this matrix sum to one.


Markov processes
In the DNA example:
pAA = P(Xt+1 = A | Xt = A),
and:
pAA + pAC + pAG + pAT = 1
pCA + pCC + pCG + pCT = 1
et cetera.

[Figure: state diagram over A, C, G, T with the transition probabilities pAA, pAC, pAG, …, pTT on the arrows.]
Markov processes
The initial distribution π = (π1, …, πs)T gives the
probabilities of the initial state:
πi = P(X1 = Ei) for i = 1, …, s,
and π1 + … + πs = 1.

Together the initial distribution π and the transition
matrix P determine the probability distribution of the
process, the Markov chain {Xt}t=1,2,…
Markov processes

We now show how the couple (π, P) determines the
probability distribution (transition probabilities) of time
steps larger than one.

Hereto define for
n=2 :       pij^(2) = P(Xt+2=Ej | Xt=Ei),
general n : pij^(n) = P(Xt+n=Ej | Xt=Ei).
Markov processes
For n = 2 :

pij^(2) = P(Xt+2=Ej | Xt=Ei)
    (just the definition)

  = Σk P(Xt+2=Ej, Xt+1=Ek | Xt=Ei)
    (use the fact that P(A, B) + P(A, Bᶜ) = P(A))

  = Σk P(Xt+2=Ej | Xt+1=Ek, Xt=Ei) P(Xt+1=Ek | Xt=Ei)
    (use the definition of conditional probability:
     P(A, B) = P(A | B) P(B))

  = Σk P(Xt+2=Ej | Xt+1=Ek) P(Xt+1=Ek | Xt=Ei)
    (use the Markov property)

  = Σk pik pkj = (P^2)ij.
Markov processes

In summary, we have shown that:
pij^(2) = (P^2)ij.

In a similar fashion we can show that:
pij^(n) = (P^n)ij.

In words: the transition matrix for n steps is the one-step
transition matrix raised to the power n.
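For instance, a minimal R sketch (the 2×2 matrix below is hypothetical; it is merely chosen so that its two-step I-to-I probability reproduces the 0.6490 of the numerical example that follows):

> # hypothetical one-step transition matrix over states I and II
> P <- matrix(c(0.70, 0.30,
+               0.53, 0.47), nrow=2, byrow=TRUE,
+             dimnames=list(c("I", "II"), c("I", "II")))
> P2 <- P %*% P     # two-step transition matrix: P raised to the power 2
> P2["I", "I"]      # 0.70*0.70 + 0.30*0.53 = 0.6490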
Markov processes

The general case:
pij^(n) = (P^n)ij
is proven by induction to n.

One needs the Kolmogorov-Chapman equations:
pij^(n+m) = Σk pik^(n) pkj^(m),
for all n, m ≥ 0 and i, j = 1, …, s.
Markov processes
Proof of the Kolmogorov-Chapman equations:
pij^(n+m) = P(Xt+n+m=Ej | Xt=Ei)
          = Σk P(Xt+n+m=Ej, Xt+n=Ek | Xt=Ei)
          = Σk P(Xt+n+m=Ej | Xt+n=Ek) P(Xt+n=Ek | Xt=Ei)
          = Σk pkj^(m) pik^(n),
using the Markov property and time homogeneity in the third equality.
Markov processes
Induction proof of pij^(n) = (P^n)ij.
Assume pij^(n-1) = (P^(n-1))ij.
Then, by the Kolmogorov-Chapman equations with m = 1:
pij^(n) = Σk pik^(n-1) pkj = Σk (P^(n-1))ik (P)kj = (P^n)ij.
Markov processes
A numerical example over two states, I and II:
the two-step transition matrix P² is obtained from P by
matrix multiplication (“rows times columns”).

[Matrices P and P² shown; e.g., (P²)I,I = 0.6490.]
Markov processes
Thus:
0.6490 = P(Xt+2 = I | Xt = I) is composed of two probabilities:
the probability of the route I → I → I plus the probability of
the route I → II → I.

[Figure: the two paths from state I at position t to state I at position t+2, via I or via II at position t+1.]
Markov processes
In similar fashion we may obtain higher-step transition
probabilities: sum over the probabilities of all possible
paths between the 2 states.

[Figure: all possible paths through states I and II from position t to position t+5.]
Markov process
How do we sample from a Markov process?
Reconsider the DNA example.

[Figure: state diagram over A, C, G, T; table of the transition matrix P with rows “from” (A, C, G, T) and columns “to” (A, C, G, T).]
Markov process
1 Sample the first base from the initial distribution π:
  P(X1=A) = 0.45, P(X1=C) = 0.10, et cetera.
  Suppose we have drawn an A. Our DNA sequence
  now consists of one base, namely A.
  After the first step, our sequence looks like:
  A

2 Sample the next base using P, given the previous base.
  Here X1=A, thus we sample from:
  P(X2=A | X1=A) = 0.1, P(X2=C | X1=A) = 0.1, et cetera.
  In case X1=C, we would have sampled from:
  P(X2=A | X1=C) = 0.2, P(X2=C | X1=C) = 0.3, et cetera.
Markov process
2 After the second step:
  AT

3 The last step is iterated until the last base.
  After the third step:
  ATC

9 After nine steps:
  ATCCGATGC
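A minimal R sketch of this sampling scheme (rows A and C of the matrix below use the probabilities quoted above; the remaining entries of π and P are made-up placeholders):

> bases <- c("A", "C", "G", "T")
> pi0 <- c(0.45, 0.10, 0.20, 0.25)         # 0.45 and 0.10 from the slides; rest assumed
> P <- matrix(c(0.10, 0.10, 0.40, 0.40,    # row A: 0.1, 0.1 as quoted; rest assumed
+               0.20, 0.30, 0.30, 0.20,    # row C: 0.2, 0.3 as quoted; rest assumed
+               0.30, 0.20, 0.20, 0.30,    # rows G and T: assumed
+               0.25, 0.25, 0.25, 0.25),
+             nrow=4, byrow=TRUE, dimnames=list(bases, bases))
> x <- sample(bases, 1, prob=pi0)          # step 1: first base from pi
> for (i in 2:9) {                         # steps 2, 3, ...: next base given previous one
+   x[i] <- sample(bases, 1, prob=P[x[i-1], ])
+ }
> paste(x, collapse="")                    # nine steps yield a sequence of nine bases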
Andrey Markov
1856–1922
Andrey Markov

In the famous first application of Markov chains, Andrey
Markov studied the sequence of 20,000 letters in Pushkin’s
poem “Eugeny Onegin”, discovering that
- the stationary vowel probability is 0.432,
- the probability of a vowel following a vowel is 0.128, and
- the probability of a vowel following a consonant is 0.663.

Markov also studied the sequence of 100,000 letters in
Aksakov’s novel “The Childhood of Bagrov, the Grandson”.

This was in 1913, long before the computer age!

Basharin et al. (2004)


Parameter estimation
[Background figure: a long DNA sequence.]

How do we estimate the parameters of a Markov chain from the data?

Maximum likelihood!
Parameter estimation

Using the definition of conditional probability, the
likelihood can be decomposed as follows:
L = P(X1=x1, …, XT=xT)
  = P(X1=x1) P(X2=x2 | X1=x1) ··· P(XT=xT | XT-1=xT-1, …, X1=x1),
where x1, …, xT in S = {E1, …, Es}.


Parameter estimation

Using the Markov property, we obtain:
L = P(X1=x1) ∏t=2,…,T P(Xt=xt | Xt-1=xt-1).


Parameter estimation

Furthermore, e.g.:
P(Xt=xt | Xt-1=xt-1) = ∏i,j pij^1{xt-1=Ei, xt=Ej},
where only one transition probability at a time enters the
likelihood, due to the indicator function.
Parameter estimation

The likelihood then becomes:
L = πx1 ∏i,j pij^nij,
where, e.g., nAC is the number of times an A is directly
followed by a C in the observed sequence.
Parameter estimation

Now note that, e.g.,
pAA + pAC + pAG + pAT = 1.
Or,
pAT = 1 - pAA - pAC - pAG.

Substitute this in the likelihood, and take the logarithm
to arrive at the log-likelihood.

Note: it is irrelevant which transition probability is
substituted.
Parameter estimation

The log-likelihood:
log L = log πx1 + Σi,j nij log pij,
with, e.g., pAT replaced by 1 - pAA - pAC - pAG.
Parameter estimation

Differentiating the log-likelihood yields, e.g.:
∂ log L / ∂ pAA = nAA / pAA - nAT / (1 - pAA - pAC - pAG).

Equate the derivatives to zero. This yields four systems of
equations to solve, e.g.:
nAA / pAA = nAT / pAT, nAC / pAC = nAT / pAT, nAG / pAG = nAT / pAT.
Parameter estimation
The transition probabilities are then estimated by:
p̂ij = nij / Σk nik.

Verify that the 2nd order partial derivatives of the
log-likelihood are negative!
Parameter estimation

If one may assume the observed data is a realization of a
stationary Markov chain, the initial distribution is estimated
by the stationary distribution.

If only one realization of a stationary Markov chain is
available and stationarity cannot be assumed, the initial
distribution is estimated by:
π̂i = 1{x1 = Ei},
i.e., all probability mass on the observed first state.
Parameter estimation

Thus, for the following sequence: ATCGATCGCA,
tabulation yields the transition counts nij (tabulated in the
R session below).

The maximum likelihood estimates thus become, e.g.:
p̂AT = 2/2 = 1, p̂CG = 2/3, p̂CA = 1/3, p̂GA = p̂GC = 1/2.


Parameter estimation

In R:
> DNAseq <- c("A", "T", "C", "G", "A",
+             "T", "C", "G", "C", "A")
> table(DNAseq)
DNAseq
A C G T
3 3 2 2
> table(DNAseq[1:9], DNAseq[2:10])
    A C G T
  A 0 0 0 2
  C 1 0 2 0
  G 1 1 0 0
  T 0 2 0 0
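The maximum likelihood estimates then follow by normalizing each row of this transition-count table (a small sketch continuing the session above):

> # MLE of P: transition counts normalized per row
> transCounts <- table(DNAseq[1:9], DNAseq[2:10])
> Phat <- prop.table(transCounts, margin=1)
> Phat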
Testing the order of
the Markov chain
Testing the order of a Markov chain
Often the order of the Markov chain is unknown and
needs to be determined from the data. This is done using
a Chi-square test for independence.

Consider a DNA-sequence. To assess the order, one first
tests whether the sequence consists of independent
letters. Hereto count the nucleotides, di-nucleotides and
tri-nucleotides in the sequence.
Testing the order of a Markov chain
The null hypothesis of independence between the letters
of the DNA sequence is evaluated by the χ2 statistic:
χ2 = Σi,j (nij - eij)² / eij, with expected counts eij = ni nj / (T-1),
where nij counts the di-nucleotide EiEj, ni the nucleotide Ei,
and T-1 is the number of di-nucleotides (cf. the R code below).
The statistic has (4-1) x (4-1) = 9 degrees of freedom.

If the null hypothesis cannot be rejected, one would fit a
Markov model of order m=0. However, if H0 can be
rejected, one would test for higher order dependence.
Testing the order of a Markov chain
To test for 1st order dependence, consider how often a
di-nucleotide, e.g., CT, is followed by, e.g., a T.

The 1st order dependence hypothesis is evaluated by:
χ2 = Σij,k (nijk - eijk)² / eijk,
where nijk counts the tri-nucleotide EiEjEk and eijk is its
expected count under the null hypothesis.

This is χ2 distributed with (16-1) x (4-1) = 45 d.o.f.
Testing the order of a Markov chain
A 1st order Markov chain provides a reasonable description of the
sequence of the hlyE gene of the E.coli bacterium.
Independence test:
Chi-sq stat: 22.45717, p-value: 0.00754
1st order dependence test:
Chi-sq stat: 55.27470, p-value: 0.14025

The sequence of the prrA gene of the E.coli bacterium requires a
higher order Markov chain model.
Independence test:
Chi-sq stat: 33.51356, p-value: 1.266532e-04
1st order dependence test:
Chi-sq stat: 114.56290, p-value: 5.506452e-08
Testing the order of a Markov chain
In R (the independence case, assuming the DNAseq-object is
a character-object containing the sequence):
> # calculate nucleotide and dimer frequencies
> nuclFreq <- matrix(table(DNAseq), ncol=1)
> dimerFreq <- table(DNAseq[1:(length(DNAseq)-1)],
+                    DNAseq[2:length(DNAseq)])

> # calculate expected dimer frequencies
> dimerExp <- nuclFreq %*% t(nuclFreq) / (length(DNAseq)-1)

> # calculate test statistic and p-value
> teststat <- sum((dimerFreq - dimerExp)^2 / dimerExp)
> pval <- exp(pchisq(teststat, 9, lower.tail=FALSE, log.p=TRUE))

Exercise: modify the code above for the 1st order test.
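One possible sketch of that modification (assuming all 16 dimers occur in the sequence, so that the contingency table is complete; rows are the preceding dimers, columns the following letter):

> # observed tri-nucleotide counts: preceding dimer (rows) vs next letter (columns)
> dimers <- paste0(DNAseq[1:(length(DNAseq)-2)],
+                  DNAseq[2:(length(DNAseq)-1)])
> trimerFreq <- table(dimers, DNAseq[3:length(DNAseq)])

> # expected counts when the next letter is independent of the preceding dimer
> trimerExp <- rowSums(trimerFreq) %*% t(colSums(trimerFreq)) / sum(trimerFreq)

> # test statistic and p-value with (16-1) x (4-1) = 45 d.o.f.
> teststat1 <- sum((trimerFreq - trimerExp)^2 / trimerExp)
> pval1 <- exp(pchisq(teststat1, 45, lower.tail=FALSE, log.p=TRUE))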
Application:
sequence discrimination
Application: sequence discrimination

We have fitted 1st order Markov chain models to a
representative coding and noncoding sequence of the
E.coli bacterium.

Pcoding:                        Pnoncoding:
     A     C     G     T             A     C     G     T
A  0.321 0.257 0.211 0.211     A  0.320 0.278 0.231 0.172
C  0.319 0.207 0.266 0.208     C  0.295 0.205 0.286 0.214
G  0.259 0.284 0.237 0.219     G  0.241 0.261 0.233 0.265
T  0.223 0.243 0.309 0.225     T  0.283 0.238 0.256 0.223

These models can be used to discriminate between
coding and noncoding sequences.
Application: sequence discrimination

For a new sequence X with unknown function, calculate the
likelihood under the coding and the noncoding model:
L(X; Pcoding) and L(X; Pnoncoding).

These likelihoods are compared by means of their log ratio:
LR(X) = log[ L(X; Pcoding) / L(X; Pnoncoding) ],
the log-likelihood ratio.
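A minimal R sketch of this comparison (the matrices are those fitted above; for simplicity the contribution of the initial base is ignored, an assumption):

> bases <- c("A", "C", "G", "T")
> Pcoding <- matrix(c(0.321, 0.257, 0.211, 0.211,
+                     0.319, 0.207, 0.266, 0.208,
+                     0.259, 0.284, 0.237, 0.219,
+                     0.223, 0.243, 0.309, 0.225),
+                   nrow=4, byrow=TRUE, dimnames=list(bases, bases))
> Pnoncoding <- matrix(c(0.320, 0.278, 0.231, 0.172,
+                        0.295, 0.205, 0.286, 0.214,
+                        0.241, 0.261, 0.233, 0.265,
+                        0.283, 0.238, 0.256, 0.223),
+                      nrow=4, byrow=TRUE, dimnames=list(bases, bases))
> # log-likelihood of a sequence x under transition matrix P (initial base ignored)
> seqLogLik <- function(x, P) sum(log(P[cbind(x[-length(x)], x[-1])]))
> LR <- seqLogLik(DNAseq, Pcoding) - seqLogLik(DNAseq, Pnoncoding)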


Application: sequence discrimination

If the log-likelihood ratio LR(X) of a new sequence exceeds a
certain threshold, the sequence is classified as coding,
and as noncoding otherwise.

Back to E.coli
To illustrate the potential of the discrimination approach,
we calculate the log-likelihood ratio for a large set of known
coding and noncoding sequences. The distributions of the
two sets of LR(X)’s are compared.
Application: sequence discrimination

[Figure: overlaid histograms of the log-likelihood ratios (x-axis) and their frequencies (y-axis) for noncoding and coding sequences.]


Application: sequence discrimination
[Figure: the same two distributions, flipped and mirrored into a violin plot, noncoding versus coding.]
Application: sequence discrimination

Conclusion E.coli example

Comparison of the distributions indicates that one could
discriminate reasonably between coding and noncoding
sequences on the basis of a simple 1st order Markov model.

Improvements:
- Higher order Markov model.
- Incorporate more structural information of the DNA.
References &
further reading
References and further reading

Basharin, G.P., Langville, A.N., Naumov, V.A. (2004),
“The life and work of A.A. Markov”, Linear Algebra
and its Applications, 386, 3-26.
Ewens, W.J., Grant, G. (2006), Statistical Methods for
Bioinformatics, Springer, New York.
Reinert, G., Schbath, S., Waterman, M.S. (2000),
“Probabilistic and statistical properties of words: an
overview”, Journal of Computational Biology, 7, 1-46.
This material is provided under the Creative Commons
Attribution/Share-Alike/Non-Commercial License.

See http://www.creativecommons.org for details.
