Lecture #5
Dependencies along the genome

In previous classes we assumed every letter in a sequence is sampled independently at random from some distribution q(⋅) over the alphabet {A,C,T,G}.
Finite Markov Chain
Markov Chain (cont.)

[Diagram: chain X1 → X2 → … → Xn−1 → Xn]

p(x_1, …, x_n) = p(x_1) · ∏_{i=2}^{n} a_{x_{i−1} x_i}

Similarly, (X_1, …, X_i, …) is a sequence of probability distributions over D. There is a rich theory which studies the properties of these sequences; a bit of it is presented next.
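The product formula above can also be read as a sampling procedure: draw x_1 from the initial distribution, then each next letter from the row of the current one. A minimal sketch in Python; the transition table and its values are illustrative, not from the lecture.

```python
import random

# Illustrative transition probabilities a[s][t] = p(next = t | current = s);
# each row sums to 1 (stochastic).  The numbers are made up for this example.
A = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}

def sample_chain(p1, a, n, rng=random):
    """Sample x1 from the initial distribution p1, then x_i from a[x_{i-1}]."""
    letters = list(p1)
    x = [rng.choices(letters, weights=[p1[s] for s in letters])[0]]
    for _ in range(n - 1):
        row = a[x[-1]]
        x.append(rng.choices(list(row), weights=list(row.values()))[0])
    return "".join(x)

seq = sample_chain({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}, A, 50)
```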
Markov Chain (cont.)

[Diagram: chain X1 → X2 → … → Xn−1 → Xn]
Matrix Representation

The transition probabilities matrix M = (a_st):

        A     B     C     D
  A   0.95   0    0.05   0
  B   0.2   0.5    0    0.3
  C    0    0.2    0    0.8
  D    0     0     1     0

M is a stochastic matrix: ∑_t a_st = 1 for every state s.

The initial distribution vector (u_1, …, u_m) defines the distribution of X_1 (p(X_1 = s_i) = u_i).
Matrix Representation

        A     B     C     D
  A   0.95   0    0.05   0
  B   0.2   0.5    0    0.3
  C    0    0.2    0    0.8
  D    0     0     1     0

Example: if X1 = (0, 1, 0, 0) then X2 = (0.2, 0.5, 0, 0.3),
and if X1 = (0, 0, 0.5, 0.5) then X2 = (0, 0.1, 0.5, 0.4).
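The example distributions can be checked mechanically: one step of the chain multiplies the row vector of the current distribution by M. A small sketch using NumPy with the slide's matrix:

```python
import numpy as np

# Transition matrix from the slide (rows and columns ordered A, B, C, D).
M = np.array([
    [0.95, 0.0, 0.05, 0.0],
    [0.2,  0.5, 0.0,  0.3],
    [0.0,  0.2, 0.0,  0.8],
    [0.0,  0.0, 1.0,  0.0],
])

# If X1 is a row vector holding the distribution of the first state,
# the distribution of the next state is X2 = X1 @ M.
X1_b = np.array([0.0, 1.0, 0.0, 0.0])   # surely in state B
X2_b = X1_b @ M                         # (0.2, 0.5, 0, 0.3): row B of M

X1_cd = np.array([0.0, 0.0, 0.5, 0.5])  # half C, half D
X2_cd = X1_cd @ M                       # (0, 0.1, 0.5, 0.4), as on the slide
```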
Representation of a Markov Chain as a Digraph

        A     B     C     D
  A   0.95   0    0.05   0
  B   0.2   0.5    0    0.3
  C    0    0.2    0    0.8
  D    0     0     1     0

[Digraph with one node per state and one edge per nonzero entry:
A→A (0.95), A→C (0.05), B→A (0.2), B→B (0.5), B→D (0.3),
C→B (0.2), C→D (0.8), D→C (1).]
Properties of Markov Chain states

States of a Markov chain are classified by the digraph representation (omitting the actual probability values).
Another example of Recurrent and
Transient States
Irreducible Markov Chains

[Two digraphs over states A, B, C, D, E: one irreducible, one not.]
Periodic States

[Digraph over states A, B, C, D, E.]

A state s has period k if k is the GCD of the lengths of all the cycles that pass through s (in the shown graph the period of A is 2).
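The GCD-of-cycle-lengths definition can be computed directly from an adjacency matrix: state s has a length-n cycle exactly when the n-th boolean power of the matrix has a nonzero (s, s) entry. A sketch; the graph below is a hypothetical four-state example (not the lecture's figure) built so the period of A is 2.

```python
from math import gcd
import numpy as np

def period(adj, s, max_len=50):
    """Period of state s: gcd of all n <= max_len for which a cycle
    s -> ... -> s of length n exists.  adj is a 0/1 adjacency matrix."""
    p = 0
    power = np.eye(len(adj), dtype=int)
    for n in range(1, max_len + 1):
        power = (power @ adj > 0).astype(int)  # length-n reachability
        if power[s, s]:
            p = gcd(p, n)                      # gcd(0, n) == n
    return p

# Hypothetical graph: the 4-cycle A->B->C->D->A plus a chord B->A,
# giving cycles of length 4 and 2 through A, so period(A) = gcd(4, 2) = 2.
adj = np.array([
    [0, 1, 0, 0],   # A -> B
    [1, 0, 1, 0],   # B -> A, B -> C
    [0, 0, 0, 1],   # C -> D
    [1, 0, 0, 0],   # D -> A
])
```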
Ergodic Markov Chains
Stationary Distributions for Markov Chains

v is a stationary distribution if vM = v.
Stationary Distributions for a Markov
Chain M
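As a sketch, a stationary distribution of the four-state matrix from the earlier slides can be found by power iteration: repeatedly applying v ← vM until the distribution stops changing. This converges here because that particular chain happens to be irreducible and aperiodic.

```python
import numpy as np

# The transition matrix from the earlier slides (states A, B, C, D).
M = np.array([
    [0.95, 0.0, 0.05, 0.0],
    [0.2,  0.5, 0.0,  0.3],
    [0.0,  0.2, 0.0,  0.8],
    [0.0,  0.0, 1.0,  0.0],
])

# Power iteration from the uniform distribution; the fixed point
# satisfies vM = v, i.e. it is a stationary distribution.
v = np.full(4, 0.25)
for _ in range(10_000):
    v = v @ M
```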
“Good” Markov chains

A Markov chain is “good” if the distributions X_i converge, as i → ∞, to a limit that does not depend on the initial distribution.
Bad case 1: Mutual Unreachability

[Digraph with two separate components: {A, B} and {C, D}.]

Consider two initial distributions:
• p(X1 = A) = 1 (i.e., p(X1 = x) = 0 for x ≠ A).
• p(X1 = C) = 1.

Since {A, B} and {C, D} are mutually unreachable, the two runs can never mix, so the limiting behavior depends on the initial distribution.
Bad case 2: Transient States

[Digraph containing transient states; the chain eventually leaves them and never returns.]
Bad case 3: Periodic Markov Chains

[Digraph over states A, B, C, D, E.]
Bad case 3: Periodic States

[Digraph over states A, B, C, D, E.]

Fact 3: In a periodic Markov chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions, X_i does not converge as i → ∞.
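Fact 3 is easy to see on a minimal period-2 chain (a two-state example chosen for illustration, not the lecture's five-state graph): the two states simply swap, so a point mass oscillates forever while the uniform vector is still stationary.

```python
import numpy as np

# Minimal period-2 chain: every step swaps the two states,
# so every cycle has even length.
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])

v = np.array([1.0, 0.0])   # start surely in the first state
history = []
for _ in range(4):
    v = v @ M
    history.append(tuple(v))
# history oscillates (0,1), (1,0), (0,1), (1,0): X_i never converges,
# even though the uniform vector (0.5, 0.5) satisfies vM = v.
```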
Use of Markov Chains in Genome Search: Modeling CpG Islands

In human genomes the pair CG is often transformed to (methyl-C)G, which in turn often mutates to TG.
Example: CpG Island (Cont.)
Question 1: Using two Markov chains
A+ (For CpG islands):
Discriminating between the two models

[Diagram: chain X1 → X2 → … → XL−1 → XL]

RATIO = p(x | + model) / p(x | − model)
      = ∏_{i=0}^{L−1} p_+(x_{i+1} | x_i) / ∏_{i=0}^{L−1} p_−(x_{i+1} | x_i)

log Q = log [ p(x_1 … x_L | +) / p(x_1 … x_L | −) ]
      = ∑_i log [ p_+(x_i | x_{i−1}) / p_−(x_i | x_{i−1}) ]
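The log-odds score log Q is a sum over consecutive letter pairs, one term per transition. A sketch; the two transition tables below are hypothetical (only the CpG-related entries are varied), since real values would be estimated from labeled genomic data.

```python
from math import log

STATES = "ACGT"

# Hypothetical transition probabilities p(next | current) for the '+'
# (CpG island) and '-' (background) models.  CG steps are enriched in
# islands and depleted outside them; all other rows are uniform.
p_plus  = {s: {t: 0.25 for t in STATES} for s in STATES}
p_minus = {s: {t: 0.25 for t in STATES} for s in STATES}
p_plus["C"]["G"] = 0.30
p_plus["C"]["A"] = p_plus["C"]["C"] = p_plus["C"]["T"] = 0.70 / 3
p_minus["C"]["G"] = 0.05
p_minus["C"]["A"] = p_minus["C"]["C"] = p_minus["C"]["T"] = 0.95 / 3

def log_odds(x, pp=p_plus, pm=p_minus):
    """log Q = sum_i log( p_plus(x_i | x_{i-1}) / p_minus(x_i | x_{i-1}) )."""
    return sum(log(pp[a][b] / pm[a][b]) for a, b in zip(x, x[1:]))
```

A CG-rich string such as "ACGCGCGT" gets a positive score (the '+' model is favored), while a string with no C→G transitions scores zero under these toy tables.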
Where do the parameters (transition probabilities) come from?
Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)

[Diagram: chain X1 → X2 → … → XL−1 → XL]
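For a Markov chain, the MLE of each transition probability is the standard count ratio a_st = N_st / ∑_t' N_st', where N_st is the number of observed s→t steps in the labeled training sequences. A minimal sketch:

```python
from collections import Counter

def mle_transitions(sequences):
    """MLE of a_st from labeled training sequences:
    a_st = (# of s->t steps) / (# of steps leaving s)."""
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))      # count consecutive pairs
    totals = Counter()
    for (s, _), n in pair_counts.items():
        totals[s] += n                             # steps leaving each state
    return {(s, t): n / totals[s] for (s, t), n in pair_counts.items()}

# Toy labeled data: in these three strings every C is followed by G,
# so the estimate a_CG is 1, and A -> C occurs in 3 of the 4 steps
# leaving A, so a_AC = 0.75.
a = mle_transitions(["ACGT", "ACGA", "AACG"])
```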
Question 2: Finding CpG Islands

[Diagram: hidden states s1 → s2 → … → sL−1 → sL, each emitting a symbol x1, x2, …, xL−1, xL.]

A Markov chain (s_1, …, s_L): p(s_1, …, s_L) = ∏_{i=1}^{L} p(s_i | s_{i−1}),
and for each state s and each symbol x we have p(X_i = x | S_i = s).
Hidden Markov Model

[Diagram: hidden chain S1 →M S2 →M … →M SL−1 →M SL; each state Si emits a symbol xi (arrows labeled T).]

Notations:
Markov chain transition probabilities: p(S_{i+1} = t | S_i = s) = a_st
Emission probabilities: p(X_i = b | S_i = s) = e_s(b)

For Markov chains we know: p(s) = p(s_1, …, s_L) = ∏_{i=1}^{L} p(s_i | s_{i−1})
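Combining the transition probabilities a_st with the emission probabilities e_s(b) gives the joint probability of a state path and an observed sequence. A toy sketch with a hypothetical two-state HMM ('+' for inside an island, '-' for outside); all numbers are illustrative, not from the lecture.

```python
from math import exp, log

# Illustrative two-state HMM parameters.
p1 = {"+": 0.5, "-": 0.5}                             # initial distribution
a  = {"+": {"+": 0.9, "-": 0.1},                      # transitions a_st
      "-": {"+": 0.1, "-": 0.9}}
e  = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # emissions e_s(b)
      "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def joint_log_prob(states, xs):
    """log p(s, x) = log p(s1) + log e_{s1}(x1)
                   + sum_{i>1} [ log a_{s_{i-1} s_i} + log e_{s_i}(x_i) ]."""
    lp = log(p1[states[0]]) + log(e[states[0]][xs[0]])
    for i in range(1, len(xs)):
        lp += log(a[states[i - 1]][states[i]]) + log(e[states[i]][xs[i]])
    return lp

lp = joint_log_prob("++--", "CGTA")
```

Working in log space avoids numerical underflow when L is large, which is why HMM computations are usually done with log probabilities.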
Hidden Markov Model (cont.)

[Diagram: hidden chain S1 → S2 → … → SL−1 → SL emitting x1, x2, …, xL.]