Bioinformatics
Lecture 6: Hidden Markov Model for Sequence Alignment
Outline
• Hidden Markov Model (HMM)
  • From Finite State Machine to Finite Markov Model
• Appendix
Finite Markov Chain

[Figure: a finite Markov chain with four states (A, C, G, T); a run of the chain yields the random state sequence X1, X2, …, Xn−1, Xn, for example C, A, T, C.]
[Figure: a general chain with state 1, state 2, …, state i, state j, …, state N−1, state N and occupancy probabilities α1, α2, …, αk.]
The state distribution is updated by the transition matrix:
$$[\,p(e),\ p(f)\,]_{t+1} = [\,p(e),\ p(f)\,]_t \begin{bmatrix} p(e \mid e) & p(f \mid e) \\ p(e \mid f) & p(f \mid f) \end{bmatrix}$$
[Figure: a two-state model with states e and f; transition probabilities p(e|e), p(f|e), p(e|f), p(f|f) and emission probabilities p(A|e), p(C|e), p(G|e), p(T|e), p(A|f), p(C|f), p(G|f), p(T|f) over the alphabet A, C, T, G.]
HMM: An Example

[Figure: hidden states S1 → S2 → … → SL−1 → SL connected by transition (M) arrows; each Si emits an observed symbol xi through an emission (T) arrow.]

Notation:
• Sequence of symbols (output) (X1, …, XL), modeled by the emission probabilities p(Xi = b | Si = s) = e_s(b).
Hidden Markov Model

[Figure: the chain S1 → S2 → … → SL−1 → SL with emissions x1, …, xL, drawn both with and without the emission (T) arrows.]
[Figure: the two-state (e, f) model with its transition and emission probabilities, as above.]

Given the "visible" sequence x = (x1, …, xL): how do we estimate the parameters of the HMM? The Baum-Welch algorithm.
[Figure: transitions p_ij, p_jj, p_kj entering state j.]

$$\alpha_j^{(t)} = \sum_i \alpha_i^{(t-1)}\, p_{ij}, \qquad \alpha_k = \lim_{t \to \infty} \alpha_k^{(t)}$$
[Figure: the two-state (e, f) model again.]

The marginal probability of emitting a symbol X is
$$p(X) = p(e)\,p(X \mid e) + p(f)\,p(X \mid f),$$
for example
$$p(A) = p(e)\,p(A \mid e) + p(f)\,p(A \mid f).$$
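To make the update concrete, here is a minimal Python sketch that iterates [p(e), p(f)] to its stationary value and then evaluates the symbol marginal p(A). The transition and emission numbers are hypothetical illustrations, not values from the lecture.

```python
import numpy as np

# Hypothetical two-state chain over (e, f); rows index the current
# state, columns the next state: P[0, 1] = p(f | e), etc.
P = np.array([[0.9, 0.1],    # p(e|e), p(f|e)
              [0.2, 0.8]])   # p(e|f), p(f|f)

# Hypothetical emission probabilities over A, C, G, T for states e and f.
E = np.array([[0.4, 0.3, 0.2, 0.1],   # p(A|e), p(C|e), p(G|e), p(T|e)
              [0.1, 0.2, 0.3, 0.4]])  # p(A|f), p(C|f), p(G|f), p(T|f)

alpha = np.array([0.5, 0.5])          # initial [p(e), p(f)]
for _ in range(100):                  # [p(e), p(f)]_{t+1} = [p(e), p(f)]_t P
    alpha = alpha @ P

p_A = alpha @ E[:, 0]                 # p(A) = p(e) p(A|e) + p(f) p(A|f)
print(alpha, p_A)                     # -> roughly [0.667, 0.333], 0.3
```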
1. Most Probable State Path

[Figure: the chain S1, S2, …, SL−1, SL with emissions x1, …, xL.]
[Figure: the same chain.]

Since
$$p(s \mid x) = \frac{p(s, x)}{p(x)} \propto p(s, x),$$
maximizing p(s | x) over state paths s is the same as maximizing the joint probability p(s, x).
Model 3: Hidden Markov Chain

[Figure: hidden states s1, s2, …, si with emissions x1, x2, …, xi.]
Dependence Relations

$$p(y \mid x)\,p(x) = p(x, y) = p(x \mid y)\,p(y)$$
$$p(x) = \sum_y p(x, y)$$
$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\,p(y)}{\sum_{y'} p(x, y')}$$
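As a quick numeric illustration of the last identity with the two-state model above (the numbers are hypothetical): if p(e) = 0.6, p(f) = 0.4, p(A|e) = 0.4, and p(A|f) = 0.1, then p(A) = 0.6 · 0.4 + 0.4 · 0.1 = 0.28, and the posterior is p(e | A) = (0.4 · 0.6) / 0.28 ≈ 0.86.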
The model assumptions are
$$p(s_i \mid s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1}) = p(s_i \mid s_{i-1}) = a_{kl} \quad (s_{i-1} = k,\ s_i = l)$$
$$p(x_i \mid s_i = l) = e_l(x_i)$$
and the resulting recursion is
$$v_l(i) = e_l(x_i) \cdot \max_k \{\, v_k(i-1) \cdot a_{kl} \,\}.$$
[Figure: the chain s1, s2, …, si−1, si with emissions x1, x2, …, xi−1, xi.]

Viterbi's Algorithm

[Figure: the full chain s1, s2, …, si, …, sL−1, sL with emissions x1, …, xL, starting from a begin state 0.]
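The recursion above translates directly into code. This is a minimal sketch, not the lecture's own implementation; the function name is mine, and an initial distribution pi stands in for the begin state 0.

```python
import numpy as np

def viterbi(x, a, e, pi):
    """Most probable state path for an observation sequence x.

    a[k, l] = transition probability a_kl
    e[l, b] = emission probability e_l(b)
    pi[k]   = initial state distribution
    """
    L, K = len(x), a.shape[0]
    v = np.zeros((L, K))               # v[i, l] = v_l(i)
    ptr = np.zeros((L, K), dtype=int)  # argmax pointers for traceback
    v[0] = pi * e[:, x[0]]
    for i in range(1, L):
        for l in range(K):
            scores = v[i - 1] * a[:, l]          # v_k(i-1) * a_kl
            ptr[i, l] = np.argmax(scores)
            v[i, l] = e[l, x[i]] * scores[ptr[i, l]]
    path = [int(np.argmax(v[-1]))]               # traceback from the end
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1], v[-1].max()
```

For long sequences the products underflow, so in practice the recursion is run with log probabilities (products become sums).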
Parameter Estimation for HMM

[Figure: the chain s1, …, sL with emissions x1, …, xL; a_kl labels the transition from state k to state l, and e_k(b) labels the emission of symbol b from state k.]

An HMM model is defined by the parameters a_kl and e_k(b), for all states k, l and all symbols b. Let θ denote the collection of these parameters.
[Figure: the training set {X^1, …, X^n} of independent sequences X^1, …, X^j, …, X^n.]
Case 1: Sequences Are Fully Known

[Figure: the chain s1, …, sL with emissions x1, …, xL.]
Case 1 (cont.)

[Figure: the same chain.]
ML for Parameter Estimation (Case 1)

[Figure: the same chain.]
Case 1 (cont.)

[Figure: the same chain.]
Case 1 (cont.)

So we need to find the a_kl's and e_k(b)'s that maximize
$$\prod_{(k,l)} a_{kl}^{A_{kl}}\ \prod_{(k,b)} \left[e_k(b)\right]^{E_k(b)}$$
subject to
$$a_{kl},\ e_k(b) \ge 0, \qquad \sum_l a_{kl} = 1, \qquad \sum_b e_k(b) = 1 \quad \text{for every state } k.$$
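When the state paths are fully observed, the maximizer turns out to be simple counting and normalizing, as the next slides derive. A minimal sketch under the assumption that states and symbols are encoded as small integers (the function name is mine):

```python
import numpy as np

def mle_from_paths(paths, seqs, K, B):
    """ML estimates of a_kl and e_k(b) from fully observed data.

    paths: list of state sequences (ints in 0..K-1)
    seqs:  list of symbol sequences (ints in 0..B-1)
    """
    A = np.zeros((K, K))   # A_kl: observed transition counts
    E = np.zeros((K, B))   # E_k(b): observed emission counts
    for s, x in zip(paths, seqs):
        for i in range(1, len(s)):
            A[s[i - 1], s[i]] += 1
        for k, b in zip(s, x):
            E[k, b] += 1
    a = A / A.sum(axis=1, keepdims=True)   # a_kl = A_kl / sum_l' A_kl'
    e = E / E.sum(axis=1, keepdims=True)   # e_k(b) = E_k(b) / sum_b' E_k(b')
    return a, e
```

In practice pseudocounts are added to A and E so that transitions and emissions never observed in the training data do not get probability zero.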
Generalization for n Outcomes (cont.)

By a treatment identical to the die case, the maximum is obtained when, for all i,
$$\frac{n_i}{\theta_i} = \frac{n_k}{\theta_k}.$$
Hence the MLE is given by
$$\theta_i = \frac{n_i}{n}, \qquad i = 1, \ldots, k.$$
Fractional Exponents

The same formula applies when the counts n_i are fractional (as for the expected counts used later):
$$\theta_i = \frac{n_i}{n}, \qquad i = 1, \ldots, k$$
Side Comment: Sufficient Statistics

To compute the probability of the data in the die example, we only need to record the number of times N_i the die fell on side i (namely N1, N2, …, N6). We do not need to recall the entire sequence of outcomes:
$$P(\text{Data} \mid \Theta) = \theta_1^{N_1} \cdot \theta_2^{N_2} \cdot \theta_3^{N_3} \cdot \theta_4^{N_4} \cdot \theta_5^{N_5} \cdot \left(1 - \sum_{i=1}^{5} \theta_i\right)^{N_6}$$
Sufficient Statistics

A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(Data) is a sufficient statistic if for any two datasets Data and Data′:
• s(Data) = s(Data′) ⇒ P(Data | θ) = P(Data′ | θ)

Exercise: define a sufficient statistic for the HMM model.

[Figure: the many-to-one map from datasets to statistics.]
Case 2: State Paths Are Unknown

[Figure: the chain s1, …, sL with emissions x1, …, xL.]
Baum-Welch Training (EM Algorithm for HMM)

The process is iterated as follows:
• Estimate A_kl and E_k(b) by considering probable paths for the training sequences, using the current values of a_kl and e_k(b).
• Use the approach of Case 1 to derive new values of the a's and e's:
$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$

Here A_kl denotes the expected number of transitions from state k to state l, and E_k(b) denotes the expected number of emissions of symbol b in state k.
Step 1: Computing P(s_{i−1} = k, s_i = l | X^j, θ)

[Figure: the chain s1, s2, …, s_{i−1}, s_i, …, s_{L−1}, s_L with emissions x1, …, xL.]
The Forward Algorithm

[Figure: states s1, s2, …, si with emissions x1, x2, …, xi.]

{Second step}
$$P(x_1, x_2, s_2) = \sum_{s_1} P(x_1, s_1, s_2, x_2) = \sum_{s_1} P(x_1, s_1)\,P(s_2 \mid x_1, s_1)\,P(x_2 \mid x_1, s_1, s_2)$$
By conditional independence, the last expression equals
$$\sum_{s_1} P(x_1, s_1)\,P(s_2 \mid s_1)\,P(x_2 \mid s_2)$$

{Step i}
$$P(x_1, \ldots, x_i, s_i) = \sum_{s_{i-1}} P(x_1, \ldots, x_{i-1}, s_{i-1})\,P(s_i \mid s_{i-1})\,P(x_i \mid s_i)$$
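The step-i recursion can be written as a few lines of code. A sketch under my own naming conventions, with an initial distribution pi for the first state:

```python
import numpy as np

def forward(x, a, e, pi):
    """f[i, l] = P(x[0], .., x[i], state at position i is l); returns f and P(x)."""
    L, K = len(x), a.shape[0]
    f = np.zeros((L, K))
    f[0] = pi * e[:, x[0]]
    for i in range(1, L):
        # P(x_1..x_i, s_i) = sum_{s_{i-1}} P(x_1..x_{i-1}, s_{i-1})
        #                     * P(s_i | s_{i-1}) * P(x_i | s_i)
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    return f, f[-1].sum()     # P(x) = sum over the final state
```

As with Viterbi, long sequences require scaling each row (or log-space arithmetic) to avoid underflow.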
Likelihood of Evidence

[Figure: the chain s1, …, sL with emissions x1, …, xL.]
The Backward Algorithm

[Figure: states si, si+1, …, sL−1, sL with emissions xi+1, …, xL.]

{First step}
$$P(x_L \mid s_{L-1}) = \sum_{s_L} P(x_L, s_L \mid s_{L-1}) = \sum_{s_L} P(s_L \mid s_{L-1})\,P(x_L \mid s_{L-1}, s_L)$$
By conditional independence, the last expression equals
$$\sum_{s_L} P(s_L \mid s_{L-1})\,P(x_L \mid s_L)$$

{Step i}
$$\underbrace{P(x_{i+1}, \ldots, x_L \mid s_i)}_{=\,b(s_i)} = \sum_{s_{i+1}} P(s_{i+1} \mid s_i)\,P(x_{i+1} \mid s_{i+1})\,\underbrace{P(x_{i+2}, \ldots, x_L \mid s_{i+1})}_{=\,b(s_{i+1})}$$
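The backward recursion is symmetric, filling the table from the end of the sequence. Again a sketch in my own conventions:

```python
import numpy as np

def backward(x, a, e):
    """b[i, k] = P(x[i+1], .., x[L-1] | state at position i is k)."""
    L, K = len(x), a.shape[0]
    b = np.zeros((L, K))
    b[-1] = 1.0               # the empty suffix has probability 1
    for i in range(L - 2, -1, -1):
        # b(s_i) = sum_{s_{i+1}} P(s_{i+1} | s_i) P(x_{i+1} | s_{i+1}) b(s_{i+1})
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b
```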
Likelihood of Evidence

[Figure: the chain s1, …, sL with emissions x1, …, xL; the quantity computed is p(x1, …, xL).]
Step 1

For each pair (k, l), compute the expected number of state transitions from k to l:
$$A_{kl} = \sum_{j=1}^{n} \frac{1}{p(X^j)} \sum_{i=1}^{L} p(s_{i-1} = k,\ s_i = l,\ X^j \mid \theta) = \sum_{j=1}^{n} \frac{1}{p(X^j)} \sum_{i=1}^{L} f_k^j(i-1)\, a_{kl}\, e_l(x_i^j)\, b_l^j(i)$$

Here f^j and b^j are the forward and backward variables computed for sequence X^j.
Baum-Welch: Step 2

[Figure: the chain s1, …, sL with emissions x1, …, xL; state k emits symbol b.]

For each state k and each symbol b, compute the expected number of emissions of b from k:
$$E_k(b) = \sum_{j=1}^{n} \frac{1}{p(X^j)} \sum_{i:\, x_i^j = b} f_k^j(i)\, b_k^j(i)$$
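Steps 1 and 2 combine the forward and backward sketches above into expected counts. This is a sketch, assuming the forward and backward functions and integer-coded sequences defined earlier:

```python
import numpy as np

def expected_counts(seqs, a, e, pi):
    """Baum-Welch E-step: expected transition counts A_kl and
    expected emission counts E_k(b), summed over all sequences."""
    K, B = e.shape
    A = np.zeros((K, K))
    E = np.zeros((K, B))
    for x in seqs:
        f, px = forward(x, a, e, pi)   # forward variables and P(X^j)
        b = backward(x, a, e)          # backward variables
        for i in range(1, len(x)):
            # A_kl += f_k(i-1) * a_kl * e_l(x_i) * b_l(i) / P(X^j)
            A += np.outer(f[i - 1], e[:, x[i]] * b[i]) * a / px
        for i in range(len(x)):
            # E_k(x_i) += f_k(i) * b_k(i) / P(X^j)
            E[:, x[i]] += f[i] * b[i] / px
    return A, E
```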
Baum-Welch: Step 3

Use the expected counts A_kl and E_k(b) as in Case 1 to re-estimate the parameters,
$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')},$$
and repeat steps 1-3 until the likelihood stops improving.
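Putting the three steps together gives one EM iteration. A sketch that reuses expected_counts from above:

```python
def baum_welch_step(seqs, a, e, pi):
    """One Baum-Welch iteration: E-step (steps 1-2) then M-step (step 3)."""
    A, E = expected_counts(seqs, a, e, pi)
    a_new = A / A.sum(axis=1, keepdims=True)   # a_kl = A_kl / sum_l' A_kl'
    e_new = E / E.sum(axis=1, keepdims=True)   # e_k(b) = E_k(b) / sum_b' E_k(b')
    return a_new, e_new
```

Each iteration is guaranteed not to decrease the likelihood, so the loop is typically run until the change in log-likelihood falls below a tolerance.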
The Transition Probabilities

Transition probabilities of the pair HMM (rows: from; columns: to):

          M        X       Y
  M     1 − 2δ     δ       δ
  X     1 − ε      ε       0
  Y     1 − ε      0       ε

• δ: transition from M to an insert state
• ε: staying in an insert state

Emission Probabilities
• Match: pair (a, b) with probability p_ab, emitted only from M states
HMM for Sequence Alignment: Detailed Algorithm

• The most probable path is the best alignment.

$$v^M[i, j] = p_{x_i y_j} \max \begin{cases} (1 - 2\delta)\, v^M(i-1,\, j-1) \\ (1 - \varepsilon)\, v^X(i-1,\, j-1) \\ (1 - \varepsilon)\, v^Y(i-1,\, j-1) \end{cases}$$
$$v^X[i, j] = q_{x_i} \max \begin{cases} \delta\, v^M(i-1,\, j) \\ \varepsilon\, v^X(i-1,\, j) \end{cases} \qquad v^Y[i, j] = q_{y_j} \max \begin{cases} \delta\, v^M(i,\, j-1) \\ \varepsilon\, v^Y(i,\, j-1) \end{cases}$$

Termination:
$$v^E = \tau \max\left( v^M(n, m),\ v^X(n, m),\ v^Y(n, m) \right)$$
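The three recursions and the termination step fit in a short dynamic program. A sketch that computes only the score v^E (traceback, which recovers the alignment itself, is omitted; names and conventions are mine, with sequences coded as integer arrays):

```python
import numpy as np

def pair_hmm_viterbi_score(x, y, p, q, delta, eps, tau):
    """Viterbi termination score v^E for the pair HMM.

    p[a, b]: match emission probabilities p_ab
    q[a]:    insert-state emission probabilities q_a
    """
    n, m = len(x), len(y)
    # Unreachable cells stay 0, which products and max handle correctly.
    vM = np.zeros((n + 1, m + 1)); vM[0, 0] = 1.0
    vX = np.zeros((n + 1, m + 1))
    vY = np.zeros((n + 1, m + 1))
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                vM[i, j] = p[x[i-1], y[j-1]] * max(
                    (1 - 2 * delta) * vM[i-1, j-1],
                    (1 - eps) * vX[i-1, j-1],
                    (1 - eps) * vY[i-1, j-1])
            if i > 0:
                vX[i, j] = q[x[i-1]] * max(delta * vM[i-1, j],
                                           eps * vX[i-1, j])
            if j > 0:
                vY[i, j] = q[y[j-1]] * max(delta * vM[i, j-1],
                                           eps * vY[i, j-1])
    return tau * max(vM[n, m], vX[n, m], vY[n, m])
```

As with the single-sequence Viterbi, a log-space version is preferred for long sequences.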
Appendix

$$\alpha_j^{(t)} = \sum_i \alpha_i^{(t-1)}\, p_{ij}, \qquad \alpha_k = \lim_{t \to \infty} \alpha_k^{(t)}$$

A mixture of k Gaussian components G(x | μ, Σ):
$$q(x) = \sum_{r=1}^{k} \alpha_r\, G\!\left(x \mid \mu_r, \Sigma_r\right)$$
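To make the mixture density concrete, here is a small sketch for one-dimensional components, where each Σ_r reduces to a variance σ_r²; all names and numbers are illustrative:

```python
import numpy as np

def gaussian_mixture_pdf(x, weights, means, sigmas):
    """q(x) = sum_r alpha_r * G(x | mu_r, sigma_r^2), 1-D components."""
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    for w, mu, s in zip(weights, means, sigmas):
        dens += w * np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return dens

# Hypothetical two-component mixture evaluated at two points:
print(gaussian_mixture_pdf([0.0, 1.0], [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]))
```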