
Algorithms in Bioinformatics

Lecture 6
Hidden Markov Model for Sequence Alignment

Outline
- Hidden Markov Model (HMM)
  - From Finite State Machine to Finite Markov Model
  - From Finite Markov Model to Hidden Markov Model
- Find the Most Probable Path for HMM
- Parameter Estimation for HMM (by EM Algorithm)
- HMM for Sequence Alignment
- Appendix

Finite Markov Chain
(Figure: a finite Markov chain with four states.)

Sequence Modeled by a Finite Markov Chain

- For instance, the sequence CA...TC is modeled as the chain of states C → A → ... → T → C.

- Similarly, (X1, ..., Xi, ...) is a sequence of probability distributions over D, modeled as the chain X1 → X2 → ... → Xn−1 → Xn.

(Figure: a general Markov chain with states 1, ..., N and state probabilities α1, ..., αN.)

For the two-state chain over {e, f}, the state distribution evolves under the transition matrix:

$[p(e), p(f)]_{t+1} = [p(e), p(f)]_t \begin{pmatrix} p(e\mid e) & p(f\mid e) \\ p(e\mid f) & p(f\mid f) \end{pmatrix}$
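As a small illustration (not from the slides, and with invented transition probabilities), repeatedly applying this update drives the state distribution to the stationary distribution of the chain:

```python
import numpy as np

# Hypothetical transition probabilities for the two-state chain {e, f}.
# Row i holds p(next state | current state i), so each row sums to 1.
P = np.array([[0.9, 0.1],   # p(e|e), p(f|e)
              [0.2, 0.8]])  # p(e|f), p(f|f)

dist = np.array([1.0, 0.0])  # start with p(e)=1, p(f)=0
for t in range(100):
    dist = dist @ P          # [p(e), p(f)]_{t+1} = [p(e), p(f)]_t * P

print(dist)  # approaches the stationary distribution (here 2/3, 1/3)
```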

(Figure: the two-state chain with states e and f, transition probabilities p(e|e), p(f|e), p(e|f), p(f|f), and emission probabilities p(A|e), p(C|e), p(T|e), p(G|e) and p(A|f), p(C|f), p(T|f), p(G|f) over the symbols A, C, T, G.)

$\begin{pmatrix} p(A\mid e) & p(C\mid e) & p(T\mid e) & p(G\mid e) \\ p(A\mid f) & p(C\mid f) & p(T\mid f) & p(G\mid f) \end{pmatrix}, \qquad p(A) = p(e)\,p(A\mid e) + p(f)\,p(A\mid f)$

HMM: An example

From Finite Markov Model to Hidden Markov Model

If we add an output (emission) to each state of the finite Markov model, it becomes a Hidden Markov Model.

(Figure: the HMM as a chain of hidden states S1 → S2 → ... → SL−1 → SL, with each state Si emitting a symbol xi.)

Notation: the sequence of output symbols (x1, ..., xL) is modeled by the emission probabilities p(Xi = b | Si = s) = e_s(b).

Why is it called a Hidden Markov Model?
Because the sequence of states is hidden!

Hidden Markov Model
(Figure: hidden states S1, ..., SL emitting symbols x1, ..., xL.)

Given the "visible" sequence x = (x1, ..., xL): how do we find the most probable (hidden) path?
The Viterbi algorithm.

(Figure: the same HMM drawn with the two hidden states e and f, their transition probabilities, and their emission probabilities over A, C, T, G.)
Given the "visible" sequence x = (x1, ..., xL): how do we estimate the parameters of the HMM?
The Baum-Welch algorithm.

(Figure: a general Markov chain with states 1, ..., N, transition probabilities p_ij, and state probabilities α1, ..., αN.)

$\alpha_i^{(t)} = \sum_j \alpha_j^{(t-1)} p_{ij}, \qquad \alpha_k = \lim_{t \to \infty} \alpha_k^{(t)}$

How many states?

(Figure: the two-state chain over e and f with its transition probabilities and its emission probabilities over A, C, T, G.)

$p(A) = p(e)\,p(A \mid e) + p(f)\,p(A \mid f)$, and in general $p(X) = p(e)\,p(X \mid e) + p(f)\,p(X \mid f)$.


1. Most Probable state path
(Figure: hidden states S1, ..., SL emitting symbols x1, ..., xL.)

First question: given an output sequence x = (x1, ..., xL), a most probable path s* = (s*1, ..., s*L) is one which maximizes p(s | x):

$s^* = (s_1^*, \ldots, s_L^*) = \arg\max_{(s_1, \ldots, s_L)} p(s_1, \ldots, s_L \mid x_1, \ldots, x_L)$


Most Probable path (cont.)

(Figure: hidden states S1, ..., SL emitting symbols x1, ..., xL.)

Since $p(s \mid x) = \frac{p(s, x)}{p(x)} \propto p(s, x)$,

we need to find the s which maximizes p(s, x).


Model 3: Hidden Markov Chain

For a given sequence ATCGCCGGGA, assume that:
- The probability of each letter occurring in the sequence does not depend on known factors.
- The probability of each letter occurring in the sequence depends on some unknown status (hidden state).

For example, suppose there are two hidden states, e and f.
- When the hidden state is e, we assume that A, C, T and G follow a probability distribution, denoted p(A|e), p(C|e), p(T|e), p(G|e) (conditional probabilities).
- When the hidden state is f, we assume that A, C, T and G follow a probability distribution, denoted p(A|f), p(C|f), p(T|f), p(G|f) (conditional probabilities).


Viterbi’s algorithm for most probable path

(Figure: hidden states s1, s2, ..., si emitting x1, x2, ..., xi.)

The task: compute $\arg\max_{(s_1, \ldots, s_L)} p(s_1, \ldots, s_L;\, x_1, \ldots, x_L)$

Idea: for i = 1, ..., L and for each state l, compute
  v_l(i) = the probability p(s1, ..., s_{i-1}, s_i = l; x1, ..., xi) of a most probable path up to position i which ends in state l.

Exercise: for i = 1, ..., L and for each state l:

$v_l(i) = e_l(x_i) \cdot \max_k \{ v_k(i-1) \cdot a_{kl} \}$


Dependence relations

$p(y \mid x)\, p(x) = p(x, y) = p(x \mid y)\, p(y)$

$p(x) = \sum_y p(x, y)$

$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\, p(y)}{p(x)} = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x, y')}$
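As a quick numeric check of the last identity (the numbers are invented): with $p(e)=0.6$, $p(f)=0.4$, $p(A\mid e)=0.2$, $p(A\mid f)=0.5$,

$p(A) = p(e)\,p(A\mid e) + p(f)\,p(A\mid f) = 0.6\cdot 0.2 + 0.4\cdot 0.5 = 0.32, \qquad p(e \mid A) = \frac{p(A\mid e)\,p(e)}{p(A)} = \frac{0.2\cdot 0.6}{0.32} = 0.375$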


v_l(i) is the probability of a most probable path up to position i which ends in state l. Factor the joint probability:

$p(s_1, \ldots, s_{i-1}, s_i = l, x_1, \ldots, x_i) = p(s_1, \ldots, s_{i-1}, s_i = l, x_1, \ldots, x_{i-1}) \cdot p(x_i \mid s_i = l)$

$p(s_1, \ldots, s_{i-1}, s_i = l, x_1, \ldots, x_{i-1}) = p(s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1}) \cdot p(s_i = l \mid s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1})$

By the Markov property, $p(s_i = l \mid s_1, \ldots, s_{i-1} = k, x_1, \ldots, x_{i-1}) = p(s_i = l \mid s_{i-1} = k) = a_{kl}$, and $p(x_i \mid s_i = l) = e_l(x_i)$.

Maximizing over the previous states therefore gives

$v_l(i) = e_l(x_i) \cdot \max_k \{ v_k(i-1) \cdot a_{kl} \}$

(Figures: the HMM chain s1, ..., s_{i-1}, s_i with emissions x1, ..., x_{i-1}, x_i, illustrating the two conditional-independence facts used above.)

$p(s_i \mid s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1}) = p(s_i \mid s_{i-1})$

$p(s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1} \mid s_i = l, x_i) = p(s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1} \mid s_i = l)$

Viterbi’s algorithm
(Figure: a special initial state 0, followed by hidden states s1, ..., sL emitting x1, ..., xL.)

We add a special initial state 0.

Initialization: v_0(0) = 1; v_k(0) = 0 for k > 0.

For i = 1 to L, for each state l:
  v_l(i) = e_l(x_i) · max_k { v_k(i−1) a_kl }
  ptr_i(l) = argmax_k { v_k(i−1) a_kl }   [store the previous state, for reconstructing the path]

Termination:
  Result: p(s*_1, ..., s*_L; x_1, ..., x_L) = max_k { v_k(L) }
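For concreteness, a minimal Python sketch of this algorithm (not part of the lecture): the two-state parameters below are made up, and the computation is done in log space, which the slides do not discuss, to avoid numerical underflow.

```python
import math

def viterbi(x, states, a, e, a0):
    """Most probable state path for the emitted sequence x.
    a[k][l]: transition prob k->l; e[k][b]: emission prob of symbol b in state k;
    a0[k]: transition prob from the special initial state 0 to state k.
    Works in log space to avoid numerical underflow."""
    L = len(x)
    v = [{k: math.log(a0[k]) + math.log(e[k][x[0]]) for k in states}]
    ptr = [{}]
    for i in range(1, L):
        v.append({})
        ptr.append({})
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] + math.log(a[k][l]))
            ptr[i][l] = best_k
            v[i][l] = math.log(e[l][x[i]]) + v[i - 1][best_k] + math.log(a[best_k][l])
    last = max(states, key=lambda k: v[L - 1][k])
    path = [last]
    for i in range(L - 1, 0, -1):          # trace back through the stored pointers
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), v[L - 1][last]

# Hypothetical two-state HMM over {e, f}, emitting A, C, G, T (parameters invented).
states = ["e", "f"]
a0 = {"e": 0.5, "f": 0.5}
a = {"e": {"e": 0.9, "f": 0.1}, "f": {"e": 0.2, "f": 0.8}}
e = {"e": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
     "f": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
print(viterbi("ATCGCCGGGA", states, a, e, a0))
```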

Parameter Estimation for HMM
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

An HMM model is defined by the parameters: akl and ek(b), for all
states k,l and all symbols b.
Let θ denote the collection of these parameters.
(Figure: the transition parameter a_kl from state k to state l, and the emission parameter e_k(b) for emitting symbol b in state k.)

(Figure: the training set of sequences {X1, ..., Xj, ..., Xn}.)

Case 1: Sequences are fully known
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

We know the complete structure (state path and emitted symbols) of each sequence in the training set {X1, ..., Xn}. We wish to estimate a_kl and e_k(b) for all pairs of states k, l and all symbols b.


Case 1 (Cont)
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

For each state k, the parameters {a_kl | l = 1, ..., m} and {e_k(b) | b ∈ Σ} are estimated by

$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$

where A_kl denotes the number of transitions from state k to state l, and E_k(b) denotes the number of times symbol b is emitted in state k.
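A small Python sketch of this counting estimator (assuming the labeled state paths are available; the toy data are invented). In practice, pseudocounts are usually added to avoid zero probabilities:

```python
from collections import defaultdict

def estimate_parameters(training_set):
    """training_set: list of (state_path, symbols) pairs of equal length.
    Returns counting-based ML estimates of a_kl and e_k(b)."""
    A = defaultdict(lambda: defaultdict(float))  # A[k][l]: number of transitions k -> l
    E = defaultdict(lambda: defaultdict(float))  # E[k][b]: number of emissions of b in state k
    for path, seq in training_set:
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
        for k, b in zip(path, seq):
            E[k][b] += 1
    a = {k: {l: n / sum(row.values()) for l, n in row.items()} for k, row in A.items()}
    e = {k: {b: n / sum(row.values()) for b, n in row.items()} for k, row in E.items()}
    return a, e

# Toy labeled data: hidden state paths over {e, f} and emitted DNA symbols (made up).
data = [("eeff", "ATCG"), ("efff", "ACGG")]
print(estimate_parameters(data))
```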


ML for Parameter Estimation (CASE 1)
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

For each sequence Xj we have:

$p(X^j \mid \theta) = \prod_i a_{s_{i-1} s_i}\, e_{s_i}(x_i^j)$


Case 1 (cont)
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

Thus, if A_kl = #(transitions from k to l) in the training set, and E_k(b) = #(emissions of symbol b from state k) in the training set, we have:

$p(X^1, \ldots, X^n \mid \theta) = \prod_{(k,l)} a_{kl}^{A_{kl}} \prod_{(k,b)} [e_k(b)]^{E_k(b)}$


Case 1 (cont)
So we need to find the a_kl's and e_k(b)'s which maximize:

$\prod_{(k,l)} a_{kl}^{A_{kl}} \; \prod_{(k,b)} [e_k(b)]^{E_k(b)}$

subject to: for all states k,

$\sum_l a_{kl} = 1 \quad \text{and} \quad \sum_b e_k(b) = 1, \qquad a_{kl},\, e_k(b) \ge 0$

Generalization to a distribution with any number k of outcomes

Let X be a random variable with k values x1, ..., xk denoting the k outcomes of iid experiments, with parameters θ = {θ1, θ2, ..., θk} (θi is the probability of xi).
Again, the data is one sequence of length n:
  Data = (x_{i1}, x_{i2}, ..., x_{in})
Then we have to maximize

$P(Data \mid \theta) = \theta_1^{n_1} \theta_2^{n_2} \cdots \theta_k^{n_k}, \qquad (n_1 + \cdots + n_k = n)$

subject to θ1 + θ2 + ... + θk = 1, i.e.,

$P(Data \mid \theta) = \theta_1^{n_1} \cdots \theta_{k-1}^{n_{k-1}} \left(1 - \sum_{i=1}^{k-1} \theta_i\right)^{n_k}$

Generalization to k outcomes (cont.)

By a treatment identical to the die case, the maximum is obtained when, for all i:

$\frac{n_i}{\theta_i} = \frac{n_k}{\theta_k}$

Hence the MLE is given by:

$\theta_i = \frac{n_i}{n}, \qquad i = 1, \ldots, k$
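One standard way to obtain this result, sketched with a Lagrange multiplier (not spelled out on the slides):

$\mathcal{L}(\theta, \lambda) = \sum_{i=1}^{k} n_i \log \theta_i - \lambda \left( \sum_{i=1}^{k} \theta_i - 1 \right), \qquad \frac{\partial \mathcal{L}}{\partial \theta_i} = \frac{n_i}{\theta_i} - \lambda = 0 \;\Rightarrow\; \theta_i = \frac{n_i}{\lambda}$

Enforcing the constraint $\sum_i \theta_i = 1$ gives $\lambda = \sum_i n_i = n$, hence $\theta_i = n_i / n$.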


Fractional Exponents

Some models allow n_i's which are not integers (e.g., when we are uncertain of a die outcome and consider it "6" with 20% confidence and "5" with 80% confidence).

We can still write

$P(Data \mid \theta) = \theta_1^{n_1} \theta_2^{n_2} \cdots \theta_k^{n_k}, \qquad (n_1 + \cdots + n_k = n)$

and the same analysis yields:

$\theta_i = \frac{n_i}{n}, \qquad i = 1, \ldots, k$

Side comment: Sufficient Statistics
- To compute the probability of the data in the die example, we only need to record the number of times N_i the die fell on side i (namely N1, N2, ..., N6).
- We do not need to recall the entire sequence of outcomes:

$P(Data \mid \Theta) = \theta_1^{N_1} \theta_2^{N_2} \theta_3^{N_3} \theta_4^{N_4} \theta_5^{N_5} \left(1 - \sum_{i=1}^{5} \theta_i\right)^{N_6}$

- {N_i | i = 1, ..., 6} is called a sufficient statistic for multinomial sampling.


Sufficient Statistics
- A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood.
- Formally, s(Data) is a sufficient statistic if, for any two datasets D and D':
  s(D) = s(D') ⇒ P(D | θ) = P(D' | θ)

Exercise: define sufficient statistics for the HMM model.

(Figure: many datasets mapping to the same statistic.)

Case 2: State paths are unknown:
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

In this case only the values of the xi's of the input sequences are known.
This is an ML problem with "missing data".
We wish to find θ* so that p(x | θ*) = max_θ p(x | θ).
For each sequence x,
  p(x | θ) = Σ_s p(x, s | θ),
where the sum is taken over all state paths s.


Case 2: ML Parameter Estimation for HMM


Informally, the general process for finding θ in this case is:
1. Start with an initial value of θ.
2. Find θ' so that p(X1, ..., Xn | θ') > p(X1, ..., Xn | θ).
3. Set θ = θ'.
4. Repeat until some convergence criterion is met.

A general algorithm of this type is the Expectation Maximization (EM) algorithm.


Baum-Welch training (EM algorithm for HMM)
The process is iterated as follows:
- Estimate A_kl and E_k(b) by considering probable paths for the training sequences, using the current values of a_kl and e_k(b).
- Use the approach from Case 1 to derive new values of the a's and e's:

$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$

where A_kl denotes the (expected) number of transitions from state k to state l, and E_k(b) denotes the (expected) number of times symbol b is emitted in state k.

Baum-Welch: Step 1


(Figure: hidden states s1, ..., s_{i-1}, s_i, ..., sL emitting x1, ..., x_{i-1}, x_i, ..., xL.)

Count the expected number of state transitions: for each sequence Xj, for each position i and for each pair of states k, l, compute the posterior state-transition probabilities:

P(s_{i-1} = k, s_i = l | Xj, θ)

Step 1: Computing P(si-1=k, si=l | Xj,θ)

(Figure: hidden states s1, ..., s_{i-1}, s_i, ..., sL emitting x1, ..., xL.)

$P(x_1, \ldots, x_L, s_{i-1}=k, s_i=l) = \underbrace{P(x_1, \ldots, x_{i-1}, s_{i-1}=k)}_{f_k(i-1),\ \text{via the forward algorithm}} \; a_{kl}\, e_l(x_i) \; \underbrace{P(x_{i+1}, \ldots, x_L \mid s_i=l)}_{b_l(i),\ \text{via the backward algorithm}}$

$p(s_{i-1}=k, s_i=l \mid X^j) = \frac{f_k(i-1)\, a_{kl}\, e_l(x_i)\, b_l(i)}{P(x_1, \ldots, x_L)}$

$P(x_1, \ldots, x_L, s_i) = P(x_1, \ldots, x_i, s_i)\, P(x_{i+1}, \ldots, x_L \mid s_i) \equiv f(s_i)\, b(s_i)$



Step 1: Computing P(si-1=k, si=l | Xj,θ)

(Figure: hidden states s1, ..., s_{i-1}, s_i, ..., sL emitting x1, ..., xL.)

$P(x_1, \ldots, x_L, s_{i-1}=k, s_i=l) = P(x_1, \ldots, x_{i-1}, s_{i-1}=k)\; a_{kl}\, e_l(x_i)\; P(x_{i+1}, \ldots, x_L \mid s_i=l) = f_k(i-1)\, a_{kl}\, e_l(x_i)\, b_l(i)$

where f is computed via the forward algorithm and b via the backward algorithm. Since $P(x_1, \ldots, x_L) = \sum_{k'} \sum_{l'} f_{k'}(i-1)\, a_{k'l'}\, e_{l'}(x_i)\, b_{l'}(i)$,

$p(s_{i-1}=k, s_i=l \mid X^j) = \frac{f_k(i-1)\, a_{kl}\, e_l(x_i)\, b_l(i)}{\sum_{k'} \sum_{l'} f_{k'}(i-1)\, a_{k'l'}\, e_{l'}(x_i)\, b_{l'}(i)}$

The forward algorithm
(Figure: hidden states s1, s2, ..., si emitting x1, x2, ..., xi.)

The task: compute f(s_i) = P(x_1, ..., x_i, s_i) for i = 1, ..., L (namely, considering the evidence up to time slot i).

{Basis step}
$P(x_1, s_1) = P(s_1)\, P(x_1 \mid s_1)$

{Second step}
$P(x_1, x_2, s_2) = \sum_{s_1} P(x_1, s_1, s_2, x_2) = \sum_{s_1} P(x_1, s_1)\, P(s_2 \mid x_1, s_1)\, P(x_2 \mid x_1, s_1, s_2) = \sum_{s_1} P(x_1, s_1)\, P(s_2 \mid s_1)\, P(x_2 \mid s_2)$
(the last equality is due to conditional independence)

{Step i}
$P(x_1, \ldots, x_i, s_i) = \sum_{s_{i-1}} P(x_1, \ldots, x_{i-1}, s_{i-1})\, P(s_i \mid s_{i-1})\, P(x_i \mid s_i)$
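A direct transcription of this recursion in Python (a sketch; the parameter layout follows the Viterbi example above, and for long sequences one would add scaling or work in log space):

```python
def forward(x, states, a, e, a0):
    """f[i][k] = P(x_1..x_{i+1}, s_{i+1} = k), computed left to right.
    Returns the forward table and the likelihood P(x_1, ..., x_L)."""
    f = [{k: a0[k] * e[k][x[0]] for k in states}]           # basis step
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
                  for l in states})                          # step i
    likelihood = sum(f[-1][k] for k in states)               # one more summation (next slide)
    return f, likelihood
```

For example, `f, px = forward("ATCG", states, a, e, a0)` with the made-up parameters from the Viterbi sketch returns the forward table and the likelihood of the evidence.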

Likelihood of evidence
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

To compute the likelihood of the evidence, P(x_1, ..., x_L), do one more step in the forward algorithm, namely:

$\sum_{s_L} f(s_L) = \sum_{s_L} P(x_1, \ldots, x_L, s_L) = P(x_1, \ldots, x_L)$


The backward algorithm
(Figure: hidden states s_i, s_{i+1}, ..., sL emitting x_{i+1}, ..., xL.)

The task: compute b(s_i) = P(x_{i+1}, ..., x_L | s_i) for i = L−1, ..., 1 (namely, considering the evidence after time slot i).

{First step}
$P(x_L \mid s_{L-1}) = \sum_{s_L} P(x_L, s_L \mid s_{L-1}) = \sum_{s_L} P(s_L \mid s_{L-1})\, P(x_L \mid s_{L-1}, s_L) = \sum_{s_L} P(s_L \mid s_{L-1})\, P(x_L \mid s_L)$
(the last equality is due to conditional independence)

{Step i}
$\underbrace{P(x_{i+1}, \ldots, x_L \mid s_i)}_{b(s_i)} = \sum_{s_{i+1}} P(s_{i+1} \mid s_i)\, P(x_{i+1} \mid s_{i+1})\, \underbrace{P(x_{i+2}, \ldots, x_L \mid s_{i+1})}_{b(s_{i+1})}$
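And the corresponding backward pass, under the same assumptions as the forward sketch above:

```python
def backward(x, states, a, e):
    """b[i][k] = P(x_{i+2}..x_L | s_{i+1} = k), computed right to left."""
    L = len(x)
    b = [dict() for _ in range(L)]
    b[L - 1] = {k: 1.0 for k in states}                      # no evidence after position L
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(a[k][l] * e[l][x[i + 1]] * b[i + 1][l] for l in states)
                for k in states}                             # step i
    return b
```

As on the next slide, the likelihood can also be recovered from the backward table as Σ_{s1} a0[s1] · e[s1][x[0]] · b[0][s1].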

Likelihood of evidence
(Figure: hidden states s1, ..., sL emitting x1, ..., xL.)

Do one more step in the backward algorithm, namely:

$\sum_{s_1} b(s_1)\, P(s_1)\, P(x_1 \mid s_1) = \sum_{s_1} P(x_2, \ldots, x_L \mid s_1)\, P(s_1)\, P(x_1 \mid s_1) = P(x_1, \ldots, x_L)$


Step 1
For each pair (k, l), compute the expected number of state transitions from k to l:

$A_{kl} = \sum_{j=1}^{n} \frac{1}{p(X^j)} \sum_{i=1}^{L} p(s_{i-1}=k, s_i=l, X^j \mid \theta) = \sum_{j=1}^{n} \frac{1}{p(X^j)} \sum_{i=1}^{L} f_k^j(i-1)\, a_{kl}\, e_l(x_i^j)\, b_l^j(i)$

Baum-Welch: Step 2
(Figure: hidden states s1, ..., sL emitting x1, ..., xL; state k emits symbol b.)

For each state k and each symbol b, compute the expected number of emissions of b from k:

$E_k(b) = \sum_{j=1}^{n} \frac{1}{p(X^j)} \sum_{i:\, x_i^j = b} f_k^j(i)\, b_k^j(i)$

$P(x_1, \ldots, x_L, s_i) = P(x_1, \ldots, x_i, s_i)\, P(x_{i+1}, \ldots, x_L \mid s_i) \equiv f(s_i)\, b(s_i)$


Baum-Welch: step 3

Use the A_kl's and E_k(b)'s to compute the new values of a_kl and e_k(b). These values define θ*:

$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$

It can be shown that:
  p(X1, ..., Xn | θ*) ≥ p(X1, ..., Xn | θ),
i.e., θ* increases (or at least does not decrease) the probability of the data.

This procedure is iterated until some convergence criterion is met.
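Putting the three steps together, a compact sketch of one Baum-Welch iteration in Python. It reuses the forward and backward sketches given earlier, keeps the initial-state probabilities fixed, and applies no scaling, so it is only an illustration for short sequences; none of these choices come from the slides.

```python
def baum_welch_step(seqs, states, symbols, a, e, a0):
    """One EM iteration: the E-step accumulates expected counts A_kl and E_k(b),
    the M-step renormalizes them into new parameters.
    (The initial-state probabilities a0 are kept fixed in this sketch.)"""
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {b: 0.0 for b in symbols} for k in states}
    for x in seqs:
        f, px = forward(x, states, a, e, a0)
        b = backward(x, states, a, e)
        for i in range(1, len(x)):                   # expected transition counts
            for k in states:
                for l in states:
                    A[k][l] += f[i - 1][k] * a[k][l] * e[l][x[i]] * b[i][l] / px
        for i in range(len(x)):                      # expected emission counts
            for k in states:
                E[k][x[i]] += f[i][k] * b[i][k] / px
    new_a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_e = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
    return new_a, new_e
```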

Sequence Comparison using HMM

"Hidden" states and the symbols they emit:
- Match (M): emits pairs {(a, b) | a, b ∈ Σ}
- Insertion in x (X): emits pairs {(a, −) | a ∈ Σ}
- Insertion in y (Y): emits pairs {(−, a) | a ∈ Σ}

- We call this type of model a pair HMM, to distinguish it from the standard HMMs that emit a single sequence.
- Each aligned pair of sequences is generated by this HMM with a certain probability.
- The most probable path is the best alignment. (Why?)


The Transition Probabilities
Transition probabilities (row = from state, column = to state):

        M      X      Y
  M   1−2δ     δ      δ
  X   1−ε      ε
  Y   1−ε             ε

δ: probability of a transition from M to an insert state
ε: probability of staying in an insert state

Emission probabilities:
- Match: (a, b) with probability p_ab – only from the M state
- Insertion in x: (a, −) with probability q_a – only from the X state
- Insertion in y: (−, a) with probability q_a – only from the Y state


Adding termination probabilities


We may want a model which defines a probability distribution over all possible sequences. For this, an END state is added, with transition probability τ from any other state to END. This assumes an expected sequence length of 1/τ.

          M       X      Y     END
  M    1−2δ−τ     δ      δ      τ
  X    1−ε−τ      ε             τ
  Y    1−ε−τ             ε      τ
  END                           1

HMM for Sequence Alignment: detailed algorithm

- The most probable path is the best alignment.

Let vM(i, j) be the probability of the most probable alignment of x(1..i) and y(1..j) which ends with a match. Then, using a recursive argument, we get:

$v^M(i, j) = p_{x_i y_j} \max \begin{cases} (1 - 2\delta)\, v^M(i-1, j-1) \\ (1 - \varepsilon)\, v^X(i-1, j-1) \\ (1 - \varepsilon)\, v^Y(i-1, j-1) \end{cases}$


Most probable path


Similarly, vX(i, j) and vY(i, j) are the probabilities of the most probable alignment of x(1..i) and y(1..j) which ends with an insertion into x or into y, respectively:

$v^X(i, j) = q_{x_i} \max \begin{cases} \delta\, v^M(i-1, j) \\ \varepsilon\, v^X(i-1, j) \end{cases}$

$v^Y(i, j) = q_{y_j} \max \begin{cases} \delta\, v^M(i, j-1) \\ \varepsilon\, v^Y(i, j-1) \end{cases}$

Termination:

$v^E = \tau \max\left( v^M(n, m),\; v^X(n, m),\; v^Y(n, m) \right)$
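A sketch of these recursions in Python (illustration only: it returns just the termination probability v^E without a traceback, and the match/gap parameters in the usage example are invented, not taken from the lecture):

```python
def pair_hmm_viterbi(x, y, p, q, delta, eps, tau):
    """Probability of the most probable pair-HMM alignment of x and y (no traceback).
    p[(a, b)]: match emission probability; q[a]: gap/insert emission probability."""
    n, m = len(x), len(y)
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0                                   # the alignment starts in the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                vM[i][j] = p[(x[i - 1], y[j - 1])] * max(
                    (1 - 2 * delta) * vM[i - 1][j - 1],
                    (1 - eps) * vX[i - 1][j - 1],
                    (1 - eps) * vY[i - 1][j - 1])
            if i > 0:
                vX[i][j] = q[x[i - 1]] * max(delta * vM[i - 1][j],
                                             eps * vX[i - 1][j])
            if j > 0:
                vY[i][j] = q[y[j - 1]] * max(delta * vM[i][j - 1],
                                             eps * vY[i][j - 1])
    return tau * max(vM[n][m], vX[n][m], vY[n][m])   # termination step

# Toy parameters (made up): DNA alphabet, higher probability for identical pairs.
alphabet = "ACGT"
p = {(a, b): (0.15 if a == b else 0.4 / 12) for a in alphabet for b in alphabet}
q = {a: 0.25 for a in alphabet}
print(pair_hmm_viterbi("ACGT", "AGT", p, q, delta=0.2, eps=0.3, tau=0.1))
```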

Appendix

(Figure: a general Markov chain with states 1, ..., N, transition probabilities p_ij, and state probabilities α_i, where $\alpha_i^{(t)} = \sum_j \alpha_j^{(t-1)} p_{ij}$ and $\alpha_k = \lim_{t \to \infty} \alpha_k^{(t)}$.)

Either a single Gaussian $G(x \mid \mu, \Sigma)$ or a mixture of Gaussians:

$q(x) = \sum_{r=1}^{k} \alpha_r\, G(x \mid \mu_r, \Sigma_r)$

