Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576/
sroy@biostat.wisc.edu
Oct 21st, 2014
Recall the three questions in HMMs
- How likely is an HMM to have generated a given sequence? The forward algorithm (a short sketch follows this list)
- What is the most likely path for generating a sequence of observations? The Viterbi algorithm
- How can we learn an HMM from a set of sequences? Forward-backward, or Baum-Welch (an EM algorithm)
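As a concrete illustration of the first question, here is a minimal sketch of the forward algorithm in Python. The data layout (dictionaries a[k][l] and e[k][symbol], a begin state 0, a silent end state) and the toy parameters in the usage example are illustrative assumptions, not code from the course.

    # Minimal forward-algorithm sketch (assumed layout: a[k][l] transition probs,
    # e[k][sym] emission probs, begin state 0, a silent end state).
    def forward(x, states, end_state, a, e):
        """Return the forward matrix f[k][t] (positions 1..T) and P(x)."""
        T = len(x)
        f = {k: [0.0] * (T + 1) for k in states}
        # initialization (t = 1): transition out of the begin state 0
        for k in states:
            f[k][1] = a[0].get(k, 0.0) * e[k][x[0]]
        # recursion (t = 2 .. T)
        for t in range(2, T + 1):
            for l in states:
                f[l][t] = e[l][x[t - 1]] * sum(f[k][t - 1] * a[k].get(l, 0.0)
                                               for k in states)
        # termination: P(x) = sum_k f_k(T) * a_{k,end}
        px = sum(f[k][T] * a[k].get(end_state, 0.0) for k in states)
        return f, px

    # Hypothetical two-state toy HMM, just to show the call:
    a = {0: {1: 0.6, 2: 0.4},
         1: {1: 0.7, 2: 0.2, 3: 0.1},
         2: {1: 0.3, 2: 0.6, 3: 0.1}}
    e = {1: {'A': 0.5, 'C': 0.1, 'G': 0.2, 'T': 0.2},
         2: {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1}}
    f, px = forward("CAGT", states=[1, 2], end_state=3, a=a, e=e)
    print(px)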
Learning HMMs from data
Parameter estimation
- If we knew the state sequence, it would be easy to estimate the parameters (see the counting sketch below)
- But we need to work with hidden state sequences
- Use expected counts of state transitions
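To make the "easy" case concrete, here is a small sketch (not from the slides) of estimating transition probabilities by counting when the state paths are observed; the function name and the toy paths are illustrative assumptions. Emission probabilities would be estimated by the analogous character counts per state.

    from collections import defaultdict

    # Sketch: if state paths were observed, transition probabilities could be
    # estimated by counting transitions and normalizing (toy paths are made up).
    def estimate_transitions(paths):
        counts = defaultdict(lambda: defaultdict(float))
        for path in paths:                      # e.g. path = [0, 2, 2, 4, 4, 5]
            for k, l in zip(path, path[1:]):
                counts[k][l] += 1.0
        a = {}
        for k, outgoing in counts.items():
            total = sum(outgoing.values())
            a[k] = {l: c / total for l, c in outgoing.items()}
        return a

    print(estimate_transitions([[0, 2, 2, 4, 4, 5], [0, 1, 3, 3, 5]]))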
Reviewing the notation
- States with emissions will be numbered from 1 to K
- 0 denotes the begin state, N the end state
- x_t : observed character at position t
- x = x_1 ... x_T : observed sequence
- π = π_1 ... π_T : hidden state sequence, or path
- Transition probabilities: a_kl = P(π_t = l | π_{t-1} = k)
- Emission probabilities: e_k(b) = P(x_t = b | π_t = k)
[Figure: example HMM with begin state 0, emitting states 1-4, and end state 5; the path π = 0 2 2 4 4 5 generates the observed sequence C A G T]
To compute P(π_t = k | x)
[Figure: the example HMM, with begin state 0, emitting states 1-4 (and their emission probabilities), end state 5, the transition probabilities, and the observed sequence C A G T]
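For reference, the figure's parameters can be written out as plain Python dictionaries. The values below are my reading of the partly garbled figure, so treat them as an assumption rather than ground truth; the layout matches the earlier forward sketch.

    # Example HMM from the figure, as read off the (garbled) diagram:
    # begin state 0, emitting states 1-4, end state 5.
    transitions = {
        0: {1: 0.5, 2: 0.5},
        1: {1: 0.2, 3: 0.8},
        2: {2: 0.8, 4: 0.2},
        3: {3: 0.4, 5: 0.6},
        4: {4: 0.1, 5: 0.9},
    }
    emissions = {
        1: {'A': 0.4, 'C': 0.1, 'G': 0.2, 'T': 0.3},
        2: {'A': 0.4, 'C': 0.1, 'G': 0.1, 'T': 0.4},
        3: {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
        4: {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
    }
    x = "CAGT"  # observed sequence used in the worked example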
Example of computing P(π_3 = 4 | x)
[Figure: the same example HMM and the observed sequence x = C A G T]

P(π_3 = 4 | x) = P(π_3 = 4, x) / P(x) = f_4(3) b_4(3) / f_5(4)
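The same posterior can be computed in one line from the forward and backward matrices; the snippet below is a small sketch assuming the f[k][t], b[k][t] layout used in the forward sketch above, with P(x) already obtained from the forward termination step.

    # Sketch: posterior probability that state k was used at position t.
    def posterior_state_prob(f, b, px, k, t):
        return f[k][t] * b[k][t] / px

    # e.g. P(pi_3 = 4 | x) would be posterior_state_prob(f, b, px, k=4, t=3)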
Steps of the backward algorithm
- Initialization (t = T):  b_k(T) = a_kN  for all emitting states k
- Recursion (t = T-1 down to 1):  b_k(t) = Σ_l a_kl e_l(x_{t+1}) b_l(t+1)
- Termination:  P(x) = Σ_l a_0l e_l(x_1) b_l(1)
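A minimal Python sketch of these three steps, using the same assumed data layout as the forward sketch earlier (a[k][l] transition probabilities, e[k][symbol] emission probabilities, begin state 0, a silent end state):

    # Minimal backward-algorithm sketch; b[k][t] is defined for positions 1..T.
    def backward(x, states, end_state, a, e):
        T = len(x)
        b = {k: [0.0] * (T + 1) for k in states}
        # initialization (t = T): probability of going straight to the end state
        for k in states:
            b[k][T] = a[k].get(end_state, 0.0)
        # recursion (t = T-1 down to 1)
        for t in range(T - 1, 0, -1):
            for k in states:
                b[k][t] = sum(a[k].get(l, 0.0) * e[l][x[t]] * b[l][t + 1]
                              for l in states)   # x[t] is x_{t+1} in 0-based indexing
        # termination: P(x) = sum_l a_{0l} e_l(x_1) b_l(1)
        px = sum(a[0].get(l, 0.0) * e[l][x[0]] * b[l][1] for l in states)
        return b, px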
The expectation step: transition counts
Expected number of transitions from state k to state l:

n_{k→l} = Σ_x (1 / P(x)) Σ_t f_k(t) a_kl e_l(x_{t+1}) b_l(t+1)

where the outer sum runs over the training sequences.
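Sketched in Python (same assumed layout as the forward/backward sketches above), the expected transition count contributed by one training sequence is:

    # Sketch of the expected transition count n_{k->l} for a single sequence x,
    # given the forward/backward matrices f, b and P(x).
    def expected_transition_count(x, k, l, a, e, f, b, px):
        T = len(x)
        total = 0.0
        for t in range(1, T):                    # transitions between positions t and t+1
            total += f[k][t] * a[k].get(l, 0.0) * e[l][x[t]] * b[l][t + 1]
        return total / px

    # For multiple training sequences, the per-sequence counts are summed.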
The maximization step

Expected number of 1→2 transitions, with contributions from two training sequences, each normalized by its own P(x) = f_3(3):

n_{1→2} = [ f_1(1) a_12 e_2(A) b_2(2) + f_1(2) a_12 e_2(G) b_2(3) ] / f_3(3)
        + [ f_1(1) a_12 e_2(C) b_2(2) + f_1(2) a_12 e_2(G) b_2(3) ] / f_3(3)

Re-estimated transition probabilities:

a_11 = n_{1→1} / (n_{1→1} + n_{1→2})
a_12 = n_{1→2} / (n_{1→1} + n_{1→2})
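As a small sketch (names are assumptions), the M-step simply renormalizes the expected counts for each source state:

    # Sketch of the M-step: turn expected transition counts n[k][l] into
    # updated transition probabilities a[k][l].
    def maximization_step(n):
        a = {}
        for k, outgoing in n.items():
            total = sum(outgoing.values())
            a[k] = {l: count / total for l, count in outgoing.items()}
        return a

    # e.g. with expected counts n = {1: {1: 0.3, 2: 1.7}}:
    #   a_11 = 0.3 / 2.0 = 0.15,  a_12 = 1.7 / 2.0 = 0.85
    print(maximization_step({1: {1: 0.3, 2: 1.7}}))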
Computational complexity of HMM algorithms
- Given an HMM with S states and a sequence of length L, the complexity of the Forward, Backward, and Viterbi algorithms is O(S²L): each of the L positions updates S states, and each update sums (or maximizes) over S possible predecessor states.
- This assumes that the states are densely interconnected.
- Given M sequences of length L, the complexity of Baum-Welch on each iteration is O(MS²L).
Baum-Welch convergence
Summary