
Learning HMM parameters

Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576/
sroy@biostat.wisc.edu
Oct 21st, 2014
Recall the three questions in HMMs

How likely is an HMM to have generated a given sequence?
  Forward algorithm
What is the most likely path for generating a sequence of observations?
  Viterbi algorithm
How can we learn an HMM from a set of sequences?
  Forward-backward or Baum-Welch (an EM algorithm)
Learning HMMs from data

Parameter estimation
  If we knew the state sequence, it would be easy to estimate the parameters
  But we need to work with hidden state sequences
  Use expected counts of state transitions
Reviewing the notation

States with emissions will be numbered from 1 to K
  0 is the begin state, N is the end state
x_t: observed character at position t
x = x_1 ... x_T: observed sequence
pi = pi_1 ... pi_T: hidden state sequence or path
a_kl: transition probability from state k to state l
e_k(b): emission probability, the probability of emitting symbol b from state k
Learning without hidden information

Learning is simple if we know the correct path for each sequence in our training set

[Figure: example HMM with begin state 0, emitting states 1-4, and end state 5; the observed sequence C A G T is shown with its known path 0 -> 2 -> 2 -> 4 -> 4 -> 5]

Estimate parameters by counting the number of times each parameter is used across the training set
Learning without hidden information

Transition probabilities
  $a_{kl} = \frac{n_{kl}}{\sum_{l'} n_{kl'}}$ where n_kl is the number of transitions from state k to state l in the training set
Emission probabilities
  $e_k(c) = \frac{n_{k,c}}{\sum_{c'} n_{k,c'}}$ where n_k,c is the number of times c is emitted from state k
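As an illustration (my own sketch, not code from the lecture), here is how this counting scheme might look in Python; the data encodings (integer state labels, string sequences, paths that include the begin and end states) are assumptions:

```python
from collections import defaultdict

def estimate_from_labeled_paths(seqs, paths, n_states, alphabet):
    """MLE of transition (a_kl) and emission (e_k(c)) probabilities when the
    state path for each training sequence is known."""
    n_trans = defaultdict(float)  # n_trans[k, l]: number of k -> l transitions observed
    n_emit = defaultdict(float)   # n_emit[k, c]:  number of times state k emits symbol c

    for x, pi in zip(seqs, paths):
        for k, l in zip(pi, pi[1:]):        # count every transition along the known path
            n_trans[k, l] += 1
        for c, k in zip(x, pi[1:-1]):       # emitting states sit between begin and end
            n_emit[k, c] += 1

    # normalize the counts per state to turn them into probabilities
    a = {(k, l): n_trans[k, l] / max(sum(n_trans[k, m] for m in range(n_states)), 1)
         for k in range(n_states) for l in range(n_states)}
    e = {(k, c): n_emit[k, c] / max(sum(n_emit[k, d] for d in alphabet), 1)
         for k in range(n_states) for c in alphabet}
    return a, e

# e.g. the labeled example above: sequence CAGT with path 0 -> 2 -> 2 -> 4 -> 4 -> 5
a, e = estimate_from_labeled_paths(["CAGT"], [[0, 2, 2, 4, 4, 5]], 6, "ACGT")
```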


Learning with hidden information

If we don't know the correct path for each sequence in our training set, consider all possible paths for the sequence
[Figure: the same example HMM; the observed sequence C A G T is shown with an unknown path 0 -> ? -> ? -> ? -> ? -> 5]

Estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set
The Baum-Welch algorithm

Also known as the forward-backward algorithm
An Expectation Maximization (EM) algorithm
  EM is a family of algorithms for learning probabilistic models in problems that involve hidden information
Expectation: estimate the expected number of times there are transitions and emissions (using current values of parameters)
Maximization: estimate parameters given expected counts
Hidden variables are the state transitions and emission counts
Learning parameters: the Baum-Welch algorithm

Algorithm sketch:
  initialize parameters of model
  iterate until convergence
    calculate the expected number of times each transition or emission is used
    adjust the parameters to maximize the likelihood of these expected values
The expectation step

We need to know the probability of the symbol at t being produced by state k, given the entire sequence x: $P(\pi_t = k \mid x)$
We also need to know the probability of the symbols at t and t+1 being produced by states k and l respectively, given the sequence x: $P(\pi_t = k, \pi_{t+1} = l \mid x)$
Given these we can compute our expected counts for state transitions and character emissions
Computing $P(\pi_t = k \mid x)$

First we compute the probability of the entire observed sequence with the t-th symbol being generated by state k: $P(\pi_t = k, x)$
Then our quantity of interest is computed as
  $P(\pi_t = k \mid x) = \frac{P(\pi_t = k, x)}{P(x)}$
where P(x) is obtained from the forward algorithm


Computing $P(\pi_t = k, x)$

To compute $P(\pi_t = k, x)$ we need the forward and backward algorithms:
  $P(\pi_t = k, x) = \underbrace{P(x_1 \ldots x_t, \pi_t = k)}_{\text{forward algorithm: } f_k(t)} \; \underbrace{P(x_{t+1} \ldots x_T \mid \pi_t = k)}_{\text{backward algorithm: } b_k(t)}$


Computing $P(\pi_t = k \mid x)$

Using the forward and backward variables, this is computed as
  $P(\pi_t = k \mid x) = \frac{f_k(t)\, b_k(t)}{P(x)}$
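As a small sketch (my own, with assumed data structures: forward and backward values stored in dicts keyed by (state, position)), this posterior is just a lookup and a division:

```python
def state_posterior(f, b, px, k, t):
    """P(pi_t = k | x) = f_k(t) * b_k(t) / P(x).

    f, b: forward / backward values keyed by (state, position)
    px:   P(x), e.g. from the termination step of the forward algorithm
    """
    return f[k, t] * b[k, t] / px
```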
The backward algorithm

The backward algorithm gives us b_k(t), the probability of observing the rest of x, given that we're in state k after t characters:
  $b_k(t) = P(x_{t+1} \ldots x_T \mid \pi_t = k)$
[Figure: example HMM over {A, C, G, T} with begin state 0, emitting states 1-4, and end state 5; observed sequence C A G T.
Transitions: a_01 = 0.5, a_02 = 0.5, a_11 = 0.2, a_13 = 0.8, a_33 = 0.4, a_35 = 0.6, a_22 = 0.8, a_24 = 0.2, a_44 = 0.1, a_45 = 0.9.
Emissions: state 1: A 0.4, C 0.1, G 0.2, T 0.3; state 3: A 0.2, C 0.3, G 0.3, T 0.2; state 2: A 0.4, C 0.1, G 0.1, T 0.4; state 4: A 0.1, C 0.4, G 0.4, T 0.1]
Example of computing $P(\pi_3 = 4 \mid x)$

[Figure: the same example HMM as on the previous slide, with observed sequence C A G T]

  $P(\pi_3 = 4 \mid x) = \frac{P(\pi_3 = 4, x)}{P(x)} = \frac{f_4(3)\, b_4(3)}{f_5(4)}$
Steps of the backward algorithm

Initialization (t = T)
  $b_k(T) = a_{kN}$ for all k
Recursion (t = T-1 down to 1)
  $b_k(t) = \sum_l a_{kl}\, e_l(x_{t+1})\, b_l(t+1)$
Termination
  $P(x) = \sum_l a_{0l}\, e_l(x_1)\, b_l(1)$
Note: the same quantity can be obtained from the forward algorithm as well, $P(x) = f_N(T)$
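A sketch of these steps in Python (my own encoding, not code from the course: transition and emission probabilities as dicts a[(k, l)] and e[(k, c)], begin state 0, and an explicit end state):

```python
def backward(x, a, e, states, end):
    """Backward algorithm: returns b with b[k, t] = P(x_{t+1..T} | pi_t = k),
    plus P(x) from the termination step.

    x:      observed string, positions 1..T
    a, e:   transition probs a[(k, l)] and emission probs e[(k, c)]
    states: the emitting states; end: index of the end state
    """
    T = len(x)
    b = {(k, t): 0.0 for k in states for t in range(1, T + 1)}
    for k in states:                       # initialization: b_k(T) = a_{k,end}
        b[k, T] = a.get((k, end), 0.0)
    for t in range(T - 1, 0, -1):          # recursion, from t = T-1 down to 1
        for k in states:
            b[k, t] = sum(a.get((k, l), 0.0) * e[l, x[t]] * b[l, t + 1]
                          for l in states)      # x[t] is the symbol at position t+1
    # termination: P(x) = sum_l a_{0l} e_l(x_1) b_l(1)
    px = sum(a.get((0, l), 0.0) * e[l, x[0]] * b[l, 1] for l in states)
    return b, px
```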
Computing $P(\pi_t = k, \pi_{t+1} = l \mid x)$

This is the probability of the symbols at t and t+1 being emitted from states k and l, given the entire sequence x:
  $P(\pi_t = k, \pi_{t+1} = l \mid x) = \frac{f_k(t)\, a_{kl}\, e_l(x_{t+1})\, b_l(t+1)}{P(x)}$
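Continuing the same sketch (same assumed data structures as above), the pair posterior is a direct transcription of this formula:

```python
def pair_posterior(f, b, a, e, x, px, k, l, t):
    """P(pi_t = k, pi_{t+1} = l | x) = f_k(t) * a_kl * e_l(x_{t+1}) * b_l(t+1) / P(x)."""
    return f[k, t] * a.get((k, l), 0.0) * e[l, x[t]] * b[l, t + 1] / px
```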
Putting it all together

Assume we are given J training instances $x^1, \ldots, x^j, \ldots, x^J$
Expectation step
  Using current parameter values, compute for each x^j
    Apply the forward and backward algorithms
    Compute
      expected number of transitions between all pairs of states
      expected number of emissions for all states
Maximization step
  Using current expected counts
    Compute the transition and emission probabilities
The expectation step: emission count

We need the expected number of times c is emitted by state k:

  $n_{k,c} = \sum_j \frac{1}{f_N^{\,j}(T)} \sum_{\{t \,\mid\, x_t^j = c\}} f_k^{\,j}(t)\, b_k^{\,j}(t)$

The outer sum is over the training sequences x^j (with $f_N^{\,j}(T) = P(x^j)$); the inner sum is over the positions t of x^j where c occurs, using the forward and backward values computed for x^j.
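A sketch of this accumulation (my own, not course code; it assumes the forward/backward values and P(x^j) for every training sequence have already been computed and are passed in):

```python
def expected_emission_counts(seqs, fwd, bwd, px, states, alphabet, pseudo=0.0):
    """n_{k,c} = pseudo + sum_j (1 / P(x^j)) * sum over positions t of x^j
    with symbol c of f_k(t) * b_k(t)."""
    n = {(k, c): pseudo for k in states for c in alphabet}
    for j, x in enumerate(seqs):
        for t, c in enumerate(x, start=1):     # t is the 1-based position, c the symbol there
            for k in states:
                n[k, c] += fwd[j][k, t] * bwd[j][k, t] / px[j]
    return n
```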


The expectation step: transition count

We need the expected number of transitions from state k to state l:

  $n_{kl} = \sum_j \frac{1}{f_N^{\,j}(T)} \sum_t f_k^{\,j}(t)\, a_{kl}\, e_l(x_{t+1}^j)\, b_l^{\,j}(t+1)$
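And a matching sketch for the transition counts (again my own encoding; for brevity it only accumulates transitions between emitting states, as in the worked example later in the slides):

```python
def expected_transition_counts(seqs, fwd, bwd, px, a, e, states, pseudo=0.0):
    """n_{kl} = pseudo + sum_j (1 / P(x^j)) * sum_t f_k(t) * a_kl * e_l(x^j_{t+1}) * b_l(t+1)."""
    n = {(k, l): pseudo for k in states for l in states}
    for j, x in enumerate(seqs):
        for t in range(1, len(x)):             # positions 1 .. T-1
            for k in states:
                for l in states:
                    n[k, l] += (fwd[j][k, t] * a.get((k, l), 0.0) *
                                e[l, x[t]] * bwd[j][l, t + 1] / px[j])
    return n
```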
The maximization step

Estimate new emission parameters by
  $e_k(c) = \frac{n_{k,c}}{\sum_{c'} n_{k,c'}}$
Estimate new transition parameters by
  $a_{kl} = \frac{n_{kl}}{\sum_{l'} n_{kl'}}$
Just like in the simple case, but typically we'll do some smoothing (e.g. add pseudocounts)
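A corresponding M-step sketch (my own; it assumes the expected counts already include any pseudocounts):

```python
def m_step(n_trans, n_emit, states, alphabet):
    """Re-estimate parameters by normalizing the expected counts per state."""
    a, e = {}, {}
    for k in states:
        total = sum(v for (i, _), v in n_trans.items() if i == k)
        for (i, l), v in n_trans.items():
            if i == k and total > 0:
                a[k, l] = v / total            # a_kl = n_kl / sum_l' n_kl'
    for k in states:
        total = sum(n_emit[k, c] for c in alphabet)
        for c in alphabet:
            e[k, c] = n_emit[k, c] / total if total > 0 else 0.0   # e_k(c)
    return a, e
```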
The Baum-Welch algorithm

initialize the parameters of the HMM
iterate until convergence
  initialize n_k,c and n_kl with pseudocounts
  E-step: for each training set sequence j = 1 to n
    calculate f_k(t) values for sequence j
    calculate b_k(t) values for sequence j
    add the contribution of sequence j to n_k,c and n_kl
  M-step: update the HMM parameters using n_k,c and n_kl
Baum-Welch algorithm example

Given
  The HMM with the parameters initialized as shown
  Two training sequences: TAG, ACG

[Figure: HMM with begin state 0, emitting states 1 and 2, and end state 3.
Transitions: a_01 = 1.0, a_11 = 0.8, a_12 = 0.2, a_22 = 0.1, a_23 = 0.9.
Emissions: state 1: A 0.4, C 0.1, G 0.1, T 0.4; state 2: A 0.1, C 0.4, G 0.4, T 0.1]

We'll work through one iteration of Baum-Welch


Baum-Welch example (cont)

Determining the forward values for TAG:
  $f_0(0) = 1$
  $f_1(1) = e_1(T)\, a_{01}\, f_0(0) = 0.4 \times 1.0 \times 1 = 0.4$
  $f_1(2) = e_1(A)\, a_{11}\, f_1(1) = 0.4 \times 0.8 \times 0.4 = 0.128$
  $f_2(2) = e_2(A)\, a_{12}\, f_1(1) = 0.1 \times 0.2 \times 0.4 = 0.008$
  $f_2(3) = e_2(G)\, [a_{12}\, f_1(2) + a_{22}\, f_2(2)] = 0.4 \times (0.0256 + 0.0008) = 0.01056$
  $f_3(3) = a_{23}\, f_2(3) = 0.9 \times 0.01056 = 0.009504$
Here we compute just the values that are needed for computing successive values. For example, there is no point in calculating f_2(1).
In a similar way, we also compute forward values for ACG.
Baum-Welch example (cont)

Determining the backward values for TAG:
  $b_2(3) = a_{23} = 0.9$
  $b_2(2) = a_{22}\, e_2(G)\, b_2(3) = 0.1 \times 0.4 \times 0.9 = 0.036$
  $b_1(2) = a_{12}\, e_2(G)\, b_2(3) = 0.2 \times 0.4 \times 0.9 = 0.072$
  $b_1(1) = a_{11}\, e_1(A)\, b_1(2) + a_{12}\, e_2(A)\, b_2(2) = 0.8 \times 0.4 \times 0.072 + 0.2 \times 0.1 \times 0.036 = 0.02376$
  $b_0(0) = a_{01}\, e_1(T)\, b_1(1) = 1.0 \times 0.4 \times 0.02376 = 0.009504$
Again, here we compute just the values that are needed.
In a similar way, we also compute backward values for ACG.
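These numbers are easy to check; the short self-contained script below (my own encoding of the example HMM, not part of the original slides) recomputes the forward and backward values for TAG:

```python
# Parameters of the example HMM: begin state 0, emitting states 1 and 2, end state 3.
a = {(0, 1): 1.0, (1, 1): 0.8, (1, 2): 0.2, (2, 2): 0.1, (2, 3): 0.9}
e = {(1, 'A'): 0.4, (1, 'C'): 0.1, (1, 'G'): 0.1, (1, 'T'): 0.4,
     (2, 'A'): 0.1, (2, 'C'): 0.4, (2, 'G'): 0.4, (2, 'T'): 0.1}
x = "TAG"

f = {}                                                           # forward values f[k, t]
f[1, 1] = e[1, x[0]] * a[0, 1]                                   # 0.4
f[1, 2] = e[1, x[1]] * a[1, 1] * f[1, 1]                         # 0.128
f[2, 2] = e[2, x[1]] * a[1, 2] * f[1, 1]                         # 0.008
f[2, 3] = e[2, x[2]] * (a[1, 2] * f[1, 2] + a[2, 2] * f[2, 2])   # 0.01056
f[3, 3] = a[2, 3] * f[2, 3]                                      # 0.009504 = P(x)

b = {}                                                           # backward values b[k, t]
b[2, 3] = a[2, 3]                                                # 0.9
b[2, 2] = a[2, 2] * e[2, x[2]] * b[2, 3]                         # 0.036
b[1, 2] = a[1, 2] * e[2, x[2]] * b[2, 3]                         # 0.072
b[1, 1] = a[1, 1] * e[1, x[1]] * b[1, 2] + a[1, 2] * e[2, x[1]] * b[2, 2]  # 0.02376
b[0, 0] = a[0, 1] * e[1, x[0]] * b[1, 1]                         # 0.009504 = P(x) again

print(f[3, 3], b[0, 0])   # both print 0.009504 (up to floating-point rounding)
```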
Baum-Welch example (cont)

Determining the expected emission counts for state 1 (the leading 1 in each count is a pseudocount):

  $n_{1,A} = 1 + \underbrace{\frac{f_1(2)\, b_1(2)}{f_3(3)}}_{\text{contribution of TAG}} + \underbrace{\frac{f_1(1)\, b_1(1)}{f_3(3)}}_{\text{contribution of ACG}}$

  $n_{1,C} = 1 + \frac{f_1(2)\, b_1(2)}{f_3(3)}$  (contribution of ACG)

  $n_{1,G} = 1$  (G only occurs at the last position, and state 1 cannot transition to the end state, so those terms are zero)

  $n_{1,T} = 1 + \frac{f_1(1)\, b_1(1)}{f_3(3)}$  (contribution of TAG)

*Note that the forward/backward values in the TAG and ACG contributions differ; in each term they are computed for the sequence that term comes from.
Baum-Welch example (cont)

Determining the expected transition counts for state 1 (not using pseudocounts):

  $n_{11} = \underbrace{\frac{f_1(1)\, a_{11}\, e_1(A)\, b_1(2)}{f_3(3)}}_{\text{contribution of TAG}} + \underbrace{\frac{f_1(1)\, a_{11}\, e_1(C)\, b_1(2)}{f_3(3)}}_{\text{contribution of ACG}}$

  $n_{12} = \underbrace{\frac{f_1(1)\, a_{12}\, e_2(A)\, b_2(2) + f_1(2)\, a_{12}\, e_2(G)\, b_2(3)}{f_3(3)}}_{\text{contribution of TAG}} + \underbrace{\frac{f_1(1)\, a_{12}\, e_2(C)\, b_2(2) + f_1(2)\, a_{12}\, e_2(G)\, b_2(3)}{f_3(3)}}_{\text{contribution of ACG}}$

In a similar way, we also determine the expected counts for state 2.


Baum-Welch example (cont)

Maximization step: determining probabilities for state 1

  $e_1(A) = \frac{n_{1,A}}{n_{1,A} + n_{1,C} + n_{1,G} + n_{1,T}}$

  $e_1(C) = \frac{n_{1,C}}{n_{1,A} + n_{1,C} + n_{1,G} + n_{1,T}}$

  $a_{11} = \frac{n_{11}}{n_{11} + n_{12}}$

  $a_{12} = \frac{n_{12}}{n_{11} + n_{12}}$
Computational complexity of HMM algorithms

Given an HMM with S states and a sequence of length L, the complexity of the Forward, Backward and Viterbi algorithms is $O(S^2 L)$
  This assumes that the states are densely interconnected
Given M sequences of length L, the complexity of Baum-Welch on each iteration is $O(M S^2 L)$
Baum-Welch convergence

Some convergence criteria
  The likelihood of the training sequences changes little
  A fixed number of iterations is reached
Usually converges in a small number of iterations
Will converge to a local maximum (in the likelihood of the data given the model)

  $\log P(\text{sequences} \mid \theta) = \sum_j \log P(x^j \mid \theta)$
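As a tiny sketch of the first criterion (my own helper names, not from the slides), the quantity tracked across iterations is just this sum of per-sequence log probabilities:

```python
import math

def total_log_likelihood(pxs):
    """log P(sequences | theta) = sum_j log P(x^j | theta), given each P(x^j)."""
    return sum(math.log(p) for p in pxs)

def has_converged(prev_ll, new_ll, tol=1e-6):
    """Stop when the training log-likelihood improves by less than tol."""
    return new_ll - prev_ll < tol
```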


Summary

Three problems in HMMs
  Probability of an observed sequence
    Forward algorithm
  Most likely path for an observed sequence
    Viterbi algorithm
    Can be used for segmentation of an observed sequence
  Parameter estimation
    Baum-Welch
    The backward algorithm is used to compute a quantity needed to estimate the posterior of a state given the entire observed sequence
