
Hidden Markov Models

Tapas Kanungo
Center for Automation Research
University of Maryland
Web: www.cfar.umd.edu/~kanungo
Email: kanungo@cfar.umd.edu


Outline

1. Markov models
2. Hidden Markov models
3. Forward/Backward algorithm
4. Viterbi algorithm
5. Baum-Welch estimation algorithm


Markov Models

- Observable states:

      1, 2, ..., N

- Observed sequence:

      q_1, q_2, ..., q_t, ..., q_T

- First-order Markov assumption:

      P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

- Stationarity:

      P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)


Markov Models

- State transition matrix A:

      A = | a_11  a_12  ...  a_1j  ...  a_1N |
          | a_21  a_22  ...  a_2j  ...  a_2N |
          | ...                              |
          | a_i1  a_i2  ...  a_ij  ...  a_iN |
          | ...                              |
          | a_N1  a_N2  ...  a_Nj  ...  a_NN |

  where

      a_ij = P(q_t = j | q_{t-1} = i),   1 ≤ i, j ≤ N

- Constraints on a_ij:

      a_ij ≥ 0               for all i, j
      Σ_{j=1}^N a_ij = 1     for all i


Markov Models: Example

- States:

  1. Rainy (R)
  2. Cloudy (C)
  3. Sunny (S)
- State transition probability matrix:

      A = | 0.4  0.3  0.3 |
          | 0.2  0.6  0.2 |
          | 0.1  0.1  0.8 |

- Compute the probability of observing SSRRSCS given that today is S.


Markov Models: Example

Basic conditional probability rule:

    P(A, B) = P(A|B) P(B)

The Markov chain rule:

    P(q_1, q_2, ..., q_T)
      = P(q_T | q_1, q_2, ..., q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) P(q_1, q_2, ..., q_{T-2})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) ... P(q_2 | q_1) P(q_1)


Markov Models: Example

- Observation sequence O:

      O = (S, S, S, R, R, S, C, S)

- Using the chain rule we get:

      P(O | model)
        = P(S, S, S, R, R, S, C, S | model)
        = P(S) P(S|S) P(S|S) P(R|S) P(R|R) P(S|R) P(C|S) P(S|C)
        = π_3 a_33 a_33 a_31 a_11 a_13 a_32 a_23
        = (1)(0.8)^2 (0.1)(0.4)(0.3)(0.1)(0.2)
        = 1.536 × 10^-4

- The prior probability is π_i = P(q_1 = i).
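
As a quick check of this arithmetic, here is a minimal Python sketch (numpy and the 0-based encoding R=0, C=1, S=2 are our choices, not part of the original slides):

    import numpy as np

    # Weather transition matrix from the earlier slide;
    # rows are "from", columns are "to", in the order R, C, S.
    A = np.array([[0.4, 0.3, 0.3],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.1, 0.8]])

    O = [2, 2, 2, 0, 0, 2, 1, 2]   # O = (S, S, S, R, R, S, C, S)

    p = 1.0                        # pi_3 = P(q_1 = S) = 1: today is sunny
    for prev, cur in zip(O[:-1], O[1:]):
        p *= A[prev, cur]
    print(p)                       # 1.536e-04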


Markov Models: Example

- What is the probability that the sequence remains in state i for exactly d time units? Conditioning on q_1 = i (the initial probability π_i would otherwise just multiply through):

      p_i(d) = P(q_2 = i, ..., q_d = i, q_{d+1} ≠ i | q_1 = i)
             = (a_ii)^{d-1} (1 - a_ii)

- This is an exponential Markov chain duration density.

- What is the expected value of the duration d in state i?

      d̄_i = Σ_{d=1}^∞ d p_i(d)
           = Σ_{d=1}^∞ d (a_ii)^{d-1} (1 - a_ii)
           = (1 - a_ii) Σ_{d=1}^∞ d (a_ii)^{d-1}
           = (1 - a_ii) ∂/∂a_ii [ Σ_{d=1}^∞ (a_ii)^d ]
           = (1 - a_ii) ∂/∂a_ii [ a_ii / (1 - a_ii) ]
           = 1 / (1 - a_ii)


Markov Models: Example

- Avg. number of consecutive sunny days:

      1 / (1 - a_33) = 1 / (1 - 0.8) = 5

- Avg. number of consecutive cloudy days = 2.5

- Avg. number of consecutive rainy days = 1.67
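
The closed form 1 / (1 - a_ii) is easy to verify numerically; a small sketch of our own, truncating the infinite sum:

    # Truncate sum_{d>=1} d * a_ii^(d-1) * (1 - a_ii) and compare
    # against the closed form 1 / (1 - a_ii).
    for name, a_ii in [("rainy", 0.4), ("cloudy", 0.6), ("sunny", 0.8)]:
        s = sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, 5000))
        print(name, round(s, 4), round(1 / (1 - a_ii), 4))  # values agree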


Hidden Markov Models

- States are not observable.
- Observations are probabilistic functions of state.
- State transitions are still probabilistic.


Urn and Ball Model

- N urns containing colored balls.
- M distinct colors of balls.
- Each urn has a (possibly) different distribution of colors.
- Sequence generation algorithm (a code sketch follows the list):

  1. Pick an initial urn according to some random process.
  2. Randomly pick a ball from the urn and then replace it.
  3. Select another urn according to a random selection process associated with the urn.
  4. Repeat steps 2 and 3.
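
A minimal Python sketch of this generative procedure (the function name and the numpy representation are ours; A, B, and pi play the roles of the urn-transition matrix, per-urn color distributions, and initial-urn distribution, formalized on a later slide):

    import numpy as np

    def sample_urn_ball(A, B, pi, T, seed=0):
        """Generate T ball colors from the urn-and-ball model.
        A: N x N urn-transition matrix, B: N x M color distributions
        per urn, pi: initial urn distribution."""
        rng = np.random.default_rng(seed)
        N, M = B.shape
        urn = rng.choice(N, p=pi)                    # step 1: initial urn
        urns, colors = [], []
        for _ in range(T):
            colors.append(rng.choice(M, p=B[urn]))   # step 2: draw a ball
            urns.append(urn)
            urn = rng.choice(N, p=A[urn])            # step 3: next urn
        return urns, colors                          # step 4 is the loop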


The Trellis

[Figure: the trellis. States 1, ..., N run vertically, time steps 1, ..., T run horizontally, and the observations o_1, o_2, ..., o_T are aligned beneath the corresponding time steps; edges connect every state at time t to every state at time t+1.]


Elements of Hidden Markov Models

- N: the number of hidden states.
- Q: the set of states, Q = {1, 2, ..., N}.
- M: the number of symbols.
- V: the set of symbols, V = {1, 2, ..., M}.
- A: the state-transition probability matrix:

      a_ij = P(q_{t+1} = j | q_t = i),   1 ≤ i, j ≤ N

- B: the observation probability distribution:

      b_j(k) = P(o_t = k | q_t = j),   1 ≤ k ≤ M

- π: the initial state distribution:

      π_i = P(q_1 = i),   1 ≤ i ≤ N

- λ: the entire model, λ = (A, B, π).
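
In code, λ is just three arrays; a toy instantiation (the numbers below are made up for illustration):

    import numpy as np

    # lambda = (A, B, pi) with N = 2 states and M = 2 symbols.
    A  = np.array([[0.7, 0.3],      # a_ij = P(q_{t+1} = j | q_t = i)
                   [0.4, 0.6]])
    B  = np.array([[0.9, 0.1],      # b_j(k) = P(o_t = k | q_t = j)
                   [0.2, 0.8]])
    pi = np.array([0.6, 0.4])       # pi_i = P(q_1 = i)

    # Every row of A and B must be a probability distribution:
    assert (A >= 0).all() and np.allclose(A.sum(axis=1), 1.0)
    assert (B >= 0).all() and np.allclose(B.sum(axis=1), 1.0)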


Three Basic Problems

1. Given observation O = (o_1, o_2, ..., o_T) and model λ = (A, B, π), efficiently compute P(O|λ).

   - Hidden states complicate the evaluation.
   - Given two models λ_1 and λ_2, this can be used to choose the better one.

2. Given observation O = (o_1, o_2, ..., o_T) and model λ, find the optimal state sequence q = (q_1, q_2, ..., q_T).

   - The optimality criterion has to be decided (e.g. maximum likelihood).
   - An "explanation" for the data.

3. Given O = (o_1, o_2, ..., o_T), estimate the model parameters λ = (A, B, π) that maximize P(O|λ).


Solution to Problem 1

- Problem: Compute P(o_1, o_2, ..., o_T | λ).
- Algorithm:

  - Let q = (q_1, q_2, ..., q_T) be a state sequence.
  - Assume the observations are independent:

        P(O | q, λ) = Π_{t=1}^T P(o_t | q_t, λ)
                    = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)

  - The probability of a particular state sequence is:

        P(q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

  - Also, P(O, q | λ) = P(O | q, λ) P(q | λ).
  - Enumerate paths and sum probabilities:

        P(O | λ) = Σ_q P(O | q, λ) P(q | λ)

- There are N^T state sequences, each requiring O(T) calculations.
  Complexity: O(T N^T) calculations.
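
A direct transcription of this enumeration (our own sketch; it is only usable for tiny T, which is exactly the point):

    import itertools
    import numpy as np

    def brute_force_likelihood(A, B, pi, O):
        """P(O|lambda) by summing P(O|q, lambda) P(q|lambda)
        over all N^T state sequences q."""
        N, T = len(pi), len(O)
        total = 0.0
        for q in itertools.product(range(N), repeat=T):  # N^T paths
            p = pi[q[0]] * B[q[0], O[0]]
            for t in range(1, T):
                p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
            total += p
        return total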


Forward Procedure: Intuition

[Figure: one induction step of the forward procedure. Every state at time t feeds, with transition probabilities a_{1k}, ..., a_{Nk}, into state k at time t+1.]


Forward Algorithm

- Define the forward variable α_t(i) as:

      α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ)

- α_t(i) is the probability of observing the partial sequence (o_1, o_2, ..., o_t) such that the state q_t is i.
- Induction:

  1. Initialization:

         α_1(i) = π_i b_i(o_1)

  2. Induction:

         α_{t+1}(j) = [ Σ_{i=1}^N α_t(i) a_ij ] b_j(o_{t+1})

  3. Termination:

         P(O | λ) = Σ_{i=1}^N α_T(i)

- Complexity: O(N^2 T).
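
A compact numpy sketch of the forward pass (0-based indexing; the function name is ours):

    import numpy as np

    def forward(A, B, pi, O):
        """Return P(O|lambda) via the forward recursion; alpha[t, i]
        holds alpha_{t+1}(i) in the slides' 1-based notation."""
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                          # initialization
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # induction
        return alpha[-1].sum()                              # termination

For small problems this should agree with brute_force_likelihood above, at a cost of O(N^2 T) instead of O(T N^T).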


Example

Consider the following coin-tossing experiment:

           State 1   State 2   State 3
    P(H)   0.5       0.75      0.25
    P(T)   0.5       0.25      0.75

- state-transition probabilities equal to 1/3
- initial state probabilities equal to 1/3

1. You observe O = (H, H, H, H, T, H, T, T, T, T). What state sequence, q, is most likely? What is the joint probability, P(O, q | λ), of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state 1?


3. Consider the observation sequence

       Õ = (H, T, T, H, T, H, H, T, T, H).

   How would your answers to parts 1 and 2 change?

4. If the state transition probabilities were:

       A' = | 0.9   0.05  0.05 |
            | 0.45  0.1   0.45 |
            | 0.45  0.45  0.1  |

   how would the new model λ' change your answers to parts 1-3?


Backward Algorithm

[Figure: one induction step of the backward procedure on the trellis. State i at time t connects, with transition probabilities a_{i1}, ..., a_{iN}, to every state at time t+1; states 1, ..., N run vertically, times 1, ..., T horizontally, with observations o_1, ..., o_T beneath the time axis.]


Backward Algorithm

- Define the backward variable β_t(i) as:

      β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

- β_t(i) is the probability of observing the partial sequence (o_{t+1}, o_{t+2}, ..., o_T) given that the state q_t is i.
- Induction:

  1. Initialization:

         β_T(i) = 1

  2. Induction:

         β_t(i) = Σ_{j=1}^N a_ij b_j(o_{t+1}) β_{t+1}(j),
         1 ≤ i ≤ N,   t = T-1, ..., 1
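
The mirror-image numpy sketch (same conventions as the forward pass above):

    import numpy as np

    def backward(A, B, pi, O):
        """Return the table of backward variables; beta[t, i]
        holds beta_{t+1}(i) in the slides' 1-based notation."""
        T, N = len(O), len(pi)
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                     # initialization
        for t in range(T - 2, -1, -1):                     # t = T-1, ..., 1
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])   # induction
        return beta

    # Consistency check: sum_i pi_i b_i(o_1) beta_1(i) also
    # equals P(O|lambda).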


Solution to Problem 2

- Choose the most likely path.
- Find the path (q_1, q_2, ..., q_T) that maximizes the likelihood:

      P(q_1, q_2, ..., q_T | O, λ)

- Solution by dynamic programming.
- Define:

      δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_t = i, o_1, o_2, ..., o_t | λ)

- δ_t(i) is the probability of the highest-probability path ending in state i at time t.
- By induction we have:

      δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] b_j(o_{t+1})


Viterbi Algorithm

[Figure: trellis for the Viterbi recursion. All transitions a_{1k}, ..., a_{Nk} lead into state k at time t+1, and only the maximizing predecessor survives; states 1, ..., N vertical, times 1, ..., T horizontal, observations o_1, ..., o_T beneath.]


Viterbi Algorithm

- Initialization:

      δ_1(i) = π_i b_i(o_1),   1 ≤ i ≤ N
      ψ_1(i) = 0

- Recursion:

      δ_t(j) = max_{1≤i≤N} [ δ_{t-1}(i) a_ij ] b_j(o_t)
      ψ_t(j) = argmax_{1≤i≤N} [ δ_{t-1}(i) a_ij ]
      2 ≤ t ≤ T,   1 ≤ j ≤ N

- Termination:

      P* = max_{1≤i≤N} [ δ_T(i) ]
      q*_T = argmax_{1≤i≤N} [ δ_T(i) ]

- Path (state sequence) backtracking:

      q*_t = ψ_{t+1}(q*_{t+1}),   t = T-1, T-2, ..., 1
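
A numpy sketch of the full algorithm (our own code, 0-based indexing):

    import numpy as np

    def viterbi(A, B, pi, O):
        """Return the most likely state path q* and its probability P*."""
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]                      # initialization
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A          # delta_{t-1}(i) a_ij
            psi[t] = scores.argmax(axis=0)              # best predecessor
            delta[t] = scores.max(axis=0) * B[:, O[t]]  # recursion
        q = np.zeros(T, dtype=int)
        q[-1] = delta[-1].argmax()                      # termination
        for t in range(T - 2, -1, -1):
            q[t] = psi[t + 1, q[t + 1]]                 # backtracking
        return q, delta[-1].max()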


Solution to Problem 3

- Estimate λ = (A, B, π) to maximize P(O|λ).
- No analytic method because of the complexity; the solution is iterative.
- Baum-Welch algorithm:

  1. Let the initial model be λ_0.
  2. Compute the new λ based on λ_0 and the observation O.
  3. If log P(O|λ) - log P(O|λ_0) < Δ, stop.
  4. Else set λ_0 ← λ and go to step 2.


Baum-Welch: Preliminaries

- Define ξ_t(i, j) as the probability of being in state i at time t and in state j at time t+1:

      ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)
                = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) /
                  [ Σ_{i=1}^N Σ_{j=1}^N α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) ]

- Define γ_t(i) as the probability of being in state i at time t, given the observation sequence:

      γ_t(i) = Σ_{j=1}^N ξ_t(i, j)

- Σ_{t=1}^T γ_t(i) is the expected number of times state i is visited.
- Σ_{t=1}^{T-1} ξ_t(i, j) is the expected number of transitions from state i to state j.


Baum-Welch: Update Rules

- π̄_i = expected frequency in state i at time t = 1:

      π̄_i = γ_1(i)

- ā_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i):

      ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

- b̄_j(k) = (expected number of times in state j observing symbol k) / (expected number of times in state j):

      b̄_j(k) = Σ_{t: o_t = k} γ_t(j) / Σ_{t=1}^T γ_t(j)
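
Putting the preliminaries and update rules together, one re-estimation pass might look like the sketch below (unscaled, so adequate only for short sequences; see the scaling issue on a later slide):

    import numpy as np

    def baum_welch_step(A, B, pi, O):
        """One Baum-Welch re-estimation pass; returns updated (A, B, pi)."""
        O = np.asarray(O)
        T, N = len(O), len(pi)
        # Forward and backward variables (unscaled).
        alpha = np.zeros((T, N)); beta = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        p_O = alpha[-1].sum()                    # P(O | lambda)
        # xi[t, i, j] and gamma[t, i] as defined on the preceding slide.
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])
        xi /= p_O
        gamma = np.vstack([xi.sum(axis=2),               # t = 1, ..., T-1
                           alpha[-1] * beta[-1] / p_O])  # t = T
        # Update rules.
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
        new_B = np.array([gamma[O == k].sum(axis=0)
                          for k in range(B.shape[1])]).T
        new_B /= gamma.sum(axis=0)[:, None]
        return new_A, new_B, new_pi

Iterating this step is guaranteed not to decrease log P(O|λ), until the stopping test of the previous slide fires.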


Properties

- Covariance of the estimated parameters
- Convergence rates


Types of HMM

- Continuous density
- Ergodic
- State duration


Implementation Issues

- Scaling
- Initial parameters
- Multiple observation sequences


Comparison of HMMs

- What is a natural distance function between two HMMs?
- If D(λ_1, λ_2) is large, does it mean that the models are really different?
