
Hidden Markov Models

Tapas Kanungo
Center for Automation Research
University of Maryland
Web: www.cfar.umd.edu/~kanungo
Email: kanungo@cfar.umd.edu


Outline

1. Markov models
2. Hidden Markov models
3. Forward/Backward algorithm
4. Viterbi algorithm
5. Baum-Welch estimation algorithm


Markov Models

- Observable states:

      1, 2, ..., N

- Observed sequence:

      q_1, q_2, ..., q_t, ..., q_T

- First-order Markov assumption:

      P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

- Stationarity:

      P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)


Markov Models

- State transition matrix A:

      A = | a_11  a_12  ...  a_1j  ...  a_1N |
          | a_21  a_22  ...  a_2j  ...  a_2N |
          | ...                              |
          | a_i1  a_i2  ...  a_ij  ...  a_iN |
          | ...                              |
          | a_N1  a_N2  ...  a_Nj  ...  a_NN |

  where

      a_ij = P(q_t = j | q_{t-1} = i),   1 ≤ i, j ≤ N

- Constraints on a_ij:

      a_ij ≥ 0               for all i, j
      Σ_{j=1}^N a_ij = 1     for all i


Markov Models: Example

- States:

  1. Rainy (R)
  2. Cloudy (C)
  3. Sunny (S)
- State transition probability matrix:

      A = | 0.4  0.3  0.3 |
          | 0.2  0.6  0.2 |
          | 0.1  0.1  0.8 |

- Compute the probability of observing SSRRSCS given that today is S.


Markov Models: Example

Basic conditional probability rule:

    P(A, B) = P(A|B) P(B)

The Markov chain rule:

    P(q_1, q_2, ..., q_T)
      = P(q_T | q_1, q_2, ..., q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_1, q_2, ..., q_{T-1})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) P(q_1, q_2, ..., q_{T-2})
      = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) ... P(q_2 | q_1) P(q_1)


Markov Models: Example

- Observation sequence O:

      O = (S, S, S, R, R, S, C, S)

- Using the chain rule we get:

      P(O | model)
        = P(S, S, S, R, R, S, C, S | model)
        = P(S) P(S|S) P(S|S) P(R|S) P(R|R) P(S|R) P(C|S) P(S|C)
        = π_3 a_33 a_33 a_31 a_11 a_13 a_32 a_23
        = (1)(0.8)^2 (0.1)(0.4)(0.3)(0.1)(0.2)
        = 1.536 × 10^-4

- The prior probability is π_i = P(q_1 = i).
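
As a quick check of this arithmetic, here is a minimal Python sketch (numpy and the 0-based encoding R=0, C=1, S=2 are our choices, not part of the original slides):

    import numpy as np

    # Weather transition matrix from the earlier slide;
    # rows are "from", columns are "to", in the order R, C, S.
    A = np.array([[0.4, 0.3, 0.3],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.1, 0.8]])

    O = [2, 2, 2, 0, 0, 2, 1, 2]   # O = (S, S, S, R, R, S, C, S)

    p = 1.0                        # pi_3 = P(q_1 = S) = 1: today is sunny
    for prev, cur in zip(O[:-1], O[1:]):
        p *= A[prev, cur]
    print(p)                       # 1.536e-04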


Markov Models: Example

- What is the probability that the sequence remains in state i for exactly d time units? Conditioning on q_1 = i (the initial probability π_i would otherwise just multiply through):

      p_i(d) = P(q_2 = i, ..., q_d = i, q_{d+1} ≠ i | q_1 = i)
             = (a_ii)^{d-1} (1 - a_ii)

- This is an exponential Markov chain duration density.

- What is the expected value of the duration d in state i?

      d̄_i = Σ_{d=1}^∞ d p_i(d)
           = Σ_{d=1}^∞ d (a_ii)^{d-1} (1 - a_ii)
           = (1 - a_ii) Σ_{d=1}^∞ d (a_ii)^{d-1}
           = (1 - a_ii) ∂/∂a_ii [ Σ_{d=1}^∞ (a_ii)^d ]
           = (1 - a_ii) ∂/∂a_ii [ a_ii / (1 - a_ii) ]
           = 1 / (1 - a_ii)


Markov Models: Example

- Avg. number of consecutive sunny days:

      1 / (1 - a_33) = 1 / (1 - 0.8) = 5

- Avg. number of consecutive cloudy days = 2.5

- Avg. number of consecutive rainy days = 1.67
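
The closed form 1 / (1 - a_ii) is easy to verify numerically; a small sketch of our own, truncating the infinite sum:

    # Truncate sum_{d>=1} d * a_ii^(d-1) * (1 - a_ii) and compare
    # against the closed form 1 / (1 - a_ii).
    for name, a_ii in [("rainy", 0.4), ("cloudy", 0.6), ("sunny", 0.8)]:
        s = sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, 5000))
        print(name, round(s, 4), round(1 / (1 - a_ii), 4))  # values agree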


Hidden Markov Models

- States are not observable.
- Observations are probabilistic functions of state.
- State transitions are still probabilistic.


Urn and Ball Model

- N urns containing colored balls.
- M distinct colors of balls.
- Each urn has a (possibly) different distribution of colors.
- Sequence generation algorithm (a code sketch follows the list):

  1. Pick an initial urn according to some random process.
  2. Randomly pick a ball from the urn and then replace it.
  3. Select another urn according to a random selection process associated with the urn.
  4. Repeat steps 2 and 3.
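
A minimal Python sketch of this generative procedure (the function name and the numpy representation are ours; A, B, and pi play the roles of the urn-transition matrix, per-urn color distributions, and initial-urn distribution, formalized on a later slide):

    import numpy as np

    def sample_urn_ball(A, B, pi, T, seed=0):
        """Generate T ball colors from the urn-and-ball model.
        A: N x N urn-transition matrix, B: N x M color distributions
        per urn, pi: initial urn distribution."""
        rng = np.random.default_rng(seed)
        N, M = B.shape
        urn = rng.choice(N, p=pi)                    # step 1: initial urn
        urns, colors = [], []
        for _ in range(T):
            colors.append(rng.choice(M, p=B[urn]))   # step 2: draw a ball
            urns.append(urn)
            urn = rng.choice(N, p=A[urn])            # step 3: next urn
        return urns, colors                          # step 4 is the loop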


The Trellis

[Figure: the trellis. States 1, ..., N run vertically, time steps 1, ..., T run horizontally, and the observations o_1, o_2, ..., o_T are aligned beneath the corresponding time steps; edges connect every state at time t to every state at time t+1.]


Elements of Hidden Markov Models

- N: the number of hidden states.
- Q: the set of states, Q = {1, 2, ..., N}.
- M: the number of symbols.
- V: the set of symbols, V = {1, 2, ..., M}.
- A: the state-transition probability matrix:

      a_ij = P(q_{t+1} = j | q_t = i),   1 ≤ i, j ≤ N

- B: the observation probability distribution:

      b_j(k) = P(o_t = k | q_t = j),   1 ≤ k ≤ M

- π: the initial state distribution:

      π_i = P(q_1 = i),   1 ≤ i ≤ N

- λ: the entire model, λ = (A, B, π).
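
In code, λ is just three arrays; a toy instantiation (the numbers below are made up for illustration):

    import numpy as np

    # lambda = (A, B, pi) with N = 2 states and M = 2 symbols.
    A  = np.array([[0.7, 0.3],      # a_ij = P(q_{t+1} = j | q_t = i)
                   [0.4, 0.6]])
    B  = np.array([[0.9, 0.1],      # b_j(k) = P(o_t = k | q_t = j)
                   [0.2, 0.8]])
    pi = np.array([0.6, 0.4])       # pi_i = P(q_1 = i)

    # Every row of A and B must be a probability distribution:
    assert (A >= 0).all() and np.allclose(A.sum(axis=1), 1.0)
    assert (B >= 0).all() and np.allclose(B.sum(axis=1), 1.0)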


Three Basic Problems

1. Given observation O = (o_1, o_2, ..., o_T) and model λ = (A, B, π), efficiently compute P(O|λ).

   - Hidden states complicate the evaluation.
   - Given two models λ_1 and λ_2, this can be used to choose the better one.

2. Given observation O = (o_1, o_2, ..., o_T) and model λ, find the optimal state sequence q = (q_1, q_2, ..., q_T).

   - The optimality criterion has to be decided (e.g. maximum likelihood).
   - An "explanation" for the data.

3. Given O = (o_1, o_2, ..., o_T), estimate the model parameters λ = (A, B, π) that maximize P(O|λ).


Solution to Problem 1

- Problem: Compute P(o_1, o_2, ..., o_T | λ).
- Algorithm:

  - Let q = (q_1, q_2, ..., q_T) be a state sequence.
  - Assume the observations are independent:

        P(O | q, λ) = Π_{t=1}^T P(o_t | q_t, λ)
                    = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)

  - The probability of a particular state sequence is:

        P(q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

  - Also, P(O, q | λ) = P(O | q, λ) P(q | λ).
  - Enumerate paths and sum probabilities:

        P(O | λ) = Σ_q P(O | q, λ) P(q | λ)

- There are N^T state sequences, each requiring O(T) calculations.
  Complexity: O(T N^T) calculations.
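
A direct transcription of this enumeration (our own sketch; it is only usable for tiny T, which is exactly the point):

    import itertools
    import numpy as np

    def brute_force_likelihood(A, B, pi, O):
        """P(O|lambda) by summing P(O|q, lambda) P(q|lambda)
        over all N^T state sequences q."""
        N, T = len(pi), len(O)
        total = 0.0
        for q in itertools.product(range(N), repeat=T):  # N^T paths
            p = pi[q[0]] * B[q[0], O[0]]
            for t in range(1, T):
                p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
            total += p
        return total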


Forward Procedure: Intuition

[Figure: one induction step of the forward procedure. Every state at time t feeds, with transition probabilities a_{1k}, ..., a_{Nk}, into state k at time t+1.]


Forward Algorithm

- Define the forward variable α_t(i) as:

      α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ)

- α_t(i) is the probability of observing the partial sequence (o_1, o_2, ..., o_t) such that the state q_t is i.
- Induction:

  1. Initialization:

         α_1(i) = π_i b_i(o_1)

  2. Induction:

         α_{t+1}(j) = [ Σ_{i=1}^N α_t(i) a_ij ] b_j(o_{t+1})

  3. Termination:

         P(O | λ) = Σ_{i=1}^N α_T(i)

- Complexity: O(N^2 T).
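
A compact numpy sketch of the forward pass (0-based indexing; the function name is ours):

    import numpy as np

    def forward(A, B, pi, O):
        """Return P(O|lambda) via the forward recursion; alpha[t, i]
        holds alpha_{t+1}(i) in the slides' 1-based notation."""
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                          # initialization
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # induction
        return alpha[-1].sum()                              # termination

For small problems this should agree with brute_force_likelihood above, at a cost of O(N^2 T) instead of O(T N^T).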


Example

Consider the following coin-tossing experiment:

           State 1   State 2   State 3
    P(H)   0.5       0.75      0.25
    P(T)   0.5       0.25      0.75

- state-transition probabilities equal to 1/3
- initial state probabilities equal to 1/3

1. You observe O = (H, H, H, H, T, H, T, T, T, T). What state sequence, q, is most likely? What is the joint probability, P(O, q | λ), of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state 1?


3. Consider the observation sequence

       Õ = (H, T, T, H, T, H, H, T, T, H).

   How would your answers to parts 1 and 2 change?

4. If the state transition probabilities were:

       A' = | 0.9   0.05  0.05 |
            | 0.45  0.1   0.45 |
            | 0.45  0.45  0.1  |

   how would the new model λ' change your answers to parts 1-3?


Backward Algorithm

[Figure: one induction step of the backward procedure on the trellis. State i at time t connects, with transition probabilities a_{i1}, ..., a_{iN}, to every state at time t+1; states 1, ..., N run vertically, times 1, ..., T horizontally, with observations o_1, ..., o_T beneath the time axis.]


Backward Algorithm

- Define the backward variable β_t(i) as:

      β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

- β_t(i) is the probability of observing the partial sequence (o_{t+1}, o_{t+2}, ..., o_T) given that the state q_t is i.
- Induction:

  1. Initialization:

         β_T(i) = 1

  2. Induction:

         β_t(i) = Σ_{j=1}^N a_ij b_j(o_{t+1}) β_{t+1}(j),
         1 ≤ i ≤ N,   t = T-1, ..., 1
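
The mirror-image numpy sketch (same conventions as the forward pass above):

    import numpy as np

    def backward(A, B, pi, O):
        """Return the table of backward variables; beta[t, i]
        holds beta_{t+1}(i) in the slides' 1-based notation."""
        T, N = len(O), len(pi)
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                     # initialization
        for t in range(T - 2, -1, -1):                     # t = T-1, ..., 1
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])   # induction
        return beta

    # Consistency check: sum_i pi_i b_i(o_1) beta_1(i) also
    # equals P(O|lambda).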


Solution to Problem 2

- Choose the most likely path.
- Find the path (q_1, q_2, ..., q_T) that maximizes the likelihood:

      P(q_1, q_2, ..., q_T | O, λ)

- Solution by dynamic programming.
- Define:

      δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_t = i, o_1, o_2, ..., o_t | λ)

- δ_t(i) is the probability of the highest-probability path ending in state i at time t.
- By induction we have:

      δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] b_j(o_{t+1})


Viterbi Algorithm

[Figure: trellis for the Viterbi recursion. All transitions a_{1k}, ..., a_{Nk} lead into state k at time t+1, and only the maximizing predecessor survives; states 1, ..., N vertical, times 1, ..., T horizontal, observations o_1, ..., o_T beneath.]


Viterbi Algorithm

- Initialization:

      δ_1(i) = π_i b_i(o_1),   1 ≤ i ≤ N
      ψ_1(i) = 0

- Recursion:

      δ_t(j) = max_{1≤i≤N} [ δ_{t-1}(i) a_ij ] b_j(o_t)
      ψ_t(j) = argmax_{1≤i≤N} [ δ_{t-1}(i) a_ij ]
      2 ≤ t ≤ T,   1 ≤ j ≤ N

- Termination:

      P* = max_{1≤i≤N} [ δ_T(i) ]
      q*_T = argmax_{1≤i≤N} [ δ_T(i) ]

- Path (state sequence) backtracking:

      q*_t = ψ_{t+1}(q*_{t+1}),   t = T-1, T-2, ..., 1
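
A numpy sketch of the full algorithm (our own code, 0-based indexing):

    import numpy as np

    def viterbi(A, B, pi, O):
        """Return the most likely state path q* and its probability P*."""
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]                      # initialization
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A          # delta_{t-1}(i) a_ij
            psi[t] = scores.argmax(axis=0)              # best predecessor
            delta[t] = scores.max(axis=0) * B[:, O[t]]  # recursion
        q = np.zeros(T, dtype=int)
        q[-1] = delta[-1].argmax()                      # termination
        for t in range(T - 2, -1, -1):
            q[t] = psi[t + 1, q[t + 1]]                 # backtracking
        return q, delta[-1].max()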


Solution to Problem 3

- Estimate λ = (A, B, π) to maximize P(O|λ).
- No analytic method because of the complexity; the solution is iterative.
- Baum-Welch algorithm:

  1. Let the initial model be λ_0.
  2. Compute the new λ based on λ_0 and the observation O.
  3. If log P(O|λ) - log P(O|λ_0) < Δ, stop.
  4. Else set λ_0 ← λ and go to step 2.


Baum-Welch: Preliminaries

- Define ξ_t(i, j) as the probability of being in state i at time t and in state j at time t+1:

      ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)
                = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) /
                  [ Σ_{i=1}^N Σ_{j=1}^N α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) ]

- Define γ_t(i) as the probability of being in state i at time t, given the observation sequence:

      γ_t(i) = Σ_{j=1}^N ξ_t(i, j)

- Σ_{t=1}^T γ_t(i) is the expected number of times state i is visited.
- Σ_{t=1}^{T-1} ξ_t(i, j) is the expected number of transitions from state i to state j.


Baum-Welch: Update Rules

- π̄_i = expected frequency in state i at time t = 1:

      π̄_i = γ_1(i)

- ā_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i):

      ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

- b̄_j(k) = (expected number of times in state j observing symbol k) / (expected number of times in state j):

      b̄_j(k) = Σ_{t: o_t = k} γ_t(j) / Σ_{t=1}^T γ_t(j)
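
Putting the preliminaries and update rules together, one re-estimation pass might look like the sketch below (unscaled, so adequate only for short sequences; see the scaling issue on a later slide):

    import numpy as np

    def baum_welch_step(A, B, pi, O):
        """One Baum-Welch re-estimation pass; returns updated (A, B, pi)."""
        O = np.asarray(O)
        T, N = len(O), len(pi)
        # Forward and backward variables (unscaled).
        alpha = np.zeros((T, N)); beta = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        p_O = alpha[-1].sum()                    # P(O | lambda)
        # xi[t, i, j] and gamma[t, i] as defined on the preceding slide.
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])
        xi /= p_O
        gamma = np.vstack([xi.sum(axis=2),               # t = 1, ..., T-1
                           alpha[-1] * beta[-1] / p_O])  # t = T
        # Update rules.
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
        new_B = np.array([gamma[O == k].sum(axis=0)
                          for k in range(B.shape[1])]).T
        new_B /= gamma.sum(axis=0)[:, None]
        return new_A, new_B, new_pi

Iterating this step is guaranteed not to decrease log P(O|λ), until the stopping test of the previous slide fires.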


Properties

- Covariance of the estimated parameters
- Convergence rates


Types of HMM

- Continuous density
- Ergodic
- State duration


Implementation Issues

- Scaling
- Initial parameters
- Multiple observation sequences


Comparison of HMMs

- What is a natural distance function between two HMMs?
- If D(λ_1, λ_2) is large, does it mean that the models are really different?
