What is an HMM?
• Graphical model
• Circles indicate states
• Arrows indicate probabilistic dependencies between states
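HMM Formalism
[Figure: a chain of hidden states S linked by transitions A, each emitting an observation K through B]
• An HMM is specified by the tuple $\{S, K, \Pi, A, B\}$
• $S = \{s_1 \ldots s_N\}$ are the values for the hidden states
• $K = \{k_1 \ldots k_M\}$ are the values for the observations
• $\Pi = \{\pi_i\}$ are the initial state probabilities, $\pi_i = P(x_1 = i)$
• $A = \{a_{ij}\}$ are the state transition probabilities, $a_{ij} = P(x_{t+1} = j \mid x_t = i)$
• $B = \{b_{ik}\}$ are the observation (emission) probabilities, $b_{ik} = P(o_t = k \mid x_t = i)$
• Notation: $x_t$ is the hidden state and $o_t$ the observation at time $t$; states and observation values are referred to by index, so $x_t = i$ means $x_t = s_i$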
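To make the five components concrete, here is a small Python sketch with a hypothetical two-state, three-symbol model (the numbers are invented for illustration, not taken from the slides, and indices are 0-based rather than 1-based). It checks that $\Pi$ and every row of $A$ and $B$ are probability distributions, then samples a state/observation sequence by following the model's generative story.

```python
import random

# Hypothetical toy model (not from the slides): N = 2 states, M = 3 symbols.
# Indices are 0-based here, whereas the slides number states and symbols from 1.
PI = [0.6, 0.4]                          # PI[i]   = P(x_1 = i)
A = [[0.7, 0.3], [0.4, 0.6]]             # A[i][j] = P(x_{t+1} = j | x_t = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k] = P(o_t = k | x_t = i)


def is_valid_hmm(pi, a, b, tol=1e-9):
    """Pi, every row of A and every row of B must each be a probability distribution."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) < tol for row in [pi, *a, *b])


def sample(pi, a, b, length, rng):
    """Draw x_1 from Pi, then alternately emit an observation via B and move via A."""
    def draw(dist):
        return rng.choices(range(len(dist)), weights=dist)[0]

    states, observations = [], []
    x = draw(pi)
    for _ in range(length):
        states.append(x)
        observations.append(draw(b[x]))  # o_t depends only on x_t
        x = draw(a[x])                   # x_{t+1} depends only on x_t
    return states, observations


states, observations = sample(PI, A, B, length=5, rng=random.Random(0))
print(is_valid_hmm(PI, A, B), states, observations)
```

Seeding a `random.Random` instance keeps the sample reproducible without touching the global generator.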
Inference in an HMM
• Given an observation sequence $O = (o_1 \ldots o_T)$ and a model $\mu = (A, B, \Pi)$, compute $P(O \mid \mu)$
Decoding
[Figure: hidden states x1 … xT, each emitting an observation o1 … oT]
$$P(O \mid \mu) = \sum_{x_1 \ldots x_T} \pi_{x_1} b_{x_1 o_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}$$
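Evaluating this sum directly means enumerating all $N^T$ hidden state sequences, which is only feasible for tiny examples. The Python sketch below does exactly that for a hypothetical two-state, three-symbol model (parameters invented for illustration, 0-based indices), which makes it a handy reference value for checking faster methods.

```python
from itertools import product

# Hypothetical toy model; indices are 0-based (state i here is s_{i+1} on the slides).
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def brute_force_likelihood(pi, a, b, obs):
    """P(O | mu) by summing P(O, X | mu) over all N**T state sequences X."""
    total = 0.0
    for xs in product(range(len(pi)), repeat=len(obs)):
        p = pi[xs[0]] * b[xs[0]][obs[0]]                         # pi_{x1} b_{x1 o1}
        for t in range(len(obs) - 1):
            p *= a[xs[t]][xs[t + 1]] * b[xs[t + 1]][obs[t + 1]]  # a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}
        total += p
    return total


print(brute_force_likelihood(PI, A, B, [0, 1, 2]))  # about 0.03628
```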
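Forward Procedure
$$\begin{aligned}
\alpha_j(t+1) &= P(o_1 \ldots o_{t+1}, x_{t+1} = j) \\
&= P(o_1 \ldots o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j) \\
&= P(o_1 \ldots o_t \mid x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j) \\
&= P(o_1 \ldots o_t, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)
\end{aligned}$$
The third line uses the HMM assumption that, given $x_{t+1}$, the observation $o_{t+1}$ is independent of the earlier observations.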
Forward Procedure
$$\begin{aligned}
\alpha_j(t+1) &= \sum_{i=1}^{N} P(o_1 \ldots o_t, x_t = i, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} P(o_1 \ldots o_t, x_{t+1} = j \mid x_t = i)\, P(x_t = i)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} P(o_1 \ldots o_t, x_t = i)\, P(x_{t+1} = j \mid x_t = i)\, P(o_{t+1} \mid x_{t+1} = j) \\
&= \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}
\end{aligned}$$
The recursion starts from $\alpha_i(1) = \pi_i b_{i o_1}$.
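A minimal sketch of the forward procedure in Python, assuming the same kind of hypothetical toy model (invented numbers, 0-based indices). Each step of the recursion touches every pair $(i, j)$, so the cost is $O(N^2 T)$ rather than the $O(N^T)$ of summing over all paths; the final sum over $\alpha_i(T)$ gives $P(O \mid \mu)$.

```python
# Hypothetical toy model; indices are 0-based.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def forward(pi, a, b, obs):
    """Return alpha, where alpha[t][i] holds alpha_i(t+1) from the slides (0-based t)."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]  # alpha_i(1) = pi_i b_{i o_1}
    for o in obs[1:]:
        prev = alpha[-1]
        # alpha_j(t+1) = sum_i alpha_i(t) a_ij b_{j o_{t+1}}
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)])
    return alpha


def forward_likelihood(pi, a, b, obs):
    return sum(forward(pi, a, b, obs)[-1])  # P(O | mu) = sum_i alpha_i(T)


print(forward(PI, A, B, [0, 1, 2]))
print(forward_likelihood(PI, A, B, [0, 1, 2]))  # about 0.03628, same as enumerating all paths
```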
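Backward Procedure
$$\beta_i(t) = P(o_t \ldots o_T \mid x_t = i)$$
This is the probability of the remaining observations $o_t \ldots o_T$, given that the state at time $t$ is $i$. It is computed backwards from the end of the sequence:
$$\beta_i(T+1) = 1, \qquad \beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_{i o_t}\, \beta_j(t+1)$$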
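The backward recursion can be sketched the same way, again with a hypothetical toy model and 0-based indices. Following the definition of $\beta_i(t)$ here, each value includes the emission $b_{i o_t}$, and an extra final row stores the base case $\beta_i(T+1) = 1$; combining $\beta_i(1)$ with $\pi_i$ should reproduce the forward result.

```python
# Hypothetical toy model; indices are 0-based.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def backward(pi, a, b, obs):
    """Return beta, where beta[t][i] holds beta_i(t+1) from the slides (0-based t).

    The extra final row beta[T] is the base case beta_i(T+1) = 1.
    """
    n, T = len(pi), len(obs)
    beta = [[1.0] * n for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for i in range(n):
            # beta_i(t) = sum_j a_ij b_{i o_t} beta_j(t+1): state i emits o_t, then moves to j
            beta[t][i] = sum(a[i][j] * b[i][obs[t]] * beta[t + 1][j] for j in range(n))
    return beta


def backward_likelihood(pi, a, b, obs):
    beta = backward(pi, a, b, obs)
    return sum(pi[i] * beta[0][i] for i in range(len(pi)))  # P(O | mu) = sum_i pi_i beta_i(1)


print(backward_likelihood(PI, A, B, [0, 1, 2]))  # about 0.03628, matching the forward pass
```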
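Decoding Solution
• Forward procedure: $P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T)$
• Backward procedure: $P(O \mid \mu) = \sum_{i=1}^{N} \pi_i \beta_i(1)$
• Combination, for any $1 \le t \le T$: $P(O \mid \mu) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t)\, a_{ij}\, \beta_j(t+1)$
With the definitions above, both $\alpha_i(t)$ and $\beta_i(t)$ include the emission $b_{i o_t}$, so the single product $\sum_i \alpha_i(t) \beta_i(t)$ would count it twice; pairing $\alpha_i(t)$ with $\beta_j(t+1)$ through $a_{ij}$ avoids this.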
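As a sanity check on these expressions, the sketch below (hypothetical toy model and observation sequence, 0-based indices) computes $P(O \mid \mu)$ from the forward variables, from the backward variables, and from the pairing $\sum_i \sum_j \alpha_i(t)\, a_{ij}\, \beta_j(t+1)$ at every time step. All of them should agree, while the single product $\sum_i \alpha_i(t)\beta_i(t)$ does not under these definitions of $\alpha$ and $\beta$.

```python
# Hypothetical toy model and observation sequence; indices are 0-based.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2, 2, 0]


def forward(pi, a, b, obs):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)])
    return alpha  # alpha[t][i] = alpha_i(t+1) on the slides


def backward(pi, a, b, obs):
    n, T = len(pi), len(obs)
    beta = [[1.0] * n for _ in range(T + 1)]  # beta[T] is the base case beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        # b_{i o_t} factored out of the sum over j
        beta[t] = [b[i][obs[t]] * sum(a[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta  # beta[t][i] = beta_i(t+1) on the slides


alpha, beta = forward(PI, A, B, O), backward(PI, A, B, O)
n, T = len(PI), len(O)
via_forward = sum(alpha[T - 1])
via_backward = sum(PI[i] * beta[0][i] for i in range(n))
# Combination at every time step: sum_i sum_j alpha_i(t) a_ij beta_j(t+1)
via_combination = [
    sum(alpha[t][i] * A[i][j] * beta[t + 1][j] for i in range(n) for j in range(n))
    for t in range(T)
]
# Single product alpha_i(1) beta_i(1): counts b_{i o_1} twice
naive = sum(alpha[0][i] * beta[0][i] for i in range(n))
print(via_forward, via_backward, via_combination, naive)
```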
Best State Sequence
• Viterbi algorithm
• $\arg\max_X P(X \mid O)$
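Because $P(O \mid \mu)$ does not depend on $X$, maximizing the posterior is the same as maximizing the joint probability, which is what the Viterbi variables track. A short derivation, using the standard definition of $\delta$ under the same convention as the forward variables here (the emission at time $t$ is included):

$$\hat{X} = \arg\max_{X} P(X \mid O, \mu) = \arg\max_{X} \frac{P(X, O \mid \mu)}{P(O \mid \mu)} = \arg\max_{X} P(X, O \mid \mu)$$

$$\delta_j(t) = \max_{x_1 \ldots x_{t-1}} P(x_1 \ldots x_{t-1}, o_1 \ldots o_t, x_t = j \mid \mu), \qquad \delta_j(1) = \pi_j\, b_{j o_1}$$

Splitting the maximization over $x_1 \ldots x_t$ into a maximization over $x_t = i$ and over $x_1 \ldots x_{t-1}$ gives $\delta_j(t+1) = \max_i \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$, and $\max_j \delta_j(T)$ is the probability of the best path.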
Viterbi Algorithm
Recursive computation:
$$\delta_j(t+1) = \max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$$
$$\psi_j(t+1) = \arg\max_{i} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$$
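A minimal Python sketch of Viterbi decoding, assuming a hypothetical toy model with 0-based indices. Besides the recursion for $\delta$ and $\psi$, it needs an initialization $\delta_j(1) = \pi_j b_{j o_1}$ and a backtracking pass that starts from the best final state and follows the $\psi$ pointers backwards; the brute-force search over all paths is included only to check the result.

```python
from itertools import product

# Hypothetical toy model; indices are 0-based.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def viterbi(pi, a, b, obs):
    """Return (best state path, its joint probability P(X, O | mu))."""
    n = len(pi)
    delta = [pi[j] * b[j][obs[0]] for j in range(n)]  # delta_j(1) = pi_j b_{j o_1}
    psi = []                                          # psi[t][j]: best predecessor of state j
    for o in obs[1:]:
        # psi_j(t+1) = argmax_i delta_i(t) a_ij  (b_{j o_{t+1}} does not depend on i)
        back = [max(range(n), key=lambda i, j=j: delta[i] * a[i][j]) for j in range(n)]
        # delta_j(t+1) = max_i delta_i(t) a_ij b_{j o_{t+1}}
        delta = [delta[back[j]] * a[back[j]][j] * b[j][o] for j in range(n)]
        psi.append(back)
    # Termination: pick the best final state, then follow psi backwards.
    best_last = max(range(n), key=lambda j: delta[j])
    path = [best_last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[best_last]


def brute_force_best_path(pi, a, b, obs):
    """Check by enumeration: argmax over all N**T paths of P(X, O | mu)."""
    def joint(xs):
        p = pi[xs[0]] * b[xs[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[xs[t - 1]][xs[t]] * b[xs[t]][obs[t]]
        return p
    best = max(product(range(len(pi)), repeat=len(obs)), key=joint)
    return list(best), joint(best)


print(viterbi(PI, A, B, [0, 1, 2]))  # ([0, 0, 1], about 0.01512)
print(brute_force_best_path(PI, A, B, [0, 1, 2]))
```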
Parameter Estimation
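$$p_t(i, j) = \frac{\alpha_i(t)\, a_{ij}\, \beta_j(t+1)}{P(O \mid \mu)}$$
Probability of traversing an arc: being in state $i$ at time $t$ and in state $j$ at time $t+1$, given $O$. No separate $b_{j o_{t+1}}$ factor appears because $\beta_j(t+1)$ already includes it, and $P(O \mid \mu) = \sum_{m=1}^{N} \alpha_m(T)$.
$$\gamma_i(t) = \sum_{j=1}^{N} p_t(i, j)$$
Probability of being in state $i$ at time $t$, given $O$ (at $t = T$, the base case $\beta_j(T+1) = 1$ gives $\gamma_i(T) = \alpha_i(T) / P(O \mid \mu)$).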
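The posteriors can be computed from one forward and one backward pass. The sketch below assumes a hypothetical toy model and 0-based indices, and uses $p_t(i,j) = \alpha_i(t)\, a_{ij}\, \beta_j(t+1) / P(O \mid \mu)$, which matches the forward and backward definitions used here (where $\beta_j(t+1)$ already contains $b_{j o_{t+1}}$).

```python
# Hypothetical toy model; indices are 0-based.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def forward(pi, a, b, obs):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)])
    return alpha  # alpha[t][i] = alpha_i(t+1) on the slides


def backward(pi, a, b, obs):
    n, T = len(pi), len(obs)
    beta = [[1.0] * n for _ in range(T + 1)]  # beta[T] is the base case beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        beta[t] = [b[i][obs[t]] * sum(a[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta  # beta[t][i] = beta_i(t+1) on the slides


def posteriors(pi, a, b, obs):
    """Return (p, gamma): p[t][i][j] = p_{t+1}(i, j), gamma[t][i] = gamma_i(t+1) on the slides."""
    n, T = len(pi), len(obs)
    alpha, beta = forward(pi, a, b, obs), backward(pi, a, b, obs)
    likelihood = sum(alpha[T - 1])  # P(O | mu)
    # The last slice (slide time T) pairs alpha_i(T) with the base case beta_j(T+1) = 1.
    p = [[[alpha[t][i] * a[i][j] * beta[t + 1][j] / likelihood for j in range(n)]
          for i in range(n)] for t in range(T)]
    gamma = [[sum(p[t][i]) for i in range(n)] for t in range(T)]
    return p, gamma


p, gamma = posteriors(PI, A, B, [0, 1, 2])
print(gamma)  # each row is a distribution over the states at that time step
```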
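Parameter Estimation
Now we can compute the new estimates of the model parameters:
$$\hat{\pi}_i = \gamma_i(1)$$
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} p_t(i, j)}{\sum_{t=1}^{T-1} \gamma_i(t)}$$
$$\hat{b}_{ik} = \frac{\sum_{\{t : o_t = k\}} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$$
Transitions are counted only up to $t = T - 1$, since the last observed step has no outgoing transition within the sequence.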
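Putting the pieces together, one re-estimation step (a single iteration of Baum-Welch, which is an instance of EM) might look like the following Python sketch. It assumes a single observation sequence, a hypothetical toy model, and 0-based indices, and sums transitions over $t = 1 \ldots T-1$. Running a few iterations and recording $\log P(O \mid \mu)$ shows the property EM guarantees: the likelihood never decreases.

```python
import math

# Hypothetical toy model and observation sequence; indices are 0-based.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
O = [0, 1, 2, 2, 0, 1, 1, 2]


def forward(pi, a, b, obs):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)])
    return alpha  # alpha[t][i] = alpha_i(t+1) on the slides


def backward(pi, a, b, obs):
    n, T = len(pi), len(obs)
    beta = [[1.0] * n for _ in range(T + 1)]  # beta[T] is the base case beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        beta[t] = [b[i][obs[t]] * sum(a[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta  # beta[t][i] = beta_i(t+1) on the slides


def reestimate(pi, a, b, obs):
    """One re-estimation step from a single observation sequence."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta = forward(pi, a, b, obs), backward(pi, a, b, obs)
    likelihood = sum(alpha[T - 1])
    # p[t][i][j] = p_{t+1}(i, j) and gamma[t][i] = gamma_i(t+1) on the slides
    p = [[[alpha[t][i] * a[i][j] * beta[t + 1][j] / likelihood for j in range(n)]
          for i in range(n)] for t in range(T)]
    gamma = [[sum(p[t][i]) for i in range(n)] for t in range(T)]

    new_pi = list(gamma[0])  # pi_i = gamma_i(1)
    # a_ij: expected i -> j transitions / expected transitions out of i (t = 1 .. T-1)
    new_a = [[sum(p[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    # b_ik: expected time in i while emitting k / expected time in i (t = 1 .. T)
    new_b = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_a, new_b


def log_likelihood(pi, a, b, obs):
    return math.log(sum(forward(pi, a, b, obs)[-1]))


params = (PI, A, B)
history = []
for _ in range(10):
    history.append(log_likelihood(*params, O))
    params = reestimate(*params, O)
print(history)  # should never decrease
```

The sketch works with raw probabilities, which underflow on long sequences; scaled or log-space versions of the forward and backward passes are the usual remedy.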
HMM Applications