
Discrete Hidden Markov Models for Sequence Classification

Pi19404
February 24, 2014

Contents

Discrete Hidden Markov Models for Sequence Classification
  0.1 Introduction
  0.2 Hidden Latent Variables
    0.2.1 Forward algorithm
    0.2.2 Observation Sequence Likelihood
    0.2.3 Sequence Classification
    0.2.4 Code



0.1 Introduction
In this article we look at hidden Markov models and their application to the classification of discrete sequential data.

   

Markov processes are stochastic processes that generate random sequences of outcomes or states according to certain probabilities. A Markov chain is the mathematical description of a Markov process with a discrete set of states. In a Markov chain the observed variables are the states themselves. We now consider the case where the observed symbol is not the state itself but a random variable related to the underlying state sequence. The true state of the system is latent and unobserved.
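To make the fully observed case concrete, the following is a minimal sketch of how the probability of a state sequence is computed for a Markov chain. The function name `chainProbability` and the argument layout are assumptions made for this illustration and are not taken from the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Probability of a fully observed state sequence under a Markov chain:
// p(s_1) * product over t of p(s_t | s_{t-1}).
// initial[i]       : probability of starting in state i
// transition[i][j] : probability of moving from state i to state j
double chainProbability(const std::vector<double> &initial,
                        const std::vector<std::vector<double>> &transition,
                        const std::vector<int> &states) {
    assert(!states.empty());
    double p = initial[states[0]];
    for (std::size_t t = 1; t < states.size(); ++t) {
        p *= transition[states[t - 1]][states[t]];
    }
    return p;
}
```

With the illustrative values initial = {0.6, 0.4} and transition = {{0.7, 0.3}, {0.4, 0.6}}, the state sequence 0, 0, 1 has probability 0.6 * 0.7 * 0.3 = 0.126. In a hidden Markov model this direct computation is no longer possible, because the states are not observed.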

0.2 Hidden Latent Variables

 

Let Y denote a sequence of random variables taking values in a Euclidean space. A simplistic model is one where each observation is associated with a single hidden state and each hidden state emits a unique observation symbol. In this case, knowing the observed variable tells us exactly which hidden state produced it, and the problem reduces to a Markov chain. However, a hidden state may emit more than one observation symbol, so that an observation corresponds to more than one hidden state with certain probabilities. For example

$$P(Y = y_1 \mid X = x_0) = 0.9$$
$$P(Y = y_2 \mid X = x_0) = 0.1$$

Thus the hidden state $x_0$ emits the observed state $Y = y_1$ with probability 0.9 and the observed state $y_2$ with probability 0.1.

In other words, when the hidden state is $x_0$ we observe $y_1$ 90% of the time and $y_2$ 10% of the time.

The random variables $Y_i$ are not independent, nor do they represent samples from a Markov process. However, given a realization $X_i = x_i$, the random variable $Y_i$ is independent of the other random variables $X$ and therefore of the other $Y$.

Given that we know the hidden/latent state, the observations are conditionally independent:

$$P(Y_1, Y_2 \mid X = x_i) = P(Y_1 \mid X = x_i)\, P(Y_2 \mid X = x_i)$$

The emission probabilities, that is, the probability that each hidden state emits each observation symbol, are specified by an emission matrix. If N is the number of states and M is the number of observation symbols, the emission matrix is an N x M matrix. The model is called a discrete hidden Markov model because the emissions are discrete. Another class, called continuous hidden Markov models, has continuous emissions that are modeled using a parametric distribution.
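The role of the emission matrix is perhaps easiest to see in the generative direction. The sketch below walks the hidden chain and, at each step, draws an observation symbol from the emission row of the current hidden state. The struct `DiscreteHmm` and the function `sampleObservations` are hypothetical names introduced for this example; they are not part of the article's code.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical container for a discrete HMM with N states and M symbols.
struct DiscreteHmm {
    std::vector<double> initial;                  // size N
    std::vector<std::vector<double>> transition;  // N x N, each row sums to 1
    std::vector<std::vector<double>> emission;    // N x M, each row sums to 1
};

// Draw `length` observation symbols. The hidden states are used to pick the
// emission row at every step but are never returned: they stay latent.
std::vector<int> sampleObservations(const DiscreteHmm &hmm, int length,
                                    std::mt19937 &rng) {
    assert(length >= 0);
    std::vector<int> observations;
    if (length == 0) return observations;

    std::discrete_distribution<int> first(hmm.initial.begin(),
                                          hmm.initial.end());
    int state = first(rng);
    for (int t = 0; t < length; ++t) {
        // Emit a symbol according to row `state` of the emission matrix.
        std::discrete_distribution<int> emit(hmm.emission[state].begin(),
                                             hmm.emission[state].end());
        observations.push_back(emit(rng));
        // Move to the next hidden state according to the transition matrix.
        std::discrete_distribution<int> next(hmm.transition[state].begin(),
                                             hmm.transition[state].end());
        state = next(rng);
    }
    return observations;
}
```

Only the returned symbols are visible to an observer, which is exactly the setting the rest of the article works in.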

0.2.1 Forward algorithm

 

The joint probability of the observed sequence and the hidden state $z_n$ can be computed using the forward algorithm. In this section the observations are denoted $x_1, \ldots, x_n$ and the hidden states $z_1, \ldots, z_n$. The forward algorithm processes the sequence from its first element to its last, as opposed to the backward algorithm, which starts from the last element and moves backward to the first.


     

The forward and backward algorithms can be used to compute the probability of the observed sequence. The idea is to estimate the underlying hidden states, which form a Markov chain. Since we have N hidden states, we can compute an N x L matrix of hidden-state probabilities, where L is the length of the sequence. The result is stored in the matrix $\alpha$. Each column of the matrix, i.e. $\alpha(j, i)$ for a fixed $i$, holds the probabilities of the observations up to position $i$ together with each hidden state $j$. The forward algorithm thus computes a matrix of state probabilities that can be used to assess the probability of being in each of the states at any given time in the sequence. The probability of the observed sequence is then given as $\sum_j \alpha(j, \mathrm{len} - 1)$.

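Let

$$\alpha(z_n) = p(x_1, \ldots, x_n, z_n)$$
$$\alpha(z_1) = p(x_1, z_1) = p(x_1 \mid z_1)\, p(z_1)$$
$$\alpha(z_{n-1}) = p(x_1, \ldots, x_{n-1}, z_{n-1})$$

Then

$$
\begin{aligned}
\alpha(z_n) &= \sum_{z_{n-1}} p(x_1, \ldots, x_{n-1}, x_n, z_{n-1}, z_n) \\
&= \sum_{z_{n-1}} p(x_1, \ldots, x_{n-1}, z_{n-1})\, P(x_n, z_n \mid x_1, \ldots, x_{n-1}, z_{n-1}) \\
&= \sum_{z_{n-1}} p(x_1, \ldots, x_{n-1}, z_{n-1})\, P(x_n \mid z_n, x_1, \ldots, x_{n-1}, z_{n-1})\, P(z_n \mid x_1, \ldots, x_{n-1}, z_{n-1})
\end{aligned}
$$

Since $x_n$ is independent of all other states given $z_n$, and since $z_n$ is independent of all other $z_i$ given $z_{n-1}$,

$$\alpha(z_n) = \sum_{z_{n-1}} p(x_1, \ldots, x_{n-1}, z_{n-1})\, P(x_n \mid z_n)\, P(z_n \mid z_{n-1})$$

$$\alpha(z_n) = P(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1})\, P(z_n \mid z_{n-1})$$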

    /**
     * @brief forwardMatrix : computes p(x1 ... xn,zn)
     * using the forward algorithm
     * @param sequence : the input observation sequence
     */
    void forwardMatrix(vector<int> &sequence)
    {
        int len=sequence.size();
        for(int i=0;i<len;i++)
        {
            for(int j=0;j<_nstates;j++)
            {
                if(i==0)
                    _alpha(j,i)=_emission(j,sequence[i])*_initial(0,j);
                else
                {
                    float s=0;
                    //sum over all possible previous states
                    for(int k=0;k<_nstates;k++)
                        s=s+_transition(k,j)*_alpha(k,i-1);
                    //emission probability of the current symbol sequence[i]
                    _alpha(j,i)=_emission(j,sequence[i])*s;
                }
            }
            //compute the normalizing factor for position i
            float scale=0;
            for(int j=0;j<_nstates;j++)
            {
                scale=scale+_alpha(j,i);
            }
            scale=1.f/scale;
            //the factor is stored for every position, including the first,
            //so that the log likelihood accounts for every observation
            _scale(0,i)=scale;
            //normalize the probability values of the hidden states
            for(int j=0;j<_nstates;j++)
            {
                _alpha(j,i)=scale*_alpha(j,i);
            }
        }
    }
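The recursion can also be checked on a small example without the class members used above. The following is a minimal standalone sketch of the unscaled forward pass. The names `forwardUnscaled` and `sequenceProbability`, the `Matrix` alias and the use of nested `std::vector` are assumptions made for this illustration, and the matrix here is indexed as alpha[position][state], which is the transpose of the $\alpha(j, i)$ layout in the article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// alpha[n][j] = p(x_1, ..., x_n, z_n = j), computed without rescaling.
Matrix forwardUnscaled(const std::vector<double> &initial,
                       const Matrix &transition,
                       const Matrix &emission,
                       const std::vector<int> &sequence) {
    assert(!sequence.empty());
    const std::size_t nstates = initial.size();
    Matrix alpha(sequence.size(), std::vector<double>(nstates, 0.0));
    for (std::size_t n = 0; n < sequence.size(); ++n) {
        for (std::size_t j = 0; j < nstates; ++j) {
            // Base case: prior of state j. Otherwise: sum over previous states.
            double s = initial[j];
            if (n > 0) {
                s = 0.0;
                for (std::size_t k = 0; k < nstates; ++k)
                    s += alpha[n - 1][k] * transition[k][j];
            }
            alpha[n][j] = emission[j][sequence[n]] * s;
        }
    }
    return alpha;
}

// p(x_1, ..., x_n): sum of the last row of alpha over the hidden states.
double sequenceProbability(const Matrix &alpha) {
    double p = 0.0;
    for (double a : alpha.back()) p += a;
    return p;
}
```

For the illustrative model initial = {0.6, 0.4}, transition = {{0.7, 0.3}, {0.4, 0.6}}, emission = {{0.9, 0.1}, {0.2, 0.8}} and the observation sequence 0, 1, the first row of alpha is (0.54, 0.08), the second row is (0.041, 0.168), and the sequence probability is 0.209. Summing the four possible hidden paths by hand gives the same value: 0.0378 + 0.1296 + 0.0032 + 0.0384 = 0.209.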

0.2.2 Observation Sequence Likelihood

    

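Thus, if we are given a random sequence of observations $x_1, \ldots, x_n$, we start from $\alpha(z_1)$ and compute $\alpha(z_n)$, after which

$$P(X = x_1, \ldots, x_n) = \sum_{z_n} \alpha(z_n)$$

can be computed.

This gives us an estimate of the probability of observing the sequence $x_1, \ldots, x_n$ using the forward algorithm.

The probability estimate is often computed as a log probability, because the product of many small probabilities underflows for long sequences. When the forward variables are rescaled to sum to one at every position, let $\tilde{\alpha}(z_i)$ denote the value at position $i$ computed from the rescaled values at position $i-1$, before it is rescaled itself. Then

$$\mathrm{prob} = P(x_1, \ldots, x_n) = \prod_{i=1}^{n} \sum_{z_i} \tilde{\alpha}(z_i)$$

$$\mathrm{logprob} = \log P(x_1, \ldots, x_n) = \sum_{i=1}^{n} \log \Big( \sum_{z_i} \tilde{\alpha}(z_i) \Big)$$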

    /**
     * @brief likelihood : computes the log likelihood
     * of the observed sequence
     * @param sequence : input observation sequence
     * @return log probability of the sequence under the model
     */
    float likelihood(vector<int> sequence)
    {
        float prob=0;
        //computing the probability of observed sequence
        //using forward algorithm
        forwardMatrix(sequence);
        backwardMatrix(sequence);
        //computing the log probability of observed sequence
        //each _scale entry is the reciprocal of a per-position sum,
        //so the log likelihood is minus the sum of their logs
        for(int i=0;i<(int)sequence.size();i++)
        {
            prob=prob+std::log(_scale(0,i));
        }
        return -prob;
    }
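The relation between the scaling factors and the log probability can be sketched in a standalone form. The function `logLikelihood` below is a hypothetical helper written for this illustration, not the article's class method. It keeps only the current column of forward variables, rescales it at every position, and accumulates the logarithm of each per-position sum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns log p(x_1, ..., x_n) using a forward pass rescaled at every step.
double logLikelihood(const std::vector<double> &initial,
                     const Matrix &transition,
                     const Matrix &emission,
                     const std::vector<int> &sequence) {
    assert(!sequence.empty());
    const std::size_t nstates = initial.size();
    std::vector<double> alpha(nstates, 0.0), next(nstates, 0.0);
    double logProb = 0.0;
    for (std::size_t n = 0; n < sequence.size(); ++n) {
        double c = 0.0;  // per-position sum, equal to p(x_n | x_1..x_{n-1})
        for (std::size_t j = 0; j < nstates; ++j) {
            double s = initial[j];
            if (n > 0) {
                s = 0.0;
                for (std::size_t k = 0; k < nstates; ++k)
                    s += alpha[k] * transition[k][j];
            }
            next[j] = emission[j][sequence[n]] * s;
            c += next[j];
        }
        // A zero sum means the sequence is impossible under this model.
        if (c <= 0.0) return -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < nstates; ++j) alpha[j] = next[j] / c;
        logProb += std::log(c);
    }
    return logProb;
}
```

For the illustrative model initial = {0.6, 0.4}, transition = {{0.7, 0.3}, {0.4, 0.6}}, emission = {{0.9, 0.1}, {0.2, 0.8}} and the sequence 0, 1, the result is log(0.209), which agrees with the unscaled computation. The advantage of the rescaled form shows on long sequences, where the unscaled product would underflow while the sum of logarithms stays finite.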

0.2.3 Sequence Classification

  

Again, for sequence classification assume we have two hidden Markov models $\theta_1 = (\pi_1, \mathrm{trans}_1, \mathrm{emission}_1)$ and $\theta_2 = (\pi_2, \mathrm{trans}_2, \mathrm{emission}_2)$.

Given an observation sequence $X = x_1, \ldots, x_n$, we compute for each model the probability of observing the sequence, that is, the probability that the model has generated the observation sequence.

The observation sequence is estimated to be produced by the model which exhibits the highest probability of having generated the sequence. With $N$ models,

$$y = \operatorname*{argmax}_{n = 1, \ldots, N} \; \mathrm{prob}(X \mid \theta_n)$$
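The decision rule can be sketched as follows: score the sequence under each model and return the index of the best one. `HmmParams`, `logLikelihood` and `classify` are hypothetical names introduced for this example. Log likelihoods are compared instead of raw probabilities, which leaves the argmax unchanged because the logarithm is monotonic.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical parameter bundle: (pi, trans, emission) for one model.
struct HmmParams {
    std::vector<double> initial;
    std::vector<std::vector<double>> transition;
    std::vector<std::vector<double>> emission;
};

// Log likelihood of the sequence via a forward pass rescaled at every step.
double logLikelihood(const HmmParams &m, const std::vector<int> &seq) {
    const std::size_t nstates = m.initial.size();
    std::vector<double> alpha(nstates, 0.0), next(nstates, 0.0);
    double logProb = 0.0;
    for (std::size_t n = 0; n < seq.size(); ++n) {
        double c = 0.0;
        for (std::size_t j = 0; j < nstates; ++j) {
            double s = m.initial[j];
            if (n > 0) {
                s = 0.0;
                for (std::size_t k = 0; k < nstates; ++k)
                    s += alpha[k] * m.transition[k][j];
            }
            next[j] = m.emission[j][seq[n]] * s;
            c += next[j];
        }
        if (c <= 0.0) return -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < nstates; ++j) alpha[j] = next[j] / c;
        logProb += std::log(c);
    }
    return logProb;
}

// Index of the model with the highest log likelihood for the sequence.
// Ties are resolved in favour of the model that appears first.
int classify(const std::vector<HmmParams> &models,
             const std::vector<int> &seq) {
    assert(!models.empty());
    int best = 0;
    double bestScore = logLikelihood(models[0], seq);
    for (std::size_t m = 1; m < models.size(); ++m) {
        const double score = logLikelihood(models[m], seq);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(m);
        }
    }
    return best;
}
```

In a typical setup each class has its own model, trained on sequences of that class, and the returned index is the predicted class label. Training is outside the scope of this sketch.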

0.2.4 Code

Code for discrete hidden Markov models can be found in the git repository https://github.com/pi19404/OpenVision, in the files ImgML/hmm.hpp and ImgML/hmm.cpp.
