
Hidden Markov Model For Gesture Recognition

Pi19404
December 23, 2012

Contents

Gesture Recognition using HMM
0.1 Gesture
0.2 Hidden Markov Model
0.3 Gestures and HMM
0.3.1 Gestures Representation
0.3.2 Normalizing Gestures
0.3.3 Discretization of Gestures
0.4 References


Gesture Recognition using HMM
0.1 Gesture

A gesture is represented as a spatio-temporal sequence of feature vectors that describe the direction of hand movement. The hand gesture is a continuous-time sequence of feature vectors; a vector quantizer converts each feature vector to one codeword from a predefined set. We construct a codebook for each gesture and build an HMM that uses the spatio-temporal information encoded by the codewords representing the gesture. A unique initial state and a unique final state are defined in the model. The number of states in the model is determined by the complexity of the gesture. A higher number of states represents the gesture better, but at the cost of performance.
0.2 Hidden Markov Model

A hidden Markov model is a finite collection of states connected by transitions. Each state is characterized by two sets of probabilities: a transition probability and an output probability distribution, which is discrete in this work. A hidden Markov model is defined using:

Q - sequence of hidden states
O - sequence of observed states
S - input symbol set
V - observation symbol set
Q0 - initial state of the system
A - transition matrix
B - emission matrix

For each class we create a hidden Markov model.
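The likelihood that a model assigns to a sequence of observed symbols is usually computed with the forward algorithm. The following is a minimal sketch, not the author's implementation: it assumes the initial distribution P, the transition matrix A and the emission matrix B are plain Python lists, that B holds one row per hidden state, and that observations are integer symbol indices. The function name `log_likelihood` and the toy model are illustrative only.

```python
import math

def log_likelihood(obs, P, A, B):
    """Return log P(obs | model) using the scaled forward algorithm.

    P[i]    : probability of starting in hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    """
    n = len(P)
    log_prob = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialisation: start in state i and emit the first symbol.
            alpha = [P[i] * B[i][symbol] for i in range(n)]
        else:
            # Induction: sum over predecessor states, then emit the symbol.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow and accumulate the log of the scale.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob

# Toy two-state, two-symbol left-to-right model (illustrative values only).
P = [1.0, 0.0]
A = [[0.5, 0.5],
     [0.0, 1.0]]
B = [[1.0, 0.0],
     [0.0, 1.0]]
print(log_likelihood([0, 1], P, A, B))  # log(0.5)
print(log_likelihood([1, 0], P, A, B))  # -inf: the sequence is impossible
```

Working in the log domain keeps long sequences from underflowing, and an impossible sequence yields negative infinity.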

After observing a sequence of observations we can determine which HMM is more likely to have generated it. Each HMM outputs a likelihood for the given observed sequence, and the sequence is assigned to the class whose HMM produces the highest likelihood. Since any random input will produce some likelihood of belonging to one class or another, we need a threshold below which a sequence is considered to belong to no class. This threshold is determined, for each class, by taking the average likelihood obtained when the HMM of that class processes observed sequences from the training or validation data. If the likelihood is below this threshold, we infer that none of the HMMs is likely to have produced the given observation.
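The decision rule described above can be sketched as follows. This is a hedged illustration: the function name `classify`, the class labels and the numeric scores are hypothetical, and the per-class likelihoods are assumed to have been computed already by the respective HMMs.

```python
def classify(likelihoods, thresholds):
    """Pick the class whose HMM scores highest, or None if it is rejected.

    likelihoods : dict mapping class name -> likelihood of the sequence
    thresholds  : dict mapping class name -> rejection threshold of that HMM
    """
    best = max(likelihoods, key=likelihoods.get)
    if likelihoods[best] < thresholds[best]:
        return None  # no model explains the sequence well enough
    return best

# Hypothetical scores for one observed sequence under two gesture models.
thresholds = {"gesture1": -80.0, "gesture2": -60.0}
print(classify({"gesture1": -40.0, "gesture2": -90.0}, thresholds))   # gesture1
print(classify({"gesture1": -100.0, "gesture2": -90.0}, thresholds))  # None
```

Returning `None` for a rejected sequence keeps invalid input separate from the known classes.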

0.3 Gestures and HMM
0.3.1 Gestures Representation

The task of gesture recognition is: given a gesture, decide whether it belongs to a known class of gestures. This requires a mathematical representation of gestures and a means of comparing two gestures. Consider the case of a 2D gesture. A point of a 2D gesture is represented by its co-ordinates (x, y), so the gesture can be represented as a spatio-temporal sequence of 2D points. The first step is to record the gesture. We record some examples of two types of gesture.
0.3.2 Normalizing Gestures

The representation of a gesture must be independent of its scale and of the position in 2D space at which it is performed. Thus we need to normalize the gestures: all gestures are scaled to the same size, aligned with each other and represented using the same number of points. Scaling is performed first, followed by translation and then re-sampling. To scale the points, calculate the minimum and maximum values in each dimension and apply a linear scaling to the points. To center the gesture about the origin, calculate the centroid of the points and translate all the points with respect to the centroid.
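The scaling and translation steps can be sketched as below. This is one possible reading, under the assumption that each dimension is scaled independently to the range [0, 1] before centering; the function name `normalize` and the sample points are illustrative.

```python
def normalize(points):
    """Scale a gesture to the unit square, then centre it on its centroid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero range (e.g. a perfectly horizontal stroke).
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    scaled = [((x - min_x) / span_x, (y - min_y) / span_y) for x, y in points]
    # Translate so that the centroid of the scaled points is the origin.
    cx = sum(x for x, _ in scaled) / len(scaled)
    cy = sum(y for _, y in scaled) / len(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

print(normalize([(10, 20), (30, 20), (30, 60)]))
```

Scaling each axis separately discards the aspect ratio of the gesture. Dividing both axes by the larger of the two ranges would preserve it, at the cost of not filling the unit square.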


Below are plots of the gestures after the normalization steps.

(a) gesture 1

(b) gesture 2
Figure 1: Normalized gesture plots

The next step is to re-sample so that all gestures are represented by the same number of points. The simplest strategy is uniform re-sampling. Below are the example gestures after re-sampling.

(a) gesture 1

(b) gesture 2
Figure 2: Re-sampled gesture plots
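Uniform re-sampling can be sketched as follows. The text does not say how the points are spaced, so this example assumes equal spacing along the arc length of the stroke with linear interpolation between recorded points; the function name `resample` is illustrative.

```python
import math

def resample(points, n):
    """Return n points spaced evenly along the path through `points`."""
    # Cumulative arc length at every input point.
    dist = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dist.append(dist[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dist[-1]
    if total == 0.0 or n == 1:
        return [points[0]] * n
    out = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and dist[seg + 1] < target:
            seg += 1
        seg_len = dist[seg + 1] - dist[seg]
        t = 0.0 if seg_len == 0.0 else (target - dist[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

print(resample([(0, 0), (0, 3), (4, 3)], 3))
```

With this choice the number of output points is independent of how fast the gesture was drawn, since spacing depends on distance travelled rather than on sample index.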

0.3.3 Discretization of Gestures

We require a mathematical representation of the gesture. Consider the ideal gesture that is to be identified. Each discrete pixel position can be represented as a symbol, so a gesture can be represented as a sequence of symbols. However, a per-pixel representation needs 1600 symbols in this case. Since a gesture does not occupy all the pixels, the representation is typically sparse and highly redundant. Calculations over such a large representation are expensive and not suitable for real-time applications. Since a gesture is represented by 30 points, we require only 30 symbols to represent a particular gesture. Thus it is desirable to represent the gesture using a reduced set of symbols. Even 30 symbols may be too many to achieve real-time performance; if we reduce the set of symbols further, we are compromising accuracy for performance.

One simple method is clustering: fix the number of desired symbols and perform clustering on the input data. Each point is then represented by a cluster, which reduces the size of the representation. First the mean is calculated over all the samples at each point of the re-sampled data. Thus if a gesture is represented using 30 points, we calculate the average co-ordinates of these 30 points over all the examples of the gesture. Clustering is then performed over this mean gesture. The centroids of the clusters define the new points used to represent the gesture. Instead of 30 points, the gesture is represented using a reduced number of states. Once the centroids are determined, all the points of a gesture are assigned to their closest centroid.

(a) gesture 1

(b) gesture 2
Figure 3: Result after Clustering
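The averaging, clustering and assignment steps can be sketched as below. The text does not name the clustering algorithm, so plain k-means (Lloyd's algorithm) is assumed here, seeded deterministically with points spread along the mean gesture. All function names and the sample data are illustrative.

```python
def mean_gesture(examples):
    """Average the i-th point over all examples (all have equal length)."""
    n = len(examples)
    return [
        (sum(e[i][0] for e in examples) / n, sum(e[i][1] for e in examples) / n)
        for i in range(len(examples[0]))
    ]

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: (point[0] - centroids[c][0]) ** 2
        + (point[1] - centroids[c][1]) ** 2,
    )

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with k points spread along the gesture."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def quantize(gesture, centroids):
    """Replace every point by the index (symbol) of its nearest centroid."""
    return [nearest(p, centroids) for p in gesture]

# Two hypothetical recordings of the same four-point gesture.
examples = [
    [(0, 0), (1, 0), (10, 0), (11, 0)],
    [(0, 2), (1, 2), (10, 2), (11, 2)],
]
centroids = kmeans(mean_gesture(examples), k=2)
print(centroids)                         # [(0.5, 1.0), (10.5, 1.0)]
print(quantize(examples[0], centroids))  # [0, 0, 1, 1]
```

The output of `quantize` is the discrete symbol sequence that would be passed to the HMM.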

Each centroid can be thought of as an observed state, and each data point is represented by a centroid (symbol). We have thus discretized the continuous data. A gesture is defined as a sequence of the observed states estimated from the training data: each input data point is associated with an observed state by the discretization process, and the gesture is represented as the resulting sequence of observed states. During training, the model parameters that maximize the likelihood of the observed sequences are determined. The parameters to be decided are the number of input states and the number of output states. Based on the number of output states, the gestures are discretized into sequences of observed states, which are used in training to determine the parameters of the HMM. Once the parameters are determined, calculate the average likelihood of the observed sequences in the training/validation set. This gives the threshold of the given HMM.

For the given two sets of gestures:

1. M (8) - number of observed states
2. N (4) - number of input states

Thus we train two HMMs. The transition matrices are:

\[
A_1 = \begin{bmatrix}
0.6251 & 0.3749 & 0 & 0 \\
0 & 0.3728 & 0.6272 & 0 \\
0 & 0 & 0.8556 & 0.1444 \\
0 & 0 & 0 & 1.0000
\end{bmatrix}
\qquad
A_2 = \begin{bmatrix}
0.1000 & 0.9000 & 0 & 0 \\
0 & 0.7568 & 0.2432 & 0 \\
0 & 0 & 0.7501 & 0.2499 \\
0 & 0 & 0 & 1.0000
\end{bmatrix}
\]

The emission matrices B1 and B2 hold, for each input state, one distribution over the 8 observed states.

Input state 1:
B1: (0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.6017, 0.3983, 0.0000)
B2: (0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000)

Input states 2 and 3 of the two models:
(0.0000, 0.0000, 0.0000, 0.0000, 0.0077, 0.9923, 0.0000, 0.0000)
(0.0000, 0.0000, 0.0000, 0.0000, 0.0270, 0.0000, 0.9730, 0.0000)
(0.0000, 0.0000, 0.0000, 0.5866, 0.4134, 0.0000, 0.0000, 0.0000)
(0.0000, 0.0003, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.9997)

Input state 4:
B1: (0.2525, 0.1163, 0.2359, 0.0000, 0.1429, 0.0000, 0.0000, 0.2525)
B2: (0.2193, 0.2246, 0.1818, 0.0053, 0.2032, 0.1604, 0.0053, 0.0000)

The initial state distributions are P1 = P2 = (1.0000, 0.0000, 0.0000, 0.0000).

We compute the log-likelihood of the validation/training data and set the threshold for each gesture to twice the average log-likelihood. For the present data we obtain the thresholds TA = -79.4310 and TB = -60.5704. Given test gestures, their log-likelihoods are computed against each HMM. For the first model the log-likelihoods of the 4 test gestures are: -42.296347, -142.296347, -30.364364, -70.747603. The third gesture is incorrectly identified as being generated by the model.
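The thresholding step can be checked against the numbers reported above. In this sketch the helper `threshold` and its `factor` parameter are illustrative names; the threshold and test scores are the values quoted in the text for the first model, and a gesture is assumed to be accepted when its score is at or above the threshold.

```python
def threshold(training_scores, factor=2.0):
    """Rejection threshold: `factor` times the mean training log-likelihood."""
    return factor * sum(training_scores) / len(training_scores)

T_A = -79.4310  # threshold reported in the text for the first model
test_scores = [-42.296347, -142.296347, -30.364364, -70.747603]

# A test gesture is accepted when its score is not below the threshold.
accepted = [score >= T_A for score in test_scores]
print(accepted)  # [True, False, True, True]
```

Because log-likelihoods are negative, doubling the average lowers the threshold and makes the test more permissive. Under this rule only the second test gesture falls below the threshold of the first model.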


(a) gesture 1

(b) gesture 1

(c) gesture 1

(d) gesture 1
Figure 4: Test Data

Different numbers of samples, input states and observed symbols were tried in order to find reasonable model parameters. The final configuration used was: 120 samples to represent each gesture, with the number of input and output symbols chosen to be 20. The thresholds in this case were -405.235169 and -389.913413. The log-likelihoods of the 4 test gestures were:

Model 1: -212.277444, -Inf, -373.451343, -325.134281
Model 2: -560.252356, -210.214212, -717.249978, -439.386705

Even in this simple model, although the respective gestures are identified correctly, some incorrect gestures are still accepted in spite of using a threshold to eliminate gestures with lower likelihood. The gestures that were incorrectly identified form part of the gesture of model 1. A template-based approach gives better performance for such gestures. How to improve the system in this respect still needs to be investigated. This aspect must be improved if HMMs are to be used in a real-time system, where many invalid gestures occur and the model must be capable of eliminating them.

0.4 References

1. http://cs229.stanford.edu/section/cs229-hmm.pdf

2. http://www.creativedistraction.com/demos/gesture-recognition-kinect-with-hidden-marko
